Building A Central Logging Service In-House

Original Source: https://www.smashingmagazine.com/2018/05/building-central-logging-service/

Akhil Labudubariki

2018-05-30T13:30:22+02:00
2018-05-30T12:20:54+00:00

We all know how important debugging is for improving application performance and features. BrowserStack runs one million sessions a day on a highly distributed application stack! Each involves several moving parts, as a client’s single session can span multiple components across several geographic regions.

Without the right framework and tools, the debugging process can be a nightmare. In our case, we needed a way to collect events happening during different stages of each process in order to get an in-depth understanding of everything taking place during a session. With our infrastructure, solving this problem became complicated, as each component might emit multiple events over the lifecycle of processing a request.

That’s why we developed our own in-house Central Logging Service tool (CLS) to record all important events logged during a session. These events help our developers identify conditions where something goes wrong in a session and help keep track of certain key product metrics.

Debugging data ranges from simple things like API response latency to monitoring a user’s network health. In this article, we share our story of building our CLS tool, which reliably collects 70 GB of relevant chronological data per day from 100+ components, at scale and with two m3.large EC2 instances.

The Decision To Build In-House

First, let’s consider why we built our CLS tool in-house rather than using an existing solution. Each of our sessions sends 15 events on average, from multiple components to the service, translating into approximately 15 million total events per day.
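As a rough sanity check on these numbers, the arithmetic can be sketched as follows. The session count, events per session and the 70 GB figure come from the article; the assumptions that load is spread evenly over the day and that 1 GB means 10^9 bytes are purely illustrative.

```javascript
'use strict'

// Figures quoted in the article.
const sessionsPerDay = 1e6
const eventsPerSession = 15
const bytesPerDay = 70e9 // assumption: 70 GB read as 70 * 10^9 bytes

const eventsPerDay = sessionsPerDay * eventsPerSession
// Assumption: load is spread evenly across 24 hours (real traffic has peaks).
const avgEventsPerSecond = eventsPerDay / (24 * 60 * 60)
const avgBytesPerEvent = bytesPerDay / eventsPerDay

console.log({ eventsPerDay, avgEventsPerSecond, avgBytesPerEvent })
```

Under these assumptions the service averages fewer than 200 events per second at a few kilobytes per event, although peak rates would be higher than the average.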

Our service needed the ability to store all this data. We sought a complete solution to support storing, sending and querying events. As we considered third-party solutions such as Amplitude and Keen, our evaluation metrics included cost, performance under a high number of parallel requests, and ease of adoption. Unfortunately, we could not find a fit that met all our requirements within budget, although the benefits would have included saving time and minimizing alerts. While it would take additional effort, we decided to develop an in-house solution ourselves.

Image: One of the biggest issues with building in-house is the amount of resources that we need to spend to maintain it. (Image credit: Digiday)

Technical Details

In terms of architecting for our component, we outlined the following basic requirements:

Client performance: Does not impact the performance of the client/component sending the events.
Scale: Able to handle a high number of requests in parallel.
Service performance: Quick to process all events being sent to it.
Insight into data: Each event logged needs to have some meta information to be able to uniquely identify the component or user, account or message and give more information to help the developer debug faster.
Queryable interface: Developers can query all events for a particular session, helping to debug a particular session, build component health reports, or generate meaningful performance statistics of our systems.
Faster and easier adoption: Easy integration with an existing or new component without burdening teams and taking up their resources.
Low maintenance: We are a small engineering team, so we sought a solution to minimize alerts!

Building Our CLS Solution

Decision 1: Choosing An Interface To Expose

In developing CLS, we obviously didn’t want to lose any of our data, but we didn’t want component performance to take a hit either. We also wanted to avoid making existing components more complicated, since that would delay overall adoption and release. In determining our interface, we considered the following choices:

Storing events in a local Redis in each component, while a background processor pushes them to CLS. However, this requires a change in all components, along with the introduction of Redis for components which didn’t already use it.
A publisher-subscriber model, where Redis is closer to CLS. However, as everyone publishes events, we again face the fact that our components run across the globe. During periods of high traffic, this would delay components. Further, a write could intermittently take up to five seconds (due to the internet alone).
Sending events over UDP, which has a smaller impact on application performance. In this case, data would be sent and forgotten; the disadvantage, however, would be data loss.

Interestingly, our data loss over UDP was less than 0.1 percent, which was an acceptable amount for us to consider building such a service. We were able to convince all teams that this amount of loss was worth the performance, and went ahead to leverage a UDP interface that listened to all events being sent.

While one result was a smaller impact on an application’s performance, we did face an issue: UDP traffic was not allowed from all networks, mostly our users’ networks, causing us in some cases to receive no data at all. As a workaround, we supported logging events using HTTP requests. All events coming from the user’s side would be sent via HTTP, whereas all events being recorded from our components would be sent via UDP.
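The fire-and-forget idea behind the UDP interface can be sketched with Node’s core dgram module. This is only an illustration, not the service’s actual code (which the article says is written in Go): the JSON wire format, the function names, and the host and port parameters are all assumptions.

```javascript
'use strict'
const dgram = require('dgram')

// Serialize an event as JSON. The wire format is an assumption for this
// sketch; the article does not describe the actual encoding.
function encodeEvent (event) {
  return Buffer.from(JSON.stringify(event))
}

// Fire-and-forget: send one datagram and never wait for an acknowledgement.
// `host` and `port` are hypothetical values supplied by the caller. A real
// sender would probably reuse one socket instead of opening one per event.
function sendEvent (event, host, port) {
  const socket = dgram.createSocket('udp4')
  socket.send(encodeEvent(event), port, host, () => socket.close())
}

// Listener side: hand every decodable datagram to `onEvent`.
function listen (port, onEvent) {
  const server = dgram.createSocket('udp4')
  server.on('message', (msg) => {
    let event
    try {
      event = JSON.parse(msg.toString())
    } catch (err) {
      return // malformed datagrams are simply dropped
    }
    onEvent(event)
  })
  server.bind(port)
  return server
}
```

Because the sender never waits for a reply, a slow or unreachable logging service cannot stall the component; the price is that a lost datagram goes unnoticed.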

Decision 2: Tech Stack (Language, Framework & Storage)

We are a Ruby shop. However, we were uncertain whether Ruby would be the best choice for our particular problem. Our service would have to handle a lot of incoming requests, as well as process a lot of writes. With the Global Interpreter Lock (GIL), achieving multithreading or concurrency would be difficult in Ruby (please don’t take offense; we love Ruby!). So we needed a solution that would help us achieve this kind of concurrency.

We were also keen to evaluate a new language in our tech stack, and this project seemed perfect for experimenting with new things. That’s when we decided to give Golang a shot, since it offered built-in support for concurrency through lightweight threads called goroutines. Each logged data point resembles a key-value pair where ‘key’ is the event and ‘value’ serves as its associated value.

But having a simple key and value is not enough to retrieve session-related data; there is more metadata to it. To address this, we decided any event needing to be logged would have a session ID along with its key and value. We also added extra fields like timestamp, user ID and the component logging the data, so that it became easier to fetch and analyze data.
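A single event with these fields might look like the following sketch. The article lists the fields but not their names, so the property names, the helper function and the sample values here are all hypothetical.

```javascript
'use strict'

// Build one log event. Field names are illustrative; the article lists the
// fields (session ID, key, value, timestamp, user ID, component) but not
// their exact names.
function makeEvent ({ sessionId, key, value, userId, component }, now = Date.now()) {
  return { sessionId, key, value, userId, component, timestamp: now }
}

const event = makeEvent({
  sessionId: 'abc123', // hypothetical session ID
  key: 'api_response_latency_ms',
  value: 182,
  userId: 42,
  component: 'terminal-allocator' // hypothetical component name
}, 1527682222000)

console.log(event)
```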

Now that we had decided on our payload structure, we had to choose our datastore. We considered Elasticsearch, but we also wanted to support update requests for keys. In Elasticsearch, an update would trigger the entire document to be re-indexed, which might affect the performance of our writes. MongoDB made more sense as a datastore since it would be easier to query all events based on any of the data fields that would be added. This was easy!

Decision 3: DB Size Is Huge And Query And Archiving Sucks!

In order to cut maintenance, our service would have to handle as many events as possible. Given the rate at which BrowserStack releases features and products, we were certain the number of our events would increase at higher rates over time, meaning our service would have to continue to perform well. As the database grows, reads and writes take more time, which could be a huge hit on the service’s performance.

The first solution we explored was moving logs from a certain period away from the database (in our case, we decided on 15 days). To do this, we created a different database for each day, allowing us to find logs older than a particular period without having to scan all written documents. Now we continually remove databases older than 15 days from Mongo, while of course keeping backups just in case.
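The per-day database scheme can be sketched as two small helpers: one that maps a date to a database name, and one that picks out the names past the retention window. The `cls_YYYY_MM_DD` naming scheme and the use of UTC days are assumptions made for this example.

```javascript
'use strict'

const DAY_MS = 24 * 60 * 60 * 1000
const PREFIX = 'cls_' // hypothetical naming scheme: cls_YYYY_MM_DD

// Name of the database that holds events for the UTC day containing `date`.
function dbNameFor (date) {
  return PREFIX + date.toISOString().slice(0, 10).replace(/-/g, '_')
}

// Given existing database names, return those older than `retentionDays`.
function expiredDbs (names, now, retentionDays = 15) {
  const cutoff = dbNameFor(new Date(now.getTime() - retentionDays * DAY_MS))
  // Zero-padded dates sort lexicographically, so string comparison works.
  return names.filter((name) => name.startsWith(PREFIX) && name < cutoff)
}
```

Each returned name could then be backed up and dropped as a whole, which is far cheaper than deleting old documents one by one from a single large database.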

The only leftover piece was a developer interface to query session-related data. Honestly, this was the easiest problem to solve. We provide an HTTP interface where people can query the corresponding MongoDB database for all events having a particular session ID.

Architecture

Let’s talk about the internal components of the service, considering the following points:

As previously discussed, we needed two interfaces: one listening over UDP and another listening over HTTP. So we built two servers, one for each interface, to listen for events. As soon as an event arrives, we parse it to check whether it has the required fields: session ID, key, and value. If it does not, the data is dropped. Otherwise, the data is passed over a Go channel to another goroutine, whose sole responsibility is to write to MongoDB.
A possible concern here is writing to MongoDB. If writes to MongoDB are slower than the rate at which data is received, this creates a bottleneck. This, in turn, starves other incoming events and means dropped data. The server, therefore, should be fast in processing incoming logs and be ready to process upcoming ones. To address the issue, we split the server into two parts: the first receives all events and queues them up for the second, which processes and writes them into MongoDB.
For queuing we chose Redis. By dividing the entire component into these two pieces we reduced the server’s workload, giving it room to handle more logs.
We wrote a small service using a Sinatra server to handle all the work of querying MongoDB with given parameters. It returns an HTML/JSON response to developers when they need information on a particular session.
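The split between a fast receiver and a separate writer can be sketched as below. This is a simplified illustration rather than the service’s Go implementation: an in-memory array stands in for Redis, `write` is a hypothetical function that would insert a batch into the datastore, and the field names are assumed.

```javascript
'use strict'

const REQUIRED_FIELDS = ['sessionId', 'key', 'value'] // names are assumptions

// Return the parsed event, or null if it is malformed or incomplete.
function parseEvent (raw) {
  let event
  try {
    event = JSON.parse(raw)
  } catch (err) {
    return null
  }
  if (event === null || typeof event !== 'object') return null
  const complete = REQUIRED_FIELDS.every((field) => event[field] !== undefined)
  return complete ? event : null
}

// Part one: receive quickly, validate, enqueue. `queue` is any object with
// push(); an in-memory array stands in for Redis here.
function receive (raw, queue) {
  const event = parseEvent(raw)
  if (event === null) return false // dropped
  queue.push(event)
  return true
}

// Part two: drain the queue in batches and hand them to `write`, a
// hypothetical function that would insert into the datastore.
function drain (queue, write, batchSize = 100) {
  while (queue.length > 0) write(queue.splice(0, batchSize))
}
```

Keeping `receive` free of any datastore work is the point of the design: a slow write only delays `drain`, while incoming events continue to be accepted and queued.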

All these processes happily run on a single m3.large instance.

CLS v1: A representation of the system’s first architecture. All the components are running on one single machine.

Feature Requests

As our CLS tool saw more use over time, it needed more features. Below, we discuss these and how they were added.

Missing Metadata

As the number of components in BrowserStack has gradually increased, we’ve demanded more from CLS. For example, we needed the ability to log events from components that lack a session ID. Obtaining one would otherwise burden our infrastructure, by affecting application performance and incurring traffic on our main servers.

We addressed this by enabling event logging using other keys, such as terminal and user IDs. Now whenever a session is created or updated, CLS is informed of the session ID, as well as the respective user and terminal IDs. It stores a map that can be read by the process that writes to MongoDB. Whenever an event that contains either the user or terminal ID arrives, the session ID is added.
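The lookup described above can be sketched with two maps. The function and field names are hypothetical, and the in-memory maps are an assumption; the article does not say where the mapping is stored.

```javascript
'use strict'

// In-memory lookup tables; a real service would likely persist these.
const sessionByUser = new Map()
const sessionByTerminal = new Map()

// Called whenever a session is created or updated.
function registerSession ({ sessionId, userId, terminalId }) {
  if (userId !== undefined) sessionByUser.set(userId, sessionId)
  if (terminalId !== undefined) sessionByTerminal.set(terminalId, sessionId)
}

// Fill in a missing session ID from the user or terminal ID, if known.
function attachSessionId (event) {
  if (event.sessionId !== undefined) return event
  const sessionId = sessionByUser.get(event.userId) ||
    sessionByTerminal.get(event.terminalId)
  return sessionId === undefined ? event : Object.assign({}, event, { sessionId })
}
```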

Handle Spamming (Code Issues In Other Components)

CLS also faced the usual difficulties with handling spam events. We often found deploys in components that generated a huge volume of requests sent to CLS. Other logs would suffer in the process, as the server became too busy to process these and important logs were dropped.

Most of the data being logged arrived via HTTP requests. To control them, we enabled rate limiting on nginx (using the limit_req_zone directive of its limit_req module), which blocks requests from any IP that sends more than a certain number of requests in a short amount of time. Of course, we also generate health reports on all blocked IPs and inform the responsible teams.
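nginx handles this internally, so no application code is needed. Purely to illustrate the idea of per-IP limiting, here is a minimal sketch that uses a simple fixed-window counter instead of the leaky-bucket method nginx documents. The function name and limits are hypothetical.

```javascript
'use strict'

// Allow at most `limit` requests per `windowMs` for each client IP.
function createRateLimiter (limit, windowMs) {
  const windows = new Map() // ip -> { start, count }

  return function allow (ip, now = Date.now()) {
    const current = windows.get(ip)
    if (current === undefined || now - current.start >= windowMs) {
      // First request from this IP, or the previous window has expired.
      windows.set(ip, { start: now, count: 1 })
      return true
    }
    current.count += 1
    return current.count <= limit
  }
}
```

A caller would create one limiter, for example `createRateLimiter(100, 1000)`, and reject any request for which `allow(ip)` returns false.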

Scale v2

As our sessions per day increased, so did the data being logged to CLS. This affected the queries our developers were running daily, and soon the bottleneck was the machine itself. Our setup consisted of a two-core machine running all of the above components, along with a bunch of scripts to query Mongo and keep track of key metrics for each product. Over time, data on the machine had grown heavily and the scripts began to take a lot of CPU time. Even after trying to optimize Mongo queries, we always came back to the same issues.

To solve this, we added another machine for running health report scripts and the interface to query these sessions. The process involved booting a new machine and setting up a slave of the Mongo running on the main machine. This has helped reduce the CPU spikes we saw every day caused by these scripts.

CLS v2: A representation of the current system’s architecture. Logs are written to the master machine and they are synced on the slave machine. Developer’s queries run on the slave machine.

Conclusion

Building a service for a task as simple as data logging can get complicated, as the amount of data increases. This article discusses the solutions we explored, along with challenges faced while solving this problem. We experimented with Golang to see how well it would fit with our ecosystem, and so far we have been satisfied. Our choice to create an internal service rather than paying for an external one has been wonderfully cost-efficient. We also didn’t have to scale our setup to another machine until much later – when the volume of our sessions increased. Of course, our choices in developing CLS were completely based on our requirements and priorities.

Today CLS handles up to 15 million events every day, constituting up to 70 GB of data. This data is being used to help us solve any issues our customers face during any session. We also use this data for other purposes. Given the insights each session’s data provides on different products and internal components, we’ve begun leveraging this data to keep track of each product. This is achieved by extracting the key metrics for all the important components.

All in all, we’ve seen great success in building our own CLS tool. If it makes sense for you, I recommend you consider doing the same!


Using ES Modules in the Browser Today

Original Source: https://www.sitepoint.com/using-es-modules/

This article will show you how you can use ES modules in the browser today.

Until recently, JavaScript had no concept of modules. It wasn’t possible to directly reference or include one JavaScript file in another. And as applications grew in size and complexity, this made writing JavaScript for the browser tricky.

One common solution is to load arbitrary scripts in a web page using <script> tags. However, this brings its own problems. For example, each script initiates a render-blocking HTTP request, which can make JS-heavy pages feel sluggish and slow. Dependency management also becomes complicated, as load order matters.

ES6 (ES2015) went some way to addressing this situation by introducing a single, native module standard. (You can read more about ES6 modules here.) However, as browser support for ES6 modules was initially poor, people started using module loaders to bundle dependencies into a single ES5 cross-browser compatible file. This process introduces its own issues and degree of complexity.

But good news is at hand. Browser support is getting ever better, so let’s look at how you can use ES6 modules in today’s browsers.

The Current ES Modules Landscape

Safari, Chrome, Firefox and Edge all support the ES6 Modules import syntax (Firefox behind a flag). Here’s what they look like:

<script type="module">
  import { tag } from './html.js'

  const h1 = tag('h1', 'Hello Modules!')
  document.body.appendChild(h1)
</script>

// html.js
export function tag (tag, text) {
  const el = document.createElement(tag)
  el.textContent = text

  return el
}

Or as an external script:

<script type="module" src="app.js"></script>

// app.js
import { tag } from './html.js'

const h1 = tag('h1', 'Hello Modules!')
document.body.appendChild(h1)

Simply add type="module" to your script tags and the browser will load them as ES Modules. The browser will follow all import paths, downloading and executing each module only once.

ES modules: Network graph showing loading

Older browsers won’t execute scripts with an unknown “type”, but you can define fallback scripts with the nomodule attribute:

<script type="module" src="module.js"></script>
<script nomodule src="fallback.js"></script>


Overflow – Turn Your Designs into Playable User Flow Diagrams That Tell a Story

Original Source: https://www.webdesignerdepot.com/2018/05/overflow-turn-your-designs-into-playable-user-flow-diagrams-that-tell-a-story/

Designing the best user flow for your product is definitely not an easy task. It requires several iterations before getting it right. Creating and updating user flow diagrams has largely been considered a painful process for designers, with many of them skipping it entirely because of this. Presenting user flows to stakeholders and actually getting them to understand and follow the user’s journey might actually be the most challenging part.

Overflow helps you do exactly that. It empowers you to effectively communicate your work, while fully engaging your audience with an interactive user flow presentation.

Create User Flows in Minutes

Creating user flow diagrams with Overflow is a quick and enjoyable experience. You can connect and sync Overflow with your favorite design tool, maintaining all your layers and artboards. Easily drag magnets to create your connectors, add text, shapes and images to enrich your presentation. Customize the look using styles and themes to create a fully custom branded presentation that fits your designs and audience.

Present Your Designs

Presenting your designs with Overflow will always make you look good. You can present your designs with an interactive flow presentation, navigating through your entire flow using arrow keys or clicking on the connectors. Show the big picture with a bird’s eye view of your flow, or zoom in to focus on specific details. If you want to present your flow screen by screen, you can easily switch to the out-of-the-box rapid prototype mode.

Share to Get Valuable Feedback

Share your user flow diagrams on Overflow Cloud and let your audience experience a magical journey in their web browser or on their mobile device. Export to PDF or PNG, or print your user flows and stick them on walls.

So far more than 35,000 designers have tried Overflow, and they loved it. Overflow is currently in public beta and available to download, for free.


Keeping Node.js Fast: Tools, Techniques, And Tips For Making High-Performance Node.js Servers

Original Source: https://www.smashingmagazine.com/2018/06/nodejs-tools-techniques-performance-servers/

David Mark Clements

2018-06-07T13:45:51+02:00
2018-06-07T12:12:00+00:00

If you’ve been building anything with Node.js for long enough, then you’ve no doubt experienced the pain of unexpected speed issues. JavaScript is an evented, asynchronous language. That can make reasoning about performance tricky, as will become apparent. The surging popularity of Node.js has exposed the need for tooling, techniques and thinking suited to the constraints of server-side JavaScript.

When it comes to performance, what works in the browser doesn’t necessarily suit Node.js. So, how do we make sure a Node.js implementation is fast and fit for purpose? Let’s walk through a hands-on example.

Tools

Node is a very versatile platform, but one of the predominant applications is creating networked processes. We’re going to focus on profiling the most common of these: HTTP web servers.

We’ll need a tool that can blast a server with lots of requests while measuring the performance. For example, we can use AutoCannon:

npm install -g autocannon

Other good HTTP benchmarking tools include Apache Bench (ab) and wrk2, but AutoCannon is written in Node, provides similar (or sometimes greater) load pressure, and is very easy to install on Windows, Linux, and Mac OS X.

After we’ve established a baseline performance measurement, if we decide our process could be faster we’ll need some way to diagnose problems with the process. A great tool for diagnosing various performance issues is Node Clinic, which can also be installed with npm:

npm install -g clinic

This actually installs a suite of tools. We’ll be using Clinic Doctor and Clinic Flame (a wrapper around 0x) as we go.

Note: For this hands-on example we’ll need Node 8.11.2 or higher.

The Code

Our example case is a simple REST server with a single resource: a large JSON payload exposed as a GET route at /seed/v1. The server is an app folder which consists of a package.json file (depending on restify 7.1.0), an index.js file and a util.js file.

The index.js file for our server looks like so:

'use strict'

const restify = require('restify')
const { etagger, timestamp, fetchContent } = require('./util')()
const server = restify.createServer()

server.use(etagger().bind(server))

server.get('/seed/v1', function (req, res, next) {
  fetchContent(req.url, (err, content) => {
    if (err) return next(err)
    res.send({data: content, url: req.url, ts: timestamp()})
    next()
  })
})

server.listen(3000)

This server is representative of the common case of serving client-cached dynamic content. This is achieved with the etagger middleware, which calculates an ETag header for the latest state of the content.

The util.js file provides implementation pieces that would commonly be used in such a scenario: a function to fetch the relevant content from a backend, the etag middleware, and a timestamp function that supplies timestamps on a minute-by-minute basis:

'use strict'

require('events').defaultMaxListeners = Infinity
const crypto = require('crypto')

module.exports = () => {
  const content = crypto.rng(5000).toString('hex')
  const ONE_MINUTE = 60000
  var last = Date.now()

  function timestamp () {
    var now = Date.now()
    if (now - last >= ONE_MINUTE) last = now
    return last
  }

  function etagger () {
    var cache = {}
    var afterEventAttached = false
    function attachAfterEvent (server) {
      if (attachAfterEvent === true) return
      afterEventAttached = true
      server.on('after', (req, res) => {
        if (res.statusCode !== 200) return
        if (!res._body) return
        const key = crypto.createHash('sha512')
          .update(req.url)
          .digest()
          .toString('hex')
        const etag = crypto.createHash('sha512')
          .update(JSON.stringify(res._body))
          .digest()
          .toString('hex')
        if (cache[key] !== etag) cache[key] = etag
      })
    }
    return function (req, res, next) {
      attachAfterEvent(this)
      const key = crypto.createHash('sha512')
        .update(req.url)
        .digest()
        .toString('hex')
      if (key in cache) res.set('Etag', cache[key])
      res.set('Cache-Control', 'public, max-age=120')
      next()
    }
  }

  function fetchContent (url, cb) {
    setImmediate(() => {
      if (url !== '/seed/v1') cb(Object.assign(Error('Not Found'), {statusCode: 404}))
      else cb(null, content)
    })
  }

  return { timestamp, etagger, fetchContent }

}

By no means take this code as an example of best practices! There are multiple code smells in this file, but we’ll locate them as we measure and profile the application.

To get the full source for our starting point, the slow server can be found over here.

Profiling

In order to profile, we need two terminals, one for starting the application, and the other for load testing it.

In one terminal, within the app folder, we can run:

node index.js

In another terminal we can profile it like so:

autocannon -c100 localhost:3000/seed/v1

This will open 100 concurrent connections and bombard the server with requests for ten seconds.

The results should be something similar to the following (Running 10s test @ http://localhost:3000/seed/v1, 100 connections):

| Stat | Avg | Stdev | Max |
| --- | --- | --- | --- |
| Latency (ms) | 3086.81 | 1725.2 | 5554 |
| Req/Sec | 23.1 | 19.18 | 65 |
| Bytes/Sec | 237.98 kB | 197.7 kB | 688.13 kB |

231 requests in 10s, 2.4 MB read

Results will vary depending on the machine. However, considering that a “Hello World” Node.js server is easily capable of thirty thousand requests per second on the machine that produced these results, 23 requests per second with an average latency exceeding 3 seconds is dismal.

Diagnosing

Discovering The Problem Area

We can diagnose the application with a single command, thanks to Clinic Doctor’s --on-port flag. Within the app folder we run:

clinic doctor --on-port='autocannon -c100 localhost:$PORT/seed/v1' -- node index.js

This will create an HTML file that will automatically open in our browser when profiling is complete.

The results should look something like the following:

Clinic Doctor has detected an Event Loop issue

Clinic Doctor results

The Doctor is telling us that we probably have an Event Loop issue.

Along with the message near the top of the UI, we can also see that the Event Loop chart is red, and shows a constantly increasing delay. Before we dig deeper into what this means, let’s first understand the effect the diagnosed issue is having on the other metrics.

We can see the CPU is consistently at or above 100% as the process works hard to process queued requests. Node’s JavaScript engine (V8) actually uses two CPU cores here: one for the Event Loop and the other for Garbage Collection. When we see the CPU spiking up to 120% in some cases, the process is collecting objects related to handled requests.

We see this correlated in the Memory graph. The solid line in the Memory chart is the Heap Used metric. Any time there’s a spike in CPU we see a fall in the Heap Used line, showing that memory is being deallocated.

Active Handles are unaffected by the Event Loop delay. An active handle is an object that represents either I/O (such as a socket or file handle) or a timer (such as a setInterval). We instructed AutoCannon to open 100 connections (-c100). Active handles stay at a consistent count of 103. The other three are the handles for STDOUT and STDERR, and the handle for the server itself.

If we click the Recommendations panel at the bottom of the screen, we should see something like the following:

Clinic Doctor recommendations panel opened

Viewing issue specific recommendations

Short-Term Mitigation

Root cause analysis of serious performance issues can take time. In the case of a live deployed project, it’s worth adding overload protection to servers or services. The idea of overload protection is to monitor event loop delay (among other things), and respond with “503 Service Unavailable” if a threshold is passed. This allows a load balancer to fail over to other instances, or in the worst case means users will have to refresh. The overload-protection module can provide this with minimum overhead for Express, Koa, and Restify. The Hapi framework has a load configuration setting which provides the same protection.
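The general idea of overload protection can be sketched in a few lines. This is not the overload-protection module’s implementation or API. It is a minimal illustration in which the function names, the sampling interval, the 42ms threshold and the Retry-After value are all assumptions.

```javascript
'use strict'

// Sample event loop delay: schedule a timer every `interval` ms and record
// how late it fires. Lateness means the loop was busy with other work.
function createDelayMonitor (interval = 50) {
  const monitor = { delay: 0 }
  let expected = Date.now() + interval
  const timer = setInterval(() => {
    const now = Date.now()
    monitor.delay = Math.max(0, now - expected)
    expected = now + interval
  }, interval)
  timer.unref() // do not keep the process alive just for monitoring
  monitor.stop = () => clearInterval(timer)
  return monitor
}

// Middleware in the (req, res, next) style: reply 503 while the measured
// delay exceeds `maxDelay` ms (a hypothetical threshold), else continue.
function overloadGuard (monitor, maxDelay = 42) {
  return function (req, res, next) {
    if (monitor.delay > maxDelay) {
      res.statusCode = 503
      res.setHeader('Retry-After', '10')
      res.end('Service Unavailable')
      return
    }
    next()
  }
}
```

The guard does almost no work per request, reading one number and comparing it, which is what allows it to keep answering while the process is otherwise saturated.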

Understanding The Problem Area

As the short explanation in Clinic Doctor explains, if the Event Loop is delayed to the level that we’re observing it’s very likely that one or more functions are “blocking” the Event Loop.

It’s especially important with Node.js to recognize this primary JavaScript characteristic: asynchronous events cannot occur until currently executing code has completed.

This is why a setTimeout cannot be precise.

For instance, try running the following in a browser’s DevTools or the Node REPL:

console.time('timeout')
setTimeout(console.timeEnd, 100, 'timeout')
let n = 1e7
while (n--) Math.random()

The resulting time measurement will never be 100ms. It will likely be in the range of 150ms to 250ms. The setTimeout scheduled an asynchronous operation (console.timeEnd), but the currently executing code has not yet completed; there are two more lines. The currently executing code is known as the current “tick.” For the tick to complete, Math.random has to be called ten million times. If this takes 100ms, then the total time before the timeout resolves will be 200ms (plus however long it takes the setTimeout function to actually queue the timeout beforehand, usually a couple of milliseconds).

In a server-side context, if an operation in the current tick takes a long time to complete, requests cannot be handled and data fetching cannot occur, because asynchronous code will not be executed until the current tick has completed. This means that computationally expensive code will slow down all interactions with the server. So it’s recommended to split resource-intensive work out into separate processes and call them from the main server. This avoids cases where one rarely used but expensive route slows down other frequently used but inexpensive routes.
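One way to move expensive work off the main process is sketched below, using execFile from Node’s core child_process module to run the work in a second Node process. The summing loop is a hypothetical stand-in for real CPU-heavy work, and the function name is made up for this example.

```javascript
'use strict'
const { execFile } = require('child_process')

// Hypothetical CPU-heavy task, kept as source text so that a separate Node
// process can run it via `node -e`. It sums the integers below its argument.
const HEAVY_TASK = `
  let n = Number(process.argv[1])
  let total = 0
  while (n--) total += n
  process.stdout.write(String(total))
`

// Run the task in a child process. The parent's event loop stays free to
// handle other requests while the child computes.
function sumInChild (n, cb) {
  execFile(process.execPath, ['-e', HEAVY_TASK, String(n)], (err, stdout) => {
    if (err) return cb(err)
    cb(null, Number(stdout))
  })
}
```

Spawning a process per call has its own cost, so in practice this suits rare, expensive operations; a long-lived worker process or a dedicated service would be the more usual choice for frequent ones.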

The example server has some code that is blocking the Event Loop, so the next step is to locate that code.

Analyzing

One way to quickly identify poorly performing code is to create and analyze a flame graph. A flame graph represents function calls as blocks sitting on top of each other, not over time but in aggregate. It’s called a “flame graph” because it typically uses an orange-to-red color scheme, where the redder a block is, the “hotter” the function is, meaning the more likely it is to be blocking the event loop. Data for a flame graph is captured by sampling the CPU: at each sample, a snapshot is taken of the function currently being executed and its stack. The heat is determined by the percentage of time during profiling that a given function is at the top of the stack (i.e. the function currently being executed) for each sample. If it’s not the last function to ever be called within that stack, then it’s likely to be blocking the event loop.
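The heat calculation can be sketched as counting how often each function appears at the top of a sampled stack. The sample data below is invented for illustration and only loosely modeled on the example server; real profilers record far more samples and deeper stacks.

```javascript
'use strict'

// Each sample is a call stack captured at one instant, outermost frame
// first, so the last element is the function on top of the stack.
function topOfStackShare (samples) {
  const counts = new Map()
  for (const stack of samples) {
    const top = stack[stack.length - 1]
    counts.set(top, (counts.get(top) || 0) + 1)
  }
  const share = {}
  for (const [name, count] of counts) share[name] = count / samples.length
  return share
}

// Hypothetical samples, loosely modeled on the example server.
const samples = [
  ['emit', 'server.on', 'JSON.stringify'],
  ['emit', 'server.on'],
  ['emit', 'server.on'],
  ['onRequest', 'send']
]

console.log(topOfStackShare(samples))
```

In this made-up data, server.on is on top of the stack in half of the samples, so it would be drawn as the hottest block.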

Let’s use clinic flame to generate a flame graph of the example application:

clinic flame --on-port='autocannon -c100 localhost:$PORT/seed/v1' -- node index.js

The result should open in our browser with something like the following:

Clinic’s flame graph shows that server.on is the bottleneck

Clinic’s flame graph visualization

The width of a block represents how much time it spent on CPU overall. Three main stacks can be observed taking up the most time, all of them highlighting server.on as the hottest function. In truth, all three stacks are the same. They diverge because during profiling optimized and unoptimized functions are treated as separate call frames. Functions prefixed with a * are optimized by the JavaScript engine, and those prefixed with a ~ are unoptimized. If the optimized state isn’t important to us, we can simplify the graph further by pressing the Merge button. This should lead to a view similar to the following:

Merged flame graph

Merging the flame graph
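The effect of merging can be sketched on flat sample counts: frames that differ only by their * or ~ prefix are folded into one entry. This only illustrates the idea. It is not how Clinic implements the Merge button, which works on whole stacks rather than flat counts, and the function names and numbers are hypothetical.

```javascript
'use strict'

// Strip the optimization marker so both variants share one name.
function baseName (frame) {
  return frame.replace(/^[*~]/, '')
}

// `counts` maps frame names (possibly prefixed) to sample counts.
function mergeFrames (counts) {
  const merged = {}
  for (const name of Object.keys(counts)) {
    const key = baseName(name)
    merged[key] = (merged[key] || 0) + counts[name]
  }
  return merged
}

console.log(mergeFrames({ '*server.on': 120, '~server.on': 30, send: 5 }))
```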

From the outset, we can infer that the offending code is in the util.js file of the application code.

The slow function is also an event handler: the functions leading up to it are part of the core events module, and server.on is a fallback name for an anonymous function provided as an event handling function. We can also see that this code isn’t in the same tick as the code that actually handles the request. If it were, functions from the core http, net, and stream modules would be in the stack.

Such core functions can be found by expanding other, much smaller, parts of the flame graph. For instance, try using the search input on the top right of the UI to search for send (the name of both restify and http internal methods). It should be on the right of the graph (functions are alphabetically sorted):

Flame graph has two small blocks highlighted which represent HTTP processing function

Searching the flame graph for HTTP processing functions

Notice how comparatively small all the actual HTTP handling blocks are.

We can click one of the blocks highlighted in cyan, which will expand to show functions like writeHead and write in the http_outgoing.js file (part of the Node core http library):

Flame graph has zoomed into a different view showing HTTP related stacks

Expanding the flame graph into HTTP relevant stacks

We can click all stacks to return to the main view.

The key point here is that even though the server.on function isn’t in the same tick as the actual request handling code, it’s still affecting the overall server performance by delaying the execution of otherwise performant code.
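The effect is easy to reproduce in isolation. In this minimal sketch, blockFor is a hypothetical stand-in for the expensive listener: a timer scheduled for 0 ms cannot fire until the synchronous work has finished, even though the two are unrelated.

```javascript
// Blocks the current tick for roughly `ms` milliseconds (a stand-in for heavy hashing).
function blockFor (ms) {
  const start = Date.now()
  while (Date.now() - start < ms) {}
  return Date.now() - start
}

// Schedules a zero-delay timer, then blocks; reports how late the timer actually fired.
function measureTimerDelay (blockMs, done) {
  const scheduled = Date.now()
  setTimeout(() => done(Date.now() - scheduled), 0)
  blockFor(blockMs)
}

measureTimerDelay(50, (lateBy) => {
  console.log(`0 ms timer fired after ${lateBy} ms`)
})
```

The reported delay should be at least as long as the blocking work, which is exactly what happens to request handling while the listener is busy.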

Debugging

We know from the flame graph that the problematic function is the event handler passed to server.on in the util.js file.

Let’s take a look:

server.on('after', (req, res) => {
  if (res.statusCode !== 200) return
  if (!res._body) return
  const key = crypto.createHash('sha512')
    .update(req.url)
    .digest()
    .toString('hex')
  const etag = crypto.createHash('sha512')
    .update(JSON.stringify(res._body))
    .digest()
    .toString('hex')
  if (cache[key] !== etag) cache[key] = etag
})

It’s well known that cryptography tends to be expensive, as does serialization (JSON.stringify) but why don’t they appear in the flame graph? These operations are in the captured samples, but they’re hidden behind the cpp filter. If we press the cpp button we should see something like the following:

Additional blocks related to C++ have been revealed in the flame graph (main view)

Revealing serialization and cryptography C++ frames

The internal V8 instructions relating to both serialization and cryptography are now shown as the hottest stacks and as taking up most of the time. The JSON.stringify method directly calls C++ code; this is why we don’t see a JavaScript function. In the cryptography case, functions like createHash and update are in the data, but they are either inlined (which means they disappear in the merged view) or too small to render.

Once we start to reason about the code in the etagger function, it can quickly become apparent that it’s poorly designed. Why are we taking the server instance from the function context? There’s a lot of hashing going on; is all of that necessary? There’s also no If-None-Match header support in the implementation, which would mitigate some of the load in some real-world scenarios because clients would only make a HEAD request to determine freshness.
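As a rough illustration of what If-None-Match support involves, here is a framework-agnostic sketch. The helper names are hypothetical, and the comparison is deliberately simplified: it assumes a single strong Etag value and ignores lists, weak validators, and the * wildcard that the header also allows.

```javascript
// Returns true when the Etag sent by the client matches the current one.
// Node lowercases incoming header names, hence 'if-none-match'.
function clientHasFreshCopy (requestHeaders, currentEtag) {
  const candidate = requestHeaders['if-none-match']
  return candidate !== undefined && candidate === currentEtag
}

// Decides which status code a handler would send for a given request.
function statusFor (requestHeaders, currentEtag) {
  return clientHasFreshCopy(requestHeaders, currentEtag) ? 304 : 200
}

console.log(statusFor({ 'if-none-match': 'abc' }, 'abc')) // 304
console.log(statusFor({}, 'abc')) // 200
```

With a 304 the body doesn’t need to be built or sent at all, which is where the load reduction would come from.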

Let’s ignore all of these points for the moment and validate the finding that the actual work being performed in server.on is indeed the bottleneck. This can be achieved by setting the server.on code to an empty function and generating a new flame graph.

Alter the etagger function to the following:

function etagger () {
  var cache = {}
  var afterEventAttached = false
  function attachAfterEvent (server) {
    if (attachAfterEvent === true) return
    afterEventAttached = true
    server.on('after', (req, res) => {})
  }
  return function (req, res, next) {
    attachAfterEvent(this)
    const key = crypto.createHash('sha512')
      .update(req.url)
      .digest()
      .toString('hex')
    if (key in cache) res.set('Etag', cache[key])
    res.set('Cache-Control', 'public, max-age=120')
    next()
  }
}

The event listener function passed to server.on is now a no-op.

Let’s run clinic flame again:

clinic flame --on-port='autocannon -c100 localhost:$PORT/seed/v1' -- node index.js

This should produce a flame graph similar to the following:

Flame graph shows that Node.js event system stacks are still the bottleneck

Flame graph of the server when server.on is an empty function

This looks better, and we should have noticed an increase in requests per second. But why is the event emitting code so hot? At this point we would expect the HTTP processing code to take up the majority of CPU time, since there’s nothing executing at all in the server.on event.

This type of bottleneck is caused by a function being executed more than it should be.

The following suspicious code at the top of util.js may be a clue:

require('events').defaultMaxListeners = Infinity

Let’s remove this line and start our process with the --trace-warnings flag:

node --trace-warnings index.js

If we profile with AutoCannon in another terminal, like so:

autocannon -c100 localhost:3000/seed/v1

Our process will output something similar to:

(node:96371) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 after listeners added. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:280:19)
    at Server.addListener (events.js:297:10)
    at attachAfterEvent (/Users/davidclements/z/nearForm/keeping-node-fast/slow/util.js:22:14)
    at Server. (/Users/davidclements/z/nearForm/keeping-node-fast/slow/util.js:25:7)
    at call (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:164:9)
    at next (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:120:9)
    at Chain.run (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/chain.js:123:5)
    at Server._runUse (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:976:19)
    at Server._runRoute (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:918:10)
    at Server._afterPre (/Users/davidclements/z/nearForm/keeping-node-fast/slow/node_modules/restify/lib/server.js:888:10)

Node is telling us that lots of event listeners are being attached to the server object. This is strange because there’s a boolean that checks if the listener has been attached and then returns early, essentially making attachAfterEvent a no-op after the first listener is attached.

Let’s take a look at the attachAfterEvent function:

var afterEventAttached = false
function attachAfterEvent (server) {
  if (attachAfterEvent === true) return
  afterEventAttached = true
  server.on('after', (req, res) => {})
}

The conditional check is wrong! It checks whether attachAfterEvent is true instead of afterEventAttached. This means a new event is being attached to the server instance on every request, and then all prior attached events are being fired after each request. Whoops!
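The effect of the wrong conditional can be reproduced with a plain EventEmitter standing in for the server. This is a minimal sketch: createAttach and listenersAfter are hypothetical helpers that exist only to compare the buggy check with the corrected one.

```javascript
const EventEmitter = require('events')

// Builds an attach function; `buggy` reproduces the wrong conditional from above.
function createAttach (buggy) {
  var afterEventAttached = false
  return function attachAfterEvent (server) {
    // The buggy check compares the function itself to true, which is never the case.
    if ((buggy ? attachAfterEvent : afterEventAttached) === true) return
    afterEventAttached = true
    server.on('after', () => {})
  }
}

// Simulates `requests` incoming requests and reports how many listeners were attached.
function listenersAfter (requests, buggy) {
  const server = new EventEmitter()
  const attach = createAttach(buggy)
  for (let i = 0; i < requests; i++) attach(server)
  return server.listenerCount('after')
}

console.log(listenersAfter(5, true)) // 5: one new listener per request
console.log(listenersAfter(5, false)) // 1: attached only once
```

With the buggy check, the number of listeners (and therefore the work done after every request) grows linearly with the number of requests served.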

Optimizing

Now that we’ve discovered the problem areas, let’s see if we can make the server faster.

Low-Hanging Fruit

Let’s put the server.on listener code back (instead of an empty function) and use the correct boolean name in the conditional check. Our etagger function looks as follows:

function etagger () {
  var cache = {}
  var afterEventAttached = false
  function attachAfterEvent (server) {
    if (afterEventAttached === true) return
    afterEventAttached = true
    server.on('after', (req, res) => {
      if (res.statusCode !== 200) return
      if (!res._body) return
      const key = crypto.createHash('sha512')
        .update(req.url)
        .digest()
        .toString('hex')
      const etag = crypto.createHash('sha512')
        .update(JSON.stringify(res._body))
        .digest()
        .toString('hex')
      if (cache[key] !== etag) cache[key] = etag
    })
  }
  return function (req, res, next) {
    attachAfterEvent(this)
    const key = crypto.createHash('sha512')
      .update(req.url)
      .digest()
      .toString('hex')
    if (key in cache) res.set('Etag', cache[key])
    res.set('Cache-Control', 'public, max-age=120')
    next()
  }
}

Now we check our fix by profiling again. Start the server in one terminal:

node index.js

Then profile with AutoCannon:

autocannon -c100 localhost:3000/seed/v1

We should see results somewhere in the range of a 200 times improvement (Running 10s test @ http://localhost:3000/seed/v1 — 100 connections):

Stat           Avg       Stdev     Max
Latency (ms)   19.47     4.29      103
Req/Sec        5011.11   506.2     5487
Bytes/Sec      51.8 MB   5.45 MB   58.72 MB

50k requests in 10s, 519.64 MB read

It’s important to balance potential server cost reductions with development costs. We need to define, in our own situational contexts, how far we need to go in optimizing a project. Otherwise, it can be all too easy to put 80% of the effort into 20% of the speed enhancements. Do the constraints of the project justify this?

In some scenarios, it could be appropriate to achieve a 200 times improvement with a low hanging fruit and call it a day. In others, we may want to make our implementation as fast as it can possibly be. It really depends on project priorities.

One way to control resource spend is to set a goal, for instance a 10 times improvement, or 4000 requests per second. Basing this on business needs makes the most sense. For instance, if server costs are 100% over budget, we can set a goal of a 2x improvement.

Taking It Further

If we produce a new flame graph of our server, we should see something similar to the following:

Flame graph still shows server.on as the bottleneck, but a smaller bottleneck

Flame graph after the performance bug fix has been made

The event listener is still the bottleneck: it’s still taking up one-third of CPU time during profiling (its width is about one third of the whole graph).

What additional gains can be made, and are the changes (along with their associated disruption) worth making?

With an optimized implementation, which is nonetheless slightly more constrained, the following performance characteristics can be achieved (Running 10s test @ http://localhost:3000/seed/v1 — 10 connections):

Stat           Avg        Stdev     Max
Latency (ms)   0.64       0.86      17
Req/Sec        8330.91    757.63    8991
Bytes/Sec      84.17 MB   7.64 MB   92.27 MB

92k requests in 11s, 937.22 MB read

While a 1.6x improvement is significant, whether the effort, changes, and code disruption necessary to create it are justified arguably depends on the situation, especially when compared to the 200x improvement on the original implementation with a single bug fix.
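The 1.6x figure follows from the average requests per second of the two AutoCannon runs. As a quick check (the variable and function names are just for illustration):

```javascript
// Average requests per second reported by the two AutoCannon runs.
const afterBugFix = 5011.11
const afterOptimization = 8330.91

// Relative throughput of the optimized server compared with the bug-fixed one.
function speedup (before, after) {
  return after / before
}

console.log(speedup(afterBugFix, afterOptimization).toFixed(2)) // 1.66
```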

To achieve this improvement, the same iterative technique of profile, generate flame graph, analyze, debug, and optimize was used to arrive at the final optimized server, the code for which can be found here.

The final changes to reach 8000 req/s were:

Don’t build objects and then serialize, build a string of JSON directly;
Use something unique about the content to define its Etag, rather than creating a hash;
Don’t hash the URL, use it directly as the key.

These changes are slightly more involved, a little more disruptive to the code base, and leave the etagger middleware a little less flexible because it puts the burden on the route to provide the Etag value. But it achieves an extra 3000 requests per second on the profiling machine.
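The following is a rough sketch of what the last two changes could look like, not the code from the final optimized server. It assumes a hypothetical res.setEtag hook that each route calls with something unique about its content, and it uses minimal stand-ins for the request and response objects so that it runs on its own.

```javascript
// Sketch of a leaner etagger: the URL is the cache key and the route supplies the Etag.
function etagger () {
  const cache = new Map()
  return function (req, res, next) {
    // No hashing: look the Etag up by URL directly.
    if (cache.has(req.url)) res.set('Etag', cache.get(req.url))
    res.set('Cache-Control', 'public, max-age=120')
    // Hypothetical hook: the route provides the Etag value for its own content.
    res.setEtag = (value) => { cache.set(req.url, value) }
    next()
  }
}

// Minimal stand-in for a response object, for illustration only.
function fakeResponse () {
  const headers = {}
  return { headers, set: (name, value) => { headers[name] = value } }
}

const middleware = etagger()
const first = fakeResponse()
middleware({ url: '/seed/v1' }, first, () => first.setEtag('seed-v1-42'))
const second = fakeResponse()
middleware({ url: '/seed/v1' }, second, () => {})
console.log(second.headers.Etag) // seed-v1-42
```

The first request stores the Etag supplied by the route; the second request for the same URL gets it back without any hashing or serialization.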

Let’s take a look at a flame graph for these final improvements:

Flame graph shows that internal code related to the net module is now the bottleneck

Healthy flame graph after all performance improvements

The hottest part of the flame graph is part of Node core, in the net module. This is ideal.

Preventing Performance Problems

To round off, here are some suggestions on ways to prevent performance issues before they are deployed.

Using performance tools as informal checkpoints during development can filter out performance bugs before they make it into production. Making AutoCannon and Clinic (or equivalents) part of everyday development tooling is recommended.

When buying into a framework, find out what its policy on performance is. If the framework does not prioritize performance, then it’s important to check whether that aligns with infrastructural practices and business goals. For instance, Restify has clearly (since the release of version 7) invested in enhancing the library’s performance. However, if low cost and high speed are an absolute priority, consider Fastify, which has been measured as 17% faster by a Restify contributor.

Watch out for other widely impacting library choices — especially consider logging. As developers fix issues, they may decide to add additional log output to help debug related problems in the future. If an unperformant logger is used, this can strangle performance over time after the fashion of the boiling frog fable. The pino logger is the fastest newline delimited JSON logger available for Node.js.

Finally, always remember that the Event Loop is a shared resource. A Node.js server is ultimately constrained by the slowest logic in the hottest path.

Microsoft to Buy GitHub; Controversy Scheduled for This Week

Original Source: https://www.webdesignerdepot.com/2018/06/microsoft-to-buy-github-controversy-scheduled-for-this-week/

So yeah, what the title said. Microsoft is buying GitHub for 7.5 BILLION with a “B” US dollars. This is officially this week’s Big Deal™, and everyone’s going to be talking about it. It would not be quite accurate to say that GitHub powers software development as a whole, but it powers a lot of it. GitHub’s friendliness to (and free repositories for) open source software have made it nigh on indispensable for many developers around the world.

So now some people are freaking out. People unfamiliar with tech history or the open source world might wonder why. After all, companies change hands all the time. Sometimes that works out for consumers, and sometimes it doesn’t. I personally think it will work out, but I can understand why some people are angry.

You see, once upon a time, Microsoft was the de facto bad guy of the tech world, and many people still see them that way. From the very beginning, MS embraced some pretty predatory business practices that put them in bad standing with users. Even after the famous antitrust case that broke their impending monopoly on web browsers (yeah, that almost happened), Microsoft has a record of buying good products and then killing them at a rate that rivals Electronic Arts.

What’s more, the Linux and open source community in particular got burned over the years, as Microsoft made a habit of using their advertising budget to spread unsubstantiated claims about Linux, other enterprise-focused operating systems, and open source data security options. People are still sore about that.

The products Microsoft hasn’t killed have often ended up feeling rather lackluster. Think of Skype, for example.

But I don’t think all is lost. No, Microsoft didn’t suddenly have a collective change of heart, and turn into do-gooders. I think they’ve just realized that ticking off everyone who isn’t them is a poor long-term business strategy. We live in a world where consumers increasingly demand that corporations at least pretend to be good guys, and so Microsoft seems to have changed their modus operandi, to some extent.

They bought LinkedIn for over 20 billion USD, and have let it run more or less as it did before. They released Visual Studio Code—one of the best code editors for Windows that we’ve had in a while—and it’s even open source.

Most telling, they killed Codeplex, their onetime competitor to GitHub, and started putting a lot of their own open source code on the latter platform. All of these actions directly contradict the old patterns Microsoft used to follow.

If they care at all about the goodwill they have earned themselves in the past few years, it would be best to let GitHub be GitHub. If they continue to follow this new pattern, they probably will. Indeed, in Microsoft’s own post on the subject, they state that they intend to let GitHub operate independently.

Acquisition will empower developers, accelerate GitHub’s growth and advance Microsoft services with new audiences

So do we believe them? Why buy GitHub at all, if they’re not going to monetize the hell out of it? Well they will, just not in the way everybody seems to fear. Microsoft doesn’t make most of their money from Windows by selling it to individual users. They do it by selling it to enterprise-level customers, and supporting it. The same goes for Microsoft Office Subscriptions. The indications seem to be pointing in the same direction for GitHub.

Microsoft will most likely develop and sell enterprise-specific tools and services around GitHub to entice their biggest customers onto the platform. They don’t want your money, they want that corporation money. I strongly suspect that for most individual developers and open source projects, the GitHub experience will remain unchanged.

So the average dev could probably look at this sale as a positive change, or at least a neutral one. Failing that, there’s always Gitlab or Bitbucket.

How To Create An Innovative Web Design Agency Website in 5 Steps

Original Source: http://feedproxy.google.com/~r/Designrfix/~3/G7CLXu9AnnI/how-to-create-an-innovative-web-design-agency-website-in-5-steps

If you want your business to be prosperous and popular among customers, it’s indispensable to create a website for it. The worldwide web is the first place to which people refer in search of new knowledge, inspiration, and resources that will get specific types of services done in the pro way. Are you a freelance […]

How to Zoom This Close Into Google Maps

Original Source: https://www.hongkiat.com/blog/how-to-zoom-this-close-into-google-maps/

It is almost impossible to imagine doing day-trips or traveling to a new place without checking it out on Google Maps. Unfortunately, it restricts zooming in beyond a certain level. However, there is…

The Trouble with Cheap eCommerce

Original Source: http://feedproxy.google.com/~r/1stwebdesigner/~3/S6YL4ZuvVoQ/

It used to be that building an eCommerce website was an arduous and expensive task. And while that is still the case in some specialized situations, those barriers have been largely removed when it comes to more mainstream usage.

Take WooCommerce, for example. It’s a free eCommerce plugin for WordPress, the most widely used content management system on Earth. Right out of the box, it enables anyone to sell their products and accept payments online. If you need more specialized functionality, it’s widely available in the form of free or reasonably-priced premium extensions.

This is certainly a great development for small businesses that don’t necessarily have a huge budget for building a website. But, what type of impact does it have on eCommerce overall? And what, if any, negative side effects have “cheap” eCommerce platforms had on web designers?

Square Holes and Round Pegs

It seems that, no matter how many online stores you build, no two will be exactly the same. Products, services and even business owners are all variables that need to be taken into consideration – and that’s even before you start the design and development process.

On the surface, it may look as though a tool like WooCommerce is perfect to handle all the different quirks that go along with customizing an eCommerce site. After all, you get to pick and choose which extensions you need. Plus, skilled developers can even create their own solutions.

Yet, it often feels like we’re trying to bend and shape extensions or the basic cart itself to fit into our own narrow use cases. The results are mixed, with some features essentially going against the grain of what the original software was intended to do.

Yes, we have options, but what if those options don’t really align with our needs? And this isn’t just limited to WooCommerce. Other, more proprietary eCommerce suites aren’t necessarily more flexible – some are even less so.

The problem here is that a one-size-fits-all approach means that site owners won’t necessarily get everything they want. That shouldn’t even be a problem, as low-cost solutions aren’t meant to attend to each and every need. But that brings us to the next point.

The Expectation of Low (Or No) Cost

Because the barrier to entry is so low, many seem to think that eCommerce can and should be done on the cheap. The expectation is that, no matter the need, a top-quality shop can be built for very little cost.

Sometimes, that expectation actually comes to fruition. Depending on a client’s specific needs, it is possible to build something that looks great and performs the necessary functions on a tight budget. However, it doesn’t mean that every case is going to turn out that well.

The more realistic view is that each and every feature that goes into a website has an associated cost. This is especially true for eCommerce, where a seemingly “little” tweak can take up a lot of time and resources to implement.

But because there are so many free and low-cost tools out there, some clients simply expect that everything can be taken care of with minimal effort and at extremely little cost. Personally, I’ve seen cases where site owners refused to even purchase a fairly cheap but specific bit of functionality that was critical to making sure orders came through correctly.

It may be a risk they were willing to take, but the approach was very short-sighted.

Above All, eCommerce is an Investment

As the professional designers and developers in the room, it’s up to us to communicate exactly what goes into making an eCommerce site work. That doesn’t need to include every single technical detail. But it should include an honest assessment of how complex the entire process is and that one-size-fits-all often means making some sacrifices.

Even more important is that clients should understand that an investment in their eCommerce site is an investment in their own success. It’s understandable that some far-flung features could be put on hold until there are more resources available. However, there are some types of functionality that are simply too vital to skimp on.

Once, I worked with a client who utilized a SaaS shopping cart provider that increased their monthly subscription fees. As any of us might, the client lamented the fact that costs were going up. But when you looked at the bigger picture, the price hike was minuscule when compared to the amount of money being made off of the website itself. It was a relatively small price to pay for success.

If that same client were to sell through more traditional brick-and-mortar channels, their overhead costs would have been significantly higher. Yet, because they were used to paying very little for eCommerce capabilities, the expectations were completely different.

This is a point worth making to clients who scoff at paying a bit extra for a worthy investment. Relatively speaking, the potential rewards for doing things the right way can easily outweigh the initial cost.

Keeping It Real

While free and low-cost eCommerce isn’t the right way to go for everyone, it can still be quite effective for many businesses. The key is in understanding what it can and can’t do, along with keeping realistic expectations.

The bottom line is that you’re not going to create a site that works exactly like Amazon on a shoestring budget. Clients often see what the “big” players are doing and naturally want to mimic their success. While we can certainly understand their hopes, we also need to communicate what can be done for what they’re able to spend.

Overall, it’s great to see that anyone can enter the eCommerce game. Our goal as designers should be to help clients learn about the positives, negatives and realities of selling online.


Brand Identity for New York City Architecture Firm Dash Marshall

Original Source: http://feedproxy.google.com/~r/abduzeedo/~3/Cll8SUM5pwk/brand-identity-new-york-city-architecture-firm-dash-marshall

abduzeedo
Jun 05, 2018

TwoPoints.Net shared a beautiful brand identity project for the New York City architecture firm Dash Marshall. When designing the corporate identity they realized that architecture acts in the intersection of the old and the new, the static and the flexible, the properties of matter and the lives of people. Within these constraints, Dash Marshall creates spaces which tell the stories of their inhabitants and invite them to create new ones.

“Just as Michel de Certeau argued that spatial stories are what actuate the notion of place, our physical environments can give rise to new characters and events by organizing, proffering and collectivizing human sensibilities. They may even allow certain transgressions to occur, as the Independent Group aspired to do. For this reason, an architecture that upholds its commitment to its users holds tremendous power: its narratives of the past and present are the framework from which to imagine the future scripts of tomorrow.” writes Esther Choi (estherchoi.net) in the preface of the book “Matter Battle, 45 Lessons Learned” by Dash Marshall.

The obvious eventually came to us as a surprise. Today’s corporate communication has become almost exclusively digital. It is context-responsive, morphological and semiological, and almost unaware of physical constraints. To design a consistent visual language for an architecture office, acting in the material, but communicating in the immaterial world, was the challenge. Our solution is a flexible visual identity which works within a confined space of the letters “D” and “M”. Like outer walls of an apartment or the plot of a house, the letters “DM” create a confined space, but within this framework nearly anything is possible.

To tell the stories of Dash Marshall we have not just designed their Visual Identity, but also their website, the book “Matter Battles: 45 Lessons Learned” and the booklet “Small Measures”.

Client: Dash Marshall
Year: 2015—2018

The letters “DM”, drawn in the isometric perspective, are the archetype of the visual identity. The lines of the letters may be removed and colored, creating a multitude of variations of the icon.

Brand Identity

Dash Marshall’s architecture plays with contradictions such as old and new, classic and modern, emotional and rational. To visualize these contrasts we added the drawn Berlingske to the constructed graphic system.

“Matter Battle, 45 Lessons Learned” by Dash Marshall.

Producing a beautiful book has to be considered today a statement in itself. The time, work and money going into a physical object, which will be given away to only 200 select individuals, shows the appreciation of the constraints of the physical world.

Along with the big book comes a smaller, shorter book called “Small Measures”, focusing on the details of the projects and presenting them only in cropped images. The combination of a large and a small book gives Dash Marshall the flexibility to convey their work in different ways based on the needs of a given situation: a small book for small meetings, a big book for more substantial introductions, or both for moments of special gratitude.

 


Brand Identity for Really. by Tata&Friends Studio

Original Source: http://feedproxy.google.com/~r/abduzeedo/~3/i9qU546aZbM/brand-identity-really-tatafriends-studio

abduzeedo
Jun 04, 2018

Tata&Friends Studio shared a beautiful brand identity project on their Behance profile. It’s for Everis’ content agency. There are many things to say about the design solution, but for me the most important is the simplicity. I love seeing projects that rely on simple typography with handpicked visual ornaments to focus on the basics. It’s all about the contrast of type and the wise use of white space.

Really is the content agency of everis. As content creators, their work ranges from illustrations and infographics to video production, a wide range of different creative projects. Really crafts content and visual solutions for brands. Our approach was to define the naming and visual universe of the brand.

Brand identity

Naming: 

The name Really. is a statement: it represents a solution, a final product, something to be proud of.

Visual: 

We use holographic stamping to create “the metaphor of everything” in order to represent the creative result.


Tata&Friends Studio is a design muscle for positive brands. They believe in process, research, experiments, curiosity and positive thinking. It is a collaborative studio, a place to grow, to collaborate, to learn and to share knowledge.