Optimizing Javascript for fun and for profit
I often feel like JavaScript code in general runs much slower than it could, simply because it’s not optimized properly. Here is a summary of common optimization techniques I’ve found useful. Note that the tradeoff for performance is often readability, so when to go for performance versus readability is a question left to the reader. I’ll also note that talking about optimization necessarily requires talking about benchmarking. Micro-optimizing a function for hours to have it run 100x faster is meaningless if the function only represented a small fraction of the actual overall runtime to start with. If you are optimizing, the first and most important step is benchmarking. I’ll cover the topic in the later sections. Also be aware that micro-benchmarks are often flawed, and that may include those presented here. I’ve done my best to avoid those traps, but don’t blindly apply any of the points presented here without benchmarking.
I have included runnable examples for all cases where it’s possible. They show by default the results I got on my machine (Brave 122 on Arch Linux), but you can run them yourself. As much as I hate to say it, Firefox has fallen a bit behind in the optimization game and represents a very small fraction of the traffic for now, so I don’t recommend using the results you’d get on Firefox as useful indicators.
0. Avoid work
This might sound evident, but it needs to be here because there can’t be another first step to optimization: if you’re trying to optimize, you should first look into avoiding work. This includes concepts like memoization, laziness and incremental computation. This would be applied differently depending on the context. In React, for example, that would mean applying memo(), useMemo() and other applicable primitives.
1. Avoid string comparisons
JavaScript makes it easy to hide the real cost of string comparisons. If you need to compare strings in C, you’d use the strcmp(a, b) function. JavaScript uses === instead, so you don’t see the strcmp. But it’s there, and a string comparison will usually (but not always) require comparing each of the characters in the string with the ones in the other string; string comparison is O(n). One common JavaScript pattern to avoid is strings-as-enums. But with the advent of TypeScript this should be easily avoidable, as enums are integers by default.
Here is a comparison of the costs:
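A minimal sketch of such a comparison, assuming two hypothetical Position enums (one string-based, one integer-based) and an arbitrary array size. Timings will vary by engine and machine, and since string literals are typically interned, this particular sketch may understate the cost for strings built at runtime.

```javascript
// Hypothetical enums for illustration: the same three states as strings and as integers.
const PositionString = { TOP: 'TOP', MIDDLE: 'MIDDLE', BOTTOM: 'BOTTOM' };
const PositionInt = { TOP: 0, MIDDLE: 1, BOTTOM: 2 };

// Two separate functions so each one only ever sees a single value type.
function countStringMatches(values, target) {
  let count = 0;
  for (let i = 0; i < values.length; i++) {
    if (values[i] === target) count++;
  }
  return count;
}

function countIntMatches(values, target) {
  let count = 0;
  for (let i = 0; i < values.length; i++) {
    if (values[i] === target) count++;
  }
  return count;
}

// Runs `fn` once and returns its result plus the elapsed time in milliseconds.
function time(fn) {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

const N = 1_000_000;
const stringValues = Object.values(PositionString);
const intValues = Object.values(PositionInt);
const strings = Array.from({ length: N }, (_, i) => stringValues[i % 3]);
const ints = Array.from({ length: N }, (_, i) => intValues[i % 3]);

const stringRun = time(() => countStringMatches(strings, PositionString.BOTTOM));
const intRun = time(() => countIntMatches(ints, PositionInt.BOTTOM));
console.log(`string ===: ${stringRun.ms.toFixed(2)} ms`);
console.log(`int ===:    ${intRun.ms.toFixed(2)} ms`);
```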
As you can see, the difference can be significant. The difference isn’t necessarily due to the strcmp cost, as engines can sometimes use a string pool and compare by reference; it’s also due to the fact that integers are usually passed by value in JS engines, whereas strings are always passed as pointers, and memory accesses are expensive (see section 5). In string-heavy code, this can have a huge impact.
For a real-world example, I was able to make this JSON5 javascript parser run 2x faster* just by replacing string constants with numbers.
*Unfortunately it wasn’t merged, but that’s how open-source is.
2. Avoid different shapes
Javascript engines try to optimize code by assuming that objects have a specific shape, and that functions will receive objects of the same shape. This allows them to store the keys of the shape once for all objects of that shape, and the values in a separate flat array. To represent it in javascript:
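As a rough conceptual model (real engines implement this in native code, and the names shapeXY, pointA, pointB and getProperty are made up for this sketch), the layout could be pictured like this:

```javascript
// Conceptual model only: the key list is stored once and shared by every { x, y } object.
const shapeXY = { keys: ['x', 'y'] };

// Each object only stores a pointer to its shape plus a flat array of values.
const pointA = { shape: shapeXY, values: [1, 2] };
const pointB = { shape: shapeXY, values: [3, 4] };

// A property read becomes: find the key's index in the shape, then index into the values.
function getProperty(object, key) {
  const index = object.shape.keys.indexOf(key);
  return index === -1 ? undefined : object.values[index];
}

console.log(getProperty(pointA, 'y'), getProperty(pointB, 'x'));
```

Because the index of a key is the same for every object sharing a shape, an engine that has seen the shape once can skip the lookup and read the value slot directly.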
For example, at runtime if the following function receives two objects with the shape { x: number, y: number }, the engine is going to speculate that future objects will have the same shape, and generate machine code optimized for that shape.
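A function of that kind could look like the following sketch; add and the two sample points are hypothetical names chosen for illustration.

```javascript
// Hypothetical function: adds two points component-wise.
function add(a, b) {
  return { x: a.x + b.x, y: a.y + b.y };
}

const a = { x: 1, y: 2 };
const b = { x: 3, y: 4 };

// Every call so far has passed objects created with the same keys in the same order.
const sum = add(a, b);
console.log(sum);
```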
If you instead pass an object not with the shape { x, y } but with the shape { y, x }, the engine would need to undo its speculation and the function would suddenly become considerably slower. I’m going to limit my explanation here because you should read the excellent post from mraleph if you want more details, but I’ll highlight that V8 in particular has 3 modes, for accesses that are: monomorphic (1 shape), polymorphic (2-4 shapes), and megamorphic (5+ shapes). Let’s say you really want to stay monomorphic, because the slowdown is drastic:
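One way to observe this is sketched below. It assumes that object literals with different key sets receive different shapes, and it deliberately runs the same function with 1, then 4, then 8 shapes so the access site can only move from monomorphic towards megamorphic. The object contents and iteration count are arbitrary, and the timings depend on the engine.

```javascript
// Eight object literals with different key sets, so each one gets its own shape.
const shapes = [
  { x: 1 },
  { x: 1, a: 0 },
  { x: 1, b: 0 },
  { x: 1, c: 0 },
  { x: 1, d: 0 },
  { x: 1, e: 0 },
  { x: 1, f: 0 },
  { x: 1, g: 0 },
];

// The property read `.x` below is the access site whose shape count we vary.
function sumX(objects, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += objects[i % objects.length].x;
  }
  return total;
}

const ITERATIONS = 1_000_000;
const results = [];

// Order matters: the function sees 1 shape first, then 4, then 8.
for (const shapeCount of [1, 4, 8]) {
  const objects = shapes.slice(0, shapeCount);
  const start = performance.now();
  const total = sumX(objects, ITERATIONS);
  const ms = performance.now() - start;
  results.push({ shapeCount, total, ms });
  console.log(`${shapeCount} shape(s): ${ms.toFixed(2)} ms`);
}
```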
What the eff should I do about this?
Easier said than done but: create all your objects with the exact same shape. Even something as trivial as writing your React component props in a different order can trigger this.
For example, here are simple cases I found in React’s codebase. React already had a much higher-impact case of the same problem a few years ago, because they were initializing an object with an integer, then later storing a float. Yes, changing the type also changes the shape. Yes, there are integer and float types hidden behind number. Deal with it.
3. Avoid array/object methods
I love functional programming as much as anyone else, but unless you’re working in Haskell/OCaml/Rust where functional code gets compiled to efficient machine code, functional code will always be slower than imperative code.
The problem with those methods is that:
- They need to make a full copy of the array, and those copies later need to be freed by the garbage collector. We will explore memory I/O issues in more detail in section 5.
- They loop N times for N operations, whereas a for loop would allow looping once.
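To make the two points above concrete, here is a small sketch computing the same value both ways. The function names and the sample data are made up for illustration.

```javascript
const numbers = Array.from({ length: 1000 }, (_, i) => i);

// Functional version: filter() and map() each allocate an intermediate array,
// and the data is traversed three times in total.
function sumOfEvenSquaresFunctional(values) {
  return values
    .filter((n) => n % 2 === 0)
    .map((n) => n * n)
    .reduce((sum, n) => sum + n, 0);
}

// Imperative version: a single pass and no intermediate arrays.
function sumOfEvenSquaresImperative(values) {
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    const n = values[i];
    if (n % 2 === 0) sum += n * n;
  }
  return sum;
}

const functionalResult = sumOfEvenSquaresFunctional(numbers);
const imperativeResult = sumOfEvenSquaresImperative(numbers);
console.log(functionalResult === imperativeResult);
```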
Object methods such as Object.values(), Object.keys() and Object.entries() suffer from similar problems, as they also allocate more data, and memory accesses are the root of all performance issues. No really I swear, I’ll show you in section 5.
4. Avoid indirection
Another place to look for optimization gains is any source of indirection, of which I can see 3 main sources:
The proxy benchmark is particularly brutal on V8 at the moment. Last time I checked, proxy objects always fell back from the JIT to the interpreter; judging from those results, that might still be the case.
I also wanted to showcase accessing a deeply nested object vs direct access, but engines are very good at optimizing away object accesses via escape analysis when there is a hot loop and a constant object. I inserted a bit of indirection to prevent it.
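For the proxy case specifically, a comparison could be sketched as follows. It assumes a pass-through get trap and an arbitrary iteration count; point and proxiedPoint are hypothetical names, and how large the gap is depends entirely on the engine.

```javascript
const point = { x: 10, y: 20 };

// Pass-through Proxy: returns the same values, but every read goes through the trap.
const proxiedPoint = new Proxy(point, {
  get(target, key, receiver) {
    return Reflect.get(target, key, receiver);
  },
});

// Two separate functions so that each access site only ever sees one kind of object.
function sumDirect(object, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) total += object.x + object.y;
  return total;
}

function sumThroughProxy(object, iterations) {
  let total = 0;
  for (let i = 0; i < iterations; i++) total += object.x + object.y;
  return total;
}

const ITERATIONS = 1_000_000;

let start = performance.now();
const directTotal = sumDirect(point, ITERATIONS);
const directMs = performance.now() - start;

start = performance.now();
const proxyTotal = sumThroughProxy(proxiedPoint, ITERATIONS);
const proxyMs = performance.now() - start;

console.log(`direct: ${directMs.toFixed(2)} ms, proxy: ${proxyMs.toFixed(2)} ms`);
```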
5. Avoid cache misses
This point requires a bit of low-level knowledge, but has implications even in JavaScript, so I’ll explain. From the CPU point of view, retrieving memory from RAM is slow. To speed things up, it mainly uses two optimizations.
5.1 Prefetching
The first one is prefetching: it fetches more memory ahead of time, in the hope that it’s the memory you’ll be interested in. It always guesses that if you request one memory address, you’ll be interested in the memory region that comes right after that. So accessing data sequentially is the key. In the following example, we can observe the impact of accessing memory in random order.
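A sketch of such a measurement is shown below. The array size is an arbitrary choice (16 MiB of Int32 data), and the index arrays ascending and shuffled are names made up for this example. Both runs do the same amount of work; only the order of the memory accesses differs.

```javascript
const SIZE = 1 << 22; // about 4 million 4-byte values (16 MiB)
const data = new Int32Array(SIZE).fill(1);

// Two visiting orders that touch every element exactly once: ascending, and shuffled.
const ascending = new Uint32Array(SIZE);
for (let i = 0; i < SIZE; i++) ascending[i] = i;

const shuffled = ascending.slice();
for (let i = SIZE - 1; i > 0; i--) {
  const j = Math.floor(Math.random() * (i + 1)); // Fisher-Yates shuffle
  const tmp = shuffled[i];
  shuffled[i] = shuffled[j];
  shuffled[j] = tmp;
}

// Sums every element of `values`, visiting them in the order given by `order`.
function sumInOrder(values, order) {
  let total = 0;
  for (let i = 0; i < order.length; i++) total += values[order[i]];
  return total;
}

let start = performance.now();
const sequentialTotal = sumInOrder(data, ascending);
const sequentialMs = performance.now() - start;

start = performance.now();
const randomTotal = sumInOrder(data, shuffled);
const randomMs = performance.now() - start;

console.log(`sequential: ${sequentialMs.toFixed(2)} ms, random: ${randomMs.toFixed(2)} ms`);
```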
What the eff should I do about this?
This aspect is probably the hardest to put into practice, because JavaScript doesn’t have a way of placing objects in memory, but you can use that knowledge to your advantage as in the example above, for example to operate on data before re-ordering or sorting it. You cannot assume that objects created sequentially will stay at the same location after some time, because the garbage collector might move them around. There is one exception to that, and it’s arrays of numbers, preferably TypedArray instances:
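As an illustration, here is a sketch that stores hypothetical particle data in parallel typed arrays instead of an array of objects. The names positions, velocities and step are assumptions for this example. Each typed array is backed by a single contiguous buffer, so the loop walks memory in order.

```javascript
const COUNT = 1000;

// One contiguous buffer per field, instead of COUNT separate { position, velocity } objects.
const positions = new Float64Array(COUNT); // initialized to 0
const velocities = new Float64Array(COUNT).fill(0.5);

// Advances every particle by one time step, reading both arrays sequentially.
function step(positions, velocities, dt) {
  for (let i = 0; i < positions.length; i++) {
    positions[i] += velocities[i] * dt;
  }
}

step(positions, velocities, 2);
console.log(positions[0], positions[COUNT - 1]);
```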
For a more detailed example, see this link*.
*Note that it contains some optimizations that are now outdated, but it’s still accurate overall.
5.2 Caching in L1/2/3
The second optimization CPUs use is the L1/L2/L3 caches: those are like faster RAMs, but they are also more expensive, so they are much smaller. They contain RAM data, but act as an LRU cache. Data comes in while it’s “hot” (being worked on), and is written back to the main RAM when new working data needs the space. So the key here is to use as little data as possible, to keep your working dataset in the fast caches. In the following example, we can observe the effects of busting each of the successive caches.
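A rough way to see this is sketched below: the same number of pseudo-random reads is performed over working sets of increasing size. The sizes are only illustrative, since which cache level each one falls out of depends on the CPU, and the index generator is a simple linear congruential generator chosen for this sketch.

```javascript
const READS = 1 << 22; // same number of reads for every working-set size

// Performs READS pseudo-random reads inside `values` (length must be a power of two).
function randomReads(values) {
  const mask = values.length - 1;
  let state = 12345;
  let total = 0;
  for (let i = 0; i < READS; i++) {
    // Linear congruential generator: cheap, deterministic pseudo-random indices.
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    total += values[state & mask];
  }
  return total;
}

const results = [];

// 16 KiB, 256 KiB, 4 MiB and 64 MiB of Int32 data.
for (const exponent of [12, 16, 20, 24]) {
  const values = new Int32Array(1 << exponent).fill(1);
  const start = performance.now();
  const total = randomReads(values);
  const ms = performance.now() - start;
  results.push({ bytes: values.byteLength, total, ms });
  console.log(`${values.byteLength / 1024} KiB: ${ms.toFixed(2)} ms`);
}
```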
What the eff should I do about this?
Ruthlessly eliminate every piece of data and every memory allocation that can be eliminated. The smaller your dataset is, the faster your program will run. Memory I/O is the bottleneck for 95% of programs. Another good strategy is to split your work into chunks, and ensure you work on a small dataset at a time.
For more details on CPU and memory, see this link.
6. Avoid large objects
As explained in section 2, engines use shapes to optimize objects. However, when the shape grows too large, the engine has no choice but to use a regular hashmap (like a Map object). And as we saw in section 5, cache misses decrease performance significantly. Hashmaps are prone to this because their data is usually randomly & evenly distributed over the memory region they occupy. Let’s see how it behaves with this map of some users indexed by their ID.
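A sketch of such a setup, assuming hypothetical user records with an id and an age, and an arbitrary user count. It compares indexing into the large object by key with walking a flat array of the same users.

```javascript
const USER_COUNT = 10_000;

// Hypothetical data: users stored in one large object, indexed by their ID.
const usersById = {};
for (let i = 0; i < USER_COUNT; i++) {
  const id = `user-${i}`;
  usersById[id] = { id, age: 20 + (i % 50) };
}

// The same users as a flat array; each model carries its own ID.
const userList = Object.values(usersById);
const userIds = Object.keys(usersById);

// Looks up every user through the large object.
function totalAgeByKey(byId, ids) {
  let total = 0;
  for (let i = 0; i < ids.length; i++) total += byId[ids[i]].age;
  return total;
}

// Walks the flat array directly.
function totalAgeByIndex(list) {
  let total = 0;
  for (let i = 0; i < list.length; i++) total += list[i].age;
  return total;
}

let start = performance.now();
const totalByKey = totalAgeByKey(usersById, userIds);
const byKeyMs = performance.now() - start;

start = performance.now();
const totalByIndex = totalAgeByIndex(userList);
const byIndexMs = performance.now() - start;

console.log(`by key: ${byKeyMs.toFixed(2)} ms, by index: ${byIndexMs.toFixed(2)} ms`);
```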
And we can also observe how the performance keeps degrading as the object size grows:
What the eff should I do about this?
As demonstrated above, avoid having to frequently index into large objects. Prefer turning the object into an array beforehand. Organizing your data to have the ID on the model can help, as you can use Object.values() and not have to refer to the key map to get the ID.
7. Use eval
Some JavaScript patterns are hard to optimize for engines, and by using eval() or its derivatives you can make those patterns disappear. In this example, we can observe how using eval() avoids the cost of creating an object with a dynamic object key:
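A minimal sketch of the idea, using new Function (one of eval’s derivatives) rather than eval() itself. The key name requestId and both function names are hypothetical; whether the compiled version is actually faster should be measured on your target engine.

```javascript
const key = 'requestId'; // hypothetical key, only known at runtime

// Baseline: a computed property name, resolved on every call.
function createDynamic(value) {
  return { [key]: value };
}

// Compiled once: the key is baked into the source as a literal, so the engine
// sees a plain object literal. JSON.stringify quotes the key safely, but
// untrusted input should still never reach this code path.
const createCompiled = new Function(
  'value',
  `return { ${JSON.stringify(key)}: value };`
);

console.log(createDynamic(1), createCompiled(1));
```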
Another good use-case for eval could be to compile a filter predicate function where you discard the branches that you know will never be taken. In general, any function that is going to be run in a very hot loop is a good candidate for this kind of optimization.
Obviously the usual warnings about eval() apply: don’t trust user input, sanitize anything that gets passed into the eval()’d code, and don’t create any XSS possibility. Also note that some environments don’t allow access to eval(), such as browser pages with a CSP.
8. Use strings, carefully
We’ve already seen above how strings are more expensive than they appear. Well I have kind of a good news/bad news situation here, which I’ll announce in the only logical order (bad first, good second): strings are more complex than they appear, but they can also be quite efficient when used well.
String operations are a core part of JavaScript due to its context. To optimize string-heavy code, engines had to be creative. And by that I mean, they had to represent the String object with multiple string representations in C++, depending on the use case. There are two general cases you should worry about, because they hold true for V8 (the most common engine by far), and generally also in other engines.
First, strings concatenated with + don’t create a copy of the two input strings. The operation creates a pointer to each substring. If it was in typescript, it would be something like this:
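The sketch below models the idea in plain JavaScript so that it runs as-is. It is a simplified model, not V8’s real code; the class names borrow V8’s terms (SeqString for a flat string, ConsString for a concatenation), while concat and flatten are names made up for this example.

```javascript
// Flat string: owns its characters.
class SeqString {
  constructor(chars) {
    this.chars = chars;
  }
  get length() {
    return this.chars.length;
  }
  flatten() {
    return this.chars;
  }
}

// Concatenation: only holds pointers to its two halves.
class ConsString {
  constructor(left, right) {
    this.left = left;
    this.right = right;
  }
  get length() {
    return this.left.length + this.right.length;
  }
  // Characters are only copied when someone needs the flat contents.
  flatten() {
    return [...this.left.flatten(), ...this.right.flatten()];
  }
}

// `a + b` in this model: constant time, no characters are copied.
function concat(left, right) {
  return new ConsString(left, right);
}

const hello = new SeqString([...'hello ']);
const world = new SeqString([...'world']);
const greeting = concat(hello, world);
console.log(greeting.length, greeting.flatten().join(''));
```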
Second, string slices also don’t need to create copies: they can simply point to a range in another string. To continue with the example above:
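In the same simplified model (again not V8’s real code, with slice and flatten as made-up names), a slice could be sketched as a parent pointer plus a range:

```javascript
// Flat string: owns its characters.
class SeqString {
  constructor(chars) {
    this.chars = chars;
  }
  get length() {
    return this.chars.length;
  }
  flatten() {
    return this.chars;
  }
}

// Slice: points at a range inside another string instead of copying it.
class SlicedString {
  constructor(parent, start, end) {
    this.parent = parent;
    this.start = start;
    this.end = end;
  }
  get length() {
    return this.end - this.start;
  }
  // Characters are only copied when the flat contents are requested.
  flatten() {
    return this.parent.flatten().slice(this.start, this.end);
  }
}

// `source.slice(start, end)` in this model: constant time, no characters are copied.
function slice(source, start, end) {
  return new SlicedString(source, start, end);
}

const sentence = new SeqString([...'hello world']);
const word = slice(sentence, 6, 11);
console.log(word.length, word.flatten().join(''));
```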
But here’s the issue: once you need to start mutating those bytes, that’s the moment you start paying copy costs. Let’s say we go back to our String class and try to add a .trimEnd method:
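A sketch of what that could look like in the simplified model. The base class is named BaseString here (an assumption, to avoid shadowing the built-in String), and this toy trimEnd only strips trailing spaces.

```javascript
class BaseString {
  // Subclasses return the full, flat array of characters.
  flatten() {
    throw new Error('not implemented');
  }

  trimEnd() {
    // flatten() may have to walk and copy a whole tree of concatenations.
    const chars = this.flatten();
    let end = chars.length;
    while (end > 0 && chars[end - 1] === ' ') end--;
    // The result is a brand-new flat string: this is where the copy cost is paid.
    return new SeqString(chars.slice(0, end));
  }
}

class SeqString extends BaseString {
  constructor(chars) {
    super();
    this.chars = chars;
  }
  flatten() {
    return this.chars;
  }
}

class ConsString extends BaseString {
  constructor(left, right) {
    super();
    this.left = left;
    this.right = right;
  }
  flatten() {
    return [...this.left.flatten(), ...this.right.flatten()];
  }
}

const padded = new ConsString(new SeqString([...'ab']), new SeqString([...'c  ']));
const trimmed = padded.trimEnd();
console.log(trimmed.flatten().join(''));
```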
So let’s jump to an example where we compare using operations that use mutation versus only using concatenation:
What the eff should I do about this?
In general, try to avoid mutation for as long as possible. This includes methods such as .trim(), .replace(), etc. Consider how you can avoid those methods. In some engines, string templates can also be slower than +. That is the case in V8 at the moment, but it might not be in the future, so as always, benchmark.
A note on SlicedString above: if a small substring of a very large string is alive in memory, it might prevent the garbage collector from collecting the large string! If you’re processing large texts and extracting small strings from them, you might be leaking large amounts of memory.
The solution here is to use mutation methods to our advantage. If we use one of them on small, it will force a copy, and the old pointer to large will be lost:
For more details, see string.h on V8 or JSString.h on JavaScriptCore.
9. Use specialization
One important concept in performance optimization is specialization: adapting your logic to fit in the constraints of your particular use-case. This usually means figuring out what conditions are likely to be true for your case, and coding for those conditions.
Let’s say we are a merchant that sometimes needs to add tags to their product list. We know from experience that our tags are usually empty. Knowing that information, we can specialize our function for that case:
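A sketch of what such a specialization could look like, with made-up product names, a made-up tag format and hypothetical function names:

```javascript
const products = ['apples', 'oranges', 'pears'];
const noTags = {};
const someTags = { apples: '::promotion::' };

// General version: looks up a tag for every single product.
function productsToString(products, tags) {
  let result = '';
  for (let i = 0; i < products.length; i++) {
    const product = products[i];
    result += product;
    if (tags[product]) result += tags[product];
    result += ', ';
  }
  return result;
}

// Returns true when the object has no enumerable keys.
function isEmpty(object) {
  for (const _key in object) return false;
  return true;
}

// Specialized version: when there are no tags at all (the common case here),
// skip the per-product lookup entirely.
function productsToStringSpecialized(products, tags) {
  if (isEmpty(tags)) {
    let result = '';
    for (let i = 0; i < products.length; i++) result += products[i] + ', ';
    return result;
  }
  return productsToString(products, tags);
}

console.log(productsToStringSpecialized(products, noTags));
console.log(productsToStringSpecialized(products, someTags));
```

The specialized function keeps the general path as a fallback, so it stays correct when the assumption about empty tags does not hold.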
This sort of optimization can give you moderate improvements, but those will add up. They are a nice addition to more crucial optimizations, like shapes and memory I/O. But note that specialization can turn against you if your conditions change, so be careful when applying this one.
10. Data structures
I won’t go into detail about data structures, as they would require their own post. But be aware that using the incorrect data structures for your use-case can have a bigger impact than any of the optimizations above. I suggest becoming familiar with the native ones like Map and Set, and learning about linked lists, priority queues, trees (RB and B+) and tries.
But for a quick example, let’s compare how Array.includes does against Set.has for a small list:
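A sketch of such a comparison, assuming a list of 100 hypothetical user IDs and an arbitrary number of lookups, half of which miss:

```javascript
const userIds = Array.from({ length: 100 }, (_, i) => i);
const userIdSet = new Set(userIds);
const LOOKUPS = 1_000_000;

// Linear scan of the array on every lookup.
function countWithIncludes(list, lookups) {
  let found = 0;
  for (let i = 0; i < lookups; i++) {
    if (list.includes(i % 200)) found++;
  }
  return found;
}

// Hash-based lookup on every call.
function countWithSet(set, lookups) {
  let found = 0;
  for (let i = 0; i < lookups; i++) {
    if (set.has(i % 200)) found++;
  }
  return found;
}

let start = performance.now();
const foundWithIncludes = countWithIncludes(userIds, LOOKUPS);
const includesMs = performance.now() - start;

start = performance.now();
const foundWithSet = countWithSet(userIdSet, LOOKUPS);
const setMs = performance.now() - start;

console.log(`includes: ${includesMs.toFixed(2)} ms, Set.has: ${setMs.toFixed(2)} ms`);
```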
As you can see, the data structure choice makes a very impactful difference.
As a real-world example, I had a case where we were able to reduce the runtime of a function from 5 seconds to 22 milliseconds by swapping an array for a linked list.
11. Benchmarking
I’ve left this section for the end for one reason: I needed to establish credibility with the fun sections above. Now that I (hopefully) have it, let me tell you that benchmarking is the most important part of optimization. Not only is it the most important, but it’s also hard. Even after 20 years of experience, I still sometimes create benchmarks that are flawed, or use the profiling tools incorrectly. So whatever you do, please put the most effort into benchmarking correctly.
11.0 Start with the top
Your priority should always be to optimize the function/section of code that makes up the biggest part of your runtime. If you spend time optimizing anything other than the top, you are wasting time.
11.1 Avoid micro-benchmarks
Run your code in production mode and base your optimizations on those observations. JS engines are very complex, and will often behave differently in micro-benchmarks than in real-world scenarios. For example, take this micro-benchmark:
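A micro-benchmark of that kind might look like the sketch below; typeEquals and the two sample objects a and b are hypothetical.

```javascript
// Both objects share the shape { type: string, count: number }.
const a = { type: 'div', count: 5 };
const b = { type: 'span', count: 10 };

function typeEquals(a, b) {
  return a.type === b.type;
}

const ITERATIONS = 1_000_000;
let matches = 0;

const start = performance.now();
for (let i = 0; i < ITERATIONS; i++) {
  if (typeEquals(a, b)) matches++;
}
const elapsedMs = performance.now() - start;

console.log(`typeEquals: ${elapsedMs.toFixed(2)} ms (${matches} matches)`);
```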
If you’ve paid attention earlier, you will realize that the engine will specialize the function for the shape { type: string, count: number }. But does that hold true in your real-world use-case? Are a and b always of that shape, or will you receive any kind of shape? If you receive many shapes in production, this function will behave differently there.
11.2 Doubt your results
If you’ve just optimized a function and it now runs 100x faster, doubt it. Try to disprove your results, try it in production mode, throw stuff at it. Similarly, also doubt your tools. The mere fact of observing a benchmark with devtools can modify its behavior.
11.3 Pick your target
Different engines will optimize certain patterns better or worse than others. You should benchmark for the engine(s) that are relevant to you, and prioritize which one is more important. Here’s a real-world example in Babel where improving V8 means decreasing JSC’s performance.
12. Profiling & tools
Various remarks about profiling and devtools.
12.1 Browser gotchas
If you’re profiling in the browser, make sure you use a clean and empty browser profile. I even use a separate browser for this. If you’re profiling and you have browser extensions enabled, they can mess up the measurements. React devtools in particular will substantially affect results: rendering code may appear slower than it appears in the mirror to your users.
12.2 Sample vs structural profiling
Browser profiling tools are sample-based profilers, which take a sample of your stack at regular intervals. This has a big disadvantage: very small but very frequent functions might be called between those samples, and might be very much underreported in the stack charts you’ll get. Use Firefox devtools with a custom sample interval or Chrome devtools with CPU throttling to mitigate this issue.
12.3 The tools of the trade
Beyond the regular browser devtools, it may help to be aware of these options:
- Chrome devtools have quite a few experimental flags that can help you figure out why things are slow. The style invalidation tracker is invaluable when you need to debug style/layout recalculations in the browser.
  https://github.com/iamakulov/devtools-perf-features
- The deoptexplorer-vscode extension allows you to load V8/chromium log files to understand when your code is triggering deoptimizations, such as when you pass different shapes to a function. You don’t need the extension to read log files, but it makes the experience much more pleasant.
  https://github.com/microsoft/deoptexplorer-vscode
- You can always compile the debug shell for each JS engine to understand in more detail how it works. This allows you to run perf and other low-level tools, and also to inspect the bytecode and machine code generated by each engine.
  Example for V8 | Example for JSC | Example for SpiderMonkey (missing)
Final notes
Hope you learned some useful tricks. If you have any comments, corrections or questions, my email is in the footer. I’m always happy to receive feedback or questions from readers.
If you’ve made it this far, I invite you to view The Castle.