22 October 2024

Zen and the art of software engineering

I’ve found over the years that there are two sides to programming. The first side is computer science, and it’s about data structures, algorithms and CPUs. The second side I call software engineering. This side is about creating readable and maintainable codebases that can evolve over time and, just like Quality in Zen and the Art of Motorcycle Maintenance, it is hard to quantify but easy to qualify: you know it when you see it. This post is a counterpart to Optimizing JS for fun and for profit, where instead of focusing on the performance part of code, I explain the various practices that have helped me create readable and maintainable code.

1. Constraints

My last post on Efficient Typescript touched a bit on the subject, but I cannot emphasize enough how much establishing constraints is an essential part of maintainable code. Constraints give you the possibility of making assumptions later on that will simplify your codebase.

A common sign that you are not strict enough is that you end up adding a lot of defensive checks, for example:

function findUser(id) {
  // avoid SQL injection
  if (typeof id !== 'number')
    throw new Error('ID is not a number')
 
  return db.select(`SELECT * FROM users WHERE id = ${id}`)
}

What this block of code should really look like is this:

function findUser(id: number) {
  return db.select(`SELECT * FROM users WHERE id = ${id}`)
}

Now yes, TypeScript can only help you so much, and you’ll sometimes need to add runtime validation checks. That’s ok. But there are only ever two places where there should be runtime validation: user input and network data. Everything else is part of a machine that is fully in your control. The machine can accept fuzzy, invalid user input (and fail appropriately), but it should not contain any fuzziness in its insides. No loose screw, no duct tape. Validate your inputs, and then establish ruthlessly strict constraints for how the inside of the machine works.
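As a minimal sketch of that boundary (the function names and the assumption that the raw id arrives as a string are mine, for illustration only), validation happens once where user input enters, and everything behind it relies on the type alone:

```typescript
// Hypothetical boundary validator: the only place where fuzzy input is tolerated.
// Assumes the raw id arrives as a string, e.g. from a URL or a form field.
function parseUserId(input: unknown): number {
  if (typeof input !== 'string' || !/^[1-9]\d*$/.test(input)) {
    throw new Error('Invalid user id')
  }
  const id = Number(input)
  if (!Number.isSafeInteger(id)) {
    throw new Error('User id is out of range')
  }
  return id
}

// Inside the machine: no re-validation, the `number` type is the constraint.
// `userCacheKey` is a hypothetical internal function.
function userCacheKey(id: number): string {
  return `user:${id}`
}

const cacheKey = userCacheKey(parseUserId('42'))
```

Anything that reaches userCacheKey has already passed through the boundary, so it never needs a check of its own.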

But those constraints can also be external. Early internet pioneer Jon Postel said “be conservative in what you send, be liberal in what you accept”. I’m sorry Jon, but I don’t buy this one. By being liberal in what you accept, wouldn’t you be encouraging others to build unsound systems? By building a “robust” system that accepts unstrict input, aren’t you encouraging the ecosystem that depends on you to not be robust? In the name of immediate gains, aren’t you condemning the future internet to pay the cost of maintaining the early flaws forever? I say be strict in what you accept; make your API options limited to the essential; find the best way to do a thing, and don’t expose a second way to do it. Yes, I’m sure you have an exception to that. Maybe you’re right, maybe you’re not. There are no quantitative rules here, only qualitative judgements.
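To make "strict in what you accept" concrete, here is a small sketch. The sort-order option is a hypothetical API of my choosing, not one from a real codebase: it accepts exactly two spellings and rejects everything else instead of guessing.

```typescript
// Hypothetical API option: exactly two accepted spellings, nothing else.
type SortOrder = 'asc' | 'desc'

function parseSortOrder(input: string): SortOrder {
  if (input === 'asc' || input === 'desc') return input
  // No trimming, no case folding, no aliases like 'ascending'.
  throw new Error(`Unsupported sort order: ${input}`)
}
```

A caller sending 'ASC' gets an error right away, rather than a lenient parser that every future version has to keep supporting.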

1.1 Empty value

Another common way in which the lack of constraints manifests itself is frequent null/undefined checks. There is a concept in programming called the empty value, and remembering these two questions helps improve constraints:

What is a good empty value for this type?
Do I need an empty value?

A few examples of empty values:

The empty string ('') for string type
The number -1 for array index positions
null or undefined in conjunction with any type
NaN for float values
An empty array ([]) for any array type

Too often I see people defaulting to null/undefined as a way to signal that the value hasn’t been set yet. But do you really need an empty value? Here is an example:

type Dimension = { width: number, height: number }
 
// This is our component's dimensions. It's
// calculated on the first render, so we don't have
// a value for it yet. Let it be null for now.
let rootDimensions = null as Dimension | null

It might not be immediately apparent, but the consequence of picking null as the empty value is that it condemns every single subsequent section of code that uses rootDimensions to add null checks and default values:

let sidebarWidth = (rootDimensions?.width ?? 0) * 0.20
//                                ^^^^^^^^^^^^
// This is going to be repeated *each* time `rootDimensions` is used.

Whereas initializing the dimensions with a monomorphic value negates the need for later null checks:

const EMPTY_DIMENSIONS = { width: 0, height: 0 } as Dimension 
 
let rootDimensions = EMPTY_DIMENSIONS // no more `| null` needed!
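Put together, a sketch of how the downstream code might then look. The sidebarWidth helper and the measured values are assumptions for illustration:

```typescript
type Dimension = { width: number, height: number }

const EMPTY_DIMENSIONS: Dimension = { width: 0, height: 0 }

let rootDimensions = EMPTY_DIMENSIONS

// Hypothetical consumer: no `?.`, no `??`, no default value needed.
function sidebarWidth(root: Dimension): number {
  return root.width * 0.20
}

// Before the first render the empty value simply yields a zero width.
const widthBeforeRender = sidebarWidth(rootDimensions)

// Assumed measurement after the first render.
rootDimensions = { width: 1000, height: 600 }
const widthAfterRender = sidebarWidth(rootDimensions)
```

If some code does need to know whether the measurement has happened, comparing against EMPTY_DIMENSIONS by reference is one option, and it keeps that question in the few places that actually care.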

1.2 Conclusion

So the question I always ask myself is, which constraints can I place on my system to ensure the least effort needed down the line? Which nullable values can you get rid of? Which APIs can you simplify?

2. Semantics

Code is the halfway point between how a machine needs data to be formatted to be executed and how a human needs data to be formatted to be understood. Writing code is both being able to make it correct and performant, and writing it in a way that conveys the proper semantics to the next person who will read it.

For a machine, a value is a number.
For a human, a value has a meaning.
For a programmer, a value is the union of both.

Use a value not only because it’s the number you need, but also because it has the right meaning. A perfect example of the application of this principle is being able to re-assign a variable to a new variable without changing its value, simply because we need a new meaning for it. To take an example from a codebase I’ve worked on lately:

const isVirtualRow = rowIndexInPage === virtualRowIndex
const isNotVisible = isVirtualRow

Introducing new names and concepts is as important as introducing new values. You need to code for the right balance of computer/human understanding. If you go too far, it becomes harder for the other side to process it.
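As a sketch of where the isVirtualRow example pays off (the rowVisibility helper and its return values are hypothetical, not taken from that codebase), the code that consumes the value can read by meaning rather than by mechanism:

```typescript
// Hypothetical helper built around the two names from the example.
function rowVisibility(rowIndexInPage: number, virtualRowIndex: number): 'hidden' | 'visible' {
  const isVirtualRow = rowIndexInPage === virtualRowIndex
  // Same boolean, new meaning: this is the name the rendering logic cares about.
  const isNotVisible = isVirtualRow
  return isNotVisible ? 'hidden' : 'visible'
}
```

If the rule for hiding rows later changes, only the line defining isNotVisible needs to move; the code that reads it keeps its meaning.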

If you find yourself struggling to find a name for a new concept, take time to pause and figure one out. The name you pick for that concept—be it a variable, a function, or the main class your whole application will depend on—will have an influence on how you and future programmers understand and solve the problem domain. If you don’t have a good satisfying name, give it your best try but more importantly come back to refactor it as soon as you have understood the concept better.

My all-time favourite programming quote summarizes well how important naming is:

Take time to find good names and take time to re-factor those names as much as necessary. As a wise stackoverflow user once said, the process of naming makes you face the horrible fact that you have no idea what the hell you’re doing.

― A reddit user

So which names can you refactor?

Declarative programming

Writing code in a declarative way is also very helpful for readability, and functional programming in particular is very suitable for this. Take for example these two versions:

let result = 0
for (let i = 0; i < numbers.length; i++) {
  let n = Math.round(numbers[i] * 10)
  if (n % 2 !== 0) continue
  result = result + n
}

const result =
  numbers
    .map(n => Math.round(n * 10))
    .filter(n => n % 2 === 0)
    .reduce((a, n) => a + n, 0)

The second one tells you right away the operations that it’s going to do: a transform (map), a filtering pass, and then a reduction to a final value. To get all that information, you only have to pick up the map, filter, and reduce symbols. Whereas for the first version, you need to read the full code to pick up that information.
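For anyone who wants to check that the two versions really are equivalent, here is a self-contained sketch that wraps each one in a function. The function names and the sample input are mine:

```typescript
// Imperative version, wrapped in a function for comparison.
function sumImperative(numbers: number[]): number {
  let total = 0
  for (let i = 0; i < numbers.length; i++) {
    const n = Math.round(numbers[i] * 10)
    if (n % 2 !== 0) continue
    total = total + n
  }
  return total
}

// Declarative version: transform, filter, reduce.
function sumDeclarative(numbers: number[]): number {
  return numbers
    .map(n => Math.round(n * 10))
    .filter(n => n % 2 === 0)
    .reduce((a, n) => a + n, 0)
}

// Sample input chosen for illustration: rounds to 2, 5, 12, 7.
const sample = [0.2, 0.5, 1.2, 0.7]
const imperativeTotal = sumImperative(sample)   // 2 + 12 = 14
const declarativeTotal = sumDeclarative(sample) // 2 + 12 = 14
```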

Using FP and in general any shared programming vocabulary (e.g. design patterns) is a good way to communicate what the code is doing in a short and elegant way. By default, write code with the conventions that are already in use by other programmers.

And yes, that example is the inverse of Optimizing JS: Avoid array/object methods. One version isn’t better than the other. Often, programming is about deciding whether performance or maintainability matters more.

3. Beauty

Some would have you think that software engineering is a cold discipline, but nothing could be further from the truth. Writing software is an art: it’s composing a virtual machine with logic constructs made visible with written symbols, symbols that communicate with the next human who will read them. Which of us hasn’t known the joy of completing a function, a module, with the knowledge that it runs with the utmost elegance, efficiency and simplicity that we could ever come up with? Material-world artisans can build wonderful contraptions made of lights, rolling balls, levers and pulleys. So can we, but ours are made of logic constructs, invisible to the untrained eye.

The way we write software has a huge influence on how it is perceived by others. The brain is a pattern-matching system that relies on recurring patterns to ease its processing. Inserting symmetry, spacing and alignment (in other words, beauty) into our code makes it easier to read. For example, I initially wrote these words you saw earlier in this format:

For a machine, a value is a number. For a human, a value has a meaning. For a programmer, a value is the union of both.

Although the conventional way to write paragraphs is the one above, I know that inserting newlines before each “For” breaks the text into easier-to-consume bits, as the brain can see the recurring “For”, the somewhat aligned “a value…”, and run its circuits in harmony:

For a machine, a value is a number.
For a human, a value has a meaning.
For a programmer, a value is the union of both.

Writing code isn’t just about semantics, it’s also about writing it in a way that is pleasant to the eye. Here are a few patterns that I frequently follow.

Building pyramids

I always sort code lines this way, and I call these pyramids. Tell me, which one of these blocks seems easier to read for you?

{
  hasUninitialized: items.some(i => i.status === Status.Uninitialized),
  hasReady: items.some(i => i.status === Status.Ready),
  hasInactive: items.some(i => i.status === Status.Inactive),
  hasPending: items.some(i => i.status === Status.Pending),
  hasCompleted: items.some(i => i.status === Status.Completed),
}

{
  hasReady: items.some(i => i.status === Status.Ready),
  hasPending: items.some(i => i.status === Status.Pending),
  hasInactive: items.some(i => i.status === Status.Inactive),
  hasCompleted: items.some(i => i.status === Status.Completed),
  hasUninitialized: items.some(i => i.status === Status.Uninitialized),
}

When I read the second one, my eye can flow from one line to the next, harmoniously.

Aligning names & symbols

I also find it particularly important to align sub-sections of identifiers (e.g. common prefixes), as well as common symbols like = assignments.

const DEFAULT_ROW_GROUPING_STRATEGY = 'default'
const DATA_SOURCE_ROW_GROUPING_STRATEGY = 'data-source'

const ROW_GROUPING_STRATEGY_DEFAULT     = 'default'
const ROW_GROUPING_STRATEGY_DATA_SOURCE = 'data-source'

Although the names sound more natural when pronounced out loud for the first block, they’re much easier to read as a whole in the second one.

Aligning for logic

Another way in which alignment can be helpful is when we have operations on mostly similar expressions, with slightly differing semantics. The example below is from a Python codebase I worked on, where we split a dataframe into two sub-sections. In this particular case I had to fight the auto-formatter (and colleagues) to have it formatted properly.

# code as the auto-formatter wanted it
average = [df > stats.average, df <= stats.average]
q1 = [df > stats.q1, df <= stats.q1]
outliers = [df > stats.outliers, df <= stats.outliers]
even = [df % 2 == 0, df % 2 == 1]

# code as I wanted it
average = [
  df >  stats.average,
  df <= stats.average,
]
q1 = [
  df >  stats.q1,
  df <= stats.q1,
]
outliers = [
  df >  stats.outliers,
  df <= stats.outliers,
]
even = [
  df % 2 == 0,
  df % 2 == 1,
]

Which of those blocks would you prefer to code review? The auto-formatted one, or the manually formatted one? For all their benefits, full-code formatters like Prettier have gotten many programmers convinced that even talking about formatting and alignment is a negative, best left to machines to deal with. Code is not just about machines. It’s about humans.

I don’t want to have to argue about whether the { should be after the if or on a line of its own. ESLint can do that well enough, no need to let Prettier destroy beauty. Here is an example from some color code, in TypeScript:

// code as prettier wants it
export function newColor(r: number, g: number, b: number, a: number) {
  return (r << OFFSET_R) + (g << OFFSET_G) + (b << OFFSET_B) + (a << OFFSET_A);
}

// code as it should be
export function newColor(r: number, g: number, b: number, a: number) {
  return (
    (r << OFFSET_R) +
    (g << OFFSET_G) +
    (b << OFFSET_B) +
    (a << OFFSET_A)
  );
}

Spacing for clarity

And if in the last examples I went for more vertical space, the opposite can also improve readability.

switch (priority) {
  case DiscreteEventPriority:
    priority = ImmediatePriority;
    break;
  case ContinuousEventPriority:
    priority = UserBlockingPriority;
    break;
  case DefaultEventPriority:
    priority = NormalPriority;
    break;
  case IdleEventPriority:
    priority = IdlePriority;
    break;
  default:
    priority = NormalPriority;
    break;
}

switch (priority) {
  case DiscreteEventPriority:   priority = ImmediatePriority;    break;
  case ContinuousEventPriority: priority = UserBlockingPriority; break;
  case DefaultEventPriority:    priority = NormalPriority;       break;
  case IdleEventPriority:       priority = IdlePriority;         break;
  default:                      priority = NormalPriority;       break;
}

The amount of spacing to use for code follows the same rule that designers use for adding spacing in their designs: use your eyes. They will tell you if you need more or less. They will tell you if what they’re seeing is beautiful or not.

4. Simplicity

I think if I had to summarize software engineering in one expression, it would be managing complexity. Most of our work as software engineers is keeping complexity in check with all the tools we have. One of the characteristics that code should have above all is simplicity. Simple code is easy to read, easy to modify and easy to debug. Code that isn’t simple is much harder to also make correct and performant, so we should always aim for simplicity first.

In fact, if features are an asset, code is a liability. A general rule, and I think everyone will agree with me, is that the more code you have, the more bugs you have. So a software product in theory should aim to have as many features as possible, while having as little code as possible. That’s why one of my favourite activities as a programmer is deleting large sections of code.

In practice, in the front-end world this manifests itself as implementing features using browser APIs and CSS instead of re-building existing features in JavaScript (please no more horror like CSS-in-JS ever again).
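As one small illustration in TypeScript (the parameter names and values are made up), building a query string with the standard URLSearchParams API means there is no hand-rolled escaping code to write, test or maintain:

```typescript
// Hypothetical search parameters, for illustration only.
const searchParams = new URLSearchParams({ q: 'zen & art', page: '2' })

// The platform handles escaping of spaces and `&`.
const queryString = searchParams.toString() // "q=zen+%26+art&page=2"

// The same API parses the string back, so no hand-written decoder either.
const parsedQuery = new URLSearchParams(queryString).get('q') // "zen & art"
```

This is a sketch of the general idea rather than a rule: the less of this plumbing lives in your own codebase, the less of it can break.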

5. Pleasure

As a last point, I think I need to mention that the root of all qualities in software engineering is pleasure. You write good, elegant, beautiful, simple code because you find pleasure in writing software. If you don’t find pleasure in it, you will never put in the effort to make any of the above qualities emerge, and you won’t find joy in your craft.

Conclusion

I hope I was able to pass along some of the zen of programming to you. I can’t claim credit for any of the points here as they’ve all been transmitted to me throughout years of programming. I unfortunately can’t remember where I picked up each of those ideas, but I think I owe at least some of them to Uncle Bob, Joel on Software and so many others.