Software Projects and Asynchronicity

Scott Woods
11 min read · Oct 1, 2020

Everything Is Great-ish

I have been writing code for a few decades. It still feels fresh and the prospect of a day at work can still drive me out of a warm bed on a cold morning. There is no grumbling and no conscious effort. Sometimes I look at that first coffee and wonder how I got from the bed to the desk.

Programming languages have arrived and departed. Across the decades it does start to feel a little repetitive; writing the same control-flow construct in the latest, slightly different syntax becomes less and less exciting.

Another truth has revealed itself over those decades: the job of making a software product has relentlessly become more and more complicated. The early digital journalists in magazines like Byte and Dr Dobb’s Journal wrote about spaghetti code, DLL hell, SIGSEGVs and blue screens of death.

The magazines may have been casualties of the global dismantling of traditional media, but the problems haven’t gone away. If they seem less of a threat, it has to be because of the wonderful variety of distracting new problems that exist in an operational environment as vast as the Internet.

How Fluent Is Your Bambara?

Writing utilities in C is not like writing a web app and backend combo for an online service. One metric of the difference is that the former involves a single development language, while a recent personal example of the latter involved somewhere between 4 and 8. The final number would depend on whether you count small, declarative languages like Protobuf as programming languages. They don’t “execute” and they make no pretense to being “Turing complete”. However, they do have syntax, compilers and errors to understand.

A friend who works as an English-Japanese interpreter once tried to describe the difference between translation and interpreting to me. She enjoyed the human side to interpreting but admitted that a full day left her drained, in direct contrast to the focused, self-paced nature of a translation job. A full day of talking can leave anyone drained, but with interpreting there is also the constant switching — the need to engage different areas of the brain.

Working with multiple development languages has to incur a similar cost. Of course, it doesn’t sound as good when you mention it to your friends. In social circles, fluency in English, French and a smattering of related West African languages probably marks you as exotic. In those same circles, fluency in C++, Python, Go and a smattering of DSLs marks you in a different way — like wearing a tinfoil hat does.


It’s hard to imagine any modern, web-based service that could be implemented using a single language — “best tool for the job” has become the mantra behind the addition of yet another language to a project. In this context languages are often the gateway to required capabilities, such as Python for machine learning.

Adding a language to a project has wide-ranging ramifications. It adds at least one new process to the operational product and until you have all the aspects of multi-process development running smoothly, you have also added an enduring source of problems and a project risk.

Problems, What Problems

By splitting a design or operational product into multiple processes, you immediately assign yourself at least the following headaches:

  • process management,
  • effective messaging,
  • asynchronicity.

Between them, these issues affect everything from coding style through to most of the development processes, such as the edit-compile-debug cycle, automated testing and deployment. In other words they affect every hour of every working day.

These issues are non-negotiable — they are innately bound with multi-process architectures. Adopting Python into a Golang project for its support of machine learning comes with the following requirements:

  • multi-process setup of the edit-compile-debug cycle, testing and deployment, for combinations of Python and Go executables,
  • error-free messaging between Go and Python executables and their respective type systems (a minimal framing sketch follows this list),
  • Go and Python applications that are capable of receiving events from each other, at any time.
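
To make the second of those requirements concrete, here is a minimal sketch of the kind of bridge involved, assuming a length-prefixed JSON framing that both sides agree on. Only the Python half is shown; the function names and the 4-byte header are illustrative, not from any particular library.

```python
import json
import socket
import struct

# Sketch of one side of a Go/Python bridge: JSON payloads framed by a
# 4-byte big-endian length header. The framing is an assumption for
# illustration; any agreed convention would do.

def send_message(sock: socket.socket, payload: dict) -> None:
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)

def recv_message(sock: socket.socket) -> dict:
    (length,) = struct.unpack(">I", _recv_exactly(sock, 4))
    return json.loads(_recv_exactly(sock, length))

def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data
```

Every decision in those few lines (framing, encoding, error handling) has to be mirrored exactly on the Go side, which is rather the point.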

Any thoughts of avoiding the related work will likely lead to additional project costs and product shortcomings described in the following sections.

Process Management

Of course, the presence of multiple languages is not the only reason for multi-process architectures. Further reasons include concurrency, separation of concerns and physical separation, e.g. where parts of the product must run at different locations. The pairing of browser-based apps and backend services is a ubiquitous example of the latter. Put simply, few software projects dodge the need for multiple processes, and that means the headaches previously listed are familiar to you.

Development involves the repeat execution of the product software. This happens in different configurations and on different machines, in a progression that starts with the edit-compile-debug cycle and ends with the production deployment. Every developer knows the quantity of work involved in the setup and update of these different configurations; the complete picture has become so complex that we now have specialized roles such as deployment engineers.

We only need to look to related tools such as nodemon and Kubernetes for further evidence that process management is a big deal. The former automates the edit-compile-debug cycle for Node.js applications, while the latter is often used for production deployments. Multi-process debugging capabilities such as those offered by VS Code are also relevant.
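
To give a feel for what a tool like nodemon actually automates, here is a toy restart-on-change loop in Python. It is a sketch only: it polls file modification times with the standard library, where a real tool would use OS file-change notifications.

```python
import subprocess
import sys
import time
from pathlib import Path

# Toy nodemon: run a command, restart it whenever a watched file changes.

def snapshot(paths):
    return {p: p.stat().st_mtime for p in paths}

def watch_and_restart(command, watched):
    paths = [Path(p) for p in watched]
    seen = snapshot(paths)
    process = subprocess.Popen(command)
    try:
        while True:
            time.sleep(1.0)
            current = snapshot(paths)
            if current != seen:          # something was edited; go again
                seen = current
                process.terminate()
                process.wait()
                process = subprocess.Popen(command)
    except KeyboardInterrupt:
        process.terminate()

if __name__ == "__main__":
    # e.g. python watcher.py server.py
    watch_and_restart([sys.executable, sys.argv[1]], [sys.argv[1]])
```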

As useful and amazing as these tools are, they are each focused on parts of the wider puzzle. As developers we are responsible for stitching all of these tools together into a functional development environment.

Effective Messaging

Consider a typical online service. An instance of such a service may extend from a browser-based app, to an HTTP-based backend API, to a proprietary RabbitMQ-based service and finally to a database engine with its own query language. In the larger projects, each link in this chain may be the focus of a separate team.

Maintaining integrity of data within this operational landscape is a special challenge. Problems associated with the related edge cases can occur at any time and exhibit the strangest behaviours. Time values can turn schizoid after crossing timezones and big integers can lose bits when they are squashed into smaller integers.

Ensuring these things don’t happen usually relies on the mental discipline of developers, constantly checking the correctness of their use of an API. With each link in the chain involving different networking libraries, different messaging libraries (i.e. for encoding and marshalling) and different languages, the potential for a few bits to go astray is undeniable.

It is impossible to cover the full scope of pitfalls waiting for the developer working with network APIs, from poor documentation of third-party APIs through to unsophisticated messaging libraries that don’t know how to transfer a map of customers. Possibly the worst aspect of these pitfalls is that — unless deliberate techniques are involved — they are uniquely repeated at each link in the chain.

Without dedicated attention, integrity bugs are inevitable and a regular diet of those bugs is not good for anyone. A sign that things have reached a turning point is when you catch yourself in the mirror and barely recognize what is looking back at you.


The strongest strategy for maintaining data integrity across a collection of different processes and different programming languages is to use a canonical reference for all messages, and tooling that can generate the source files needed for each language in use — Protobuf is one such toolset. Protobuf also includes the marshalling and unmarshalling code needed for each language. In a sophisticated development environment, a change to the canonical reference can be propagated to all affected processes with a single command.
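
As a sketch of what that looks like from the Python side: customer_pb2 below stands in for whatever module protoc would generate from a shared customer.proto, and the Customer fields are invented for illustration. The SerializeToString and ParseFromString calls are the standard protobuf Python API.

```python
# customer_pb2 is hypothetical: the module protoc would generate from a
# canonical customer.proto shared by every process in the system.
from customer_pb2 import Customer

def encode(name: str, customer_id: int) -> bytes:
    c = Customer()
    c.name = name                 # fields come from the canonical schema
    c.id = customer_id
    return c.SerializeToString()  # marshalling supplied by the toolset

def decode(wire: bytes) -> "Customer":
    c = Customer()
    c.ParseFromString(wire)       # raises if the bytes are malformed
    return c
```

The Go process imports its own package generated from the same customer.proto, and the two type systems can no longer drift apart silently.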

Other strategies exist, such as treating each network interface as its own data authority. A database engine such as Neo4j, with its own query language, is an example of this strategy. All Neo4j clients must perform their own data transformations on every interaction with the database.

RESTful interfaces that document their APIs as a set of names and JSON body definitions represent another strategy. The simple nature of these APIs is one probable reason for their popularity, though this strategy can leave many areas undefined. The JSON specification defines numbers purely as sequences of digits and special characters. Minimum and maximum values are not an issue at this lexical level, whereas they are obvious issues as numbers travel between language runtimes such as those of Python and C++; Python supports integers of unlimited size, while C++ natively does not. In a more general sense, these JSON-based APIs are missing the marshalling and unmarshalling that is present in other strategies like Protobuf. The application is forced to work with data that is effectively an on-the-wire representation, or provide its own transformations.
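
The integer problem is easy to demonstrate: Python will happily serialize a value that no C++ int64 can hold, and nothing in the JSON layer objects. A defensive sender has to supply the range check itself, as in this sketch:

```python
import json

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def dump_for_cpp_peer(payload: dict) -> str:
    # JSON imposes no range limit, so enforce the peer's limit here.
    for key, value in payload.items():
        if isinstance(value, int) and not (INT64_MIN <= value <= INT64_MAX):
            raise ValueError(f"{key}={value} will not survive a C++ int64")
    return json.dumps(payload)

print(dump_for_cpp_peer({"count": 42}))    # fine
try:
    dump_for_cpp_peer({"count": 2**80})    # Python is happy, C++ is not
except ValueError as e:
    print(e)
```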

On top of these more obvious messaging concerns, there is the issue of versioning. Loosely speaking, versioning of messages allows processes to detect the age of the process at the remote end of a connection, on the assumption that processes of different ages are using slightly different sets of messages. Mismatches may activate dedicated version support or immediate shutdown of the connection. Either of these responses is preferable to the undefined behaviour that is likely if the processes were to continue.
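
In its simplest form this is a version number carried in every message and checked on receipt. The field name and the policy in this sketch are illustrative only:

```python
PROTOCOL_VERSION = 3   # illustrative: bumped whenever the message set changes

class VersionMismatch(Exception):
    pass

def check_version(message: dict) -> dict:
    # A peer of a different age announces itself here; shut the connection
    # down (or switch to a compatibility path) rather than guessing.
    remote = message.get("version")
    if remote != PROTOCOL_VERSION:
        raise VersionMismatch(f"remote speaks v{remote}, local is v{PROTOCOL_VERSION}")
    return message
```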

Deferring the issue of message versioning will keep your software simpler in the short term — possibly getting your product to market more quickly. It also guarantees ongoing pain: creating bugs, complicating deployment and affecting service availability.

Lastly, the messaging capability between your processes must be asynchronous. Having pushed through to this point, that’s probably not what you wanted to hear.

Asynchronicity

Perhaps the most pernicious item on the list of headaches is asynchronicity, or the state of being asynchronous. Being asynchronous — or event-driven — is an aspect of software that is well known but also associated with less popular styles of coding (e.g. state machines). Consequently, many projects facing this issue turn to simplifications.
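
For the record, the unpopular style is not exotic. An event-driven state machine can be as plain as a table of transitions; this sketch is illustrative, with states and events invented for the purpose:

```python
# Every input is an event, including the ones (like "db_lost") that an
# RPC-style request loop would never hear.
TRANSITIONS = {
    ("idle",     "request"):  "waiting",
    ("waiting",  "response"): "idle",
    ("idle",     "db_lost"):  "degraded",
    ("waiting",  "db_lost"):  "degraded",
    ("degraded", "db_back"):  "idle",
}

def step(state: str, event: str) -> str:
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise RuntimeError(f"unexpected event {event!r} in state {state!r}")
    return next_state

state = "idle"
for event in ("request", "db_lost", "db_back"):
    state = step(state, event)
    print(event, "->", state)
```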

An illustration of the issue can be found in a popular design for online services: a browser-based app and a backend API service, with a database server somewhere in the background. The first two processes will agree on a RESTful API over HTTP. The app can then adopt the traditional RPC model for its interactions with the backend service.

The app is focused on the needs of the user. The user enters data into screen widgets and clicks action buttons. Requests are sent to the backend API and responses are received. The app is always waiting for a keyboard entry, a mouse click or a backend response.

The RPC abstraction is used as a facade to keep the minutiae of network communications at a more comfortable distance, leading to better development velocity.
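
That facade is often only a few lines. A sketch, using the ubiquitous requests library and a placeholder endpoint:

```python
import requests  # blocking HTTP client; the endpoint below is a placeholder

def get_customer(customer_id: int) -> dict:
    # Looks like a local function call; is actually a network round trip.
    # While this line blocks, the app hears nothing else from the world.
    response = requests.get(f"https://api.example.com/customers/{customer_id}")
    response.raise_for_status()
    return response.json()
```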

Now consider the operational loss of the database server — a change of state that deserves global notification. There is no way of notifying a browser-based app that is focused on a user action or backend response — it’s not listening. In fact, any app written in the RPC coding style will ignore all attempts to warn it of imminent danger.


There is potential for it to discover the loss on the next request, but in the meantime everything has the appearance of being just fine. Similar examples of the problem can be found where the browser-based app is displaying a representation of system information, such as a table of objects, and there is a significant change of state in one of those objects.

There are a variety of solutions to this problem, such as long-polling and upgrading the HTTP connection to a WebSocket connection. I have worked on projects using both of these solutions and in all cases they were additional to a primary RESTful API, using a completely separate connection.

In effect, the additional connection serves as an event channel, receiving notifications and dispatching them to different parts of the browser-based app. This design is a hybrid that lies in between a pure RESTful design and a fully asynchronous design.
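
In code, the client end of the event channel usually reduces to a dispatch table keyed by event type. A sketch, assuming events arrive as small JSON objects on the extra connection (the names are invented):

```python
import json

HANDLERS = {}    # parts of the app register interest in event types

def on(event_type):
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on("db_lost")
def handle_db_lost(event):
    print("database gone:", event.get("detail"))

def dispatch(raw: bytes):
    # Called for every frame arriving on the separate event connection.
    event = json.loads(raw)
    handler = HANDLERS.get(event.get("type"))
    if handler:
        handler(event)

dispatch(b'{"type": "db_lost", "detail": "primary unreachable"}')
```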

Of course, the real issue is not resolved. The additional connection only succeeds in moving the clash of RPC-based apps and events from the server end of the connection to the client end. There is still the issue of how to inject the received events into apps that remain committed to their RPC style of processing.

Convincing an RPC-based app to respond to events can be difficult work, with the ever-present feeling that you are pushing something in a direction it doesn’t want to go. Initial success is rewarded with requests to upgrade further areas of the app to accept events. At some point you realize that eventually you will have rewritten the entire app to be asynchronous, and the unpalatable thought follows — why wasn’t the asynchronous model adopted in the first place, and how are we going to rid ourselves of this redundant hybrid plumbing? You consider your options.


Taking A Look At What We Do

For someone who is relatively comfortable with writing asynchronous software, this phenomenon of projects that persist with the RPC model is frustrating. It’s as if the wider development community cannot face the plain truth of RPC — it is not fit for purpose in modern, multi-process software.

Our ability to paper over this fact might be brushed off as a case of justifiable cultural inertia. However, the downstream costs are real, and the basis for real solutions such as WebSockets has been around for nearly ten years.

Some pragmatic reasons for the delay are the lack of good tooling and the learning curve associated with writing asynchronous code. The lack of explicit support for the development of state machines in our most popular programming languages and IDEs only makes that curve steeper.

Another reason for the delay is our own propensity to fix things, to find solutions to problems. In a small percentage of cases those solutions will be less than ideal but better than the alternative of no solution at all; too many of those can mean the end of a project. In this context, asynchronous development is perceived to be the intractable problem and solutions such as the separate event channel are perceived to be the engineering marvels that save projects. They are marvels — I have seen thousands of lines of difficult code written in stressful circumstances to inject asynchronous events into browser-based apps. We should applaud the skills and determination involved, but not the continued presence of RPC. In the associated end-of-project celebrations, RPC accepts praise in a passing conflation, when the ironic truth is that it was at the root of so many woes.

We need to move beyond RPC for inter-process communications. It is an abstraction whose early appeal was based, at least partly, in avoidance. Networks and multi-process architectures are not procedural environments, and when we finally accept this fact, our working days will become easier.


Scott Woods

Tried a few things and one of them happened to be coding. Still trying after 30-plus years. So much to do. Bikes and beaches when I can’t look at a screen.