Cadence Workflow and Clojure

What Cadence is, and how to use it with Clojure.

Introduction

A repository with some working code and implementation notes can be found here.

I had the occasion recently to investigate, and cursorily evaluate, a number of workflow orchestration systems for use on a project with which I was involved. One of those systems, Cadence, particularly appealed to me - there was something very Clojuresque about it; certainly something very suggestive of a functional language.

It has the concept of state durability (in workflow functions) that bears more than a passing resemblance to the persistent data structures of Clojure - but extended across time. This concept, similar to checkpoints, opens avenues to consistent, predictable restarts after failures. If one can restore the complete state of a system to a known-good state then one can continue as if the failure had never occurred. Of course, if system-wide (or even better, distributed) non-volatile RAM ever becomes a reality then Cadence would not be needed. This strikes me as the essence of the problem Cadence is attempting to solve, or, at least, the gap it’s attempting to bridge. Cadence also allows, through activities, the use of non-persistent data structures, which can be considered analogous to the concept of a side-effect in Clojure.

The separation of the functional from the side-effect-ing, and the elision of infrastructure and communication failure concerns, leaves developers with simpler, almost always more tractable, domain logic concerns and significantly reduces the cognitive load. This is similar to the benefits often realized through the adoption of functional languages.

A Brief Tour of Cadence

Cadence is a workflow automation system developed by Uber. It shares many features with other workflow automation systems but differs by being uniquely fault-oblivious rather than merely fault-tolerant. The approach adopted by Cadence simplifies greatly the work of developers who are relieved of many of the burdens of coordinating activities and recovering from system or service failure.

Cadence is complex, but three concepts are core to understanding it:

  • The Cadence Service itself,
  • Workflow Workers and
  • Activity Workers

The Cadence service, backed by a persistent data-store such as Cassandra or MySQL, is responsible for orchestrating the activities of both types of worker, for maintaining history, and, in the case of failure, for recovering the state of all workflows (but not activities).

Conceptually, the Cadence service instructs a Workflow Worker to execute a Workflow function. The Workflow function, which implements business logic, is guaranteed by Cadence to be durable. That is, its state, including its thread stack and thread-local variables, is known to and stored by Cadence and, in the case of failure, is restored.

Workflows, like the business processes they typically model, may be long-running. It’s not unusual for a real-world business process to take days or even months to complete, and Cadence provides excellent facilities to support such long-running processes within workflow functions. Therefore, the durability of the workflow functions (with the guaranteed recovery of their states across failures) enables a simple straight-line view of the business logic. This greatly reduces the complexity of the development process by reducing the burden on the developer to anticipate and mitigate all failure modes.

In order to be able to guarantee durability across failures, Cadence places a number of restrictions on the code in Workflow functions. The code must be deterministic, i.e. executing the code must produce the same result no matter how often it is run. Therefore, certain actions are forbidden within workflow code, for example: interacting directly with external services, getting the time, getting random values, and creating or suspending threads.

These types of action are fundamentally non-deterministic and would make full recovery of the workflow state impossible. However, the Cadence API provides alternatives for some of them that behave deterministically and so preserve the recoverability of the function’s local variables, threads and state.
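To make the determinism requirement a little more concrete, here is a minimal sketch in plain Java of one way to picture it. It deliberately does not use the Cadence client: ReplayContext, sideEffect and workflow are hypothetical names invented for this illustration, and the sketch assumes that recovery works by re-running the workflow function against a recorded history. Under that assumption, any non-deterministic value has to be captured the first time it is produced and handed back unchanged on every later run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ReplaySketch {

    /** Hypothetical stand-in for a durable history; not part of the Cadence API. */
    static final class ReplayContext {
        private final List<Object> history; // recorded results, in call order
        private int cursor = 0;             // index of the next result to replay

        ReplayContext(List<Object> history) {
            this.history = history;
        }

        /**
         * Runs a non-deterministic computation once and records its result.
         * On replay the recorded result is returned and the supplier is not invoked.
         */
        @SuppressWarnings("unchecked")
        <T> T sideEffect(Supplier<T> computation) {
            if (cursor < history.size()) {
                return (T) history.get(cursor++); // replay path
            }
            T result = computation.get();         // first execution
            history.add(result);
            cursor++;
            return result;
        }
    }

    /** Workflow logic: repeatable because every non-deterministic value goes through the context. */
    static String workflow(ReplayContext ctx) {
        long startedAt = ctx.<Long>sideEffect(System::nanoTime);
        double discount = ctx.<Double>sideEffect(Math::random);
        return startedAt + ":" + discount;
    }

    public static void main(String[] args) {
        List<Object> history = new ArrayList<>();
        String first = workflow(new ReplayContext(history));    // records two results
        String replayed = workflow(new ReplayContext(history)); // simulated recovery
        System.out.println(first.equals(replayed));             // prints "true"
    }
}
```

Had workflow called System.nanoTime() or Math.random() directly, the second run would have produced a different string, which is the kind of divergence the restrictions are there to prevent.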

For situations requiring interaction with external services (the outside world), Cadence insists that all communication be conducted through Activities, using Activity Workers. Activities are not subject to the determinism requirement that Cadence imposes on Workflows. Essentially anything is allowed in activities, and any clean-up after failure becomes the responsibility of the developer rather than the Cadence service.

Conceptually (but not precisely), a Workflow Worker will start an Activity Worker (or multiple Activity Workers) to interact with the outside world. Examples of an Activity might be interacting with a web-service, getting or saving a record to a database, or awaiting human input, such as a decision. Cadence offers no guarantees about activity state, and that state is not recovered in the case of failures of the Cadence infrastructure, i.e. within the Cadence service itself.

A running workflow can be controlled, or its state affected, by signalling it using events delivered by Cadence.

Cadence & Clojure Challenges

The signature of the worker registration function is registerWorkflowImplementationTypes(java.lang.Class<?>... workflowImplementationClasses) and in the documentation there is the note:

The reason for registration accepting workflow class, but not the workflow instance is that workflows are stateful and a new instance is created for each workflow execution.

What’s not noted, but implied, is that the classes must have zero-arg constructors. This is problematic for Clojure, as the fields declared in deftype produce one constructor that takes exactly that number of arguments.

You might then consider inheriting from the deftype-d class to work around the zero-arg constructor issue, leaving a cleaner, more Clojure-esque result.

However, although deftype can create a Java class with the fields you need, by default these fields are immutable; you could use :volatile-mutable to make them settable. Unfortunately, the bigger problem is that the generated class is public final, which effectively eliminates the possibility of using it as a base class in gen-class.

This might have been helpful: we could define a zero-arg constructor in gen-class, use the :constructors map to map that constructor to the base class constructor, and then assign default values to the fields in the :init function. The fact that the deftype-d class is final eliminates that approach.
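The constructor problem can be seen from the Java side with a small, self-contained sketch. It assumes (as the documentation note implies, but I have not confirmed from the Cadence source) that the worker creates each workflow instance reflectively through a zero-arg constructor. The class names are hypothetical: OneFieldWorkflow is hand-written Java shaped like what a deftype with one field is described as producing above, and instantiable is only an illustration, not the Cadence API.

```java
import java.lang.reflect.Constructor;

public class ZeroArgSketch {

    /** Shaped like a type with one declared field: its only constructor takes that field. */
    public static final class OneFieldWorkflow {
        private final Object greeting;

        public OneFieldWorkflow(Object greeting) {
            this.greeting = greeting;
        }
    }

    /** A class that can be created without arguments. */
    public static final class NoArgWorkflow {
        public NoArgWorkflow() {
        }
    }

    /** Returns true if the class can be instantiated through a public zero-arg constructor. */
    static boolean instantiable(Class<?> type) {
        try {
            Constructor<?> ctor = type.getConstructor(); // looks up the zero-arg constructor
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false; // e.g. NoSuchMethodException when no such constructor exists
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiable(NoArgWorkflow.class));    // prints "true"
        System.out.println(instantiable(OneFieldWorkflow.class)); // prints "false"
    }
}
```

Any framework that is handed only a Class and has to build a fresh instance per execution hits the second case as soon as the class declares state through its constructor.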

Working Cadence & Clojure Code

In order to fully investigate using Clojure with Cadence I developed a small set of demos to demonstrate what works, work around what doesn’t, and exercise the result. Very little consideration was given to making the code more idiomatic, at least from a Clojure perspective, or even particularly effective. I am only making the repository available as it may prove helpful to others who would like to use Clojure with Cadence.

The repository also contains further notes on the implementation and lessons learnt.

What’s Next?

As time allows I’ll probably return to the code, making it more idiomatic. But do let me know if you find it helpful, or share your suggestions for improvement.


Kieran Owens
CTO of Timpson Gray

Experienced Technology Leader with a particular interest in the use of functional languages for building accounting systems.
