Convert an Excel Workbook to Clojure Code
An approach to restating Excel formulae and cell dependencies as a collection of connected Clojure expressions that can be evaluated in the context of the values in the Workbook, or in the context of values supplied by Clojure.
Introduction
Developing software in the Accounting & Finance industry has some peculiar considerations that probably don’t exist in other industries. An active and ambitious engineering team’s incumbent competition is often not some big, legacy, bespoke system that needs to be replaced or adjusted, while incorporating new features and capabilities: but is often an Excel workbook (or even a number of connected workbooks) where business logic, rules and calculations are captured, and which continuously evolves in an almost organic way.
As new needs are identified by the business, an analyst or accountant will fire up Excel, pruning and grafting as necessary, and quickly produce a new version incorporating the business' new requirements.
It’s a low-friction and efficient approach. The user with the best insight and information works in a tool with which he or she is completely comfortable. That user knows the inputs, knows the calculations, and knows the desired outcomes. What could be better?
It’s a steep hill for an engineering team to run up to convince anyone that what should properly happen is that the financial-analyst should involve a business-analyst to document the business' requirements, get sign-offs from various stake-holders, schedule some time with the product and engineering teams, who will at some undetermined time in the future produce an artifact that will, most likely, be outdated on the very day on which it arrives.
Yes, that’s a very steep hill!
However, there are a number of very sound reasons to approach any maturation or evolution of critical business systems in this precise manner.
The legal and compliance teams get a little “twitchy” when calculations and data that drive critical business decisions live exclusively in constantly evolving Excel workbooks. How do they verify correctness? How can they systematically evaluate the impact of any changes on business operations? How do they even know that the ex ante and ex post outcomes driven by calculations in Excel workbooks differ, and by how much?
These are difficult questions to answer and make it hard to evaluate operational risk, often requiring the business to engage in complex hoop-jumping in order to avoid the outcome of crippling innovation and evolution, yet remaining committed to the concepts of sound operational governance. The approaches are many and varied - version control; manual and automated validation; automated forecasting and hindcasting with known inputs and expected outputs. The list goes on.
A better approach might be to let the analysts and “business-heads” just go about their business as efficiently as possible, continuing to use Excel as the modelling tool, and only when they are satisfied involve the software developers.
If, at that point, it were possible to convert to workbook to source code then the business would be in a better position to assess the impacts of any changes, whether they be in the mode of operation (i.e. calculations) or outcomes (i.e. the results of those calculations).
A Test Workbook
Let’s construct a relatively simple Excel workbook that records the scores attained by four individuals on two teams, calculates the totals for each team, and indicates which team won based on which team had the highest score.
The Workbook uses only the Excel functions SUMIF
, MAX
and IF
, but the approach would work in concept for any workbook, and is useful for illustrative purposes
Excel as a Graph
An Excel workbook can be expressed as a DAG (directed acyclic graph) where each node is connected to the node(s) on which it depends and that depend on it. Even a node without dependent nodes (e.g. a simple =1+1
) can be incorporated in the DAG by connecting it to a synthetic node ($$ROOT
) at the root of its worksheet, and that node connected to another singleton node representing the root of the workbook ($$ROOT
).
If we process the workbook using the code in the accompanying repository, we can produce a visualization of the DAG as follows:
(-> "SIMPLE-1.xlsx"
(explain-workbook)
(get-cell-dependencies)
(add-graph)
(connect-disconnected-regions)
(:graph)
(uber/viz-graph))
We can see from the DAG produced by the code, that the value of cell B8
(reading left to right) depends on the values of cells C4
, C5
, C3
, B4
, B2
, B5
, C2
, B3
, and A8
, and that its value is also a dependency of cells C8
and C9
. This is consistent with the precedents and dependents information that Excel reports, shown in Fig 1.
We can also produce a visualization of the DAG that includes each node’s attributes by using
(-> "SIMPLE-1.xlsx"
(explain-workbook)
(get-cell-dependencies)
(add-graph)
(connect-disconnected-regions)
(:graph)
(uber/viz-graph :auto-label true))
which will display
The graph, therefore, encodes all the information of the Excel workbook as a convenient Clojure data structure, which can be evaluated or manipulated as required.
A cell’s node attributes contains a :value
key storing the cell’s value as calculated by Excel, and if it’s a formula cell, the text of the formula in the :formula
key.
So far, so good. Now, the next step is to convert the Excel formulae to Clojure code, and to provide a way to evaluate that code in the proper order, as dictated by the DAG.
The code below can be found in a Github repository, which contains significantly more detail about the approach taken and the implementation than can be conveniently covered in this post.
The Excel Graph as Code
Using the data-structure calculated in this way, we can then convert each cell’s Excel formula to Clojure code and recalculate the workbook in our Clojure REPL according to the DAG: first using the values as supplied in the workbook in order to validate that the conversion results in the same calculated values; and then using other input values to test alternative input scenarios and the workbook’s responses to those inputs.
This provides a convenient way to both serialize the workbook as Clojure code, and to evaluate the workbook’s responses to different inputs.
If we run the following from a REPL,
(-> "SIMPLE-1.xlsx"
(explain-workbook)
(get-cell-dependencies)
(add-graph)
(connect-disconnected-regions)
;; recalculate the workbook according to its dependencies,
;; but using clojure code to perform the calculations
(recalc-workbook "Scores")
(simplify-results))
it produces the following output showing the Clojure code generated from the Excel formula (in :clj-code
); the value as calculated by Excel (:excel-value
); and, the value produced by the evaluated Clojure code (:clj-value
).
[{:cell "Scores!B9"
:formula "SUMIF(B$2:B$5,A9,C$2:C$5)"
:clj-code (functions/fn-sumif
(eval-range
"Scores!B$2:B$5")
(eval-range
"Scores!A9")
(eval-range
"Scores!C$2:C$5"))
:excel-value 230.0
:clj-value 230.0}
{:cell "Scores!B8"
:formula "SUMIF(B$2:B$5,A8,C$2:C$5)"
:clj-code (functions/fn-sumif
(eval-range
"Scores!B$2:B$5")
(eval-range
"Scores!A8")
(eval-range
"Scores!C$2:C$5"))
:excel-value 220.0
:clj-value 220.0}
{:cell "Scores!C9"
:formula "IF(B9=MAX(B$8:B$9),\"WINNER\", \"\")"
:clj-code (if
(functions/fn-equal
(eval-range
"Scores!B9")
(functions/fn-max
(eval-range
"Scores!B$8:B$9")))
(str
"WINNER")
(str
""))
:excel-value "WINNER"
:clj-value "WINNER"}
{:cell "Scores!C8"
:formula "IF(B8=MAX(B$8:B$9),\"WINNER\", \"\")"
:clj-code (if
(functions/fn-equal
(eval-range
"Scores!B8")
(functions/fn-max
(eval-range
"Scores!B$8:B$9")))
(str
"WINNER")
(str
""))
:excel-value ""
:clj-value ""}]
You can see that the values calculated by evaluating the workbook in the REPL are precisely equal to the results calculated by Excel itself.
Also, by manipulating the underlying data-structure returned by explain-workbook
it is possible to test the responses of the evaluation to different inputs. For example, it is possible, using Clojure, to update the scores for each player and then to recalculate the workbook using those values.
Conclusion
The above demonstrates the feasibility of using Clojure’s inherent ability to manipulate code data-structures to convert an Excel workbook to Clojure code, and to provide a sound rationale for allowing analysts and accountants to continue to use the tools with which they are most familiar, and to then subsequently take the fruits of their labors and convert them to a form that’s systematically comparable, versionable and testable.