Friday, April 29, 2016

Excel calculations in F# - The Code, Part I

We’ve gone through the basics now — we want to do calculations in Excel, and we’re going to use ExcelDNA and F# to do it. The question now, is how.

I mentioned last time that our function must be able to deal with expressions that are already valid in Excel. If I write

=Calculate(expression)

and Excel knows what that expression is (i.e. give us all the intellisense and highlighting and ability to move boxes around to select cells), then we should be able to figure out what to do.

I decomposed this problem into three separate steps, and the top-level F# function reads as you see below. This, along with all the other code, can be found on Bitbucket.

let Calculate expr handle = 
    expr
    |> GetFormulaFromCell
    |> CalcFormulaFromText
    |> GetReturnValue handle

Overall, we can see how the functional-first style of F# leads us to:

  • prefer function composition
    • here we use the forward pipe operator |> which allows us to put the final argument to a function after the function rather than before, and is defined as let (|>) x f = f x
  • omit type declarations
    • there isn’t a single type declaration anywhere in my function, but hovering over it tells me the signature is 'a -> string -> obj, meaning a curried function that takes a value of any generic type, and a string, and returns an object
  • work naturally with Excel’s model.
    • by returning obj we are giving ourselves the capability to product a result of any type and then downcast it to be displayed by Excel, and by accepting a generic input we can handle text and references as well as raw numbers.

Ignoring the ‘handle’ variable for a second, we’ll briefly look at what each step entails:


GetFormulaFromCell

More composition and pipelining!

let GetFormulaFromCell expr = 
    expr
    |> GetCellText
    |> GetFormulaFromText

I hope that the general idea of what this function is trying to do is clear. It’s trying to get the actual formula that we want to calculate from the calling cell, and it has to do this in a slightly roundabout way — first, it gets the raw text of the calling cell, and then it gets the formula part of the cell.

Why do it like this? Unfortunately, Excel is too good sometimes. If I pass in =Calculate(1+1), then Excel will first do it’s own internal calculation of 1+1, and then call my code with the result: =Calculate(2). Not much use when you’re trying to take the thing in brackets and calculate it yourself!

To get around this, we have to go deep into Interop territory. Getting the raw cell text from the calling site is actually fairly painless once you’ve done it — we can get the reference to the caller, the application, and the active worksheet using ExcelDNA:

let callerRef = XlCall.Excel(XlCall.xlfCaller) :?> ExcelReference
let application = ExcelDnaUtil.Application :?> Application
let sheet = application.ActiveWorkbook.ActiveSheet :?> Worksheet

From this point we can get the formula:

let range = sheet.Cells.[callerRef.rowFirst+1, callerRef.columnFirst+1] :?> Range
let cellText = range.Formula.ToString()

As always I’ll ‘say what I see’ with regard to the code snippets and F# style:

  • The general way in which we interact with Excel is clean, and similar to the C# equivalents
  • Casting feels different: the dynamic downcast operator :?> appears all over the place, which is a byproduct of the lack of the dynamic keyword that is used in the equivalent C# code.
  • Although I haven’t shown it yet, dependency injection was painless. The way I abstracted this section of Excel-dependent code using a discriminated union was such a pleasant contrast to the interface-oriented way in C#

We know have the raw formula text of the cell, in this case =Calculate(expression). The second step allows us to get expression out of this, and looks pretty simple (getting the startOfSubstring and endOfSubstring values is also easy):

let GetFormulaFromText(text : string) = 
    text.Substring(startOfSubstring, text.Length - endOfSubstring text)

Now, at least, we have our expression!


CalcFormulaFromText

This is where the meat and bones of the work is done. I’m going to leave you hanging on this one: it’s sufficiently detailed and fun to merit its own post, which will follow this one. I will say now that it encompasses most of the topic we’ve already covered in F#, and a couple more such as active patterns and a little-known Dijkstra algorithm…


GetReturnValue

Once we’ve decided what the result of the calculation is, putting it back on the spreadsheet still requires a bit of work.

At this point I’ll return to the handle variable that I ignored earlier. Looking at the code now, it might be clearer what it’s purpose is: it’s simply a container for our result in the case that it refers to a complex object which we are unable to show in a cell.

type Result = 
    | ValResult of float
    | HandleResult of string
    | ValListResult of float list

let GetReturnValue handle result  = 
    match result with
    | ValResult v -> v :> obj
    | HandleResult h -> h :> obj
    | ValListResult vs -> Handler.StoreObject(handle, new InternalCaller(), vs) :> obj

This shows a number of features that we’ve already seen, but are worth repeating:

  • The combination of discriminated unions and pattern matching is formidable. Here, it gives us a hook to be able to deal with any result object that we want to have, and give it back to Excel
    • For example, we could add a branch that dealt with returning 2D arrays
    • Alternatively, as we do here, we store the result in memory using a simple object handler that has a backing Dictionary<string, object>.
  • We’ve had to cast to obj again, as we did when interactive with Excel earlier in the post. This is part of F#’s type system: a pattern matching statement must return the same type in each case, and we need to explicitly say that any old obj will do.

Next Time

We get to do some geeky stuff with parsers and algorithms and stuff.

The work so far has given us a pretty generic blueprint: we need to write a function that takes in a generic expression and returns a Result, which is a type over which we have the ability to add more cases.

Sunday, April 17, 2016

Excel calculations in F# - Tools and Requirements

So we want to be able to do better vector arithmetic in Excel.

As this is primarily about doing something in F#, that means we need some way of interfacing with it from .NET code. Enter Excel-DNA. For those unfamiliar, this is a fantastic open-source tool that let’s you create Excel add-ins, and I highly recommend reading their blurb.

Getting it up and running really is as simple as it suggests, which leaves us plenty of time to think about our requirements. At the core, we want to be able to write the following in the Excel formula bar

=Calculate(expression)

and have it correctly evaluate the expression in the following cases:

  • a constant, e.g. 1;
  • a standard arithmetic expressions e.g. 1+2;
  • a reference to another cell e.g. A1 or $BQ$192;
  • a vector e.g. {1;2;3};

and any combination of the cases above, the example in the previous post being =B2+C2 where B2={1;2;3} and C2={4;5;6}.

This elicits the first hard requirement: we’re going to need direct access to the raw text of the calling cell, and any cells that it references.

We can do this at a pretty low level using the Microsoft.Office.Interop.Excel namespace, which gives us direct access to things like workbooks, worksheets, cell ranges and cell formulas. Alternatively, we can harness Excel-DNA’s Integration package to perform a variety of tasks using the XlCall class. In a subsequent post I’ll show F# snippets using both techniques.

The second requirement should be fairly obvious: we need to be able to correctly calculate arithmetic expressions. Thankfully this is a well-trodden path, and involves the following:

  • Parse the incoming text and make sure it’s a valid expression (for example, no mismatched brackets or odd characters);
  • Convert the expression into a format from which it can be calculated (for example, to Reverse Polish Notation);
  • Evaluate the expression.

We’ll say much more about each of these steps from the perspective of F# and functional programming in the future, but suffice to say there’s no breakthrough in Computer Science with the above routine.

Finally, I’m going to introduce a requirement borne out of greed. If we can handle a reference to a vector, surely we can try and handle a reference to anything? From a code perspective, this is equivalent to saying something that fits a given function signature in F# (or implements a given interface in an OO language).

Expanding on this, what if we could:

  • store any .NET object via a user-defined function in Excel;
  • construct a method signature that allows evaluation of an expression involving such objects;
  • store the result of our computations in another .NET object; and
  • retrieve the results in a polymorphic fashion (constant, vector, matrix, …) with another user-defined function.

Much of this is helped by wrapping the Excel Object Handler, but it still sounds like a lofty goal.

In the next post, I’m going to start showing exactly how this is done.

Saturday, April 16, 2016

Excel calculations in F# - The Problem

I’ve been playing around with the more esoteric/academic features of functional programming for a couple of months now, so I think it’s about time to get some hands-on experience.

To do this, I set myself the task of choosing a mini-project to complete in F# — one that might have some real-world use!

I ended up settling on implementing a simple Excel expression parser. In this post I’ll introduce the problem that I want it to solve.


The Problem

I’m guessing you’re familiar with writing expressions in Excel. The program comes with a little formula bar that allows you to write expressions of varying complexity, precede it with the = key, and have it evaluated for you. Excel allows you to do all sorts of things here, but I’m going to focus on a bit of a gap in it’s functionality: vectors.

We begin with a simple question: how do we add two vectors of numbers in Excel? The concept of array formulas already exists, and indeed they get us to a solution pretty quickly. To add {1;2;3} to {4;5;6} we can do this:

Simple vector addition

So far this seems very straightforward. Referring to each vector in its entirety means we can do basic arithmetic on the vectors — addition, subtraction, multiplication, even exponentiation.

Let’s push the program a bit at this stage. Excel is used for a lot of financial modelling, and spreadsheets get very complicated very quickly in these applications. What if, instead of creating each vector explicitly and referring to it, we create it in a single cell as a one-dimensional constant.

If this sounds contrived, imagine that these are lists of 10,000 stock prices that we’re trying to analyse and that we’re getting them from a third-party source. Excel is incredibly resource-hungry — if we take these 10,000 numbers and paste them into 10,000 cells the resulting memory allocation from all the cell formatting etc. is huge compared to putting them all in one cell. Extend this over hundreds of rows in dozens of tabs and we might have a problem.

So, we want to store a vector in a single cell, another vector in another single cell, and add them together. Let’s see what Computer says:

Using a one-dimensional constant

No

This surprised me when I first saw it. After all, if you read the link above on one-dimensional constants, there are plenty of examples where they add and multiply these kind of things together. Even more strange is that we can do exactly what we want by putting both vectors directly into the third expression. To see what I mean, look at the picture below:

With formula expressions

I hope this makes it clear what problem I want to solve.

I want a generic way of being able to refer to array constants, do arithmetic on them, and have the result be another array constant that I can use.

This, I hope, will strike a chord with the hordes of finance professionals writing huge, slow Excel spreadsheets and wishing they could make them faster and more compact.


The Solution

Before I outline a proposed solution, if anyone reading this knows of a way to do what I'm trying to do already, please say so!

Not being an Excel developer, I’m going to stop short of implementing anything that messes with current Excel syntax. That is, I’m not going to make the exact formula you see in the picture above work.

Instead, I’m going to write a new function, let’s call it Calculate, that will do the job for us. More specifically, if we write =Calculate(A1 * A2 ^ A3 + A4) where A1, A2, A3 and A4 are all vectors of numbers, the result will be what we expect from following standard.


Next Time

I’ll introduce the tools that I’ve decided to use to write my Calculate function,.