Friday, April 29, 2016

Excel calculations in F# - The Code, Part I

We’ve gone through the basics now — we want to do calculations in Excel, and we’re going to use ExcelDNA and F# to do it. The question now, is how.

I mentioned last time that our function must be able to deal with expressions that are already valid in Excel. If I write

=Calculate(expression)

and Excel knows what that expression is (i.e. give us all the intellisense and highlighting and ability to move boxes around to select cells), then we should be able to figure out what to do.

I decomposed this problem into three separate steps, and the top-level F# function reads as you see below. This, along with all the other code, can be found on Bitbucket.

let Calculate expr handle = 
    expr
    |> GetFormulaFromCell
    |> CalcFormulaFromText
    |> GetReturnValue handle

Overall, we can see how the functional-first style of F# leads us to:

  • prefer function composition
    • here we use the forward pipe operator |> which allows us to put the final argument to a function after the function rather than before, and is defined as let (|>) x f = f x
  • omit type declarations
    • there isn’t a single type declaration anywhere in my function, but hovering over it tells me the signature is 'a -> string -> obj, meaning a curried function that takes a value of any generic type, and a string, and returns an object
  • work naturally with Excel’s model.
    • by returning obj we are giving ourselves the capability to product a result of any type and then downcast it to be displayed by Excel, and by accepting a generic input we can handle text and references as well as raw numbers.

Ignoring the ‘handle’ variable for a second, we’ll briefly look at what each step entails:


GetFormulaFromCell

More composition and pipelining!

let GetFormulaFromCell expr = 
    expr
    |> GetCellText
    |> GetFormulaFromText

I hope that the general idea of what this function is trying to do is clear. It’s trying to get the actual formula that we want to calculate from the calling cell, and it has to do this in a slightly roundabout way — first, it gets the raw text of the calling cell, and then it gets the formula part of the cell.

Why do it like this? Unfortunately, Excel is too good sometimes. If I pass in =Calculate(1+1), then Excel will first do it’s own internal calculation of 1+1, and then call my code with the result: =Calculate(2). Not much use when you’re trying to take the thing in brackets and calculate it yourself!

To get around this, we have to go deep into Interop territory. Getting the raw cell text from the calling site is actually fairly painless once you’ve done it — we can get the reference to the caller, the application, and the active worksheet using ExcelDNA:

let callerRef = XlCall.Excel(XlCall.xlfCaller) :?> ExcelReference
let application = ExcelDnaUtil.Application :?> Application
let sheet = application.ActiveWorkbook.ActiveSheet :?> Worksheet

From this point we can get the formula:

let range = sheet.Cells.[callerRef.rowFirst+1, callerRef.columnFirst+1] :?> Range
let cellText = range.Formula.ToString()

As always I’ll ‘say what I see’ with regard to the code snippets and F# style:

  • The general way in which we interact with Excel is clean, and similar to the C# equivalents
  • Casting feels different: the dynamic downcast operator :?> appears all over the place, which is a byproduct of the lack of the dynamic keyword that is used in the equivalent C# code.
  • Although I haven’t shown it yet, dependency injection was painless. The way I abstracted this section of Excel-dependent code using a discriminated union was such a pleasant contrast to the interface-oriented way in C#

We know have the raw formula text of the cell, in this case =Calculate(expression). The second step allows us to get expression out of this, and looks pretty simple (getting the startOfSubstring and endOfSubstring values is also easy):

let GetFormulaFromText(text : string) = 
    text.Substring(startOfSubstring, text.Length - endOfSubstring text)

Now, at least, we have our expression!


CalcFormulaFromText

This is where the meat and bones of the work is done. I’m going to leave you hanging on this one: it’s sufficiently detailed and fun to merit its own post, which will follow this one. I will say now that it encompasses most of the topic we’ve already covered in F#, and a couple more such as active patterns and a little-known Dijkstra algorithm…


GetReturnValue

Once we’ve decided what the result of the calculation is, putting it back on the spreadsheet still requires a bit of work.

At this point I’ll return to the handle variable that I ignored earlier. Looking at the code now, it might be clearer what it’s purpose is: it’s simply a container for our result in the case that it refers to a complex object which we are unable to show in a cell.

type Result = 
    | ValResult of float
    | HandleResult of string
    | ValListResult of float list

let GetReturnValue handle result  = 
    match result with
    | ValResult v -> v :> obj
    | HandleResult h -> h :> obj
    | ValListResult vs -> Handler.StoreObject(handle, new InternalCaller(), vs) :> obj

This shows a number of features that we’ve already seen, but are worth repeating:

  • The combination of discriminated unions and pattern matching is formidable. Here, it gives us a hook to be able to deal with any result object that we want to have, and give it back to Excel
    • For example, we could add a branch that dealt with returning 2D arrays
    • Alternatively, as we do here, we store the result in memory using a simple object handler that has a backing Dictionary<string, object>.
  • We’ve had to cast to obj again, as we did when interactive with Excel earlier in the post. This is part of F#’s type system: a pattern matching statement must return the same type in each case, and we need to explicitly say that any old obj will do.

Next Time

We get to do some geeky stuff with parsers and algorithms and stuff.

The work so far has given us a pretty generic blueprint: we need to write a function that takes in a generic expression and returns a Result, which is a type over which we have the ability to add more cases.

No comments:

Post a Comment