Friday, April 29, 2016

Excel calculations in F# - The Code, Part I

We’ve gone through the basics now — we want to do calculations in Excel, and we’re going to use ExcelDNA and F# to do it. The question now, is how.

I mentioned last time that our function must be able to deal with expressions that are already valid in Excel. If I write

=Calculate(expression)

and Excel knows what that expression is (i.e. give us all the intellisense and highlighting and ability to move boxes around to select cells), then we should be able to figure out what to do.

I decomposed this problem into three separate steps, and the top-level F# function reads as you see below. This, along with all the other code, can be found on Bitbucket.

let Calculate expr handle = 
    expr
    |> GetFormulaFromCell
    |> CalcFormulaFromText
    |> GetReturnValue handle

Overall, we can see how the functional-first style of F# leads us to:

  • prefer function composition
    • here we use the forward pipe operator |> which allows us to put the final argument to a function after the function rather than before, and is defined as let (|>) x f = f x
  • omit type declarations
    • there isn’t a single type declaration anywhere in my function, but hovering over it tells me the signature is 'a -> string -> obj, meaning a curried function that takes a value of any generic type, and a string, and returns an object
  • work naturally with Excel’s model.
    • by returning obj we are giving ourselves the capability to product a result of any type and then downcast it to be displayed by Excel, and by accepting a generic input we can handle text and references as well as raw numbers.

Ignoring the ‘handle’ variable for a second, we’ll briefly look at what each step entails:


GetFormulaFromCell

More composition and pipelining!

let GetFormulaFromCell expr = 
    expr
    |> GetCellText
    |> GetFormulaFromText

I hope that the general idea of what this function is trying to do is clear. It’s trying to get the actual formula that we want to calculate from the calling cell, and it has to do this in a slightly roundabout way — first, it gets the raw text of the calling cell, and then it gets the formula part of the cell.

Why do it like this? Unfortunately, Excel is too good sometimes. If I pass in =Calculate(1+1), then Excel will first do it’s own internal calculation of 1+1, and then call my code with the result: =Calculate(2). Not much use when you’re trying to take the thing in brackets and calculate it yourself!

To get around this, we have to go deep into Interop territory. Getting the raw cell text from the calling site is actually fairly painless once you’ve done it — we can get the reference to the caller, the application, and the active worksheet using ExcelDNA:

let callerRef = XlCall.Excel(XlCall.xlfCaller) :?> ExcelReference
let application = ExcelDnaUtil.Application :?> Application
let sheet = application.ActiveWorkbook.ActiveSheet :?> Worksheet

From this point we can get the formula:

let range = sheet.Cells.[callerRef.rowFirst+1, callerRef.columnFirst+1] :?> Range
let cellText = range.Formula.ToString()

As always I’ll ‘say what I see’ with regard to the code snippets and F# style:

  • The general way in which we interact with Excel is clean, and similar to the C# equivalents
  • Casting feels different: the dynamic downcast operator :?> appears all over the place, which is a byproduct of the lack of the dynamic keyword that is used in the equivalent C# code.
  • Although I haven’t shown it yet, dependency injection was painless. The way I abstracted this section of Excel-dependent code using a discriminated union was such a pleasant contrast to the interface-oriented way in C#

We know have the raw formula text of the cell, in this case =Calculate(expression). The second step allows us to get expression out of this, and looks pretty simple (getting the startOfSubstring and endOfSubstring values is also easy):

let GetFormulaFromText(text : string) = 
    text.Substring(startOfSubstring, text.Length - endOfSubstring text)

Now, at least, we have our expression!


CalcFormulaFromText

This is where the meat and bones of the work is done. I’m going to leave you hanging on this one: it’s sufficiently detailed and fun to merit its own post, which will follow this one. I will say now that it encompasses most of the topic we’ve already covered in F#, and a couple more such as active patterns and a little-known Dijkstra algorithm…


GetReturnValue

Once we’ve decided what the result of the calculation is, putting it back on the spreadsheet still requires a bit of work.

At this point I’ll return to the handle variable that I ignored earlier. Looking at the code now, it might be clearer what it’s purpose is: it’s simply a container for our result in the case that it refers to a complex object which we are unable to show in a cell.

type Result = 
    | ValResult of float
    | HandleResult of string
    | ValListResult of float list

let GetReturnValue handle result  = 
    match result with
    | ValResult v -> v :> obj
    | HandleResult h -> h :> obj
    | ValListResult vs -> Handler.StoreObject(handle, new InternalCaller(), vs) :> obj

This shows a number of features that we’ve already seen, but are worth repeating:

  • The combination of discriminated unions and pattern matching is formidable. Here, it gives us a hook to be able to deal with any result object that we want to have, and give it back to Excel
    • For example, we could add a branch that dealt with returning 2D arrays
    • Alternatively, as we do here, we store the result in memory using a simple object handler that has a backing Dictionary<string, object>.
  • We’ve had to cast to obj again, as we did when interactive with Excel earlier in the post. This is part of F#’s type system: a pattern matching statement must return the same type in each case, and we need to explicitly say that any old obj will do.

Next Time

We get to do some geeky stuff with parsers and algorithms and stuff.

The work so far has given us a pretty generic blueprint: we need to write a function that takes in a generic expression and returns a Result, which is a type over which we have the ability to add more cases.

Sunday, April 17, 2016

Excel calculations in F# - Tools and Requirements

So we want to be able to do better vector arithmetic in Excel.

As this is primarily about doing something in F#, that means we need some way of interfacing with it from .NET code. Enter Excel-DNA. For those unfamiliar, this is a fantastic open-source tool that let’s you create Excel add-ins, and I highly recommend reading their blurb.

Getting it up and running really is as simple as it suggests, which leaves us plenty of time to think about our requirements. At the core, we want to be able to write the following in the Excel formula bar

=Calculate(expression)

and have it correctly evaluate the expression in the following cases:

  • a constant, e.g. 1;
  • a standard arithmetic expressions e.g. 1+2;
  • a reference to another cell e.g. A1 or $BQ$192;
  • a vector e.g. {1;2;3};

and any combination of the cases above, the example in the previous post being =B2+C2 where B2={1;2;3} and C2={4;5;6}.

This elicits the first hard requirement: we’re going to need direct access to the raw text of the calling cell, and any cells that it references.

We can do this at a pretty low level using the Microsoft.Office.Interop.Excel namespace, which gives us direct access to things like workbooks, worksheets, cell ranges and cell formulas. Alternatively, we can harness Excel-DNA’s Integration package to perform a variety of tasks using the XlCall class. In a subsequent post I’ll show F# snippets using both techniques.

The second requirement should be fairly obvious: we need to be able to correctly calculate arithmetic expressions. Thankfully this is a well-trodden path, and involves the following:

  • Parse the incoming text and make sure it’s a valid expression (for example, no mismatched brackets or odd characters);
  • Convert the expression into a format from which it can be calculated (for example, to Reverse Polish Notation);
  • Evaluate the expression.

We’ll say much more about each of these steps from the perspective of F# and functional programming in the future, but suffice to say there’s no breakthrough in Computer Science with the above routine.

Finally, I’m going to introduce a requirement borne out of greed. If we can handle a reference to a vector, surely we can try and handle a reference to anything? From a code perspective, this is equivalent to saying something that fits a given function signature in F# (or implements a given interface in an OO language).

Expanding on this, what if we could:

  • store any .NET object via a user-defined function in Excel;
  • construct a method signature that allows evaluation of an expression involving such objects;
  • store the result of our computations in another .NET object; and
  • retrieve the results in a polymorphic fashion (constant, vector, matrix, …) with another user-defined function.

Much of this is helped by wrapping the Excel Object Handler, but it still sounds like a lofty goal.

In the next post, I’m going to start showing exactly how this is done.

Saturday, April 16, 2016

Excel calculations in F# - The Problem

I’ve been playing around with the more esoteric/academic features of functional programming for a couple of months now, so I think it’s about time to get some hands-on experience.

To do this, I set myself the task of choosing a mini-project to complete in F# — one that might have some real-world use!

I ended up settling on implementing a simple Excel expression parser. In this post I’ll introduce the problem that I want it to solve.


The Problem

I’m guessing you’re familiar with writing expressions in Excel. The program comes with a little formula bar that allows you to write expressions of varying complexity, precede it with the = key, and have it evaluated for you. Excel allows you to do all sorts of things here, but I’m going to focus on a bit of a gap in it’s functionality: vectors.

We begin with a simple question: how do we add two vectors of numbers in Excel? The concept of array formulas already exists, and indeed they get us to a solution pretty quickly. To add {1;2;3} to {4;5;6} we can do this:

Simple vector addition

So far this seems very straightforward. Referring to each vector in its entirety means we can do basic arithmetic on the vectors — addition, subtraction, multiplication, even exponentiation.

Let’s push the program a bit at this stage. Excel is used for a lot of financial modelling, and spreadsheets get very complicated very quickly in these applications. What if, instead of creating each vector explicitly and referring to it, we create it in a single cell as a one-dimensional constant.

If this sounds contrived, imagine that these are lists of 10,000 stock prices that we’re trying to analyse and that we’re getting them from a third-party source. Excel is incredibly resource-hungry — if we take these 10,000 numbers and paste them into 10,000 cells the resulting memory allocation from all the cell formatting etc. is huge compared to putting them all in one cell. Extend this over hundreds of rows in dozens of tabs and we might have a problem.

So, we want to store a vector in a single cell, another vector in another single cell, and add them together. Let’s see what Computer says:

Using a one-dimensional constant

No

This surprised me when I first saw it. After all, if you read the link above on one-dimensional constants, there are plenty of examples where they add and multiply these kind of things together. Even more strange is that we can do exactly what we want by putting both vectors directly into the third expression. To see what I mean, look at the picture below:

With formula expressions

I hope this makes it clear what problem I want to solve.

I want a generic way of being able to refer to array constants, do arithmetic on them, and have the result be another array constant that I can use.

This, I hope, will strike a chord with the hordes of finance professionals writing huge, slow Excel spreadsheets and wishing they could make them faster and more compact.


The Solution

Before I outline a proposed solution, if anyone reading this knows of a way to do what I'm trying to do already, please say so!

Not being an Excel developer, I’m going to stop short of implementing anything that messes with current Excel syntax. That is, I’m not going to make the exact formula you see in the picture above work.

Instead, I’m going to write a new function, let’s call it Calculate, that will do the job for us. More specifically, if we write =Calculate(A1 * A2 ^ A3 + A4) where A1, A2, A3 and A4 are all vectors of numbers, the result will be what we expect from following standard.


Next Time

I’ll introduce the tools that I’ve decided to use to write my Calculate function,.

Tuesday, February 16, 2016

Modulo Cons

We’re back with the second week of the Programming Languages course run by the University of Washington.

Once more, here are links to the notes, videos, and exercises. My solutions are on GitHub.


Pattern Matching and Recursion

Doing this exercises drilled home a couple of ideas that we’ve seen before: using pattern matching makes code so much neater, record types are great for defining lightweight pseudo-classes, and thinking recursively is hard when you aren’t used to it!

The contents of the notes and video are actually extremely useful and I could write a pretty long-winded piece on them alone, but here I want to focus on what I got out of trying the exercises.


Tail Recursion Modulo Cons

This gem of a concept cropped up in the very first question that I tried, which was:

Write a function all_except_option, which takes a string and a string list. Return NONE if the
string is not in the list, else return SOME lst where lst is identical to the argument list except the string
is not in it. You may assume the string is in the list at most once.

It came down to a case of two helper functions. For now, let’s forget that we can loop through the list once to do all the processing, and do it in two steps: first, find out whether the supplied string is in the list (i.e. the Any method that takes a predicate); then, if it’s found we remove the item from the list.

To match this logic I wrote two helper functions. The first was fairly easy to make tail recursive:

let found s ss = 
  let rec found_impl sl = 
    match sl with
    | [] -> false
    | h::t -> if h = s then true else found_impl t
  in found_impl ss

The second I found much harder, and ended up fruitless:

let remove_when_found s ss = 
  let rec remove_when_found_impl sl = 
    match sl with
    | [] -> []
    | h::t -> if h = s then t else h :: remove_when_found_impl t 
  in remove_when_found_impl ss

After some Googling, I realised that this second function was tail recursive modulo cons. After quite a lot more Googling, it seemed like the presented solutions on the blogosphere were less than transparent, so I left the function vulnerable to a Stack Overflow Exception.

It seemed like such a small change between the goals of the functions at first: recurse over the same list, check the same predicate, but the difference that returning a bool vs a List makes is significant.

I will return to this example when I understand the concept a lot better, and hopefully be able to write an elegant tail recursive solution.


Thoughts

My first foray into using pattern matching was good fun. I’m sure that I did plenty of things wrong and will improve both style and efficiency over time, but for now I am happy getting used to the new syntax and way of thinking.

Recursion is something that I have yet to quite grasp in my head — perhaps it is because one uses it so frequently in production code, but perhaps I just need to endeavour.

Next time, week three, promising First-Class functions and Closures.

Tuesday, February 09, 2016

Practice Makes Perfect

In the previous few posts there’s been a lot of information, at least from my perspective. This knowledge alone won’t be enough to really get a feel for what it’s like to write code in F#, especially as most of the snippets you’ve seen have been adapted from the book I’ve been going through.

As such, I’ll be backing up what I’ve taken from the book by going through some exercises from the Programming Languages course run by the University of Washington. This is the same syllabus used on Coursera, and so all the lectures videos, notes and problems are free to access online.

The astute observer will note that the course uses ML, not F#. As it turns out this isn’t a huge problem — F# originated from ML and so whilst there are syntactic differences, the similarities prevail,

To get started, I went through the first week’s notes, watched the videos, and completed the exercises. I won’t list all of the solutions here (I’ve put them on GitHub along with a few ‘tests’ that I wrote). However, there were some things that cropped up again and again that I wanted to share.


The Power of Pattern Matching

I was true to the spirit of the exercises, which meant only using features that I had been exposed to in the preceding course notes. This essentially meant not using things like pattern matching, and definitely not using any loops, which left me slightly unsure of how to approach the first question.

So, I brute-forced it:

(* Question 1 : Write a function is_older that takes two dates and evaluates to true or false. It evaluates to true if the first argument is a date that comes before the second argument. (If the two dates are the same, the result is false.) *)

type Date = { Year : int; Month : int; Day : int }

let is_older (firstDate : Date) (secondDate : Date) =
  let compareInts x y = if x < y
                    then 1
                    else if y > x
                    then -1
                    else 0
  in
  if compareInts firstDate.Year secondDate.Year = 1 then true
  else if compareInts firstDate.Year secondDate.Year = -1 then false
  else if compareInts firstDate.Month secondDate.Month = 1 then true
  else if compareInts firstDate.Month secondDate.Month = -1 then false
  else if compareInts firstDate.Day secondDate.Day = 1 then true
  else false

Horrible, right? I’m sure there is a neater way to do it, but this one works…

The other quirk was that you can’t index into n-tuples in F# using the #foo notation in ML, and so I had to define a record type for Date, whereas I suspect the point of the course was to use a tuple of (int * int * int) and access the relevant bits using #1, #2 and #3.


Recursive by default

The next thing that struck, and really continued throughout the exercises, was the idea that the correct way to access the elements of a list was one by one, using recursion. For someone coming from the ‘mixed-mode’ world of the C# List<T> (which has both an array and a linked list to give the dual benefit of constant-time access and logarithmic-time copying), this took a while to get my head around.

In this case I do really mean a while, and I must have taken over half an hour to complete the second (trivial!) exercise:

(* Question 2: Write a function number_in_month that takes a list of dates and a month (i.e., an int) and returns how many dates in the list are in the given month. *)

let number_in_month (dates : Date list) (month : int) =
  let rec number_in_month_impl (dateList : Date list) =
    if List.isEmpty dateList then 0
    else let headIsInMonth = if (List.head dateList).Month = month then 1 else 0
         in headIsInMonth + number_in_month_impl (List.tail dateList)
  in number_in_month_impl dates

This fairly easy-to-understand question turns into a surprisingly nuanced answer (at least the way I did it!). I was surprised that:

  • It really is possible to take the first element of a list, do something with it, and then recurse on the remainder.
  • Despite this, we have to explicitly declare a binding with the rec keyword to signal to F# that we want to use recursion. To me this is extremely unnatural, and I forgot to write rec several times over the course of doing all the exercises. Does the language really need to be told that this thing is recursive? Surely it can infer the fact.
  • The most ‘natural’ pattern for me to implement the recursive part was with an inner routine. When I recursed the outer routine, I ended up with lots of calls to number_in_month dates_tail month, where the value of month didn’t change from iteration to iteration. A quick refactor made it easy to strip the recursive part down to the necessary bones — passing in a smaller list each time.

Once I’d bemoaned the lack of pattern matching and got my head around crafting recursive function, most of the rest of my solutions bore a similar hallmark to those shown. I particularly liked creating nested recursive functions to deal with cases when you need to iterate over two lists — rather than nested for loops in C#, we simple define an ‘inner’ function that recurses over the inner list and call it from an ‘outer’ function which recurses over the outer list. Simple, and much easier to test and reason about.


Suggestions please!

As a total novice to F#, I’m certain that these implementations are utterly horrible. If anyone is reading this and has ideas of how to improve the functions, please either leave me a comment or send me a tweet @malformedfawn! This goes for all of the solutions, not just those copied into this post :)

Next time I’ll finish writing up my thoughts on Chapter 5 of The Book of F#, and then probably do another round of these exercises.

Wednesday, January 27, 2016

The Book of F#, Chapter 5: Let's Get Functional

This chapter looks at what functional programming really is. In it there are a number of concepts that we’ve either seen in code or alluded to in text as part of the previous chapters, and it was valuable to get more detailed information about things that I had ‘seen but not understood’.

I also feel vindicated for skipping Chapter 4 (for the time being): the author suggests that we should always favour a functional approach when developing in F#, which is precisely the reason that I didn’t want to read about its object-oriented capabilities before understanding its functional core.

Also, the chapter is quite long. As such, I’ll probably write this post in two halves and upload the first once it’s finished, with the second half following a couple of days afterwards.


What is Functional Programming

We’ve touched on this before, but it’s nice to think of functional programming as applying idempotent functions to data. That is, the data structures and objects we create are immutable, and the functions we write will always evaluate to the same output for a fixed input.

Bringing these two concepts together is referential transparency, which encapsulates the situation I have just described by stating that an expression can be directly replaced by its result without affecting the program’s execution. An example would be the ability to replace let x = 2 + 3 with let x = 5, which means that the additional operator + is referentially transparent.

Whilst these are only a couple of functional programming concepts, they are two that make a lot of sense to me, and towards which I like to strive even when writing object-oriented code.


Programming with Functions

F# functions are built from the premise that they always accept a single input, and always return a single output. We discussed this, and the handy unit type, in my previous post, but it comes up again here. Specifically, this fact allows the compiler to assume that the last expression evaluated in the function is the return value, so you don’t have to explicitly specify this with a keyword.

Functions as Data

Similar to delegation in C# is the treatment of functions as data. In F# this is given first-class support, which is one of the features that distinguishes it as a functional language. As a result, we can pass around functions like we can any other object, and we’ll see how useful this is later on in the post. For the time being we will just note the use of ->, which is an infix token that means ‘take something of the left-hand type and return something of the right-hand type’, e.g. int -> int.

The concept of higher-order functions is also introduced here, albeit briefly. To the untrained eye this makes function signatures long and confusing (if you haven’t seen it before, could you possibly understand (string -> string) -> string -> unit?), but hopefully I will get used to it given time.


Currying

A lot of detailed stuff is written about curried functions. I am choosing to deliberately simplify my understanding of it as such:

In a language where functions accept exactly one input, currying is how we create things with more than one input.

In essence, if we want to be able to write things like let add a b = a + b, then there has to be something behind the scenes that means this doesn’t type check to a function taking two inputs. This thing is currying, or automatic function chaining, which will look at the statement about and say:

Hey, if add a b needs to return an int, then add a needs to return something that takes b and returns an int.

As you might have figured out, this means the type of add is val add : a:int -> b:int -> int. In English, add is a function that takes an integer a and returns another function. The function it returns takes another integer b and returns a third integer (here a + b).

The author uses a nice trick to show another way of thinking about this: if we write the function as let add a = fun b -> (+) a b, it’s pretty clear that add a returns a function!

This is all very interesting from a theoretical point of view, but it seems to me that in day-to-day development, you really don’t need to know or care that your functions are being curried behind the scenes. Perhaps it helps in understanding the types that are inferred from your code, but beyond that, currying alone does not seem much to write home about.

Partial application

This is another subtle distinction that I really don’t want to get in to. Partial application means that if you call a function with an incomplete set of arguments, it will be evaluated as far as possible.

This seems to have huge benefit for confusion — what if you accidentally miss out a parameter somewhere in your calling code? Rather than not compiling, your work will be valid but ultimately output something of the wrong type, which might be a bit of a shock when you come to use it days or weeks later.

Pipelining and Composition

If I sounded nonplussed about currying, the opposite is true for two other functional features that we are introduced to here.

The ability to send the result of one function to another using one of the pipeline operators (|> and <| for forwards and backwards respectively) seems incredibly useful at first glance. So much of the code that I write comprises half a dozen or so procedural steps, the result of each being passed in to the next statement. The fact that this can be done with an operator in F# is great. Here’s the example from the book:

let fahrenheitToCelsius degreesF = (degreesF - 32.0) * (5.0 / 9.0)

let marchHighTemps = [ 33.0; 30.0; 33.0; 38.0; 36.0; 31.0; 35.0;
                   42.0; 53.0; 65.0; 59.0; 42.0; 31.0; 41.0;
                   49.0; 45.0; 37.0; 42.0; 40.0; 32.0; 33.0;
                   42.0; 48.0; 36.0; 34.0; 38.0; 41.0; 46.0;
                   54.0; 57.0; 59.0 ]
marchHighTemps
|> List.average
|> fahrenheitToCelsius
|> printfn "March Average (C): %f"

Simple, clear, and so much nicer than object-oriented code!

Similar in nature and style is the concept of function composition, which takes two functions and returns a third. Keeping with the example above, we could define

let averageInCelsius = List.average >> fahrenheitToCelsius

which binds two steps of the computation in one function. Pretty damn cool!

I reckon these two features, combined together and in the right project, could be reason alone to pick F# over C#. It’s a bold statement, but notice I said in the right project. I can foresee that in a lot of applications, this kind of syntax won’t be use nearly often enough to reap any reward.


Recursive Functions

We said in the previous post that recursion is the preferred method of looping in a functional language, and now we’ll get to see why, and how.

What I (naïvely) didn’t realise is that all the traditional lopping constructs (while, for and foreach) rely on a chance in state to terminate, and so don’t really mesh with pure functional programming so well. Instead, if we have a recursive function that calls itself, all we need is some base case that causes it to eventually terminate.

This looks very similar to induction to my maths brain, in that we say what to do in the base case as well how any other case relates to it’s predecessor (i.e. the inductive step).

In F#, we do recursion with the rec keyword. The example given is the classic factorial computation:

let rec factorial v =
  match v with | 1 -> 1
               | _ -> v * factorial (v - 1)

This reads almost as-is: if the input is 1 then return 1, otherwise return v times the same function called with v - 1.

On the negative side, this is not tail-call optimised, and is the kind of function that gave rise to the most famous of programming websites, Stack Overflow. Basically, writing the function like this means that if we start with a really big number, we will end up with a new stack frame for each function call, which will cause a Stack Overflow!

We get around this fairly simply by ensuring that the recursive call is done last (it might look like this is the case for our current implementation, but it’s actually the multiplication operation that end up in the tail position). One way to do this is by keeping track of both the current number we are ‘down to’ as well as the product computed so far:

let factorial v =
  let rec fact current product =
    match current with | 0 -> product
                       | _ ->   fact <| current - 1 <| current * product
   fact v 1

As shown here, this form will allow the program to use a JUMP statement or similar and replace the arguments to fact on each iteration, rather than calling the function again.

Whilst you can obviously write a recursive function in C#, the compiler doesn’t support tail recursion, so this gives F# a big thumbs up from me.

Mutually Recursive Functions

I’m not entirely sure how often I will use this concept, but as the name suggests, two functions can call each other recursively. The example from the book is the fibonacci sequence:

let fibonacci n =
  let rec f = function
          | 1 -> 1
          | n -> g (n - 1)
  and g = function
      | 1 -> 0
      | n -> g (n - 1) + f (n - 1)
  f n + g n

Nice, but I cannot immediately think of an example in my current work where I would use this.


Lambda Expressions

These are something that are very familiar to me, as they are used extensively in C#. They seem pretty similar in F# to be honest, using the same -> token and with broadly similar use cases (i.e. when we have inline expressions that are only ever used in a very narrow scope.


Next Time

I’ll leave it there for the time being, and finish this post off within a few days. Here’s a sneak peek of what’s left to cover in this chapter.

  • Closures
  • Functional Types
  • Discriminated Unions
  • Lazy Evaluation
  • Summary

Reprise

As promised (though a week or so after originally planned), it’s time to finish off my overview of what functional programming comprises. It looks like the book is quite in-depth at parts, so I will skip anything that is more details than necessary.


Closures

These are a big feature that I’m already quite familiar with from C#, and are quite well documented (see this post by Jon Skeet as an example).

In a nutshell, they allow a function to capture a value that is visible to it, even if it’s not part of the function. A good example of this is something that increments a global counter:

let g() = 
    let mutable counter = 0;
    fun () -> counter <- counter + 1

The only thing I noticed here was something that I first encountered back in Chapter 3: as of F# 4.0, you can mutable keyword for values that change inside closures. Beforehand this wasn’t possible and you needed a reference cell instead.


Functional Types

This rather long section boils down to Tuples, Records, and a lot of syntax. On first reading this, I thought there was a lot to take in. However, I since read some notes the form part of the University of Washington’s Programming Language Course which explained that tuples are just syntactic sugar for records with particular component naming conventions. I will talk a lot more about this when I eventually go through those notes in a subsequent post, so here I will focus more on the F# syntax rather than the more precise semantics.

Tuples

Tuples are a way to group an arbitrary number of values in a single construct. In F# they are created by separating their arguments with ,, and have a type signature where elements are separated by *:

> let tuple1 = 1.0, "hello";;

val tuple1 : float * string = (1.0, "hello")

They are also, as of .NET 4, in C#. I haven’t worked with them much in a C# context as rather than writing stuff like

var tuple = new Tuple<string, int>("Doug", 28)
Console.WriteLine(tuple.Item1 + ": " + tuple.Item2)

I would rather write

var person = new Person("Doug", 28)
Console.WriteLine(person.Name + ": " + person.Age)

The second version makes it so much more obvious what’s going on.

In F# they seem to have much better first-class support (probably because F# is based off ML, which has first-class tuple support), and so are a bit easier to work with. There are still limitations, though. You can access the first and second elements of a pair with fst and snd, but to access all three elements of a triple you would have to use Item1 etc. or deconstruct the tuple:

let triple = "Doug", 28, "London"
  let name, age, city = triple 
  in (...)

Essentially, you construct only to deconstruct at first use, which doesn’t immediately chime with me.

It’s worth bearing in mind how often tuples are used within F#, though. Consider calling a .NET library function that takes two arguments. We’ve seen already that every function in F# takes in one arguments and returns one result. So how can we call String.Format, for example? Using a tuple! F# converts the function signature of things like this to be a function from the required tuple type, to the output type. This gives the illusion of calling things in C# style, which provides some comfort I suppose.

Furthermore, F# uses tuples in place of out parameters. To be honest, I don’t like using these in C# (the TryGetxxx functions make me squirm!), but using the magic of tuples F# can return two results that are boxed up into one:

let parseSuccessful, parsedValue = Int32.TryParse "10"

This, I admit, is nicer that C#. I’ve already written too many custom return structs for functions where I want to return more than one thing, and F# solves these issues in a trifle.

So, tuples are great for returning stuff and calling .NET library functions, but look a bit rubbish when explicitly creating them yourself.

Record Types

I’ve already said that tuples are just record types with a special syntax. So what does one look like in F#?

type person = { Name : string; Age : int; City : string }

Isn’t this exactly what I wanted earlier in C# and wrote a class for? Yes it is! We get the best of both world here, in my opinion: creating custom data types without needing whole classes, and being able to access the bits of a class in a strongly-typed and clear manner.

There are a few other syntax bits with tuples that might be useful: you can do a ‘copy and update’:

let doug = { Name = "Doug"; Age = 27; City = "London" }
let itsMyBirthday = { doug with Age = 28}

You can also declare individual bits of a tuple as mutable, but that sounds like a road I’d rather not go down given that I am trying to learn a relatively pure functional paradigm.

Finally, you can add a member to a record:

type person = { Name : string; Age : int; City : string }
              member x.Description() =
                x.Name + " lives in " x.City 

This is because they are syntactic sugar for classes! All along we really wanted a Person class, and now we have it. But we also have a really nice way of declaring it, and excellent type-safety.


Discriminated Unions

Returning to the UW Programming Languages course, we can now see that tuples and records are ‘each of’ types.

The F# version of a ‘one of’ type is the discriminated union. This is something where the values are restricted to be one of a set of cases. Each case can be of any type, and these can be recursive:

type Tree =
    | Empty
    | Node of int * Tree * Tree

The above type says that a tree is either the tip of the tree, or it is a node that has a value int, and left and right Tree instances.

This is another area that I’ll cover in more detail when I go through the UW course in a future post, but I think it’s still useful to know now that all this data type is doing is providing us with a way to ‘tag’ data with it’s type, so in the above example we can say that something that’s empty is really Empty, and something that looks like int * Tree * Tree is really a Node.

As of F# 3.1, you can actually create instances of these types using named arguments, so we might have actually written

type Tree =
    | Empty
    | Node of value : int * left : Tree * right : Tree

let newNode = Node (value = 3, left = EmptyTree, right = EmptyTree)

The power of discriminated unions is really in their ability to be pattern matched. As an example, summing the nodes of the tree could be done as:

let rec sumTree tree =
    match tree with
    | Empty -> 0
    | Node(value, left, right) ->
        value + sumTree(left) + sumTree(right)

let myTree = Node(0, Node(1, Node(2, Empty, Empty), Node(3, Empty, Empty)), Node(4, Empty, Empty))

let resultSumTree = sumTree myTree

I don’t think I’ve ever summed the contents of a tree in C#, but it would definitely have more if or switch statements than that!

The final couple of points on discriminated unions are that they can be convenient aliases, and they can be extended with additional members. Coincidentally, I have been reading Eric Lippert’s blog series on OCaml and in one of the posts he uses a very similar aliasing technique to ensure type safety when passing around primitive stuff. That is, we could treat a binary tree value as

type NodeValue = NodeValue of int

In a more complex module, this would prevent us passing in another random int as a node value (e.g. the height of the tree). This syntax is so helpful and so lightweight, I am amazed it isn’t more ubiquitous in programming languages.


Lazy Evaluation

This is another feature that has made it across to C#. Once again it’s something I haven’t used explicitly that much. However, I am familiar with the concept from things like IEnumerable<T>.

In F#, we have syntactic support for lazy evaluation with the lazy keyword, which creates an instance of Lazy<T> that can be executed using the Force() function or Value property. There’s not much more to say at this stage, lazy evaluation is a useful concept and simple to implement in F#!


Summary

There was a lot in this chapter. Some of the concepts made a lot more sense after reading around the topic, espeically in the area of tuples, records and discrinimated unions. Yet further topics were already familiar from C# such as closures and lazy evaluation, and others from my brief forays in to Haskell such as currying and partial application.

The topics slowly sink in, but will take more time! Next up I will break away from the book and go back to some more academic and basic principles to cement my knowledge as well as do some real exercises.

Saturday, January 23, 2016

The Book of F#, Chapter 3: Fundamentals

In some sense, Chapter 3 is the first ‘real’ chapter of this book. In it, we are given a whirlwind tour of a large number of F# facets, some more interesting than others.

What is clear to me though is that you need a decent knowledge of .NET to really understand the content, especially in the sections on Generics and Exceptions.


Immutability and Side Effects

If you’ve studied functional programming before, there will be no surprises here: bindings are immutable by default, and side-effects (in general) need to be explicitly permitted.

I think this is great for two reasons. The first is the usual one: once something is assigned to, you know that its value won’t ever change and so reasoning about it is much easier. The second is more architectural: you end up putting the right things in the right places. If a quantity needs changing after it’s been created, is it really the same quantity?

A basic example would be the Builder Pattern from the object-oriented world. That is the antithesis of immutability and gives you the option to muck around with your object as much as you like, even after construction! There are plenty of ways to have immutable objects in C#, especially now we have getter-only autoprops in C# 6.0, but having a language that won’t let you do it another way is so much nicer!

Another point, which the author raises on this topic, is thread-safety: if something can be changed and can also be accessed simultaneously by multiple threads, who knows when and by what it might be mutated.


Functional Purity

Having said the above about side-effects and immutability, there’s a bombshell: F# is impure! Shock horror for a language that has access to the whole .NET Framework.

What this basically means is that whilst the default case is for functions to be pure, that is to consist of expressions that always return a value with no side effects, you can ask F# to allow variables to be mutated.

Whilst this allows you to write code a lot more easily, you lose the ability to fully reason about your functions as they lose idempotence. I’m taking the decision now that this sort of mutation is bad, and will try and do as little as possible of it! In cases where it seems as though some kind of application-level state is required, I’ll do my best to replace the stateful behaviour with something else.


Bindings

These are how F# determines values or executable code.

The first is the let binding, which associates a name with a value and seems very flexible — you can bind an int, string, function or result with let. let bindings are immutable and are treated like readonly in C#, not as constants. Compile-time constants are instead declared with [<Literal>]. This seems slightly strange: what was wrong with the const modifier?

If you want to be naughty and change your variables, you can allow with the mutable keyword, changing the value with <-. As of F# 4.0, we can also use the mutable keyword for values that change inside closures. Why you would ever want this is beyond me, though!

Now for a slightly more complex concept, the reference cell. A reference cell, which you might declare with let cell = ref 0 encapsulates a mutable variable which we access using ! and change using :=. That’s what the book says, anyway. I still don’t quite get it, in particular why you would want to use it, especially when you consider that when you do something like (example from here):

let cell = ref 0
let cell2 = cell
cell2 := 1

then the value of cell also becomes 1! To me this is confusing and potentially dangerous.

Secondly, we have the use binding: this is similar to using in C# in the sense that it’s for things that implement IDisposable, and those things get disposed automatically when they go out of scope. It’s also worth noting that you can’t use them inside a Module and will get a warning saying that they are treated as let bindings in this context. Essentially this is because modules are treated as static classes that are always in scope.

You can also write using as in C#, and it’s actually more powerful than the C# as things in F# always return a result, so you can assign to variables directly from the using block rather than inside it.

Finally, do bindings execute code outside of a function of value context. This seems to be useful when doing things like initialisation and constructor calling, and I’m sure plenty of other scenarios that I haven’t yet come across


Identifier Naming

This one’s short: we can enclose a string in "", which is helpful for unit testing as we can say things like let "MethodName_StateUnderTest_ExpectedBehaviour" = ....


Core Data Types

This is really just a list of syntactic differences between F# and C#:

  • Uses not rather than ! for boolean negation.
  • Numeric types are the same as C# with some abbreviation and suffix changes.
  • There is a new operator!! ** is the exponent operator, which seems very handy for mathematical programs: Math.Sqrt(Math.Pow(x,2.0)+Math.Pow(y,2.0)) no more!
  • We also have&&&, |||, ^^^, ~~~, <<<, >>> for bitwise AND, OR, XOR, negation, left shift, right shift.
  • There are no implicit type conversions for numerics as they are considered side effects, so we can’t divide a float by an int and expect it to work.

Type Inference

The type inference capabilities of F# are such that newcomers often mistake it for a dynamically typed language. C# also has type inference (var) but it is very limited.

I think this is probably far too detailed a topic to go into now, but if we copy the example from the book, the upshot is that creating a person with three properties in F# requires exactly three explicit type declarations:

type Person (id : System.Guid, name : string, age : int) =
member x.Id = id
member x.Name = name
member x.Age = age

This appeals to my DRY sense of (humour) programming. We already know the age is an int, why should we have to tell the computer that in both the constructor and the property definition?

You can add a type annotation as well if the compiler gets confused, so something like let x : float = 2.0 .


Nullability

Null is almost never used in F#. To use it, include the [<AllowNullLiteral>] attribute. It’s only really there for .NET interop. Overall this should reduce NullReferenceException instances and null checks.

All this is fantastic. Even with the addition of the null conditional operator in C# 6.0 (represented either by ?. or ?[ depending on if you are accessing a member or an indexer), it’s still annoying to have to ask if something has a null reference all the time.

F# also caters for stuff that might genuinely not have a value: it has the type Option<'T> which is a discriminated union of Some('T) and None, and has its own keyword (e.g. let middleName : string option = None).

This falls through to functions as well: an optional parameter is denoted with ? and given a default value with defaultArg. The defaultArg function accepts an option as its first argument and returns its value when it is Some< _ >; otherwise, it returns the second argument. I’m not so sure I like this syntax as much as the C# version where you just write argName = defaultValue in the method signature, that seems cleaner than specifying this:

type T(?arg) =
  member val Arg = defaultArg arg 0

Lastly, for things that evaluate only for a side effect, there is the unit type, whose value is written as (). This is a concrete type that signifies that no particular value is present, so the result of any expression can safely be ignored.

If we get back something that’s not unit and we don’t use it (bind it or pass to another function), we get a compiler warning. This can be removed by piping (|>) the result to ignore.


Flow Control

The author is quick to point out that recursion is the preferred looping mechanism in F#, but that there are imperative constructs as well:

While loops, written as while...do. As a notable aside, the body of the loop must return unit, reinforcing the point that expressions return values!

For loops:

for i = 0 to 100 do... — standard for loop
for i = 100 downto 100 do... — making it count downwards

for i in [1..10] do... — the equivalent of foreach in C#, this works on enumerated types. What surprised me was that this is actually a syntactic shortcut for pattern matching. I am presuming that this is translated to something that matches on i and does nothing if it is less than 0 or greater than 10. I’m going to skip the details of pattern matching for now as it’s covered in later chapters.

Note that we don’t have break orcontinue keywords to force early termination of loops. This is strange and will definitely require some care when creating certain kinds of loop.

Branching

This is slightly different to C# and uses the syntax if ... then ... else, or if...then...elif...then...else. The nice thing about them is that because they return a value, we don’t need to either pre-assign a variable or use a ternary operator, e.g.

// Using a pre-assigned variable;
bool found;
if(condition)
{
  found = true;
}
else
{
  found = false;
}

// Using ternary operator
bool found = condition ? true : false;

In F# this would be something like

let found condition = 
  if condition then
    true
  else
    false

This is a contrived example and so actually looks a bit messy in F#, but I can see how this works well for more complex branches. Another point to note is that when only an if branch is specified it must evaluate to unit, otherwise each branch must evaluate to the same type.


Generics

As a C# dev, I love generics. Thankfully, their creator also happens to be the F# language designer, Don Syme, and so F# gets first-class support for them too.

Generics allow type-safe code that works with multiple data types, generating implementations based on the input type — consider the difference between ArrayList in which everything is System.Object and List<'T> where the type is clearly disambiguated.

In F#, generics are named with a leading apostrophe ('a, 'A, 'TInput). F# prefers automatic generalisation e.g.

let toTriple a b c = (a, b, c);;
val toTriple : a:'a -> b:'b -> c:'c -> 'a * 'b * 'c

To do useful things with the types, we need constraints (written as when). We inherit all the ones from C# (ref types, value types, those with a default constructor, those that derive from a class or interface) but also F# adds a few more. There’s loads of examples in the book so I’m only going to copy a few in here.

// Fairly standard constraint from C#: we need a parameterless constructor
let myFunc (stream : 'T when 'T : (new : unit -> 'T)) = ()

// Interview question alert!! This works only on comparable types
// which is of course the constraint on sorting algorithms.
let myFunc (stream : 'T when 'T : comparison) = ()

There are a couple of other syntax elements to look at: flexible types are a shortcut for subtype constraints, and are denoted by #: let myFunc (stream : #System.IO.Stream) = ()

To use a generic type as a parameter and let the compiler infer the type, use <_ >: let printList (l : List<_>) = l |> List.iter (fun i -> printfn "%O" i). This will print the result of evaluating the ToString method on whatever type is passed in the list.

One final thing that is specific to F# is the ability to statically resolve generics. These are resolved at compile time and are primarily for use with inline functions (that is, those that are directly integrated into the calling code and/or custom operators. The implication is that the compiler generates a version of the generic type for each resolved type rather than a single version, so be warned!

We declare these using ^ rather than an apostrophe, for example:

// All types where null is valid
let inline myFunc (a : ^T when ^T : null) = ()

// Custom operator, resolves at compile time to ensure all types
// that use it have a Pow operator implementation
let inline (!**) x = x ** 2.0

Overall, it seems like generics are pretty powerful in F#, which is a bonus to me.


When Things Go Wrong

There are two exception constructs in F#: try...with and try...finally (i.e. we have no try...with...finally block). If you need both then use a try...with within a try...finally.

The exception is captured with as and will pattern match against the type of exception, which we do with :?. In total, this gives us something like:

open System.IO

try
  use file = File.OpenText "somefile.txt"
  file.ReadToEnd() |> printfn "%s"
with
| :? FileNotFoundException as ex ->
  printfn "%s was not found" ex.FileName
| :? PathTooLongException
| :? ArgumentNullException
| :? ArgumentException ->
  printfn "Invalid filename"
| _ -> printfn "Error loading file"

My first impression is that this is a more terse and compact block than the C# equivalent, and still very readable even to someone new to the langauge.

To partially handle an exception we use raise or reraise — the former loses the stacktrace and so probably isn’t that desirable, but the latter is only valid in a with block, so both are required depending on the exact circumstance.

Return values com up again here: try...with returns a value. You can return things other than unit, but all cases must then return the same type. Commonly we use the Maybe monad in this case (i.e. the DU of Some<_> and None).

We also raise our own exceptions with raise, but there are additional syntactic helpers: failwith and failwithf , which take in strings as the exception’s message:

if not (File.Exists filename) then
  failwith "File not found"

if not (String.IsNullOrEmpty filename) then
 failwithf "%s could not be found" filename

We can create our own exceptions by deriving from exn, which is shrothand for System.Exception, or use the incredibly lightweight syntax provided by the exception keyword:

exception RetryAttemptFailed of string * int
exception RetryCountExceeded of string

Summary

There was a lot of syntax to take in as part of this chapter. I will admit that most of it didn’t register the first time, but having come back to it whilst writing this post, a lot more makes sense.

The things that stuck with me were default immutability and type inference. Both features seem incredibly powerful if harnessed correctly, so I guess I have to learn how to do that next!

Next time, I’m going to be covering Chapter 5 (Let’s Get Functional). This is because the contents of Chapter 4 cover object-oriented programming in F#, and I’d rather get functional-first thinking drilled into my head before learning how to do OO stuff in another language. Hopefully this will keep my brain progressing down the road towards functional thinking!