Typestate-oriented programming in F*

The Flash

The meaning of this blog post is to investigate the possibility of doing typestate-oriented programming in F*.

1. Short introduction to F*

F* (pronounced “f star”) is a new functional programming language with refinement types, effect types and incremental proving. Cool! So what does that mean?

1.1 Incremental proving

You don’t have to prove anything in F* - its default is to assume ML-like effects and types, like in OCaml and F#. But you can prove a lot of things. Some things are even proved for you. Take a simple add function:

let add x y =
    x + y

The type of this function in OCaml or F# would be int -> int -> int, meaning a function that takes two integers and returns an integer. And that’s more or less the end of a traditional functional type-system (for this simple function). F* lets you go further. Let’s investigate that.

The most basic type of add in F* is

val add : int -> int -> int

This is the same as in OCaml/F#. We can also name the arguments:

val add : x:int -> y:int -> result:int

What happens if we let F* infer the type automatically?

$ ./bin/fstar.exe add.fst --log_types --print_effect_args
[...]
let add : (x:int -> y:int -> Tot int)

Almost the same, but with a Tot before the int. Tot means Total, meaning that add is a total function. From the tutorial:

any expression that is inferred to have type-and-effect Tot t, is guaranteed (provided the computer has enough resources) to evaluate to a t-typed result, without entering an infinite loop; reading or writing the program’s state; throwing exceptions; performing input or output; or, having any other effect whatsoever.

So from the effect Tot, we know that add has no side-effects, that it will terminate and not throw any exception. That’s a pretty good improvement from just int already!

Let’s take the next step:

val add : x:int -> y:int -> Tot (result:int{result == x + y})

== and = are in fact not the same things in F*, where the former is on type level, the latter on value level¹.

Here we add the Tot effect, and also a refinement on the return value of the function: {result == x + y}. If this signature type-checks, we have successfully (and trivially) proven that add does indeed return the sum of x and y.

$ ./bin/fstar.exe add.fst
Verified module: Add (168 milliseconds)
All verification conditions discharged successfully

Oh joy! It’s possible to scramble the function a bit without disturbing the proof:

val add : x:int -> y:int -> Tot (result:int{result == x + y})
let add x y =
    x + y -x + x  (* Still checks out *)

If the function contains an error, F* will show an error message that points to the correct line in the program:

val add : x:int -> y:int -> Tot (result:int{result == x + y})
let add x y =
    x + y - 1

$ ./bin/fstar.exe add.fst
./add.fst(7,4-7,13): (Error) Subtyping check failed; expected type (result#13:int{(eq2 result@0 (op_Addition x y))}); got type int (see ./add.fst(3,44-3,59))

In this way you can choose which part of your program you want to prove, and how much.

Can you write a function that is guaranteed to only return prime numbers? Tip: It's possible to use functions in the refinement clause, as long as they are total.

Refinement types (also in the tutorial) is a way to say that a type is not only an integer or a string, but an integer withint a certain interval or a string of a certain length. To be more precise, it defines a predicate for a type. We saw this in the return type above for the function add, using the notation {} after a type. A common example of refinement types is the definition of the natural numbers, n:int{n >= 0}, but any other properties are indeed possible, e.g. files that are open or closed, as we will see below.

1.3 Effect types with pre- and post-conditions

F* has a system of effects using monads. The effect we are interested in here is the STATE effect, used for proving stateful computations that writes and reads to the heap. Proving is done by writing pre- and post-conditions:

ST unit
    (requires (fun heap -> True))
    (ensures (fun heap result heap' -> True))

True and true are in fact not the same things in F*, where the former is on type level, the latter on value level¹.

Let’s inspect this in more details.

ST is a short-hand for a specific version of the STATE monad. unit signifies that the function that uses this signature returns nothing, just like in OCaml, or like void in a couple of other languages. requires is the pre-condition of the effect. ensures is then of course the post-condition. As you can see, they both take a function as an argument. In those functions, you can use some ready made operations to control what’s happening on the heap:

val sel : #a:Type -> heap -> ref a -> Tot a
val upd : #a:Type -> heap -> ref a -> a -> Tot heap
val contains : #a:Type -> heap -> ref a -> a -> Tot bool

where sel means “select” and upd means “update”.

Let’s make a simple example.

val only_add_to_ten : n:ref int -> ST int
    (requires (fun heap -> (sel heap n) == 10))
    (ensures (fun heap result heap' -> modifies_none heap heap'))
let only_add_to_ten x = 
    !x + 10

let () = 
    let a = 5 in
    let b = 5 in
    let x = alloc (a + b) in
    only_add_to_ten x

Here we have a function that will only compile if its argument is a reference to an integer with value 10, ensured by the pre-condition (sel heap n) == 10. Its post-condition states that nothing in the heaps are modified, ensured by the built-in function modifies_none. The bang before the x in the function body means de-referencing, taking the value at x.

In the small test, we use variables a and b to calculate x. alloc is used to allocate memory for a reference on the heap. If a and b do not equal to 10, the F* compiler will show this error message:

(Error) assertion failed (see ./onlyaddtoten.fst(7,27-7,45))

You might wonder what’s the difference between this and, say, using assert or throwing an exception, but remember that this is done during compile time! It’s now possible to enforce client code of a module to execute functions in a certain order.

1.4 Semi-automatic proving

Above, the F* compiler could prove that only_add_to_ten would only accept references to integer 10. If the function was used in any other way, the program would not compile. Still, we as programmers didn’t have to provide F* with any manual proofs or tactics - the system did it automatically. How? By using the SMT solver Z3. So how does F* disperse the proofs to Z3, and how do you know what can and what cannot be proved automatically? That knowledge is way beyond me. I can only refer to the academic papers written by the F* team.

For an interesting case of how F* can prove termination, see the example with the fibonacci function in the tutorial (chapter 5).

2. Typestate-oriented programming

Typestate-oriented programming is a programming paradigm outlined by Jonathan Aldrich et al in the paper Typestate-oriented programming. The code snippet below describes the intuition behind the concept pretty well:

state File {
  public final String filename;
}

state OpenFile extends File {
  private CFilePtr filePtr;
  public int read() { ... }
  public void close() [OpenFile>>ClosedFile] { ... }
}

state ClosedFile extends File {
  public void open() [ClosedFile>>OpenFile] { ... }
}

Instead of classes, we declare states. A state is much like a class, but an object can change state during its lifetime; the state of the object is mirrored in the type-system. As you can see in the code above, it’s not possible to open an already opened file, and not possible to close a closed file - the methods do not exist in those states. This check is thought to be done during compile-time. With this technique, a whole new area of bugs are available for compile-time checking.

Here’s a small use-case example, also from the paper:

int readFromFile(ClosedFile f) {
  openHelper(f);
  int x = computeBase() + f.read();
  f.close();
  return x;
}

In clear text, this function accepts a closed file, opens it, read a number from it, add the number to a dynamically computed base, close the file and returns the sum. The function is also guaranteed to leave file f closed at the end - if the file would be open, the signature would look like this:

int readFromFile(ClosedFile>>OpenFile f)

signaling that readFromFile changes the state of f from closed to open.

Do we know if computeBase has a reference to file f? How does that matter?

The point of readFromFile is of course to show that f.read() would not compile if not openHelper(f) was called first. In other words, the interface of the state (class) file is preserved, even when it demands a certain order of function calls.

2.1 Using F* to emulate typestate-oriented programming

Let’s head over to F* and use our new knowledge to mimic the behaviour of typestate-oriented programming. First, we need a simple enum to say if a file is open or closed:

type state = 
  | Open
  | Closed

We will save the state in a file record type:

type file = {
    name: string;
    state: ref state;
}

Note that state is a reference, that is, a mutable field, while name is immutable. In the real world, our file type would also include a file handler, but we will skip that part since it’s not relevant for our example.

The next thing we need is a predicate, that is, a function on type level that returns True or False. We will use this predicate in our functions to say that a file has to be either opened or closed.

type isClosed file heap = (sel heap file.state) == Closed
type isOpen file heap = (sel heap file.state) == Open

Now we can write our own openHelper, that will take a closed file and open it:

val openHelper : file:file -> ST unit
    (requires (fun heap -> isClosed file heap))
    (ensures (fun heap result heap' -> isOpen file heap'))
let openHelper file =
    file.state := Open

The operator := means updating a mutable variable with a new value.

Based on earlier examples, the meaning of this function should be pretty clear.

The read function takes an open file and guarantees to leave it open at the end. Again the body of the function is a dummy, that in real life actually would read from a file.

val read : file:file -> ST int
    (requires (fun heap -> isOpen file heap))
    (ensures (fun heap result heap' -> isOpen file heap'))
let read file =
    13

computeBase is a simple, total function:

val computeBase : unit -> Tot int
let computeBase () =
    12

We can now complete our version of readFromFile:

val readFromFile : file:file -> ST int
    (requires (fun heap -> isClosed file heap))
    (ensures (fun heap result heap' -> isClosed file heap'))
let readFromFile file =
    openHelper file;
    let x = computeBase () + read file in
    file.state := Closed;
    x

As you can see, this function takes a file and returns an integer. Further more, we’re using the ST effect. From that we know that it modifies the heap. Taking a look at the pre- and post-conditions, they require that the file is closed both before and after the function.

Full code listing.

3. Discussion

The big question is how much typing overhead is required to ensure a typestate interface in F*. The example I’ve used is small. It shows that it’s definitely feasible, but how would it look like in a bigger project? Would function signatures be cluttered with pre- and post-conditions? Especially functions at top-level might suffer from this. On the other hand, you can write predicates that increase readability and reusability.

Below I list some pros and cons about using F* instead of a Java-like programming language with support for typestate.

Pros:

The F* type-system is more versatile than only typestate-oriented programming.
Even though it’s a highly complex system, it’s not nearly as hard to learn and use as e.g. Coq.
You can choose how much to prove or not, which decreases the burden of verification.

Cons:

A language dedicated to typestate-oriented programming would have language features more easily available.
A Java-like language with typestate-oriented programming might be more attractive to the industry than an ML dialect, and more easy for people to pick up; it would have a shorter cognitive distance.

The biggest pro of all is of course that F* already exists, while the typestate-oriented programming is merely a proposal.

4. Further reading

The F* tutorial is a great starting point if you want to learn more. The F* home page has lots of links to papers about the system. You might want to google “dependent types” if you want a more theoretical understanding and want to find similar systems, like Idris, Agda or Coq.

I’ve tried to find more information about typestate-oriented programming. They made a prototype language, dynamically typed, called Plaid. It seems not to be under development anymore.

5. Notes

^{1. Not completely exact - fully dependent type-systems does not have a hierarchy of levels like value - type - kind; everything is a term. User SkeThuVe from #fstar IRC remarks: “== is propositional equality which always exists whereas = is decidable equality which might not be defined”. The tutorial might be out of date in this regard, as of 2017-02-26.}