Advanced Elm #2: Modular arithmetic

2019-11-23elmadvanced_elmmath

Introduction

Advanced Elm is a project to explore some of the more advanced corners of the Elm language.

I have some objectives for Advanced Elm:

  1. showing that Elm, while simple and perfect for beginners, is powerful enough for a lot of advanced scenarios,
  2. trying to unravel how to use advanced language features like phantom types and extensible records,
  3. trying to shed some light on how I come up with complex types and simple APIs.

In this post I’ll try to answer a seemingly simple question: why do we have to add a default branch when we use case ... of when using modBy n with a fixed n?

The simple, and sane, answer is that the in Elm we have no way to tell the compiler that the only possible results are 1, 2, 3, … n-1.

The How To style answer is that it’s actually possible to write a safeModBy function where we have exactly n branches when we do safeModBy n.

As with elm-codec, rather than explaining how the current code works I’ll reimplement (parts of) it from scratch to emphasize the design process.

A word of caution: this is mostly an excuse to play with complex types, the code is not optimized, the API is insane and the idea is bad. It’s still interesting though.

Notation

[a, b] = [a, a+1, a+2, ..., b].

APIs

To have n branches we must convince the compiler that our type has exactly n possible values, “make impossible states impossible” style.

This immediately rules out Int, because from the compiler point of view it always has “infinite” possible values, we’ll probably need a custom type.

When the road ahead is unclear it’s often useful to write some examples.

What are the possible results of modBy n? Let’s reason about small values for n:

From this we can understand that:

  1. the shape is regular, but a bit different from what we’re used to,
  2. considering that the shape (hence, the type) of the result depends on the value we pass to modBy, we will not be able to use an Int as argument.

Digits

We will need at the very least a different type for every single digit, because every possible modBy with a single digit number will need a different type (unless we create an API that uses Peano-style numbers, but it would be unwieldy and horrible).

Let’s try and write some (hypotetical) code.

type D0 = D0
type D1 = D0 | D1
type D2 = D0 | D1 | D2
...
type D9 = D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9

This is not legal code! Every variant needs a different name. I don’t have any ideas that don’t produce decent names, the least-bad I’ve found is:

type D0 = D0_0
type D1 = D0_1 | D1_1
type D2 = D0_2 | D1_2 | D2_2
...
type D9 = D0_9 | D1_9 | D2_9 | D3_9 | D4_9 | D5_9 | D6_9 | D7_9 | D8_9 | D9_9

For two digits numbers we cannot create a type like type TwoDigits a b = TD a b, because we need the second digit to depend on the value of the first one. This seems to point toward dependent types, which Elm doesn’t have, but it’s actually doable within Elm’s typesystem.

To keep the code a bit more compact, from now on I’ll use base 4 instead of base 10. Mutatis mutandis everything will work for base 10.

Let’s write some pseudo-Elm for the type of two digit numbers up to 22 (in base 4, it would be 10 in base 10).

type UpTo22
    =   0 |   1 |   2 |   3
    | 1 0 | 1 1 | 1 2 | 1 3
    | 2 0 | 2 1 | 2 2

With some grouping, informed by the analysis we did above for multi-digit numbers, we reach (D stands for “digit”):

type D2 = D0_2 | D1_2 | D2_2
type D3 = D0_3 | D1_3 | D2_3 | D3_3

type UpTo22
    = Zero D3
    | One D3
    | Three D2

So if we want to create the UpTo2X type we can imagine it would be:

type UpTo2X x
    = Zero D3
    | One D3
    | Two x

Analyzing UpTo2X we can see that:

  1. if we replace all the D3 with ()s, UpTo2X () gets the same shape as D3,
  2. if we replace all the D3 with UpTo3X D3, UpTo2X (UpToyX z) is type type of numbers up to 2yz.

This suggest a further generalization:

type D0 x y
    = D0_0 y

type D1 x y
    = D0_1 x
    | D1_1 y

type D2 x y
    = D0_2 x
    | D1_2 x
    | D2_2 y

type D3 x y
    | D0_3 x
    | D1_3 x
    | D2_3 x
    | D3_3 y

This is finally enough to represent the numbers up to n, for every n. So if we have a number of the form abc...pqr, we’ll choose Da x y, filling x with the type of numbers up to 333...3 (with the correct length), and y with the type of numbers up to bc...pqr.

Let’s enumerate the values of the type D1 (D3 (D3 () ()) (D3 () ())) (D2 (D3 () ()) (D1 () ())) (this type alone should convince you pretty quickly that what we’re doing here, while interesting, is completely impractical).

Let’s write some intermediate definitions to avoid getting lost:

type alias UpTo1 = D1 () ()
type alias UpTo3 = D3 () ()
type alias UpTo21 = D2 UpTo3 UpTo1
type alias UpTo33 = D3 UpTo3 UpTo3
type alias UpTo121 = D1 UpTo33 UpTo21

We now need two things:

Before we go on: the most common variants will be the D3 ones, so let’s simplify that definition:

type D x y
    = D0 x
    | D1 x
    | D2 x
    | D3 y

modBy

So what will the API for modBy be? We have found out that it will need to return a value of type Dx y z, and the type of the result will depend on the modulus, so the modulus itself needs to encode something about the type. We’ll try and keep the input number an Int.

modBy : Modulus r -> Int -> r

Building moduli

Looking at the types above we can distinguish between the last digit (which is Dx () ()), and the other digits (which are Dx 333....333 partial).

Thinking about the types it’s obvious that we can’t use lists or Ints to build our modulus, the type changes with every digit.

For the last digits’ we can expose simple functions:

f0 : Modulus (D0 () ())
f1 : Modulus (D1 () ())
f2 : Modulus (D2 () ())
f3 : Modulus (D () ())

For the other digits, we’ll have a function dx that gets an input of type Modulus (Dy z w) and we want to produce Modulus (Dx (D z z) (Dy z w)). The problem is that Elm’s functions cannot be generic like that (generic in Dx), so we need to either write a dx for every y (this would mean 100 functions for base 10) or we need to keep track of z in some additional way (and just put a type parameter in place of Dy z w).

How do we keep track of another type? We add a type parameter! So instead of having Modulus x we’ll have Modulus x y where x is the “full” part (which will be of the form D (D (D ...) (D ...)) (D (D ...) (D ...))) and y the Dy z w part.

f0 : Modulus () (D0 () ())
f1 : Modulus () (D1 () ())
f2 : Modulus () (D2 () ())
f3 : Modulus () (D () ())

d0 : Modulus f p -> Modulus (D f f) (D0 f p)
d1 : Modulus f p -> Modulus (D f f) (D1 f p)
d2 : Modulus f p -> Modulus (D f f) (D2 f p)
d3 : Modulus f p -> Modulus (D f f) (D f p)

Implementation

We now need to implement the modBy, fX and dX functions. Our current API builds the modulus digit by digit, but (to my knowledge, correct me if I’m wrong), there are no algorithms for calculating the result the same way. On the other hand we’ll have the input as an Int. To avoid reimplementing the wheel, let’s see if we can use elm/cores modBy in the implementation.

To do this we need two ingredients:

  1. the modulus as an Int (this looks easy),
  2. a way to turn the result of modBy modulus input into a Dx (...) (...) value (this looks harder).

Calculating the Int modulus

We can try simply storing the modulus inside Modulus x y:

type alias Modulus x y =
    { modulus : Int
    , ...
    }

(In this article I’ll write Modulus as a type alias, in a library we would use an opaque type).

But this quickly hits a wall: we are building Modulus values from the least significant digit to the most significant one, so when we add a zero the Int cannot track it. We could memorize the number “in reverse” and then invert the digits inside modBy, but this would just trade the problem of leading zeroes with a problem with trailing zeroes. Or, we can try “reading” the construction from the other side.

If we write d1 <| d2 <| f3 (which means a modulus of 123), when we read it from left to right we can interpret it as “write a 1, then multiply by 10 and add 2, then multiply by 10 and add 3”, so how can we express the “then multiply by 10 and add X” in a way that allows us to invert the order?

“given the previous intermediate value, multiply by 10 and add X” sounds a lot like \v -> v * 10 + x, and indeed representing the Int modulus as a function allows us to invert the flow!

f3 =
    { modulus = \v -> v * 10 + 3
    , ...
    }

d1 p =
    { modulus = \v -> (v * 10 + 1) |> p.modulus
    }

d2 p =
    { modulus = \v -> (v * 10 + 2) |> p.modulus
    }

So in the dX case we first take what’s “arriving from the left”, multiply and add, and then pass it on “to the right” for the next digits.

So now (d1 <| d2 <| f3).modulus is

(d1 <| d2 <| f3).modulus
(d1 <| d2 <| { modulus = \v -> v * 10 + 3}).modulus
(d1 <| \p -> { modulus = \v -> (v * 10 + 2) |> p.modulus } <| { modulus = \v -> v * 10 + 3}).modulus
(d1 <| { modulus = \v -> (v * 10 + 2) |> ({ modulus = \w -> w * 10 + 3}).modulus }).modulus
(d1 <| { modulus = \v -> (v * 10 + 2) |> \w -> w * 10 + 3}).modulus
(d1 <| { modulus = \v -> (v * 10 + 2) * 10 + 3}).modulus
(d1 <| { modulus = \v -> v * 100 + 23}).modulus
(\p -> { modulus = \v -> (v * 10 + 1) |> p.modulus} <| { modulus = \v -> v * 100 + 23}).modulus
({ modulus = \v -> (v * 10 + 1) |> ({ modulus = \w -> w * 100 + 23}).modulus}).modulus
({ modulus = \v -> (v * 10 + 1) |> \w -> w * 100 + 23}).modulus
\v -> (v * 10 + 1) |> \w -> w * 100 + 23
\v -> (v * 10 + 1) * 100 + 23
\v -> v * 1000 + 123

This also let us see how to recover the Int modulus inside modBy: we pass a simple 0 to the Modulus x y we are given.

Converting an Int into the custom type

To be completed…

Epilogue

I hope that this post helped you understand how bend spoons the Elm typesystem to achieve seemingly-impossible compile time safety. I intend to continue publishing posts in the “Advanced Elm” spirit, and I think I’ll talk about crazy compile-time checking of tree/graph structure. Let me know what would you like to see!

Please send me feedback via Slack, Discourse or e-mail!