I’m pleased to announce a new super-major release of dejafu, a library for testing concurrent Haskell programs.

While there are breaking changes, common use-cases shouldn’t be affected too significantly (or not at all). There is a brief guide to the changes, and how to migrate if necessary, on the website.

dejafu is a unit-testing library for concurrent Haskell programs. Tests are deterministic, and work by systematically exploring the possible schedules of your concurrency-using test case, allowing you to confidently check your threaded code.

HUnit and Tasty bindings are available.

dejafu requires your test case to be written against the `MonadConc`

typeclass from the concurrency package. This is a necessity, dejafu cannot peek inside your `IO`

or `STM`

actions, so it needs to be able to plug in an alternative implementation of the concurrency primitives for testing. There is some guidance for how to switch from `IO`

code to `MonadConc`

code on the website.

If you really need `IO`

, you can use `MonadIO`

- but make sure it’s deterministic enough to not invalidate your tests!

Here’s a small example reproducing a deadlock found in an earlier version of the auto-update library:

> :{ autocheck $ do auto <- mkAutoUpdate defaultUpdateSettings auto :} [fail] Successful [deadlock] S0--------S1-----------S0- [fail] Deterministic [deadlock] S0--------S1-----------S0- () S0--------S1--------p0--

dejafu finds the deadlock, and gives a simplified execution trace for each distinct result. More in-depth traces showing exactly what each thread did are also available. This is using a version of auto-update modified to use the `MonadConc`

typeclass. The source is in the dejafu testsuite.

The highlights for this release are setup actions, teardown actions, and invariants:

**Setup actions**are for things which are not really a part of your test case, but which are needed for it (for example, setting up a test distributed system). As dejafu can run a single test case many times, repeating this work can be a significant overhead. By defining this as a setup action, dejafu can “snapshot” the state at the end of the action, and efficiently reload it in subsequent executions of the same test.**Teardown actions**are for things you want to run after your test case completes, in all cases, even if the test deadlocks (for example). As dejafu controls the concurrent execution of the test case, inspecting shared state is possible even if the test case fails to complete.**Invariants**are effect-free atomically-checked conditions over shared state which must always hold. If an invariant throws an exception, the test case is aborted, and any teardown action run.

Here is an example of a setup action with an invariant:

> :{ autocheck $ let setup = do var <- newEmptyMVar registerInvariant $ do value <- inspectMVar var when (value == Just 1) $ throwM Overflow pure var in withSetup setup $ \var -> do fork $ putMVar var 0 fork $ putMVar var 1 tryReadMVar var :} [fail] Successful [invariant failure] S0--P2- [fail] Deterministic [invariant failure] S0--P2- Nothing S0---- Just 0 S0--P1--S0--

In the `[invariant failure]`

case, thread 2 is scheduled, writing the forbidden value “1” to the MVar, which terminates the test.

Here is an example of a setup action with a teardown action:

> :{ autocheck $ let setup = newMVar () teardown var (Right _) = show <$> tryReadMVar var teardown _ (Left e) = pure (show e) in withSetupAndTeardown setup teardown $ \var -> do fork $ takeMVar var takeMVar var :} [pass] Successful [fail] Deterministic "Nothing" S0--- "Deadlock" S0-P1--S0-

The teardown action can perform arbitrary concurrency effects, including inspecting any mutable state returned by the setup action.

Setup and teardown actions were previously available in a slightly different form as the `dontCheck`

and `subconcurrency`

functions, which have been removed (see the migration guide if you used these).

Haskell typeclass instances have two parts: some *constraints*, and the *instance head*:

newtype WrappedFunctor f a = WrappedFunctor (n a) instance Functor f => Functor (WrappedFunctor f) where -- ^^^^^^^^^ constraints -- ^^^^^^^^^^^^^^^^^^^^^^^^^^ head fmap f (WrappedFunctor fa) = WrappedFunctor (fmap f fa)

More specifically, the head is of the form `C (T a1 ... an)`

, where `C`

is the class, `T`

is a type constructor, and `a1 ... an`

are distinct type variables.The `FlexibleInstances`

extension relaxes this restriction a little, allowing some (or all) of the `a1 ... an`

to be arbitrary types, as well as type variables.

When the type checker needs to find an instance, it does so purely based on the head, constraints don’t come into it. The instance above means “whenever you use `WrappedFunctor f`

as a functor, *regardless of what f is and even if we don’t know what it is yet*, then you can use this instance”, and a type error will be thrown if whatever concrete type

`f`

is instantiated to doesn’t in fact have a functor instance.You might think that, if we didn’t define the instance above and instead defined this one:

instance Functor (WrappedFunctor Maybe) where fmap f (WrappedFunctor fa) = WrappedFunctor (fmap f fa)

…and then used a `WrappedFunctor f`

as a functor, that the type checker would infer `f`

must be `Maybe`

. This is not so! Typeclass inference happens under an “open world” approach: just because only one instance is known *at this time* doesn’t mean there won’t be a second instance discovered later. Prematurely selecting the instance for `WrappedFunctor Maybe`

could be unsound.

In GHC Haskell, we can express a constraint that two types have to be equal. For example, this is a weird way to check that two values are equal:

-- this requires GADTs or TypeFamilies funnyEq :: (Eq a, a ~ b) => a -> b -> Bool funnyEq = (==)

We only have a constraint `Eq a`

, not `Eq b`

, but because of the `a ~ b`

constraint, the type checker knows that they’re the same type:

> funnyEq 'a' 'b' False > funnyEq True True True > funnyEq True 'b' <interactive>:22:1: error: • Couldn't match type ‘Bool’ with ‘Char’ arising from a use of ‘funnyEq’ • In the expression: funnyEq True 'b' In an equation for ‘it’: it = funnyEq True 'b'

Let’s put the two together now. Let’s throw away the two instances we defined above, and now look at this one:

instance (f ~ Maybe) => Functor (WrappedFunctor f) where fmap f (WrappedFunctor fa) = WrappedFunctor (fmap f fa)

This instance means “whenever you use `WrappedFunctor f`

as a functor, *regardless of what f is and even if we don’t know what it is yet*, then you can use this instance”, and a type error will be thrown if

`f`

cannot be instantiated to `Maybe`

. This is different to the instance `Functor (WrappedFunctor Maybe)`

!If we have

`instance Functor (WrappedFunctor Maybe)`

:> :t fmap (+1) (WrappedFunctor (pure 3)) :: (Num b, Applicative f, Functor (WrappedFunctor f)) => WrappedFunctor f b

If we have

`instance (f ~ Maybe) => Functor (WrappedFunctor f)`

:> :t fmap (+1) (WrappedFunctor (pure 3)) :: Num b => WrappedFunctor Maybe b

With the latter, we get much better type inference. The downside is that this instance overlaps any more concrete instances, so we couldn’t (for example) define an instance for `WrappedFunctor Identity`

as well.

But if you only need one instance, it’s a neat trick.

Here’s a concrete example from the dejafu-2.0.0.0 branch. I’ve introduced a `Program`

type, to model concurrent programs. There’s one sort of `Program`

, a `Program Basic`

, which can be used as a concurrency monad (a `MonadConc`

) directly. The instances are defined like so:

instance (pty ~ Basic, MonadIO n) => MonadIO (Program pty n) where -- ... instance (pty ~ Basic) => MonadTrans (Program pty) where -- ... instance (pty ~ Basic) => MonadCatch (Program pty n) where -- ... instance (pty ~ Basic) => MonadThrow (Program pty n) where -- ... instance (pty ~ Basic) => MonadMask (Program pty n) where -- ... instance (pty ~ Basic, Monad n) => MonadConc (Program pty n) where -- ...

If instead the instances has been defined for `Program Basic n`

, then the type checker would have complained that the `pty`

parameter is (in many cases) polymorphic, and not use these instances. This means every single use of a `Program pty n`

, where `pty`

was not otherwise constrained, would need a type annotation. By instead formulating the instances this way, the type checker *knows* that if you use a `Program pty n`

as a `MonadConc`

, then it must be a `Program Basic n`

.

This has turned a potentially huge breaking change, requiring everyone who uses dejafu to add type annotations to their tests, into something which just works.

]]>You can express a bunch of interesting problems in terms of ILP, and there are solvers which do a pretty good job of finding good solutions quickly. One of those interesting problems is scheduling, and there’s a nice write-up of how PyCon uses an ILP solver to generate schedules.

Another problem is rota generation, which is after all just a sort of scheduling. I have implemented a rota generator for GOV.UK’s technical support, and this memo is about how it works.

What is a rota?

Well, there are a bunch of time slots \(\mathcal T\), roles \(\mathcal R\), and people \(\mathcal P\). We can represent the assignments as a 3D binary matrix:

\[ \begin{split} A_{tpr} = \begin{cases} 1,&\text{ if, in time }t\text{, person }p\text{ is scheduled in role }r\\ 0,&\text{otherwise} \end{cases} \end{split} \]Next we need some constraints on what a valid rota looks like.

For every pair of slots and roles, the sum of the assignments should be 1:

\[ \forall t \in \mathcal T \text{, } \forall r \in \mathcal R \text{, } \sum_{p \in \mathcal P} A_{tpr} = 1 \]For every pair of slots and people, the sum of the assignments should be 0 (if they’re not assigned anything) or 1 (if they are):

\[ \forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } \sum_{r \in \mathcal R} A_{tpr} \in \{0, 1\} \]We might give our people time off (how generous!), so there’s no point in generating a rota where someone gets scheduled during their time off.

Given a function \(leave : \mathcal P \mapsto 2^{\mathcal T}\), which gives the set of slots someone is on leave, then: for every pair of slots and people, all roles should be unassigned if the slot is in \(leave(p)\):

\[ \forall p \in \mathcal P \text{, } \forall t \in leave(p) \text{, } \forall r \in \mathcal R \text{, } A_{tpr} = 0 \]We might also have a maximum number of shifts any one person can be assigned to in a rota.

Given such a limit \(M\), then: for every person, the sum of the assignments across *all* slots should be less than or equal to \(M\):

If all we wanted was constraints, then we could use a SAT solver, and it would probably do a better job than an ILP solver as a SAT solver is *built* for solving boolean constraints! But there’s one thing which is more easily expressible to an ILP solver than a SAT solver: objective functions to optimise.

Given our above constraints, we will get *a* rota, but it might not be very fair. One person might be scheduled ten times, and another not at all. We can encourage the solver to be more fair by providing it with an objective which results in more people being assigned.

First we’ll need an auxiliary variable to check whether someone has been assigned at all:

\[ \begin{split} X_p = \begin{cases} 1,&\text{ if person }p\text{ has any assignments}\\ 0,&\text{otherwise} \end{cases} \end{split} \]We can use two new constraints to set the value of these \(X\) variables:

\[ \forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } \forall r \in \mathcal R \text{, } X_p \geqslant A_{tpr} \] \[ \forall p \in \mathcal P \text{, } X_p \leqslant \sum_{t \in \mathcal T} \sum_{r \in \mathcal R} A_{tpr} \]As both \(A_{tpr}\) and \(X_p\) are binary variables, this means \(X_p\) will be 1 if (first constraint) and only if (second constraint) person \(p\) has any assignments at all.

We then give an objective to the solver:

\[ \textbf{maximise } \sum_{p \in \mathcal P} X_p \]The only way to increase the value of the sum is by assigning roles to more people, so that is what the solver will do.

PuLP is a Python library for interfacing with ILP solvers. It provides a somewhat nicer interface than directly dealing with the matrices and vectors on which ILP solvers operate, letting us express constraints as equations much like I have here.

Here’s how to express the above with PuLP:

import pulp # Parameters slots = 0 people = [] roles = [] leave = {} max_assignments_per_person = 0 # Create the 'problem' problem = pulp.LpProblem("rota generator", sense=pulp.LpMaximize) # Create variables assignments = pulp.LpVariable.dicts("A", ((slot, person, role) for slot in range(slots) for person in people for role in roles), cat="Binary") is_assigned = pulp.LpVariable.dicts("X", people, cat="Binary") # Add constraints for slot in range(slots): for role in roles: # In every time slot, each role is assigned to exactly one person problem += pulp.lpSum(assignments[slot, person, role] for person in people) == 1 for person in people: # Nobody is assigned multiple roles in the same time slot problem += pulp.lpSum(assignments[slot, person, role] for role in roles) <= 1 for person, bad_slots in leave.items(): for slot in bad_slots: for role in roles: # Nobody is assigned a role in a slot they are on leave for problem += assignments[slot, person, role] == 0 for person in people: # Nobody works too many shifts problem += pulp.lpSum(assignments[slot, person, role] for slot in range(slots) for role in roles) <= max_assignments_per_person # Constrain 'is_assigned' auxiliary variable for slot in range(slots): for person in people: for role in roles: # If problem += is_assigned[person] >= assignments[slot, person, role] for person in people: # Only if problem += is_assigned[person] <= pulp.lpSum(assignments[slot, person, role] for slot in range(slots) for role in roles) # Add objective problem += pulp.lpSum(is_assigned[person] for person in people) # Solve with the Coin/Cbc solver problem.solve(pulp.solvers.COIN_CMD()) # Print the solution! for slot in range(slots): print(f"Slot {slot}:") for role in roles: for person in people: if pulp.value(assignments[slot, person, role]) == 1: print(f" {role}: {person}")

The quantifiers have become `for...in`

loops and the summations have become calls to `pulp.lpSum`

with a generator expression iterating over the values of interest, but other than that it’s fairly straightforward.

With the parameters:

slots = 5 people = ["Spongebob", "Squidward", "Mr. Crabs", "Pearl"] roles = ["Fry Cook", "Cashier", "Money Fondler"] leave = {"Mr. Crabs": [0,2,3,4]} max_assignments_per_person = 5

We get the output:

Slot 0: Fry Cook: Pearl Cashier: Squidward Money Fondler: Spongebob Slot 1: Fry Cook: Spongebob Cashier: Mr. Crabs Money Fondler: Pearl Slot 2: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Pearl Slot 3: Fry Cook: Spongebob Cashier: Pearl Money Fondler: Squidward Slot 4: Fry Cook: Squidward Cashier: Spongebob Money Fondler: Pearl

If you play around with this you might notice two things:

The rota you get is always the same.

If there is no rota which meets the constraints, you get rubbish out!

This is due to how Cbc works. If you try GLPK, a different solver, you’ll still get a deterministic rota, but if there isn’t one meeting the constraints you’ll (probably) get back an empty rota. Solving ILP in the general case is NP-complete, so solvers use heuristics. Both Cbc and GLPK are deterministic, but they differ in heuristics.

You can check the `problem.status`

to see if it’s solved or not:

if problem.status != pulp.constants.LpStatusOptimal: raise Exception("Unable to solve problem.")

Another way to make the solver go wrong is by having a wide range of values in your problem. I’m not sure why this can cause a problem, but it does.

A simple way to introduce randomisation is to add give the solver a randomly generated objective to maximise. For example, we can assign a score to every possible allocation, and try to maximise the overall score:

import random randomise = pulp.lpSum(random.randint(0, 1) * assignments[slot, person, role] for slot in range(slots) for person in people for role in roles)

As we want the actual objective function to take priority, scale it up:

# Add objective problem += pulp.lpSum(is_assigned[person] for person in people) * 100 + randomise

Now if we run the tool multiple times, we get different rotas:

$ python3 rota.py Slot 0: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Pearl Slot 1: Fry Cook: Pearl Cashier: Spongebob Money Fondler: Mr. Crabs Slot 2: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Pearl Slot 3: Fry Cook: Squidward Cashier: Spongebob Money Fondler: Pearl Slot 4: Fry Cook: Squidward Cashier: Pearl Money Fondler: Spongebob $ python3 rota.py Slot 0: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Pearl Slot 1: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Mr. Crabs Slot 2: Fry Cook: Spongebob Cashier: Squidward Money Fondler: Pearl Slot 3: Fry Cook: Pearl Cashier: Squidward Money Fondler: Spongebob Slot 4: Fry Cook: Squidward Cashier: Spongebob Money Fondler: Pearl

The downside to this approach is that we might accidentally generate a random objective which is really hard to maximise, making the solver do a lot of work when all we really want is an arbitrary solution.

The GOV.UK support rota is a bit more complex than the example above. A typical rota runs for 12 weeks, with 1 week being 1 slot, in the above parlance. There are two types of roles, and constraints about who can occupy which roles:

**In-hours support roles:***Primary in-hours*, must have been secondary in-hours at least three times.*Secondary in-hours*, must have been shadow at least two times.*Shadow*, must not have shadowed twice before. This role is optional.

**Out-of-hours support roles:***Primary on-call*, no special requirements.*Secondary on-call*, must have been primary on-call at least three timesThere’s an asymmetry there: the primary in-hours needs to be experienced, but the opposite is the case for on-call roles. This is intentional! If the primary on-call were more experienced, they would resolve every issue themselves and the less experienced one would never get to learn anything.

.

There are separate pools for each type: there are some people who can do in-hours support, some people who can do out-of-hours support, and some people who can do both.

To ensure individuals and teams aren’t over-burdened with support roles, there are some constraints about when people can be scheduled:

- Someone can’t be on in-hours support in two adjacent weeks.
- Two people on in-hours support in the same week (or adjacent weeks) can’t be on the same team.

And there is also a limit on the number of in-hours and out-of-hours roles someone can do across the entire rota.

The objective function is a bit more complex too:

- As above, we want to maximise the number of people on the rota.
- We want to maximise the number of weeks when the secondary in-hours has done it fewer than three times.
- We want to maximise the number of weeks when the primary out-of-hours has done it fewer than three times.
- We want to maximise the number of weeks with a shadow.

I won’t go through all of the constraints, as they’re mostly more of the same, but this is an example of a particularly interesting constraint, as it’s pretty hard to implement.

The logic here is simple, but the language of ILP is very limited: you can’t directly express `if...then`

-style constraints between variables. Now, this is fine if we want to limit the primary in-hours role to people who have been secondary in-hours at least three times *before this rota period*, as we can statically determine that:

But that’s too restrictive. If someone has been secondary in-hours two times before the start of the rota, and is secondary in-hours in one week, they should be able to be primary in-hours in subsequent weeks.

To work around this we’ll need some auxiliary variables.

Firstly, let’s record how many times someone has been a secondary at the start of each slot:

\[ \begin{split} S_{tp} = \begin{cases} \text{the number of times person }p\text{ has been a secondary before the start of this rota},&\text{ if }t = 0\\ S_{t-1,p} + A_{t-1,p,\text{secondary}},&\text{otherwise} \end{cases} \end{split} \]Unlike previous variables we’ve seen, this is not a binary variable. But it is still an integral variable. Translating the above into ILP constraints is straightforward:

\[ \forall p \in \mathcal P \text{, } S_{0,p} = \text{the number of times person }p\text{ (etc)} \] \[ \forall t \geqslant 1 \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } S_{tp} = S_{t-1,p} + A_{t-1,p,\text{secondary}} \]Now we can use a trick I found to encode conditionals in ILP. The trick is to introduce an auxiliary variable, \(D \in \{0,1\}\), and use constraints to ensure that \(D = 0\) when the condition goes one way, and \(D = 1\) when it goes the other.

Here is how we encode `if X > k then Y >= 0 else Y <= 0`

, where `k`

is constant:

Here \(X\) and \(Y\) are the ILP variables from our conditional, \(D\) is the auxiliary variable we introduced, and \(m\) is some large constant, way bigger than the possible maximum values of \(X\) or \(Y\). Let’s walk through this, firstly here’s the case where \(D = 0\):

\[ \begin{align} 0 &\lt X - k \\ 0 &\leqslant Y \\ X - k &\leqslant m \\ Y &\leqslant m \end{align} \]Because \(m\) is a large constant, the bottom two constraints are trivially true, so they can be removed. With a little rearranging, we have:

\[ \begin{align} k &\lt X \\ 0 &\leqslant Y \\ \end{align} \]So if \(D = 0\), \(X\) is strictly greater than \(k\) (the condition is true), and \(Y \geqslant 0\). That’s the true branch sorted!

Now let’s look at the \(D = 1\) branch:

\[ \begin{align} 0 &\lt X - k + m \\ 0 &\leqslant Y + m \\ X - k &\leqslant 0 \\ Y &\leqslant 0 \end{align} \]Because \(m\) is a large constant, this time we can get rid of the first two constraints. With a little rearranging, we get:

\[ \begin{align} X &\leqslant k \\ Y &\leqslant 0 \end{align} \]So if \(D = 1\), \(X\) is not strictly greater than \(k\) and \(Y\) is at most zero. Remember, the real “\(Y\)” we’re using is an \(A_{tpr}\) value, which is a binary value, so the overall effect is to specify that it must be zero. Adding a constraint \(Y \geqslant 0\) would do the same job.

Each conditional needs a fresh \(D\) variable. So adding these conditionals in results in a lot of extra variables and constraints:

\[ \begin{alignat*}{4} &\forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } & 0 &\lt S_{tp} - 2 + 999 \times D_{tp} \\ &\forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } &0 &\leqslant A_{tp,\text{primary}} + 999 \times D_{tp} \\ &\forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } &S_{tp} - 2 &\leqslant 999 \times (1 - D_{tp}) \\ &\forall t \in \mathcal T \text{, } \forall p \in \mathcal P \text{, } &A_{tp,\text{primary}} &\leqslant 999 \times (1 - D_{tp}) \end{alignat*} \]Here 2 has been substituted for \(k\), as someone needs to have been a secondary at least three times to be a primary; and 999 has been substituted for \(m\), which is larger than the number of secondary shifts someone could actually have done.

Let’s cover one more type of constraint: not over-burdening teams by taking all of their members away to be on support at once. This one is pretty simple, but does require a bit more information about the people, specifically, what team they’re on.

Given a function \(team : \mathcal P \mapsto 2^{\mathcal P}\), which gives the set of people on the same team as another, then: for every pair of slots and people, there should be no overlap in the in-hours assignments if the two people are on the same team:

\[ \forall t \in \mathcal T \text{, } \forall p_1 \in \mathcal P \text{, } \forall p_2 \neq p_1 \in team(p_1) \text{, } \\ \forall r_1 \in \{\text{primary}, \text{secondary}, \text{shadow}\} \text{, } \\ \forall r_2 \in \{\text{primary}, \text{secondary}, \text{shadow}\} \text{, } \\ A_{t,p_1,r_1} + A_{t,p_2,r_2} \leqslant 1 \]My GOV.UK rota generator is on GitHub, and also on Heroku as The Incredible Rota Machine.

I’ve timed it on my laptop by running it repeatedly overnight, and found that the time to generate a rota varies between about 10s and 15m, but the median is about 30s. I expect it’ll be slower on Heroku, though.

It’s already paying off, I saved the person who usually puts together the rota an hour and a half! A new rota is needed every quarter, and it took me three and a half days to make, so it’ll pay for itself in a mere four and a half years!

It was a fun project, and a neat thing to do in firebreak—the one-week “do whatever you want as long as it’s useful” gap we have between quarters—but probably not worth it if you’re looking to save a bit of time.

]]>The C standard bakes in enough details about pointers such that the amount of memory a C program can access (even on a hypothetical infinite-memory machine) is bounded and statically known. Access to an unbounded amount of memory is necessary (but not sufficient) for Turing completeness. Therefore C is not Turing complete.

This is an argument about the

*specification*of C, not any particular*implementation*. The fact that no real machine has unbounded memory is totally irrelevant.This is not a criticism of C.

A friend told me that C isn’t actually Turing-complete due to the semantics of pointers, so I decided to dig through the (C11) spec to find evidence for this claim. The two key bits are 6.2.6.1.4 and 6.5.9.5:

Values stored in non-bit-field objects of any other object type consist of

`n × CHAR_BIT`

bits, where`n`

is the size of an object of that type, in bytes. The value may be copied into an object of type`unsigned char [n]`

(e.g., by`memcpy`

); the resulting set of bytes is called the object representation of the value. Values stored in bit-fields consist of`m`

bits, where`m`

is the size specified for the bit-field. The object representation is the set of`m`

bits the bit-field comprises in the addressable storage unit holding it. Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

The important bit is the use of the definite article in the first sentence, “where `n`

is **the** size of an object of that type”, this means that all types have a size which is known statically.

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Pointers to distinct objects of the same typeInterestingly, you could have a distinct heap for every type, with overlapping pointer values. And this is totally fine according to the spec! This doesn’t help you, however, because the number of types is finite: they’re specified in the text of the program, which is necessarily finite.

compare unequal. As pointers are fixed in size, this means that there’s only a finite number of them. You can take a pointer to any object “The unary `&`

operator yields the address of its operand.”, first sentence of 6.5.3.2.3.

, therefore there are a finite number of objects that can exist at any one time!

However, C is slightly more interesting than a finite-state machine. We have one more mechanism to store values: the return value of a function! Fortunately, the C spec doesn’t impose a maximum stack depth“Recursive function calls shall be permitted, both directly and indirectly through any chain of other functions.”, 6.5.2.2.11, nothing else is said on the matter.

, and so we can in principle implement a pushdown automata.

Just an interesting bit of information about C, because it’s so common to see statements like “because C is Turing-complete…”. Of course, on a real computer, nothing is Turing-complete, but C doesn’t even manage it in theory.

In a discussion about this on Twitter, the possibility of doing some sort of virtual memory shenanigans to make a pointer see different things depending on its context of use came up. I believe that this is prohibited by the semantics of object lifetimes (6.2.4.2):

The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

The lifetime for heap-allocated objects is from the allocation until the deallocation (7.22.3.1):

The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

I had a fun discussion on IRC, where someone argued that the definition of pointer equality does not mention the object representation, therefore the fixed object representation size is irrelevant! Therefore, pointers could have extra information somehow which is not part of the object representation.

It took a while to resolve, but I believe the final sentence of the object representation quote and the first clause of the pointer equality quote, together with the fact that pointers are values, resolves this:

- Pointers are values.
Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

- Points (1) and (2) mean that pointers with the same object representation compare equal.
Two pointers compare equal if and only if…

- The “only if” in (4) means that if two pointers compare equal, then the rest of the rules apply.
- Points (3) and (5) mean that two pointers with the same object representation compare equal, and therefore point to the same object (or are both null pointers, etc).

This means that there cannot be any further information that what is stored in the object representation.

Interestingly, I believe this forbids something I initially thought to be the case: I say in a footnote that different types could have different heaps. They *could*, but that doesn’t let you use the same object representation for pointers of different types!

Amazon Simple Notification Service (SNS) lets you set up “topics”, subscribe to them through a variety of protocols (including SMS and email), and send a message to a topic by hitting a web endpoint. This seemed the simplest way to get my computer to text me.

You’ll need an AWS account, and you’ll also need to be okay with SNS not being free. Fortunately, unless you’re going to be sending hundreds of notifications, it’s pretty cheap. Then you need to create an SNS topic and add subscribers to it, which you can do through the AWS web interface.

I set up the SNS topic and SMS notifications with Terraform, a tool for provisioning infrastructure. Here’s a self-contained Terraform config for an SNS topic with SMS notifications:

locals { "phone" = "your phone number" "access_key" = "your aws access key" "secret_key" = "your aws secret key" } provider "aws" { access_key = "${locals.access_key}" secret_key = "${locals.secret_key}" region = "eu-west-1" } resource "aws_sns_topic" "topic_name" { name = "topic-name" } resource "aws_sns_topic_subscription" "topic_name_sms" { topic_arn = "${aws_sns_topic.topic_name.arn}" protocol = "sms" endpoint = "${locals.phone}" }

In my actual Terraform configuration, the phone number and keys are in a file which isn’t checked into the repository. Unfortunately Terraform can’t set up email subscriptions, as they need to be manually confirmed. So I had to set that up via the AWS web interface.

You can send test messages through the AWS web interface, so try that to make sure everything is working.

SNS exposes a web endpoint, so the “simplest” way to send a message to your topic would be to `curl`

that. I decided to use the excellent boto3 library for Python instead.

I quickly whipped together this script, which I store in `~/bin/aws-sns`

:

#! /usr/bin/env nix-shell #! nix-shell -i python3 -p python3Packages.boto3 ''' A script to push a message from stdin to a SNS topic. ''' import argparse import boto3 import sys arg_parser = argparse.ArgumentParser(description=__doc__) arg_parser.add_argument( '-t', dest='topic', required=True, help='Topic ARN.') arg_parser.add_argument( '-s', dest='subject', required=True, help='Subject for email.') arg_parser.add_argument( '-R', dest='region', required=False, help='Region to use.', default='eu-west-1') parsed_args = arg_parser.parse_args() # boto3 checks for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env # vars automatically. client = boto3.client('sns', region_name=parsed_args.region) # message body is stdin message = sys.stdin.read() response = client.publish( TopicArn=parsed_args.topic, Subject=parsed_args.subject, Message=message ) print('Message ID: %s' % response['MessageId'])

This is a nix-shell script, which fetches boto3 automatically when invoked. If you’re not a nix user, you’d do the usual virtualenv/source/pip dance.

You’ll need to create a user in the AWS web interface with permissions to poke SNS, and note down their access key and secret key. With those keys, and the ARN of your SNS topic, you should be able to send a message from the command line:

$ export AWS_ACCESS_KEY_ID="foo" $ export AWS_SECRET_ACCESS_KEY="bar" $ echo "Hello, world" | aws-sns -t "baz" -s "Test Message"

Now we have the alerting, so we just need the monitoring. Firstly we need a script to check whatever condition we care about (zpool status in my case), and to call the SNS script if it’s not good.

Here’s a self-contained zpool script:

#!/usr/bin/env bash export AWS_ACCESS_KEY_ID="foo" export AWS_SECRET_ACCESS_KEY="bar" ZFS_TOPIC_ARN="baz" if [[ "`zpool status -x`" != "all pools are healthy" ]]; then zpool status | aws-sns -t "$ZFS_TOPIC_ARN" -s "zfs zpool status" fi

The final piece of the puzzle is a systemd timer (or cronjob, whatever your system uses) to periodically run the script. I have mine run every 12 hours. Here’s the service definition from my NixOS config:

systemd.timers.monitoring-scripts = { wantedBy = [ "timers.target" ]; timerConfig = { OnCalendar = "0/12:00:00"; }; }; systemd.services.monitoring-scripts = { description = "Run monitoring scripts"; serviceConfig.WorkingDirectory = "/home/barrucadu/monitoring-scripts"; serviceConfig.ExecStart = "${pkgs.zsh}/bin/zsh --login -c ./monitor.sh"; serviceConfig.User = "barrucadu"; serviceConfig.Group = "users"; };

Which generates this systemd timer:

[Unit] [Timer] OnCalendar=0/12:00:00

And this unit (ignore the scary nix paths):

[Unit] Description=Run monitoring scripts [Service] Environment="LOCALE_ARCHIVE=/nix/store/vg0s4sl74f5ik64wrrx0q9n6m48vvmgs-glibc-locales-2.26-131/lib/locale/locale-archive" Environment="PATH=/nix/store/cb3slv3szhp46xkrczqw7mscy5mnk64l-coreutils-8.29/bin:/nix/store/364b5gkvgrm87bh1scxm5h8shp975n0r-findutils-4.6.0/bin:/nix/store/s63b2myh6rxfl4aqwi9yxd6rq66djk33-gnugrep-3.1/bin:/nix/store/navldm477k3ar6cy0zlw9rk43i459g69-gnused-4.4/bin:/nix/store/f9dbl8y4zjgr81hs3y3zf187rqv83apz-systemd-237/bin:/nix/store/cb3slv3szhp46xkrczqw7mscy5mnk64l-coreutils-8.29/sbin:/nix/store/364b5gkvgrm87bh1scxm5h8shp975n0r-findutils-4.6.0/sbin:/nix/store/s63b2myh6rxfl4aqwi9yxd6rq66djk33-gnugrep-3.1/sbin:/nix/store/navldm477k3ar6cy0zlw9rk43i459g69-gnused-4.4/sbin:/nix/store/f9dbl8y4zjgr81hs3y3zf187rqv83apz-systemd-237/sbin" Environment="TZDIR=/nix/store/brib029xs79az5vhjd5nhixp1l39ni31-tzdata-2017c/share/zoneinfo" ExecStart=/nix/store/77bsskn86yf6h11mx96xkxm9bqv42kqg-zsh-5.5.1/bin/zsh --login -c ./monitor.sh Group=users User=barrucadu WorkingDirectory=/home/barrucadu/monitoring-scripts

The only thing left to do was to test the whole set-up by simulating a hardware failure.

I powered off nyarlathotep, unplugged a drive, and booted it back up again. I then ran the monitoring script directly, to ensure that it worked, and then waited until midnight (which was closer than noon, at the time I was doing this) to check that the timer worked.

Both SMSes and emails came through:

]]>This got me thinking about *market values*. If I want to see the current market value of all my assets, I need to convert them all to the same currency, using a recent exchange rate. So I now have a script to fetch, once a day, exchange rates between £ and everything else:

P 2018-05-30 BTC £5501.58 P 2018-05-30 ETH £413.01 P 2018-05-30 LTC £87.85 P 2018-05-30 EUR £0.8775 P 2018-05-30 JPY £0.0069 P 2018-05-30 USD £0.7531 P 2018-05-30 VANEA £210.24

My script exports market values to influxdb, so I can see how the market value of my assets (in £) has changed over time. Great!

But what if I want to see the market value in a currency other than £? Like USD, for instance? The problem is that I have all these exchange rates:

But I don’t have, say, the exchange rate from EUR to USD.

Well it turns out that the reflexive-symmetric-transitive closure of that graph is just the thing I want! It looks pretty nasty with 7 currencies, so here it is with just 3:

Let’s see how to calculate those `?`

s.

I could pull in a functional graph library, but the graphs I’m dealing with are so small that I may as well just implement the few operations I need myself.

A graph is essentially a function `node -> node -> Maybe label`

:

import Data.Map (Map) import qualified Data.Map as M type Graph node label = Map node (Map node label)

We need an empty graph and, given a graph, we need to be able to add nodes and edges. As our nodes are the keys in the map, they need to be orderable.

-- | A graph with no nodes or edges. empty :: Ord n => Graph n l empty = M.empty -- | Add a node to a graph. addNode :: Ord n => n -> Graph n l -> Graph n l addNode n = M.insertWith (\_ old -> old) n M.empty

We don’t allow duplicate edges, as that means we have two exchange rates between the same pair of currencies, which doesn’t make much sense. So adding edges is a little more involved, as the edge might already exist:

-- | Add an edge to a graph, combining edges if they exist. -- -- If the source node doesn't exist, does not change the graph. addEdge :: Ord n => (l -> l -> l) -- ^ Function to combine edge labels. -> n -- ^ Source node. -> n -- ^ Target node. -> l -- ^ New label. -> Graph n l -> Graph n l addEdge combine from to label graph = case M.lookup from graph of Just edges -> let edges' = M.insertWith combine to label edges in M.insert from edges' graph Nothing -> graph

Ok, so we can represent our currency graph. Now we need to compute the reflexive-symmetric-transitive closure.

Reflexivity lets us go from a currency to itself:

-- | Take the reflexive closure by adding edges with the given label -- where missing. reflexiveClosure :: Ord n => l -> Graph n l -> Graph n l reflexiveClosure label graph = foldr (.) id [ addEdge (\_ old -> old) nA nA label | nA <- M.keys graph ] graph

If we know a exchange rate from A to B, symmetry gives us an exchange rate from B to A:

-- | Take the symmetric closure by adding new edges, transforming -- existing labels. symmetricClosure :: Ord n => (l -> l) -> Graph n l -> Graph n l symmetricClosure mk graph = foldr (.) id [ addEdge (\_ old -> old) nB nA (mk lAB) | (nA, edges) <- M.assocs graph , (nB, lAB) <- M.assocs edges ] graph

If we know an exchange rate from A to B, and from B to C, transitivity gives us an exchange rate from A to C:

-- | Take the transitive closure by adding new edges, combining -- existing labels. transitiveClosure :: (Ord n, Eq l) => (l -> l -> l) -> Graph n l -> Graph n l transitiveClosure combine = fixEq step where fixEq f = find . iterate f where find (a1:a2:as) | a1 == a2 = a1 | otherwise = find (a2:as) step graph = foldr (.) id [ addEdge (\_ old -> old) nA nC (combine lAB lBC) | (nA, edges) <- M.assocs graph , (nB, lAB) <- M.assocs edges , (nC, lBC) <- M.assocs (M.findWithDefault M.empty nB graph) ] graph

Exchange rates have three properties which we can make use of:

Any currency has an exchange rate with itself of 1.

If we have an exchange rate of

`x`

from A to B, then the rate from B to A is`1/x`

.If we have an exchange rate of

`x`

from A to B, and an exchange rate of`y`

from B to C, then the rate from A to C is`x*y`

.

So, given our graph of exchange rates, we can fill in the blanks like so:

-- | Fill in the blanks in an exchange rate graph. completeRates :: (Ord n, Eq l, Fractional l) => Graph n l -> Graph n l completeRates = transitiveClosure (*) . symmetricClosure (1/) . reflexiveClosure 1

There’s also a fourth property we can assume in reality:

- Any two paths between the same two currencies work out to the same exchange rate.

Otherwise we could make a profit by going around in a circle, and I’m sure someone would have noticed that already and made a lot of money. In our implementation however, we can’t assume that. Exchange rates available online have limited precision, and rounding errors will introduce more problems. But in general things will be close, so it doesn’t matter too much from the perspective of getting a rough idea of our personal finances.

So now I can look at my total assets in yen and feel like a millionaire:

]]>I do not like mocking, and think it often does more harm than good unless you are very careful about what your test is actually testing.

Let’s say your program involves loading some data from disk, and you use a library to do this loading. Let’s say that there are a few ways in which a file can be invalid, and these are each signalled by the library raising a different exception.

Your code might look like this:

def calculate_thing ExternalLibrary::Reader.new("data_file").frobnicate rescue ExternalLibrary::MalformedData, ExternalLibrary::UnsupportedExtension nil end

And your test might look like this:

def calculate_thing_handles_file_errors errors = %w(ExternalLibrary::MalformedData ExternalLibrary::UnsupportedExtension) errors.each do |err| ExternalLibrary::Reader.any_instance.stubs(:frobnicate).raises(err.constantize) assert_nil calculate_thing end end

This looks good: you’re catching exceptions in your program, and your test is throwing those and checking that they are handled. But what is this *really* testing?

The test is obviously correct, which isn’t necessarily a bad thing as it guards against the code changing, but does it *really* test that you handle errors from the external library? I don’t think so. If a new version of `ExternalLibrary`

comes along and adds a third exception type, this test will not help you.

This test guards against the exception list in the code being changed, but does *not* check that all errors from the external library are handled.

The main problem with mocking is that it is very easy to write a reasonable test, and then to derive more confidence from it than you should.

Whenever you artificially change the behaviour of something, you need to be very clear about what your test is actually testing. It is much better to avoid the change if possible, possibly at the price of a more complex (but more realistic) test.

There’s a lesser problem that it’s easy to write a mock which doesn’t exercise all the behaviour your program expects (imagine `calculate_thing`

handled the two exceptions differently, but your mock only threw one of them, for example). This problem can be overcome with branch coverage.

It’s also something that dejafu does not do.

I’m going to use the “stores are transitively visible” litmus test as a running example. Here it is:

import qualified Control.Monad.Conc.Class as C import Test.DejaFu.Internal import Test.DejaFu.SCT import Test.DejaFu.SCT.Internal.DPOR import Test.DejaFu.Types import Test.DejaFu.Utils storesAreTransitivelyVisible :: C.MonadConc m => m (Int, Int, Int) storesAreTransitivelyVisible = do x <- C.newCRef 0 y <- C.newCRef 0 j1 <- C.spawn (C.writeCRef x 1) j2 <- C.spawn (do r1 <- C.readCRef x; C.writeCRef x 1; pure r1) j3 <- C.spawn (do r2 <- C.readCRef y; r3 <- C.readCRef x; pure (r2,r3)) (\() r1 (r2,r3) -> (r1,r2,r3)) <$> C.readMVar j1 <*> C.readMVar j2 <*> C.readMVar j3

I picked this one because it’s kind of arbitrarily complex. It’s a small test, but it’s for the relaxed memory implementation, so there’s a lot going on. It’s a fairly dense test.

I’m now going to define a metric of trace complexity which I’ll justify in a moment:

complexity :: Trace -> (Int, Int, Int, Int) complexity = foldr go (0,0,0,0) where go (SwitchTo _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x+1, y, z) go (Start _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y+1, z) go (Continue, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y, z+1) go (SwitchTo _, _, _) (w, x, y, z) = (w, x+1, y, z) go (Start _, _, _) (w, x, y, z) = (w, x, y+1, z) go (Continue, _, _) (w, x, y, z) = (w, x, y, z+1)

Using the `183-shrinking`

branch, we can now get the first trace for every distinct result, along with its complexity:

results :: Way -> MemType -> IO () results way memtype = do let settings = set lequality (Just (==)) $ fromWayAndMemType way memtype res <- runSCTWithSettings settings storesAreTransitivelyVisible flip mapM_ res $ \(efa, trace) -> putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))

Here are the results for systematic testing:

λ> results (systematically defaultBounds) SequentialConsistency Right (1,0,1) S0------------S1---S0--S2-----S0--S3-----S0-- (0,0,7,24) Right (0,0,1) S0------------S2-----S1---S0---S3-----S0-- (0,0,6,24) Right (0,0,0) S0------------S2-P3-----S1---S0--S2----S0--- (0,1,6,23) Right (1,0,0) S0------------S3-----S1---S0--S2-----S0--- (0,0,6,24) λ> results (systematically defaultBounds) TotalStoreOrder Right (1,0,1) S0------------S1---S0--S2-----S0--S3-----S0-- (0,0,7,24) Right (0,0,1) S0------------S1-P2-----S1--S0---S3-----S0-- (0,1,6,23) Right (0,0,0) S0------------S1-P2---P3-----S1--S0--S2--S0--- (0,2,6,22) Right (1,0,0) S0------------S1-P3-----S1--S0--S2-----S0--- (0,1,6,23) λ> results (systematically defaultBounds) PartialStoreOrder Right (1,0,1) S0------------S1---S0--S2-----S0--S3-----S0-- (0,0,7,24) Right (0,0,1) S0------------S1-P2-----S1--S0---S3-----S0-- (0,1,6,23) Right (0,0,0) S0------------S1-P2---P3-----S1--S0--S2--S0--- (0,2,6,22) Right (1,0,0) S0------------S1-P3-----S1--S0--S2-----S0--- (0,1,6,23)

Pretty messy, right? Here’s the results for *random* testing:

λ> results (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0-----P1-P0----P2-P1-P0-P3-P1-S2-P3--P0-P3-P0-P3-S2-P0-S2-P0--P2-S0- (0,15,5,9) Right (0,0,1) S0-------P2-P1-P2-P0--P2-P0-P1-P0---S2-P3-P0-P2-S3---P1-S3-S0-- (0,12,5,12) Right (1,0,0) S0------------S3-----S1-P2-P1-P0--S2---P1-S0--- (0,4,5,20) Right (0,0,0) S0---------P2-P0--P3-P0-S3--P2-P3-P2--P3-S2-S1--P0---- (0,9,4,15) λ> results (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0- (1,13,6,11) Right (0,0,1) S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0-- (0,8,6,16) Right (0,0,0) S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0---- (2,10,6,14) λ> results (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0- (1,13,6,11) Right (0,0,1) S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0-- (0,8,6,16) Right (0,0,0) S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0---- (2,10,6,14)

Yikes!

The complexity metric I defined counts four things:

- The number of relaxed-memory commit actions
- The number of pre-emptive context switches
- The number of non-pre-emptive context switches
- The number of continues

I would much rather read a long trace where the only context switches are when threads block, than a short one which is rapidly jumping between threads. So, given two equivalent traces, I will always prefer the one with a lexicographically smaller complexity-tuple.

The key idea underpinning trace simplification is that dejafu can tell when two scheduling decisions can be swapped without changing the behaviour of the program. I talked about this idea in the Using Hedgehog to Test Déjà Fu memo. So we can implement transformations which are guaranteed to preserve semantics *without needing to verify this by re-running the test case*.

Although we don’t need to re-run the test case at all, the `183-shrinking`

branch currently does, but only once at the end after the minimum has been found. This is because it’s easier to generate a simpler sequence of scheduling decisions and use dejafu to produce the corresponding trace than it is to produce a simpler trace directly. This is still strictly better than a typical shrinking algorithm, which would re-run the test case after *each* shrinking step, rather than only at the end.

Rather than drag this out, here’s what those random traces simplify to:

resultsS :: Way -> MemType -> IO () resultsS way memtype = do let settings = set lsimplify True . set lequality (Just (==)) $ fromWayAndMemType way memtype res <- runSCTWithSettings settings storesAreTransitivelyVisible flip mapM_ res $ \(efa, trace) -> putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0----------P1---S2--P3-----S0---S2---S0--- (0,2,5,22) Right (0,0,1) S0----------P2-P1-P2-P1--S0---S2---S3-----S0--- (0,4,5,20) Right (1,0,0) S0------------S3-----S1---S0--S2----P0--- (0,1,5,23) Right (0,0,0) S0------------S3--P2-----S3---S1--P0---- (0,2,4,22) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0----------P1---S2-----S0----S3-----S0-- (0,1,5,23) Right (0,0,1) S0----------P1-P2-----S0--S1--S0---S3-----S0-- (0,2,6,22) Right (0,0,0) S0----------P2--P3-----S0--S2---S1--P0---- (0,3,4,21) λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0----------P1---S2-----S0----S3-----S0-- (0,1,5,23) Right (0,0,1) S0----------P1-P2-----S0--S1--S0---S3-----S0-- (0,2,6,22) Right (0,0,0) S0----------P2--P3-----S0--S2---S1--P0---- (0,3,4,21)

This is much better.

There are two simplification phases: a preparation phase, which puts the trace into a normal form and prunes unnecessary commits; and an iteration phase, which repeats a step function until a fixed point is reached (or the iteration limit is).

The preparation phase has two steps: first we put the trace into *lexicographic normal form*, then we prune unnecessary commits.

We put a trace in lexicographic normal form by sorting by thread ID, where only independent actions can be swapped:

lexicoNormalForm :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] lexicoNormalForm memtype = go where go trc = let trc' = bubble initialDepState trc in if trc == trc' then trc else go trc' bubble ds (t1@(tid1, ta1):t2@(tid2, ta2):trc) | independent ds tid1 ta1 tid2 ta2 && tid2 < tid1 = bgo ds t2 (t1 : trc) | otherwise = bgo ds t1 (t2 : trc) bubble _ trc = trc bgo ds t@(tid, ta) trc = t : bubble (updateDepState memtype ds tid ta) trc

If simplification only put traces into lexicographic normal form, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0-----------P1---S2--P0--S2--P0-P3----P0-- (0,5,3,19) Right (0,0,1) S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0-- (0,8,4,16) Right (1,0,0) S0------------S3----P1--P0--S1-S2----P0--- (0,3,4,21) Right (0,0,0) S0------------S2-P3--P2----S3--P1--P0---- (0,4,3,20) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0-- (1,5,5,19) Right (0,0,1) S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0-- (0,7,5,17) Right (0,0,0) S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0---- (2,6,5,18) λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0-- (1,5,5,19) Right (0,0,1) S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0-- (0,7,5,17) Right (0,0,0) S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0---- (2,6,5,18)

These are better than they were, but we can do better still.

After putting the trace into lexicographic normal form, we delete any commit actions which are followed by any number of independent actions and then a memory barrier:

dropCommits :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] dropCommits SequentialConsistency = id dropCommits memtype = go initialDepState where go ds (t1@(tid1, ta1@(CommitCRef _ _)):t2@(tid2, ta2):trc) | isBarrier (simplifyAction ta2) = go ds (t2:trc) | independent ds tid1 ta1 tid2 ta2 = t2 : go (updateDepState memtype ds tid2 ta2) (t1:trc) go ds (t@(tid,ta):trc) = t : go (updateDepState memtype ds tid ta) trc go _ [] = []

Such commits don’t affect the behaviour of the program at all, as all buffered writes gets flushed when the memory barrier happens.

If simplification only did the preparation phase, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0-----------P1---S2--P0--S2--P0-P3----P0-- (0,5,3,19) Right (0,0,1) S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0-- (0,8,4,16) Right (1,0,0) S0------------S3----P1--P0--S1-S2----P0--- (0,3,4,21) Right (0,0,0) S0------------S2-P3--P2----S3--P1--P0---- (0,4,3,20) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0-------P1---S2--P0-----P2--P0--S2-S3----P0-- (0,5,4,19) ^-- better than just lexicoNormalForm Right (0,0,1) S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0-- (0,7,5,17) Right (0,0,0) S0-----------P2---P3--P0-S2--S3--P1--P0---- (0,5,3,19) ^-- better than just lexicoNormalForm λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0-------P1---S2--P0-----P2--P0--S2-S3----P0-- (0,5,4,19) ^-- better than just lexicoNormalForm Right (0,0,1) S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0-- (0,7,5,17) Right (0,0,0) S0-----------P2---P3--P0-S2--S3--P1--P0---- (0,5,3,19) ^-- better than just lexicoNormalForm

The iteration phase attempts to reduce context switching by pushing actions forwards, or pulling them backwards, through the trace.

If we have the trace `[(tid1, act1), (tid2, act2), (tid1, act3)]`

, where `act2`

and `act3`

are independent, the “pull back” transformation would re-order that to `[(tid1, act1), (tid1, act3), (tid2, act2)]`

.

In contrast, if `act1`

and `act2`

were independent, the “push forward” transformation would re-order that to `[(tid2, act2), (tid1, act1), (tid1, act3)]`

. The two transformations are almost, but not quite opposites.

Pull-back walks through the trace and, at every context switch, looks forward to see if there is a single action of the original thread it can put before the context switch:

pullBack :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] pullBack memtype = go initialDepState where go ds (t1@(tid1, ta1):trc@((tid2, _):_)) = let ds' = updateDepState memtype ds tid1 ta1 trc' = if tid1 /= tid2 then maybe trc (uncurry (:)) (findAction tid1 ds' trc) else trc in t1 : go ds' trc' go _ trc = trc findAction tid0 = fgo where fgo ds (t@(tid, ta):trc) | tid == tid0 = Just (t, trc) | otherwise = case fgo (updateDepState memtype ds tid ta) trc of Just (ft@(ftid, fa), trc') | independent ds tid ta ftid fa -> Just (ft, t:trc') _ -> Nothing fgo _ _ = Nothing

Push-forward walks through the trace and, at every context switch, looks forward to see if the last action of the original thread can be put at its next execution:

pushForward :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] pushForward memtype = go initialDepState where go ds (t1@(tid1, ta1):trc@((tid2, _):_)) = let ds' = updateDepState memtype ds tid1 ta1 in if tid1 /= tid2 then maybe (t1 : go ds' trc) (go ds) (findAction tid1 ta1 ds trc) else t1 : go ds' trc go _ trc = trc findAction tid0 ta0 = fgo where fgo ds (t@(tid, ta):trc) | tid == tid0 = Just ((tid0, ta0) : t : trc) | independent ds tid0 ta0 tid ta = (t:) <$> fgo (updateDepState memtype ds tid ta) trc | otherwise = Nothing fgo _ _ = Nothing

The iteration process just repeats `pushForward memtype . pullBack memtype`

.

If it only used `pullBack`

, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0-----------P1---S2---P0--S2--S0-P3-----S0-- (0,3,5,21) Right (0,0,1) S0-----------P2-P1-P2--P1--S0--S2--S0-P3-----S0-- (0,5,5,19) Right (1,0,0) S0------------S3-----S1---S0--S2----P0--- (0,1,5,23) Right (0,0,0) S0------------S2-P3---P2----S3--S1--P0---- (0,3,4,21) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0-----------P1---S2-----S0---S3-----S0-- (0,1,5,23) Right (0,0,1) S0-----------P1-P2-----S0-S1--S0---S3-----S0-- (0,2,6,22) Right (0,0,0) S0-----------P2---P3-----S0-S2--S1--P0---- (0,3,4,21) λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0-----------P1---S2-----S0---S3-----S0-- (0,1,5,23) Right (0,0,1) S0-----------P1-P2-----S0-S1--S0---S3-----S0-- (0,2,6,22) Right (0,0,0) S0-----------P2---P3-----S0-S2--S1--P0---- (0,3,4,21)

With no exception, iterating `pullBack`

is an improvement over just doing preparation.

If it only used `pushForward`

, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0-------P1---S2--P0------S2--P3----P0--- (0,4,3,20) Right (0,0,1) S0-------P2-P1-P2-P1-P0------S1-S2---S3----P0--- (0,6,4,18) Right (1,0,0) S0------------S3----P1--P0--S1-S2----P0--- (0,3,4,21) ^-- no improvement over preparation Right (0,0,0) S0------------S3--P2-----S3--P1--P0---- (0,3,3,21) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0----P1---S0---P2----P0-------S2-S3----P0-- (0,4,4,20) Right (0,0,1) S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0-- (0,6,5,18) Right (0,0,0) S0----------P2--P3--P0--S2---S3--P1--P0---- (0,5,3,19) ^-- no improvement over preparation λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0----P1---S0---P2----P0-------S2-S3----P0-- (0,4,4,20) Right (0,0,1) S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0-- (0,6,5,18) Right (0,0,0) S0----------P2--P3--P0--S2---S3--P1--P0---- (0,5,3,19) ^-- no improvement over preparation

With three exceptions, where the traces didn’t change, iterating `pushForward`

is an improvement over just doing preparation.

We’ve already seen the results if we combine them:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency Right (1,0,1) S0----------P1---S2--P3-----S0---S2---S0--- (0,2,5,22) Right (0,0,1) S0----------P2-P1-P2-P1--S0---S2---S3-----S0--- (0,4,5,20) Right (1,0,0) S0------------S3-----S1---S0--S2----P0--- (0,1,5,23) ^-- same as pullBack, which is better than pushForward Right (0,0,0) S0------------S3--P2-----S3---S1--P0---- (0,2,4,22) λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder Right (1,0,1) S0----------P1---S2-----S0----S3-----S0-- (0,1,5,23) ^-- same as pullBack, which is better than pushForward Right (0,0,1) S0----------P1-P2-----S0--S1--S0---S3-----S0-- (0,2,6,22) ^-- same as pullBack, which is better than pushForward Right (0,0,0) S0----------P2--P3-----S0--S2---S1--P0---- (0,3,4,21) λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder Right (1,0,1) S0----------P1---S2-----S0----S3-----S0-- (0,1,5,23) ^-- same as pullBack, which is better than pushForward Right (0,0,1) S0----------P1-P2-----S0--S1--S0---S3-----S0-- (0,2,6,22) ^-- same as pullBack, which is better than pushForward Right (0,0,0) S0----------P2--P3-----S0--S2---S1--P0---- (0,3,4,21)

I think what I have right now is pretty good. It’s definitely a vast improvement over not doing any simplification.

*But*, no random traces get simplified to the corresponding systematic traces, which is a little disappointing. I think that’s because the current passes just try to reduce context switches of any form, whereas really I want to reduce pre-emptive context switches more than non-pre-emptive ones.

Trying every possible interleaving will give you, in general, an exponential blow-up of the executions you need to perform as your test case grows in size. The core testing algorithm we use, a variant of dynamic partial-order reduction (DPOR)For all the gory details, see:

, attempts to reduce this blow-up. DPOR identifies actions which are *dependent*, and only tries interleavings which permute dependent actions.

Here are some examples:

It doesn’t matter which order two threads execute

`readMVar`

, for the same`MVar`

. These actions are*independent*.It does matter which order two threads execute

`putMVar`

, for the same`MVar`

. These actions are*dependent*.It doesn’t matter which order two threads execute

`putMVar`

for different`MVar`

s. These actions are*independent*.

Two actions are dependent if the order in which they are performed matters.

So the intuition behind DPOR is that most actions in a concurrent program are *independent*. DPOR won’t help you much if you have a single piece of shared state which every thread is hitting, but most concurrent programs aren’t like that. The worst case is still a terrible exponential blow-up, but the average case is much better.

The dependency relation is *core* part of Déjà Fu today. It has impacts on both performance and correctness. If it says two actions are dependent when they are not, then we may see unnecessary interleavings tried. If it says two actions are not dependent when they really are, then we may miss necessary interleavings.

Being such an important component, it must be well-tested, right? Well, sort of. The Déjà Fu testsuite mostly consists of small concurrent programs together with a list of expected outputs, testing that Déjà Fu finds all the nondeterminism in the program. This does exercise the dependency relation, but only very indirectly.

There things would have remained had I not experienced one of those coincidence-driven flashes of insight:

aherrmann opened an issue on GitHub asking how to take an execution trace and replay it.

agnishom posted a thread on /r/algorithms asking how to check the equivalence of traces where only some elements commute.

I had my idea. I can *directly* test the dependency relation like so:

- Execute a concurrent program.
- Normalise its execution trace in some way.
- “Replay” the normalised trace.
- Assert that the result is the same.

So, what is a good normal form for a trace? I tried out a few approaches here, but there was one I kept coming back to: we should shuffle around independent actions to keep the program on the main thread for as long as possible.

There are two reasons I think this works well. (1) The traces we get will be easier for a human to read, as the program will stay on its main thread and only execute another thread where necessary. (2) A Haskell program terminates when the main thread terminates, so by executing the main thread as much as possible, we may find that some actions don’t need to be executed at all.

So firstly we need to know when two actions commute. Let’s just use the dependency relation for that:

-- | Check if two actions commute. independent :: DepState -> (ThreadId, ThreadAction) -> (ThreadId, ThreadAction) -> Bool independent ds (tid1, ta1) (tid2, ta2) = not (dependent ds tid1 ta1 tid2 ta2)

The `DepState`

parameter tracks information about the history of the execution, allowing us to make better decisions. For example: while in general it matters in which order two `putMVar`

s to the same `MVar`

happen; it *doesn’t* matter if the `MVar`

is already full, as both actions will block without achieving anything.

The approach works well in practice, but has been the source of *so many* off-by-one errors. Even while writing this memo!

So now onto trace normalisation. The easiest way to do it is bubble sort, but with an additional constraint on when we can swap things:

- For every adjacent pair of items
`x`

and`y`

in the trace:- If
`x`

and`y`

commute and`thread_id y < thread_id x`

:- Swap
`x`

and`y`

.

- Swap
- Update the
`DepState`

and continue to the next pair.

- If
- Repeat until there are no more changes.

And here’s the code:

-- | Rewrite a trace into a canonical form. normalise :: [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] normalise trc0 = if changed then normalise trc' else trc' where (changed, trc') = bubble initialDepState False trc0 bubble ds flag ((x@(tid1, _)):(y@(tid2, _)):trc) | independent ds x y && tid2 < tid1 = go ds True y (x : trc) | otherwise = go ds flag x (y : trc) bubble _ flag trc = (flag, trc) go ds flag t@(tid, ta) trc = second (t :) (bubble (updateDepState ds tid ta) flag trc)

Now we need a scheduler which can play a given list of scheduling decisions. This isn’t built in, but we can make one. Schedulers look like this:

-- from Test.DejaFu.Schedule newtype Scheduler state = Scheduler { scheduleThread :: Maybe (ThreadId, ThreadAction) -> NonEmpty (ThreadId, Lookahead) -> state -> (Maybe ThreadId, state) }

A scheduler is a stateful function, which takes the previously scheduled action and the list of runnable threads, and gives back a thread to execute. We don’t care about those parameters. We just want to play a fixed list of scheduling decisions. And here is how we do that:

-- | Execute a concurrent program by playing a list of scheduling decisions. play :: MemType -> [ThreadId] -> ConcIO a -> IO (Either Failure a, [ThreadId], Trace) play = runConcurrent (Scheduler sched) where sched _ _ (t:ts) = (Just t, ts) sched _ _ [] = (Nothing, [])

Now all the background is in place, so we can test what we want to test: that an execution, and the play-back of its normalised trace, give the same result. For reasons which will become apparent in the next section, I’m going to parameterise over the normalisation function:

-- | Execute a concurrent program with a random scheduler, normalise its trace, -- execute the normalised trace, and return both results. runNorm :: ([(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]) -> Int -> MemType -> ConcIO a -> IO (Either Failure a, [ThreadId], Either Failure a, [ThreadId]) runNorm norm seed memtype conc = do let g = mkStdGen seed -- 1 (efa1, _, trc) <- runConcurrent randomSched memtype g conc let -- 2 trc' = tail ( scanl (\(t, _) (d, _, a) -> (tidOf t d, a)) (initialThread, undefined) trc ) let tids1 = map fst trc' let tids2 = map fst (norm trc') -- 3 (efa2, s, _) <- play memtype tids2 conc let truncated = take (length tids2 - length s) tids2 -- 4 pure (efa1, tids1, efa2, truncated)

There’s a lot going on here, so let’s break it down:

We execute the program with the built-in random scheduler, using the provided seed.

The trace that

`runConcurrent`

gives us is in the form`[(Decision, [(ThreadId, Lookahead)], ThreadAction)]`

, whereas we want a`[(ThreadId, ThreadAction)]`

. So this scan just changes the format. It’s a scan rather than a map because to convert a`Decision`

into a`ThreadId`

potentially requires knowing what the previous thread was.We normalise the trace, and run it again.

If the entire normalised trace wasn’t used up, then it has some unnecessary suffix (because the main thread is now terminating sooner). So we make the normalised trace easier to read by chopping off any such suffix.

Finally, we can write a little function to test using the `normalise`

function:

-- | Execute a concurrent program with a random scheduler, normalise its trace, -- execute the normalised trace, and check that both give the same result. testNormalise :: (Eq a, Show a) => Int -> MemType -> ConcIO a -> IO Bool testNormalise seed memtype conc = do (efa1, tids1, efa2, tids2) <- runNorm normalise seed memtype conc unless (efa1 == efa2) $ do putStrLn "Mismatched result!" putStrLn $ " expected: " ++ show efa1 putStrLn $ " but got: " ++ show efa2 putStrLn "" putStrLn $ "rewritten from: " ++ show tids1 putStrLn $ " to: " ++ show tids2 pure (efa1 == efa2)

And does it work? Let’s copy two example programs from the Test.DejaFu docs:

-- from Test.DejaFu example1 :: MonadConc m => m String example1 = do var <- newEmptyMVar fork (putMVar var "hello") fork (putMVar var "world") readMVar var example2 :: MonadConc m => m (Bool, Bool) example2 = do r1 <- newCRef False r2 <- newCRef False x <- spawn $ writeCRef r1 True >> readCRef r2 y <- spawn $ writeCRef r2 True >> readCRef r1 (,) <$> readMVar x <*> readMVar y

And then test them:

> testNormalise 0 TotalStoreOrder example1 True > testNormalise 0 TotalStoreOrder example2 True

According to my very unscientific method, everything works perfectly!

You can probably see where this is going: just supplying *one* random seed and *one* memory model is a poor way to test things. Ah, if only we had some sort of tool to generate arbitrary values for us!

But that’s not all: if the dependency relation is correct, then *any* permutation of independent actions should give the same result, not just the one which `normalise`

implements. So before we introduce Hedgehog and arbitrary values, let’s make something a little more chaotic:

-- | Shuffle independent actions in a trace according to the given list. shuffle :: [Bool] -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)] shuffle = go initialDepState where go ds (f:fs) (t1:t2:trc) | independent ds t1 t2 && f = go' ds fs t2 (t1 : trc) | otherwise = go' ds fs t1 (t2 : trc) go _ _ trc = trc go' ds fs t@(tid, ta) trc = t : go (updateDepState ds tid ta) fs trc

In `normalise`

, two independent actions will *always* be re-ordered if it gets us closer to the canonical form. However, in `shuffle`

, two independent actions will either be re-ordered or not, depending on the supplied list of `Bool`

.

This is much better for testing our dependency relation, as we can now get far more re-orderings which *all* should satisfy the same property: that no matter how the independent actions in a trace are shuffled, we get the same result.

I think it’s about time to bring out Hedgehog:

-- | Execute a concurrent program with a random scheduler, arbitrarily permute -- the independent actions in the trace, and check that we get the same result -- out. hog :: (Eq a, Show a) => ConcIO a -> IO Bool hog conc = Hedgehog.check . property $ do mem <- forAll Gen.enumBounded -- 1 seed <- forAll $ Gen.int (Range.linear 0 100) fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool (efa1, tids1, efa2, tids2) <- liftIO -- 2 $ runNorm (shuffle fs) seed mem conc footnote (" to: " ++ show tids2) -- 3 footnote ("rewritten from: " ++ show tids1) efa1 === efa2

Let’s break that down:

We’re telling Hedgehog that this property should hold for all memory models, all seeds, and all

`Bool`

-lists. Unlike most Haskell property-testing libraries, Hedgehog takes generator functions rather than using a typeclass. I think this is nicer.We run our program, normalise it, and get all the results just as before.

We add some footnotes: messages which Hedgehog will display along with a failure. For some reason these get displayed in reverse order.

Alright, let’s see if Hedgehog finds any bugs for us:

> hog example1 ? <interactive> failed after 3 tests and 1 shrink. ??? extra.hs ??? 82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool 83 ? hog conc = Hedgehog.check . property $ do 84 ? mem <- forAll Gen.enumBounded ? ? SequentialConsistency 85 ? seed <- forAll $ Gen.int (Range.linear 0 100) ? ? 0 86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool ? ? [ False , True ] 87 ? 88 ? (efa1, tids1, efa2, tids2) <- liftIO 89 ? $ runNorm (shuffle fs) seed mem conc 90 ? footnote (" to: " ++ show tids2) 91 ? footnote ("rewritten from: " ++ show tids1) 92 ? efa1 === efa2 ? ^^^^^^^^^^^^^ ? ? Failed (- lhs =/= + rhs) ? ? - Right "hello" ? ? + Left InternalError rewritten from: [main,main,1,main,1,2,main,2,main] to: [main,1] This failure can be reproduced by running: > recheck (Size 2) (Seed 1824012233418733250 (-4876494268681827407)) <property> False

It did! And look at that output! Magical! I must see if I can get Déjà Fu to give annotated source output like that.

Let’s look at `example1`

again:

do var <- newEmptyMVar fork (putMVar var "hello") fork (putMVar var "world") readMVar var

Oh dear, our rewritten trace is trying to execute thread `1`

immediately after the first action of the main thread. The first action of the main thread is `newEmptyMVar`

: thread `1`

doesn’t exist at that point!

Let’s change our `independent`

function to say that an action is dependent with the fork which creates its thread:

independent ds (tid1, ta1) (tid2, ta2) | ta1 == Fork tid2 = False | ta2 == Fork tid1 = False | otherwise = not (dependent ds tid1 ta1 tid2 ta2)

How about now?

> hog example1 ? <interactive> failed after 13 tests and 2 shrinks. ??? extra.hs ??? 82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool 83 ? hog conc = Hedgehog.check . property $ do 84 ? mem <- forAll Gen.enumBounded ? ? SequentialConsistency 85 ? seed <- forAll $ Gen.int (Range.linear 0 100) ? ? 0 86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool ? ? [ True , True ] 87 ? 88 ? (efa1, tids1, efa2, tids2) <- liftIO 89 ? $ runNorm (shuffle fs) seed mem conc 90 ? footnote (" to: " ++ show tids2) 91 ? footnote ("rewritten from: " ++ show tids1) 92 ? efa1 === efa2 ? ^^^^^^^^^^^^^ ? ? Failed (- lhs =/= + rhs) ? ? - Right "hello" ? ? + Left InternalError rewritten from: [main,main,1,main,1,2,main,2,main] to: [main,1] This failure can be reproduced by running: > recheck (Size 12) (Seed 654387260079025817 (-6686572164463137223)) <property> False

Well, that failing trace looks exactly like the previous error. But the parameters are different: the first error happened with the list `[False, True]`

, this requires the list `[True, True]`

. So let’s think about what happens to the trace in this case.

We start with:

`[(main, NewEmptyMVar 0), (main, Fork 1), (1, PutMVar 0)]`

.The first two actions are independent, and the flag is

`True`

, so we swap them. We now have:`[(main, Fork 1), (main, NewEmptyMVar 1), (1, PutMVar 0)]`

.The second two actions are independent, and the flag is

`True`

, so we swap them. We now have:`[(main, Fork 1), (1, PutMVar 0), (main, NewEmptyMVar 0)]`

.

We can’t actually re-order actions of the same thread, so we should never have swapped the first two. I suppose there’s another problem here, that no action on an `MVar`

commutes with creating that `MVar`

, but we should never be in a situation where that could happen. So we need another case in `independent`

:

independent ds (tid1, ta1) (tid2, ta2) | tid1 == tid2 = False | ta1 == Fork tid2 = False | ta2 == Fork tid1 = False | otherwise = not (dependent ds tid1 ta1 tid2 ta2)

Our first example program works fine now:

> hog example1 ? <interactive> passed 100 tests. True

The second is a little less happy:

> hog example2 ? <interactive> failed after 48 tests and 9 shrinks. ??? extra.hs ??? 82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool 83 ? hog conc = Hedgehog.check . property $ do 84 ? mem <- forAll Gen.enumBounded ? ? TotalStoreOrder 85 ? seed <- forAll $ Gen.int (Range.linear 0 100) ? ? 0 86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool ? ? [ False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , False ? ? , True ? ? ] 87 ? 88 ? (efa1, tids1, efa2, tids2) <- liftIO 89 ? $ runNorm (shuffle fs) seed mem conc 90 ? footnote (" to: " ++ show tids2) 91 ? footnote ("rewritten from: " ++ show tids1) 92 ? efa1 === efa2 ? ^^^^^^^^^^^^^ ? ? Failed (- lhs =/= + rhs) ? ? - Right ( False , True ) ? ? + Left InternalError rewritten from: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,main,2,-1,2,2,main,main] to: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,2,main,-1] This failure can be reproduced by running: > recheck (Size 47) (Seed 2159662051602767058 (-7857629802164753123)) <property> False

This is a little trickier. Here’s my diagnosis:

It’s an

`InternalError`

again, which means we’re trying to execute a thread which isn’t runnable.The memory model is

`TotalStoreOrder`

, and the thread we’re trying to execute is thread`-1`

, a “fake” thread used in the relaxed memory implementation. So this is a relaxed memory bug.The traces only differ in one place: where

`main, 2, -1`

is changed to`2, main, -1`

. So the issue is caused by re-ordering`main`

and thread`2`

.If the

`main`

action is a memory barrier, then thread`-1`

will not exist after it.So the

`main`

action is probably a memory barrier.

Let’s push along those lines and add a case for memory barriers to `independent`

:

independent ds (tid1, ta1) (tid2, ta2) | tid1 == tid2 = False | ta1 == Fork tid2 = False | ta2 == Fork tid1 = False | otherwise = case (simplifyAction ta1, simplifyAction ta2) of (UnsynchronisedWrite _, a) | isBarrier a -> False (a, UnsynchronisedWrite _) | isBarrier a -> False _ -> not (dependent ds tid1 ta1 tid2 ta2)

Did we get it?

> hog example2 ? <interactive> passed 100 tests. True

Great!

So, we explored the dependency relation with Hedgehog, and found three missing cases:

Two actions of the same thread are dependent.

Any action of a thread is dependent with the

`fork`

which creates that thread.Unsynchronised writes are dependent with memory barriers.

But are these *bugs*? I’m not so sure:

The dependency relation is only ever used to compare different threads.

This is technically correct, but it’s not interesting or useful.

This could be a bug. The relaxed memory implementation is pretty hairy and I’ve had a lot of problems with it in the past. Honestly, I just need to rewrite it (or campaign for Haskell to become sequentially consistent

**SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap**, M. Vollmer, R. G. Scott, M. Musuvathi, and R. R. Newton (2017)

and rip it out).

But even if not bugs, these are definitely *confusing*. The dependency relation is currently just an internal thing, not exposed to users. However, I’m planning to expose a function to normalise traces, in which case providing an `independent`

function is entirely reasonable.

So even if these changes don’t make it into `dependent`

, they will be handled by `independent`

.

**Next steps:** I’m going to get this into the test suite, to get a large number of extra example programs for free. My hacky and cobbled-together testing framework in dejafu-tests is capable of running every test case with a variety of different schedulers, so I just need to add another way it runs everything. I won’t need to touch the actual tests, just the layer of glue which runs them all, which is nice.

The only problem is that this glue is currently based on HUnit and test-framework, whereas the only integration I can find for Hedgehog is tasty-hedgehog, so I might need to switch to tasty first. As usual, the hardest part is getting different libraries to co-operate!

Hopefully I’ll find some bugs! Well, not exactly *hopefully*, but you know what I mean.

There are 100 prisoners in solitary cells. There’s a central living room with one light bulb; this bulb is initially off. No prisoner can see the light bulb from his or her own cell. Everyday, the warden picks a prisoner equally at random, and that prisoner visits the living room. While there, the prisoner can toggle the bulb if he or she wishes. Also, the prisoner has the option of asserting that all 100 prisoners have been to the living room by now. If this assertion is false, all 100 prisoners are shot. However, if it is indeed true, all prisoners are set free and inducted into MENSA, since the world could always use more smart people. Thus, the assertion should only be made if the prisoner is 100% certain of its validity. The prisoners are allowed to get together one night in the courtyard, to discuss a plan. What plan should they agree on, so that eventually, someone will make a correct assertion?

We can express this as a concurrency problem: the warden is the scheduler, each prisoner is a thread, and when the program terminates every prisoner should have visited the living room.

Let’s set up some imports:

{-# LANGUAGE RankNTypes #-} import qualified Control.Concurrent.Classy as C import Control.Monad (forever, when) import Data.Foldable (for_) import Data.List (genericLength) import Data.Maybe (mapMaybe) import qualified Data.Set as S import qualified Test.DejaFu as D import qualified Test.DejaFu.Common as D import qualified Test.DejaFu.SCT as D

Before we try to implement a solution, let’s think about how we can check if an execution corresponds to the prisoners succeeding an entering MENSA, or failing and being shot.

Prisoners are threads, and the warden is the scheduler. So if every thread (prisoner) that is forked is scheduled (taken to the room), then the prisoners are successful:

-- | Check if an execution corresponds to a correct guess. isCorrect :: D.Trace -> Bool isCorrect trc = S.fromList (threads trc) == S.fromList (visits trc) -- | Get all threads created. threads :: D.Trace -> [D.ThreadId] threads trc = D.initialThread : mapMaybe go trc where go (_, _, D.Fork tid) = Just tid go _ = Nothing -- | Get all scheduled threads visits :: D.Trace -> [D.ThreadId] visits = mapMaybe go where go (D.Start tid, _, _) = Just tid go (D.SwitchTo tid, _, _) = Just tid go _ = Nothing

So now, given some way of setting up the game and running it to completion, we can test it and print some statistics:

-- | Run the prison game and print statistics. run :: D.Way -> (forall m. C.MonadConc m => m ()) -> IO () run way game = do traces <- map snd <$> D.runSCT way D.defaultMemType game let successes = filter isCorrect traces let failures = filter (not . isCorrect) traces putStrLn (show (length traces) ++ " total attempts") putStrLn (show (length successes) ++ " successes") putStrLn (show (length failures) ++ " failures") putStrLn (show (avgvisits successes) ++ " average number of room visits per success") putStrLn (show (avgvisits failures) ++ " average number of room visits per failure") putStrLn "Sample sequences of visits:" for_ (take 5 traces) (print . visits) where avgvisits ts = sum (map (fromIntegral . numvisits) ts) / genericLength ts numvisits = sum . map count where count (_, _, D.STM _ _) = 1 count (_, _, D.BlockedSTM _) = 1 count (_, _, D.Yield) = 1 count _ = 0

I have decided to assume that a prisoner will either yield (doing nothing) or perform some STM transaction while they’re in the room, to simplify things.

A slow but simple strategy is for the prisoners to nominate a leader. Only the leader can declare to the warden that everyone has visited the room. Whenever a prisoner other than the leader visits the room, if the light is *on*, they do nothing; otherwise, if this is their first time in the room with the light off, they turn it on, otherwise they leave it. Whenever the leader enters the room, they turn the light off. When the leader has turned the light off 99 times (or `1 - num_prisoners`

times), they tell the warden that everyone has visited.

Let’s set up those algorithms:

-- | The state of the light bulb. data Light = IsOn | IsOff -- | Count how many prisoners have toggled the light and terminate -- when everyone has. leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Light -> m () leader prisoners light = go 0 where go counter = do counter' <- C.atomically $ do state <- C.readTVar light case state of IsOn -> do C.writeTVar light IsOff pure (counter + 1) IsOff -> C.retry when (counter' < prisoners - 1) (go counter') -- | Turn the light on once then do nothing. notLeader :: C.MonadConc m => C.TVar (C.STM m) Light -> m () notLeader light = do C.atomically $ do state <- C.readTVar light case state of IsOn -> C.retry IsOff -> C.writeTVar light IsOn forever C.yield

So now we just need to create a program where the leader is the main thread and everyone else is a separate thread:

-- | Most popular English male and female names, according to -- Wikipedia. name :: Int -> String name i = ns !! (i `mod` length ns) where ns = ["Oliver", "Olivia", "George", "Amelia", "Harry", "Emily"] -- | Set up the prison game. The number of prisoners should be at -- least 1. prison :: C.MonadConc m => Int -> m () prison prisoners = do light <- C.atomically (C.newTVar IsOff) for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader light)) leader prisoners light

Because these are people, not just threads, I’ve given them names. The leader is just called “main” though, how unfortunate for them.

Now we can try out our system and see if it works:

λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing }) λ> runS 1 1 total attempts 1 successes 0 failures 2.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main] λ> runS 2 5 total attempts 5 successes 0 failures 7.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main,Olivia,main] [main,Olivia,main,Olivia,main] [main,Olivia,main,Olivia,main] [main,Olivia,main,Olivia,main] [main,Olivia,main] λ> runS 3 2035 total attempts 2035 successes 0 failures 133.39066339066338 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: (big lists omitted)

This doesn’t scale well. It’s actually a really bad case for concurrency testing: every thread is messing with the same shared state, so dejafu has to try all the orderings. Not good.

Taking another look at our prisoners, we can see two things which a human would use to decide whether some schedules are redundant or not:

If we adopt any schedule other than alternating leader / non-leader, threads will block without doing anything. So we should alternate.

When a non-leader has completed their task, they will always yield. So we should never schedule a prisoner who will yield.

Unfortunately dejafu can’t really make use of (1). It could be inferred *if* dejafu was able to compare values inside `TVar`

s, rather than just seeing that there had been a write. But Haskell doesn’t let us do that without slapping an `Eq`

constraint on `writeTVar`

, which I definitely don’t want to do (although maybe having a separate `eqwriteTVar`

, `eqputMVar`

, and so on would be a nice addition).

Fortunately, dejafu *can* do something with (2). It already bounds the maximum number of times a thread can yield, so that we can test constructs like spinlocks. This is called *fair bounding*. The default bound is 5, but if we set it to 0 dejafu will just never schedule a thread which is going to yield. Here we go:

λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing, D.boundFair = Just 0 }) λ> runS 1 1 total attempts 1 successes 0 failures 2.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main] λ> runS 2 1 total attempts 1 successes 0 failures 4.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main] λ> runS 3 4 total attempts 4 successes 0 failures 7.5 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main,George,main] [main,Olivia,George,main,George,main] [main,George,main,Olivia,main] [main,George,Olivia,main,Olivia,main]

Much better! Although it still doesn’t scale as nicely as we’d like

λ> runS 4 48 total attempts 48 successes 0 failures 11.5 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main,George,main,Amelia,main] [main,Olivia,main,George,Amelia,main,Amelia,main] [main,Olivia,main,Amelia,main,George,main] [main,Olivia,main,Amelia,George,main,George,main] [main,Olivia,George,main,George,main,Amelia,main] λ> runS 5 1536 total attempts 1536 successes 0 failures 16.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main,George,main,Amelia,main,Harry,main] [main,Olivia,main,George,main,Amelia,Harry,main,Harry,main] [main,Olivia,main,George,main,Harry,main,Amelia,main] [main,Olivia,main,George,main,Harry,Amelia,main,Amelia,main] [main,Olivia,main,George,Amelia,main,Amelia,main,Harry,main] λ> runS 6 122880 total attempts 122880 successes 0 failures 21.0 average number of room visits per success NaN average number of room visits per failure Sample sequences of visits: [main,Olivia,main,George,main,Amelia,main,Harry,main,Emily,main] [main,Olivia,main,George,main,Amelia,main,Harry,Emily,main,Emily,main] [main,Olivia,main,George,main,Amelia,main,Emily,main,Harry,main] [main,Olivia,main,George,main,Amelia,main,Emily,Harry,main,Harry,main] [main,Olivia,main,George,main,Amelia,Harry,main,Harry,main,Emily,main]

The prisoners are stepping on each other’s toes and causing needless work. This is probably as good as we can do without adding some extra primitives to dejafu to optimise the case where we have an `Eq`

instance available, unfortunately.

In concurrency testing terms, six threads is actually quite a lot.

Empirical studies have found that many concurrency bugs can be exhibited with only two or three threads! Furthermore, most real-world concurrent programs don’t have every single thread operating on the same bit of shared state.

There’s another school of thought which says to just wait for three years, because by then it’s very unlikely that any single prisoner had never visited the room. In fact, we would expect each prisoner to have been to the room ten times by then, assuming the warden is fair.

By keeping track of how many days have passed, we can try this out as well:

leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Int -> m () leader prisoners days = C.atomically $ do numDays <- C.readTVar days C.check (numDays >= (prisoners - 1) * 10) notLeader :: C.MonadConc m => C.TVar (C.STM m) Int -> m () notLeader days = forever . C.atomically $ C.modifyTVar days (+1) prison :: C.MonadConc m => Int -> m () prison prisoners = do days <- C.atomically (C.newTVar 0) for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader days)) leader prisoners days

Now let’s see how these brave prisoners do (sample visit sequences omitted because they’re pretty long):

λ> let runR = run $ D.uniformly (R.mkStdGen 0) 100 λ> runR 1 100 total attempts 100 successes 0 failures 2.0 average number of room visits per success NaN average number of room visits per failure λ> runR 2 100 total attempts 100 successes 0 failures 18.35 average number of room visits per success NaN average number of room visits per failure λ> runR 3 100 total attempts 100 successes 0 failures 31.92 average number of room visits per success NaN average number of room visits per failure λ> runR 4 100 total attempts 100 successes 0 failures 43.52 average number of room visits per success NaN average number of room visits per failure λ> runR 5 100 total attempts 100 successes 0 failures 55.88 average number of room visits per success NaN average number of room visits per failure λ> runR 6 100 total attempts 100 successes 0 failures 67.37 average number of room visits per success NaN average number of room visits per failure λ> runR 7 100 total attempts 100 successes 0 failures 77.05 average number of room visits per success NaN average number of room visits per failure λ> runR 8 100 total attempts 99 successes 1 failures 90.4040404040404 average number of room visits per success 81.0 average number of room visits per failure λ> runR 9 100 total attempts 100 successes 0 failures 101.64 average number of room visits per success NaN average number of room visits per failure λ> runR 10 100 total attempts 100 successes 0 failures 114.89 average number of room visits per success NaN average number of room visits per failure

Not bad at all! Although my puny VPS still can’t manage all 100.

]]>