I’m pleased to announce a new super-major release of dejafu, a library for testing concurrent Haskell programs.

While there are breaking changes, common use-cases shouldn’t be affected significantly, if at all. There is a brief guide to the changes, and how to migrate if necessary, on the website.

dejafu is a unit-testing library for concurrent Haskell programs. Tests are deterministic, and work by systematically exploring the possible schedules of your concurrency-using test case, allowing you to confidently check your threaded code.

HUnit and Tasty bindings are available.

dejafu requires your test case to be written against the `MonadConc` typeclass from the concurrency package. This is a necessity: dejafu cannot peek inside your `IO` or `STM` actions, so it needs to be able to plug in an alternative implementation of the concurrency primitives for testing. There is some guidance for how to switch from `IO` code to `MonadConc` code on the website.
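As a rough illustration of what that switch looks like, here is a hypothetical counter function generalised from `IO` to `MonadConc` (the function itself is made up for this sketch; the general pattern is to swap `Control.Concurrent` imports for `Control.Concurrent.Classy` and generalise the type signature):

```haskell
import Control.Concurrent.Classy

-- Before: counter :: IO Int, using Control.Concurrent.MVar.
-- After: the same logic, polymorphic over any MonadConc, so it still
-- runs in IO but can also be run by dejafu's testing implementation.
counter :: MonadConc m => m Int
counter = do
  var <- newMVar (0 :: Int)
  _ <- fork (modifyMVar_ var (pure . (+ 1)))
  readMVar var
```

Note that this function is deliberately nondeterministic: depending on the schedule, the `readMVar` may run before or after the forked increment, which is exactly the kind of thing dejafu can then explore.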

If you really need `IO`, you can use `MonadIO` - but make sure it’s deterministic enough to not invalidate your tests!

Here’s a small example reproducing a deadlock found in an earlier version of the auto-update library:

```
> :{
autocheck $ do
  auto <- mkAutoUpdate defaultUpdateSettings
  auto
:}
[fail] Successful
    [deadlock] S0--------S1-----------S0-
[fail] Deterministic
    [deadlock] S0--------S1-----------S0-
    () S0--------S1--------p0--
```

dejafu finds the deadlock, and gives a simplified execution trace for each distinct result. More in-depth traces showing exactly what each thread did are also available. This is using a version of auto-update modified to use the `MonadConc` typeclass. The source is in the dejafu testsuite.

The highlights for this release are setup actions, teardown actions, and invariants:

- **Setup actions** are for things which are not really a part of your test case, but which are needed for it (for example, setting up a test distributed system). As dejafu can run a single test case many times, repeating this work can be a significant overhead. By defining this as a setup action, dejafu can “snapshot” the state at the end of the action, and efficiently reload it in subsequent executions of the same test.
- **Teardown actions** are for things you want to run after your test case completes, in all cases, even if the test deadlocks (for example). As dejafu controls the concurrent execution of the test case, inspecting shared state is possible even if the test case fails to complete.
- **Invariants** are effect-free, atomically-checked conditions over shared state which must always hold. If an invariant throws an exception, the test case is aborted, and any teardown action run.

Here is an example of a setup action with an invariant:

```
> :{
autocheck $
  let setup = do
        var <- newEmptyMVar
        registerInvariant $ do
          value <- inspectMVar var
          when (value == Just 1) $ throwM Overflow
        pure var
  in withSetup setup $ \var -> do
       fork $ putMVar var 0
       fork $ putMVar var 1
       tryReadMVar var
:}
[fail] Successful
    [invariant failure] S0--P2-
[fail] Deterministic
    [invariant failure] S0--P2-
    Nothing S0----
    Just 0 S0--P1--S0--
```

In the `[invariant failure]` case, thread 2 is scheduled, writing the forbidden value “1” to the MVar, which terminates the test.

Here is an example of a setup action with a teardown action:

```
> :{
autocheck $
  let setup = newMVar ()
      teardown var (Right _) = show <$> tryReadMVar var
      teardown _   (Left  e) = pure (show e)
  in withSetupAndTeardown setup teardown $ \var -> do
       fork $ takeMVar var
       takeMVar var
:}
[pass] Successful
[fail] Deterministic
    "Nothing" S0---
    "Deadlock" S0-P1--S0-
```

The teardown action can perform arbitrary concurrency effects, including inspecting any mutable state returned by the setup action.

Setup and teardown actions were previously available in a slightly different form as the `dontCheck` and `subconcurrency` functions, which have been removed (see the migration guide if you used these).

It’s also something that dejafu does not do.

I’m going to use the “stores are transitively visible” litmus test as a running example. Here it is:

```haskell
import qualified Control.Monad.Conc.Class as C
import Test.DejaFu.Internal
import Test.DejaFu.SCT
import Test.DejaFu.SCT.Internal.DPOR
import Test.DejaFu.Types
import Test.DejaFu.Utils

storesAreTransitivelyVisible :: C.MonadConc m => m (Int, Int, Int)
storesAreTransitivelyVisible = do
  x <- C.newCRef 0
  y <- C.newCRef 0
  j1 <- C.spawn (C.writeCRef x 1)
  j2 <- C.spawn (do r1 <- C.readCRef x; C.writeCRef x 1; pure r1)
  j3 <- C.spawn (do r2 <- C.readCRef y; r3 <- C.readCRef x; pure (r2,r3))
  (\() r1 (r2,r3) -> (r1,r2,r3)) <$> C.readMVar j1 <*> C.readMVar j2 <*> C.readMVar j3
```

I picked this one because it’s kind of arbitrarily complex. It’s a small but fairly dense test: it exercises the relaxed-memory implementation, so there’s a lot going on.

I’m now going to define a metric of trace complexity which I’ll justify in a moment:

```haskell
complexity :: Trace -> (Int, Int, Int, Int)
complexity = foldr go (0,0,0,0) where
  go (SwitchTo _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x+1, y, z)
  go (Start    _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y+1, z)
  go (Continue,   _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y, z+1)
  go (SwitchTo _, _, _) (w, x, y, z) = (w, x+1, y, z)
  go (Start    _, _, _) (w, x, y, z) = (w, x, y+1, z)
  go (Continue,   _, _) (w, x, y, z) = (w, x, y, z+1)
```

Using the `183-shrinking` branch, we can now get the first trace for every distinct result, along with its complexity:

```haskell
results :: Way -> MemType -> IO ()
results way memtype = do
  let settings = set lequality (Just (==)) $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
```

Here are the results for systematic testing:

```
λ> results (systematically defaultBounds) SequentialConsistency
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)  S0------------S2-----S1---S0---S3-----S0--      (0,0,6,24)
Right (0,0,0)  S0------------S2-P3-----S1---S0--S2----S0---    (0,1,6,23)
Right (1,0,0)  S0------------S3-----S1---S0--S2-----S0---      (0,0,6,24)
λ> results (systematically defaultBounds) TotalStoreOrder
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)  S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)  S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)  S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)
λ> results (systematically defaultBounds) PartialStoreOrder
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)  S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)  S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)  S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)
```

Pretty messy, right? Here are the results for *random* testing:

```
λ> results (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----P1-P0----P2-P1-P0-P3-P1-S2-P3--P0-P3-P0-P3-S2-P0-S2-P0--P2-S0-  (0,15,5,9)
Right (0,0,1)  S0-------P2-P1-P2-P0--P2-P0-P1-P0---S2-P3-P0-P2-S3---P1-S3-S0--       (0,12,5,12)
Right (1,0,0)  S0------------S3-----S1-P2-P1-P0--S2---P1-S0---                       (0,4,5,20)
Right (0,0,0)  S0---------P2-P0--P3-P0-S3--P2-P3-P2--P3-S2-S1--P0----                (0,9,4,15)
λ> results (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-   (1,13,6,11)
Right (0,0,1)  S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--            (0,8,6,16)
Right (0,0,0)  S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----          (2,10,6,14)
λ> results (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-   (1,13,6,11)
Right (0,0,1)  S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--            (0,8,6,16)
Right (0,0,0)  S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----          (2,10,6,14)
```

Yikes!

The complexity metric I defined counts four things:

- The number of relaxed-memory commit actions
- The number of pre-emptive context switches
- The number of non-pre-emptive context switches
- The number of continues

I would much rather read a long trace where the only context switches are when threads block, than a short one which is rapidly jumping between threads. So, given two equivalent traces, I will always prefer the one with a lexicographically smaller complexity-tuple.
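Since Haskell tuples are ordered lexicographically by their `Ord` instance, this preference falls out directly from `min` on the complexity tuples. A small sanity check (not code from dejafu):

```haskell
-- Tuples compare component-by-component, left to right, so a trace with
-- fewer commits always wins, then fewer pre-emptions, and so on.
preferTrace :: (Int, Int, Int, Int) -> (Int, Int, Int, Int) -> (Int, Int, Int, Int)
preferTrace = min

-- For example, (0,1,5,23) is preferred over (0,2,4,22): even though the
-- second trace has fewer context switches overall, it has more
-- pre-emptions, and the earlier component dominates.
```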

The key idea underpinning trace simplification is that dejafu can tell when two scheduling decisions can be swapped without changing the behaviour of the program. I talked about this idea in the Using Hedgehog to Test Déjà Fu memo. So we can implement transformations which are guaranteed to preserve semantics *without needing to verify this by re-running the test case*.

Although we don’t need to re-run the test case at all, the `183-shrinking` branch currently does, but only once at the end, after the minimum has been found. This is because it’s easier to generate a simpler sequence of scheduling decisions and use dejafu to produce the corresponding trace than it is to produce a simpler trace directly. This is still strictly better than a typical shrinking algorithm, which would re-run the test case after *each* shrinking step, rather than only at the end.

Rather than drag this out, here’s what those random traces simplify to:

```haskell
resultsS :: Way -> MemType -> IO ()
resultsS way memtype = do
  let settings = set lsimplify True
               . set lequality (Just (==))
               $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
```

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0----------P1---S2--P3-----S0---S2---S0---      (0,2,5,22)
Right (0,0,1)  S0----------P2-P1-P2-P1--S0---S2---S3-----S0---  (0,4,5,20)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---        (0,1,5,23)
Right (0,0,0)  S0------------S3--P2-----S3---S1--P0----         (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
```

This is much better.

There are two simplification phases: a preparation phase, which puts the trace into a normal form and prunes unnecessary commits; and an iteration phase, which repeats a step function until a fixed point is reached (or the iteration limit is).

The preparation phase has two steps: first we put the trace into *lexicographic normal form*, then we prune unnecessary commits.

We put a trace in lexicographic normal form by sorting by thread ID, where only independent actions can be swapped:

```haskell
lexicoNormalForm :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
lexicoNormalForm memtype = go where
  go trc =
    let trc' = bubble initialDepState trc
    in if trc == trc' then trc else go trc'

  bubble ds (t1@(tid1, ta1):t2@(tid2, ta2):trc)
    | independent ds tid1 ta1 tid2 ta2 && tid2 < tid1 = bgo ds t2 (t1 : trc)
    | otherwise = bgo ds t1 (t2 : trc)
  bubble _ trc = trc

  bgo ds t@(tid, ta) trc = t : bubble (updateDepState memtype ds tid ta) trc
```

If simplification only put traces into lexicographic normal form, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2--P0--S2--P0-P3----P0--           (0,5,3,19)
Right (0,0,1)  S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--  (0,8,4,16)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---            (0,3,4,21)
Right (0,0,0)  S0------------S2-P3--P2----S3--P1--P0----             (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--       (1,5,5,19)
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----      (2,6,5,18)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--       (1,5,5,19)
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----      (2,6,5,18)
```

These are better than they were, but we can do better still.

After putting the trace into lexicographic normal form, we delete any commit actions which are followed by any number of independent actions and then a memory barrier:

```haskell
dropCommits :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
dropCommits SequentialConsistency = id
dropCommits memtype = go initialDepState where
  go ds (t1@(tid1, ta1@(CommitCRef _ _)):t2@(tid2, ta2):trc)
    | isBarrier (simplifyAction ta2) = go ds (t2:trc)
    | independent ds tid1 ta1 tid2 ta2 = t2 : go (updateDepState memtype ds tid2 ta2) (t1:trc)
  go ds (t@(tid,ta):trc) = t : go (updateDepState memtype ds tid ta) trc
  go _ [] = []
```

Such commits don’t affect the behaviour of the program at all, as all buffered writes get flushed when the memory barrier happens.

If simplification only did the preparation phase, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2--P0--S2--P0-P3----P0--           (0,5,3,19)
Right (0,0,1)  S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--  (0,8,4,16)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---            (0,3,4,21)
Right (0,0,0)  S0------------S2-P3--P2----S3--P1--P0----             (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-------P1---S2--P0-----P2--P0--S2-S3----P0--         (0,5,4,19)
               ^-- better than just lexicoNormalForm
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--P0-S2--S3--P1--P0----            (0,5,3,19)
               ^-- better than just lexicoNormalForm
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-------P1---S2--P0-----P2--P0--S2-S3----P0--         (0,5,4,19)
               ^-- better than just lexicoNormalForm
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--P0-S2--S3--P1--P0----            (0,5,3,19)
               ^-- better than just lexicoNormalForm
```

The iteration phase attempts to reduce context switching by pushing actions forwards, or pulling them backwards, through the trace.

If we have the trace `[(tid1, act1), (tid2, act2), (tid1, act3)]`, where `act2` and `act3` are independent, the “pull back” transformation would re-order that to `[(tid1, act1), (tid1, act3), (tid2, act2)]`.

In contrast, if `act1` and `act2` were independent, the “push forward” transformation would re-order that to `[(tid2, act2), (tid1, act1), (tid1, act3)]`. The two transformations are almost, but not quite, opposites.

Pull-back walks through the trace and, at every context switch, looks forward to see if there is a single action of the original thread it can put before the context switch:

```haskell
pullBack :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pullBack memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
        trc' = if tid1 /= tid2
               then maybe trc (uncurry (:)) (findAction tid1 ds' trc)
               else trc
    in t1 : go ds' trc'
  go _ trc = trc

  findAction tid0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just (t, trc)
      | otherwise = case fgo (updateDepState memtype ds tid ta) trc of
          Just (ft@(ftid, fa), trc')
            | independent ds tid ta ftid fa -> Just (ft, t:trc')
          _ -> Nothing
    fgo _ _ = Nothing
```

Push-forward walks through the trace and, at every context switch, looks forward to see if the last action of the original thread can be put at its next execution:

```haskell
pushForward :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pushForward memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
    in if tid1 /= tid2
       then maybe (t1 : go ds' trc) (go ds) (findAction tid1 ta1 ds trc)
       else t1 : go ds' trc
  go _ trc = trc

  findAction tid0 ta0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just ((tid0, ta0) : t : trc)
      | independent ds tid0 ta0 tid ta = (t:) <$> fgo (updateDepState memtype ds tid ta) trc
      | otherwise = Nothing
    fgo _ _ = Nothing
```

The iteration process just repeats `pushForward memtype . pullBack memtype`.
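The fixed-point loop itself can be sketched like this (a stand-in, not the actual dejafu code; `limit` plays the role of the iteration limit mentioned earlier, and the pass would be `pushForward memtype . pullBack memtype`):

```haskell
-- Repeatedly apply a simplification pass until the trace stops
-- changing, or we run out of iterations.
fixedPoint :: Eq a => Int -> (a -> a) -> a -> a
fixedPoint limit f = go limit where
  go 0 x = x
  go n x =
    let x' = f x
    in if x' == x then x else go (n - 1) x'
```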

If it only used `pullBack`, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2---P0--S2--S0-P3-----S0--      (0,3,5,21)
Right (0,0,1)  S0-----------P2-P1-P2--P1--S0--S2--S0-P3-----S0--  (0,5,5,19)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---          (0,1,5,23)
Right (0,0,0)  S0------------S2-P3---P2----S3--S1--P0----         (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-----------P1---S2-----S0---S3-----S0--          (0,1,5,23)
Right (0,0,1)  S0-----------P1-P2-----S0-S1--S0---S3-----S0--     (0,2,6,22)
Right (0,0,0)  S0-----------P2---P3-----S0-S2--S1--P0----         (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-----------P1---S2-----S0---S3-----S0--          (0,1,5,23)
Right (0,0,1)  S0-----------P1-P2-----S0-S1--S0---S3-----S0--     (0,2,6,22)
Right (0,0,0)  S0-----------P2---P3-----S0-S2--S1--P0----         (0,3,4,21)
```

Without exception, iterating `pullBack` is an improvement over just doing preparation.

If it only used `pushForward`, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-------P1---S2--P0------S2--P3----P0---          (0,4,3,20)
Right (0,0,1)  S0-------P2-P1-P2-P1-P0------S1-S2---S3----P0---   (0,6,4,18)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---         (0,3,4,21)
               ^-- no improvement over preparation
Right (0,0,0)  S0------------S3--P2-----S3--P1--P0----            (0,3,3,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----P1---S0---P2----P0-------S2-S3----P0--        (0,4,4,20)
Right (0,0,1)  S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0-- (0,6,5,18)
Right (0,0,0)  S0----------P2--P3--P0--S2---S3--P1--P0----         (0,5,3,19)
               ^-- no improvement over preparation
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----P1---S0---P2----P0-------S2-S3----P0--        (0,4,4,20)
Right (0,0,1)  S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0-- (0,6,5,18)
Right (0,0,0)  S0----------P2--P3--P0--S2---S3--P1--P0----         (0,5,3,19)
               ^-- no improvement over preparation
```

With three exceptions, where the traces didn’t change, iterating `pushForward` is an improvement over just doing preparation.

We’ve already seen the results if we combine them:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0----------P1---S2--P3-----S0---S2---S0---      (0,2,5,22)
Right (0,0,1)  S0----------P2-P1-P2-P1--S0---S2---S3-----S0---  (0,4,5,20)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---        (0,1,5,23)
               ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0------------S3--P2-----S3---S1--P0----         (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
               ^-- same as pullBack, which is better than pushForward
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
               ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
               ^-- same as pullBack, which is better than pushForward
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
               ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
```

I think what I have right now is pretty good. It’s definitely a vast improvement over not doing any simplification.

*But*, no random traces get simplified to the corresponding systematic traces, which is a little disappointing. I think that’s because the current passes just try to reduce context switches of any form, whereas really I want to reduce pre-emptive context switches more than non-pre-emptive ones.

Trying every possible interleaving will give you, in general, an exponential blow-up of the executions you need to perform as your test case grows in size. The core testing algorithm we use, a variant of dynamic partial-order reduction (DPOR), attempts to reduce this blow-up. DPOR identifies actions which are *dependent*, and only tries interleavings which permute dependent actions.

Here are some examples:

- It doesn’t matter which order two threads execute `readMVar`, for the same `MVar`. These actions are *independent*.
- It does matter which order two threads execute `putMVar`, for the same `MVar`. These actions are *dependent*.
- It doesn’t matter which order two threads execute `putMVar` for different `MVar`s. These actions are *independent*.

Two actions are dependent if the order in which they are performed matters.

So the intuition behind DPOR is that most actions in a concurrent program are *independent*. DPOR won’t help you much if you have a single piece of shared state which every thread is hitting, but most concurrent programs aren’t like that. The worst case is still a terrible exponential blow-up, but the average case is much better.
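To make those examples concrete, here is a toy model of that dependency judgement. This is illustrative only, not dejafu’s actual relation, which covers many more action types and considers the execution history:

```haskell
-- A cut-down action type covering just the MVar examples above.
data Act = ReadMVar Int | PutMVar Int  -- the Int names the MVar
  deriving (Eq, Show)

-- Two reads always commute; two puts conflict only on the same MVar.
-- Everything else is conservatively treated as dependent, which is
-- always safe (it only costs extra interleavings, never correctness).
dependentToy :: Act -> Act -> Bool
dependentToy (ReadMVar _) (ReadMVar _) = False
dependentToy (PutMVar a)  (PutMVar b)  = a == b
dependentToy _ _ = True
```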

The dependency relation is a *core* part of Déjà Fu today. It affects both performance and correctness. If it says two actions are dependent when they are not, then we may try unnecessary interleavings. If it says two actions are not dependent when they really are, then we may miss necessary interleavings.

Being such an important component, it must be well-tested, right? Well, sort of. The Déjà Fu testsuite mostly consists of small concurrent programs together with a list of expected outputs, testing that Déjà Fu finds all the nondeterminism in the program. This does exercise the dependency relation, but only very indirectly.

There things would have remained had I not experienced one of those coincidence-driven flashes of insight:

- aherrmann opened an issue on GitHub asking how to take an execution trace and replay it.
- agnishom posted a thread on /r/algorithms asking how to check the equivalence of traces where only some elements commute.

I had my idea. I can *directly* test the dependency relation like so:

- Execute a concurrent program.
- Normalise its execution trace in some way.
- “Replay” the normalised trace.
- Assert that the result is the same.

So, what is a good normal form for a trace? I tried out a few approaches here, but there was one I kept coming back to: we should shuffle around independent actions to keep the program on the main thread for as long as possible.

There are two reasons I think this works well. (1) The traces we get will be easier for a human to read, as the program will stay on its main thread and only execute another thread where necessary. (2) A Haskell program terminates when the main thread terminates, so by executing the main thread as much as possible, we may find that some actions don’t need to be executed at all.

So firstly we need to know when two actions commute. Let’s just use the dependency relation for that:

```haskell
-- | Check if two actions commute.
independent :: DepState -> (ThreadId, ThreadAction) -> (ThreadId, ThreadAction) -> Bool
independent ds (tid1, ta1) (tid2, ta2) = not (dependent ds tid1 ta1 tid2 ta2)
```

The `DepState` parameter tracks information about the history of the execution, allowing us to make better decisions. For example: while in general it matters in which order two `putMVar`s to the same `MVar` happen; it *doesn’t* matter if the `MVar` is already full, as both actions will block without achieving anything.
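A toy sketch of that history-sensitivity (again illustrative, not dejafu’s actual `DepState`): whether two puts to the same `MVar` conflict depends on the state the `MVar` is in when they run.

```haskell
data MVarState = MVEmpty | MVFull
  deriving (Eq, Show)

-- Two putMVars to the same MVar: if it is empty, the first to run wins
-- and fills it, so the order is observable; if it is already full,
-- both simply block, so the order is unobservable.
putsDependent :: MVarState -> Bool
putsDependent MVEmpty = True
putsDependent MVFull  = False
```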

The approach works well in practice, but has been the source of *so many* off-by-one errors. Even while writing this memo!

So now onto trace normalisation. The easiest way to do it is bubble sort, but with an additional constraint on when we can swap things:

- For every adjacent pair of items `x` and `y` in the trace:
    - If `x` and `y` commute and `thread_id y < thread_id x`: swap `x` and `y`.
    - Update the `DepState` and continue to the next pair.
- Repeat until there are no more changes.

And here’s the code:

```haskell
-- | Rewrite a trace into a canonical form.
normalise :: [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
normalise trc0 = if changed then normalise trc' else trc' where
  (changed, trc') = bubble initialDepState False trc0

  bubble ds flag ((x@(tid1, _)):(y@(tid2, _)):trc)
    | independent ds x y && tid2 < tid1 = go ds True y (x : trc)
    | otherwise = go ds flag x (y : trc)
  bubble _ flag trc = (flag, trc)

  go ds flag t@(tid, ta) trc =
    second (t :) (bubble (updateDepState ds tid ta) flag trc)
```

Now we need a scheduler which can play a given list of scheduling decisions. This isn’t built in, but we can make one. Schedulers look like this:

```haskell
-- from Test.DejaFu.Schedule
newtype Scheduler state = Scheduler
  { scheduleThread
      :: Maybe (ThreadId, ThreadAction)
      -> NonEmpty (ThreadId, Lookahead)
      -> state
      -> (Maybe ThreadId, state)
  }
```

A scheduler is a stateful function, which takes the previously scheduled action and the list of runnable threads, and gives back a thread to execute. We don’t care about those parameters. We just want to play a fixed list of scheduling decisions. And here is how we do that:

```haskell
-- | Execute a concurrent program by playing a list of scheduling decisions.
play :: MemType -> [ThreadId] -> ConcIO a -> IO (Either Failure a, [ThreadId], Trace)
play = runConcurrent (Scheduler sched) where
  sched _ _ (t:ts) = (Just t, ts)
  sched _ _ []     = (Nothing, [])
```

Now all the background is in place, so we can test what we want to test: that an execution, and the play-back of its normalised trace, give the same result. For reasons which will become apparent in the next section, I’m going to parameterise over the normalisation function:

```haskell
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and return both results.
runNorm
  :: ([(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)])
  -> Int
  -> MemType
  -> ConcIO a
  -> IO (Either Failure a, [ThreadId], Either Failure a, [ThreadId])
runNorm norm seed memtype conc = do
  let g = mkStdGen seed -- 1
  (efa1, _, trc) <- runConcurrent randomSched memtype g conc
  let -- 2
      trc' = tail
        ( scanl (\(t, _) (d, _, a) -> (tidOf t d, a))
                (initialThread, undefined)
                trc
        )
  let tids1 = map fst trc'
  let tids2 = map fst (norm trc') -- 3
  (efa2, s, _) <- play memtype tids2 conc
  let truncated = take (length tids2 - length s) tids2 -- 4
  pure (efa1, tids1, efa2, truncated)
```

There’s a lot going on here, so let’s break it down:

1. We execute the program with the built-in random scheduler, using the provided seed.
2. The trace that `runConcurrent` gives us is in the form `[(Decision, [(ThreadId, Lookahead)], ThreadAction)]`, whereas we want a `[(ThreadId, ThreadAction)]`. So this scan just changes the format. It’s a scan rather than a map because converting a `Decision` into a `ThreadId` potentially requires knowing what the previous thread was.
3. We normalise the trace, and run it again.
4. If the entire normalised trace wasn’t used up, then it has some unnecessary suffix (because the main thread is now terminating sooner). So we make the normalised trace easier to read by chopping off any such suffix.

Finally, we can write a little function to test using the `normalise` function:

```haskell
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and check that both give the same result.
testNormalise :: (Eq a, Show a) => Int -> MemType -> ConcIO a -> IO Bool
testNormalise seed memtype conc = do
  (efa1, tids1, efa2, tids2) <- runNorm normalise seed memtype conc
  unless (efa1 == efa2) $ do
    putStrLn "Mismatched result!"
    putStrLn $ "  expected: " ++ show efa1
    putStrLn $ "   but got: " ++ show efa2
    putStrLn ""
    putStrLn $ "rewritten from: " ++ show tids1
    putStrLn $ "            to: " ++ show tids2
  pure (efa1 == efa2)
```

And does it work? Let’s copy two example programs from the Test.DejaFu docs:

```haskell
-- from Test.DejaFu
example1 :: MonadConc m => m String
example1 = do
  var <- newEmptyMVar
  fork (putMVar var "hello")
  fork (putMVar var "world")
  readMVar var

example2 :: MonadConc m => m (Bool, Bool)
example2 = do
  r1 <- newCRef False
  r2 <- newCRef False
  x <- spawn $ writeCRef r1 True >> readCRef r2
  y <- spawn $ writeCRef r2 True >> readCRef r1
  (,) <$> readMVar x <*> readMVar y
```

And then test them:

```
> testNormalise 0 TotalStoreOrder example1
True

> testNormalise 0 TotalStoreOrder example2
True
```

According to my very unscientific method, everything works perfectly!

You can probably see where this is going: just supplying *one* random seed and *one* memory model is a poor way to test things. Ah, if only we had some sort of tool to generate arbitrary values for us!

But that’s not all: if the dependency relation is correct, then *any* permutation of independent actions should give the same result, not just the one which `normalise` implements. So before we introduce Hedgehog and arbitrary values, let’s make something a little more chaotic:

```haskell
-- | Shuffle independent actions in a trace according to the given list.
shuffle :: [Bool] -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
shuffle = go initialDepState where
  go ds (f:fs) (t1:t2:trc)
    | independent ds t1 t2 && f = go' ds fs t2 (t1 : trc)
    | otherwise = go' ds fs t1 (t2 : trc)
  go _ _ trc = trc

  go' ds fs t@(tid, ta) trc = t : go (updateDepState ds tid ta) fs trc
```

In `normalise`, two independent actions will *always* be re-ordered if it gets us closer to the canonical form. However, in `shuffle`, two independent actions will either be re-ordered or not, depending on the supplied list of `Bool`.

This is much better for testing our dependency relation, as we can now get far more re-orderings which *all* should satisfy the same property: that no matter how the independent actions in a trace are shuffled, we get the same result.

I think it’s about time to bring out Hedgehog:

```haskell
-- | Execute a concurrent program with a random scheduler, arbitrarily permute
-- the independent actions in the trace, and check that we get the same result
-- out.
hog :: (Eq a, Show a) => ConcIO a -> IO Bool
hog conc = Hedgehog.check . property $ do
  mem  <- forAll Gen.enumBounded -- 1
  seed <- forAll $ Gen.int (Range.linear 0 100)
  fs   <- forAll $ Gen.list (Range.linear 0 100) Gen.bool

  (efa1, tids1, efa2, tids2) <- liftIO -- 2
    $ runNorm (shuffle fs) seed mem conc

  footnote ("            to: " ++ show tids2) -- 3
  footnote ("rewritten from: " ++ show tids1)
  efa1 === efa2
```

Let’s break that down:

1. We’re telling Hedgehog that this property should hold for all memory models, all seeds, and all `Bool`-lists. Unlike most Haskell property-testing libraries, Hedgehog takes generator functions rather than using a typeclass. I think this is nicer.
2. We run our program, normalise it, and get all the results just as before.
3. We add some footnotes: messages which Hedgehog will display along with a failure. For some reason these get displayed in reverse order.

Alright, let’s see if Hedgehog finds any bugs for us:

```
> hog example1
✗ <interactive> failed after 3 tests and 1 shrink.

     ┏━━ extra.hs ━━┓
  82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
  83 ┃ hog conc = Hedgehog.check . property $ do
  84 ┃   mem  <- forAll Gen.enumBounded
     ┃   │ SequentialConsistency
  85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
     ┃   │ 0
  86 ┃   fs   <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
     ┃   │ [ False , True ]
  87 ┃
  88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
  89 ┃     $ runNorm (shuffle fs) seed mem conc
  90 ┃   footnote ("            to: " ++ show tids2)
  91 ┃   footnote ("rewritten from: " ++ show tids1)
  92 ┃   efa1 === efa2
     ┃   ^^^^^^^^^^^^^
     ┃   │ ━━━ Failed (- lhs =/= + rhs) ━━━
     ┃   │ - Right "hello"
     ┃   │ + Left InternalError

    rewritten from: [main,main,1,main,1,2,main,2,main]
                to: [main,1]

    This failure can be reproduced by running:
    > recheck (Size 2) (Seed 1824012233418733250 (-4876494268681827407)) <property>

False
```

It did! And look at that output! Magical! I must see if I can get Déjà Fu to give annotated source output like that.

Let’s look at `example1`

again:

```haskell
do
  var <- newEmptyMVar
  fork (putMVar var "hello")
  fork (putMVar var "world")
  readMVar var
```

Oh dear, our rewritten trace is trying to execute thread `1` immediately after the first action of the main thread. The first action of the main thread is `newEmptyMVar`: thread `1` doesn’t exist at that point!

Let’s change our `independent` function to say that an action is dependent with the fork which creates its thread:

```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = not (dependent ds tid1 ta1 tid2 ta2)
```

How about now?

```
> hog example1
✗ <interactive> failed after 13 tests and 2 shrinks.

      ━━━ extra.hs ━━━
   82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
   83 ┃ hog conc = Hedgehog.check . property $ do
   84 ┃   mem <- forAll Gen.enumBounded
      ┃   │ SequentialConsistency
   85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
      ┃   │ 0
   86 ┃   fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
      ┃   │ [ True , True ]
   87 ┃
   88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
   89 ┃     $ runNorm (shuffle fs) seed mem conc
   90 ┃   footnote (" to: " ++ show tids2)
   91 ┃   footnote ("rewritten from: " ++ show tids1)
   92 ┃   efa1 === efa2
      ┃   ^^^^^^^^^^^^^
      ┃   │ Failed (- lhs =/= + rhs)
      ┃   │ - Right "hello"
      ┃   │ + Left InternalError

    rewritten from: [main,main,1,main,1,2,main,2,main]
     to: [main,1]

    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 654387260079025817 (-6686572164463137223)) <property>

False
```

Well, that failing trace looks exactly like the previous error. But the parameters are different: the first error happened with the list `[False, True]`; this one requires the list `[True, True]`. So let’s think about what happens to the trace in this case.

We start with `[(main, NewEmptyMVar 0), (main, Fork 1), (1, PutMVar 0)]`.

1. The first two actions are independent, and the flag is `True`, so we swap them. We now have: `[(main, Fork 1), (main, NewEmptyMVar 0), (1, PutMVar 0)]`.
2. The second two actions are independent, and the flag is `True`, so we swap them. We now have: `[(main, Fork 1), (1, PutMVar 0), (main, NewEmptyMVar 0)]`.
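The `shuffle` function itself isn’t shown in this post. As a hypothetical sketch of the idea (the real one works over dejafu traces; `indep` here stands in for `independent`), a single left-to-right pass consuming one flag per adjacent pair might look like:

```haskell
-- Hypothetical sketch: walk the trace left to right, consuming one Bool flag
-- per adjacent pair, and swap the pair when the flag is True and the two
-- actions are independent.  `indep` stands in for the real `independent`.
shuffle :: (a -> a -> Bool) -> [Bool] -> [a] -> [a]
shuffle indep (f:fs) (x:y:rest)
  | f && indep x y = y : shuffle indep fs (x : rest)
  | otherwise      = x : shuffle indep fs (y : rest)
shuffle _ _ xs = xs
```

A chain of `True` flags bubbles one action rightwards past its independent neighbours, which is exactly the behaviour the walkthrough above shows.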

We can’t actually re-order actions of the same thread, so we should never have swapped the first two. I suppose there’s another problem here, that no action on an `MVar` commutes with creating that `MVar`, but we should never be in a situation where that could happen. So we need another case in `independent`:

```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | tid1 == tid2     = False
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = not (dependent ds tid1 ta1 tid2 ta2)
```

Our first example program works fine now:

```
> hog example1
✓ <interactive> passed 100 tests.
True
```

The second is a little less happy:

```
> hog example2
✗ <interactive> failed after 48 tests and 9 shrinks.

      ━━━ extra.hs ━━━
   82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
   83 ┃ hog conc = Hedgehog.check . property $ do
   84 ┃   mem <- forAll Gen.enumBounded
      ┃   │ TotalStoreOrder
   85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
      ┃   │ 0
   86 ┃   fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
      ┃   │ [ False , False , False , False , False
      ┃   │ , False , False , False , False , False
      ┃   │ , False , False , False , False , True
      ┃   │ ]
   87 ┃
   88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
   89 ┃     $ runNorm (shuffle fs) seed mem conc
   90 ┃   footnote (" to: " ++ show tids2)
   91 ┃   footnote ("rewritten from: " ++ show tids1)
   92 ┃   efa1 === efa2
      ┃   ^^^^^^^^^^^^^
      ┃   │ Failed (- lhs =/= + rhs)
      ┃   │ - Right ( False , True )
      ┃   │ + Left InternalError

    rewritten from: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,main,2,-1,2,2,main,main]
     to: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,2,main,-1]

    This failure can be reproduced by running:
    > recheck (Size 47) (Seed 2159662051602767058 (-7857629802164753123)) <property>

False
```

This is a little trickier. Here’s my diagnosis:

- It’s an `InternalError` again, which means we’re trying to execute a thread which isn’t runnable.
- The memory model is `TotalStoreOrder`, and the thread we’re trying to execute is thread `-1`, a “fake” thread used in the relaxed memory implementation. So this is a relaxed memory bug.
- The traces only differ in one place: where `main, 2, -1` is changed to `2, main, -1`. So the issue is caused by re-ordering `main` and thread `2`.
- If the `main` action is a memory barrier, then thread `-1` will not exist after it.
- So the `main` action is probably a memory barrier.

Let’s push along those lines and add a case for memory barriers to `independent`:

```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | tid1 == tid2     = False
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = case (simplifyAction ta1, simplifyAction ta2) of
      (UnsynchronisedWrite _, a) | isBarrier a -> False
      (a, UnsynchronisedWrite _) | isBarrier a -> False
      _ -> not (dependent ds tid1 ta1 tid2 ta2)
```

Did we get it?

```
> hog example2
✓ <interactive> passed 100 tests.
True
```

Great!

So, we explored the dependency relation with Hedgehog, and found three missing cases:

1. Two actions of the same thread are dependent.
2. Any action of a thread is dependent with the `fork` which creates that thread.
3. Unsynchronised writes are dependent with memory barriers.

But are these *bugs*? I’m not so sure:

1. The dependency relation is only ever used to compare different threads.
2. This is technically correct, but it’s not interesting or useful.
3. This could be a bug. The relaxed memory implementation is pretty hairy and I’ve had a lot of problems with it in the past. Honestly, I just need to rewrite it (or campaign for Haskell to become sequentially consistent[1] and rip it out).

[1]: **SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap**, M. Vollmer, R. G. Scott, M. Musuvathi, and R. R. Newton (2017)

But even if not bugs, these are definitely *confusing*. The dependency relation is currently just an internal thing, not exposed to users. However, I’m planning to expose a function to normalise traces, in which case providing an `independent` function is entirely reasonable. So even if these changes don’t make it into `dependent`, they will be handled by `independent`.

**Next steps:** I’m going to get this into the test suite, to get a large number of extra example programs for free. My hacky and cobbled-together testing framework in dejafu-tests is capable of running every test case with a variety of different schedulers, so I just need to add another way it runs everything. I won’t need to touch the actual tests, just the layer of glue which runs them all, which is nice.

The only problem is that this glue is currently based on HUnit and test-framework, whereas the only integration I can find for Hedgehog is tasty-hedgehog, so I might need to switch to tasty first. As usual, the hardest part is getting different libraries to co-operate!

Hopefully I’ll find some bugs! Well, not exactly *hopefully*, but you know what I mean.

There are 100 prisoners in solitary cells. There’s a central living room with one light bulb; this bulb is initially off. No prisoner can see the light bulb from his or her own cell. Every day, the warden picks a prisoner equally at random, and that prisoner visits the living room. While there, the prisoner can toggle the bulb if he or she wishes. Also, the prisoner has the option of asserting that all 100 prisoners have been to the living room by now. If this assertion is false, all 100 prisoners are shot. However, if it is indeed true, all prisoners are set free and inducted into MENSA, since the world could always use more smart people. Thus, the assertion should only be made if the prisoner is 100% certain of its validity. The prisoners are allowed to get together one night in the courtyard, to discuss a plan. What plan should they agree on, so that eventually, someone will make a correct assertion?

We can express this as a concurrency problem: the warden is the scheduler, each prisoner is a thread, and when the program terminates every prisoner should have visited the living room.

Let’s set up some imports:

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Control.Concurrent.Classy as C
import Control.Monad (forever, when)
import Data.Foldable (for_)
import Data.List (genericLength)
import Data.Maybe (mapMaybe)
import qualified Data.Set as S
import qualified System.Random as R

import qualified Test.DejaFu as D
import qualified Test.DejaFu.Common as D
import qualified Test.DejaFu.SCT as D
```

Before we try to implement a solution, let’s think about how we can check if an execution corresponds to the prisoners succeeding and entering MENSA, or failing and being shot.

Prisoners are threads, and the warden is the scheduler. So if every thread (prisoner) that is forked is scheduled (taken to the room), then the prisoners are successful:

```haskell
-- | Check if an execution corresponds to a correct guess.
isCorrect :: D.Trace -> Bool
isCorrect trc = S.fromList (threads trc) == S.fromList (visits trc)

-- | Get all threads created.
threads :: D.Trace -> [D.ThreadId]
threads trc = D.initialThread : mapMaybe go trc where
  go (_, _, D.Fork tid) = Just tid
  go _ = Nothing

-- | Get all scheduled threads.
visits :: D.Trace -> [D.ThreadId]
visits = mapMaybe go where
  go (D.Start    tid, _, _) = Just tid
  go (D.SwitchTo tid, _, _) = Just tid
  go _ = Nothing
```

So now, given some way of setting up the game and running it to completion, we can test it and print some statistics:

```haskell
-- | Run the prison game and print statistics.
run :: D.Way -> (forall m. C.MonadConc m => m ()) -> IO ()
run way game = do
  traces <- map snd <$> D.runSCT way D.defaultMemType game
  let successes = filter isCorrect traces
  let failures  = filter (not . isCorrect) traces
  putStrLn (show (length traces)    ++ " total attempts")
  putStrLn (show (length successes) ++ " successes")
  putStrLn (show (length failures)  ++ " failures")
  putStrLn (show (avgvisits successes) ++ " average number of room visits per success")
  putStrLn (show (avgvisits failures)  ++ " average number of room visits per failure")
  putStrLn "Sample sequences of visits:"
  for_ (take 5 traces) (print . visits)
  where
    avgvisits ts = sum (map (fromIntegral . numvisits) ts) / genericLength ts

    numvisits = sum . map count where
      count (_, _, D.STM _ _)      = 1
      count (_, _, D.BlockedSTM _) = 1
      count (_, _, D.Yield)        = 1
      count _ = 0
```

I have decided to assume that a prisoner will either yield (doing nothing) or perform some STM transaction while they’re in the room, to simplify things.

A slow but simple strategy is for the prisoners to nominate a leader. Only the leader can declare to the warden that everyone has visited the room. Whenever a prisoner other than the leader visits the room, if the light is *on*, they do nothing; otherwise, if this is their first time in the room with the light off, they turn it on; otherwise they leave it. Whenever the leader enters the room, they turn the light off. When the leader has turned the light off 99 times (or `num_prisoners - 1` times), they tell the warden that everyone has visited.

Let’s set up those algorithms:

```haskell
-- | The state of the light bulb.
data Light = IsOn | IsOff

-- | Count how many prisoners have toggled the light and terminate
-- when everyone has.
leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Light -> m ()
leader prisoners light = go 0 where
  go counter = do
    counter' <- C.atomically $ do
      state <- C.readTVar light
      case state of
        IsOn -> do
          C.writeTVar light IsOff
          pure (counter + 1)
        IsOff -> C.retry
    when (counter' < prisoners - 1) (go counter')

-- | Turn the light on once then do nothing.
notLeader :: C.MonadConc m => C.TVar (C.STM m) Light -> m ()
notLeader light = do
  C.atomically $ do
    state <- C.readTVar light
    case state of
      IsOn  -> C.retry
      IsOff -> C.writeTVar light IsOn
  forever C.yield
```

So now we just need to create a program where the leader is the main thread and everyone else is a separate thread:

```haskell
-- | Most popular English male and female names, according to
-- Wikipedia.
name :: Int -> String
name i = ns !! (i `mod` length ns) where
  ns = ["Oliver", "Olivia", "George", "Amelia", "Harry", "Emily"]

-- | Set up the prison game.  The number of prisoners should be at
-- least 1.
prison :: C.MonadConc m => Int -> m ()
prison prisoners = do
  light <- C.atomically (C.newTVar IsOff)
  for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader light))
  leader prisoners light
```

Because these are people, not just threads, I’ve given them names. The leader is just called “main” though, how unfortunate for them.

Now we can try out our system and see if it works:

```
λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing })
λ> runS 1
1 total attempts
1 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main]
λ> runS 2
5 total attempts
5 successes
0 failures
7.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main]
λ> runS 3
2035 total attempts
2035 successes
0 failures
133.39066339066338 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
(big lists omitted)
```

This doesn’t scale well. It’s actually a really bad case for concurrency testing: every thread is messing with the same shared state, so dejafu has to try all the orderings. Not good.

Taking another look at our prisoners, we can see two things which a human would use to decide whether some schedules are redundant or not:

1. If we adopt any schedule other than alternating leader / non-leader, threads will block without doing anything. So we should alternate.
2. When a non-leader has completed their task, they will always yield. So we should never schedule a prisoner who will yield.

Unfortunately dejafu can’t really make use of (1). It could be inferred *if* dejafu was able to compare values inside `TVar`s, rather than just seeing that there had been a write. But Haskell doesn’t let us do that without slapping an `Eq` constraint on `writeTVar`, which I definitely don’t want to do (although maybe having a separate `eqwriteTVar`, `eqputMVar`, and so on would be a nice addition).
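To make the `eqwriteTVar` idea concrete, here’s a hypothetical sketch of the mechanism, modelled with an `IORef` rather than dejafu’s types: with an `Eq` constraint available, a write of an unchanged value can be detected, and a testing framework could then treat such a write as commuting with concurrent reads.

```haskell
import Data.IORef

-- Hypothetical sketch (not dejafu's API): a write that carries an Eq
-- constraint, so an unchanged value can be detected.  Returns True if the
-- stored value actually changed.
eqWriteRef :: Eq a => IORef a -> a -> IO Bool
eqWriteRef ref new = do
  old <- readIORef ref
  if old == new
    then pure False             -- no observable change: schedules commute
    else True <$ writeIORef ref new
```

A `no-change` result is exactly the information a scheduler would need to prune redundant interleavings.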

Fortunately, dejafu *can* do something with (2). It already bounds the maximum number of times a thread can yield, so that we can test constructs like spinlocks. This is called *fair bounding*. The default bound is 5, but if we set it to 0 dejafu will just never schedule a thread which is going to yield. Here we go:

```
λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing, D.boundFair = Just 0 })
λ> runS 1
1 total attempts
1 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main]
λ> runS 2
1 total attempts
1 successes
0 failures
4.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main]
λ> runS 3
4 total attempts
4 successes
0 failures
7.5 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main]
[main,Olivia,George,main,George,main]
[main,George,main,Olivia,main]
[main,George,Olivia,main,Olivia,main]
```

Much better! Although it still doesn’t scale as nicely as we’d like:

```
λ> runS 4
48 total attempts
48 successes
0 failures
11.5 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main]
[main,Olivia,main,George,Amelia,main,Amelia,main]
[main,Olivia,main,Amelia,main,George,main]
[main,Olivia,main,Amelia,George,main,George,main]
[main,Olivia,George,main,George,main,Amelia,main]
λ> runS 5
1536 total attempts
1536 successes
0 failures
16.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main,Harry,main]
[main,Olivia,main,George,main,Amelia,Harry,main,Harry,main]
[main,Olivia,main,George,main,Harry,main,Amelia,main]
[main,Olivia,main,George,main,Harry,Amelia,main,Amelia,main]
[main,Olivia,main,George,Amelia,main,Amelia,main,Harry,main]
λ> runS 6
122880 total attempts
122880 successes
0 failures
21.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main,Harry,main,Emily,main]
[main,Olivia,main,George,main,Amelia,main,Harry,Emily,main,Emily,main]
[main,Olivia,main,George,main,Amelia,main,Emily,main,Harry,main]
[main,Olivia,main,George,main,Amelia,main,Emily,Harry,main,Harry,main]
[main,Olivia,main,George,main,Amelia,Harry,main,Harry,main,Emily,main]
```

The prisoners are stepping on each other’s toes and causing needless work. This is probably as good as we can do without adding some extra primitives to dejafu to optimise the case where we have an `Eq` instance available, unfortunately.

In concurrency testing terms, six threads is actually quite a lot.

Empirical studies have found that many concurrency bugs can be exhibited with only two or three threads! Furthermore, most real-world concurrent programs don’t have every single thread operating on the same bit of shared state.

There’s another school of thought which says to just wait for three years, because by then it’s very unlikely that any single prisoner has never visited the room. In fact, we would expect each prisoner to have been to the room ten times by then, assuming the warden is fair.
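The arithmetic behind that estimate, assuming a fair warden who picks uniformly among 100 prisoners each day:

```haskell
-- Expected visits per prisoner after three years: each prisoner is picked
-- with probability 1/100 per day, so the expectation is days / 100.
expectedVisits :: Double
expectedVisits = days / prisoners
  where
    days      = 3 * 365  -- ignoring leap years
    prisoners = 100
```

This comes to a little under 11 visits each, which is where the “ten times” figure comes from.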

By keeping track of how many days have passed, we can try this out as well:

```haskell
leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Int -> m ()
leader prisoners days = C.atomically $ do
  numDays <- C.readTVar days
  C.check (numDays >= (prisoners - 1) * 10)

notLeader :: C.MonadConc m => C.TVar (C.STM m) Int -> m ()
notLeader days = forever . C.atomically $ C.modifyTVar days (+1)

prison :: C.MonadConc m => Int -> m ()
prison prisoners = do
  days <- C.atomically (C.newTVar 0)
  for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader days))
  leader prisoners days
```

Now let’s see how these brave prisoners do (sample visit sequences omitted because they’re pretty long):

```
λ> let runR = run $ D.uniformly (R.mkStdGen 0) 100
λ> runR 1
100 total attempts
100 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
λ> runR 2
100 total attempts
100 successes
0 failures
18.35 average number of room visits per success
NaN average number of room visits per failure
λ> runR 3
100 total attempts
100 successes
0 failures
31.92 average number of room visits per success
NaN average number of room visits per failure
λ> runR 4
100 total attempts
100 successes
0 failures
43.52 average number of room visits per success
NaN average number of room visits per failure
λ> runR 5
100 total attempts
100 successes
0 failures
55.88 average number of room visits per success
NaN average number of room visits per failure
λ> runR 6
100 total attempts
100 successes
0 failures
67.37 average number of room visits per success
NaN average number of room visits per failure
λ> runR 7
100 total attempts
100 successes
0 failures
77.05 average number of room visits per success
NaN average number of room visits per failure
λ> runR 8
100 total attempts
99 successes
1 failures
90.4040404040404 average number of room visits per success
81.0 average number of room visits per failure
λ> runR 9
100 total attempts
100 successes
0 failures
101.64 average number of room visits per success
NaN average number of room visits per failure
λ> runR 10
100 total attempts
100 successes
0 failures
114.89 average number of room visits per success
NaN average number of room visits per failure
```

Not bad at all! Although my puny VPS still can’t manage all 100.

As before, all code is available on GitHub. The code for this post is under the “post-02” tag.

Did you do last time’s homework task? It was to implement this interface:

```haskell
data CRef m a = -- ...

newCRef :: a -> MiniFu m (CRef m a)
readCRef :: CRef m a -> MiniFu m a
writeCRef :: CRef m a -> a -> MiniFu m ()
atomicModifyCRef :: CRef m a -> (a -> (a, b)) -> MiniFu m b
```

Here are my solutions, available at the “homework-01” tag:

- (`2070bdf`) Add the `CRef` type, the `PrimOp` constructors, and the wrapper functions
- (`188eec5`) Implement the primops

I also made some changes, available at the “pre-02” tag:

- (`7ce6e41`) Add a helper for primops which don’t create any identifiers
- (`2419796`) Move some definitions into an internal module
- (`9c49f9d`) Change the type of the `block` helper to `MVarId -> Threads m -> Threads m`
- (`dabd84b`) Implement `readMVar`
Now on to the show…

We can’t implement exceptions with what we have already. We’re going to need some new primops. I think you’re getting a feel for how this works now, so I won’t drag this out. Here we go:

```haskell
import qualified Control.Exception as E

data PrimOp m where
  -- ...
  Throw :: E.Exception e => e -> PrimOp m
  Catch :: E.Exception e => MiniFu m a -> (e -> MiniFu m a) -> (a -> PrimOp m)
        -> PrimOp m
  PopH  :: PrimOp m -> PrimOp m

throw :: E.Exception e => e -> MiniFu m a
throw e = MiniFu (K.cont (\_ -> Throw e))

catch :: E.Exception e => MiniFu m a -> (e -> MiniFu m a) -> MiniFu m a
catch act h = MiniFu (K.cont (Catch act h))
```

Throwing an exception with `throw` jumps back to the closest enclosing `catch` with an exception handler of the appropriate type, killing the thread if there is none. The `PopH` primop will pop the top exception handler from the stack. We’ll insert those as appropriate when entering a `catch`.

Before we can actually implement these primops, we need to give threads a place to store their exception handlers. You might have guessed it when I said “stack”: we’ll just give every thread a list of them. This requires changing our `Thread` type and `thread` function:

```haskell
data Thread m = Thread
  { threadK     :: PrimOp m
  , threadBlock :: Maybe MVarId
  , threadExc   :: [Handler m]   -- <- new
  }

data Handler m where
  Handler :: E.Exception e => (e -> PrimOp m) -> Handler m

thread :: PrimOp m -> Thread m
thread k = Thread
  { threadK     = k
  , threadBlock = Nothing
  , threadExc   = []             -- <- new
  }
```

As `Exception` is a subclass of `Typeable`, given some exception value we’re able to look for the first matching handler:

```haskell
raise :: E.Exception e => e -> Thread m -> Maybe (Thread m)
raise exc thrd = go (threadExc thrd) where
  go (Handler h:hs) = case h <$> E.fromException exc' of
    Just pop -> Just (thrd { threadK = pop, threadBlock = Nothing, threadExc = hs })
    Nothing  -> go hs
  go [] = Nothing

  exc' = E.toException exc
```
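To see the `Typeable`-based matching that `raise` relies on in isolation, here’s GHC’s own `Control.Exception` doing the same dance: box an exception into `SomeException`, then try to recover it at a specific type.

```haskell
import Control.Exception

-- The SomeException wrapper erases the concrete type; fromException uses
-- Typeable to recover it, which is what each Handler comparison does.
boxed :: SomeException
boxed = toException DivideByZero

asArith :: Maybe ArithException
asArith = fromException boxed   -- matches: Just DivideByZero

asAsync :: Maybe AsyncException
asAsync = fromException boxed   -- wrong type: Nothing
```

A handler for the right exception type gets a `Just` and fires; any other handler gets `Nothing` and is skipped, exactly as in the `go` loop above.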

If `raise` returns a `Just`, then a handler was found and entered. Otherwise, no handler exists and the thread should be removed from the `Threads` collection. This can be expressed rather nicely as `M.update . raise`.
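That works because `Data.Map`’s `update` deletes the key when the function returns `Nothing`. A small self-contained demonstration (with a stand-in for `raise`, since the real one needs the `Thread` type):

```haskell
import qualified Data.Map as M

-- M.update deletes the key when the function returns Nothing, so both
-- "enter a handler" and "kill the thread" fall out of one call.
-- `fakeRaise` stands in for the real `raise e`.
fakeRaise :: String -> Maybe String
fakeRaise name
  | name == "worker" = Nothing                  -- no handler: thread removed
  | otherwise        = Just (name ++ " (in handler)")

demo :: (M.Map Int String, M.Map Int String)
demo =
  let threads = M.fromList [(0 :: Int, "main"), (1, "worker")]
  in (M.update fakeRaise 1 threads, M.update fakeRaise 0 threads)
```

The first component of `demo` has lost the `worker` entry entirely, while the second keeps both threads, with `main` updated in place.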

Now we have enough support to implement the primops:

```haskell
stepThread {- ... -} where
  -- ...
  go (Throw e) = simple (M.update (raise e) tid)
  go (Catch (MiniFu ma) h k) = simple . adjust $ \thrd -> thrd
    { threadK   = K.runCont ma (PopH . k)
    , threadExc =
      let h' exc = K.runCont (runMiniFu (h exc)) k
      in Handler h' : threadExc thrd
    }
  go (PopH k) = simple . adjust $ \thrd -> thrd
    { threadK   = k
    , threadExc = tail (threadExc thrd)
    }
```

Let’s break that down:

- `Throw` just re-uses our `raise` function to either jump to the exception handler or kill the thread.
- `Catch` changes the continuation of the thread to run the enclosed action, then do a `PopH` action, then run the outer action. It also adds an exception continuation, which just runs the exception handler, then runs the outer action.
- `PopH` just removes the head exception continuation.

It’s important that the exception continuation *doesn’t* use `PopH` to remove itself: that happens in `raise` when an exception is thrown. When writing this section I realised I’d made that mistake in dejafu (#139)!

So now we can use synchronous exceptions! Here’s an incredibly contrived example:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (join)

example_sync :: MiniFu m Int
example_sync = do
  a <- newEmptyMVar
  fork (putMVar a (pure 1))
  fork (putMVar a (throw E.NonTermination))
  fork (putMVar a (throw E.AllocationLimitExceeded))
  catch
    (catch
      (join (readMVar a))
      (\(_ :: E.AllocationLimitExceeded) -> pure 2))
    (\(_ :: E.NonTermination) -> pure 3)

demo_sync :: IO ()
demo_sync = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g example_sync
```

If we run this a few times in ghci, we can see the different exceptions being thrown and caught (resulting in different outputs):

```
λ> demo_sync
Just 1
λ> demo_sync
Just 3
λ> demo_sync
Just 3
λ> demo_sync
Just 2
```

`MonadConc` has a bunch of superclasses, and we can now implement two of them!

```haskell
import qualified Control.Monad.Catch as EM

instance EM.MonadThrow (MiniFu m) where
  throwM = -- 'throw' from above

instance EM.MonadCatch (MiniFu m) where
  catch = -- 'catch' from above
```

The exceptions package provides the `MonadThrow`, `MonadCatch`, and `MonadMask` typeclasses, so we can talk about exceptions in a wider context than just `IO`. We’ll get on to `MonadMask` when we look at asynchronous exceptions.

It is with exceptions that we hit the first thing we can’t do in MiniFu.

When in `IO`, we can catch exceptions from pure code:

```
λ> import Control.Exception
λ> evaluate undefined `catch` \e -> putStrLn ("Got " ++ show (e :: SomeException))
Got Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:5:10 in interactive:Ghci2
```

But we can’t do that in `MiniFu`, as there’s no suitable `evaluate` function.

Should there be an `evaluate` in the `MonadConc` class? I’m unconvinced, as it’s not really a *concurrency* operation.

Should we constrain the `m` in `MiniFu m` to be a `MonadIO`, which would let us call `evaluate`? Perhaps; that would certainly be a way to do it, and I’m currently investigating the advantages of an `IO` base monad for dejafu (although originally for a different reason).
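As a hypothetical sketch of that direction (this is not MiniFu’s actual API): with a `MonadIO` base, an `evaluate` lifts straight in, and exceptions hidden in pure values get forced inside the monad where `catch` can see them.

```haskell
import Control.Exception
import Control.Monad.IO.Class (MonadIO, liftIO)

-- Hypothetical sketch: if MiniFu m carried a MonadIO constraint on its base
-- monad, pure-code exceptions could be forced like this.
evaluateM :: MonadIO m => a -> m a
evaluateM = liftIO . evaluate
```

At `m = IO` this is just `evaluate`; the point is that the same definition would work for any transformer stack over `IO`.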

Asynchronous exceptions are like synchronous exceptions, except for two details:

- They are thrown to a thread identified by `ThreadId`. We can do this already with `raise`.
- Raising the exception may be blocked due to the target thread’s *masking state*. We need to do some extra work to implement this.

When a thread is masked, attempting to deliver an asynchronous exception to it will block. There are three masking states:

- `Unmasked`: asynchronous exceptions are unmasked.
- `MaskedInterruptible`: asynchronous exceptions are masked, but blocked operations may still be interrupted.
- `MaskedUninterruptible`: asynchronous exceptions are masked, and blocked operations may not be interrupted.
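These are the same three states GHC itself exposes, and `Control.Exception` lets us observe the transitions directly:

```haskell
import Control.Exception

-- Observe the three masking states using GHC's own primitives.
states :: IO (MaskingState, MaskingState, MaskingState)
states = do
  ms0 <- getMaskingState                        -- outside any mask: Unmasked
  ms1 <- mask_ getMaskingState                  -- MaskedInterruptible
  ms2 <- uninterruptibleMask_ getMaskingState   -- MaskedUninterruptible
  pure (ms0, ms1, ms2)
```

MiniFu’s `threadMask` field tracks exactly this per-thread state, just without `IO`.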

So we’ll add the current masking state to our `Thread` type, defaulting to `Unmasked`, and also account for blocking on another thread:

```haskell
data Thread m = Thread
  { threadK     :: PrimOp m
  , threadBlock :: Maybe (Either ThreadId MVarId)  -- <- new
  , threadExc   :: [Handler m]
  , threadMask  :: E.MaskingState                  -- <- new
  }

thread :: PrimOp m -> Thread m
thread k = Thread
  { threadK     = k
  , threadBlock = Nothing
  , threadExc   = []
  , threadMask  = E.Unmasked                       -- <- new
  }
```

We’ll also need a primop to set the masking state:

```haskell
data PrimOp m where
  -- ...
  Mask :: E.MaskingState -> PrimOp m -> PrimOp m
```

Which has a fairly straightforward implementation:

```haskell
stepThread {- ... -} where
  -- ...
  go (Mask ms k) = simple . adjust $ \thrd -> thrd
    { threadK    = k
    , threadMask = ms
    }
```

Finally, we need to make sure that if an exception is raised, and we jump into an exception handler, the masking state gets reset to what it was when the handler was created. This means we need a small change to the `Catch` primop:

```haskell
stepThread {- ... -} where
  -- ...
  go (Catch (MiniFu ma) h k) = simple . adjust $ \thrd -> thrd
    { threadK   = K.runCont ma (PopH . k)
    , threadExc =
      let ms0 = threadMask thrd               -- <- new
          h' exc = flip K.runCont k $ do
            K.cont (\c -> Mask ms0 (c ()))    -- <- new
            runMiniFu (h exc)
      in Handler h' : threadExc thrd
    }
```

Alright, now we have enough background to actually implement the user-facing operations.

To throw an asynchronous exception, we’re going to need a new primop:

```haskell
data PrimOp m where
  -- ...
  ThrowTo :: E.Exception e => ThreadId -> e -> PrimOp m -> PrimOp m
```

Which has a corresponding wrapper function:

```haskell
throwTo :: E.Exception e => ThreadId -> e -> MiniFu m ()
throwTo tid e = MiniFu (K.cont (\k -> ThrowTo tid e (k ())))
```

Let’s think about the implementation of the `ThrowTo` primop. It first needs to check if the target thread is interruptible and, if so, raises the exception in that thread; if not, it blocks the current thread. A thread is interruptible if its masking state is `Unmasked`, or if it is `MaskedInterruptible` and currently blocked.

Let’s encapsulate that logic:

```haskell
import Data.Maybe (isJust)

isInterruptible :: Thread m -> Bool
isInterruptible thrd =
  threadMask thrd == E.Unmasked ||
  (threadMask thrd == E.MaskedInterruptible && isJust (threadBlock thrd))
```

Given that, the implementation of `ThrowTo` is straightforward:

```haskell
stepThread {- ... -} where
  -- ...
  go (ThrowTo threadid e k) = simple $ case M.lookup threadid threads of
    Just t
      | isInterruptible t -> goto k . M.update (raise e) threadid
      | otherwise         -> block (Left threadid)
    Nothing -> goto k
```

First, check if the thread exists. Then check if it’s interruptible: if it is, raise the exception, otherwise block. If the thread doesn’t exist any more, just continue.

Now we just need to handle *unblocking* threads which are blocked in `ThrowTo`. For that, we’ll go back to the `run` function and add a pass to unblock threads if the current one is interruptible after it processes its action:

```haskell
run :: C.MonadConc m => Scheduler s -> s -> PrimOp m -> m s
run sched s0 = go s0 . initialise where
  go s (threads, idsrc)
    | initialThreadId `M.member` threads = case runnable threads of
      Just tids -> do
        let (chosen, s') = sched tids s
        (threads', idsrc') <- stepThread chosen (threads, idsrc)
        let threads'' = if (isInterruptible <$> M.lookup chosen threads') /= Just False
                        then unblock (Left chosen) threads'   -- <- new
                        else threads'
        go s' (threads'', idsrc')
      Nothing -> pure s
    | otherwise = pure s

  runnable = nonEmpty . M.keys . M.filter (isNothing . threadBlock)

  initialThreadId = fst (nextThreadId initialIdSource)
```

So after stepping a thread, we unblock every thread blocked on it if it either doesn’t exist, or if it does exist and is interruptible. It’s much more robust to do this once here than everywhere in `stepThread` which might cause the thread to become interruptible.

There are two operations at the programmer’s disposal to change the masking state of a thread: `mask` and `uninterruptibleMask`. Here’s what the `MiniFu` types will look like:

```haskell
{-# LANGUAGE RankNTypes #-}

mask                :: ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a) -> MiniFu m a
uninterruptibleMask :: ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a) -> MiniFu m a
```

Each takes an action to run, and runs it as either `MaskedInterruptible` or `MaskedUninterruptible`. The action is provided with a polymorphic callback to run a subcomputation with the original masking state.

This is going to need, you guessed it, a new primop! We *could* modify the `Mask` primop to do this job as well, but I think it’s a little clearer to have two separate ones:

```haskell
data PrimOp m where
  -- ...
  InMask :: E.MaskingState -> ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a)
         -> (a -> PrimOp m) -> PrimOp m
```

And here’s the implementations of our masking functions:

```haskell
mask                ma = MiniFu (K.cont (InMask E.MaskedInterruptible   ma))
uninterruptibleMask ma = MiniFu (K.cont (InMask E.MaskedUninterruptible ma))
```

We can now fulfil another requirement of `MonadConc`: a `MonadMask` instance!

```haskell
instance MonadMask (MiniFu m) where
  mask = -- 'mask' from above
  uninterruptibleMask = -- 'uninterruptibleMask' from above
```

The very last piece of the puzzle for exception handling in MiniFu is to implement this `InMask` primop. Its type looks quite intense, but the implementation is really not that bad. There are three parts:

```haskell
stepThread {- ... -} where
  -- ...
  go (InMask ms ma k) = simple . adjust $ \thrd -> thrd
    { threadK =
        let ms0 = threadMask thrd
            -- (1) we need to construct the polymorphic argument function
            umask :: MiniFu m x -> MiniFu m x
            umask (MiniFu mx) = MiniFu $ do
              K.cont (\c -> Mask ms0 (c ()))
              x <- mx
              K.cont (\c -> Mask ms (c ()))
              pure x
        -- (2) we need to run the inner continuation, resetting the masking state
        -- when done
        in K.runCont (runMiniFu (ma umask)) (Mask ms0 . k)
    -- (3) we need to change the masking state
    , threadMask = ms
    }
```

The explicit type signature on `umask` is needed because we’re using `GADTs`, which implies `MonoLocalBinds`, which prevents the polymorphic type from being inferred. We could achieve the same effect by turning on `NoMonoLocalBinds`.

Now we have asynchronous exceptions, check it out:

```haskell
example_async :: MiniFu m String
example_async = do
  a <- newEmptyMVar
  tid <- fork (putMVar a "hello from the other thread")
  throwTo tid E.ThreadKilled
  readMVar a

demo_async :: IO ()
demo_async = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g example_async
```

See:

```
λ> demo_async
Just "hello from the other thread"
λ> demo_async
Just "hello from the other thread"
λ> demo_async
Nothing
```

We have come to the end of part 2! Again, I hope you enjoyed this post, any feedback is welcome. This is all on GitHub, and you can see the code we ended up with at the “post-02” tag.

Once again, I have some homework for you. Your task, should you choose to accept it, is to implement:

```haskell
tryPutMVar :: MVar m a -> a -> MiniFu m Bool
tryTakeMVar :: MVar m a -> MiniFu m (Maybe a)
tryReadMVar :: MVar m a -> MiniFu m (Maybe a)
```

Solutions will be up in a few days, as before, at the “homework-02” tag.

Stay tuned because next time we’re going to implement STM: all of it in one go. Then we can finally get on to the testing.

Thanks to Will Sewell for reading an earlier draft of this post.

```haskell
example = do
  a <- newEmptyMVar
  forkIO (putMVar a 1)
  forkIO (putMVar a 2)
  takeMVar a
```

Will tell us the possible results of that computation:

```
λ> test example
[1, 2]
```

We’re going to build this from the ground up, using the concurrency library, as it provides a typeclass abstraction over forking, MVars, STM, and suchlike.

You may have come across my dejafu library before. If not, don’t worry, but you may want to check it out as we’re going to be building something very similar.

Ok, with the preliminaries over, let’s get coding! All the code written in this series is on GitHub, with one tag for each post. The code for this post is under the “post-01” tag.

The goal in this post is to be able to implement a function which can execute simple thread-and-MVar computations (like the example from the beginning) with a stateful scheduler. Firstly, let’s say what we know:

- We’re using the `MonadConc` typeclass from concurrency, rather than `IO`.
- We want to be able to examine arbitrary `MonadConc` computations.
- We also want to be able to pause and resume “threads” at will, so we can explore different executions.

That sounds rather like something based on continuations or a free monad. Furthermore, we’re going to need mutable state to implement all of this, as we’re modelling a DSL with mutable references, and doing that purely is a huge pain.
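To make the “pause and resume” point concrete, here’s a tiny self-contained sketch (not MiniFu code) of the continuation trick. The toy `Step` type stands in for the `PrimOp` type we’re about to define: `runCont` reifies the computation into explicit steps, and a separate interpreter decides when (and whether) to resume each one:

```haskell
import Control.Monad.Cont (Cont, cont, runCont)

-- A computation reified into explicit, resumable steps.
data Step = Done Int | Ask (Int -> Step)

-- Suspend, waiting for an Int from the interpreter.
ask :: Cont Step Int
ask = cont Ask

program :: Cont Step Int
program = do
  x <- ask
  y <- ask
  pure (x + y)

-- The interpreter resumes one step per available input.
run :: Step -> [Int] -> Maybe Int
run (Done n) _      = Just n
run (Ask k) (i:is)  = run (k i) is
run (Ask _) []      = Nothing  -- "blocked": nothing to resume with

main :: IO ()
main = print (run (runCont program Done) [1, 2])  -- Just 3
```

MiniFu plays the interpreter’s role: each thread is a suspended `Step`-like value, and the scheduler decides which one to resume next.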

Let’s write down some types. Because we’re writing a mini-dejafu, I’m calling this project “minifu”. So we want a function:

```haskell
import qualified Control.Concurrent.Classy as C
import Data.List.NonEmpty (NonEmpty(..))

newtype ThreadId = ThreadId Int
  deriving (Eq, Ord)

type Scheduler s = NonEmpty ThreadId -> s -> (ThreadId, s)

minifu :: C.MonadConc m => Scheduler s -> s -> MiniFu m a -> m (Maybe a, s)
```

For some suitable `MiniFu` monad transformer. Now we’re going to take the standard way of constructing a free monad, and have a data structure representing our class of interest (`MonadConc`), with one constructor for every function. Because we’re only talking about threads and MVars in this post, it will be a fairly small type:

```haskell
{-# LANGUAGE GADTs #-}

data PrimOp m where
  Fork         :: MiniFu m () -> (ThreadId -> PrimOp m) -> PrimOp m
  NewEmptyMVar :: (MVar m a -> PrimOp m) -> PrimOp m
  PutMVar      :: MVar m a -> a -> PrimOp m -> PrimOp m
  TakeMVar     :: MVar m a -> (a -> PrimOp m) -> PrimOp m
  Stop         :: m () -> PrimOp m

newtype MVarId = MVarId Int
  deriving (Eq, Ord)

data MVar m a = MVar
  { mvarId  :: MVarId
  , mvarRef :: C.CRef m (Maybe a)
  }
```

The `Stop` action is what is going to let us communicate the final result out of the computation. I’ve also defined an `MVar` type. Our MVars are going to be implemented as a `CRef` (what concurrency calls an `IORef`) holding a maybe value, along with an identifier. These identifiers will come into play when we look at threads blocking.

Given this set up, the `MiniFu` type is very simple:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import qualified Control.Monad.Cont as K

newtype MiniFu m a = MiniFu { runMiniFu :: K.Cont (PrimOp m) a }
  deriving (Functor, Applicative, Monad)
```

We’re not actually going to write a `MonadConc` instance for `MiniFu` yet, because there are a bunch of constraints which we can’t really satisfy. But we can still define the functions of interest:

```haskell
fork :: MiniFu m () -> MiniFu m ThreadId
fork ma = MiniFu (K.cont (Fork ma))

newEmptyMVar :: MiniFu m (MVar m a)
newEmptyMVar = MiniFu (K.cont NewEmptyMVar)

putMVar :: MVar m a -> a -> MiniFu m ()
putMVar v a = MiniFu (K.cont (\k -> PutMVar v a (k ())))

takeMVar :: MVar m a -> MiniFu m a
takeMVar v = MiniFu (K.cont (TakeMVar v))
```

Hey, not bad! Now we can slap a `MiniFu m Int` type signature on our example from the start (and rename the `forkIO` calls) and it compiles!

```haskell
example :: MiniFu m Int
example = do
  a <- newEmptyMVar
  fork (putMVar a 1)
  fork (putMVar a 2)
  takeMVar a
```

Take a moment to make sure you’re happy with this section before moving on to the next. MiniFu is going to be a layered application: this is the basic layer which defines the functions we can test; the next layer executes a MiniFu computation; the layers above that will implement the systematic testing behaviour.

`minifu`

Recall the type of `minifu`:

```haskell
minifu :: C.MonadConc m => Scheduler s -> s -> MiniFu m a -> m (Maybe a, s)
```

So, what does it need to do? It needs to set up the execution environment: in this case, that’s specifying that the provided computation is the main thread. Then it needs to repeatedly call the scheduler, executing one `PrimOp` of the chosen thread at a time, until either the main thread terminates or everything is blocked.

In the best functional programming practice, `minifu` is going to do the minimum it can and call another function to do the rest. So what `minifu` is *actually* going to do is extract the continuation and set up the mechanism to communicate the final result back:

```haskell
minifu sched s (MiniFu ma) = do
  out <- C.newCRef Nothing
  s' <- run sched s (K.runCont ma (Stop . C.writeCRef out . Just))
  a <- C.readCRef out
  pure (a, s')
```

Before we move on to the implementation of `run`, let’s first look at two concerns we’ll have along the way: getting unique names (for threads and MVars) and representing threads.

Each thread gets a unique `ThreadId`, and each MVar gets a unique `MVarId`. As each is just an `Int` underneath, we can use the same source for both:

```haskell
type IdSource = Int

initialIdSource :: IdSource
initialIdSource = 0

nextThreadId :: IdSource -> (ThreadId, IdSource)
nextThreadId n = (ThreadId n, n + 1)

nextMVarId :: IdSource -> (MVarId, IdSource)
nextMVarId n = (MVarId n, n + 1)
```

This is as simple as it gets, but it’s good enough for now.
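One consequence worth seeing in action: because thread and MVar ids draw from the same counter, they interleave. This self-contained sketch repeats the definitions from above (with `Show` derived so we can print them) and threads the source through by hand:

```haskell
newtype ThreadId = ThreadId Int deriving (Eq, Ord, Show)
newtype MVarId = MVarId Int deriving (Eq, Ord, Show)

type IdSource = Int

initialIdSource :: IdSource
initialIdSource = 0

nextThreadId :: IdSource -> (ThreadId, IdSource)
nextThreadId n = (ThreadId n, n + 1)

nextMVarId :: IdSource -> (MVarId, IdSource)
nextMVarId n = (MVarId n, n + 1)

main :: IO ()
main = do
  -- each allocation consumes the next Int, regardless of kind
  let (t0, s1) = nextThreadId initialIdSource
      (v0, s2) = nextMVarId s1
      (t1, _)  = nextThreadId s2
  print (t0, v0, t1)  -- (ThreadId 0,MVarId 1,ThreadId 2)
```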

What is a thread? Well, it has a continuation, which is some value of type `PrimOp m`, and it might be blocked. We want to know if a thread is blocked for two reasons: we don’t want the scheduler to schedule a blocked thread, and we want to be able to tell if the computation is deadlocked. Threads can only block on reading from or writing to MVars (currently), so let’s use a `Maybe MVarId` to indicate whether the thread is blocked:

```haskell
data Thread m = Thread
  { threadK     :: PrimOp m
  , threadBlock :: Maybe MVarId
  }
```

When we create a thread, it’s initially unblocked:

```haskell
thread :: PrimOp m -> Thread m
thread k = Thread
  { threadK     = k
  , threadBlock = Nothing
  }
```

And finally we need a way to construct our initial collection of threads:

```haskell
import Data.Map (Map)
import qualified Data.Map as M

type Threads m = Map ThreadId (Thread m)

initialise :: PrimOp m -> (Threads m, IdSource)
initialise k =
  let (tid, idsrc) = nextThreadId initialIdSource
  in (M.singleton tid (thread k), idsrc)
```

And now back to the implementation of `minifu`.

`run`

The `run` function is responsible for taking the first continuation, creating the collection of threads, and repeatedly calling the scheduler and stepping the chosen thread, until the computation is done.

It has this type:

```haskell
run :: C.MonadConc m => Scheduler s -> s -> PrimOp m -> m s
```

As with `minifu`, we shall keep it simple, and delegate most of the work to yet another function:

```haskell
import Data.List.NonEmpty (nonEmpty)
import Data.Maybe (isNothing)

run sched s0 = go s0 . initialise where
  go s (threads, ids)
    | initialThreadId `M.member` threads = case runnable threads of
        Just tids ->
          let (chosen, s') = sched tids s
          in go s' =<< stepThread chosen (threads, ids)
        Nothing -> pure s
    | otherwise = pure s

  runnable = nonEmpty . M.keys . M.filter (isNothing . threadBlock)

  initialThreadId = fst (nextThreadId initialIdSource)
```

Let’s break down that `go` function a bit:

- We check if the initial thread still exists. If not, we return.
- We check if the collection of runnable threads is nonempty. If not, we return.
- We call the scheduler to pick a thread from the runnable ones.
- We call the (not yet defined) `stepThread` function to execute one step of that thread.
- We go around the loop again.

Not too bad, hey? Finally (really finally) we just have one function to go, `stepThread`. Can you see what the type will be?

It’s going to start like this:

```haskell
stepThread :: C.MonadConc m => ThreadId -> (Threads m, IdSource) -> m (Threads m, IdSource)
stepThread tid (threads, idsrc) = case M.lookup tid threads of
    Just thrd -> go (threadK thrd)
    Nothing -> pure (threads, idsrc)
  where
    adjust :: (Thread m -> Thread m) -> Threads m -> Threads m
    adjust f = M.adjust f tid

    goto :: PrimOp m -> Threads m -> Threads m
    goto k = adjust (\thrd -> thrd { threadK = k })

    block :: Maybe MVarId -> Threads m -> Threads m
    block mv = adjust (\thrd -> thrd { threadBlock = mv })

    unblock :: MVarId -> Threads m -> Threads m
    unblock v = fmap (\thrd ->
      if threadBlock thrd == Just v
      then thrd { threadBlock = Nothing }
      else thrd)

    go :: PrimOp m -> m (Threads m, IdSource)
    -- go ...
```

I’ve introduced a few helper functions, which will crop up a lot. That `go` function will have a case for every constructor of `PrimOp m`, and it’s going to look a bit hairy, so we’ll take it one constructor at a time. Let’s do the constructors in order.

First, we can fork threads:

```haskell
go (Fork (MiniFu ma) k) =
  let (tid', idsrc') = nextThreadId idsrc
      thrd' = thread (K.runCont ma (\_ -> Stop (pure ())))
  in pure (goto (k tid') (M.insert tid' thrd' threads), idsrc')
```

Forking is pretty straightforward. We simply get the next available `ThreadId` from the `IdSource`, create a thread with the provided continuation, and insert it into the `Threads m` map.

Next up is `NewEmptyMVar`:

```haskell
go (NewEmptyMVar k) = do
  ref <- C.newCRef Nothing
  let (mvid, idsrc') = nextMVarId idsrc
  pure (goto (k (MVar mvid ref)) threads, idsrc')
```

Remember that we’re implementing our `MVar` type using the `CRef` type of the underlying `MonadConc`. As the `MVar` starts out empty, the `CRef` starts out holding `Nothing`.

The `PutMVar` and `TakeMVar` actions are almost the same, so let’s tackle them together:

```haskell
go (PutMVar (MVar mvid ref) a k) = do
  old <- C.readCRef ref
  case old of
    Just _ -> pure (block (Just mvid) threads, idsrc)
    Nothing -> do
      C.writeCRef ref (Just a)
      pure (goto k (unblock mvid threads), idsrc)
go (TakeMVar (MVar mvid ref) k) = do
  old <- C.readCRef ref
  case old of
    Just a -> do
      C.writeCRef ref Nothing
      pure (goto (k a) (unblock mvid threads), idsrc)
    Nothing -> pure (block (Just mvid) threads, idsrc)
```

In both cases, we start out by reading the value of the reference. Remember that `Nothing` indicates emptiness, and `Just` indicates the presence of a value. So, for `PutMVar` *if there already is a value* (and for `TakeMVar` *if there isn’t a value*), we block the thread. In the other case, we update the value in the reference, putting in the new value (or taking out the old), unblock all the relevant threads, and go to the continuation.
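These are exactly the semantics of real MVars, which you can observe in plain `IO` with the non-blocking `try*` variants (a blocking `putMVar` into a full MVar would just deadlock a single-threaded demo):

```haskell
import Control.Concurrent.MVar

main :: IO ()
main = do
  v <- newMVar (1 :: Int)
  ok  <- tryPutMVar v 2   -- False: the MVar is full, a put would block
  x   <- takeMVar v       -- 1, and the MVar is now empty
  ok2 <- tryPutMVar v 2   -- True: the put into the empty MVar succeeds
  print (ok, x, ok2)      -- (False,1,True)
```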

These implementations are not atomic. But that’s fine: despite MiniFu testing concurrent programs, there’s no concurrency going on within MiniFu itself. We can do as much or as little as we want during one atomic “step” of our program. This will turn out to be very useful when we implement STM in a few posts’ time.

Finally, we have `Stop`:

```haskell
go (Stop mx) = do
  mx
  pure (M.delete tid threads, idsrc)
```

And we’re done! That’s it! All we need now is a scheduler, and we can execute our example!

Our example is nondeterministic, so we want a scheduler which will let us see that. It would be no good us implementing something which always made the same decisions, as we’d only see one result! So until we implement the systematic testing behaviour, let’s just use a simple random scheduler.

```haskell
import qualified System.Random as R

randomSched :: R.RandomGen g => Scheduler g
randomSched (t:|ts) g =
  let (i, g') = R.randomR (0, length ts) g
  in ((t:ts) !! i, g')
```

There’s no deep magic here, we’re just picking a random value from a nonempty list. Finally, we can construct a little demo:

```haskell
demo :: IO ()
demo = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g example
```

Which we can run in ghci like so:

```
λ> demo
Just 1
λ> demo
Just 1
λ> demo
Just 1
λ> demo
Just 2
λ> demo
Just 1
```

Success!

A random scheduler is fine for demonstration purposes, but not so great for testing. Different seeds may lead to the same execution, which makes it hard to know how many executions of a test are enough. It can be a useful technique, but for us this is only the beginning.
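For contrast, here’s a hedged sketch of a fully deterministic alternative: a round-robin scheduler that uses the last-chosen `ThreadId` as its state, picking the next runnable thread in id order and wrapping around. (This is illustrative only, not part of MiniFu; it repeats the `ThreadId` and `Scheduler` definitions so it stands alone, and it relies on `M.keys` handing us the runnable ids in ascending order.)

```haskell
import Data.List.NonEmpty (NonEmpty(..))

newtype ThreadId = ThreadId Int deriving (Eq, Ord, Show)

type Scheduler s = NonEmpty ThreadId -> s -> (ThreadId, s)

-- Pick the first runnable thread with an id greater than the previous
-- choice, wrapping around to the front when none is left.
roundRobin :: Scheduler ThreadId
roundRobin (t :| ts) prev =
  let chosen = case filter (> prev) (t : ts) of
        (c:_) -> c
        []    -> t
  in (chosen, chosen)

main :: IO ()
main = do
  let runnable = ThreadId 0 :| [ThreadId 1, ThreadId 2]
      (c1, s1) = roundRobin runnable (ThreadId (-1))
      (c2, s2) = roundRobin runnable s1
      (c3, s3) = roundRobin runnable s2
      (c4, _)  = roundRobin runnable s3
  print [c1, c2, c3, c4]  -- [ThreadId 0,ThreadId 1,ThreadId 2,ThreadId 0]
```

A scheduler like this always produces the same execution, which is the opposite trade-off: perfectly reproducible, but it explores only one schedule.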

Next time we’ll look at implementing exceptions, both synchronous and asynchronous.

I hope you enjoyed this post, any feedback is welcome. As I mentioned at the start, this is on GitHub, you can get the code we ended up with at the “post-01” tag.

*Before* next time, I have some homework for you! You have seen how to implement MVars, so now try implementing CRefs! Here are the functions you should have a go at writing:

```haskell
data CRef m a = -- ...

newCRef :: a -> MiniFu m (CRef m a)
readCRef :: CRef m a -> MiniFu m a
writeCRef :: CRef m a -> a -> MiniFu m ()
atomicModifyCRef :: CRef m a -> (a -> (a, b)) -> MiniFu m b
```

Don’t worry about any of the relaxed memory stuff implemented in dejafu, just do sequential consistency (and if you don’t know what that means: it means to do the obvious). I’ll put up a solution (and maybe do a little refactoring) before the next post.

Thanks to José Manuel Calderón Trilla for reading an earlier draft of this post.

1. We have a *typeclass* abstracting over concurrency:
   - There’s an implementation of this typeclass using `IO`.
   - There’s also an implementation of this typeclass using a custom monad transformer called `ConcT`.
2. Computations of type `MonadRef r n => ConcT r n a` can be executed with a given scheduler, to produce a result and an execution trace.
3. Unlike `IO`, the threads in a `ConcT r n` computation are executed in a single-step fashion based on the decisions of the scheduler.
4. To implement this single-step execution, all threads are executed in a single “real” thread.

It’s the third point that gives dejafu the ability to systematically explore different executions. If execution were not single-step, then it wouldn’t be possible in general to context switch between arbitrary concurrency actions.

The fourth point greatly simplifies the implementation, but also causes problems: GHC Haskell has a notion of “bound threads”, which are Haskell threads bound to a particular OS thread. Bound threads are absolutely essential to use FFI calls which rely on thread-local state. **Deja Fu cannot support bound threads if it executes everything in a single thread!**

How can we address this?

PULSE is a concurrency-testing tool for Erlang. It works by code instrumentation: a call to the PULSE scheduler process is inserted around every communication operation. The scheduler process tells processes when they can run. Execution is *not* serialised into a single thread: the distinct Erlang processes still exist, but only one of them may run at a time.

We can do the same thing in Haskell.

Let’s look at a much simplified version of dejafu to try this idea out.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE TypeFamilies #-}

import qualified Control.Concurrent as C
import qualified Control.Monad.Cont as K
import Data.List.NonEmpty (NonEmpty(..), nonEmpty)
import qualified Data.Map as M
import Data.Maybe (isNothing)
import qualified System.Random as R

class Monad m => MonadConc m where
  type ThreadId m :: *
  type MVar m :: * -> *

  fork   :: m () -> m (ThreadId m)
  forkOS :: m () -> m (ThreadId m)

  newEmptyMVar :: m (MVar m a)
  putMVar  :: MVar m a -> a -> m ()
  takeMVar :: MVar m a -> m a

newMVar :: MonadConc m => a -> m (MVar m a)
newMVar a = do
  v <- newEmptyMVar
  putMVar v a
  pure v
```

There’s a straightforward implementation for `IO`:
```haskell
instance MonadConc IO where
  type ThreadId IO = C.ThreadId
  type MVar IO = C.MVar

  fork   = C.forkIO
  forkOS = C.forkOS

  newEmptyMVar = C.newEmptyMVar
  putMVar  = C.putMVar
  takeMVar = C.takeMVar
```

The testing implementation is a little hairier. Because we want to be able to single-step it, we’ll use continuations:

```haskell
newtype ConcT m a = ConcT { runConcT :: K.Cont (Action m) a }
  deriving (Functor, Applicative, Monad)

newtype CTThreadId = CTThreadId Int
  deriving (Eq, Ord)

data CTMVar m a = CTMVar
  { mvarID  :: Int
  , mvarRef :: MVar m (Maybe a)
  }

data Action m
  = Fork (ConcT m ()) (CTThreadId -> Action m)
  | ForkOS (ConcT m ()) (CTThreadId -> Action m)
  | forall a. NewEmptyMVar (CTMVar m a -> Action m)
  | forall a. PutMVar (CTMVar m a) a (Action m)
  | forall a. TakeMVar (CTMVar m a) (a -> Action m)
  | Stop (m ())

instance MonadConc (ConcT m) where
  type ThreadId (ConcT m) = CTThreadId
  type MVar (ConcT m) = CTMVar m

  fork   ma = ConcT (K.cont (Fork ma))
  forkOS ma = ConcT (K.cont (ForkOS ma))

  newEmptyMVar = ConcT (K.cont NewEmptyMVar)
  putMVar mvar a = ConcT (K.cont (\k -> PutMVar mvar a (k ())))
  takeMVar mvar = ConcT (K.cont (TakeMVar mvar))
```

Let’s talk about the `Action` type a bit before moving on. The general structure is `Name [<args> ...] (<result> -> Action m)`, where `m` is some `MonadConc`. For `MVar`s, we’re just re-using the `MVar` type of the underlying monad (dejafu proper re-uses the `IORef`s of the underlying monad). For `ThreadId`s we’re using `Int`s. And we’re going to get the final result out of the computation with the `Stop` action.

Let’s keep things simple and not support most of the fancy scheduling stuff dejafu does. Our scheduler is just going to be a stateful function from runnable threads to a single thread:

```haskell
type Scheduler s = NonEmpty CTThreadId -> s -> (CTThreadId, s)
```

So now our execution function is going to look like this:

```haskell
minifu :: MonadConc m => Scheduler s -> s -> ConcT m a -> m (Maybe a, s)
minifu sched s (ConcT ma) = do
  out <- newMVar Nothing
  s' <- run sched s (K.runCont ma (\a -> Stop (takeMVar out >> putMVar out (Just a))))
  a <- takeMVar out
  pure (a, s')
```

The real meat is the `run` function:

```haskell
run :: MonadConc m => Scheduler s -> s -> Action m -> m s
run sched s0 a0 = go s0 initialIdSource =<< initialThreads a0 where
  go s ids threads
    | initialThreadId `M.member` threads = case runnable threads of
        Just tids ->
          let (chosen, s') = sched tids s
          in uncurry (go s') =<< loopStepThread ids chosen threads
        Nothing -> pure s
    | otherwise = pure s

  runnable = nonEmpty . M.keys . M.filter (isNothing . blockedOn)
```

Like in dejafu proper, execution is going to stop when the main thread terminates, even if there are other threads. Threads are going to live in a map keyed by `ThreadId`.

```haskell
type Threads m = M.Map CTThreadId (Thread m)

initialThreads :: MonadConc m => Action m -> m (Threads m)
initialThreads a0 = do
  t <- forkThread False a0
  pure (M.singleton initialThreadId t)

initialThreadId :: CTThreadId
initialThreadId = CTThreadId 0
```

Each thread in our program-under-test is going to be executed in an actual thread. So, like PULSE, we’ll introduce communication (in the form of `MVar`s) around concurrency actions to ensure that we get single-step execution. So a thread is going to have three components: the `MVar` (if any) it’s currently blocked on, an `MVar` to signal that it should execute one step, and an `MVar` to communicate what the thread did.

```haskell
data Thread m = Thread
  { blockedOn   :: Maybe Int
  , signalStep  :: MVar m IdSource
  , awaitResult :: MVar m (IdSource, ThreadResult m)
  }

data ThreadResult m
  = BusinessAsUsual
  | Killed
  | Updated Int
  | Blocked Int
  | Forked (Thread m)
```
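The signal/await pair is the heart of the single-step protocol, and it can be demonstrated in isolation. Here’s a minimal plain-`IO` sketch (hypothetical names, not the code above) where a controller lets a worker run exactly one “step” at a time by passing a token through two `MVar`s:

```haskell
import Control.Concurrent

main :: IO ()
main = do
  signal <- newEmptyMVar
  await  <- newEmptyMVar
  -- the worker: waits for permission, does one step, reports back
  _ <- forkIO $ mapM_ (\i -> do
         () <- takeMVar signal   -- wait until told to step
         putMVar await i)        -- report what this step did
       [1 .. 3 :: Int]
  -- the controller: grants one step at a time and collects the results
  results <- mapM (\_ -> putMVar signal () >> takeMVar await) [1 .. 3]
  print results                  -- [1,2,3]
```

The handshake fully serialises the two threads, so the output is deterministic even though a real fork is involved; this is exactly the property the multithreaded execution scheme relies on.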

The `IdSource` is used to generate new unique thread and `MVar` IDs:

```haskell
type IdSource = (Int, Int)

initialIdSource :: IdSource
initialIdSource = (1, 0)

nextThreadId :: IdSource -> (CTThreadId, IdSource)
nextThreadId (t, m) = (CTThreadId t, (t + 1, m))

nextMVarId :: IdSource -> (Int, IdSource)
nextMVarId (t, m) = (m, (t, m + 1))
```

Forking a thread is going to set up these `MVar`s and the small bit of logic to ensure things happen as we like:
```haskell
forkThread :: MonadConc m => Bool -> Action m -> m (Thread m)
forkThread isOS act = do
  signal <- newEmptyMVar
  await <- newEmptyMVar
  _ <- (if isOS then forkOS else fork) (runThread signal await act)
  pure (Thread Nothing signal await)

runThread :: MonadConc m => MVar m IdSource -> MVar m (IdSource, ThreadResult m) -> Action m -> m ()
runThread signal await = go where
  go act = do
    ids <- takeMVar signal
    (act', ids', res) <- runStepThread ids act
    putMVar await (ids', res)
    maybe (pure ()) go act'
```

The final pieces of the puzzle are the two `*StepThread` functions, which execute one action of our chosen thread. These are a little trickier than in normal dejafu.

Firstly, `loopStepThread`, which tells the thread that was chosen by the scheduler to step:
```haskell
loopStepThread :: MonadConc m
  => IdSource
  -> CTThreadId
  -> Threads m
  -> m (IdSource, Threads m)
loopStepThread ids tid threads = case M.lookup tid threads of
  Just thread -> do
    putMVar (signalStep thread) ids
    (ids', res) <- takeMVar (awaitResult thread)
    let resf = case res of
          BusinessAsUsual -> id
          Killed -> M.delete tid
          Updated i -> fmap (\t ->
            if blockedOn t == Just i then t { blockedOn = Nothing } else t)
          Blocked i -> M.insert tid (thread { blockedOn = Just i })
          Forked thread' -> M.insert (fst (nextThreadId ids)) thread'
    pure (ids', resf threads)
  Nothing -> pure (ids, threads)
```

Finally, `runStepThread`, which executes an action:
```haskell
runStepThread :: MonadConc m => IdSource -> Action m -> m (Maybe (Action m), IdSource, ThreadResult m)
runStepThread ids (Fork (ConcT ma) k) = do
  t <- primFork False ma
  let (tid', ids') = nextThreadId ids
  pure (Just (k tid'), ids', Forked t)
runStepThread ids (ForkOS (ConcT ma) k) = do
  t <- primFork True ma
  let (tid', ids') = nextThreadId ids
  pure (Just (k tid'), ids', Forked t)
runStepThread ids (NewEmptyMVar k) = do
  v <- newEmptyMVar
  putMVar v Nothing
  let (mvid, ids') = nextMVarId ids
  let mvar = CTMVar mvid v
  pure (Just (k mvar), ids', BusinessAsUsual)
runStepThread ids k0@(PutMVar (CTMVar mvid v) a k) = do
  old <- takeMVar v
  case old of
    Just _ -> putMVar v old >> pure (Just k0, ids, Blocked mvid)
    Nothing -> putMVar v (Just a) >> pure (Just k, ids, Updated mvid)
runStepThread ids k0@(TakeMVar (CTMVar mvid v) k) = do
  old <- takeMVar v
  case old of
    Nothing -> putMVar v old >> pure (Just k0, ids, Blocked mvid)
    Just a -> putMVar v Nothing >> pure (Just (k a), ids, Updated mvid)
runStepThread ids (Stop ma) = do
  ma
  pure (Nothing, ids, Killed)

primFork :: MonadConc m => Bool -> K.Cont (Action m) () -> m (Thread m)
primFork isOS ma = forkThread isOS (K.runCont ma (\_ -> Stop (pure ())))
```

This looks pretty horrible, but each case is fairly small, so just look at those.

Now we can run it (with a random scheduler for fun) and see that it works:

```haskell
test :: MonadConc m => m Int
test = do
  a <- newEmptyMVar
  b <- newMVar 2
  c <- newMVar 3
  forkOS (putMVar a b)
  forkOS (putMVar a c)
  forkOS (takeMVar b >> putMVar b 14)
  forkOS (takeMVar c >> putMVar c 15)
  takeMVar =<< takeMVar a

randomSched :: Scheduler R.StdGen
randomSched (t:|ts) g =
  let (i, g') = R.randomR (0, length ts) g
  in ((t:ts) !! i, g')

main :: IO ()
main = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g test
```

Giving:

```
λ> main
Just 14
λ> main
Just 2
λ> main
Just 14
λ> main
Just 2
λ> main
Just 14
λ> main
Just 14
λ> main
Just 15
λ> main
Just 15
```

That wasn’t so bad!

MiniFu is much smaller than Deja Fu, but it demonstrates the key concepts. To get a multithreaded runtime into dejafu, I think the main change needed is to figure out how thread communication is going to work: in dejafu proper, actions can change the continuation of an arbitrary thread (eg, throwing an exception to a thread will call its exception handler).

The overhead of this method compared to the single-threaded approach must be measured. It would be great to support bound threads, but not at the cost of everything else becoming much worse! If the overhead is bad, perhaps a hybrid approach could be used: unbound threads in the program-under-test are executed as they are currently, whereas bound threads get the fancy multithreaded implementation. It would complicate things, but possibly eliminate the overhead in the common case.

Finally, when the main thread terminates, any still-running threads should terminate as well, so the `Thread` record will need to contain the `ThreadId m` of the underlying monad, so `killThread` can be used.

dejafu leans more towards correctness than performance, by default. Your test cases will be executed using the `Test.DejaFu.SCT.sctBounded` function, which is complete but can be slow; every result you get will have an associated trace, which can be useful for debugging, but takes up memory.

dejafu-0.7.1.0 gives you an extra knob to tweak, and 0.7.1.1 makes it even better.

**Full-size images:** before, after.

Test cases with long traces have been a particularly bad case, as all the traces stuck around in memory until you did something with them at the end (like print the bad ones). This is such a case:

```haskell
contendedMVar :: MonadConc m => m ()
contendedMVar = do
  threadId <- myThreadId
  mvar <- newEmptyMVar
  let maxval = 150
  let go = takeMVar mvar >>= \x ->
        if x == maxval then killThread threadId else go
  for_ [1..20] . const $ fork go
  fork $ for_ [1..maxval] (putMVar mvar)
  takeMVar =<< newEmptyMVar
```

I ran that 100 times with random scheduling, and the traces varied from about 2500 to 3000 elements long. That’s a lot of stuff to keep around in memory!

Sometimes you don’t want *all* the results or traces of your test case, you only want some of them. Now you can tell dejafu to throw things away as it’s running, allowing garbage collection to kick in sooner, and reduce the resident memory usage.

There’s a new type and some new functions:

```haskell
module Test.DejaFu.SCT where

-- ...

-- | An @Either Failure a -> Maybe Discard@ value can be used to
-- selectively discard results.
--
-- @since 0.7.1.0
data Discard
  = DiscardTrace
  -- ^ Discard the trace but keep the result. The result will appear
  -- to have an empty trace.
  | DiscardResultAndTrace
  -- ^ Discard the result and the trace. It will simply not be
  -- reported as a possible behaviour of the program.
  deriving (Eq, Show, Read, Ord, Enum, Bounded)

-- | A variant of 'runSCT' which can selectively discard results.
--
-- @since 0.7.1.0
runSCTDiscard :: MonadRef r n
  => (Either Failure a -> Maybe Discard)
  -- ^ Selectively discard results.
  -> Way
  -- ^ How to run the concurrent program.
  -> MemType
  -- ^ The memory model to use for non-synchronised @CRef@ operations.
  -> ConcT r n a
  -- ^ The computation to run many times.
  -> n [(Either Failure a, Trace)]

-- and: runSCTDiscard', resultsSetDiscard, resultsSetDiscard', sctBoundDiscard,
--      sctUniformRandomDiscard, sctWeightedRandomDiscard
-- and: dejafuDiscard, dejafuDiscardIO (Test.DejaFu)
-- and: testDejafuDiscard, testDejafuDiscardIO (Test.{HUnit,Tasty}.DejaFu)
```

Every iteration of the SCT loop produces an `Either Failure a` value. The `*Discard` function variants will throw it (or its trace) away if you tell them to.

For example, you can now check that a test case doesn’t deadlock in a far more memory-efficient way like so:

```haskell
dejafuDiscard
  -- "efa" == "either failure a", discard everything but deadlocks
  (\efa -> if efa == Left Deadlock then Nothing else Just DiscardResultAndTrace)
  -- try 1000 executions with random scheduling
  (randomly (mkStdGen 42) 1000)
  -- use the default memory model
  defaultMemType
  -- your test case
  testCase
  -- the predicate to check (which is a bit redundant in this case)
  ("Never Deadlocks", deadlocksNever)
```

**Full-size images:** before, after.

Unfortunately, 0.7.1.0 was only a win for random testing, as systematic testing explicitly constructed the tree of executions in memory. This has been a long-standing issue with dejafu, but I’d never gotten around to solving it before, because it wasn’t really any worse than what was happening elsewhere in the codebase. But now it was the worst!

The solution, in principle, was simple: you can avoid constructing the complete tree by instead exploring schedules in a depth-first fashion, which means you only need a stack and some bookkeeping information.
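The idea generalises beyond dejafu, and a toy version fits in a few lines. This hedged sketch (illustrative, not dejafu's actual code) enumerates all length-`n` binary "schedules" depth-first: the only state is a stack of prefixes still to extend, so at no point does the whole tree of `2^n` schedules exist in memory at once:

```haskell
-- Enumerate all length-n lists of scheduling decisions (here just Bools),
-- depth-first, keeping only a stack of unexplored prefixes in memory.
schedulesOfLength :: Int -> [[Bool]]
schedulesOfLength n = go [[]]
  where
    go [] = []
    go (prefix:stack)
      | length prefix == n = prefix : go stack          -- a complete schedule
      | otherwise =                                     -- extend and push both branches
          go ((prefix ++ [True]) : (prefix ++ [False]) : stack)

main :: IO ()
main = print (schedulesOfLength 2)
-- [[True,True],[True,False],[False,True],[False,False]]
```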

The implementation was fairly simple too! I like simple things.

So now we can check every possible execution of our test case for deadlocks, still in a memory-efficient fashion:

```haskell
dejafuDiscard
  (\efa -> if efa == Left Deadlock then Nothing else Just DiscardResultAndTrace)
  -- the default way is systematic testing
  defaultWay
  defaultMemType
  testCase
  ("Never Deadlocks", deadlocksNever)
```

It’s not as memory-efficient as random scheduling, as it needs to keep around *some* information about prior executions, but the amount it is keeping around is greatly reduced from before.

What’s next? I don’t really know. There are still a lot of memory inefficiencies in dejafu, but they all pale in comparison to these two, so they can probably sit for a while longer. I’d like to build a suite of benchmarks, because I don’t really have any other than the test suite (which makes a poor benchmark). If you have any test cases which dejafu just can’t handle, let me know!

I think it’s fair to say that the frontiers of what dejafu is capable of have been pushed back a *long* way by these changes.

We can check if two expressions have equivalent behaviours, or if one has *fewer* behaviours than the other. Such properties can serve both as documentation and as regression tests.

Let’s dive straight into an example:

> Is `readMVar` equivalent to a `takeMVar` followed by a `putMVar`?

We might phrase this property like so:

```haskell
prop_read_equiv_take_put =
  sig readMVar `equivalentTo` sig (\v -> takeMVar v >>= putMVar v)
```

The property-testing machinery uses *signatures*. A signature tells dejafu how to (1) create a *new* state; (2) make some *observation* of the state; (3) concurrently *interfere* with the state in some way; and (4) which expression to evaluate.

Properties are monomorphic, so we can't directly express a property about *any* `MVar`: we need to pick a concrete type for its contents. Let's just pick `Int`:

```haskell
type State = MVar ConcIO Int
```

Properties operate in the `ConcIO` monad. There is no option to use `ConcST` yet, as I couldn't get a nice interface working which didn't break type inference in GHCi.

The state is constructed using a pure *seed value* the property-checker generates. We want to consider both *full* and *empty* `MVar`s, so we'll ask it to supply a `Maybe Int`:

```haskell
type Seed = Maybe Int
```

The initialisation function we will include in the signature then just calls `newMVar` or `newEmptyMVar` as appropriate:

```haskell
makeMVar :: Seed -> ConcIO State
makeMVar (Just x) = newMVar x
makeMVar Nothing  = newEmptyMVar
```

Seed values are generated using LeanCheck, an enumerative property-based testing library.

We want to know if the `MVar` contains a value when we observe it, and we want to know what that value is; another `Maybe`:

```haskell
type Observation = Maybe Int
```

It is important that the observation function does not block, so we use `tryReadMVar` here rather than `readMVar` or `takeMVar`:

```haskell
observeMVar :: State -> Seed -> ConcIO Observation
observeMVar v _ = tryReadMVar v
```

It does not matter if making the observation has side-effects, so `tryTakeMVar` would have been equally valid.

Our interference function will just mess with the value in the `MVar`:

```haskell
interfereMVar :: State -> Seed -> ConcIO ()
interfereMVar mvar mx = do
  tryTakeMVar mvar
  void . tryPutMVar mvar $ case mx of
    Just x  -> (x + 1) * 3000
    Nothing -> 7000
```

As LeanCheck is enumerative, large values like 3000 and 7000 will stand out if the tool reports a failure.

Now we package these operations up into a signature:

```haskell
sig :: (State -> ConcIO a) -> Sig State Observation Seed
sig e = Sig
  { initialise = makeMVar
  , observe    = observeMVar
  , interfere  = interfereMVar
  , expression = void . e
  }
```

We could, of course, have defined all this inside `sig` without the top-level functions and type synonyms.

Now we can test the property:

```
> check $ sig readMVar `equivalentTo` sig (\v -> takeMVar v >>= putMVar v)
*** Failure: (seed Just 0)
    left:  [(Nothing,Just 3000)]
    right: [(Nothing,Just 0),(Nothing,Just 3000),(Just Deadlock,Just 3000)]
```

We get a failure! This is because the left term is atomic, whereas the right is not: another thread writing to the `MVar` has the opportunity to swoop in and insert a new value after the `takeMVar` but before the `putMVar`. The right has strictly more behaviours than the left.

We can capture this by using a different comparison:

```
> check $ sig readMVar `strictlyRefines` sig (\v -> takeMVar v >>= putMVar v)
+++ OK
```

To “strictly refine” something is to have a proper subset of the behaviour. There is also a “refines” comparison, which does not require the subset to be proper.
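Conceptually (this is a simplification, not the library's implementation; the primed names are invented), each side of a property denotes the set of observations its executions can produce, and the two comparisons are just set relations:

```haskell
import qualified Data.Set as Set

-- Simplified model: "refines" is the subset relation on observation sets,
-- "strictly refines" is the proper-subset relation. The real checker also
-- tracks failure conditions alongside observations.
refines', strictlyRefines' :: Ord o => Set.Set o -> Set.Set o -> Bool
refines'         left right = left `Set.isSubsetOf`       right
strictlyRefines' left right = left `Set.isProperSubsetOf` right
```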

Doesn't `readMVar v` return a different thing to `takeMVar v >>= putMVar v`?

*Yes!* If they return at all, the former returns the value in the `MVar`, whereas the latter returns unit. Properties do not care about the return value of an expression, only the effects. You can see this by looking at the definition of `sig` again: it throws away the result of the expression using `void`.

Both of our properties are of the form ``sig f `cmp` sig g``, can't that redundancy be removed?

*No!* You can use *different* signatures with *different* state types! As long as the seed and observation types are the same, `check` can compare them. You can use this to compare different implementations of a similar concurrent data structure.

Properties can also have arguments, using LeanCheck to generate their values. This doesn't work in any mysterious way; here's a property about the `QSemN` functions:

```
> check $ \x y -> sig' (\v -> signalQSemN v (x + y)) `equivalentTo` sig' (\v -> signalQSemN v x >> signalQSemN v y)
*** Failure: -1 1 (seed 0)
    left:  [(Nothing,0)]
    right: [(Nothing,0),(Just Deadlock,0)]
```

You can even use your own types, as long as they have a `Listable` instance (the typeclass LeanCheck uses):

```
> :{
newtype Nat n = Nat n deriving Show
instance (Num n, Ord n, Listable n) => Listable (Nat n) where
  list = [Nat n | n <- list, n >= 0]
:}
> check $ \(Nat x) (Nat y) -> sig' (\v -> signalQSemN v (x + y)) `equivalentTo` sig' (\v -> signalQSemN v x >> signalQSemN v y)
+++ OK!
```

Currently it's a bit slow, as I need to fiddle with the implementation and work out what a good number of tests to run is. `check` uses 10 seed values with 100 variable assignments each (1000 tests total); you can use `checkFor` to reduce that.

So there you have it, property-testing for side-effects of stateful operations.

This has come out of my work on CoCo, a tool for automatically *discovering* these properties (paper here). In the CoCo repository are a few more examples. CoCo is still a work-in-progress, but one of the goals is to be able to generate dejafu-compatible output, so that CoCo can discover properties which dejafu can immediately begin using for regression testing.

I recently implemented async-dejafu, a version of the async library using Deja Fu so programs written with it can be tested, and I was curious about checking the relevant typeclass laws automatically.

Checking typeclass laws has been done with QuickCheck before, but the difference here is that async uses *concurrency*! If only we had some way to test concurrent Haskell code! Oh, wait…

Specifically, I want to test the laws for the `Concurrently` type. `Concurrently` is a monad for expressing `IO` actions which should be run concurrently.

Firstly, we need some language extensions and imports:

```haskell
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns        #-}

module Concurrently where

import Control.Applicative
import Control.Exception (SomeException)
import Control.Monad ((>=>), ap, liftM, forever)
import Control.Monad.Catch (onException)
import Control.Monad.Conc.Class
import Data.Maybe (isJust)
import Data.Set (Set, fromList)
import Test.DejaFu (Failure(..), defaultMemType)
import Test.DejaFu.Deterministic (ConcST, Trace)
import Test.DejaFu.SCT (sctBound, defaultBounds)
import Test.QuickCheck (Arbitrary(..))
import Test.QuickCheck.Function (Fun, apply)
import Unsafe.Coerce (unsafeCoerce)
```

I have sadly not managed to eliminate that `unsafeCoerce`; it shows up because of the use of higher-ranked types, and makes me very sad. If anyone knows how I can get rid of it, I would be very happy!

Now we need our `Concurrently` type. The original just uses `IO`, so we have to parameterise ours over the underlying monad:

```haskell
newtype Concurrently m a = Concurrently { runConcurrently :: m a }
```

We'll also be using a `ConcST` variant for testing a lot, so here's a type synonym for that:

```haskell
type CST t = Concurrently (ConcST t)
```

We also need some instances for `Concurrently` in order to make QuickCheck happy, but these aren't terribly important:

```haskell
instance Show (Concurrently m a) where
  show _ = "<concurrently>"

instance (Arbitrary a, Applicative m) => Arbitrary (Concurrently m a) where
  arbitrary = Concurrently . pure <$> arbitrary
```

Ok, let’s get started!

`Functor` lets you apply a pure function to a value in a context.

```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```

A `Functor` should satisfy the identity law:

```haskell
fmap id = id
```

And the composition law:

```haskell
fmap f . fmap g = fmap (f . g)
```
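To make the composition law concrete, here it is checked by hand for the list `Functor`:

```haskell
-- Composition law for the list Functor: mapping twice equals mapping the
-- composed function once. Both sides evaluate to [3,5,7].
lhs, rhs :: [Int]
lhs = (fmap (+1) . fmap (*2)) [1, 2, 3]  -- fmap twice
rhs = fmap ((+1) . (*2)) [1, 2, 3]       -- fmap the composed function once
```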

The `Functor` instance for `Concurrently` just delegates the work to the instance for the underlying monad:

```haskell
instance MonadConc m => Functor (Concurrently m) where
  fmap f (Concurrently a) = Concurrently $ f <$> a
```

The composition law is a little awkward to express in a way that QuickCheck can deal with, as it involves arbitrary functions. QuickCheck has a `Fun` type, representing functions which can be serialised to a string. Bearing that in mind, here is how we can express those two laws as tests:

```haskell
prop_functor_id :: Ord a => CST t a -> Bool
prop_functor_id ca = ca `eq` (id <$> ca)

prop_functor_comp :: Ord c => CST t a -> Fun a b -> Fun b c -> Bool
prop_functor_comp ca (apply -> f) (apply -> g) =
  (g . f <$> ca) `eq` (g <$> (f <$> ca))
```

We're using view patterns here to extract the actual function from the `Fun` value. Let's see if the laws hold!

```
λ> quickCheck (prop_functor_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_functor_comp :: CST t Int -> Fun Int Integer -> Fun Integer String -> Bool)
+++ OK, passed 100 tests.
```

Cool! Wait, what's that `eq` function?

I’ve decided to treat two concurrent computations as equal if the sets of values that they can produce are equal:

```haskell
eq :: Ord a => CST t a -> CST t a -> Bool
eq left right = runConcurrently left `eq'` runConcurrently right

eq' :: forall t a. Ord a => ConcST t a -> ConcST t a -> Bool
eq' left right = results left == results right where
  results cst = fromList . map fst $ sctBound' cst

  sctBound' :: ConcST t a -> [(Either Failure a, Trace)]
  sctBound' = unsafeCoerce $ sctBound defaultMemType defaultBounds
```

This is where the unfortunate `unsafeCoerce` comes in. The definition of `sctBound'` there doesn't type-check without it, which is a shame. If anyone could offer a solution, I would be very grateful.

`Applicative` extends `Functor` with the ability to inject a value into a context without introducing any effects, and to apply a function in a context to a value in a context.

```haskell
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
```

An `Applicative` should satisfy the identity law:

```haskell
pure id <*> a = a
```

The homomorphism law, which says that applying a pure function to a pure value in a context is the same as just applying the function to the value and injecting the entire result into a context:

```haskell
pure (f a) = pure f <*> pure a
```

The interchange law, which says that when applying a function in a context to a pure value, the order in which each is evaluated doesn’t matter:

```haskell
u <*> pure y = pure ($ y) <*> u
```

And the composition law, which is a sort of associativity property:

```haskell
u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
```

Finally, there is a law relating `Applicative` to `Functor`, which says we can decompose `fmap` into two steps: injecting a function into a context, and then application within that context:

```haskell
f <$> x = pure f <*> x
```

This is where `Concurrently` gets its concurrency. `(<*>)` runs its two arguments concurrently, killing the other if one throws an exception.

```haskell
instance MonadConc m => Applicative (Concurrently m) where
  pure = Concurrently . pure

  Concurrently fs <*> Concurrently as =
    Concurrently $ (\(f, a) -> f a) <$> concurrently fs as

concurrently :: MonadConc m => m a -> m b -> m (a, b)
concurrently = ...
```

Armed with the knowledge of how to generate arbitrary functions, these are all fairly straightforward to test:

```haskell
prop_applicative_id :: Ord a => CST t a -> Bool
prop_applicative_id ca = ca `eq` (pure id <*> ca)

prop_applicative_homo :: Ord b => a -> Fun a b -> Bool
prop_applicative_homo a (apply -> f) = (pure $ f a) `eq` (pure f <*> pure a)

prop_applicative_inter :: Ord b => CST t (Fun a b) -> a -> Bool
prop_applicative_inter u y = (u' <*> pure y) `eq` (pure ($ y) <*> u') where
  u' = apply <$> u

prop_applicative_comp :: Ord c => CST t (Fun b c) -> CST t (Fun a b) -> CST t a -> Bool
prop_applicative_comp u v w =
  (u' <*> (v' <*> w)) `eq` (pure (.) <*> u' <*> v' <*> w) where
    u' = apply <$> u
    v' = apply <$> v

prop_applicative_fmap :: Ord b => Fun a b -> CST t a -> Bool
prop_applicative_fmap (apply -> f) a = (f <$> a) `eq` (pure f <*> a)
```

And indeed we see that the laws hold:

```
λ> quickCheck (prop_applicative_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_homo :: String -> Fun String Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_inter :: CST t (Fun Int String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_comp :: CST t (Fun Int String) -> CST t (Fun Char Int) -> CST t Char -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_fmap :: Fun Int String -> CST t Int -> Bool)
+++ OK, passed 100 tests.
```

`Alternative` is a kind of monoid over `Applicative`.

```haskell
class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

  -- These both have default definitions
  some :: f a -> f [a]
  many :: f a -> f [a]
```

An `Alternative` should satisfy the monoid laws. Namely, left and right identity:

```haskell
empty <|> x = x
x <|> empty = x
```

And associativity:

```haskell
(x <|> y) <|> z = x <|> (y <|> z)
```

The `Alternative` instance for `Concurrently` is used to express races, with `(<|>)` executing both of its arguments concurrently and returning the first to finish:

```haskell
instance MonadConc m => Alternative (Concurrently m) where
  empty = Concurrently $ forever yield

  Concurrently as <|> Concurrently bs =
    Concurrently $ either id id <$> race as bs

race :: MonadConc m => m a -> m b -> m (Either a b)
race = ...
```

Once again, the translation into QuickCheck properties is quite simple:

```haskell
prop_alternative_right_id :: Ord a => CST t a -> Bool
prop_alternative_right_id x = x `eq` (x <|> empty)

prop_alternative_left_id :: Ord a => CST t a -> Bool
prop_alternative_left_id x = x `eq` (empty <|> x)

prop_alternative_assoc :: Ord a => CST t a -> CST t a -> CST t a -> Bool
prop_alternative_assoc x y z = (x <|> (y <|> z)) `eq` ((x <|> y) <|> z)
```

And the laws hold!

```
λ> quickCheck (prop_alternative_right_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_alternative_left_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_alternative_assoc :: CST t Int -> CST t Int -> CST t Int -> Bool)
+++ OK, passed 100 tests.
```

There are also some laws relating `Alternative` to `Applicative`, but these are expressed in terms of `some` and `many`, which have default law-satisfying definitions.
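For reference, those defaults are mutually recursive. Written here as standalone functions (primed, since in base they are the class's default method definitions):

```haskell
import Control.Applicative (Alternative(..))

-- The default definitions of some and many from base, as top-level functions.
some' :: Alternative f => f a -> f [a]
some' v = (:) <$> v <*> many' v  -- one occurrence, then zero or more

many' :: Alternative f => f a -> f [a]
many' v = some' v <|> pure []    -- one or more occurrences, or none
```

For many instances (such as `Maybe` with a `Just` argument) these loop forever; they are chiefly useful for parser-like `Alternative`s, where each step consumes input.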

`Monad` extends `Applicative` with the ability to squash nested monadic values together, and is commonly used to express sequencing.

```haskell
class Applicative m => Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
```

There are a few different formulations of the `Monad` laws; I prefer the one in terms of `(>=>)` (the fish operator), which is defined as:

```haskell
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >=> g = \x -> f x >>= g
```

Using this function the laws become simply the monoid laws:

```haskell
return >=> f = f
f >=> return = f
(f >=> g) >=> h = f >=> (g >=> h)
```

There are also a few laws relating `Monad` to `Applicative` and `Functor`:

```haskell
f <$> a = f `liftM` a
return = pure
(<*>) = ap
```

As with the `Functor`, the `Monad` instance just delegates the work:

```haskell
instance MonadConc m => Monad (Concurrently m) where
  return = pure

  Concurrently a >>= f = Concurrently $ a >>= runConcurrently . f
```

As these laws are mostly about function equality, a helper function to express that is used:

```haskell
eqf :: Ord b => (a -> CST t b) -> (a -> CST t b) -> a -> Bool
eqf left right a = left a `eq` right a
```

Given that, the translation is simple:

```haskell
prop_monad_left_id :: Ord b => Fun a (CST t b) -> a -> Bool
prop_monad_left_id (apply -> f) = f `eqf` (return >=> f)

prop_monad_right_id :: Ord b => Fun a (CST t b) -> a -> Bool
prop_monad_right_id (apply -> f) = f `eqf` (f >=> return)

prop_monad_comp :: Ord d => Fun a (CST t b) -> Fun b (CST t c) -> Fun c (CST t d) -> a -> Bool
prop_monad_comp (apply -> f) (apply -> g) (apply -> h) =
  ((f >=> g) >=> h) `eqf` (f >=> (g >=> h))

prop_monad_fmap :: Ord b => Fun a b -> CST t a -> Bool
prop_monad_fmap (apply -> f) a = (f <$> a) `eq` (f `liftM` a)

prop_monad_pure :: Ord a => a -> Bool
prop_monad_pure = pure `eqf` return

prop_monad_ap :: Ord b => Fun a b -> a -> Bool
prop_monad_ap (apply -> f) a = (pure f <*> pure a) `eq` (return f `ap` return a)
```

Are there any counterexamples? No there aren’t!

```
λ> quickCheck (prop_monad_left_id :: Fun Int (CST t String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_right_id :: Fun Int (CST t String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_comp :: Fun Int (CST t String) -> Fun String (CST t Bool) -> Fun Bool (CST t Int) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_fmap :: Fun Int String -> CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_pure :: Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_ap :: Fun Int String -> Int -> Bool)
+++ OK, passed 100 tests.
```

So, it certainly *looks* like all the laws hold! Yay!

Consider the `eq'` function. This sort of "value-level" equality is good enough for most types, where any type of effect is a value, but it doesn't work so well when concurrency (or any sort of `IO`) is involved, as there effects do not directly correspond to values.

There's one type of effect we particularly care about for the case of `Concurrently`: namely, the amount of concurrency going on! To test this, we need to write our tests such that different amounts of concurrency can produce different results, which means our current `Arbitrary` instance for `Concurrently` isn't good enough. We need interaction between different concurrent inputs.

So let's try writing a test case for the `(<*>) = ap` law, but explicitly testing the amount of concurrency:

```haskell
prop_monad_ap2 :: forall a b. Ord b => Fun a b -> Fun a b -> a -> Bool
prop_monad_ap2 (apply -> f) (apply -> g) a = go (<*>) `eq'` go ap where
  go :: (CST t (a -> b) -> CST t a -> CST t b) -> ConcST t b
  go combine = do
    var <- newEmptyCVar
    let cf = do { res <- tryTakeCVar var; pure $ if isJust res then f else g }
    let ca = do { putCVar var (); pure a }
    runConcurrently $ Concurrently cf `combine` Concurrently ca
```

Here we have two functions, `f` and `g`, and are using whether a `CVar` is full or empty to choose between them. If the combining function executes its arguments concurrently, then we will see both cases; otherwise we'll only see the `g` case. *If* the law holds, and `(<*>) = ap`, then we will see both cases for both of them!

```
λ> quickCheck (prop_monad_ap2 :: Fun Int String -> Fun Int String -> Int -> Bool)
*** Failed! Falsifiable (after 3 tests and 8 shrinks):
{_->""}
{_->"a"}
0
```

Oops! We found a counterexample! Let’s see what’s happening:

```
λ> results $ go (<*>) (\_ -> "") (\_ -> "a") 0
fromList [Right "",Right "a"]
λ> results $ go ap (\_ -> "") (\_ -> "a") 0
fromList [Right "a"]
```

If we look at the definition of `ap`, the problem becomes clear:

```haskell
ap :: Monad m => m (a -> b) -> m a -> m b
ap mf ma = mf >>= \f -> ma >>= \a -> return (f a)
```

The issue is that our definition of `(>>=)` is *sequential*, whereas `(<*>)` is *concurrent*. The `Monad` instance is not consistent with the `Applicative` one *when there is interaction between actions*, as this shows!

So what's the problem? It's *close enough*, right? Well, close enough isn't good enough when it comes to laws. This very issue caused breakage, and is the reason that the `Monad` instance for `Concurrently` got removed!

So what’s the point of this? Big deal, laws are important.

Well, that *is* the point. Laws *are* important, but often we don't bother to test them. That's possibly fine if the instances are simple, and you can check the laws by just juggling definitions in your head, but when `IO` is involved, the situation becomes a bit more murky.

Code involving `IO` and concurrency is easy to get wrong, so when building up a monad or whatever based on it, why not *actually test* the laws, rather than just assume they're right? Because if, as a library author, your assumption is wrong, your users will suffer for it.