The gist of it is that this snippet of code:
mask $ \restore -> do putMVar var x ...
behaves differently to this snippet of code:
mask $ \restore -> do restore $ putMVar var x ...
in the presence of asynchronous exceptions. The post goes on to explain what the different behaviours are and why they crop up; but thinking about concurrency is too much like effort, let’s turn to dejafu!
In this test case, I want to see what happens if the putMVar var x is interrupted by an asynchronous exception, and whether the ... bit of code gets executed. So the actual test case is a bit more complex than just the snippet above. We’re going to need three threads:
thread1 = mask $ \restore ->
  catch (putMVar var "hello world" >> putMVar success True)
        (\(_ :: SomeException) -> putMVar success False)

thread2 = putMVar var "interrupted!"

thread3 = killThread thread1
Putting it together into an actual test case, we get:
import Control.Concurrent.Classy
import Control.Exception (SomeException)

example1 :: MonadConc m => m (String, Bool)
example1 = do
  var <- newEmptyMVar
  success <- newEmptyMVar
  interruptMe <- newEmptyMVar
  tid <- fork $ mask $ \_ -> do
    putMVar interruptMe ()
    catch (putMVar var "hello world" >> putMVar success True)
          (\(_ :: SomeException) -> putMVar success False)
  -- wait for the thread to be inside the `mask`, then fork a thread
  -- to race on the `putMVar` and also throw an async exception.
  takeMVar interruptMe
  _ <- fork $ putMVar var "interrupted!"
  killThread tid
  (,) <$> readMVar var <*> readMVar success
There’s a little extra ceremony involved in making sure that the race happens after the mask (we need a new interruptMe MVar), but other than that it’s fairly straightforward.
dejafu finds two behaviours for this example, and gives abbreviated execution traces:
> autocheck example1
[pass] Successful
[fail] Deterministic
    ("hello world",True) S0-----S1--------S0------
    ("interrupted!",False) S0-----S1---P0---S2--S1-S0---S1---S0--
False
Here’s our new test case:
import Control.Concurrent.Classy
import Control.Exception (SomeException)

example2 :: MonadConc m => m (String, Bool)
example2 = do
  interruptMe <- newEmptyMVar
  var <- newEmptyMVar
  success <- newEmptyMVar
  tid <- fork $ mask $ \restore -> do
    putMVar interruptMe ()
    catch (restore (putMVar var "hello world") >> putMVar success True)
          (\(_ :: SomeException) -> putMVar success False)
  -- wait for the thread to be inside the `mask`, then fork a thread
  -- to race on the `putMVar` and also throw an async exception.
  takeMVar interruptMe
  _ <- fork $ putMVar var "interrupted!"
  killThread tid
  (,) <$> readMVar var <*> readMVar success
Lo and behold, dejafu finds a third behaviour:
> autocheck example2
[pass] Successful
[fail] Deterministic
    ("hello world",True) S0-----S1-----------S0------
    ("hello world",False) S0-----S1-----P0-----S1---S0--
    ("interrupted!",False) S0-----S1----P0----S1---S2--S0---
False
So it seems that we can now end up in the situation where the putMVar var "hello world" does happen, but after writing to the MVar the asynchronous exception is delivered, and so we hit the putMVar success False case.
Weird, right?
We can get the actual execution trace for the new case with a lower-level function in dejafu, runSCT. Digging through it, we can find the pre-emption of thread 1 (the first thread forked) by thread 0 (the main thread):
(SwitchTo main, [(1, WillResetMasking True MaskedInterruptible)], TakeMVar 1 [])
This says that we switched to the main thread, and it performed a takeMVar operation. And furthermore, that thread 1 will next reset the masking state back to MaskedInterruptible.
Now the issue becomes clear. The problematic snippet:
mask $ \restore -> do restore $ putMVar var x ...
actually means to perform these steps:

1. set the masking state to MaskedInterruptible (the mask)
2. set the masking state to Unmasked (the restore)
3. perform the putMVar var x
4. set the masking state back to MaskedInterruptible (the end of the restore)
5. perform the ...
The issue is that completing the putMVar var x call and resetting the masking state are two operations. That’s not atomic. So there is a chance that an exception can be delivered between them.
And that’s the issue explained in It’s not a no-op to unmask an interruptible operation, replicated with dejafu.
I’m pleased to announce a new super-major release of dejafu, a library for testing concurrent Haskell programs.
While there are breaking changes, common use-cases shouldn’t be affected too significantly (or not at all). There is a brief guide to the changes, and how to migrate if necessary, on the website.
dejafu is a unit-testing library for concurrent Haskell programs. Tests are deterministic, and work by systematically exploring the possible schedules of your concurrency-using test case, allowing you to confidently check your threaded code.
HUnit and Tasty bindings are available.
dejafu requires your test case to be written against the MonadConc typeclass from the concurrency package. This is a necessity: dejafu cannot peek inside your IO or STM actions, so it needs to be able to plug in an alternative implementation of the concurrency primitives for testing. There is some guidance for how to switch from IO code to MonadConc code on the website.
If you really need IO, you can use MonadIO - but make sure it’s deterministic enough to not invalidate your tests!
Here’s a small example reproducing a deadlock found in an earlier version of the auto-update library:
> :{
autocheck $ do
  auto <- mkAutoUpdate defaultUpdateSettings
  auto
:}
[fail] Successful
    [deadlock] S0--------S1-----------S0-
[fail] Deterministic
    [deadlock] S0--------S1-----------S0-
    () S0--------S1--------p0--
dejafu finds the deadlock, and gives a simplified execution trace for each distinct result. More in-depth traces showing exactly what each thread did are also available. This is using a version of auto-update modified to use the MonadConc typeclass. The source is in the dejafu testsuite.
The highlights for this release are setup actions, teardown actions, and invariants:
Setup actions are for things which are not really a part of your test case, but which are needed for it (for example, setting up a test distributed system). As dejafu can run a single test case many times, repeating this work can be a significant overhead. By defining this as a setup action, dejafu can “snapshot” the state at the end of the action, and efficiently reload it in subsequent executions of the same test.
Teardown actions are for things you want to run after your test case completes, in all cases, even if the test deadlocks (for example). As dejafu controls the concurrent execution of the test case, inspecting shared state is possible even if the test case fails to complete.
Invariants are effect-free, atomically-checked conditions over shared state which must always hold. If an invariant throws an exception, the test case is aborted, and any teardown action is run.
Here is an example of a setup action with an invariant:
> :{
autocheck $
  let setup = do
        var <- newEmptyMVar
        registerInvariant $ do
          value <- inspectMVar var
          when (value == Just 1) $ throwM Overflow
        pure var
  in withSetup setup $ \var -> do
       fork $ putMVar var 0
       fork $ putMVar var 1
       tryReadMVar var
:}
[fail] Successful
    [invariant failure] S0--P2-
[fail] Deterministic
    [invariant failure] S0--P2-
    Nothing S0----
    Just 0 S0--P1--S0--
In the [invariant failure] case, thread 2 is scheduled, writing the forbidden value “1” to the MVar, which terminates the test.
Here is an example of a setup action with a teardown action:
> :{
autocheck $
  let setup = newMVar ()
      teardown var (Right _) = show <$> tryReadMVar var
      teardown _ (Left e) = pure (show e)
  in withSetupAndTeardown setup teardown $ \var -> do
       fork $ takeMVar var
       takeMVar var
:}
[pass] Successful
[fail] Deterministic
    "Nothing" S0---
    "Deadlock" S0-P1--S0-
The teardown action can perform arbitrary concurrency effects, including inspecting any mutable state returned by the setup action.
Setup and teardown actions were previously available in a slightly different form as the dontCheck and subconcurrency functions, which have been removed (see the migration guide if you used these).
It’s also something that dejafu does not do.
I’m going to use the “stores are transitively visible” litmus test as a running example. Here it is:
import qualified Control.Monad.Conc.Class as C
import Test.DejaFu.Internal
import Test.DejaFu.SCT
import Test.DejaFu.SCT.Internal.DPOR
import Test.DejaFu.Types
import Test.DejaFu.Utils

storesAreTransitivelyVisible :: C.MonadConc m => m (Int, Int, Int)
storesAreTransitivelyVisible = do
  x <- C.newCRef 0
  y <- C.newCRef 0
  j1 <- C.spawn (C.writeCRef x 1)
  j2 <- C.spawn (do r1 <- C.readCRef x; C.writeCRef x 1; pure r1)
  j3 <- C.spawn (do r2 <- C.readCRef y; r3 <- C.readCRef x; pure (r2,r3))
  (\() r1 (r2,r3) -> (r1,r2,r3)) <$> C.readMVar j1 <*> C.readMVar j2 <*> C.readMVar j3
I picked this one because it’s kind of arbitrarily complex. It’s a small test, but it’s for the relaxed memory implementation, so there’s a lot going on. It’s a fairly dense test.
I’m now going to define a metric of trace complexity which I’ll justify in a moment:
complexity :: Trace -> (Int, Int, Int, Int)
complexity = foldr go (0,0,0,0) where
  go (SwitchTo _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x+1, y, z)
  go (Start _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y+1, z)
  go (Continue, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y, z+1)
  go (SwitchTo _, _, _) (w, x, y, z) = (w, x+1, y, z)
  go (Start _, _, _) (w, x, y, z) = (w, x, y+1, z)
  go (Continue, _, _) (w, x, y, z) = (w, x, y, z+1)
Using the 183-shrinking branch, we can now get the first trace for every distinct result, along with its complexity:
results :: Way -> MemType -> IO ()
results way memtype = do
  let settings = set lequality (Just (==)) $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
Here are the results for systematic testing:
λ> results (systematically defaultBounds) SequentialConsistency
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--  (0,0,7,24)
Right (0,0,1)  S0------------S2-----S1---S0---S3-----S0--  (0,0,6,24)
Right (0,0,0)  S0------------S2-P3-----S1---S0--S2----S0---  (0,1,6,23)
Right (1,0,0)  S0------------S3-----S1---S0--S2-----S0---  (0,0,6,24)
λ> results (systematically defaultBounds) TotalStoreOrder
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--  (0,0,7,24)
Right (0,0,1)  S0------------S1-P2-----S1--S0---S3-----S0--  (0,1,6,23)
Right (0,0,0)  S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)  S0------------S1-P3-----S1--S0--S2-----S0---  (0,1,6,23)
λ> results (systematically defaultBounds) PartialStoreOrder
Right (1,0,1)  S0------------S1---S0--S2-----S0--S3-----S0--  (0,0,7,24)
Right (0,0,1)  S0------------S1-P2-----S1--S0---S3-----S0--  (0,1,6,23)
Right (0,0,0)  S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)  S0------------S1-P3-----S1--S0--S2-----S0---  (0,1,6,23)
Pretty messy, right? Here are the results for random testing:
λ> results (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----P1-P0----P2-P1-P0-P3-P1-S2-P3--P0-P3-P0-P3-S2-P0-S2-P0--P2-S0-  (0,15,5,9)
Right (0,0,1)  S0-------P2-P1-P2-P0--P2-P0-P1-P0---S2-P3-P0-P2-S3---P1-S3-S0--  (0,12,5,12)
Right (1,0,0)  S0------------S3-----S1-P2-P1-P0--S2---P1-S0---  (0,4,5,20)
Right (0,0,0)  S0---------P2-P0--P3-P0-S3--P2-P3-P2--P3-S2-S1--P0----  (0,9,4,15)
λ> results (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-  (1,13,6,11)
Right (0,0,1)  S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--  (0,8,6,16)
Right (0,0,0)  S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----  (2,10,6,14)
λ> results (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-  (1,13,6,11)
Right (0,0,1)  S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--  (0,8,6,16)
Right (0,0,0)  S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----  (2,10,6,14)
Yikes!
The complexity metric I defined counts four things:

- commit actions
- pre-emptive context switches (SwitchTo)
- non-pre-emptive context switches (Start)
- continuations of the same thread (Continue)
I would much rather read a long trace where the only context switches are when threads block, than a short one which is rapidly jumping between threads. So, given two equivalent traces, I will always prefer the one with a lexicographically smaller complexity-tuple.
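Since Haskell tuples compare lexicographically out of the box, picking the preferred trace among equivalent ones is just taking the minimum by complexity. A minimal sketch, with made-up complexity tuples rather than ones computed by dejafu:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical complexity tuples for three equivalent traces:
-- (commits, pre-emptive switches, non-pre-emptive switches, continues).
candidates :: [(String, (Int, Int, Int, Int))]
candidates =
  [ ("messy",  (0, 15, 5,  9))
  , ("better", (0,  4, 5, 20))
  , ("calm",   (0,  1, 5, 23))
  ]

-- Tuples compare lexicographically, so the minimum prefers fewer commits
-- first, then fewer pre-emptions, and so on, even though the "calm" trace
-- has the most continuations.
preferred :: String
preferred = fst (minimumBy (comparing snd) candidates)
```

Note that "calm" wins despite having the largest final component: the earlier components dominate.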
The key idea underpinning trace simplification is that dejafu can tell when two scheduling decisions can be swapped without changing the behaviour of the program. I talked about this idea in the Using Hedgehog to Test Déjà Fu memo. So we can implement transformations which are guaranteed to preserve semantics without needing to verify this by re-running the test case.
Although we don’t need to re-run the test case at all, the 183-shrinking branch currently does, but only once, at the end, after the minimum has been found. This is because it’s easier to generate a simpler sequence of scheduling decisions and use dejafu to produce the corresponding trace than it is to produce a simpler trace directly. This is still strictly better than a typical shrinking algorithm, which would re-run the test case after each shrinking step, rather than only at the end.
Rather than drag this out, here’s what those random traces simplify to:
resultsS :: Way -> MemType -> IO ()
resultsS way memtype = do
  let settings = set lsimplify True . set lequality (Just (==)) $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0----------P1---S2--P3-----S0---S2---S0---  (0,2,5,22)
Right (0,0,1)  S0----------P2-P1-P2-P1--S0---S2---S3-----S0---  (0,4,5,20)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---  (0,1,5,23)
Right (0,0,0)  S0------------S3--P2-----S3---S1--P0----  (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--  (0,1,5,23)
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----  (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--  (0,1,5,23)
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----  (0,3,4,21)
This is much better.
There are two simplification phases: a preparation phase, which puts the trace into a normal form and prunes unnecessary commits; and an iteration phase, which repeats a step function until a fixed point is reached (or the iteration limit is).
The preparation phase has two steps: first we put the trace into lexicographic normal form, then we prune unnecessary commits.
We put a trace in lexicographic normal form by sorting by thread ID, where only independent actions can be swapped:
lexicoNormalForm :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
lexicoNormalForm memtype = go where
  go trc =
    let trc' = bubble initialDepState trc
    in if trc == trc' then trc else go trc'

  bubble ds (t1@(tid1, ta1):t2@(tid2, ta2):trc)
    | independent ds tid1 ta1 tid2 ta2 && tid2 < tid1 = bgo ds t2 (t1 : trc)
    | otherwise = bgo ds t1 (t2 : trc)
  bubble _ trc = trc

  bgo ds t@(tid, ta) trc = t : bubble (updateDepState memtype ds tid ta) trc
If simplification only put traces into lexicographic normal form, we would get these results:
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2--P0--S2--P0-P3----P0--  (0,5,3,19)
Right (0,0,1)  S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--  (0,8,4,16)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---  (0,3,4,21)
Right (0,0,0)  S0------------S2-P3--P2----S3--P1--P0----  (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--  (1,5,5,19)
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----  (2,6,5,18)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--  (1,5,5,19)
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----  (2,6,5,18)
These are better than they were, but we can do better still.
After putting the trace into lexicographic normal form, we delete any commit actions which are followed by any number of independent actions and then a memory barrier:
dropCommits :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
dropCommits SequentialConsistency = id
dropCommits memtype = go initialDepState where
  go ds (t1@(tid1, ta1@(CommitCRef _ _)):t2@(tid2, ta2):trc)
    | isBarrier (simplifyAction ta2) = go ds (t2:trc)
    | independent ds tid1 ta1 tid2 ta2 = t2 : go (updateDepState memtype ds tid2 ta2) (t1:trc)
  go ds (t@(tid,ta):trc) = t : go (updateDepState memtype ds tid ta) trc
  go _ [] = []
Such commits don’t affect the behaviour of the program at all, as all buffered writes get flushed when the memory barrier happens.
If simplification only did the preparation phase, we would get these results:
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2--P0--S2--P0-P3----P0--  (0,5,3,19)
Right (0,0,1)  S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--  (0,8,4,16)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---  (0,3,4,21)
Right (0,0,0)  S0------------S2-P3--P2----S3--P1--P0----  (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-------P1---S2--P0-----P2--P0--S2-S3----P0--  (0,5,4,19)
    ^-- better than just lexicoNormalForm
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--P0-S2--S3--P1--P0----  (0,5,3,19)
    ^-- better than just lexicoNormalForm
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-------P1---S2--P0-----P2--P0--S2-S3----P0--  (0,5,4,19)
    ^-- better than just lexicoNormalForm
Right (0,0,1)  S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--  (0,7,5,17)
Right (0,0,0)  S0-----------P2---P3--P0-S2--S3--P1--P0----  (0,5,3,19)
    ^-- better than just lexicoNormalForm
The iteration phase attempts to reduce context switching by pushing actions forwards, or pulling them backwards, through the trace.
If we have the trace [(tid1, act1), (tid2, act2), (tid1, act3)], where act2 and act3 are independent, the “pull back” transformation would re-order that to [(tid1, act1), (tid1, act3), (tid2, act2)].
In contrast, if act1 and act2 were independent, the “push forward” transformation would re-order that to [(tid2, act2), (tid1, act1), (tid1, act3)]. The two transformations are almost, but not quite, opposites.
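To make the “pull back” idea concrete, here is a toy one-step version over plain pairs. The Step type, the hard-coded indep predicate, and pullBack1 are all invented for this sketch; they stand in for dejafu’s ThreadId/ThreadAction and its real dependency relation:

```haskell
-- (thread id, action name); a stand-in for dejafu's (ThreadId, ThreadAction).
type Step = (Int, String)

-- Assumed independence predicate for this example only: act2 and act3 commute.
indep :: Step -> Step -> Bool
indep (_, a) (_, b) = (a, b) `elem` [("act2", "act3"), ("act3", "act2")]

-- One "pull back" step: if the third action belongs to the same thread as the
-- first, and commutes with the intervening action, pull it over the context
-- switch so the thread runs for longer uninterrupted.
pullBack1 :: [Step] -> [Step]
pullBack1 (t1@(tid1, _) : t2 : t3@(tid3, _) : rest)
  | tid1 == tid3 && indep t2 t3 = t1 : t3 : t2 : rest
pullBack1 trc = trc
```

Running it on the example trace from the text reorders [(1,"act1"), (2,"act2"), (1,"act3")] to [(1,"act1"), (1,"act3"), (2,"act2")], removing one context switch.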
Pull-back walks through the trace and, at every context switch, looks forward to see if there is a single action of the original thread it can put before the context switch:
pullBack :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pullBack memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
        trc' = if tid1 /= tid2
               then maybe trc (uncurry (:)) (findAction tid1 ds' trc)
               else trc
    in t1 : go ds' trc'
  go _ trc = trc

  findAction tid0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just (t, trc)
      | otherwise = case fgo (updateDepState memtype ds tid ta) trc of
          Just (ft@(ftid, fa), trc')
            | independent ds tid ta ftid fa -> Just (ft, t:trc')
          _ -> Nothing
    fgo _ _ = Nothing
Push-forward walks through the trace and, at every context switch, looks forward to see if the last action of the original thread can be put at its next execution:
pushForward :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pushForward memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
    in if tid1 /= tid2
       then maybe (t1 : go ds' trc) (go ds) (findAction tid1 ta1 ds trc)
       else t1 : go ds' trc
  go _ trc = trc

  findAction tid0 ta0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just ((tid0, ta0) : t : trc)
      | independent ds tid0 ta0 tid ta = (t:) <$> fgo (updateDepState memtype ds tid ta) trc
      | otherwise = Nothing
    fgo _ _ = Nothing
The iteration process just repeats pushForward memtype . pullBack memtype.
If it only used pullBack, we would get these results:
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-----------P1---S2---P0--S2--S0-P3-----S0--  (0,3,5,21)
Right (0,0,1)  S0-----------P2-P1-P2--P1--S0--S2--S0-P3-----S0--  (0,5,5,19)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---  (0,1,5,23)
Right (0,0,0)  S0------------S2-P3---P2----S3--S1--P0----  (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0-----------P1---S2-----S0---S3-----S0--  (0,1,5,23)
Right (0,0,1)  S0-----------P1-P2-----S0-S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)  S0-----------P2---P3-----S0-S2--S1--P0----  (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0-----------P1---S2-----S0---S3-----S0--  (0,1,5,23)
Right (0,0,1)  S0-----------P1-P2-----S0-S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)  S0-----------P2---P3-----S0-S2--S1--P0----  (0,3,4,21)
With no exceptions, iterating pullBack is an improvement over just doing preparation.
If it only used pushForward, we would get these results:
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0-------P1---S2--P0------S2--P3----P0---  (0,4,3,20)
Right (0,0,1)  S0-------P2-P1-P2-P1-P0------S1-S2---S3----P0---  (0,6,4,18)
Right (1,0,0)  S0------------S3----P1--P0--S1-S2----P0---  (0,3,4,21)
    ^-- no improvement over preparation
Right (0,0,0)  S0------------S3--P2-----S3--P1--P0----  (0,3,3,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----P1---S0---P2----P0-------S2-S3----P0--  (0,4,4,20)
Right (0,0,1)  S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--  (0,6,5,18)
Right (0,0,0)  S0----------P2--P3--P0--S2---S3--P1--P0----  (0,5,3,19)
    ^-- no improvement over preparation
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----P1---S0---P2----P0-------S2-S3----P0--  (0,4,4,20)
Right (0,0,1)  S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--  (0,6,5,18)
Right (0,0,0)  S0----------P2--P3--P0--S2---S3--P1--P0----  (0,5,3,19)
    ^-- no improvement over preparation
With three exceptions, where the traces didn’t change, iterating pushForward is an improvement over just doing preparation.
We’ve already seen the results if we combine them:
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)  S0----------P1---S2--P3-----S0---S2---S0---  (0,2,5,22)
Right (0,0,1)  S0----------P2-P1-P2-P1--S0---S2---S3-----S0---  (0,4,5,20)
Right (1,0,0)  S0------------S3-----S1---S0--S2----P0---  (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0------------S3--P2-----S3---S1--P0----  (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--  (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----  (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)  S0----------P1---S2-----S0----S3-----S0--  (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,1)  S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)  S0----------P2--P3-----S0--S2---S1--P0----  (0,3,4,21)
I think what I have right now is pretty good. It’s definitely a vast improvement over not doing any simplification.
But, no random traces get simplified to the corresponding systematic traces, which is a little disappointing. I think that’s because the current passes just try to reduce context switches of any form, whereas really I want to reduce pre-emptive context switches more than non-pre-emptive ones.
Trying every possible interleaving will give you, in general, an exponential blow-up of the executions you need to perform as your test case grows in size. The core testing algorithm we use, a variant of dynamic partial-order reduction (DPOR)1, attempts to reduce this blow-up. DPOR identifies actions which are dependent, and only tries interleavings which permute dependent actions.
Here are some examples:
- It doesn’t matter which order two threads execute readMVar, for the same MVar. These actions are independent.
- It does matter which order two threads execute putMVar, for the same MVar. These actions are dependent.
- It doesn’t matter which order two threads execute putMVar for different MVars. These actions are independent.
Two actions are dependent if the order in which they are performed matters.
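A minimal IO-only illustration of that definition (no dejafu involved): two writes to the same IORef are dependent because the final value depends on their order, while the helper finalValue below is invented just for this sketch.

```haskell
import Data.IORef

-- Run a list of actions against a fresh IORef, in order, and return the
-- final value. If re-ordering two actions changes the result, they are
-- dependent in the sense described above.
finalValue :: [IORef Int -> IO ()] -> IO Int
finalValue actions = do
  ref <- newIORef 0
  mapM_ ($ ref) actions
  readIORef ref
```

Running finalValue with two writes to the same ref in both orders gives different answers, so the writes are dependent; swapping writes to two different refs would leave both final values unchanged.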
So the intuition behind DPOR is that most actions in a concurrent program are independent. DPOR won’t help you much if you have a single piece of shared state which every thread is hitting, but most concurrent programs aren’t like that. The worst case is still a terrible exponential blow-up, but the average case is much better.
The dependency relation is a core part of Déjà Fu today. It affects both performance and correctness. If it says two actions are dependent when they are not, then we may see unnecessary interleavings tried. If it says two actions are not dependent when they really are, then we may miss necessary interleavings.
Being such an important component, it must be well-tested, right? Well, sort of. The Déjà Fu testsuite mostly consists of small concurrent programs together with a list of expected outputs, testing that Déjà Fu finds all the nondeterminism in the program. This does exercise the dependency relation, but only very indirectly.
There things would have remained had I not experienced one of those coincidence-driven flashes of insight:
aherrmann opened an issue on GitHub asking how to take an execution trace and replay it.
agnishom posted a thread on /r/algorithms asking how to check the equivalence of traces where only some elements commute.
I had my idea. I can directly test the dependency relation like so:

1. execute a test case with a random scheduler, and record its trace
2. rewrite that trace, re-ordering actions which the dependency relation says are independent
3. replay the rewritten trace, and check that we get the same result out
So, what is a good normal form for a trace? I tried out a few approaches here, but there was one I kept coming back to: we should shuffle around independent actions to keep the program on the main thread for as long as possible.
There are two reasons I think this works well. (1) The traces we get will be easier for a human to read, as the program will stay on its main thread and only execute another thread where necessary. (2) A Haskell program terminates when the main thread terminates, so by executing the main thread as much as possible, we may find that some actions don’t need to be executed at all.
So firstly we need to know when two actions commute. Let’s just use the dependency relation for that:
-- | Check if two actions commute.
independent :: DepState -> (ThreadId, ThreadAction) -> (ThreadId, ThreadAction) -> Bool
independent ds (tid1, ta1) (tid2, ta2) = not (dependent ds tid1 ta1 tid2 ta2)
The DepState parameter tracks information about the history of the execution, allowing us to make better decisions. For example: while in general it matters in which order two putMVars to the same MVar happen, it doesn’t matter if the MVar is already full, as both actions will block without achieving anything.
The approach works well in practice, but has been the source of so many off-by-one errors. Even while writing this memo!
So now onto trace normalisation. The easiest way to do it is bubble sort, but with an additional constraint on when we can swap things:
For each pair of adjacent actions, x and y, in the trace: if x and y commute and thread_id y < thread_id x, then swap x and y. Either way, update the DepState and continue to the next pair. Repeat until no more swaps happen.

And here’s the code:
-- | Rewrite a trace into a canonical form.
normalise :: [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
normalise trc0 = if changed then normalise trc' else trc' where
  (changed, trc') = bubble initialDepState False trc0

  bubble ds flag ((x@(tid1, _)):(y@(tid2, _)):trc)
    | independent ds x y && tid2 < tid1 = go ds True y (x : trc)
    | otherwise = go ds flag x (y : trc)
  bubble _ flag trc = (flag, trc)

  go ds flag t@(tid, ta) trc = second (t :) (bubble (updateDepState ds tid ta) flag trc)
Now we need a scheduler which can play a given list of scheduling decisions. This isn’t built in, but we can make one. Schedulers look like this:
-- from Test.DejaFu.Schedule
newtype Scheduler state = Scheduler
  { scheduleThread
      :: Maybe (ThreadId, ThreadAction)
      -> NonEmpty (ThreadId, Lookahead)
      -> state
      -> (Maybe ThreadId, state)
  }
A scheduler is a stateful function, which takes the previously scheduled action and the list of runnable threads, and gives back a thread to execute. We don’t care about those parameters. We just want to play a fixed list of scheduling decisions. And here is how we do that:
-- | Execute a concurrent program by playing a list of scheduling decisions.
play :: MemType -> [ThreadId] -> ConcIO a -> IO (Either Failure a, [ThreadId], Trace)
play = runConcurrent (Scheduler sched) where
  sched _ _ (t:ts) = (Just t, ts)
  sched _ _ [] = (Nothing, [])
Now all the background is in place, so we can test what we want to test: that an execution, and the play-back of its normalised trace, give the same result. For reasons which will become apparent in the next section, I’m going to parameterise over the normalisation function:
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and return both results.
runNorm :: ([(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]) -> Int -> MemType -> ConcIO a -> IO (Either Failure a, [ThreadId], Either Failure a, [ThreadId])
runNorm norm seed memtype conc = do
  let g = mkStdGen seed
  -- 1
  (efa1, _, trc) <- runConcurrent randomSched memtype g conc
  -- 2
  let trc' = tail
        ( scanl (\(t, _) (d, _, a) -> (tidOf t d, a))
                (initialThread, undefined)
                trc
        )
  let tids1 = map fst trc'
  let tids2 = map fst (norm trc')
  -- 3
  (efa2, s, _) <- play memtype tids2 conc
  -- 4
  let truncated = take (length tids2 - length s) tids2
  pure (efa1, tids1, efa2, truncated)
There’s a lot going on here, so let’s break it down:
We execute the program with the built-in random scheduler, using the provided seed.
The trace that runConcurrent gives us is in the form [(Decision, [(ThreadId, Lookahead)], ThreadAction)], whereas we want a [(ThreadId, ThreadAction)]. So this scan just changes the format. It’s a scan rather than a map because to convert a Decision into a ThreadId potentially requires knowing what the previous thread was.
We normalise the trace, and run it again.
If the entire normalised trace wasn’t used up, then it has some unnecessary suffix (because the main thread is now terminating sooner). So we make the normalised trace easier to read by chopping off any such suffix.
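Steps 2 and 4 are the fiddly ones, so here is a toy model of both. The simplified Decision type and the helpers threadIds and truncateUnused are invented for this sketch; they are not dejafu’s actual definitions, which carry more information:

```haskell
-- A simplified Decision: which thread ran, relative to the previous one.
data Decision = Start Int | Continue | SwitchTo Int

-- Step 2 in miniature: a scan (not a map) recovers the running thread at
-- each point, because Continue only makes sense given the previous thread.
tidOf :: Int -> Decision -> Int
tidOf _ (Start t) = t
tidOf _ (SwitchTo t) = t
tidOf t Continue = t

threadIds :: [Decision] -> [Int]
threadIds = tail . scanl tidOf 0

-- Step 4 in miniature: if the replay left some decisions unconsumed, chop
-- that suffix off to make the normalised trace easier to read.
truncateUnused :: [Int] -> [Int] -> [Int]
truncateUnused tids unusedSuffix = take (length tids - length unusedSuffix) tids
```

For instance, threadIds [Start 0, Continue, SwitchTo 1, Continue] recovers the thread at each step, and truncateUnused drops exactly as many trailing decisions as the scheduler never used.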
Finally, we can write a little function to test using the normalise function:
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and check that both give the same result.
testNormalise :: (Eq a, Show a) => Int -> MemType -> ConcIO a -> IO Bool
testNormalise seed memtype conc = do
  (efa1, tids1, efa2, tids2) <- runNorm normalise seed memtype conc
  unless (efa1 == efa2) $ do
    putStrLn "Mismatched result!"
    putStrLn $ "      expected: " ++ show efa1
    putStrLn $ "       but got: " ++ show efa2
    putStrLn ""
    putStrLn $ "rewritten from: " ++ show tids1
    putStrLn $ "            to: " ++ show tids2
  pure (efa1 == efa2)
And does it work? Let’s copy two example programs from the Test.DejaFu docs:
-- from Test.DejaFu
example1 :: MonadConc m => m String
example1 = do
  var <- newEmptyMVar
  fork (putMVar var "hello")
  fork (putMVar var "world")
  readMVar var

example2 :: MonadConc m => m (Bool, Bool)
example2 = do
  r1 <- newCRef False
  r2 <- newCRef False
  x <- spawn $ writeCRef r1 True >> readCRef r2
  y <- spawn $ writeCRef r2 True >> readCRef r1
  (,) <$> readMVar x <*> readMVar y
And then test them:
```
> testNormalise 0 TotalStoreOrder example1
True
> testNormalise 0 TotalStoreOrder example2
True
```
According to my very unscientific method, everything works perfectly!
You can probably see where this is going: just supplying one random seed and one memory model is a poor way to test things. Ah, if only we had some sort of tool to generate arbitrary values for us!
But that's not all: if the dependency relation is correct, then any permutation of independent actions should give the same result, not just the one which `normalise` implements. So before we introduce Hedgehog and arbitrary values, let's make something a little more chaotic:
```haskell
-- | Shuffle independent actions in a trace according to the given list.
shuffle :: [Bool] -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
shuffle = go initialDepState where
  go ds (f:fs) (t1:t2:trc)
    | independent ds t1 t2 && f = go' ds fs t2 (t1 : trc)
    | otherwise = go' ds fs t1 (t2 : trc)
  go _ _ trc = trc

  go' ds fs t@(tid, ta) trc = t : go (updateDepState ds tid ta) fs trc
```
In `normalise`, two independent actions will always be re-ordered if that gets us closer to the canonical form. In `shuffle`, however, two independent actions will either be re-ordered or not, depending on the supplied list of `Bool`s.

This is much better for testing our dependency relation, as we can now produce far more re-orderings, all of which should satisfy the same property: no matter how the independent actions in a trace are shuffled, we get the same result.
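To get a feel for how the flag list drives re-ordering, here's a toy analogue of `shuffle` in which adjacent `Int`s count as "independent" whenever they differ (the name `shuffleToy` is mine, not dejafu's):

```haskell
-- Toy analogue of 'shuffle': a True flag swaps an adjacent "independent"
-- pair (here: any two unequal numbers), a False flag leaves it in place;
-- either way we then move one element along, just like the real function.
shuffleToy :: [Bool] -> [Int] -> [Int]
shuffleToy (f:fs) (x:y:rest)
  | x /= y && f = y : shuffleToy fs (x : rest)
  | otherwise   = x : shuffleToy fs (y : rest)
shuffleToy _ xs = xs

-- shuffleToy [False, True] [1, 2, 3] == [1, 3, 2]
-- shuffleToy [True,  True] [1, 2, 3] == [2, 3, 1]
```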
I think it’s about time to bring out Hedgehog:
```haskell
-- | Execute a concurrent program with a random scheduler, arbitrarily permute
-- the independent actions in the trace, and check that we get the same result
-- out.
hog :: (Eq a, Show a) => ConcIO a -> IO Bool
hog conc = Hedgehog.check . property $ do
  -- 1
  mem  <- forAll Gen.enumBounded
  seed <- forAll $ Gen.int (Range.linear 0 100)
  fs   <- forAll $ Gen.list (Range.linear 0 100) Gen.bool

  -- 2
  (efa1, tids1, efa2, tids2) <- liftIO
    $ runNorm (shuffle fs) seed mem conc

  -- 3
  footnote ("            to: " ++ show tids2)
  footnote ("rewritten from: " ++ show tids1)
  efa1 === efa2
```
Let’s break that down:
1. We're telling Hedgehog that this property should hold for all memory models, all seeds, and all `Bool`-lists. Unlike most Haskell property-testing libraries, Hedgehog takes generator functions rather than using a typeclass. I think this is nicer.
2. We run our program, permute its trace, and get all the results just as before.
3. We add some footnotes: messages which Hedgehog will display along with a failure. For some reason these get displayed in reverse order.
Alright, let’s see if Hedgehog finds any bugs for us:
```
> hog example1
  ✗ <interactive> failed after 3 tests and 1 shrink.

       ┏━━ extra.hs ━━━
    82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
    83 ┃ hog conc = Hedgehog.check . property $ do
    84 ┃   mem <- forAll Gen.enumBounded
       ┃   │ SequentialConsistency
    85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
       ┃   │ 0
    86 ┃   fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
       ┃   │ [ False , True ]
    87 ┃
    88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
    89 ┃     $ runNorm (shuffle fs) seed mem conc
    90 ┃   footnote ("            to: " ++ show tids2)
    91 ┃   footnote ("rewritten from: " ++ show tids1)
    92 ┃   efa1 === efa2
       ┃   ^^^^^^^^^^^^^
       ┃   │ Failed (- lhs =/= + rhs)
       ┃   │ - Right "hello"
       ┃   │ + Left InternalError

    rewritten from: [main,main,1,main,1,2,main,2,main]
                to: [main,1]

    This failure can be reproduced by running:
    > recheck (Size 2) (Seed 1824012233418733250 (-4876494268681827407)) <property>

False
```
It did! And look at that output! Magical! I must see if I can get Déjà Fu to give annotated source output like that.
Let's look at `example1` again:
```haskell
do
  var <- newEmptyMVar
  fork (putMVar var "hello")
  fork (putMVar var "world")
  readMVar var
```
Oh dear, our rewritten trace is trying to execute thread `1` immediately after the first action of the main thread. The first action of the main thread is `newEmptyMVar`: thread `1` doesn't exist at that point!
Let's change our `independent` function to say that an action is dependent with the fork which creates its thread:
```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = not (dependent ds tid1 ta1 tid2 ta2)
```
How about now?
```
> hog example1
  ✗ <interactive> failed after 13 tests and 2 shrinks.

       ┏━━ extra.hs ━━━
    82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
    83 ┃ hog conc = Hedgehog.check . property $ do
    84 ┃   mem <- forAll Gen.enumBounded
       ┃   │ SequentialConsistency
    85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
       ┃   │ 0
    86 ┃   fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
       ┃   │ [ True , True ]
    87 ┃
    88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
    89 ┃     $ runNorm (shuffle fs) seed mem conc
    90 ┃   footnote ("            to: " ++ show tids2)
    91 ┃   footnote ("rewritten from: " ++ show tids1)
    92 ┃   efa1 === efa2
       ┃   ^^^^^^^^^^^^^
       ┃   │ Failed (- lhs =/= + rhs)
       ┃   │ - Right "hello"
       ┃   │ + Left InternalError

    rewritten from: [main,main,1,main,1,2,main,2,main]
                to: [main,1]

    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 654387260079025817 (-6686572164463137223)) <property>

False
```
Well, that failing trace looks exactly like the previous error. But the parameters are different: the first error happened with the list `[False, True]`; this one requires the list `[True, True]`. So let's think about what happens to the trace in this case.
We start with: `[(main, NewEmptyMVar 0), (main, Fork 1), (1, PutMVar 0)]`.

The first two actions are independent, and the flag is `True`, so we swap them. We now have: `[(main, Fork 1), (main, NewEmptyMVar 0), (1, PutMVar 0)]`.

The second two actions are independent, and the flag is `True`, so we swap them. We now have: `[(main, Fork 1), (1, PutMVar 0), (main, NewEmptyMVar 0)]`.
We can't actually re-order actions of the same thread, so we should never have swapped the first two. I suppose there's another problem here, that no action on an `MVar` commutes with creating that `MVar`, but we should never be in a situation where that could happen. So we need another case in `independent`:
```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | tid1 == tid2 = False
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = not (dependent ds tid1 ta1 tid2 ta2)
```
Our first example program works fine now:
```
> hog example1
  ✓ <interactive> passed 100 tests.
True
```
The second is a little less happy:
```
> hog example2
  ✗ <interactive> failed after 48 tests and 9 shrinks.

       ┏━━ extra.hs ━━━
    82 ┃ hog :: (Eq a, Show a) => ConcIO a -> IO Bool
    83 ┃ hog conc = Hedgehog.check . property $ do
    84 ┃   mem <- forAll Gen.enumBounded
       ┃   │ TotalStoreOrder
    85 ┃   seed <- forAll $ Gen.int (Range.linear 0 100)
       ┃   │ 0
    86 ┃   fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
       ┃   │ [ False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , False
       ┃   │ , True
       ┃   │ ]
    87 ┃
    88 ┃   (efa1, tids1, efa2, tids2) <- liftIO
    89 ┃     $ runNorm (shuffle fs) seed mem conc
    90 ┃   footnote ("            to: " ++ show tids2)
    91 ┃   footnote ("rewritten from: " ++ show tids1)
    92 ┃   efa1 === efa2
       ┃   ^^^^^^^^^^^^^
       ┃   │ Failed (- lhs =/= + rhs)
       ┃   │ - Right ( False , True )
       ┃   │ + Left InternalError

    rewritten from: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,main,2,-1,2,2,main,main]
                to: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,2,main,-1]

    This failure can be reproduced by running:
    > recheck (Size 47) (Seed 2159662051602767058 (-7857629802164753123)) <property>

False
```
This is a little trickier. Here’s my diagnosis:
- It's an `InternalError` again, which means we're trying to execute a thread which isn't runnable.
- The memory model is `TotalStoreOrder`, and the thread we're trying to execute is thread `-1`, a "fake" thread used in the relaxed memory implementation. So this is a relaxed memory bug.
- The traces only differ in one place: where `main, 2, -1` is changed to `2, main, -1`. So the issue is caused by re-ordering `main` and thread `2`.
- If the `main` action is a memory barrier, then thread `-1` will not exist after it.
- So the `main` action is probably a memory barrier.

Let's push along those lines and add a case for memory barriers to `independent`:
```haskell
independent ds (tid1, ta1) (tid2, ta2)
  | tid1 == tid2 = False
  | ta1 == Fork tid2 = False
  | ta2 == Fork tid1 = False
  | otherwise = case (simplifyAction ta1, simplifyAction ta2) of
      (UnsynchronisedWrite _, a) | isBarrier a -> False
      (a, UnsynchronisedWrite _) | isBarrier a -> False
      _ -> not (dependent ds tid1 ta1 tid2 ta2)
```
Did we get it?
```
> hog example2
  ✓ <interactive> passed 100 tests.
True
```
Great!
So, we explored the dependency relation with Hedgehog, and found three missing cases:
1. Two actions of the same thread are dependent.
2. Any action of a thread is dependent with the `fork` which creates that thread.
3. Unsynchronised writes are dependent with memory barriers.
But are these bugs? I’m not so sure:
1. The dependency relation is only ever used to compare different threads.
2. This is technically correct, but it's not interesting or useful.
3. This could be a bug. The relaxed memory implementation is pretty hairy and I've had a lot of problems with it in the past. Honestly, I just need to rewrite it (or campaign for Haskell to become sequentially consistent² and rip it out).
But even if they aren't bugs, these cases are definitely confusing. The dependency relation is currently just an internal thing, not exposed to users. However, I'm planning to expose a function to normalise traces, in which case providing an `independent` function is entirely reasonable. So even if these changes don't make it into `dependent`, they will be handled by `independent`.
Next steps: I’m going to get this into the test suite, to get a large number of extra example programs for free. My hacky and cobbled-together testing framework in dejafu-tests is capable of running every test case with a variety of different schedulers, so I just need to add another way it runs everything. I won’t need to touch the actual tests, just the layer of glue which runs them all, which is nice.
The only problem is that this glue is currently based on HUnit and test-framework, whereas the only integration I can find for Hedgehog is tasty-hedgehog, so I might need to switch to tasty first. As usual, the hardest part is getting different libraries to co-operate!
Hopefully I’ll find some bugs! Well, not exactly hopefully, but you know what I mean.
For all the gory details, see:
Dynamic partial order reduction for relaxed memory models, N. Zhang, M. Kusano, and C. Wang (2015)
Bounded partial-order reduction, K. Coons, M. Musuvathi, and K. McKinley (2013)
Refining dependencies improves partial-order verification methods (extended abstract), P. Godefroid and D. Pirottin (1993)
SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap, M. Vollmer, R. G. Scott, M. Musuvathi, and R. R. Newton (2017)
There are 100 prisoners in solitary cells. There’s a central living room with one light bulb; this bulb is initially off. No prisoner can see the light bulb from his or her own cell. Everyday, the warden picks a prisoner equally at random, and that prisoner visits the living room. While there, the prisoner can toggle the bulb if he or she wishes. Also, the prisoner has the option of asserting that all 100 prisoners have been to the living room by now. If this assertion is false, all 100 prisoners are shot. However, if it is indeed true, all prisoners are set free and inducted into MENSA, since the world could always use more smart people. Thus, the assertion should only be made if the prisoner is 100% certain of its validity. The prisoners are allowed to get together one night in the courtyard, to discuss a plan. What plan should they agree on, so that eventually, someone will make a correct assertion?
We can express this as a concurrency problem: the warden is the scheduler, each prisoner is a thread, and when the program terminates every prisoner should have visited the living room.
Let’s set up some imports:
```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Control.Concurrent.Classy as C
import           Control.Monad (forever, when)
import           Data.Foldable (for_)
import           Data.List (genericLength)
import           Data.Maybe (mapMaybe)
import qualified Data.Set as S
import qualified System.Random as R
import qualified Test.DejaFu as D
import qualified Test.DejaFu.Common as D
import qualified Test.DejaFu.SCT as D
```
Before we try to implement a solution, let's think about how we can check whether an execution corresponds to the prisoners succeeding and entering MENSA, or failing and being shot.
Prisoners are threads, and the warden is the scheduler. So if every thread (prisoner) that is forked is scheduled (taken to the room), then the prisoners are successful:
```haskell
-- | Check if an execution corresponds to a correct guess.
isCorrect :: D.Trace -> Bool
isCorrect trc = S.fromList (threads trc) == S.fromList (visits trc)

-- | Get all threads created.
threads :: D.Trace -> [D.ThreadId]
threads trc = D.initialThread : mapMaybe go trc where
  go (_, _, D.Fork tid) = Just tid
  go _ = Nothing

-- | Get all scheduled threads.
visits :: D.Trace -> [D.ThreadId]
visits = mapMaybe go where
  go (D.Start tid, _, _)    = Just tid
  go (D.SwitchTo tid, _, _) = Just tid
  go _ = Nothing
```
So now, given some way of setting up the game and running it to completion, we can test it and print some statistics:
```haskell
-- | Run the prison game and print statistics.
run :: D.Way -> (forall m. C.MonadConc m => m ()) -> IO ()
run way game = do
  traces <- map snd <$> D.runSCT way D.defaultMemType game
  let successes = filter isCorrect traces
  let failures  = filter (not . isCorrect) traces
  putStrLn (show (length traces)    ++ " total attempts")
  putStrLn (show (length successes) ++ " successes")
  putStrLn (show (length failures)  ++ " failures")
  putStrLn (show (avgvisits successes) ++ " average number of room visits per success")
  putStrLn (show (avgvisits failures)  ++ " average number of room visits per failure")
  putStrLn "Sample sequences of visits:"
  for_ (take 5 traces) (print . visits)
  where
    avgvisits ts = sum (map (fromIntegral . numvisits) ts) / genericLength ts

    numvisits = sum . map count where
      count (_, _, D.STM _ _)      = 1
      count (_, _, D.BlockedSTM _) = 1
      count (_, _, D.Yield)        = 1
      count _ = 0
```
I have decided to assume that a prisoner will either yield (doing nothing) or perform some STM transaction while they’re in the room, to simplify things.
A slow but simple strategy is for the prisoners to nominate a leader. Only the leader can declare to the warden that everyone has visited the room. Whenever a prisoner other than the leader visits the room, if the light is on, they do nothing; otherwise, if this is their first time in the room with the light off, they turn it on; otherwise they leave it. Whenever the leader enters the room, they turn the light off. When the leader has turned the light off 99 times (or `num_prisoners - 1` times), they tell the warden that everyone has visited.
Let’s set up those algorithms:
```haskell
-- | The state of the light bulb.
data Light = IsOn | IsOff

-- | Count how many prisoners have toggled the light and terminate
-- when everyone has.
leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Light -> m ()
leader prisoners light = go 0 where
  go counter = do
    counter' <- C.atomically $ do
      state <- C.readTVar light
      case state of
        IsOn -> do
          C.writeTVar light IsOff
          pure (counter + 1)
        IsOff -> C.retry
    when (counter' < prisoners - 1) (go counter')

-- | Turn the light on once then do nothing.
notLeader :: C.MonadConc m => C.TVar (C.STM m) Light -> m ()
notLeader light = do
  C.atomically $ do
    state <- C.readTVar light
    case state of
      IsOn  -> C.retry
      IsOff -> C.writeTVar light IsOn
  forever C.yield
```
So now we just need to create a program where the leader is the main thread and everyone else is a separate thread:
```haskell
-- | Most popular English male and female names, according to
-- Wikipedia.
name :: Int -> String
name i = ns !! (i `mod` length ns) where
  ns = ["Oliver", "Olivia", "George", "Amelia", "Harry", "Emily"]

-- | Set up the prison game.  The number of prisoners should be at
-- least 1.
prison :: C.MonadConc m => Int -> m ()
prison prisoners = do
  light <- C.atomically (C.newTVar IsOff)
  for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader light))
  leader prisoners light
```
Because these are people, not just threads, I’ve given them names. The leader is just called “main” though, how unfortunate for them.
Now we can try out our system and see if it works:
```
λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing })
λ> runS 1
1 total attempts
1 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main]
λ> runS 2
5 total attempts
5 successes
0 failures
7.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main,Olivia,main]
[main,Olivia,main]
λ> runS 3
2035 total attempts
2035 successes
0 failures
133.39066339066338 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
(big lists omitted)
```
This doesn’t scale well. It’s actually a really bad case for concurrency testing: every thread is messing with the same shared state, so dejafu has to try all the orderings. Not good.
Taking another look at our prisoners, we can see two things which a human would use to decide whether some schedules are redundant or not:
If we adopt any schedule other than alternating leader / non-leader, threads will block without doing anything. So we should alternate.
When a non-leader has completed their task, they will always yield. So we should never schedule a prisoner who will yield.
Unfortunately dejafu can't really make use of (1). It could be inferred if dejafu were able to compare the values inside `TVar`s, rather than just seeing that there had been a write. But Haskell doesn't let us do that without slapping an `Eq` constraint on `writeTVar`, which I definitely don't want to do (although maybe having a separate `eqwriteTVar`, `eqputMVar`, and so on would be a nice addition).
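For illustration, such a value-aware write might look something like this sketch (entirely hypothetical; nothing like it exists in dejafu today, and the real benefit would need support in the testing layer, not just this wrapper):

```haskell
-- Hypothetical value-aware write: only perform (and hence only record) a
-- write when the new value actually differs from the old one, so a no-op
-- write could be treated as independent of other accesses.
eqwriteTVar :: (MonadSTM stm, Eq a) => TVar stm a -> a -> stm ()
eqwriteTVar var new = do
  old <- readTVar var
  when (old /= new) (writeTVar var new)
```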
Fortunately, dejafu can do something with (2). It already bounds the maximum number of times a thread can yield, so that we can test constructs like spinlocks. This is called fair bounding. The default bound is 5, but if we set it to 0 dejafu will just never schedule a thread which is going to yield. Here we go:
```
λ> let runS = run $ D.systematically (D.defaultBounds { D.boundPreemp = Nothing, D.boundFair = Just 0 })
λ> runS 1
1 total attempts
1 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main]
λ> runS 2
1 total attempts
1 successes
0 failures
4.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main]
λ> runS 3
4 total attempts
4 successes
0 failures
7.5 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main]
[main,Olivia,George,main,George,main]
[main,George,main,Olivia,main]
[main,George,Olivia,main,Olivia,main]
```
Much better! Although it still doesn't scale as nicely as we'd like:
```
λ> runS 4
48 total attempts
48 successes
0 failures
11.5 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main]
[main,Olivia,main,George,Amelia,main,Amelia,main]
[main,Olivia,main,Amelia,main,George,main]
[main,Olivia,main,Amelia,George,main,George,main]
[main,Olivia,George,main,George,main,Amelia,main]
λ> runS 5
1536 total attempts
1536 successes
0 failures
16.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main,Harry,main]
[main,Olivia,main,George,main,Amelia,Harry,main,Harry,main]
[main,Olivia,main,George,main,Harry,main,Amelia,main]
[main,Olivia,main,George,main,Harry,Amelia,main,Amelia,main]
[main,Olivia,main,George,Amelia,main,Amelia,main,Harry,main]
λ> runS 6
122880 total attempts
122880 successes
0 failures
21.0 average number of room visits per success
NaN average number of room visits per failure
Sample sequences of visits:
[main,Olivia,main,George,main,Amelia,main,Harry,main,Emily,main]
[main,Olivia,main,George,main,Amelia,main,Harry,Emily,main,Emily,main]
[main,Olivia,main,George,main,Amelia,main,Emily,main,Harry,main]
[main,Olivia,main,George,main,Amelia,main,Emily,Harry,main,Harry,main]
[main,Olivia,main,George,main,Amelia,Harry,main,Harry,main,Emily,main]
```
The prisoners are stepping on each other’s toes and causing needless work. This is probably as good as we can do without adding some extra primitives to dejafu to optimise the case where we have an Eq
instance available, unfortunately.
In concurrency testing terms, six threads is actually quite a lot.
Empirical studies have found that many concurrency bugs can be exhibited with only two or three threads! Furthermore, most real-world concurrent programs don’t have every single thread operating on the same bit of shared state.
There’s another school of thought which says to just wait for three years, because by then it’s very unlikely that any single prisoner had never visited the room. In fact, we would expect each prisoner to have been to the room ten times by then, assuming the warden is fair.
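A quick back-of-the-envelope check of that claim, assuming a fair warden who picks one of n prisoners uniformly each day (the function names here are mine, purely for illustration):

```haskell
-- Union-bound estimate of the chance that at least one of n prisoners
-- has never been picked after the given number of days.
pSomeoneMissed :: Double -> Double -> Double
pSomeoneMissed n days = n * ((n - 1) / n) ** days

-- Expected number of room visits per prisoner.
expectedVisits :: Double -> Double -> Double
expectedVisits n days = days / n

-- pSomeoneMissed 100 1095 ≈ 0.0017 and expectedVisits 100 1095 ≈ 11,
-- which matches the "about ten visits each after three years" intuition.
```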
By keeping track of how many days have passed, we can try this out as well:
```haskell
leader :: C.MonadConc m => Int -> C.TVar (C.STM m) Int -> m ()
leader prisoners days = C.atomically $ do
  numDays <- C.readTVar days
  C.check (numDays >= (prisoners - 1) * 10)

notLeader :: C.MonadConc m => C.TVar (C.STM m) Int -> m ()
notLeader days = forever . C.atomically $ C.modifyTVar days (+1)

prison :: C.MonadConc m => Int -> m ()
prison prisoners = do
  days <- C.atomically (C.newTVar 0)
  for_ [1..prisoners-1] (\i -> C.forkN (name i) (notLeader days))
  leader prisoners days
```
Now let’s see how these brave prisoners do (sample visit sequences omitted because they’re pretty long):
```
λ> let runR = run $ D.uniformly (R.mkStdGen 0) 100
λ> runR 1
100 total attempts
100 successes
0 failures
2.0 average number of room visits per success
NaN average number of room visits per failure
λ> runR 2
100 total attempts
100 successes
0 failures
18.35 average number of room visits per success
NaN average number of room visits per failure
λ> runR 3
100 total attempts
100 successes
0 failures
31.92 average number of room visits per success
NaN average number of room visits per failure
λ> runR 4
100 total attempts
100 successes
0 failures
43.52 average number of room visits per success
NaN average number of room visits per failure
λ> runR 5
100 total attempts
100 successes
0 failures
55.88 average number of room visits per success
NaN average number of room visits per failure
λ> runR 6
100 total attempts
100 successes
0 failures
67.37 average number of room visits per success
NaN average number of room visits per failure
λ> runR 7
100 total attempts
100 successes
0 failures
77.05 average number of room visits per success
NaN average number of room visits per failure
λ> runR 8
100 total attempts
99 successes
1 failures
90.4040404040404 average number of room visits per success
81.0 average number of room visits per failure
λ> runR 9
100 total attempts
100 successes
0 failures
101.64 average number of room visits per success
NaN average number of room visits per failure
λ> runR 10
100 total attempts
100 successes
0 failures
114.89 average number of room visits per success
NaN average number of room visits per failure
```
Not bad at all! Although my puny VPS still can’t manage all 100.
As before, all code is available on GitHub. The code for this post is under the "post-02" tag.
Did you do last time’s homework task? It was to implement this interface:
```haskell
data CRef m a = -- ...

newCRef :: a -> MiniFu m (CRef m a)

readCRef :: CRef m a -> MiniFu m a

writeCRef :: CRef m a -> a -> MiniFu m ()

atomicModifyCRef :: CRef m a -> (a -> (a, b)) -> MiniFu m b
```
Here are my solutions, available at the “homework-01” tag:
- (2070bdf) Add the `CRef` type, the `PrimOp` constructors, and the wrapper functions
- (188eec5) Implement the primops

I also made some changes, available at the "pre-02" tag:

- (7ce6e41) Add a helper for primops which don't create any identifiers
- (2419796) Move some definitions into an internal module
- (9c49f9d) Change the type of the `block` helper to `MVarId -> Threads m -> Threads m`
- (dabd84b) Implement `readMVar`

Now on to the show…
We can’t implement exceptions with what we have already. We’re going to need some new primops. I think you’re getting a feel for how this works now, so I won’t drag this out. Here we go:
```haskell
import qualified Control.Exception as E

data PrimOp m where
  -- ...
  Throw :: E.Exception e => e -> PrimOp m
  Catch :: E.Exception e => MiniFu m a -> (e -> MiniFu m a) -> (a -> PrimOp m)
        -> PrimOp m
  PopH  :: PrimOp m -> PrimOp m

throw :: E.Exception e => e -> MiniFu m a
throw e = MiniFu (K.cont (\_ -> Throw e))

catch :: E.Exception e => MiniFu m a -> (e -> MiniFu m a) -> MiniFu m a
catch act h = MiniFu (K.cont (Catch act h))
```
Throwing an exception with `throw` jumps back to the closest enclosing `catch` with an exception handler of the appropriate type, killing the thread if there is none. The `PopH` primop will pop the top exception handler from the stack. We'll insert those as appropriate when entering a `catch`.
Before we can actually implement these primops, we need to give threads a place to store their exception handlers. You might have guessed it when I said "stack": we'll just give every thread a list of them. This requires changing our `Thread` type and `thread` function:
```haskell
data Thread m = Thread
  { threadK     :: PrimOp m
  , threadBlock :: Maybe MVarId
  , threadExc   :: [Handler m] -- <- new
  }

data Handler m where
  Handler :: E.Exception e => (e -> PrimOp m) -> Handler m

thread :: PrimOp m -> Thread m
thread k = Thread
  { threadK     = k
  , threadBlock = Nothing
  , threadExc   = [] -- <- new
  }
```
As `Exception` is a subclass of `Typeable`, given some exception value we're able to look for the first matching handler:
```haskell
raise :: E.Exception e => e -> Thread m -> Maybe (Thread m)
raise exc thrd = go (threadExc thrd) where
  go (Handler h:hs) = case h <$> E.fromException exc' of
    Just pop -> Just (thrd { threadK = pop, threadBlock = Nothing, threadExc = hs })
    Nothing  -> go hs
  go [] = Nothing

  exc' = E.toException exc
```
If `raise` returns a `Just`, then a handler was found and entered. Otherwise, no handler exists and the thread should be removed from the `Threads` collection. This can be expressed rather nicely as `M.update . raise`.
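Spelled out, that composition gives a function which delivers an exception to a specific thread (the name `raiseIn` is mine, just for illustration):

```haskell
-- Deliver an exception to a thread: M.update keeps the updated thread when
-- 'raise' finds a matching handler, and deletes the thread from the map
-- when 'raise' returns Nothing.
raiseIn :: E.Exception e => e -> ThreadId -> Threads m -> Threads m
raiseIn = M.update . raise
```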
Now we have enough support to implement the primops:
```haskell
stepThread {- ... -}
  where
    -- ...
    go (Throw e) = simple (M.update (raise e) tid)
    go (Catch (MiniFu ma) h k) = simple . adjust $ \thrd -> thrd
      { threadK = K.runCont ma (PopH . k)
      , threadExc =
          let h' exc = K.runCont (runMiniFu (h exc)) k
          in Handler h' : threadExc thrd
      }
    go (PopH k) = simple . adjust $ \thrd -> thrd
      { threadK = k
      , threadExc = tail (threadExc thrd)
      }
```
Let’s break that down:
- `Throw` just re-uses our `raise` function to either jump to the exception handler or kill the thread.
- `Catch` changes the continuation of the thread to run the enclosed action, then do a `PopH` action, then run the outer action. It also adds an exception continuation, which just runs the exception handler, then runs the outer action.
- `PopH` just removes the head exception continuation.

It's important that the exception continuation doesn't use `PopH` to remove itself: that happens in `raise` when an exception is thrown. When writing this section I realised I'd made that mistake in dejafu (#139)!
So now we can use synchronous exceptions! Here’s an incredibly contrived example:
```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (join)

example_sync :: MiniFu m Int
example_sync = do
  a <- newEmptyMVar
  fork (putMVar a (pure 1))
  fork (putMVar a (throw E.NonTermination))
  fork (putMVar a (throw E.AllocationLimitExceeded))
  catch
    (catch
      (join (readMVar a))
      (\(_ :: E.AllocationLimitExceeded) -> pure 2))
    (\(_ :: E.NonTermination) -> pure 3)

demo_sync :: IO ()
demo_sync = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g example_sync
```
If we run this a few times in ghci, we can see the different exceptions being thrown and caught (resulting in different outputs):
```
λ> demo_sync
Just 1
λ> demo_sync
Just 3
λ> demo_sync
Just 3
λ> demo_sync
Just 2
```
`MonadConc` has a bunch of superclasses, and we can now implement two of them!
```haskell
import qualified Control.Monad.Catch as EM

instance EM.MonadThrow (MiniFu m) where
  throwM = -- 'throw' from above

instance EM.MonadCatch (MiniFu m) where
  catch = -- 'catch' from above
```
The exceptions package provides the `MonadThrow`, `MonadCatch`, and `MonadMask` typeclasses, so we can talk about exceptions in a wider context than just `IO`. We'll get on to `MonadMask` when we look at asynchronous exceptions.
It is with exceptions that we hit the first thing we can’t do in MiniFu.
When in `IO`, we can catch exceptions from pure code:
```
λ> import Control.Exception
λ> evaluate undefined `catch` \e -> putStrLn ("Got " ++ show (e :: SomeException))
Got Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:5:10 in interactive:Ghci2
```
But we can't do that in `MiniFu`, as there's no suitable `evaluate` function.

Should there be an `evaluate` in the `MonadConc` class? I'm unconvinced, as it's not really a concurrency operation.

Should we constrain the `m` in `MiniFu m` to be a `MonadIO`, which would let us call `evaluate`? Perhaps: that would certainly be a way to do it, and I'm currently investigating the advantages of an `IO` base monad for dejafu (although originally for a different reason).
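If we did take the `MonadIO` route, an `evaluate` for MiniFu might look something like this sketch (entirely hypothetical; none of this exists in MiniFu as written):

```haskell
-- Hypothetical primop: force a pure value, turning any exception hiding
-- inside it into a MiniFu-level throw.
data PrimOp m where
  -- ...
  Evaluate :: a -> (a -> PrimOp m) -> PrimOp m

evaluate :: a -> MiniFu m a
evaluate a = MiniFu (K.cont (Evaluate a))

-- In stepThread, assuming a MonadIO constraint on the base monad, the
-- implementation could force the value with the real Control.Exception
-- machinery and re-raise any exception through our own 'raise':
--
--   go (Evaluate a k) = do
--     ea <- liftIO (E.try (E.evaluate a))
--     pure $ case ea of
--       Right a' -> goto (k a')
--       Left (e :: E.SomeException) -> M.update (raise e) tid
```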
Asynchronous exceptions are like synchronous exceptions, except for two details:

1. They are delivered to a thread identified by its `ThreadId`. We can do this already with `raise`.
2. When a thread is masked, attempting to deliver an asynchronous exception to it will block.

There are three masking states:

- `Unmasked`: asynchronous exceptions are unmasked.
- `MaskedInterruptible`: asynchronous exceptions are masked, but blocked operations may still be interrupted.
- `MaskedUninterruptible`: asynchronous exceptions are masked, and blocked operations may not be interrupted.
type, defaulting to Unmasked
, and also account for blocking on another thread:
```haskell
data Thread m = Thread
  { threadK     :: PrimOp m
  , threadBlock :: Maybe (Either ThreadId MVarId) -- <- new
  , threadExc   :: [Handler m]
  , threadMask  :: E.MaskingState                 -- <- new
  }

thread :: PrimOp m -> Thread m
thread k = Thread
  { threadK     = k
  , threadBlock = Nothing
  , threadExc   = []
  , threadMask  = E.Unmasked -- <- new
  }
```
We’ll also need a primop to set the masking state:
```haskell
data PrimOp m where
  -- ...
  Mask :: E.MaskingState -> PrimOp m -> PrimOp m
```
Which has a fairly straightforward implementation:
```haskell
stepThread {- ... -}
  where
    -- ...
    go (Mask ms k) = simple . adjust $ \thrd -> thrd
      { threadK    = k
      , threadMask = ms
      }
```
Finally, we need to make sure that if an exception is raised, and we jump into an exception handler, the masking state gets reset to what it was when the handler was created. This means we need a small change to the `Catch` primop:
```haskell
stepThread {- ... -}
  where
    -- ...
    go (Catch (MiniFu ma) h k) = simple . adjust $ \thrd -> thrd
      { threadK = K.runCont ma (PopH . k)
      , threadExc =
          let ms0 = threadMask thrd                 -- <- new
              h' exc = flip K.runCont k $ do
                K.cont (\c -> Mask ms0 (c ()))      -- <- new
                runMiniFu (h exc)
          in Handler h' : threadExc thrd
      }
```
Alright, now we have enough background to actually implement the user-facing operations.
To throw an asynchronous exception, we’re going to need a new primop:
```haskell
data PrimOp m where
  -- ...
  ThrowTo :: E.Exception e => ThreadId -> e -> PrimOp m -> PrimOp m
```
Which has a corresponding wrapper function:
```haskell
throwTo :: E.Exception e => ThreadId -> e -> MiniFu m ()
throwTo tid e = MiniFu (K.cont (\k -> ThrowTo tid e (k ())))
```
Let’s think about the implementation of the ThrowTo
primop. It first needs to check whether the target thread is interruptible: if so, it raises the exception in that thread; if not, it blocks the current thread. A thread is interruptible if its masking state is Unmasked, or if it is MaskedInterruptible
and currently blocked.
Let’s encapsulate that logic:
import Data.Maybe (isJust) isInterruptible :: Thread m -> Bool isInterruptible thrd = threadMask thrd == E.Unmasked || (threadMask thrd == E.MaskedInterruptible && isJust (threadBlock thrd))
Given that, the implementation of ThrowTo
is straightforward:
stepThread {- ... -} where -- ... go (ThrowTo threadid e k) = simple $ case M.lookup threadid threads of Just t | isInterruptible t -> goto k . M.update (raise e) threadid | otherwise -> block (Left threadid) Nothing -> goto k
First, check if the thread exists. Then check if it’s interruptible: if it is, raise the exception, otherwise block. If the thread doesn’t exist any more, just continue.
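This mirrors what GHC itself does: throwTo (and hence killThread) blocks while the target is uninterruptibly masked. A small demonstration using only base — a sketch independent of MiniFu, where the ready MVar just ensures the target is masked before we try to kill it:

```haskell
import Control.Concurrent
import Control.Exception (uninterruptibleMask_)

main :: IO ()
main = do
  ready <- newEmptyMVar
  done  <- newEmptyMVar
  tid <- forkIO $ uninterruptibleMask_ $ do
    putMVar ready ()
    threadDelay 100000             -- not interruptible under this mask
    putMVar done "finished anyway"
  takeMVar ready                   -- wait until the thread is masked
  killThread tid                   -- blocks until the thread leaves the mask
  takeMVar done >>= putStrLn
```

The exception is only delivered once the masked region ends, so the thread’s work completes and “finished anyway” is printed.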
Now we just need to handle unblocking threads which are blocked in ThrowTo
. For that, we’ll go back to the run
function and add a pass to unblock threads if the current one is interruptible after it processes its action:
run :: C.MonadConc m => Scheduler s -> s -> PrimOp m -> m s run sched s0 = go s0 . initialise where go s (threads, idsrc) | initialThreadId `M.member` threads = case runnable threads of Just tids -> do let (chosen, s') = sched tids s (threads', idsrc') <- stepThread chosen (threads, idsrc) let threads'' = if (isInterruptible <$> M.lookup chosen threads') /= Just False then unblock (Left chosen) threads' else threads' -- ^- new go s' (threads'', idsrc') Nothing -> pure s | otherwise = pure s runnable = nonEmpty . M.keys . M.filter (isNothing . threadBlock) initialThreadId = fst (nextThreadId initialIdSource)
So after stepping a thread, we unblock every thread blocked on it if it either doesn’t exist, or if it does exist and is interruptible. It’s much more robust to do this once here than in every stepThread case which might cause a thread to become interruptible.
There are two operations at the programmer’s disposal to change the masking state of a thread, mask
and uninterruptibleMask
. Here’s what the MiniFu
types will look like:
{-# LANGUAGE RankNTypes #-} mask :: ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a) -> MiniFu m a uninterruptibleMask :: ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a) -> MiniFu m a
Each takes an action to run, and runs it as either MaskedInterruptible
or MaskedUninterruptible
. The action is provided with a polymorphic callback to run a subcomputation with the original masking state.
This is going to need, you guessed it, a new primop! We could modify the Mask
primop to do this job as well, but I think it’s a little clearer to have two separate ones:
data PrimOp m where -- ... InMask :: E.MaskingState -> ((forall x. MiniFu m x -> MiniFu m x) -> MiniFu m a) -> (a -> PrimOp m) -> PrimOp m
And here’s the implementations of our masking functions:
mask ma = MiniFu (K.cont (InMask E.MaskedInterruptible ma)) uninterruptibleMask ma = MiniFu (K.cont (InMask E.MaskedUninterruptible ma))
We can now fulfil another requirement of MonadConc
: a MonadMask
instance!
instance MonadMask (MiniFu m) where mask = -- 'mask' from above uninterruptibleMask = -- 'uninterruptibleMask' from above
The very last piece of the puzzle for exception handling in MiniFu is to implement this InMask
primop. Its type looks quite intense, but the implementation is really not that bad. There are three parts:
stepThread {- ... -} where -- ... go (InMask ms ma k) = simple . adjust $ \thrd -> thrd { threadK = let ms0 = threadMask thrd -- (1) we need to construct the polymorphic argument function umask :: MiniFu m x -> MiniFu m x umask (MiniFu mx) = MiniFu $ do K.cont (\c -> Mask ms0 (c ())) x <- mx K.cont (\c -> Mask ms (c ())) pure x -- (2) we need to run the inner continuation, resetting the masking state -- when done in K.runCont (runMiniFu (ma umask)) (Mask ms0 . k) -- (3) we need to change the masking state , threadMask = ms }
The explicit type signature on umask
is needed because we’re using GADTs
, which implies MonoLocalBinds
, which prevents the polymorphic type from being inferred. We could achieve the same effect by turning on NoMonoLocalBinds
.
Now we have asynchronous exceptions, check it out:
example_async :: MiniFu m String example_async = do a <- newEmptyMVar tid <- fork (putMVar a "hello from the other thread") throwTo tid E.ThreadKilled readMVar a demo_async :: IO () demo_async = do g <- R.newStdGen print . fst =<< minifu randomSched g example_async
See:
λ> demo_async Just "hello from the other thread" λ> demo_async Just "hello from the other thread" λ> demo_async Nothing
We have come to the end of part 2! Again, I hope you enjoyed this post; any feedback is welcome. This is all on GitHub, and you can see the code we ended up with at the “post-02” tag.
Once again, I have some homework for you. Your task, should you choose to accept it, is to implement:
tryPutMVar :: MVar m a -> a -> MiniFu m Bool tryTakeMVar :: MVar m a -> MiniFu m (Maybe a) tryReadMVar :: MVar m a -> MiniFu m (Maybe a)
Solutions will be up in a few days, as before, at the “homework-02” tag.
Stay tuned because next time we’re going to implement STM: all of it in one go. Then we can finally get on to the testing.
Thanks to Will Sewell for reading an earlier draft of this post.
example = do a <- newEmptyMVar forkIO (putMVar a 1) forkIO (putMVar a 2) takeMVar a
Will tell us the possible results of that computation:
λ> test example [1, 2]
We’re going to build this from the ground up, using the concurrency library, as it provides a typeclass abstraction over forking, MVars, STM, and suchlike.
You may have come across my dejafu library before. If not, don’t worry, but you may want to check it out as we’re going to be building something very similar.
Ok, with the preliminaries over, let’s get coding! All the code written in this series is on GitHub, with one tag for each post. The code for this post is under the “post-01” tag.
The goal in this post is to be able to implement a function which can execute simple thread-and-MVar computations (like the example from the beginning) with a stateful scheduler. Firstly, let’s say what we know:
We’re going to use the MonadConc typeclass from concurrency, rather than IO.
We need to be able to control the scheduling of MonadConc computations.

That sounds rather like something based on continuations or a free monad. Furthermore, we’re going to need mutable state to implement all of this, as we’re modelling a DSL with mutable references, and doing that purely is a huge pain.
Let’s write down some types. Because we’re writing a mini-dejafu, I’m calling this project “minifu”. So we want a function:
import qualified Control.Concurrent.Classy as C import Data.List.NonEmpty (NonEmpty(..)) newtype ThreadId = ThreadId Int deriving (Eq, Ord) type Scheduler s = NonEmpty ThreadId -> s -> (ThreadId, s) minifu :: C.MonadConc m => Scheduler s -> s -> MiniFu m a -> m (Maybe a, s)
For some suitable MiniFu
monad transformer. Now we’re going to take the standard way of constructing a free monad, and have a data structure representing our class of interest (MonadConc
), with one constructor for every function. Because we’re only talking about threads and MVars in this post, it will be a fairly small type:
{-# LANGUAGE GADTs #-} data PrimOp m where Fork :: MiniFu m () -> (ThreadId -> PrimOp m) -> PrimOp m NewEmptyMVar :: (MVar m a -> PrimOp m) -> PrimOp m PutMVar :: MVar m a -> a -> PrimOp m -> PrimOp m TakeMVar :: MVar m a -> (a -> PrimOp m) -> PrimOp m Stop :: m () -> PrimOp m newtype MVarId = MVarId Int deriving (Eq, Ord) data MVar m a = MVar { mvarId :: MVarId , mvarRef :: C.CRef m (Maybe a) }
The Stop
action is what is going to let us communicate the final result out of the computation. I’ve also defined an MVar
type. Our MVars are going to be implemented as a CRef
(what concurrency calls an IORef
) holding a maybe value, along with an identifier. These identifiers will come into play when we look at threads blocking.
Given this set up, the MiniFu
type is very simple:
{-# LANGUAGE GeneralizedNewtypeDeriving #-} import qualified Control.Monad.Cont as K newtype MiniFu m a = MiniFu { runMiniFu :: K.Cont (PrimOp m) a } deriving (Functor, Applicative, Monad)
We’re not actually going to write a MonadConc
instance for MiniFu
yet, because there are a bunch of constraints which we can’t really satisfy. But we can still define the functions of interest:
fork :: MiniFu m () -> MiniFu m ThreadId fork ma = MiniFu (K.cont (Fork ma)) newEmptyMVar :: MiniFu m (MVar m a) newEmptyMVar = MiniFu (K.cont NewEmptyMVar) putMVar :: MVar m a -> a -> MiniFu m () putMVar v a = MiniFu (K.cont (\k -> PutMVar v a (k ()))) takeMVar :: MVar m a -> MiniFu m a takeMVar v = MiniFu (K.cont (TakeMVar v))
Hey, not bad! Now we can slap a MiniFu m Int
type signature on our example from the start (and rename the forkIO
calls) and it compiles!
example :: MiniFu m Int example = do a <- newEmptyMVar fork (putMVar a 1) fork (putMVar a 2) takeMVar a
Take a moment to make sure you’re happy with this section before moving on to the next. MiniFu is going to be a layered application: this is the basic layer which defines the functions we can test; the next layer executes a MiniFu computation; the layers above that will implement the systematic testing behaviour.
minifu
Recall the type of minifu
:
minifu :: C.MonadConc m => Scheduler s -> s -> MiniFu m a -> m (Maybe a, s)
So, what does it need to do? It needs to set up the execution environment: in this case that’s specifying that the provided computation is the main thread, and then it needs to repeatedly call the scheduler, executing one PrimOp
of the chosen thread at a time, until either the main thread terminates or everything is blocked.
In the best functional programming practice, minifu
is going to do the minimum it can and call another function to do the rest. So what minifu
is actually going to do is to extract the continuation and set up the mechanism to communicate the final result back:
minifu sched s (MiniFu ma) = do out <- C.newCRef Nothing s' <- run sched s (K.runCont ma (Stop . C.writeCRef out . Just)) a <- C.readCRef out pure (a, s')
Before we move on to the implementation of run
, let’s first look at two concerns we’ll have along the way: getting unique names (for threads and MVars) and representing threads.
Each thread gets a unique ThreadId
, and each MVar gets a unique MVarId
. As these are just an Int
, we can use the same source for both:
type IdSource = Int initialIdSource :: IdSource initialIdSource = 0 nextThreadId :: IdSource -> (ThreadId, IdSource) nextThreadId n = (ThreadId n, n + 1) nextMVarId :: IdSource -> (MVarId, IdSource) nextMVarId n = (MVarId n, n + 1)
This is as simple as it gets, but it’s good enough for now.
What is a thread? Well, it has a continuation, which is some value of type PrimOp m
, and it might be blocked. We want to know if a thread is blocked for two reasons: we don’t want the scheduler to schedule a blocked thread, and we want to be able to tell if the computation is deadlocked. Threads can only block on reading from or writing to MVars (currently), so let’s use a Maybe MVarId
to indicate whether the thread is blocked:
data Thread m = Thread { threadK :: PrimOp m , threadBlock :: Maybe MVarId }
When we create a thread, it’s initially unblocked:
thread :: PrimOp m -> Thread m thread k = Thread { threadK = k , threadBlock = Nothing }
And finally we need a way to construct our initial collection of threads:
import Data.Map (Map) import qualified Data.Map as M type Threads m = Map ThreadId (Thread m) initialise :: PrimOp m -> (Threads m, IdSource) initialise k = let (tid, idsrc) = nextThreadId initialIdSource in (M.singleton tid (thread k), idsrc)
And now back to the implementation of minifu
.
run
The run
function is responsible for taking the first continuation, creating the collection of threads, and repeatedly calling the scheduler and stepping the chosen thread, until the computation is done.
It has this type:
run :: C.MonadConc m => Scheduler s -> s -> PrimOp m -> m s
As with minifu
, we shall keep it simple, and delegate most of the work to yet another function:
import Data.List.NonEmpty (nonEmpty) import Data.Maybe (isNothing) run sched s0 = go s0 . initialise where go s (threads, ids) | initialThreadId `M.member` threads = case runnable threads of Just tids -> let (chosen, s') = sched tids s in go s' =<< stepThread chosen (threads, ids) Nothing -> pure s | otherwise = pure s runnable = nonEmpty . M.keys . M.filter (isNothing . threadBlock) initialThreadId = fst (nextThreadId initialIdSource)
Let’s break down that go
function a bit:
While the initial thread is alive, it finds the runnable threads, asks the scheduler to choose one, and calls the stepThread function to execute one step of that thread; otherwise, it returns the final scheduler state.

Not too bad, hey? Finally (really finally) we just have one function to go, stepThread
. Can you see what the type will be?
It’s going to start like this:
stepThread :: C.MonadConc m => ThreadId -> (Threads m, IdSource) -> m (Threads m, IdSource) stepThread tid (threads, idsrc) = case M.lookup tid threads of Just thrd -> go (threadK thrd) Nothing -> pure (threads, idsrc) where adjust :: (Thread m -> Thread m) -> Threads m -> Threads m adjust f = M.adjust f tid goto :: PrimOp m -> Threads m -> Threads m goto k = adjust (\thrd -> thrd { threadK = k }) block :: Maybe MVarId -> Threads m -> Threads m block mv = adjust (\thrd -> thrd { threadBlock = mv }) unblock :: MVarId -> Threads m -> Threads m unblock v = fmap (\thrd -> if threadBlock thrd == Just v then thrd { threadBlock = Nothing } else thrd) go :: PrimOp m -> m (Threads m, IdSource) -- go ...
I’ve introduced a few helper functions, which will crop up a lot. That go
function will have a case for every constructor of PrimOp m
, and it’s going to look a bit hairy, so we’ll take it one constructor at a time. Let’s do the constructors in order.
First, we can fork threads:
go (Fork (MiniFu ma) k) = let (tid', idsrc') = nextThreadId idsrc thrd' = thread (K.runCont ma (\_ -> Stop (pure ()))) in pure (goto (k tid') (M.insert tid' thrd' threads), idsrc')
Forking is pretty straightforward. We simply get the next available ThreadId
from the IdSource
, create a thread with the provided continuation, and insert it into the Threads m
map.
Next up is NewEmptyMVar
:
go (NewEmptyMVar k) = do ref <- C.newCRef Nothing let (mvid, idsrc') = nextMVarId idsrc pure (goto (k (MVar mvid ref)) threads, idsrc')
Remember that we’re implementing our MVar
type using the CRef
type of the underlying MonadConc
. As the MVar
starts out empty, the CRef
starts out holding Nothing
.
The PutMVar
and TakeMVar
actions are almost the same, so let’s tackle them together:
go (PutMVar (MVar mvid ref) a k) = do old <- C.readCRef ref case old of Just _ -> pure (block (Just mvid) threads, idsrc) Nothing -> do C.writeCRef ref (Just a) pure (goto k (unblock mvid threads), idsrc) go (TakeMVar (MVar mvid ref) k) = do old <- C.readCRef ref case old of Just a -> do C.writeCRef ref Nothing pure (goto (k a) (unblock mvid threads), idsrc) Nothing -> pure (block (Just mvid) threads, idsrc)
In both cases, we start out by reading the value of the reference. Remember that Nothing
indicates emptiness, and Just
indicates the presence of a value. So, for PutMVar
if there already is a value (and for TakeMVar
if there isn’t a value), we block the thread. In the other case, we update the value in the reference, putting in the new value (or taking out the old), unblock all the relevant threads, and go to the continuation.
These implementations are not atomic. But that’s fine: despite MiniFu testing concurrent programs, there’s no concurrency going on within MiniFu itself. We can do as much or as little as we want during one atomic “step” of our program. This will turn out to be very useful when we implement STM in a few posts’ time.
Finally, we have Stop
:
go (Stop mx) = do mx pure (M.delete tid threads, idsrc)
And we’re done! That’s it! All we need now is a scheduler, and we can execute our example!
Our example is nondeterministic, so we want a scheduler which will let us see that. It would be no good us implementing something which always made the same decisions, as we’d only see one result! So until we implement the systematic testing behaviour, let’s just use a simple random scheduler.
import qualified System.Random as R randomSched :: R.RandomGen g => Scheduler g randomSched (t:|ts) g = let (i, g') = R.randomR (0, length ts) g in ((t:ts) !! i, g')
There’s no deep magic here, we’re just picking a random value from a nonempty list. Finally, we can construct a little demo:
demo :: IO () demo = do g <- R.newStdGen print . fst =<< minifu randomSched g example
Which we can run in ghci like so:
λ> demo Just 1 λ> demo Just 1 λ> demo Just 1 λ> demo Just 2 λ> demo Just 1
Success!
A random scheduler is fine for demonstration purposes, but not so great for testing. Different seeds may lead to the same execution, which makes it hard to know how many executions of a test is enough. It can be a useful technique, but for us this is only the beginning.
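For contrast with random scheduling, a deterministic scheduler fits the same Scheduler type. Here is a sketch of a round-robin scheduler, with the ThreadId and Scheduler types repeated so the snippet stands alone (it assumes the runnable list is sorted, which M.keys guarantees):

```haskell
import Data.List.NonEmpty (NonEmpty(..), toList)

newtype ThreadId = ThreadId Int deriving (Eq, Ord, Show)
type Scheduler s = NonEmpty ThreadId -> s -> (ThreadId, s)

-- Remember the last thread we ran; pick the next runnable thread after it,
-- wrapping around to the start when we run off the end.
roundRobin :: Scheduler (Maybe ThreadId)
roundRobin ts@(t0 :| _) lastTid =
  let next = case lastTid of
        Nothing -> t0
        Just t  -> case dropWhile (<= t) (toList ts) of
          (t' : _) -> t'
          []       -> t0
  in (next, Just next)

main :: IO ()
main = do
  let ts = ThreadId 0 :| [ThreadId 1, ThreadId 2]
      (a, s1) = roundRobin ts Nothing
      (b, s2) = roundRobin ts s1
      (c, s3) = roundRobin ts s2
      (d, _)  = roundRobin ts s3
  print [a, b, c, d]
```

A scheduler like this always makes the same decisions, so it only ever explores one interleaving — exactly why it is no good for finding nondeterminism, and why the systematic testing layer is needed.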
Next time we’ll look at implementing exceptions, both synchronous and asynchronous.
I hope you enjoyed this post; any feedback is welcome. As I mentioned at the start, this is on GitHub; you can get the code we ended up with at the “post-01” tag.
Before next time, I have some homework for you! You have seen how to implement MVars, so now try implementing CRefs! Here are the functions you should have a go at writing:
data CRef m a = -- ... newCRef :: a -> MiniFu m (CRef m a) readCRef :: CRef m a -> MiniFu m a writeCRef :: CRef m a -> a -> MiniFu m () atomicModifyCRef :: CRef m a -> (a -> (a, b)) -> MiniFu m b
Don’t worry about any of the relaxed memory stuff implemented in dejafu, just do sequential consistency (and if you don’t know what that means: it means to do the obvious). I’ll put up a solution (and maybe do a little refactoring) before the next post.
Thanks to José Manuel Calderón Trilla for reading an earlier draft of this post.
We have a typeclass abstracting over concurrency, with an instance for IO and a testing implementation, ConcT.
Computations of type MonadRef r n => ConcT r n a can be executed with a given scheduler, to produce a result and an execution trace.
Unlike IO, the threads in a ConcT r n computation are executed in a single-step fashion based on the decisions of the scheduler.
To implement this single-step execution, all threads are executed in a single “real” thread.
To implement this single-step execution, all threads are executed in a single “real” thread.
It’s the third point that gives dejafu the ability to systematically explore different executions. If execution were not single-step, then it wouldn’t be possible in general to context switch between arbitrary concurrency actions.
The fourth point greatly simplifies the implementation, but also causes problems: GHC Haskell has a notion of “bound threads”, which are Haskell threads bound to a particular OS thread. Bound threads are absolutely essential to use FFI calls which rely on thread-local state. Deja Fu cannot support bound threads if it executes everything in a single thread!
How can we address this?
PULSE is a concurrency-testing tool for Erlang. It works by code instrumentation: around every communication operation is inserted a call to the PULSE scheduler process. The scheduler process tells processes when they can run. Execution is not serialised into a single thread, the distinct Erlang processes still exist, but only one of them may run at a time.
We can do the same thing in Haskell.
Let’s look at a much simplified version of dejafu to try this idea out.
{-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE ExistentialQuantification #-} {-# LANGUAGE TypeFamilies #-} import qualified Control.Concurrent as C import qualified Control.Monad.Cont as K import Data.List.NonEmpty (NonEmpty(..), nonEmpty) import qualified Data.Map as M import Data.Maybe (isNothing) import qualified System.Random as R class Monad m => MonadConc m where type ThreadId m :: * type MVar m :: * -> * fork :: m () -> m (ThreadId m) forkOS :: m () -> m (ThreadId m) newEmptyMVar :: m (MVar m a) putMVar :: MVar m a -> a -> m () takeMVar :: MVar m a -> m a newMVar :: MonadConc m => a -> m (MVar m a) newMVar a = do v <- newEmptyMVar putMVar v a pure v
There’s a straightforward implementation for IO
:
instance MonadConc IO where type ThreadId IO = C.ThreadId type MVar IO = C.MVar fork = C.forkIO forkOS = C.forkOS newEmptyMVar = C.newEmptyMVar putMVar = C.putMVar takeMVar = C.takeMVar
The testing implementation is a little hairier. Because we want to be able to single-step it, we’ll use continuations:
newtype ConcT m a = ConcT { runConcT :: K.Cont (Action m) a } deriving (Functor, Applicative, Monad) newtype CTThreadId = CTThreadId Int deriving (Eq, Ord) data CTMVar m a = CTMVar { mvarID :: Int, mvarRef :: MVar m (Maybe a) } data Action m = Fork (ConcT m ()) (CTThreadId -> Action m) | ForkOS (ConcT m ()) (CTThreadId -> Action m) | forall a. NewEmptyMVar (CTMVar m a -> Action m) | forall a. PutMVar (CTMVar m a) a (Action m) | forall a. TakeMVar (CTMVar m a) (a -> Action m) | Stop (m ()) instance MonadConc (ConcT m) where type ThreadId (ConcT m) = CTThreadId type MVar (ConcT m) = CTMVar m fork ma = ConcT (K.cont (Fork ma)) forkOS ma = ConcT (K.cont (ForkOS ma)) newEmptyMVar = ConcT (K.cont NewEmptyMVar) putMVar mvar a = ConcT (K.cont (\k -> PutMVar mvar a (k ()))) takeMVar mvar = ConcT (K.cont (TakeMVar mvar))
Let’s talk about the Action
type a bit before moving on. The general structure is Name [<args> ...] (<result> -> Action m)
, where m
is some MonadConc
. For MVar
s, we’re just re-using the MVar
type of the underlying monad (dejafu proper re-uses the IORef
s of the underlying monad). For ThreadId
s we’re using Int
s. And we’re going to get the final result out of the computation with the Stop
action.
Let’s keep things simple and not support most of the fancy scheduling stuff dejafu does. Our scheduler is just going to be a stateful function from runnable threads to a single thread:
type Scheduler s = NonEmpty CTThreadId -> s -> (CTThreadId, s)
So now our execution function is going to look like this:
minifu :: MonadConc m => Scheduler s -> s -> ConcT m a -> m (Maybe a, s) minifu sched s (ConcT ma) = do out <- newMVar Nothing s' <- run sched s (K.runCont ma (\a -> Stop (takeMVar out >> putMVar out (Just a)))) a <- takeMVar out pure (a, s')
The real meat is the run
function:
run :: MonadConc m => Scheduler s -> s -> Action m -> m s run sched s0 a0 = go s0 initialIdSource =<< initialThreads a0 where go s ids threads | initialThreadId `M.member` threads = case runnable threads of Just tids -> let (chosen, s') = sched tids s in uncurry (go s') =<< loopStepThread ids chosen threads Nothing -> pure s | otherwise = pure s runnable = nonEmpty . M.keys . M.filter (isNothing . blockedOn)
Like in dejafu proper, execution is going to stop if the main thread terminates, even if there are other threads. Threads are going to live in a map keyed by ThreadId
.
type Threads m = M.Map CTThreadId (Thread m) initialThreads :: MonadConc m => Action m -> m (Threads m) initialThreads a0 = do t <- forkThread False a0 pure (M.singleton initialThreadId t) initialThreadId :: CTThreadId initialThreadId = CTThreadId 0
Each thread in our program-under-test is going to be executed in an actual thread. So, like PULSE, we’ll introduce communication (in the form of MVar
s) around concurrency actions to ensure that we get single-step execution. So a thread is going to have three components: the MVar
(if any) it’s currently blocked on, an MVar
to signal that it should execute one step, and an MVar
to communicate what the thread did.
data Thread m = Thread { blockedOn :: Maybe Int , signalStep :: MVar m IdSource , awaitResult :: MVar m (IdSource, ThreadResult m) } data ThreadResult m = BusinessAsUsual | Killed | Updated Int | Blocked Int | Forked (Thread m)
The IdSource
is used to generate new unique thread and MVar
IDs:
type IdSource = (Int, Int) initialIdSource :: IdSource initialIdSource = (1, 0) nextThreadId :: IdSource -> (CTThreadId, IdSource) nextThreadId (t, m) = (CTThreadId t, (t + 1, m)) nextMVarId :: IdSource -> (Int, IdSource) nextMVarId (t, m) = (m, (t, m + 1))
Forking a thread is going to set up these MVar
s and the small bit of logic to ensure things happen as we like:
forkThread :: MonadConc m => Bool -> Action m -> m (Thread m) forkThread isOS act = do signal <- newEmptyMVar await <- newEmptyMVar _ <- (if isOS then forkOS else fork) (runThread signal await act) pure (Thread Nothing signal await) runThread :: MonadConc m => MVar m IdSource -> MVar m (IdSource, ThreadResult m) -> Action m -> m () runThread signal await = go where go act = do ids <- takeMVar signal (act', ids', res) <- runStepThread ids act putMVar await (ids', res) maybe (pure ()) go act'
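The core of this handshake — a controller granting one step at a time, and the worker reporting back — can be sketched with base MVars alone (an illustration of the pattern, not MiniFu code):

```haskell
import Control.Concurrent

main :: IO ()
main = do
  signal <- newEmptyMVar
  await  <- newEmptyMVar
  -- The "worker" blocks until told to step, does one unit of work,
  -- reports the result, and waits again.
  _ <- forkIO $ let go = do
                      n <- takeMVar signal
                      putMVar await (n * 2)
                      go
                in go
  -- The "controller" grants exactly one step at a time.
  mapM_ (\n -> putMVar signal n >> takeMVar await >>= print) [1 :: Int, 2, 3]
```

Because the controller only fills signal after draining await, at most one worker step is ever in flight — the same serialisation that lets the scheduler stay in charge even though each thread is a real thread.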
The final pieces of the puzzle are the two *StepThread
functions, which executes one action of our chosen thread. These are a little tricker than in normal dejafu.
Firstly, loopStepThread
, which tells the thread that was chosen by the scheduler to step:
loopStepThread :: MonadConc m => IdSource -> CTThreadId -> Threads m -> m (IdSource, Threads m) loopStepThread ids tid threads = case M.lookup tid threads of Just thread -> do putMVar (signalStep thread) ids (ids', res) <- takeMVar (awaitResult thread) let resf = case res of BusinessAsUsual -> id Killed -> M.delete tid Updated i -> fmap (\t -> if blockedOn t == Just i then t { blockedOn = Nothing } else t) Blocked i -> M.insert tid (thread { blockedOn = Just i }) Forked thread' -> M.insert (fst (nextThreadId ids)) thread' pure (ids', resf threads) Nothing -> pure (ids, threads)
Finally runStepThread
, which executes an action:
runStepThread :: MonadConc m => IdSource -> Action m -> m (Maybe (Action m), IdSource, ThreadResult m) runStepThread ids (Fork (ConcT ma) k) = do t <- primFork False ma let (tid', ids') = nextThreadId ids pure (Just (k tid'), ids', Forked t) runStepThread ids (ForkOS (ConcT ma) k) = do t <- primFork True ma let (tid', ids') = nextThreadId ids pure (Just (k tid'), ids', Forked t) runStepThread ids (NewEmptyMVar k) = do v <- newEmptyMVar putMVar v Nothing let (mvid, ids') = nextMVarId ids let mvar = CTMVar mvid v pure (Just (k mvar), ids', BusinessAsUsual) runStepThread ids k0@(PutMVar (CTMVar mvid v) a k) = do old <- takeMVar v case old of Just _ -> putMVar v old >> pure (Just k0, ids, Blocked mvid) Nothing -> putMVar v (Just a) >> pure (Just k, ids, Updated mvid) runStepThread ids k0@(TakeMVar (CTMVar mvid v) k) = do old <- takeMVar v case old of Nothing -> putMVar v old >> pure (Just k0, ids, Blocked mvid) Just a -> putMVar v Nothing >> pure (Just (k a), ids, Updated mvid) runStepThread ids (Stop ma) = do ma pure (Nothing, ids, Killed) primFork :: MonadConc m => Bool -> K.Cont (Action m) () -> m (Thread m) primFork isOS ma = forkThread isOS (K.runCont ma (\_ -> Stop (pure ())))
This looks pretty horrible, but each case is fairly small, so just look at those.
Now we can run it (with a random scheduler for fun) and see that it works:
test :: MonadConc m => m Int test = do a <- newEmptyMVar b <- newMVar 2 c <- newMVar 3 forkOS (putMVar a b) forkOS (putMVar a c) forkOS (takeMVar b >> putMVar b 14) forkOS (takeMVar c >> putMVar c 15) takeMVar =<< takeMVar a randomSched :: Scheduler R.StdGen randomSched (t:|ts) g = let (i, g') = R.randomR (0, length ts) g in ((t:ts) !! i, g') main :: IO () main = do g <- R.newStdGen print . fst =<< minifu randomSched g test
Giving:
λ> main Just 14 λ> main Just 2 λ> main Just 14 λ> main Just 2 λ> main Just 14 λ> main Just 14 λ> main Just 15 λ> main Just 15
That wasn’t so bad!
Mini Fu is much smaller than Deja Fu, but it demonstrates the key concepts. To get a multithreaded runtime into dejafu, I think the main change to this stuff is to figure out how thread communication is going to work: in dejafu proper, actions can change the continuation of an arbitrary thread (eg, throwing an exception to a thread will call its exception handler).
The overhead of this method compared to the single-threaded approach must be measured. It would be great to support bound threads, but not at the cost of everything else becoming much worse! If the overhead is bad, perhaps a hybrid approach could be used: unbound threads in the program-under-test are executed as they are currently, whereas bound threads get the fancy multithreaded implementation. It would complicate things, but possibly eliminate the overhead in the common case.
Finally, when the main thread terminates, any still-running ones should terminate as well, so the Thread
record will need to contain the ThreadId m
of the underlying monad, so killThread
can be used.
dejafu leans more towards correctness than performance, by default. Your test cases will be executed using the Test.DejaFu.SCT.sctBounded
function, which is complete but can be slow; every result you get will have an associated trace, which can be useful for debugging, but takes up memory.
dejafu-0.7.1.0 gives you an extra knob to tweak, and 0.7.1.1 makes it even better.
Full-size images: before, after.
Test cases with long traces have been a particularly bad case, as all the traces stuck around in memory until you did something with them at the end (like print the bad ones). This is such a case:
contendedMVar :: MonadConc m => m () contendedMVar = do threadId <- myThreadId mvar <- newEmptyMVar let maxval = 150 let go = takeMVar mvar >>= \x -> if x == maxval then killThread threadId else go for_ [1..20] . const $ fork go fork $ for_ [1..maxval] (putMVar mvar) takeMVar =<< newEmptyMVar
I ran that 100 times with random scheduling, and the traces varied from about 2500 to 3000 elements long. That’s a lot of stuff to keep around in memory!
Sometimes you don’t want all the results or traces of your test case, you only want some of them. Now you can tell dejafu to throw things away as it’s running, allowing garbage collection to kick in sooner, and reduce the resident memory usage.
There’s a new type and some new functions:
module Test.DejaFu.SCT where -- ... -- | An @Either Failure a -> Maybe Discard@ value can be used to -- selectively discard results. -- -- @since 0.7.1.0 data Discard = DiscardTrace -- ^ Discard the trace but keep the result. The result will appear -- to have an empty trace. | DiscardResultAndTrace -- ^ Discard the result and the trace. It will simply not be -- reported as a possible behaviour of the program. deriving (Eq, Show, Read, Ord, Enum, Bounded) -- | A variant of 'runSCT' which can selectively discard results. -- -- @since 0.7.1.0 runSCTDiscard :: MonadRef r n => (Either Failure a -> Maybe Discard) -- ^ Selectively discard results. -> Way -- ^ How to run the concurrent program. -> MemType -- ^ The memory model to use for non-synchronised @CRef@ operations. -> ConcT r n a -- ^ The computation to run many times. -> n [(Either Failure a, Trace)] -- and: runSCTDiscard', resultsSetDiscard, resultsSetDiscard', sctBoundDiscard, -- sctUniformRandomDiscard, sctWeightedRandomDiscard -- and: dejafuDiscard, dejafuDiscardIO (Test.DejaFu) -- and: testDejafuDiscard, testDejafuDiscardIO (Test.{HUnit,Tasty}.DejaFu)
Every iteration of the SCT loop, an Either Failure a
value is produced. The *Discard
function variants will throw it (or its trace) away if you tell them to.
For example, you can now check that a test case doesn’t deadlock in a far more memory-efficient way like so:
dejafuDiscard -- "efa" == "either failure a", discard everything but deadlocks (\efa -> if efa == Left Deadlock then Nothing else Just DiscardResultAndTrace) -- try 1000 executions with random scheduling (randomly (mkStdGen 42) 1000) -- use the default memory model defaultMemType -- your test case testCase -- the predicate to check (which is a bit redundant in this case) ("Never Deadlocks", deadlocksNever)
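The discard predicate from that example can be written as a small standalone function. Here it is with stand-in Failure and Discard types, so the snippet compiles without dejafu itself:

```haskell
-- Stand-ins for dejafu's types, just for this sketch.
data Failure = Deadlock | UncaughtException deriving (Eq, Show)
data Discard = DiscardTrace | DiscardResultAndTrace deriving (Eq, Show)

-- Keep deadlocks (and their traces); discard every other result entirely.
keepDeadlocks :: Either Failure a -> Maybe Discard
keepDeadlocks (Left Deadlock) = Nothing
keepDeadlocks _               = Just DiscardResultAndTrace

main :: IO ()
main = do
  print (keepDeadlocks (Left Deadlock :: Either Failure ()))
  print (keepDeadlocks (Right ()      :: Either Failure ()))
```

Returning Nothing means “keep everything about this result”, so only the deadlocking executions (and their traces) survive to the end of the run.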
Full-size images: before, after.
Unfortunately, 0.7.1.0 was only a win for random testing, as systematic testing explicitly constructed the tree of executions in memory. This has been a long-standing issue with dejafu, but I’d never gotten around to solving it before, because it wasn’t really any worse than what was happening elsewhere in the codebase. But now it was the worst!
The solution, in principle, was simple: you can avoid constructing the complete tree by instead exploring schedules in a depth-first fashion, which means you only need a stack and some bookkeeping information.
The implementation was fairly simple too! I like simple things.
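In outline, the depth-first approach can be sketched in plain Haskell. This is an illustrative sketch, not dejafu's actual internals: explore, SchedulePrefix, and the shape of the run function are all made-up names. The idea is to keep a stack of unexplored schedule prefixes, run one execution per prefix, and push any newly discovered alternatives back onto the stack.

```haskell
-- A minimal sketch of depth-first schedule exploration.  A "schedule
-- prefix" says which thread to run at each step; the run function
-- executes the program following the prefix (scheduling freely after
-- it runs out), returning the result and any alternative prefixes
-- discovered along the way.
type SchedulePrefix = [Int]

explore
  :: (SchedulePrefix -> (result, [SchedulePrefix]))
  -- ^ Run one execution, reporting alternatives still to try.
  -> [result]
explore run = go [[]]  -- start from the empty prefix
  where
    -- The stack of unexplored prefixes is the only state we need.
    go [] = []
    go (prefix:stack) =
      let (res, alternatives) = run prefix
      -- Pushing alternatives onto the front makes this depth-first.
      in res : go (alternatives ++ stack)
```

Because alternatives are pushed rather than expanded eagerly, memory use is bounded by the stack of pending prefixes plus some bookkeeping, not by the size of the whole tree of executions.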
So now we can check every possible execution of our test case for deadlocks, still in a memory-efficient fashion:
```haskell
dejafuDiscard
  (\efa -> if efa == Left Deadlock then Nothing else Just DiscardResultAndTrace)
  -- the default way is systematic testing
  defaultWay
  defaultMemType
  testCase
  ("Never Deadlocks", deadlocksNever)
```
It’s not as memory-efficient as random scheduling, as it needs to keep around some information about prior executions, but the amount it is keeping around is greatly reduced from before.
What’s next? I don’t really know. There are still a lot of memory inefficiencies in dejafu, but they all pale in comparison to these two, so they can probably sit for a while longer. I’d like to build a suite of benchmarks, because I don’t really have any other than the test suite (which makes a poor benchmark). If you have any test cases which dejafu just can’t handle, let me know!
I think it’s fair to say that the frontiers of what dejafu is capable of have been pushed back a long way by these changes.
We can check if two expressions have equivalent behaviours, or if one has fewer behaviours than the other. Such properties can serve both as documentation and as regression tests.
Let’s dive straight into an example:
Is readMVar equivalent to a takeMVar followed by a putMVar?
We might phrase this property like so:
```haskell
prop_read_equiv_take_put =
  sig readMVar `equivalentTo` sig (\v -> takeMVar v >>= putMVar v)
```
The property-testing machinery uses signatures, where a signature tells dejafu how to (1) create a new state; (2) make some observation of the state; (3) concurrently interfere with the state in some way; and (4) evaluate the expression of interest.
Properties are monomorphic, so we can’t directly express a property about any MVar: we need to pick a concrete type for its contents. Let’s just pick Int:
type State = MVar ConcIO Int
Properties operate in the ConcIO monad. There is no option to use ConcST yet, as I couldn’t get a nice interface working which didn’t break type inference in GHCi.
The state is constructed using a pure seed value the property-checker generates. We want to consider both full and empty MVars, so we’ll ask it to supply a Maybe Int:
type Seed = Maybe Int
The initialisation function we will include in the signature then just calls newMVar or newEmptyMVar as appropriate:
```haskell
makeMVar :: Seed -> ConcIO State
makeMVar (Just x) = newMVar x
makeMVar Nothing  = newEmptyMVar
```
Seed values are generated using LeanCheck, an enumerative property-based testing library.
We want to know if the MVar contains a value when we observe it, and we want to know what that value is: another Maybe:
type Observation = Maybe Int
It is important that the observation function does not block, so we use tryReadMVar here rather than readMVar or takeMVar:
```haskell
observeMVar :: State -> Seed -> ConcIO Observation
observeMVar v _ = tryReadMVar v
```
It does not matter if making the observation has side-effects, so tryTakeMVar would have been equally valid.
Our interference function will just mess with the value in the MVar:
```haskell
interfereMVar :: State -> Seed -> ConcIO ()
interfereMVar mvar mx = do
  _ <- tryTakeMVar mvar
  void . tryPutMVar mvar $ case mx of
    Just x  -> (x + 1) * 3000
    Nothing -> 7000
```
As LeanCheck is enumerative, large values like 3000 and 7000 will stand out if the tool reports a failure.
Now we package these operations up into a signature:
```haskell
sig :: (State -> ConcIO a) -> Sig State Observation Seed
sig e = Sig
  { initialise = makeMVar
  , observe    = observeMVar
  , interfere  = interfereMVar
  , expression = void . e
  }
```
We could, of course, have defined all this inside sig without the top-level functions and type synonyms.
Now we can test the property:
```
> check $ sig readMVar `equivalentTo` sig (\v -> takeMVar v >>= putMVar v)

*** Failure: (seed Just 0)
    left:  [(Nothing,Just 3000)]
    right: [(Nothing,Just 0),(Nothing,Just 3000),(Just Deadlock,Just 3000)]
```
We get a failure! This is because the left term is atomic, whereas the right is not: another thread writing to the MVar has the opportunity to swoop in and insert a new value after the takeMVar but before the putMVar. The right has strictly more behaviours than the left.
We can capture this by using a different comparison:
```
> check $ sig readMVar `strictlyRefines` sig (\v -> takeMVar v >>= putMVar v)
+++ OK
```
To “strictly refine” something is to have a proper subset of the behaviour. There is also a “refines” comparison, which does not require the subset to be proper.
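Since refines does not require the subset to be proper, any expression refines itself: its set of behaviours is trivially a (non-proper) subset of its own. A small sketch, reusing the sig defined above:

```haskell
-- `refines` only requires a subset of behaviours, not a proper one,
-- so an expression trivially refines itself.  Uses the `sig` defined
-- earlier in the post.
prop_refl = sig readMVar `refines` sig readMVar
```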
Doesn’t readMVar v return a different thing to takeMVar v >>= putMVar v?

Yes!

If they return at all, the former returns the value in the MVar, whereas the latter returns unit. Properties do not care about the return value of an expression, only its effects. You can see this by looking at the definition of sig again: it throws away the result of the expression using void.
Both of our properties are of the form sig f `cmp` sig g; can’t that redundancy be removed?

No!

You can use different signatures with different state types! As long as the seed and observation types are the same, check can compare them. You can use this to compare different implementations of a similar concurrent data structure.
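As an illustrative sketch of that (this example is mine, not from the original post): here are two signatures with different state types, one backed by an MVar and one by a CRef used as a non-blocking cell. The seed and observation types are both Maybe Int, so the two are comparable.

```haskell
-- Two signatures over different state types but the same seed and
-- observation types (both Maybe Int).  The interference functions
-- here are arbitrary illustrative choices.
sigMVar :: (MVar ConcIO Int -> ConcIO a)
        -> Sig (MVar ConcIO Int) (Maybe Int) (Maybe Int)
sigMVar e = Sig
  { initialise = maybe newEmptyMVar newMVar
  , observe    = \v _ -> tryReadMVar v
  , interfere  = \v _ -> void (tryPutMVar v 42)
  , expression = void . e
  }

sigCRef :: (CRef ConcIO (Maybe Int) -> ConcIO a)
        -> Sig (CRef ConcIO (Maybe Int)) (Maybe Int) (Maybe Int)
sigCRef e = Sig
  { initialise = newCRef
  , observe    = \r _ -> readCRef r
  , interfere  = \r _ -> writeCRef r (Just 42)
  , expression = void . e
  }
```

A comparison like check $ sigMVar (void . tryTakeMVar) `refines` sigCRef (\r -> writeCRef r Nothing) then typechecks, because only the seed and observation types need to agree; whether it holds depends on the interference and observation functions you chose.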
Properties can also have arguments, using LeanCheck to generate their values. This doesn’t work in any mysterious way; here’s a property about the QSemN functions:
```
> check $ \x y -> sig' (\v -> signalQSemN v (x + y)) `equivalentTo` sig' (\v -> signalQSemN v x >> signalQSemN v y)

*** Failure: -1 1 (seed 0)
    left:  [(Nothing,0)]
    right: [(Nothing,0),(Just Deadlock,0)]
```
You can even use your own types, as long as they have a Listable (the typeclass LeanCheck uses) instance:
```
> :{
newtype Nat n = Nat n deriving Show

instance (Num n, Ord n, Listable n) => Listable (Nat n) where
  list = [Nat n | n <- list, n >= 0]
:}

> check $ \(Nat x) (Nat y) -> sig' (\v -> signalQSemN v (x + y)) `equivalentTo` sig' (\v -> signalQSemN v x >> signalQSemN v y)
+++ OK!
```
Currently it’s a bit slow, as I need to fiddle with the implementation and work out what a good number of tests to run is. check uses 10 seed values with 100 variable assignments each (1000 tests total); you can use checkFor to reduce that.
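For example, a quicker but less thorough run might look like this. This is a sketch: I’m assuming checkFor takes the number of seed values first and the number of variable assignments second, matching the description of check above.

```haskell
-- A cheaper run: 5 seed values with 25 variable assignments each
-- (125 tests total).  Argument order (seeds, then assignments) is
-- assumed from the description of `check` above.
quickProp :: IO Bool
quickProp = checkFor 5 25 $
  sig readMVar `strictlyRefines` sig (\v -> takeMVar v >>= putMVar v)
```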
So there you have it, property-testing for side-effects of stateful operations.
This has come out of my work on CoCo, a tool for automatically discovering these properties (paper here). In the CoCo repository are a few more examples. CoCo is still a work-in-progress, but one of the goals is to be able to generate dejafu-compatible output, so that CoCo can discover properties which dejafu can immediately begin using for regression testing.