Déjà Fu is a concurrency testing library, and one thing you definitely don’t want to do when testing concurrent programs is to try every possible interleaving of threads.
Trying every possible interleaving will give you, in general, an exponential blow-up of the executions you need to perform as your test case grows in size. The core testing algorithm we use, a variant of dynamic partial-order reduction (DPOR)1, attempts to reduce this blow-up. DPOR identifies actions which are dependent, and only tries interleavings which permute dependent actions.
Here are some examples:
It doesn’t matter which order two threads execute
readMVar
, for the sameMVar
. These actions are independent.It does matter which order two threads execute
putMVar
, for the sameMVar
. These actions are dependent.It doesn’t matter which order two threads execute
putMVar
for differentMVar
s. These actions are independent.
Two actions are dependent if the order in which they are performed matters.
So the intuition behind DPOR is that most actions in a concurrent program are independent. DPOR won’t help you much if you have a single piece of shared state which every thread is hitting, but most concurrent programs aren’t like that. The worst case is still a terrible exponential blow-up, but the average case is much better.
The dependency relation is core part of Déjà Fu today. It has impacts on both performance and correctness. If it says two actions are dependent when they are not, then we may see unnecessary interleavings tried. If it says two actions are not dependent when they really are, then we may miss necessary interleavings.
Being such an important component, it must be well-tested, right? Well, sort of. The Déjà Fu testsuite mostly consists of small concurrent programs together with a list of expected outputs, testing that Déjà Fu finds all the nondeterminism in the program. This does exercise the dependency relation, but only very indirectly.
The Idea
There things would have remained had I not experienced one of those coincidence-driven flashes of insight:
aherrmann opened an issue on GitHub asking how to take an execution trace and replay it.
agnishom posted a thread on /r/algorithms asking how to check the equivalence of traces where only some elements commute.
I had my idea. I can directly test the dependency relation like so:
- Execute a concurrent program.
- Normalise its execution trace in some way.
- “Replay” the normalised trace.
- Assert that the result is the same.
Normalising Traces
So, what is a good normal form for a trace? I tried out a few approaches here, but there was one I kept coming back to: we should shuffle around independent actions to keep the program on the main thread for as long as possible.
There are two reasons I think this works well. (1) The traces we get will be easier for a human to read, as the program will stay on its main thread and only execute another thread where necessary. (2) A Haskell program terminates when the main thread terminates, so by executing the main thread as much as possible, we may find that some actions don’t need to be executed at all.
So firstly we need to know when two actions commute. Let’s just use the dependency relation for that:
-- | Check if two actions commute.
independent
:: DepState
-> (ThreadId, ThreadAction)
-> (ThreadId, ThreadAction)
-> Bool
independent ds (tid1, ta1) (tid2, ta2) = not (dependent ds tid1 ta1 tid2 ta2)
The DepState
parameter tracks information about the history of the execution, allowing us to make better decisions. For example: while in general it matters in which order two putMVar
s to the same MVar
happen; it doesn’t matter if the MVar
is already full, as both actions will block without achieving anything.
The approach works well in practice, but has been the source of so many off-by-one errors. Even while writing this memo!
So now onto trace normalisation. The easiest way to do it is bubble sort, but with an additional constraint on when we can swap things:
- For every adjacent pair of items
x
andy
in the trace:- If
x
andy
commute andthread_id y < thread_id x
:- Swap
x
andy
.
- Swap
- Update the
DepState
and continue to the next pair.
- If
- Repeat until there are no more changes.
And here’s the code:
-- | Rewrite a trace into a canonical form.
normalise
:: [(ThreadId, ThreadAction)]
-> [(ThreadId, ThreadAction)]
normalise trc0 = if changed then normalise trc' else trc'
where
(changed, trc') = bubble initialDepState False trc0
bubble ds flag ((x@(tid1, _)):(y@(tid2, _)):trc)
| independent ds x y && tid2 < tid1 = go ds True y (x : trc)
| otherwise = go ds flag x (y : trc)
bubble _ flag trc = (flag, trc)
go ds flag t@(tid, ta) trc =
second (t :) (bubble (updateDepState ds tid ta) flag trc)
Testing Normalised Traces
Now we need a scheduler which can play a given list of scheduling decisions. This isn’t built in, but we can make one. Schedulers look like this:
-- from Test.DejaFu.Schedule
newtype Scheduler state = Scheduler
{ scheduleThread
:: Maybe (ThreadId, ThreadAction)
-> NonEmpty (ThreadId, Lookahead)
-> state
-> (Maybe ThreadId, state)
}
A scheduler is a stateful function, which takes the previously scheduled action and the list of runnable threads, and gives back a thread to execute. We don’t care about those parameters. We just want to play a fixed list of scheduling decisions. And here is how we do that:
-- | Execute a concurrent program by playing a list of scheduling decisions.
play
:: MemType
-> [ThreadId]
-> ConcIO a
-> IO (Either Failure a, [ThreadId], Trace)
play = runConcurrent (Scheduler sched)
where
sched _ _ (t:ts) = (Just t, ts)
sched _ _ [] = (Nothing, [])
Now all the background is in place, so we can test what we want to test: that an execution, and the play-back of its normalised trace, give the same result. For reasons which will become apparent in the next section, I’m going to parameterise over the normalisation function:
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and return both results.
runNorm
:: ([(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)])
-> Int
-> MemType
-> ConcIO a
-> IO (Either Failure a, [ThreadId], Either Failure a, [ThreadId])
runNorm norm seed memtype conc = do
let g = mkStdGen seed -- 1
(efa1, _, trc) <- runConcurrent randomSched memtype g conc
let -- 2
trc' = tail
( scanl
(\(t, _) (d, _, a) -> (tidOf t d, a))
(initialThread, undefined)
trc
)
let tids1 = map fst trc'
let tids2 = map fst (norm trc') -- 3
(efa2, s, _) <- play memtype tids2 conc
let truncated = take (length tids2 - length s) tids2 -- 4
pure (efa1, tids1, efa2, truncated)
There’s a lot going on here, so let’s break it down:
We execute the program with the built-in random scheduler, using the provided seed.
The trace that
runConcurrent
gives us is in the form[(Decision, [(ThreadId, Lookahead)], ThreadAction)]
, whereas we want a[(ThreadId, ThreadAction)]
. So this scan just changes the format. It’s a scan rather than a map because to convert aDecision
into aThreadId
potentially requires knowing what the previous thread was.We normalise the trace, and run it again.
If the entire normalised trace wasn’t used up, then it has some unnecessary suffix (because the main thread is now terminating sooner). So we make the normalised trace easier to read by chopping off any such suffix.
Finally, we can write a little function to test using the normalise
function:
-- | Execute a concurrent program with a random scheduler, normalise its trace,
-- execute the normalised trace, and check that both give the same result.
testNormalise
:: (Eq a, Show a)
=> Int
-> MemType
-> ConcIO a
-> IO Bool
testNormalise seed memtype conc = do
(efa1, tids1, efa2, tids2) <- runNorm normalise seed memtype conc
unless (efa1 == efa2) $ do
putStrLn "Mismatched result!"
putStrLn $ " expected: " ++ show efa1
putStrLn $ " but got: " ++ show efa2
putStrLn ""
putStrLn $ "rewritten from: " ++ show tids1
putStrLn $ " to: " ++ show tids2
pure (efa1 == efa2)
And does it work? Let’s copy two example programs from the Test.DejaFu docs:
-- from Test.DejaFu
example1
:: MonadConc m
=> m String
example1 = do
var <- newEmptyMVar
fork (putMVar var "hello")
fork (putMVar var "world")
readMVar var
example2
:: MonadConc m
=> m (Bool, Bool)
example2 = do
r1 <- newCRef False
r2 <- newCRef False
x <- spawn $ writeCRef r1 True >> readCRef r2
y <- spawn $ writeCRef r2 True >> readCRef r1
(,) <$> readMVar x <*> readMVar y
And then test them:
> testNormalise 0 TotalStoreOrder example1
True
> testNormalise 0 TotalStoreOrder example2
True
According to my very unscientific method, everything works perfectly!
Enter Hedgehog
You can probably see where this is going: just supplying one random seed and one memory model is a poor way to test things. Ah, if only we had some sort of tool to generate arbitrary values for us!
But that’s not all: if the dependency relation is correct, then any permutation of independent actions should give the same result, not just the one which normalise
implements. So before we introduce Hedgehog and arbitrary values, let’s make something a little more chaotic:
-- | Shuffle independent actions in a trace according to the given list.
shuffle
:: [Bool]
-> [(ThreadId, ThreadAction)]
-> [(ThreadId, ThreadAction)]
shuffle = go initialDepState
where
go ds (f:fs) (t1:t2:trc)
| independent ds t1 t2 && f = go' ds fs t2 (t1 : trc)
| otherwise = go' ds fs t1 (t2 : trc)
go _ _ trc = trc
go' ds fs t@(tid, ta) trc =
t : go (updateDepState ds tid ta) fs trc
In normalise
, two independent actions will always be re-ordered if it gets us closer to the canonical form. However, in shuffle
, two independent actions will either be re-ordered or not, depending on the supplied list of Bool
.
This is much better for testing our dependency relation, as we can now get far more re-orderings which all should satisfy the same property: that no matter how the independent actions in a trace are shuffled, we get the same result.
I think it’s about time to bring out Hedgehog:
-- | Execute a concurrent program with a random scheduler, arbitrarily permute
-- the independent actions in the trace, and check that we get the same result
-- out.
hog :: (Eq a, Show a) => ConcIO a -> IO Bool
hog conc = Hedgehog.check . property $ do
mem <- forAll Gen.enumBounded -- 1
seed <- forAll $ Gen.int (Range.linear 0 100)
fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
(efa1, tids1, efa2, tids2) <- liftIO -- 2
$ runNorm (shuffle fs) seed mem conc
footnote (" to: " ++ show tids2) -- 3
footnote ("rewritten from: " ++ show tids1)
efa1 === efa2
Let’s break that down:
We’re telling Hedgehog that this property should hold for all memory models, all seeds, and all
Bool
-lists. Unlike most Haskell property-testing libraries, Hedgehog takes generator functions rather than using a typeclass. I think this is nicer.We run our program, normalise it, and get all the results just as before.
We add some footnotes: messages which Hedgehog will display along with a failure. For some reason these get displayed in reverse order.
Alright, let’s see if Hedgehog finds any bugs for us:
> hog example1
? <interactive> failed after 3 tests and 1 shrink.
??? extra.hs ???
82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool
83 ? hog conc = Hedgehog.check . property $ do
84 ? mem <- forAll Gen.enumBounded
? ? SequentialConsistency
85 ? seed <- forAll $ Gen.int (Range.linear 0 100)
? ? 0
86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
? ? [ False , True ]
87 ?
88 ? (efa1, tids1, efa2, tids2) <- liftIO
89 ? $ runNorm (shuffle fs) seed mem conc
90 ? footnote (" to: " ++ show tids2)
91 ? footnote ("rewritten from: " ++ show tids1)
92 ? efa1 === efa2
? ^^^^^^^^^^^^^
? ? Failed (- lhs =/= + rhs)
? ? - Right "hello"
? ? + Left InternalError
rewritten from: [main,main,1,main,1,2,main,2,main]
to: [main,1]
This failure can be reproduced by running:
> recheck (Size 2) (Seed 1824012233418733250 (-4876494268681827407)) <property>
False
It did! And look at that output! Magical! I must see if I can get Déjà Fu to give annotated source output like that.
Let’s look at example1
again:
do
var <- newEmptyMVar
fork (putMVar var "hello")
fork (putMVar var "world")
readMVar var
Oh dear, our rewritten trace is trying to execute thread 1
immediately after the first action of the main thread. The first action of the main thread is newEmptyMVar
: thread 1
doesn’t exist at that point!
Let’s change our independent
function to say that an action is dependent with the fork which creates its thread:
independent ds (tid1, ta1) (tid2, ta2)
| ta1 == Fork tid2 = False
| ta2 == Fork tid1 = False
| otherwise = not (dependent ds tid1 ta1 tid2 ta2)
How about now?
> hog example1
? <interactive> failed after 13 tests and 2 shrinks.
??? extra.hs ???
82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool
83 ? hog conc = Hedgehog.check . property $ do
84 ? mem <- forAll Gen.enumBounded
? ? SequentialConsistency
85 ? seed <- forAll $ Gen.int (Range.linear 0 100)
? ? 0
86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
? ? [ True , True ]
87 ?
88 ? (efa1, tids1, efa2, tids2) <- liftIO
89 ? $ runNorm (shuffle fs) seed mem conc
90 ? footnote (" to: " ++ show tids2)
91 ? footnote ("rewritten from: " ++ show tids1)
92 ? efa1 === efa2
? ^^^^^^^^^^^^^
? ? Failed (- lhs =/= + rhs)
? ? - Right "hello"
? ? + Left InternalError
rewritten from: [main,main,1,main,1,2,main,2,main]
to: [main,1]
This failure can be reproduced by running:
> recheck (Size 12) (Seed 654387260079025817 (-6686572164463137223)) <property>
False
Well, that failing trace looks exactly like the previous error. But the parameters are different: the first error happened with the list [False, True]
, this requires the list [True, True]
. So let’s think about what happens to the trace in this case.
We start with:
[(main, NewEmptyMVar 0), (main, Fork 1), (1, PutMVar 0)]
.The first two actions are independent, and the flag is
True
, so we swap them. We now have:[(main, Fork 1), (main, NewEmptyMVar 1), (1, PutMVar 0)]
.The second two actions are independent, and the flag is
True
, so we swap them. We now have:[(main, Fork 1), (1, PutMVar 0), (main, NewEmptyMVar 0)]
.
We can’t actually re-order actions of the same thread, so we should never have swapped the first two. I suppose there’s another problem here, that no action on an MVar
commutes with creating that MVar
, but we should never be in a situation where that could happen. So we need another case in independent
:
independent ds (tid1, ta1) (tid2, ta2)
| tid1 == tid2 = False
| ta1 == Fork tid2 = False
| ta2 == Fork tid1 = False
| otherwise = not (dependent ds tid1 ta1 tid2 ta2)
Our first example program works fine now:
> hog example1
? <interactive> passed 100 tests.
True
The second is a little less happy:
> hog example2
? <interactive> failed after 48 tests and 9 shrinks.
??? extra.hs ???
82 ? hog :: (Eq a, Show a) => ConcIO a -> IO Bool
83 ? hog conc = Hedgehog.check . property $ do
84 ? mem <- forAll Gen.enumBounded
? ? TotalStoreOrder
85 ? seed <- forAll $ Gen.int (Range.linear 0 100)
? ? 0
86 ? fs <- forAll $ Gen.list (Range.linear 0 100) Gen.bool
? ? [ False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , False
? ? , True
? ? ]
87 ?
88 ? (efa1, tids1, efa2, tids2) <- liftIO
89 ? $ runNorm (shuffle fs) seed mem conc
90 ? footnote (" to: " ++ show tids2)
91 ? footnote ("rewritten from: " ++ show tids1)
92 ? efa1 === efa2
? ^^^^^^^^^^^^^
? ? Failed (- lhs =/= + rhs)
? ? - Right ( False , True )
? ? + Left InternalError
rewritten from: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,main,2,-1,2,2,main,main]
to: [main,main,main,main,main,1,-1,1,1,main,1,main,main,main,2,main,-1]
This failure can be reproduced by running:
> recheck (Size 47) (Seed 2159662051602767058 (-7857629802164753123)) <property>
False
This is a little trickier. Here’s my diagnosis:
It’s an
InternalError
again, which means we’re trying to execute a thread which isn’t runnable.The memory model is
TotalStoreOrder
, and the thread we’re trying to execute is thread-1
, a “fake” thread used in the relaxed memory implementation. So this is a relaxed memory bug.The traces only differ in one place: where
main, 2, -1
is changed to2, main, -1
. So the issue is caused by re-orderingmain
and thread2
.If the
main
action is a memory barrier, then thread-1
will not exist after it.So the
main
action is probably a memory barrier.
Let’s push along those lines and add a case for memory barriers to independent
:
independent ds (tid1, ta1) (tid2, ta2)
| tid1 == tid2 = False
| ta1 == Fork tid2 = False
| ta2 == Fork tid1 = False
| otherwise = case (simplifyAction ta1, simplifyAction ta2) of
(UnsynchronisedWrite _, a) | isBarrier a -> False
(a, UnsynchronisedWrite _) | isBarrier a -> False
_ -> not (dependent ds tid1 ta1 tid2 ta2)
Did we get it?
> hog example2
? <interactive> passed 100 tests.
True
Great!
Bugs?
So, we explored the dependency relation with Hedgehog, and found three missing cases:
Two actions of the same thread are dependent.
Any action of a thread is dependent with the
fork
which creates that thread.Unsynchronised writes are dependent with memory barriers.
But are these bugs? I’m not so sure:
The dependency relation is only ever used to compare different threads.
This is technically correct, but it’s not interesting or useful.
This could be a bug. The relaxed memory implementation is pretty hairy and I’ve had a lot of problems with it in the past. Honestly, I just need to rewrite it (or campaign for Haskell to become sequentially consistent2 and rip it out).
But even if not bugs, these are definitely confusing. The dependency relation is currently just an internal thing, not exposed to users. However, I’m planning to expose a function to normalise traces, in which case providing an independent
function is entirely reasonable.
So even if these changes don’t make it into dependent
, they will be handled by independent
.
Next steps: I’m going to get this into the test suite, to get a large number of extra example programs for free. My hacky and cobbled-together testing framework in dejafu-tests is capable of running every test case with a variety of different schedulers, so I just need to add another way it runs everything. I won’t need to touch the actual tests, just the layer of glue which runs them all, which is nice.
The only problem is that this glue is currently based on HUnit and test-framework, whereas the only integration I can find for Hedgehog is tasty-hedgehog, so I might need to switch to tasty first. As usual, the hardest part is getting different libraries to co-operate!
Hopefully I’ll find some bugs! Well, not exactly hopefully, but you know what I mean.
For all the gory details, see:
Dynamic partial order reduction for relaxed memory models, N. Zhang, M. Kusano, and C. Wang (2015)
Bounded partial-order reduction, K. Coons, M. Musuvathi, and K. McKinley (2013)
Refining dependencies improves partial-order verification methods (extended abstract), P. Godefroid and D. Pirottin (1993)
SC-Haskell: Sequential Consistency in Languages That Minimize Mutable Shared Heap, M. Vollmer, R. G. Scott, M. Musuvathi, and R. R. Newton (2017)↩︎