Simplifying Execution Traces

Date
Tags concurrency, dejafu, haskell, programming, research
Target Audience People interested in the dejafu internals.
Epistemic Status This describes the current state of a work-in-progress branch, which may yet change.

It’s well known that randomly generated test failures are a poor debugging aid. That’s why every non-toy randomised property testing library (like Hedgehog or Hypothesis or QuickCheck) puts a considerable amount of effort into shrinking failures. It’s a non-trivial problem, but it’s absolutely essential.

It’s also something that dejafu does not do.

Running example

I’m going to use the “stores are transitively visible” litmus test as a running example. Here it is:

import qualified Control.Monad.Conc.Class as C
import           Test.DejaFu.Internal
import           Test.DejaFu.SCT
import           Test.DejaFu.SCT.Internal.DPOR
import           Test.DejaFu.Types
import           Test.DejaFu.Utils

storesAreTransitivelyVisible :: C.MonadConc m => m (Int, Int, Int)
storesAreTransitivelyVisible = do
  x <- C.newCRef 0
  y <- C.newCRef 0
  j1 <- C.spawn (C.writeCRef x 1)
  j2 <- C.spawn (do r1 <- C.readCRef x; C.writeCRef x 1; pure r1)
  j3 <- C.spawn (do r2 <- C.readCRef y; r3 <- C.readCRef x; pure (r2,r3))
  (\() r1 (r2,r3) -> (r1,r2,r3)) <$> C.readMVar j1 <*> C.readMVar j2 <*> C.readMVar j3

I picked this one because it’s kind of arbitrarily complex. It’s a small test, but it exercises the relaxed-memory implementation, so there’s a lot going on: it’s a fairly dense test.

I’m now going to define a metric of trace complexity, which I’ll justify in a moment:

complexity :: Trace -> (Int, Int, Int, Int)
complexity = foldr go (0,0,0,0) where
  go (SwitchTo _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x+1, y,   z)
  go (Start    _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x,   y+1, z)
  go (Continue,   _, CommitCRef _ _) (w, x, y, z) = (w+1, x,   y,   z+1)
  go (SwitchTo _, _, _)              (w, x, y, z) = (w,   x+1, y,   z)
  go (Start    _, _, _)              (w, x, y, z) = (w,   x,   y+1, z)
  go (Continue,   _, _)              (w, x, y, z) = (w,   x,   y,   z+1)

Using the 183-shrinking branch, we can now get the first trace for every distinct result, along with its complexity:

results :: Way -> MemType -> IO ()
results way memtype = do
  let settings = set lequality (Just (==))
               $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))

Here are the results for systematic testing:

λ> results (systematically defaultBounds) SequentialConsistency
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S2-----S1---S0---S3-----S0--      (0,0,6,24)
Right (0,0,0)   S0------------S2-P3-----S1---S0--S2----S0---    (0,1,6,23)
Right (1,0,0)   S0------------S3-----S1---S0--S2-----S0---      (0,0,6,24)

λ> results (systematically defaultBounds) TotalStoreOrder
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)   S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)   S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)

λ> results (systematically defaultBounds) PartialStoreOrder
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)   S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)   S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)

Pretty messy, right? Here are the results for random testing:

λ> results (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----P1-P0----P2-P1-P0-P3-P1-S2-P3--P0-P3-P0-P3-S2-P0-S2-P0--P2-S0-   (0,15,5,9)
Right (0,0,1)   S0-------P2-P1-P2-P0--P2-P0-P1-P0---S2-P3-P0-P2-S3---P1-S3-S0--         (0,12,5,12)
Right (1,0,0)   S0------------S3-----S1-P2-P1-P0--S2---P1-S0---                         (0,4,5,20)
Right (0,0,0)   S0---------P2-P0--P3-P0-S3--P2-P3-P2--P3-S2-S1--P0----                  (0,9,4,15)

λ> results (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-     (1,13,6,11)
Right (0,0,1)   S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--              (0,8,6,16)
Right (0,0,0)   S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----            (2,10,6,14)

λ> results (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-     (1,13,6,11)
Right (0,0,1)   S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--              (0,8,6,16)
Right (0,0,0)   S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----            (2,10,6,14)

Yikes!

The complexity metric I defined counts four things:

  1. The number of relaxed-memory commit actions
  2. The number of pre-emptive context switches
  3. The number of non-pre-emptive context switches
  4. The number of continues

I would much rather read a long trace where the only context switches happen when threads block than a short one which rapidly jumps between threads. So, given two equivalent traces, I will always prefer the one with the lexicographically smaller complexity-tuple.
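Since Haskell’s Ord instance for tuples is already lexicographic, picking the preferred trace out of a set of equivalent ones is a one-liner. Here’s a toy sketch; the trace labels and `candidates`/`preferred` names are made up for illustration, not dejafu output:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical complexity tuples for three equivalent traces:
-- (commits, pre-emptive switches, non-pre-emptive switches, continues).
candidates :: [(String, (Int, Int, Int, Int))]
candidates =
  [ ("random",     (2, 10, 6, 14))
  , ("simplified", (0,  3, 4, 21))
  , ("systematic", (0,  2, 6, 22))
  ]

-- Tuples compare lexicographically, so this prefers fewer commits
-- first, then fewer pre-emptions, and so on down the tuple.
preferred :: String
preferred = fst (minimumBy (comparing snd) candidates)
```

Here `preferred` is "systematic": even though the simplified trace has fewer context switches overall, the systematic one has fewer pre-emptions, and pre-emptions are compared before the rest.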

Trace simplification

The key idea underpinning trace simplification is that dejafu can tell when two scheduling decisions can be swapped without changing the behaviour of the program. I talked about this idea in the Using Hedgehog to Test Déjà Fu memo. So we can implement transformations which are guaranteed to preserve semantics, without needing to verify this by re-running the test case.
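To make “can be swapped without changing the behaviour” concrete, here is a toy independence relation over bare reads and writes. The `Act`, `var`, and `indep` names are made up for this sketch; dejafu’s real `independent` also accounts for MVars, exceptions, and relaxed-memory commits:

```haskell
-- A toy stand-in for dejafu's dependency analysis.
data Act = Read Char | Write Char deriving (Eq, Show)

var :: Act -> Char
var (Read  v) = v
var (Write v) = v

-- Two reads always commute; anything else commutes only when the
-- actions touch different variables, so neither can observe the swap.
indep :: Act -> Act -> Bool
indep (Read _) (Read _) = True
indep a        b        = var a /= var b
```

So `indep (Write 'x') (Write 'y')` holds, but `indep (Write 'x') (Read 'x')` does not: re-ordering the latter pair would change the value the read sees.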

Although we don’t need to re-run the test case at all, the 183-shrinking branch currently does, but only once, at the end, after the minimum has been found. This is because it’s easier to generate a simpler sequence of scheduling decisions and use dejafu to produce the corresponding trace than it is to produce a simpler trace directly. This is still strictly better than a typical shrinking algorithm, which would re-run the test case after each shrinking step rather than only at the end.

Rather than drag this out, here’s what those random traces simplify to:

resultsS :: Way -> MemType -> IO ()
resultsS way memtype = do
  let settings = set lsimplify True
               . set lequality (Just (==))
               $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0----------P1---S2--P3-----S0---S2---S0---     (0,2,5,22)
Right (0,0,1)   S0----------P2-P1-P2-P1--S0---S2---S3-----S0--- (0,4,5,20)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---       (0,1,5,23)
Right (0,0,0)   S0------------S3--P2-----S3---S1--P0----        (0,2,4,22)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)

This is much better.

There are two simplification phases: a preparation phase, which puts the trace into a normal form and prunes unnecessary commits; and an iteration phase, which repeats a step function until a fixed point is reached (or the iteration limit is).

Preparation

The preparation phase has two steps: first we put the trace into lexicographic normal form, then we prune unnecessary commits.

We put a trace in lexicographic normal form by sorting by thread ID, where only independent actions can be swapped:

lexicoNormalForm :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
lexicoNormalForm memtype = go where
  go trc =
    let trc' = bubble initialDepState trc
    in if trc == trc' then trc else go trc'

  bubble ds (t1@(tid1, ta1):t2@(tid2, ta2):trc)
    | independent ds tid1 ta1 tid2 ta2 && tid2 < tid1 = bgo ds t2 (t1 : trc)
    | otherwise = bgo ds t1 (t2 : trc)
  bubble _ trc = trc

  bgo ds t@(tid, ta) trc = t : bubble (updateDepState memtype ds tid ta) trc

If simplification only put traces into lexicographic normal form, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2--P0--S2--P0-P3----P0--             (0,5,3,19)
Right (0,0,1)   S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--    (0,8,4,16)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---              (0,3,4,21)
Right (0,0,0)   S0------------S2-P3--P2----S3--P1--P0----               (0,4,3,20)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--        (1,5,5,19)
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----       (2,6,5,18)

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--        (1,5,5,19)
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----       (2,6,5,18)

These are better than they were, but we can do better still.

After putting the trace into lexicographic normal form, we delete any commit actions which are followed by any number of independent actions and then a memory barrier:

dropCommits :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
dropCommits SequentialConsistency = id
dropCommits memtype = go initialDepState where
  go ds (t1@(tid1, ta1@(CommitCRef _ _)):t2@(tid2, ta2):trc)
    | isBarrier (simplifyAction ta2) = go ds (t2:trc)
    | independent ds tid1 ta1 tid2 ta2 = t2 : go (updateDepState memtype ds tid2 ta2) (t1:trc)
  go ds (t@(tid,ta):trc) = t : go (updateDepState memtype ds tid ta) trc
  go _ [] = []

Such commits don’t affect the behaviour of the program at all, as all buffered writes get flushed when the memory barrier happens.

If simplification only did the preparation phase, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2--P0--S2--P0-P3----P0--             (0,5,3,19)
Right (0,0,1)   S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--    (0,8,4,16)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---              (0,3,4,21)
Right (0,0,0)   S0------------S2-P3--P2----S3--P1--P0----               (0,4,3,20)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-------P1---S2--P0-----P2--P0--S2-S3----P0--          (0,5,4,19)
     ^-- better than just lexicoNormalForm
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--P0-S2--S3--P1--P0----             (0,5,3,19)
     ^-- better than just lexicoNormalForm

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-------P1---S2--P0-----P2--P0--S2-S3----P0--          (0,5,4,19)
     ^-- better than just lexicoNormalForm
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--P0-S2--S3--P1--P0----             (0,5,3,19)
     ^-- better than just lexicoNormalForm

Iteration

The iteration phase attempts to reduce context switching by pushing actions forwards, or pulling them backwards, through the trace.

If we have the trace [(tid1, act1), (tid2, act2), (tid1, act3)], where act2 and act3 are independent, the “pull back” transformation would re-order that to [(tid1, act1), (tid1, act3), (tid2, act2)].

In contrast, if act1 and act2 were independent, the “push forward” transformation would re-order that to [(tid2, act2), (tid1, act1), (tid1, act3)]. The two transformations are almost, but not quite, opposites.
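Here’s a minimal toy model of those two re-orderings, using Ints for thread IDs and Strings for actions. The `pullBack1`/`pushForward1` names and the name-based independence stub are made up for illustration; the real passes below thread a dependency state through the whole trace:

```haskell
type Step = (Int, String)

-- Stub independence: treat any two distinctly-named actions as
-- independent.  Real dejafu decides this with a dependency analysis.
indepS :: Step -> Step -> Bool
indepS (_, a) (_, b) = a /= b

-- "Pull back": move the next action of the first thread to before
-- the context switch, if it is independent of the action it jumps over.
pullBack1 :: [Step] -> [Step]
pullBack1 (s1@(t1, _) : s2@(t2, _) : s3@(t3, _) : rest)
  | t1 /= t2 && t3 == t1 && indepS s2 s3 = s1 : s3 : s2 : rest
pullBack1 trc = trc

-- "Push forward": move the last action of the first thread past the
-- context switch, towards that thread's next execution.
pushForward1 :: [Step] -> [Step]
pushForward1 (s1@(t1, _) : s2@(t2, _) : rest)
  | t1 /= t2 && indepS s1 s2 = s2 : pushForward1 (s1 : rest)
pushForward1 trc = trc
```

On [(1, "act1"), (2, "act2"), (1, "act3")], pullBack1 gives [(1, "act1"), (1, "act3"), (2, "act2")] and pushForward1 gives [(2, "act2"), (1, "act1"), (1, "act3")], matching the two re-orderings described above.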

Pull-back walks through the trace and, at every context switch, looks forward to see if there is a single action of the original thread it can put before the context switch:

pullBack :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pullBack memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
        trc' = if tid1 /= tid2
               then maybe trc (uncurry (:)) (findAction tid1 ds' trc)
               else trc
    in t1 : go ds' trc'
  go _ trc = trc

  findAction tid0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just (t, trc)
      | otherwise = case fgo (updateDepState memtype ds tid ta) trc of
          Just (ft@(ftid, fa), trc')
            | independent ds tid ta ftid fa -> Just (ft, t:trc')
          _ -> Nothing
    fgo _ _ = Nothing

Push-forward walks through the trace and, at every context switch, looks forward to see if the last action of the original thread can be put at its next execution:

pushForward :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pushForward memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
    in if tid1 /= tid2
       then maybe (t1 : go ds' trc) (go ds) (findAction tid1 ta1 ds trc)
       else t1 : go ds' trc
  go _ trc = trc

  findAction tid0 ta0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just ((tid0, ta0) : t : trc)
      | independent ds tid0 ta0 tid ta = (t:) <$> fgo (updateDepState memtype ds tid ta) trc
      | otherwise = Nothing
    fgo _ _ = Nothing

The iteration process just repeats pushForward memtype . pullBack memtype.
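In other words, the iteration phase is a bounded fixed-point computation: apply the step until the trace stops changing, or a round limit is hit. A generic sketch (the `fixedPoint` name and the concrete limit below are mine, not the branch’s):

```haskell
-- Repeatedly apply a step until it stops changing the input, giving
-- up after a fixed number of rounds.  The iteration phase is then
-- essentially  fixedPoint limit (pushForward memtype . pullBack memtype).
fixedPoint :: Eq a => Int -> (a -> a) -> a -> a
fixedPoint limit step = go limit where
  go 0 x = x
  go n x =
    let x' = step x
    in if x' == x then x else go (n - 1) x'
```

For example, `fixedPoint 100 (\n -> if n > 10 then n - 1 else n) 25` reaches the fixed point 10, while a limit of 3 would stop early at 22.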

If it only used pullBack, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2---P0--S2--S0-P3-----S0--           (0,3,5,21)
Right (0,0,1)   S0-----------P2-P1-P2--P1--S0--S2--S0-P3-----S0--       (0,5,5,19)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---               (0,1,5,23)
Right (0,0,0)   S0------------S2-P3---P2----S3--S1--P0----              (0,3,4,21)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-----------P1---S2-----S0---S3-----S0--               (0,1,5,23)
Right (0,0,1)   S0-----------P1-P2-----S0-S1--S0---S3-----S0--          (0,2,6,22)
Right (0,0,0)   S0-----------P2---P3-----S0-S2--S1--P0----              (0,3,4,21)

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-----------P1---S2-----S0---S3-----S0--               (0,1,5,23)
Right (0,0,1)   S0-----------P1-P2-----S0-S1--S0---S3-----S0--          (0,2,6,22)
Right (0,0,0)   S0-----------P2---P3-----S0-S2--S1--P0----              (0,3,4,21)

Without exception, iterating pullBack is an improvement over just doing preparation.

If it only used pushForward, we would get these results:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-------P1---S2--P0------S2--P3----P0---               (0,4,3,20)
Right (0,0,1)   S0-------P2-P1-P2-P1-P0------S1-S2---S3----P0---        (0,6,4,18)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---              (0,3,4,21)
     ^-- no improvement over preparation
Right (0,0,0)   S0------------S3--P2-----S3--P1--P0----                 (0,3,3,21)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----P1---S0---P2----P0-------S2-S3----P0--            (0,4,4,20)
Right (0,0,1)   S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--     (0,6,5,18)
Right (0,0,0)   S0----------P2--P3--P0--S2---S3--P1--P0----             (0,5,3,19)
     ^-- no improvement over preparation

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----P1---S0---P2----P0-------S2-S3----P0--            (0,4,4,20)
Right (0,0,1)   S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--     (0,6,5,18)
Right (0,0,0)   S0----------P2--P3--P0--S2---S3--P1--P0----             (0,5,3,19)
     ^-- no improvement over preparation

With three exceptions, where the traces didn’t change, iterating pushForward is an improvement over just doing preparation.

We’ve already seen the results if we combine them:

λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0----------P1---S2--P3-----S0---S2---S0---     (0,2,5,22)
Right (0,0,1)   S0----------P2-P1-P2-P1--S0---S2---S3-----S0--- (0,4,5,20)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---       (0,1,5,23)
     ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0------------S3--P2-----S3---S1--P0----        (0,2,4,22)

λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
     ^-- same as pullBack, which is better than pushForward
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
     ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)

λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
     ^-- same as pullBack, which is better than pushForward
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
     ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)

Next steps

I think what I have right now is pretty good. It’s definitely a vast improvement over not doing any simplification.

But no random traces get simplified to the corresponding systematic traces, which is a little disappointing. I think that’s because the current passes just try to reduce context switches of any form, whereas really I want to reduce pre-emptive context switches more than non-pre-emptive ones.