It’s also something that dejafu does not do.

I’m going to use the “stores are transitively visible” litmus test as a running example. Here it is:

```haskell
import qualified Control.Monad.Conc.Class as C
import Test.DejaFu.Internal
import Test.DejaFu.SCT
import Test.DejaFu.SCT.Internal.DPOR
import Test.DejaFu.Types
import Test.DejaFu.Utils

storesAreTransitivelyVisible :: C.MonadConc m => m (Int, Int, Int)
storesAreTransitivelyVisible = do
  x <- C.newCRef 0
  y <- C.newCRef 0
  j1 <- C.spawn (C.writeCRef x 1)
  j2 <- C.spawn (do r1 <- C.readCRef x; C.writeCRef y 1; pure r1)
  j3 <- C.spawn (do r2 <- C.readCRef y; r3 <- C.readCRef x; pure (r2,r3))
  (\() r1 (r2,r3) -> (r1,r2,r3)) <$> C.readMVar j1 <*> C.readMVar j2 <*> C.readMVar j3
```

I picked this one because it’s kind of arbitrarily complex. It’s a small test, but it’s for the relaxed memory implementation, so there’s a lot going on. It’s a fairly dense test.

I’m now going to define a metric of trace complexity which I’ll justify in a moment:

```haskell
complexity :: Trace -> (Int, Int, Int, Int)
complexity = foldr go (0,0,0,0) where
  go (SwitchTo _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x+1, y, z)
  go (Start    _, _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y+1, z)
  go (Continue,   _, CommitCRef _ _) (w, x, y, z) = (w+1, x, y, z+1)
  go (SwitchTo _, _, _) (w, x, y, z) = (w, x+1, y, z)
  go (Start    _, _, _) (w, x, y, z) = (w, x, y+1, z)
  go (Continue,   _, _) (w, x, y, z) = (w, x, y, z+1)
```

Using the `183-shrinking` branch, we can now get the first trace for every distinct result, along with its complexity:

```haskell
results :: Way -> MemType -> IO ()
results way memtype = do
  let settings = set lequality (Just (==)) $ fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
```

Here are the results for systematic testing:

```
λ> results (systematically defaultBounds) SequentialConsistency
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S2-----S1---S0---S3-----S0--      (0,0,6,24)
Right (0,0,0)   S0------------S2-P3-----S1---S0--S2----S0---    (0,1,6,23)
Right (1,0,0)   S0------------S3-----S1---S0--S2-----S0---      (0,0,6,24)
λ> results (systematically defaultBounds) TotalStoreOrder
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)   S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)   S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)
λ> results (systematically defaultBounds) PartialStoreOrder
Right (1,0,1)   S0------------S1---S0--S2-----S0--S3-----S0--   (0,0,7,24)
Right (0,0,1)   S0------------S1-P2-----S1--S0---S3-----S0--    (0,1,6,23)
Right (0,0,0)   S0------------S1-P2---P3-----S1--S0--S2--S0---  (0,2,6,22)
Right (1,0,0)   S0------------S1-P3-----S1--S0--S2-----S0---    (0,1,6,23)
```

Pretty messy, right? Here’s the results for *random* testing:

```
λ> results (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----P1-P0----P2-P1-P0-P3-P1-S2-P3--P0-P3-P0-P3-S2-P0-S2-P0--P2-S0-  (0,15,5,9)
Right (0,0,1)   S0-------P2-P1-P2-P0--P2-P0-P1-P0---S2-P3-P0-P2-S3---P1-S3-S0--        (0,12,5,12)
Right (1,0,0)   S0------------S3-----S1-P2-P1-P0--S2---P1-S0---                        (0,4,5,20)
Right (0,0,0)   S0---------P2-P0--P3-P0-S3--P2-P3-P2--P3-S2-S1--P0----                 (0,9,4,15)
λ> results (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-    (1,13,6,11)
Right (0,0,1)   S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--             (0,8,6,16)
Right (0,0,0)   S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----           (2,10,6,14)
λ> results (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-----P1--P0-P1-S0-P2--C-S0---P2-P3-P2--S3-P0-P3-P0---S3-P0-P3-S0-    (1,13,6,11)
Right (0,0,1)   S0----P1-P0-----P2--P0--P2-P0-S2--S3-P1-P0---S1-S3----S0--             (0,8,6,16)
Right (0,0,0)   S0--------P2-P0--P3-P2-P0-P3-P2-C-S0-S3---S2--S1-C-S1-P0----           (2,10,6,14)
```

Yikes!

The complexity metric I defined counts four things:

- The number of relaxed-memory commit actions
- The number of pre-emptive context switches
- The number of non-pre-emptive context switches
- The number of continues

I would much rather read a long trace where the only context switches are when threads block, than a short one which is rapidly jumping between threads. So, given two equivalent traces, I will always prefer the one with a lexicographically smaller complexity-tuple.
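Since tuples in most languages already compare lexicographically, "prefer the simpler trace" is a one-liner. A Python sketch, using the complexity tuples of the first systematic and first random SequentialConsistency traces above:

```python
# Complexity tuples are (commits, pre-emptive, non-pre-emptive, continues);
# Python tuples compare lexicographically, so "prefer the simpler trace"
# is just min by the tuple.
def prefer(trace_a, trace_b):
    return min(trace_a, trace_b, key=lambda t: t["complexity"])

# Tuples taken from the SequentialConsistency results above.
systematic = {"name": "systematic", "complexity": (0, 0, 7, 24)}
random_    = {"name": "random",     "complexity": (0, 15, 5, 9)}

# The random trace has fewer continues, but 0 pre-emptions beats 15 at the
# second position, so the systematic trace is preferred.
assert prefer(systematic, random_)["name"] == "systematic"
```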

The key idea underpinning trace simplification is that dejafu can tell when two scheduling decisions can be swapped without changing the behaviour of the program. I talked about this idea in the Using Hedgehog to Test Déjà Fu memo. So we can implement transformations which are guaranteed to preserve semantics *without needing to verify this by re-running the test case*.

Although we don’t need to re-run the test case at all, the `183-shrinking` branch currently does, but only once at the end after the minimum has been found. This is because it’s easier to generate a simpler sequence of scheduling decisions and use dejafu to produce the corresponding trace than it is to produce a simpler trace directly. This is still strictly better than a typical shrinking algorithm, which would re-run the test case after *each* shrinking step, rather than only at the end.

Rather than drag this out, here’s what those random traces simplify to:

```haskell
resultsS :: Way -> MemType -> IO ()
resultsS way memtype = do
  let settings = set lsimplify True .
                 set lequality (Just (==)) $
                 fromWayAndMemType way memtype
  res <- runSCTWithSettings settings storesAreTransitivelyVisible
  flip mapM_ res $ \(efa, trace) ->
    putStrLn (show efa ++ "\t" ++ showTrace trace ++ "\t" ++ show (complexity trace))
```

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0----------P1---S2--P3-----S0---S2---S0---     (0,2,5,22)
Right (0,0,1)   S0----------P2-P1-P2-P1--S0---S2---S3-----S0--- (0,4,5,20)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---       (0,1,5,23)
Right (0,0,0)   S0------------S3--P2-----S3---S1--P0----        (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--       (0,1,5,23)
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--  (0,2,6,22)
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----      (0,3,4,21)
```

This is much better.

There are two simplification phases: a preparation phase, which puts the trace into a normal form and prunes unnecessary commits; and an iteration phase, which repeats a step function until a fixed point is reached (or the iteration limit is).
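The iteration phase’s outer loop is a generic fixed-point driver. A hypothetical Python sketch (the toy step function and the limit of 100 are made up for illustration):

```python
def iterate_to_fixpoint(step, value, limit=100):
    """Apply `step` until the value stops changing, or the limit is hit."""
    for _ in range(limit):
        new_value = step(value)
        if new_value == value:
            return value
        value = new_value
    return value

# Toy step function: halve even numbers. 40 -> 20 -> 10 -> 5, a fixed point.
halve_evens = lambda n: n // 2 if n % 2 == 0 else n
assert iterate_to_fixpoint(halve_evens, 40) == 5
```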

The preparation phase has two steps: first we put the trace into *lexicographic normal form*, then we prune unnecessary commits.

We put a trace in lexicographic normal form by sorting by thread ID, where only independent actions can be swapped:

```haskell
lexicoNormalForm :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
lexicoNormalForm memtype = go where
  go trc =
    let trc' = bubble initialDepState trc
    in if trc == trc' then trc else go trc'

  bubble ds (t1@(tid1, ta1):t2@(tid2, ta2):trc)
    | independent ds tid1 ta1 tid2 ta2 && tid2 < tid1 = bgo ds t2 (t1 : trc)
    | otherwise = bgo ds t1 (t2 : trc)
  bubble _ trc = trc

  bgo ds t@(tid, ta) trc = t : bubble (updateDepState memtype ds tid ta) trc
```
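Stripped of dejafu’s dependency state, this is a bubble sort that may only swap adjacent independent actions. A Python sketch, with independence reduced to a hypothetical caller-supplied predicate:

```python
def lexico_normal_form(trace, independent):
    """Sort (thread_id, action) pairs by thread id, swapping only
    adjacent pairs that the `independent` predicate allows."""
    changed = True
    while changed:
        changed = False
        for i in range(len(trace) - 1):
            (t1, a1), (t2, a2) = trace[i], trace[i + 1]
            if t2 < t1 and independent(t1, a1, t2, a2):
                trace[i], trace[i + 1] = trace[i + 1], trace[i]
                changed = True
    return trace

# Toy independence: writes to different variables are independent.
indep = lambda t1, a1, t2, a2: a1[1] != a2[1]  # actions are ("w", var)
trace = [(2, ("w", "x")), (1, ("w", "y")), (1, ("w", "x"))]
# (2,x) and (1,y) swap; (2,x) and (1,x) touch the same variable, so stay put.
assert lexico_normal_form(trace, indep) == \
    [(1, ("w", "y")), (2, ("w", "x")), (1, ("w", "x"))]
```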

If simplification only put traces into lexicographic normal form, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2--P0--S2--P0-P3----P0--             (0,5,3,19)
Right (0,0,1)   S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--    (0,8,4,16)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---              (0,3,4,21)
Right (0,0,0)   S0------------S2-P3--P2----S3--P1--P0----               (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--        (1,5,5,19)
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----       (2,6,5,18)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-------P1---S2--C-S0-----P2--P0--S2-S3----P0--        (1,5,5,19)
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--C-S0-S2--S3--P1-C-S1-P0----       (2,6,5,18)
```

These are better than they were, but we can do better still.

After putting the trace into lexicographic normal form, we delete any commit actions which are followed by any number of independent actions and then a memory barrier:

```haskell
dropCommits :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
dropCommits SequentialConsistency = id
dropCommits memtype = go initialDepState where
  go ds (t1@(tid1, ta1@(CommitCRef _ _)):t2@(tid2, ta2):trc)
    | isBarrier (simplifyAction ta2) = go ds (t2:trc)
    | independent ds tid1 ta1 tid2 ta2 = t2 : go (updateDepState memtype ds tid2 ta2) (t1:trc)
  go ds (t@(tid,ta):trc) = t : go (updateDepState memtype ds tid ta) trc
  go _ [] = []
```

Such commits don’t affect the behaviour of the program at all, as all buffered writes get flushed when the memory barrier happens.
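The pruning idea can be sketched in Python. Rather than swapping the commit forwards step by step as `dropCommits` does, this sketch scans ahead for a barrier across independent actions; the trace entries and independence test are simplified stand-ins:

```python
def drop_commits(trace, independent):
    """Delete any commit followed only by actions independent of it up
    to a memory barrier: the barrier flushes the buffered write anyway."""
    out = []
    for i, (tid, act) in enumerate(trace):
        if act[0] == "commit":
            doomed = False
            for tid2, act2 in trace[i + 1:]:
                if act2[0] == "barrier":
                    doomed = True
                    break
                if not independent(act, act2):
                    break
            if doomed:
                continue  # drop the redundant commit
        out.append((tid, act))
    return out

# Toy independence: actions on different variables are independent.
indep = lambda a, b: a[1] != b[1]
trace = [(1, ("commit", "x")), (2, ("write", "y")), (2, ("barrier", None))]
# The commit to x is followed by an independent write to y and then a
# barrier, so it can be dropped.
assert drop_commits(trace, indep) == [(2, ("write", "y")), (2, ("barrier", None))]
```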

If simplification only did the preparation phase, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2--P0--S2--P0-P3----P0--             (0,5,3,19)
Right (0,0,1)   S0-----------P2-P1-P2-P1-P0--S2--P0-P1-S2-S3----P0--    (0,8,4,16)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---              (0,3,4,21)
Right (0,0,0)   S0------------S2-P3--P2----S3--P1--P0----               (0,4,3,20)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-------P1---S2--P0-----P2--P0--S2-S3----P0--          (0,5,4,19)
    ^-- better than just lexicoNormalForm
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--P0-S2--S3--P1--P0----             (0,5,3,19)
    ^-- better than just lexicoNormalForm
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-------P1---S2--P0-----P2--P0--S2-S3----P0--          (0,5,4,19)
    ^-- better than just lexicoNormalForm
Right (0,0,1)   S0-----------P1-P2--P0-S1-P0-P2--P0--S1-S2-S3----P0--   (0,7,5,17)
Right (0,0,0)   S0-----------P2---P3--P0-S2--S3--P1--P0----             (0,5,3,19)
    ^-- better than just lexicoNormalForm
```

The iteration phase attempts to reduce context switching by pushing actions forwards, or pulling them backwards, through the trace.

If we have the trace `[(tid1, act1), (tid2, act2), (tid1, act3)]`, where `act2` and `act3` are independent, the “pull back” transformation would re-order that to `[(tid1, act1), (tid1, act3), (tid2, act2)]`.

In contrast, if `act1` and `act2` were independent, the “push forward” transformation would re-order that to `[(tid2, act2), (tid1, act1), (tid1, act3)]`. The two transformations are almost, but not quite, opposites.
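To make the two re-orderings concrete, here is a Python sketch operating on exactly the three-action trace above (an illustration only, not the dejafu implementation):

```python
def pull_back(trace, independent):
    """At a context switch, pull a later action of the old thread
    back before the switch, if independence allows."""
    (t1, a1), (t2, a2), (t3, a3) = trace
    if t1 == t3 != t2 and independent(a2, a3):
        return [(t1, a1), (t3, a3), (t2, a2)]
    return trace

def push_forward(trace, independent):
    """At a context switch, push the old thread's last action forward
    to its next execution, if independence allows."""
    (t1, a1), (t2, a2), (t3, a3) = trace
    if t1 == t3 != t2 and independent(a1, a2):
        return [(t2, a2), (t1, a1), (t3, a3)]
    return trace

trace = [(1, "act1"), (2, "act2"), (1, "act3")]
always = lambda a, b: True  # pretend everything is independent
assert pull_back(trace, always)    == [(1, "act1"), (1, "act3"), (2, "act2")]
assert push_forward(trace, always) == [(2, "act2"), (1, "act1"), (1, "act3")]
```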

Pull-back walks through the trace and, at every context switch, looks forward to see if there is a single action of the original thread it can put before the context switch:

```haskell
pullBack :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pullBack memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
        trc' = if tid1 /= tid2
               then maybe trc (uncurry (:)) (findAction tid1 ds' trc)
               else trc
    in t1 : go ds' trc'
  go _ trc = trc

  findAction tid0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just (t, trc)
      | otherwise = case fgo (updateDepState memtype ds tid ta) trc of
          Just (ft@(ftid, fa), trc')
            | independent ds tid ta ftid fa -> Just (ft, t:trc')
          _ -> Nothing
    fgo _ _ = Nothing
```

Push-forward walks through the trace and, at every context switch, looks forward to see if the last action of the original thread can be put at its next execution:

```haskell
pushForward :: MemType -> [(ThreadId, ThreadAction)] -> [(ThreadId, ThreadAction)]
pushForward memtype = go initialDepState where
  go ds (t1@(tid1, ta1):trc@((tid2, _):_)) =
    let ds' = updateDepState memtype ds tid1 ta1
    in if tid1 /= tid2
       then maybe (t1 : go ds' trc) (go ds) (findAction tid1 ta1 ds trc)
       else t1 : go ds' trc
  go _ trc = trc

  findAction tid0 ta0 = fgo where
    fgo ds (t@(tid, ta):trc)
      | tid == tid0 = Just ((tid0, ta0) : t : trc)
      | independent ds tid0 ta0 tid ta = (t:) <$> fgo (updateDepState memtype ds tid ta) trc
      | otherwise = Nothing
    fgo _ _ = Nothing
```

The iteration process just repeats `pushForward memtype . pullBack memtype`.

If it only used `pullBack`, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-----------P1---S2---P0--S2--S0-P3-----S0--       (0,3,5,21)
Right (0,0,1)   S0-----------P2-P1-P2--P1--S0--S2--S0-P3-----S0--   (0,5,5,19)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---           (0,1,5,23)
Right (0,0,0)   S0------------S2-P3---P2----S3--S1--P0----          (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0-----------P1---S2-----S0---S3-----S0--           (0,1,5,23)
Right (0,0,1)   S0-----------P1-P2-----S0-S1--S0---S3-----S0--      (0,2,6,22)
Right (0,0,0)   S0-----------P2---P3-----S0-S2--S1--P0----          (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0-----------P1---S2-----S0---S3-----S0--           (0,1,5,23)
Right (0,0,1)   S0-----------P1-P2-----S0-S1--S0---S3-----S0--      (0,2,6,22)
Right (0,0,0)   S0-----------P2---P3-----S0-S2--S1--P0----          (0,3,4,21)
```

Without exception, iterating `pullBack` is an improvement over just doing preparation.

If it only used `pushForward`, we would get these results:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0-------P1---S2--P0------S2--P3----P0---            (0,4,3,20)
Right (0,0,1)   S0-------P2-P1-P2-P1-P0------S1-S2---S3----P0---     (0,6,4,18)
Right (1,0,0)   S0------------S3----P1--P0--S1-S2----P0---           (0,3,4,21)
    ^-- no improvement over preparation
Right (0,0,0)   S0------------S3--P2-----S3--P1--P0----              (0,3,3,21)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----P1---S0---P2----P0-------S2-S3----P0--         (0,4,4,20)
Right (0,0,1)   S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--  (0,6,5,18)
Right (0,0,0)   S0----------P2--P3--P0--S2---S3--P1--P0----          (0,5,3,19)
    ^-- no improvement over preparation
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----P1---S0---P2----P0-------S2-S3----P0--         (0,4,4,20)
Right (0,0,1)   S0-------P1-P2--P0-----S1-P2--P0---S1-S2-S3----P0--  (0,6,5,18)
Right (0,0,0)   S0----------P2--P3--P0--S2---S3--P1--P0----          (0,5,3,19)
    ^-- no improvement over preparation
```

With three exceptions, where the traces didn’t change, iterating `pushForward` is an improvement over just doing preparation.

We’ve already seen the results if we combine them:

```
λ> resultsS (randomly (mkStdGen 0) 100) SequentialConsistency
Right (1,0,1)   S0----------P1---S2--P3-----S0---S2---S0---      (0,2,5,22)
Right (0,0,1)   S0----------P2-P1-P2-P1--S0---S2---S3-----S0---  (0,4,5,20)
Right (1,0,0)   S0------------S3-----S1---S0--S2----P0---        (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0------------S3--P2-----S3---S1--P0----         (0,2,4,22)
λ> resultsS (randomly (mkStdGen 0) 100) TotalStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
λ> resultsS (randomly (mkStdGen 0) 100) PartialStoreOrder
Right (1,0,1)   S0----------P1---S2-----S0----S3-----S0--        (0,1,5,23)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,1)   S0----------P1-P2-----S0--S1--S0---S3-----S0--   (0,2,6,22)
    ^-- same as pullBack, which is better than pushForward
Right (0,0,0)   S0----------P2--P3-----S0--S2---S1--P0----       (0,3,4,21)
```

I think what I have right now is pretty good. It’s definitely a vast improvement over not doing any simplification.

*But*, no random traces get simplified to the corresponding systematic traces, which is a little disappointing. I think that’s because the current passes just try to reduce context switches of any form, whereas really I want to reduce pre-emptive context switches more than non-pre-emptive ones.

- We have a *typeclass* abstracting over concurrency.
- There’s an implementation of this typeclass using `IO`. There’s also an implementation of this typeclass using a custom monad transformer called `ConcT`.
- Computations of type `MonadRef r n => ConcT r n a` can be executed with a given scheduler, to produce a result and an execution trace. Unlike `IO`, the threads in a `ConcT r n` computation are executed in a single-step fashion based on the decisions of the scheduler.
- To implement this single-step execution, all threads are executed in a single “real” thread.

It’s the third point that gives dejafu the ability to systematically explore different executions. If execution were not single-step, then it wouldn’t be possible in general to context switch between arbitrary concurrency actions.

The fourth point greatly simplifies the implementation, but also causes problems: GHC Haskell has a notion of “bound threads”, which are Haskell threads bound to a particular OS thread. Bound threads are absolutely essential to use FFI calls which rely on thread-local state. **Deja Fu cannot support bound threads if it executes everything in a single thread!**

How can we address this?

PULSE is a concurrency-testing tool for Erlang. It works by code instrumentation: around every communication operation is inserted a call to the PULSE scheduler process. The scheduler process tells processes when they can run. Execution is *not* serialised into a single thread, the distinct Erlang processes still exist, but only one of them may run at a time.
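The same token-passing trick works with ordinary OS threads in any language: before each operation a thread waits for the scheduler’s signal, and afterwards it reports back. Here’s a minimal Python sketch of such a protocol (the structure is purely illustrative, not PULSE’s actual implementation):

```python
import threading

class StepScheduler:
    """Let real threads exist, but run only one step at a time."""
    def __init__(self):
        self.go = {}                       # thread name -> "run one step" event
        self.done = threading.Semaphore(0) # signalled after each step

    def register(self, name):
        self.go[name] = threading.Event()

    def step_point(self, name):
        """Called by a thread before each operation: block until scheduled."""
        self.go[name].wait()
        self.go[name].clear()

    def schedule(self, name):
        """Called by the scheduler: run one step of `name`, wait for it."""
        self.go[name].set()
        self.done.acquire()

sched = StepScheduler()
sched.register("a")
sched.register("b")
log = []

def worker(name):
    for i in range(2):
        sched.step_point(name)
        log.append((name, i))  # the "communication operation"
        sched.done.release()

ta = threading.Thread(target=worker, args=("a",))
tb = threading.Thread(target=worker, args=("b",))
ta.start(); tb.start()
for name in ["a", "b", "a", "b"]:  # the scheduler's chosen interleaving
    sched.schedule(name)
ta.join(); tb.join()
# Execution is deterministic despite using real threads.
assert log == [("a", 0), ("b", 0), ("a", 1), ("b", 1)]
```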

We can do the same thing in Haskell.

Let’s look at a much simplified version of dejafu to try this idea out.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE TypeFamilies #-}

import qualified Control.Concurrent as C
import qualified Control.Monad.Cont as K
import Data.List.NonEmpty (NonEmpty(..), nonEmpty)
import qualified Data.Map as M
import Data.Maybe (isNothing)
import qualified System.Random as R

class Monad m => MonadConc m where
  type ThreadId m :: *
  type MVar m :: * -> *

  fork   :: m () -> m (ThreadId m)
  forkOS :: m () -> m (ThreadId m)

  newEmptyMVar :: m (MVar m a)
  putMVar  :: MVar m a -> a -> m ()
  takeMVar :: MVar m a -> m a

newMVar :: MonadConc m => a -> m (MVar m a)
newMVar a = do
  v <- newEmptyMVar
  putMVar v a
  pure v
```

There’s a straightforward implementation for `IO`:

```haskell
instance MonadConc IO where
  type ThreadId IO = C.ThreadId
  type MVar IO = C.MVar

  fork   = C.forkIO
  forkOS = C.forkOS

  newEmptyMVar = C.newEmptyMVar
  putMVar  = C.putMVar
  takeMVar = C.takeMVar
```

The testing implementation is a little hairier. Because we want to be able to single-step it, we’ll use continuations:

```haskell
newtype ConcT m a = ConcT { runConcT :: K.Cont (Action m) a }
  deriving (Functor, Applicative, Monad)

newtype CTThreadId = CTThreadId Int
  deriving (Eq, Ord)

data CTMVar m a = CTMVar
  { mvarID  :: Int
  , mvarRef :: MVar m (Maybe a)
  }

data Action m =
    Fork   (ConcT m ()) (CTThreadId -> Action m)
  | ForkOS (ConcT m ()) (CTThreadId -> Action m)
  | forall a. NewEmptyMVar (CTMVar m a -> Action m)
  | forall a. PutMVar  (CTMVar m a) a (Action m)
  | forall a. TakeMVar (CTMVar m a) (a -> Action m)
  | Stop (m ())

instance MonadConc (ConcT m) where
  type ThreadId (ConcT m) = CTThreadId
  type MVar (ConcT m) = CTMVar m

  fork   ma = ConcT (K.cont (Fork ma))
  forkOS ma = ConcT (K.cont (ForkOS ma))

  newEmptyMVar = ConcT (K.cont NewEmptyMVar)
  putMVar mvar a = ConcT (K.cont (\k -> PutMVar mvar a (k ())))
  takeMVar mvar = ConcT (K.cont (TakeMVar mvar))
```

Let’s talk about the `Action` type a bit before moving on. The general structure is `Name [<args> ...] (<result> -> Action m)`, where `m` is some `MonadConc`. For `MVar`s, we’re just re-using the `MVar` type of the underlying monad (dejafu proper re-uses the `IORef`s of the underlying monad). For `ThreadId`s we’re using `Int`s. And we’re going to get the final result out of the computation with the `Stop` action.

Let’s keep things simple and not support most of the fancy scheduling stuff dejafu does. Our scheduler is just going to be a stateful function from runnable threads to a single thread:

```haskell
type Scheduler s = NonEmpty CTThreadId -> s -> (CTThreadId, s)
```

So now our execution function is going to look like this:

```haskell
minifu :: MonadConc m => Scheduler s -> s -> ConcT m a -> m (Maybe a, s)
minifu sched s (ConcT ma) = do
  out <- newMVar Nothing
  s' <- run sched s (K.runCont ma (\a -> Stop (takeMVar out >> putMVar out (Just a))))
  a <- takeMVar out
  pure (a, s')
```

The real meat is the `run` function:

```haskell
run :: MonadConc m => Scheduler s -> s -> Action m -> m s
run sched s0 a0 = go s0 initialIdSource =<< initialThreads a0 where
  go s ids threads
    | initialThreadId `M.member` threads = case runnable threads of
        Just tids ->
          let (chosen, s') = sched tids s
          in uncurry (go s') =<< loopStepThread ids chosen threads
        Nothing -> pure s
    | otherwise = pure s

  runnable = nonEmpty . M.keys . M.filter (isNothing . blockedOn)
```

Like in dejafu proper, execution is going to stop when the main thread terminates, even if there are other threads. Threads are going to live in a map keyed by `ThreadId`.

```haskell
type Threads m = M.Map CTThreadId (Thread m)

initialThreads :: MonadConc m => Action m -> m (Threads m)
initialThreads a0 = do
  t <- forkThread False a0
  pure (M.singleton initialThreadId t)

initialThreadId :: CTThreadId
initialThreadId = CTThreadId 0
```

Each thread in our program-under-test is going to be executed in an actual thread. So, like PULSE, we’ll introduce communication (in the form of `MVar`s) around concurrency actions to ensure that we get single-step execution. So a thread is going to have three components: the `MVar` (if any) it’s currently blocked on, an `MVar` to signal that it should execute one step, and an `MVar` to communicate what the thread did.

```haskell
data Thread m = Thread
  { blockedOn   :: Maybe Int
  , signalStep  :: MVar m IdSource
  , awaitResult :: MVar m (IdSource, ThreadResult m)
  }

data ThreadResult m =
    BusinessAsUsual
  | Killed
  | Updated Int
  | Blocked Int
  | Forked (Thread m)
```

The `IdSource` is used to generate new unique thread and `MVar` IDs:

```haskell
type IdSource = (Int, Int)

initialIdSource :: IdSource
initialIdSource = (1, 0)

nextThreadId :: IdSource -> (CTThreadId, IdSource)
nextThreadId (t, m) = (CTThreadId t, (t + 1, m))

nextMVarId :: IdSource -> (Int, IdSource)
nextMVarId (t, m) = (m, (t, m + 1))
```
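The split counter is plain arithmetic; a quick Python sketch of the same idea:

```python
# Thread IDs and MVar IDs are drawn from a single two-counter source.
def next_thread_id(ids):
    t, m = ids
    return t, (t + 1, m)

def next_mvar_id(ids):
    t, m = ids
    return m, (t, m + 1)

ids = (1, 0)  # like initialIdSource: thread 0 (the main thread) already exists
tid, ids = next_thread_id(ids)
mvid, ids = next_mvar_id(ids)
assert (tid, mvid, ids) == (1, 0, (2, 1))
```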

Forking a thread is going to set up these `MVar`s and the small bit of logic to ensure things happen as we like:

```haskell
forkThread :: MonadConc m => Bool -> Action m -> m (Thread m)
forkThread isOS act = do
  signal <- newEmptyMVar
  await  <- newEmptyMVar
  _ <- (if isOS then forkOS else fork) (runThread signal await act)
  pure (Thread Nothing signal await)

runThread :: MonadConc m => MVar m IdSource -> MVar m (IdSource, ThreadResult m) -> Action m -> m ()
runThread signal await = go where
  go act = do
    ids <- takeMVar signal
    (act', ids', res) <- runStepThread ids act
    putMVar await (ids', res)
    maybe (pure ()) go act'
```

The final pieces of the puzzle are the two `*StepThread` functions, which execute one action of our chosen thread. These are a little trickier than in normal dejafu.

Firstly, `loopStepThread`, which tells the thread that was chosen by the scheduler to step:

```haskell
loopStepThread :: MonadConc m => IdSource -> CTThreadId -> Threads m -> m (IdSource, Threads m)
loopStepThread ids tid threads = case M.lookup tid threads of
  Just thread -> do
    putMVar (signalStep thread) ids
    (ids', res) <- takeMVar (awaitResult thread)
    let resf = case res of
          BusinessAsUsual -> id
          Killed -> M.delete tid
          Updated i -> fmap (\t -> if blockedOn t == Just i then t { blockedOn = Nothing } else t)
          Blocked i -> M.insert tid (thread { blockedOn = Just i })
          Forked thread' -> M.insert (fst (nextThreadId ids)) thread'
    pure (ids', resf threads)
  Nothing -> pure (ids, threads)
```

Finally `runStepThread`, which executes an action:

```haskell
runStepThread :: MonadConc m => IdSource -> Action m -> m (Maybe (Action m), IdSource, ThreadResult m)
runStepThread ids (Fork (ConcT ma) k) = do
  t <- primFork False ma
  let (tid', ids') = nextThreadId ids
  pure (Just (k tid'), ids', Forked t)
runStepThread ids (ForkOS (ConcT ma) k) = do
  t <- primFork True ma
  let (tid', ids') = nextThreadId ids
  pure (Just (k tid'), ids', Forked t)
runStepThread ids (NewEmptyMVar k) = do
  v <- newEmptyMVar
  putMVar v Nothing
  let (mvid, ids') = nextMVarId ids
  let mvar = CTMVar mvid v
  pure (Just (k mvar), ids', BusinessAsUsual)
runStepThread ids k0@(PutMVar (CTMVar mvid v) a k) = do
  old <- takeMVar v
  case old of
    Just _  -> putMVar v old >> pure (Just k0, ids, Blocked mvid)
    Nothing -> putMVar v (Just a) >> pure (Just k, ids, Updated mvid)
runStepThread ids k0@(TakeMVar (CTMVar mvid v) k) = do
  old <- takeMVar v
  case old of
    Nothing -> putMVar v old >> pure (Just k0, ids, Blocked mvid)
    Just a  -> putMVar v Nothing >> pure (Just (k a), ids, Updated mvid)
runStepThread ids (Stop ma) = do
  ma
  pure (Nothing, ids, Killed)

primFork :: MonadConc m => Bool -> K.Cont (Action m) () -> m (Thread m)
primFork isOS ma = forkThread isOS (K.runCont ma (\_ -> Stop (pure ())))
```

This looks pretty horrible, but each case is fairly small, so just look at those.

Now we can run it (with a random scheduler for fun) and see that it works:

```haskell
test :: MonadConc m => m Int
test = do
  a <- newEmptyMVar
  b <- newMVar 2
  c <- newMVar 3
  forkOS (putMVar a b)
  forkOS (putMVar a c)
  forkOS (takeMVar b >> putMVar b 14)
  forkOS (takeMVar c >> putMVar c 15)
  takeMVar =<< takeMVar a

randomSched :: Scheduler R.StdGen
randomSched (t:|ts) g =
  let (i, g') = R.randomR (0, length ts) g
  in ((t:ts) !! i, g')

main :: IO ()
main = do
  g <- R.newStdGen
  print . fst =<< minifu randomSched g test
```

Giving:

```
λ> main
Just 14
λ> main
Just 2
λ> main
Just 14
λ> main
Just 2
λ> main
Just 14
λ> main
Just 14
λ> main
Just 15
λ> main
Just 15
```

That wasn’t so bad!

Mini Fu is much smaller than Deja Fu, but it demonstrates the key concepts. To get a multithreaded runtime into dejafu, I think the main change to this stuff is to figure out how thread communication is going to work: in dejafu proper, actions can change the continuation of an arbitrary thread (eg, throwing an exception to a thread will call its exception handler).

The overhead of this method compared to the single-threaded approach must be measured. It would be great to support bound threads, but not at the cost of everything else becoming much worse! If the overhead is bad, perhaps a hybrid approach could be used: unbound threads in the program-under-test are executed as they are currently, whereas bound threads get the fancy multithreaded implementation. It would complicate things, but possibly eliminate the overhead in the common case.

Finally, when the main thread terminates, any still-running ones should terminate as well, so the `Thread` record will need to contain the `ThreadId m` of the underlying monad, so `killThread` can be used.

As with the other CoCo memos, this is a Literate Haskell file. In the rendered output, highlighted source is literate source, non-highlighted source is just commentary.

We’re going to re-use the `Typeable` machinery as much as possible, as implementing it all ourselves is nasty.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}

import Data.Function (on)
import Data.List (nub)
import Data.Maybe (fromMaybe, isJust)
import Data.Proxy (Proxy(..))
import Data.Typeable
```

Firstly, we need a representation of type variables. We can’t use actual type variables, because `Typeable` only works with monomorphic types. So we’ll need to introduce some specific types that we shall treat as variables during type-checking. We can get an arbitrary number of these by using a polymorphic type:

```haskell
data TyVar t = TyVar
  deriving (Bounded, Enum, Eq, Ord, Read, Show)
```

For convenience, we shall also have four named type variables:

```haskell
type A = TyVar 0
type B = TyVar 1
type C = TyVar 2
type D = TyVar 3
```

If any type needs more than four distinct type variables, it can always introduce its own.

Finally, we can check if a type is a type variable:

```haskell
isTypeVar :: TypeRep -> Bool
isTypeVar = ((==) `on` (fst . splitTyConApp)) (typeRep (Proxy :: Proxy A))
```

The `Data.Typeable` module has a handy little function for checking that a function application is well-typed:

```haskell
funResultTy :: TypeRep -> TypeRep -> Maybe TypeRep
```

which takes the function type and the argument type, and returns the result type if the application is well-typed. We want something similar, but supporting our type variables. We don’t just want equality, we want *unification*. We want a function like so:

```haskell
-- | Applies a type to a function type. If type-correct, returns an environment binding type
-- variables to types, and the result type (with bindings applied).
polyFunResultTy :: TypeRep -> TypeRep -> Maybe (TypeEnv, TypeRep)
polyFunResultTy fty aty = do
  -- get the function and argument types
  let funTyCon = typeRepTyCon (typeRep (Proxy :: Proxy (() -> ())))
  (argTy, resultTy) <- case splitTyConApp fty of
    (con, [argTy, resultTy]) | con == funTyCon -> Just (argTy, resultTy)
    _ -> Nothing
  -- produce a (possibly empty) type environment
  env <- unify aty argTy
  -- apply the type environment to the result
  pure (env, applyBindings env resultTy)
```

**Type Environments:** Firstly, let’s define what a type environment actually is. We’ll keep it very simple: just a map from types to types:

```haskell
-- | An environment of type bindings.
type TypeEnv = [(TypeRep, TypeRep)]
```

The first in every tuple will be a `TyVar`, but there isn’t really a good way to statically enforce that.

Now that we have type environments, let’s apply them to a type. This need not make a type fully monomorphic: the bindings may not cover every type variable used, or may define some type variables in terms of others (as long as a variable isn’t defined in terms of itself). This is fine.

```haskell
-- | Apply type environment bindings to a type.
applyBindings :: TypeEnv -> TypeRep -> TypeRep
applyBindings env = go where
  go ty
    -- look up type variables in the environment, but fall back to the naked variable if not found
    | isTypeVar ty = fromMaybe ty (lookup ty env)
    -- otherwise continue recursively through type constructors
    | otherwise =
        let (con, args) = splitTyConApp ty
        in mkTyConApp con (map go args)
```
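The same recursive substitution can be sketched in Python, with types modelled as nested tuples and the environment as a dict (the representation here is made up for illustration):

```python
# Types are ("var", n) or ("con", name, [args]); the environment is a dict.
def apply_bindings(env, ty):
    if ty[0] == "var":
        return env.get(ty, ty)  # fall back to the naked variable if unbound
    _, name, args = ty
    return ("con", name, [apply_bindings(env, a) for a in args])

A, B = ("var", 0), ("var", 1)
INT = ("con", "Int", [])
# Either A B, with A bound to Int: A is replaced, B is left alone.
assert apply_bindings({A: INT}, ("con", "Either", [A, B])) == \
    ("con", "Either", [INT, B])
```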

**Unification:** For reasons that will become apparent later, we’re going to define a few variants of this.

Firstly, our standard unification function. It’ll take two types (the order doesn’t matter) and attempt to unify them. Two types unify if:

- They’re structurally equal; OR
- At least one is a type variable; OR
- They have the same constructor with the same number of arguments, and all the arguments unify, with compatible environments.

Hey, that sounds like a recursive function!

```haskell
-- | Attempt to unify two types.
unify :: TypeRep -> TypeRep -> Maybe [(TypeRep, TypeRep)]
unify = unify' True
```

This next function is the actual workhorse. It implements the recursive decision procedure described above and constructs the environment. It takes a flag to determine if unifying with a naked type variable is allowed here. It always is in the recursive case. This will turn out to be useful in the next section, when we’re talking about polymorphic uses of the state type.

```haskell
-- | Attempt to unify two types.
unify' :: Bool
       -- ^ Whether to allow either type to be a naked type variable at this level (always true in
       -- lower levels).
       -> TypeRep -> TypeRep -> Maybe TypeEnv
unify' b tyA tyB
  -- check equality
  | tyA == tyB = Just []
  -- check if one is a naked type variable
  | isTypeVar tyA = if not b || occurs tyA tyB then Nothing else Just [(tyA, tyB)]
  | isTypeVar tyB = if not b || occurs tyB tyA then Nothing else Just [(tyB, tyA)]
  -- deconstruct each and attempt to unify subcomponents
  | otherwise =
      let (conA, argsA) = splitTyConApp tyA
          (conB, argsB) = splitTyConApp tyB
      in if conA == conB && length argsA == length argsB
         then unifyAccum True id argsA argsB
         else Nothing
  where
    -- check if a type occurs in another
    occurs needle haystack = needle == haystack || any (occurs needle) (snd (splitTyConApp haystack))
```
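The decision procedure translates directly to other languages. A Python sketch over a toy type representation, with one flagged simplification: argument bindings are merged with a plain dict update, rather than by unifying the environments as the real code does:

```python
# Toy representation: ("var", n) is a type variable, ("con", name, args)
# a constructor application.
def unify(ty_a, ty_b, allow_var=True):
    if ty_a == ty_b:
        return {}
    if ty_a[0] == "var":
        return None if not allow_var or occurs(ty_a, ty_b) else {ty_a: ty_b}
    if ty_b[0] == "var":
        return None if not allow_var or occurs(ty_b, ty_a) else {ty_b: ty_a}
    _, con_a, args_a = ty_a
    _, con_b, args_b = ty_b
    if con_a != con_b or len(args_a) != len(args_b):
        return None
    env = {}
    for a, b in zip(args_a, args_b):
        sub = unify(a, b)  # variables are always allowed below the top level
        if sub is None:
            return None
        env.update(sub)  # simplification: the real code unifies environments
    return env

def occurs(needle, haystack):
    """Check if a type occurs within another."""
    return needle == haystack or (
        haystack[0] == "con" and any(occurs(needle, a) for a in haystack[2]))

A, B, INT = ("var", 0), ("var", 1), ("con", "Int", ())
assert unify(("con", "T", (A, INT)), ("con", "T", (INT, B))) == {A: INT, B: INT}
```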

The recursive listy case is handled by this `unifyAccum` function, which is mutually recursive with `unify'`:

```
-- | An accumulating unify: attempts to unify two lists of types pairwise
-- and checks that the resulting assignments do not conflict with the
-- current type environment.
unifyAccum :: Bool -> (Maybe TypeEnv -> Maybe TypeEnv) -> [TypeRep] -> [TypeRep] -> Maybe TypeEnv
unifyAccum b f as bs = foldr go (Just []) (zip as bs)
  where
    go (tyA, tyB) (Just env) = unifyTypeEnvs b env =<< f (unify' b tyA tyB)
    go _ Nothing = Nothing
```

The final piece of the unification puzzle is how to combine type environments. This is necessary to be able to unify types like `T A A` and `T Int B`. One option is to enforce equality of bindings, but that is too restrictive (it won’t work in the `T` example, as `Int` is not `B`, yet both *do* unify). The correct solution is to unify the bindings. This is yet another mutually recursive function:

```
-- | Unify two type environments, if possible.
unifyTypeEnvs :: Bool -> TypeEnv -> TypeEnv -> Maybe TypeEnv
unifyTypeEnvs b env1 env2 = foldr go (Just []) (nub $ map fst env1 ++ map fst env2)
  where
    go tyvar acc@(Just env) = case (lookup tyvar env, lookup tyvar env1, lookup tyvar env2) of
      (_, Just ty1, Just ty2) -> unifyTypeEnvs b env . ((tyvar, ty1):) =<< unify' b ty1 ty2
      (x, Just ty1, _)
        | isJust x -> unifyTypeEnvs b env [(tyvar, ty1)]
        | otherwise -> Just ((tyvar, ty1):env)
      (x, _, Just ty2)
        | isJust x -> unifyTypeEnvs b env [(tyvar, ty2)]
        | otherwise -> Just ((tyvar, ty2):env)
      _ -> acc
    go _ Nothing = Nothing
```

And now, an example:

```
λ> data T a b = T
λ> unify (typeOf (undefined :: T A A)) (typeOf (undefined :: T Int B))
Just [(TyVar Nat 0,TyVar Nat 1),(TyVar Nat 1,Int)]
λ> let funTy = typeOf (undefined :: A -> B -> Bool -> Either A B)
λ> polyFunResultTy funTy (typeOf (undefined::Int))
Just ([(TyVar Nat 0,Int)],TyVar Nat 1 -> Bool -> Either Int (TyVar Nat 1))
```

In a CoCo signature, the state type is monomorphic, but it can be nice to treat it as polymorphic for two reasons:

- Needing to change the types inside the signature because the type *of the signature* changed is a pain.
- It helps avoid repetition.

Here are a couple of function types that may appear in a signature:

```
MVar Concurrency Int -> Concurrency Int
MVar Concurrency Int -> Int -> Concurrency ()
```

If the state type is `MVar Concurrency Int`, then (1) is saying that if we change it to `MVar Concurrency Bool` we now need to change those two types above, and (2) is saying that we’re needlessly repeating the `Int`. It would be much nicer to have these types in the signature:

```
MVar Concurrency A -> Concurrency A
MVar Concurrency A -> A -> Concurrency ()
```

Now that we have implemented type unification, we can do this! For a function type in the signature:

- Try to unify every argument, *excluding naked type variables*, against the state type.
- Check that the environments are compatible.
- Apply the combined environment to the function type.

The effect of this is to monomorphise polymorphic uses of the state type. This is good because the state type often determines other argument types, and once we know the concrete types of function arguments we can infer what hole types we need. If functions have totally polymorphic types, hole inference doesn’t work so well.

```
-- | Monomorphise polymorphic uses of the state type in a function type.
monomorphise
  :: Typeable s
  => Proxy s  -- ^ The state type.
  -> TypeRep  -- ^ The function type.
  -> TypeRep
monomorphise s ty0 = fromMaybe ty0 $ do
  let stateTy = typeRep s
  argTys <- funArgTys ty0
  env <- unifyAccum False (maybe (Just []) Just) (repeat stateTy) argTys
  pure (applyBindings env ty0)
```

Now we see the purpose of the boolean argument to `unify'`. If we have the type `A -> MVar Concurrency A -> Concurrency ()`, we don’t want to unify `MVar Concurrency Int` with `A`; we want to skip over it! More generally, we want to avoid unifying a fully-polymorphic type with our state type but, as all our type variables have kind `*`, just preventing the top-level unification suffices.

Oh, we’ll also need this helper function to get all the argument types of a function:

```
-- | Get all of a function's argument types. Returns @Nothing@ if not a
-- function type.
funArgTys :: TypeRep -> Maybe [TypeRep]
funArgTys ty = case splitTyConApp ty of
    (con, [argTy, resultTy])
      | con == funTyCon -> Just $ argTy : fromMaybe [] (funArgTys resultTy)
    _ -> Nothing
  where
    funTyCon = typeRepTyCon (typeRep (Proxy :: Proxy (() -> ())))
```
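For contrast, the only function application base gives us is the fully monomorphic `funResultTy` from `Data.Typeable`; a small standalone sketch of its behaviour, which the unification machinery here generalises (as `polyFunResultTy` in the earlier example):

```haskell
import Data.Proxy (Proxy(..))
import Data.Typeable (TypeRep, funResultTy, typeRep)

fTy, intTy, boolTy, charTy :: TypeRep
fTy    = typeRep (Proxy :: Proxy (Int -> Bool))
intTy  = typeRep (Proxy :: Proxy Int)
boolTy = typeRep (Proxy :: Proxy Bool)
charTy = typeRep (Proxy :: Proxy Char)

-- funResultTy succeeds only on an exact argument-type match;
-- there is no unification of type variables involved
goodAp, badAp :: Maybe TypeRep
goodAp = funResultTy fTy intTy   -- argument type matches
badAp  = funResultTy fTy charTy  -- argument type does not match
```

Here `goodAp` comes back as the `Bool` result type and `badAp` as `Nothing`, which is exactly the gap the polymorphic machinery above fills.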

And we’re done!

```
λ> let s = Proxy :: Proxy (Either Int Bool)
λ> let f = typeOf (undefined :: A -> Either A B -> B)
λ> monomorphise s f
Int -> Either Int Bool -> Bool
```

The problem is: given two terms (from different schemas) and their results, how do we compare them? For instance, suppose we have two terms:

```
term1 = f (x :: Int) (y :: Int) (z :: Bool)
term2 = g (x :: Int) (y :: Bool) (z :: Bool)
```

Which fulfil the property that when the `z` in `term1` is equal to the `y` in `term2`, the terms have the same result. Each term introduces its own environment variable namespace, so what we want is a projection into a shared namespace, in which we can reason about the equalities we desire.

We can simplify the problem a little by not changing the number of unique variables inside a term, only restricting ourselves to identifying (or not) variables across terms. Here are some possible projections of those terms:

```
f x1 y1 z1 =?= g x1 y2 z2
f x1 y1 z1 =?= g x2 z1 z2
f x1 y1 z1 =?= g x2 y2 z1
f x1 y1 z1 =?= g x2 y1 z2
and so on
```

So we want to produce a *set* of all such type-correct projections.

This memo is a literate Haskell file (though confusingly I mix source with examples, which are NOT syntax-highlighted), and we’ll need this:

```
{-# LANGUAGE LambdaCase #-}
```

**The `These` type**: I recently read a blog post entitled *These, Align, and Crosswalk* about safely zipping (or merging) data structures. The core of the blog post is the `These` type, defined as so:

```
-- | The @These@ type is like 'Either', but also has the case for when we
-- have both values.
data These a b = This a | That b | These a b
  deriving Show
```

I don’t *think* I need anything else from the these package, so I’ll just inline that definition here. It turns out to be exactly the thing we need!
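To see why `These` is the right shape, here is a standalone sketch of alignment on lists (the these package’s `Data.Align` provides the general version; the `These` definition is repeated so the block runs on its own):

```haskell
data These a b = This a | That b | These a b
  deriving (Eq, Show)

-- zip without truncation: matched positions become These,
-- and leftover elements of either list survive as This/That
align :: [a] -> [b] -> [These a b]
align (x:xs) (y:ys) = These x y : align xs ys
align xs     []     = map This xs
align []     ys     = map That ys
```

The three constructors give exactly the three fates a variable can have when two namespaces are merged: only-left, only-right, or identified.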

Firstly, we need a representation of terms, from which we can extract variables. We don’t actually need anything other than variable names and types, so let’s just go for that:

```
type Name = String
type Type = Int
type Term = [(Name, Type)]
```

Our challenge is to find two functions:

```
projections :: Term -> Term -> [[(These Name Name, Type)]]
renaming :: (Type -> Name) -> [(These Name Name, Type)] -> ([(Name, Name)], [(Name, Name)])
```

Where a `These String String` value represents a type-correct renaming of variables:

- `This s` means a variable from the left term is kept distinct from all variables in the right term.
- `That s` means a variable from the right term is kept distinct from all variables in the left term.
- `These s1 s2` means a variable from the left term is identified with a variable from the right term.

```
-- | Find all type-correct ways of associating variables.
projections :: Term -> Term -> [[(These Name Name, Type)]]
projections t1 [] = [[(This v, ty) | (v, ty) <- t1]]
projections [] t2 = [[(That v, ty) | (v, ty) <- t2]]
projections ((vL, tyL):t1) t2 =
     concat [ map ((These vL vR, tyL) :) (projections t1 (filter (/=x) t2))
            | x@(vR, tyR) <- t2, tyL == tyR ]
  ++ map ((This vL, tyL) :) (projections t1 t2)
```

By appending the `concat` to the `map` we can instead generate in order from most general (most `This`/`That` usage) to least general (most `These` usage).
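To make the ordering and the combinatorics concrete, here is the same `projections` function run standalone (the type synonyms and `These` definition are repeated so the block is self-contained):

```haskell
type Name = String
type Type = Int
type Term = [(Name, Type)]

data These a b = This a | That b | These a b
  deriving Show

projections :: Term -> Term -> [[(These Name Name, Type)]]
projections t1 [] = [[(This v, ty) | (v, ty) <- t1]]
projections [] t2 = [[(That v, ty) | (v, ty) <- t2]]
projections ((vL, tyL):t1) t2 =
     concat [ map ((These vL vR, tyL) :) (projections t1 (filter (/= x) t2))
            | x@(vR, tyR) <- t2, tyL == tyR ]
  ++ map ((This vL, tyL) :) (projections t1 t2)
```

With two variables of one type on each side there are seven projections, one per partial matching (both pairs identified, four single identifications, two ways to cross-match, and everything distinct), and the first one emitted identifies the most variables; variables of different types are never identified, so two mismatched singletons admit exactly one projection.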

Now that we have projections, we can produce consistent renamings:

```
-- | Given a projection into a common namespace, produce a consistent
-- variable renaming. Variables of the same type, after the first, will
-- have a number appended starting from 1.
renaming :: (Type -> Name) -> [(These Name Name, Type)] -> ([(Name, Name)], [(Name, Name)])
renaming varf = go [] ([], [])
  where
    go e x ((these, ty):rest) =
      let name = varf ty
      in rename e x name (maybe 0 (+1) $ lookup name e) these rest
    go e x [] = x

    rename e ~(l, r) name n =
      let name' = if n == 0 then name else name ++ show n
      in \case
           This  vL    -> go ((name, n):e) ((vL, name'):l, r)
           That     vR -> go ((name, n):e) (l, (vR, name'):r)
           These vL vR -> go ((name, n):e) ((vL, name'):l, (vR, name'):r)
```

The two steps can be combined:

```
-- | Find all consistent renamings of a pair of terms.
renamings :: (Type -> Name) -> Term -> Term -> [([(Name, Name)], [(Name, Name)])]
renamings varf t1 t2 = map (renaming varf) (projections t1 t2)
```

My only regret is that I found no use for the fancier functions in the these package.

This memo is about exhaustively generating schemas. Let’s go!

```
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE StandaloneDeriving #-}

import Control.Monad (filterM)
import Data.Function (on)
import Data.IntMap (IntMap)
import qualified Data.IntMap as M
import Data.Maybe (maybeToList)
import Data.Proxy (Proxy(..))
import Data.Semigroup (Semigroup, (<>))
import Data.Set (Set)
import qualified Data.Set as S
import Data.Typeable (Typeable, TypeRep, funResultTy, splitTyConApp, typeRep)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)
```

To concretely tie down what we’re doing, we’re going to generate expressions in *size* order. Expression size here corresponds roughly to the number of nodes in the tree:

```
sizeOf :: Exp m h -> Int
sizeOf (Lit _ _) = 1
sizeOf (Var _ _) = 1
sizeOf (Bind _ b e) = 1 + sizeOf b + sizeOf e
sizeOf (Let _ b e) = 1 + sizeOf b + sizeOf e
sizeOf (Ap _ f e) = sizeOf f + sizeOf e
```

In order to avoid duplicates, we’re going to want sets of expressions. We’re going to cheat a little because we can’t actually compare `BDynamic` values, so we’ll just compare their types, and hope that that plus the string representations in `Lit` will be enough to disambiguate.

```
instance Eq BDynamic where
  (==) = (==) `on` bdynTypeRep

instance Ord BDynamic where
  compare = compare `on` bdynTypeRep

deriving instance Eq h => Eq (Var h)
deriving instance Ord h => Ord (Var h)
deriving instance Eq h => Eq (Exp m h)
deriving instance Ord h => Ord (Exp m h)
```

We’re going to keep quite a simple interface for our schema generator; we shall have:

- A type of generators, which is a map from size to generated schemas of that size.
- A function to get all the schemas of a size.
- A function to make a new generator from a collection of “primitive” schemas.
- A function to generate a size, assuming all smaller sizes have been generated.

```
-- | A generator of schemas, in size order.
newtype Generator1 m = Generator1 { tiers1 :: IntMap (Set (Schema m)) }

-- | Get all schemas of the given size, if generated.
schemas1 :: Generator1 m -> Int -> Set (Schema m)
schemas1 g i = M.findWithDefault S.empty i (tiers1 g)
```

**Creation**: to make a new generator, we’ll just plug the provided schemas into the appropriate tiers.

```
-- | Create a new generator from a set of initial schemas.
create1 :: [Schema m] -> Generator1 m
create1 initial = Generator1 $
  M.unionsWith S.union [M.singleton (sizeOf s) (S.singleton s) | s <- initial]
```

**Generation**: now we see the benefit of all the smart `Maybe`-returning constructors: we can do the incredibly naive thing of just trying all correctly-sized combinations of already-known schemas. The ones which return a `Just` are good and shall be kept.

```
-- | Generate schemas of the given size, assuming all smaller tiers have
-- been generated.
generate1 :: (Applicative m, Typeable m) => Int -> Generator1 m -> Generator1 m
generate1 i g = Generator1 $ M.unionsWith S.union
    [ tiers1 g
    , M.singleton i aps
    , M.singleton i binds
    , M.singleton i lets
    ]
  where
    -- sizeOf (ap f e) = 0 + sizeOf f + sizeOf e
    aps = makeTerms 0 $ \terms candidates ->
      [ new
      | f <- terms
      , e <- candidates
      , new <- maybeToList (ap f e)
      ]

    -- sizeOf (bind is b e) = 1 + sizeOf b + sizeOf e
    binds = makeTerms 1 $ \terms candidates ->
      [ new
      | b <- terms
      , e <- candidates
      , holeset <- powerset . map fst $ holes e
      , new <- maybeToList (bind holeset b e)
      ]

    -- sizeOf (let_ is b e) = 1 + sizeOf b + sizeOf e
    lets = makeTerms 1 $ \terms candidates ->
      [ new
      | b <- terms
      , e <- candidates
      , holeset <- powerset . map fst $ holes e
      , new <- maybeToList (let_ holeset b e)
      ]

    makeTerms n f = M.foldMapWithKey go (tiers1 g)
      where
        go tier terms = S.fromList $
          let candidates = schemas1 g (i - tier - n)
          in f (S.toList terms) (S.toList candidates)

    powerset = filterM (const [False, True])
```

Here’s a small demo:

```
demo1 :: Generator1 IO
demo1 = create1
  [ hole $ typeRep (Proxy :: Proxy Int)
  , lit "0"    . toBDynamic $ (0 :: Int)
  , lit "1"    . toBDynamic $ (1 :: Int)
  , lit "+"    . toBDynamic $ ((+) :: Int -> Int -> Int)
  , lit "*"    . toBDynamic $ ((*) :: Int -> Int -> Int)
  , lit "pure" . toBDynamic $ (pure :: Int -> IO Int)
  ]
```

```
λ> let upto n = mapM_ print . S.toList $ schemas1 (foldl (flip generate1) demo1 [1..n]) n
λ> upto 1
*
+
0
1
pure
(_ :: Int)
λ> upto 2
(pure) (0)
(pure) (1)
(pure) ((_ :: Int))
(*) (0)
(*) (1)
(*) ((_ :: Int))
(+) (0)
(+) (1)
(+) ((_ :: Int))
λ> upto 3
let <*> in <0>
let <*> in <1>
let <*> in <(_ :: Int)>
let <+> in <0>
let <+> in <1>
let <+> in <(_ :: Int)>
let <0> in <0>
let <0> in <1>
let <0> in <(_ :: Int)>
let <0> in <(b0 :: Int)>
let <1> in <0>
let <1> in <1>
let <1> in <(_ :: Int)>
let <1> in <(b0 :: Int)>
let <pure> in <0>
let <pure> in <1>
let <pure> in <(_ :: Int)>
let <(_ :: Int)> in <0>
let <(_ :: Int)> in <1>
let <(_ :: Int)> in <(_ :: Int)>
let <(_ :: Int)> in <(b0 :: Int)>
let <*> in <*>
let <*> in <+>
let <+> in <*>
let <+> in <+>
let <0> in <*>
let <0> in <+>
let <1> in <*>
let <1> in <+>
let <pure> in <*>
let <pure> in <+>
let <(_ :: Int)> in <*>
let <(_ :: Int)> in <+>
let <*> in <pure>
let <+> in <pure>
let <0> in <pure>
let <1> in <pure>
let <pure> in <pure>
let <(_ :: Int)> in <pure>
((*) (0)) (0)
((*) (0)) (1)
((*) (0)) ((_ :: Int))
((*) (1)) (0)
((*) (1)) (1)
((*) (1)) ((_ :: Int))
((*) ((_ :: Int))) (0)
((*) ((_ :: Int))) (1)
((*) ((_ :: Int))) ((_ :: Int))
((+) (0)) (0)
((+) (0)) (1)
((+) (0)) ((_ :: Int))
((+) (1)) (0)
((+) (1)) (1)
((+) (1)) ((_ :: Int))
((+) ((_ :: Int))) (0)
((+) ((_ :: Int))) (1)
((+) ((_ :: Int))) ((_ :: Int))
```
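The tier-combining pattern can be distilled into a standalone sketch on a toy type: generating all binary-tree shapes in size order (the `Gen`, `treesOf`, and `generate` names here are mine, not CoCo's, and the only "smart constructor" is `Node`):

```haskell
import qualified Data.IntMap as M
import qualified Data.Set as S

-- toy "expression" type: binary tree shapes, sized by node count
data Tree = Leaf | Node Tree Tree
  deriving (Eq, Ord, Show)

sizeOfT :: Tree -> Int
sizeOfT Leaf = 0
sizeOfT (Node l r) = 1 + sizeOfT l + sizeOfT r

-- a map from size to all shapes of that size
newtype Gen = Gen { tiers :: M.IntMap (S.Set Tree) }

treesOf :: Gen -> Int -> S.Set Tree
treesOf g i = M.findWithDefault S.empty i (tiers g)

create :: [Tree] -> Gen
create ts = Gen (M.fromListWith S.union [(sizeOfT t, S.singleton t) | t <- ts])

-- generate size i by combining smaller tiers; a Node itself contributes 1
generate :: Int -> Gen -> Gen
generate i g = Gen (M.insertWith S.union i new (tiers g))
  where
    new = S.fromList
      [ Node l r
      | j <- [0 .. i - 1]
      , l <- S.toList (treesOf g j)
      , r <- S.toList (treesOf g (i - 1 - j))
      ]
```

Running `foldl (flip generate) (create [Leaf]) [1..3]` fills tiers 1 to 3, and the tier sizes come out as the Catalan numbers 1, 1, 2, 5, which is a handy sanity check that no combination is missed or duplicated.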

The current generator is nice and simple, but produces some uninteresting terms:

- Let-bindings where the body has no holes bound: `let <1> in <pure>`
- Let- and monadic- bindings where the binder is a hole: `let <(_ :: Int)> in <(b0 :: Int)>`

Note that we do want to keep monadic bindings where the body has no holes bound, as the monadic bind may cause an interesting effect.

Furthermore, we may want to store additional information with the generated schemas, which we can use to prune generation further, and record information for further use.

```
-- | A generator of schemas with metadata, in size order.
newtype Generator2 m ann = Generator2 { tiers2 :: IntMap (Set (Schema m, ann)) }
```

Generation is now a bit more involved:

```
-- | Generate schemas of the given size, assuming all smaller tiers have
-- been generated.
generate2 :: (Applicative m, Typeable m, Semigroup ann, Ord ann)
  => (ann -> ann -> Schema m -> Bool)
  -- ^ A predicate to filter generated schemas.
  -> Int
  -> Generator2 m ann
  -> Generator2 m ann
generate2 annp i g = Generator2 $ M.unionsWith S.union
    [ tiers2 g
    , M.singleton i aps
    , M.singleton i binds
    , M.singleton i lets
    ]
  where
    aps = makeTerms 0 $ \terms candidates ->
      [ (new, fAnn <> eAnn) -- produce a new annotation by combining the old
      | (f, fAnn) <- terms
      , (e, eAnn) <- candidates
      , new <- maybeToList (ap f e)
      -- check the new expression and old annotations against the predicate
      , annp fAnn eAnn new
      ]

    binds = makeTerms 1 $ \terms candidates ->
      [ (new, bAnn <> eAnn) -- produce a new annotation by combining the old
      | (b, bAnn) <- terms
      -- don't allow a binder which is a hole
      , case b of Var _ (Hole _) -> False; _ -> True
      , (e, eAnn) <- candidates
      , holeset <- powerset . map fst $ holes e
      , new <- maybeToList (bind holeset b e)
      -- check the new expression and old annotations against the predicate
      , annp bAnn eAnn new
      ]

    lets = makeTerms 1 $ \terms candidates ->
      [ (new, bAnn <> eAnn) -- produce a new annotation by combining the old
      | (b, bAnn) <- terms
      -- don't allow a binder which is a hole
      , case b of Var _ (Hole _) -> False; _ -> True
      , (e, eAnn) <- candidates
      , holeset <- powerset . map fst $ holes e
      -- don't allow an empty holeset
      , not (null holeset)
      , new <- maybeToList (let_ holeset b e)
      -- check the new expression and old annotations against the predicate
      , annp bAnn eAnn new
      ]

    makeTerms n f = M.foldMapWithKey go (tiers2 g)
      where
        go tier terms = S.fromList $
          let candidates = schemas2 g (i - tier - n)
          in f (S.toList terms) (S.toList candidates)

    powerset = filterM (const [False, True])
```

The `schemas` and `create` code are basically the same:

```
-- | Get all schemas of the given size, if generated.
schemas2 :: Generator2 m ann -> Int -> Set (Schema m, ann)
schemas2 g i = M.findWithDefault S.empty i (tiers2 g)

-- | Create a new generator from a set of initial schemas.
create2 :: Ord ann => [(Schema m, ann)] -> Generator2 m ann
create2 initial = Generator2 $
  M.unionsWith S.union [M.singleton (sizeOf e) (S.singleton s) | s@(e,_) <- initial]
```

Our demo now looks much better:

```
demo2 :: Generator2 IO ()
demo2 = create2 $ map (\e -> (e, ()))
  [ hole $ typeRep (Proxy :: Proxy Int)
  , lit "0"    . toBDynamic $ (0 :: Int)
  , lit "1"    . toBDynamic $ (1 :: Int)
  , lit "+"    . toBDynamic $ ((+) :: Int -> Int -> Int)
  , lit "*"    . toBDynamic $ ((*) :: Int -> Int -> Int)
  , lit "pure" . toBDynamic $ (pure :: Int -> IO Int)
  ]
```

```
λ> let upto n = mapM_ print . S.toList $ schemas2 (foldl (flip (generate2 (\_ _ _ -> True))) demo2 [1..n]) n
λ> upto 1
(*,())
(+,())
(0,())
(1,())
(pure,())
((_ :: Int),())
λ> upto 2
((pure) (0),())
((pure) (1),())
((pure) ((_ :: Int)),())
((*) (0),())
((*) (1),())
((*) ((_ :: Int)),())
((+) (0),())
((+) (1),())
((+) ((_ :: Int)),())
λ> upto 3
(let <0> in <(b0 :: Int)>,())
(let <1> in <(b0 :: Int)>,())
(((*) (0)) (0),())
(((*) (0)) (1),())
(((*) (0)) ((_ :: Int)),())
(((*) (1)) (0),())
(((*) (1)) (1),())
(((*) (1)) ((_ :: Int)),())
(((*) ((_ :: Int))) (0),())
(((*) ((_ :: Int))) (1),())
(((*) ((_ :: Int))) ((_ :: Int)),())
(((+) (0)) (0),())
(((+) (0)) (1),())
(((+) (0)) ((_ :: Int)),())
(((+) (1)) (0),())
(((+) (1)) (1),())
(((+) (1)) ((_ :: Int)),())
(((+) ((_ :: Int))) (0),())
(((+) ((_ :: Int))) (1),())
(((+) ((_ :: Int))) ((_ :: Int)),())
```

The Mark 3-ish expression types and smart constructors:

```
data BDynamic = BDynamic { bdynTypeRep :: TypeRep, bdynAny :: Any }

toBDynamic :: forall a. Typeable a => a -> BDynamic
toBDynamic a = BDynamic (typeRep (Proxy :: Proxy a)) (unsafeCoerce a)

data Exp (m :: * -> *) (h :: *)
  = Lit String BDynamic
  | Var TypeRep (Var h)
  | Bind TypeRep (Exp m h) (Exp m h)
  | Let TypeRep (Exp m h) (Exp m h)
  | Ap TypeRep (Exp m h) (Exp m h)

instance Show (Exp m h) where
  show (Lit s _) = s
  show (Var ty v) = "(" ++ show v ++ " :: " ++ show ty ++ ")"
  show (Bind _ b e) = "bind <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Let _ b e) = "let <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Ap _ f e) = "(" ++ show f ++ ") (" ++ show e ++ ")"

data Var h = Hole h | Named String | Bound Int

instance Show (Var h) where
  show (Hole _) = "_"
  show (Named s) = s
  show (Bound i) = 'b' : show i

type Schema m = Exp m ()

data Ignore = Ignore
  deriving (Bounded, Enum, Eq, Ord, Read, Show)

typeOf :: Exp m h -> TypeRep
typeOf (Lit _ dyn) = bdynTypeRep dyn
typeOf (Var ty _) = ty
typeOf (Bind ty _ _) = ty
typeOf (Let ty _ _) = ty
typeOf (Ap ty _ _) = ty

lit :: String -> BDynamic -> Exp m h
lit = Lit

hole :: TypeRep -> Schema m
hole ty = Var ty (Hole ())

let_ :: [Int] -> Schema m -> Schema m -> Maybe (Schema m)
let_ is b e0 = Let (typeOf e0) b <$> letOrBind is (typeOf b) e0

bind :: forall m. Typeable m => [Int] -> Schema m -> Schema m -> Maybe (Schema m)
bind is b e0 = case (splitTyConApp (typeOf b), splitTyConApp (typeOf e0)) of
    ((btyCon, btyArgs), (etyCon, etyArgs))
      | btyCon == mtyCon && btyCon == etyCon &&
        not (null btyArgs) && not (null etyArgs) &&
        mtyArgs == init btyArgs && init btyArgs == init etyArgs
        -> Bind (typeOf e0) b <$> letOrBind is (last btyArgs) e0
    _ -> Nothing
  where
    (mtyCon, mtyArgs) = splitTyConApp (typeRep (Proxy :: Proxy m))

ap :: forall m h. (Applicative m, Typeable m) => Exp m h -> Exp m h -> Maybe (Exp m h)
ap f e = case (splitTyConApp (typeOf f), splitTyConApp (typeOf e)) of
    ((_, [fargTy, fresTy]), (etyCon, etyArgs))
      | fargTy == ignoreTy && etyCon == mtyCon &&
        not (null etyArgs) && mtyArgs == init etyArgs
        -> Just (Ap fresTy f e)
      | otherwise -> (\ty -> Ap ty f e) <$> typeOf f `funResultTy` typeOf e
    _ -> Nothing
  where
    ignoreTy = typeRep (Proxy :: Proxy (m Ignore))
    (mtyCon, mtyArgs) = splitTyConApp (typeRep (Proxy :: Proxy m))

letOrBind :: [Int] -> TypeRep -> Exp m h -> Maybe (Exp m h)
letOrBind is boundTy e0 = fst <$> go 0 0 e0
  where
    go n i (Var ty (Hole h))
      | i `elem` is = if boundTy == ty then Just (Var ty (Bound n), i + 1) else Nothing
      | otherwise = Just (Var ty (Hole h), i + 1)
    go n i (Bind ty b e) = do
      (b', i') <- go n i b
      (e', i'') <- go (n+1) i' e
      Just (Bind ty b' e', i'')
    go n i (Let ty b e) = do
      (b', i') <- go n i b
      (e', i'') <- go (n+1) i' e
      Just (Let ty b' e', i'')
    go n i (Ap ty f e) = do
      (f', i') <- go n i f
      (e', i'') <- go n i' e
      Just (Ap ty f' e', i'')
    go _ i e = Just (e, i)

holes :: Schema m -> [(Int, TypeRep)]
holes = fst . go 0
  where
    go i (Var ty (Hole _)) = ([(i, ty)], i + 1)
    go i (Let _ b e) =
      let (bhs, i') = go i b
          (ehs, i'') = go i' e
      in (bhs ++ ehs, i'')
    go i (Ap _ f e) =
      let (fhs, i') = go i f
          (ehs, i'') = go i' e
      in (fhs ++ ehs, i'')
    go i _ = ([], i)
```

This is a simplified version of a problem I’ve been having in CoCo. The current handling of variables in CoCo is very poor: the programmer has to specify exactly which variables may be introduced, binds and lets cause shadowing, and no care is taken to avoid generating alpha-equivalent terms.

Here are two representative problems:

- If you have the variable `x`, and no others, a term like this will not be generated: `f >>= \x1 -> g x1 x`
- If you have the variables `x` and `y` of the same type, these equivalent terms will be generated: `f x` and `f y`

Let’s get started…

```
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Dynamic (Dynamic, dynApply, dynTypeRep)
import Data.Function (on)
import Data.List (groupBy, nub, sortOn)
import Data.Maybe (mapMaybe, maybeToList)
import Data.Ord (Down(..))
import Data.Proxy (Proxy(..))
import Data.Typeable (Typeable, TypeRep, funResultTy, splitTyConApp, typeOf, typeRep)
import Data.Void (Void, absurd)
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)
```

Our expressions are representations of Haskell code, which makes this a bit unlike most toy expression evaluators you see in tutorials. We want everything to be very typed, and we do not expose the constructors of the expression data type, instead using smart constructors to ensure that only well-typed expressions can be created.

It’s this typing that makes it really difficult to use the bound library here, as bound doesn’t play so nicely with typed variables. You can infer types for variables later, but I don’t really want to implement that.

```
-- | A type for expressions. This would not be exported by actual library
-- code, to prevent the user from mucking around with the types.
data Exp1
  = Lit1 String Dynamic
  -- ^ Literal values are just dynamically-wrapped Haskell values.
  | Hole1 TypeRep
  -- ^ "Holes" are free variables, identified by their position in the tree.
  | Named1 TypeRep String
  -- ^ Variables which are bound by a provided environment. More on this later.
  | Bound1 TypeRep Int
  -- ^ Variables which are bound by a let, with de Bruijn indices. All
  -- variables bound to the same let must have the correct type.
  | Let1 TypeRep Exp1 Exp1
  -- ^ The binder is assumed to be of the correct type for all the
  -- variables bound to it.
  | Ap1 TypeRep Exp1 Exp1
  -- ^ The parameter is assumed to be of the correct type for the function.

instance Show Exp1 where
  show (Lit1 s _) = s
  show (Hole1 ty) = "(_ :: " ++ show ty ++ ")"
  show (Named1 ty s) = "(" ++ s ++ " :: " ++ show ty ++ ")"
  show (Bound1 ty i) = "(" ++ show i ++ " :: " ++ show ty ++ ")"
  show (Let1 _ b e) = "let <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Ap1 _ f e) = "(" ++ show f ++ ") (" ++ show e ++ ")"

-- | Get the type of an expression.
typeOf1 :: Exp1 -> TypeRep
typeOf1 (Lit1 _ dyn) = dynTypeRep dyn
typeOf1 (Hole1 ty) = ty
typeOf1 (Named1 ty _) = ty
typeOf1 (Bound1 ty _) = ty
typeOf1 (Let1 ty _ _) = ty
typeOf1 (Ap1 ty _ _) = ty

-- | Get all the holes in an expression, identified by position.
holes1 :: Exp1 -> [(Int, TypeRep)]
holes1 = fst . go 0
  where
    go i (Hole1 ty) = ([(i, ty)], i + 1)
    go i (Let1 _ b e) =
      let (bhs, i') = go i b
          (ehs, i'') = go i' e
      in (bhs ++ ehs, i'')
    go i (Ap1 _ f e) =
      let (fhs, i') = go i f
          (ehs, i'') = go i' e
      in (fhs ++ ehs, i'')
    go i _ = ([], i)

-- | Get all the named variables in an expression.
names1 :: Exp1 -> [(String, TypeRep)]
names1 = nub . go
  where
    go (Named1 ty s) = [(s, ty)]
    go (Let1 _ b e) = go b ++ go e
    go (Ap1 _ f e) = go f ++ go e
    go _ = []
```

**Smart constructors**: we only want to be able to produce type-correct expressions, for two reasons: the evaluator in CoCo is more complex and does a lot of unsafe coercion, and being able to just call `error` when the types don’t work out is nicer than needing to actually handle it; and it makes it easier to generate terms programmatically, as you simply try all possibilities and keep the ones which succeed.

```
-- | Construct a literal value.
lit1 :: String -> Dynamic -> Exp1
lit1 = Lit1

-- | Construct a typed hole.
hole1 :: TypeRep -> Exp1
hole1 = Hole1

-- | Perform a function application, if type-correct.
ap1 :: Exp1 -> Exp1 -> Maybe Exp1
ap1 f e = (\ty -> Ap1 ty f e) <$> typeOf1 f `funResultTy` typeOf1 e

-- | Bind a collection of holes, if type-correct.
--
-- The binding is applied "atomically", in that you don't need to worry
-- about holes disappearing and so changing their position-indices while
-- this operation happens; however the position of unbound holes may be
-- altered in the result of this function.
let1 :: [Int] -> Exp1 -> Exp1 -> Maybe Exp1
let1 is b e0 = Let1 (typeOf1 e0) b . fst <$> go 0 0 e0
  where
    go n i (Hole1 ty)
      | i `elem` is = if typeOf1 b == ty then Just (Bound1 ty n, i + 1) else Nothing
      | otherwise = Just (Hole1 ty, i + 1)
    go n i (Let1 ty b e) = do
      (b', i') <- go n i b
      (e', i'') <- go (n+1) i' e
      Just (Let1 ty b' e', i'')
    go n i (Ap1 ty f e) = do
      (f', i') <- go n i f
      (e', i'') <- go n i' e
      Just (Ap1 ty f' e', i'')
    go _ i e = Just (e, i)

-- | Give names to holes, if type-correct.
--
-- This has the same indexing behaviour as 'let1'.
name1 :: [(Int, String)] -> Exp1 -> Maybe Exp1
name1 is e0 = (\(e,_,_) -> e) <$> go [] 0 e0
  where
    go env i n@(Named1 ty s) = case lookup s env of
      -- if a name gets re-used it had better be at the same type!
      Just sty
        | ty == sty -> Just (n, env, i)
        | otherwise -> Nothing
      Nothing -> Just (n, env, i)
    go env i (Hole1 ty) = case lookup i is of
      Just s -> case lookup s env of
        Just sty
          | ty == sty -> Just (Named1 ty s, env, i + 1)
          | otherwise -> Nothing
        Nothing -> Just (Named1 ty s, (s,ty):env, i + 1)
      Nothing -> Just (Hole1 ty, env, i + 1)
    go env i (Let1 ty b e) = do
      (b', env', i') <- go env i b
      (e', env'', i'') <- go env' i' e
      Just (Let1 ty b' e', env'', i'')
    go env i (Ap1 ty f e) = do
      (f', env', i') <- go env i f
      (e', env'', i'') <- go env' i' e
      Just (Ap1 ty f' e', env'', i'')
    go env i e = Just (e, env, i)
```

**Evaluation**: now we have everything in place to evaluate expressions with no unbound holes. Everything is type-correct-by-construction, so if there are no holes (and the global environment has everything we need) we can get out a value.

```
-- | Evaluate an expression if it has no holes.
eval1 :: [(String, Dynamic)] -> Exp1 -> Maybe Dynamic
eval1 globals = go []
  where
    -- the local environment is a list of values, with each new scope
    -- prepending a value; this means that the de Bruijn indices correspond
    -- with list indices
    go locals (Let1 _ b e) = (\dyn -> go (dyn:locals) e) =<< go locals b
    go locals (Bound1 _ n) = Just (locals !! n)
    -- named variables come from the global environment
    go _ (Named1 ty s) = case lookup s globals of
      Just dyn | dynTypeRep dyn == ty -> Just dyn
      _ -> Nothing
    -- the other operations don't care about either environment
    go locals (Ap1 _ f e) = do
      dynf <- go locals f
      dyne <- go locals e
      dynf `dynApply` dyne
    go _ (Lit1 _ dyn) = Just dyn
    go _ (Hole1 _) = Nothing
```

**Removing Holes**: we still have one more problem: it would be nice for holes to be given names automatically, and not just individual holes but groups of holes too.

For example, if we have an expression like so:

```
f (_ :: Int) (_ :: Bool) (_ :: Bool) (_ :: Int)
```

It would be nice to be able to generate these expressions automatically:

```
f (w :: Int) (x :: Bool) (y :: Bool) (z :: Int)
f (w :: Int) (x :: Bool) (y :: Bool) (w :: Int)
f (w :: Int) (x :: Bool) (x :: Bool) (z :: Int)
f (w :: Int) (x :: Bool) (x :: Bool) (w :: Int)
```

It would be particularly nice if they were generated in a list in that order, from most free to most constrained.

```
-- | From an expression that may have holes, generate a list of expressions
-- with named variables substituted instead, ordered from most free (one
-- hole per variable) to most constrained (multiple holes per variable).
--
-- This takes a function to assign a letter to each type; subsequent
-- variables of the same type have digits appended.
terms1 :: (TypeRep -> Char) -> Exp1 -> [Exp1]
terms1 nf = sortOn (Down . length . names1) . go
  where
    go e0 = case hs e0 of
      [] -> [e0]
      (chosen:_) -> concatMap go
        [ e
        | ps <- partitions chosen
        , let (((_,tyc):_):_) = ps
        , let vname i = if i == 0 then [nf tyc] else nf tyc : show i
        , let naming = concat $ zipWith (\i vs -> [(v, vname i) | (v,_) <- vs]) [0..] ps
        , e <- maybeToList (name1 naming e0)
        ]

    -- holes grouped by type
    hs = groupBy ((==) `on` snd) . sortOn snd . holes1

    -- all partitions of a list
    partitions (x:xs) = [[x]:p | p <- partitions xs]
                     ++ [(x:ys):yss | (ys:yss) <- partitions xs]
    partitions [] = [[]]
```

Here’s an example from ghci:

```
λ> let intHole = hole1 $ T.typeOf (5 :: Int)
λ> let boolHole = hole1 $ T.typeOf True
λ> let ibf = lit1 "f" (D.toDyn ((\_ _ _ a -> a) :: Int -> Bool -> Bool -> Int -> Int))
λ> let ibfExp = fromJust $ do { x <- ibf `ap1` intHole; y <- x `ap1` boolHole; z <- y `ap1` boolHole; z `ap1` intHole }
λ> mapM_ print $ terms1 (head . show) ibfExp
((((f) ((I :: Int))) ((B :: Bool))) ((B1 :: Bool))) ((I1 :: Int))
((((f) ((I :: Int))) ((B :: Bool))) ((B1 :: Bool))) ((I :: Int))
((((f) ((I :: Int))) ((B :: Bool))) ((B :: Bool))) ((I1 :: Int))
((((f) ((I :: Int))) ((B :: Bool))) ((B :: Bool))) ((I :: Int))
```

Pretty sweet!
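The grouping order comes from the `partitions` helper inside `terms1`; extracted to run standalone it behaves like this (note that, as written, it only ever merges an element into the *first* block of a partition of the rest, so for three elements it yields four groupings rather than all five set partitions):

```haskell
-- the partitions helper from terms1, extracted to run standalone
partitions :: [a] -> [[[a]]]
partitions (x:xs) = [[x]:p | p <- partitions xs]
                 ++ [(x:ys):yss | (ys:yss) <- partitions xs]
partitions []     = [[]]
```

For `[1,2,3]` this produces, in order: all singletons, then `[[1],[2,3]]`, then `[[1,2],[3]]`, then everything merged, which is where the most-free-to-most-constrained tendency of the generated namings comes from (before the final `sortOn`).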

What we have now is pretty good, but it leaves a little to be desired: it would be nice to be able to statically forbid passing expressions with holes to `eval`. As always in Haskell, the solution is to add another type parameter.

```
data Exp2 h
  = Lit2 String Dynamic
  | Var2 TypeRep (Var2 h)
  -- ^ One constructor for holes, named, and bound variables.
  | Let2 TypeRep (Exp2 h) (Exp2 h)
  | Ap2 TypeRep (Exp2 h) (Exp2 h)

instance Show (Exp2 h) where
  show (Lit2 s _) = s
  show (Var2 ty v) = "(" ++ show v ++ " :: " ++ show ty ++ ")"
  show (Let2 _ b e) = "let <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Ap2 _ f e) = "(" ++ show f ++ ") (" ++ show e ++ ")"

data Var2 h
  = Hole2 h       -- ^ Holes get a typed tag.
  | Named2 String -- ^ Environment variables.
  | Bound2 Int    -- ^ Let-bound variables.

instance Show (Var2 h) where
  show (Hole2 _) = "_"
  show (Named2 s) = s
  show (Bound2 i) = show i
```

**Schemas and Terms**: what does this hole tag buy us? Well, actually, it lets us very simply forbid the presence of holes! Constructing an `h` value is required to construct a hole, so if we set it to `Void`, then no holes can be made at all! If the tag is some inhabited type, then an expression may contain holes (but need not).
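The trick is worth seeing in miniature; here is a sketch with a hypothetical two-constructor AST (the names `E`, `close`, and `evalClosed` are mine, not CoCo's):

```haskell
import Data.Void (Void, absurd)

-- hypothetical mini-AST, just to illustrate the hole-tagging trick
data E h = EHole h | ELit Int

-- with h ~ Void the EHole case is unconstructible, so this is total
evalClosed :: E Void -> Int
evalClosed (ELit n)  = n
evalClosed (EHole v) = absurd v  -- statically unreachable

-- the only way to get an E Void is to prove there are no holes
close :: E () -> Maybe (E Void)
close (ELit n)  = Just (ELit n)
close (EHole _) = Nothing
```

`evalClosed <$> close e` then evaluates exactly the hole-free expressions, and the unreachable case is discharged with `absurd` rather than a runtime error.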

Let’s introduce two type synonyms to talk about these:

```haskell
-- | A schema is an expression which may contain holes. A single schema may correspond to many
-- terms.
type Schema2 = Exp2 ()

-- | A term is an expression with no holes. Many terms may correspond to a single schema.
type Term2 = Exp2 Void

-- | Convert a Schema into a Term if there are no holes.
toTerm2 :: Schema2 -> Maybe Term2
toTerm2 (Lit2 s dyn) = Just (Lit2 s dyn)
toTerm2 (Var2 ty v) = case v of
  Hole2 _  -> Nothing
  Named2 s -> Just (Var2 ty (Named2 s))
  Bound2 i -> Just (Var2 ty (Bound2 i))
toTerm2 (Let2 ty b e) = Let2 ty <$> toTerm2 b <*> toTerm2 e
toTerm2 (Ap2 ty f e)  = Ap2 ty <$> toTerm2 f <*> toTerm2 e
```

**Evaluation & Hole Removal**: now we can evaluate *terms* after removing holes from *schemas*. We get a static guarantee that we’re dealing with all of our holes properly, nice!

```haskell
-- | Evaluate a term
eval2 :: [(String, Dynamic)] -> Term2 -> Maybe Dynamic
eval2 globals = go [] where
  go locals (Let2 _ b e) = (\dyn -> go (dyn:locals) e) =<< go locals b
  go locals v@(Var2 _ _) = env locals v
  go locals (Ap2 _ f e) = do
    dynf <- go locals f
    dyne <- go locals e
    dynf `dynApply` dyne
  go _ (Lit2 _ dyn) = Just dyn

  env locals (Var2 _ (Bound2 n))
    | length locals > n = Just (locals !! n)
    | otherwise = Nothing
  env _ (Var2 ty (Named2 s)) = case lookup s globals of
    Just dyn | dynTypeRep dyn == ty -> Just dyn
    _ -> Nothing
  env _ (Var2 _ (Hole2 v)) = absurd v -- this is actually unreachable now

-- | From a schema that may have holes, generate a list of terms with named variables
-- substituted instead.
terms2 :: (TypeRep -> Char) -> Schema2 -> [Term2]
terms2 nf = mapMaybe toTerm2 . sortOn (Down . length . names2) . go where
  go e0 = case hs e0 of
    [] -> [e0]
    (chosen:_) -> concatMap go
      [ e
      | ps <- partitions chosen
      , let (((_,tyc):_):_) = ps
      , let vname i = if i == 0 then [nf tyc] else nf tyc : show i
      , let naming = concat $ zipWith (\i vs -> [(v, vname i) | (v,_) <- vs]) [0..] ps
      , e <- maybeToList (name2 naming e0)
      ]

  -- holes grouped by type
  hs = groupBy ((==) `on` snd) . sortOn snd . holes2

  -- all partitions of a list
  partitions (x:xs) =
    [[x]:p | p <- partitions xs] ++
    [(x:ys):yss | (ys:yss) <- partitions xs]
  partitions [] = [[]]
```

The rest of the code hasn’t changed much, but is included for completeness:

```haskell
-- | Get the type of an expression.
typeOf2 :: Exp2 h -> TypeRep
typeOf2 (Lit2 _ dyn) = dynTypeRep dyn
typeOf2 (Var2 ty _) = ty
typeOf2 (Let2 ty _ _) = ty
typeOf2 (Ap2 ty _ _) = ty

-- | Get all the holes in an expression, identified by position.
holes2 :: Schema2 -> [(Int, TypeRep)]
holes2 = fst . go 0 where
  go i (Var2 ty (Hole2 _)) = ([(i, ty)], i + 1) -- tag is ignored
  go i (Let2 _ b e) =
    let (bhs, i')  = go i  b
        (ehs, i'') = go i' e
    in (bhs ++ ehs, i'')
  go i (Ap2 _ f e) =
    let (fhs, i')  = go i  f
        (ehs, i'') = go i' e
    in (fhs ++ ehs, i'')
  go i _ = ([], i)

-- | Get all the named variables in an expression.
names2 :: Exp2 h -> [(String, TypeRep)]
names2 = nub . go where
  go (Var2 ty (Named2 s)) = [(s, ty)]
  go (Let2 _ b e) = go b ++ go e
  go (Ap2 _ f e) = go f ++ go e
  go _ = []

-- | Construct a literal value.
lit2 :: String -> Dynamic -> Exp2 h
lit2 = Lit2

-- | Construct a typed hole.
hole2 :: TypeRep -> Schema2
hole2 ty = Var2 ty (Hole2 ()) -- holes get tagged with unit

-- | Perform a function application, if type-correct.
ap2 :: Exp2 h -> Exp2 h -> Maybe (Exp2 h)
ap2 f e = (\ty -> Ap2 ty f e) <$> typeOf2 f `funResultTy` typeOf2 e

-- | Bind a collection of holes, if type-correct.
let2 :: [Int] -> Schema2 -> Schema2 -> Maybe Schema2
let2 is b e0 = Let2 (typeOf2 e0) b . fst <$> go 0 0 e0 where
  go n i (Var2 ty (Hole2 h))
    | i `elem` is = if typeOf2 b == ty
                    then Just (Var2 ty (Bound2 n), i + 1) -- tag is ignored
                    else Nothing
    | otherwise = Just (Var2 ty (Hole2 h), i + 1) -- tag is preserved
  go n i (Let2 ty b e) = do
    (b', i')  <- go n i b
    (e', i'') <- go (n+1) i' e
    Just (Let2 ty b' e', i'')
  go n i (Ap2 ty f e) = do
    (f', i')  <- go n i f
    (e', i'') <- go n i' e
    Just (Ap2 ty f' e', i'')
  go _ i e = Just (e, i)

-- | Give names to holes, if type-correct.
name2 :: [(Int, String)] -> Schema2 -> Maybe Schema2
name2 is e0 = (\(e,_,_) -> e) <$> go [] 0 e0 where
  go env i n@(Var2 ty (Named2 s)) = case lookup s env of
    Just sty
      | ty == sty -> Just (n, env, i)
      | otherwise -> Nothing
    Nothing -> Just (n, env, i)
  go env i (Var2 ty (Hole2 h)) = case lookup i is of
    Just s -> case lookup s env of
      Just sty
        | ty == sty -> Just (Var2 ty (Named2 s), env, i + 1) -- tag is ignored
        | otherwise -> Nothing
      Nothing -> Just (Var2 ty (Named2 s), (s,ty):env, i + 1) -- tag is ignored
    Nothing -> Just (Var2 ty (Hole2 h), env, i + 1) -- tag is preserved
  go env i (Let2 ty b e) = do
    (b', env', i')   <- go env i b
    (e', env'', i'') <- go env' i' e
    Just (Let2 ty b' e', env'', i'')
  go env i (Ap2 ty f e) = do
    (f', env', i')   <- go env i f
    (e', env'', i'') <- go env' i' e
    Just (Ap2 ty f' e', env'', i'')
  go env i e = Just (e, env, i)
```

In the Mark 3 evaluator, we’re going to need a function of type `Monad m => m Dynamic -> Dynamic`, which “pushes” the `m` inside the `Dynamic`, and one of type `Monad m => Dynamic -> Maybe (m Dynamic)`, which pulls it back out. Unfortunately, Data.Dynamic doesn’t provide a way to do this, for good reason: it’s impossible in general! There’s no way to know what the type of the dynamic value inside the monad is, so there’s no way to do this safely.

Fortunately, implementing a Data.Dynamic-lite is pretty simple.

```haskell
-- | A dynamic value is a pair of its type and 'Any'. 'Any' is a magical type which is guaranteed
-- to work with 'unsafeCoerce'.
data BDynamic = BDynamic { bdynTypeRep :: TypeRep, bdynAny :: Any }

instance Show BDynamic where
  show = show . bdynTypeRep
```

(`BDynamic` for “barrucadu’s dynamic”)

We need to be able to construct and deconstruct dynamic values. These operations do use `unsafeCoerce`, but are safe:

```haskell
-- | Convert an arbitrary value into a dynamic one.
toBDynamic :: Typeable a => a -> BDynamic
toBDynamic a = BDynamic (typeOf a) (unsafeCoerce a)

-- | Convert a dynamic value into an ordinary value, if the types match.
fromBDynamic :: Typeable a => BDynamic -> Maybe a
fromBDynamic (BDynamic ty v) = case unsafeCoerce v of
  -- this is a bit mind-bending, but the 'typeOf r' here is the type of the 'a', as 'unsafeCoerce
  -- v :: a' (regardless of whether it actually is an 'a' value or not). The same result could be
  -- achieved using ScopedTypeVariables and 'typeRep'.
  r | ty == typeOf r -> Just r
    | otherwise -> Nothing
```

The final operation needed for the Mark 1 and Mark 2 implementations is function application:

```haskell
-- | Dynamically-typed function application.
bdynApply :: BDynamic -> BDynamic -> Maybe BDynamic
bdynApply (BDynamic ty1 f) (BDynamic ty2 x) = case funResultTy ty1 ty2 of
  Just ty3 -> Just (BDynamic ty3 ((unsafeCoerce f) x))
  Nothing -> Nothing
```

Now we can construct our strange monad-shuffling operations:

```haskell
-- | "Push" a functor inside a dynamic value, given the type of the resultant value.
--
-- This is unsafe because if the type is incorrect and the value is later used as that type, good
-- luck.
unsafeWrapFunctor :: Functor f => TypeRep -> f BDynamic -> BDynamic
unsafeWrapFunctor ty fdyn = BDynamic ty (unsafeCoerce $ fmap bdynAny fdyn)

-- | "Extract" a functor from a dynamic value.
unwrapFunctor :: forall f. (Functor f, Typeable f) => BDynamic -> Maybe (f BDynamic)
unwrapFunctor (BDynamic ty v) = case splitTyConApp ty of
  (tyCon, tyArgs)
    | tyCon == ftyCon && not (null tyArgs) && init tyArgs == ftyArgs ->
      Just $ BDynamic (last tyArgs) <$> unsafeCoerce v
  _ -> Nothing
  where
    (ftyCon, ftyArgs) = splitTyConApp (typeRep (Proxy :: Proxy f))
```

It’s almost a shame that Data.Dynamic doesn’t expose enough to implement this. It has gone for a safe but limited API. A common Haskell “design” pattern is to have safe public APIs and unsafe but publicly-exposed “internal” APIs, but base doesn’t seem to follow that.

This expression representation is pretty nice, but it’s rather cumbersome to express monadic operations, for a few reasons:

- Everything is monomorphic, so there would need to be a separate `lit` for `>>=` at every desired type.
- Due to function application having a `Maybe` result, even at a single type, writing `ap2 (lit2 ((>>=) :: Type)) e1 >>= \f -> ap2 f e2` is not nice.
- The original need for this expression representation was for generating Haskell terms, and generating lambda terms is tricky; it would be nice to be able to bind holes directly.

This calls for a third representation of expressions. For reasons that will become apparent when looking at the evaluator, we’ll specialise this to only working in one monad, and track the monad type as another parameter of `Exp`:

```haskell
data Exp3 (m :: * -> *) (h :: *)
  = Lit3 String BDynamic
  | Var3 TypeRep (Var3 h)
  | Bind3 TypeRep (Exp3 m h) (Exp3 m h)
  | Let3 TypeRep (Exp3 m h) (Exp3 m h)
  | Ap3 TypeRep (Exp3 m h) (Exp3 m h)

instance Show (Exp3 m h) where
  show (Lit3 s _) = s
  show (Var3 ty v) = "(" ++ show v ++ " :: " ++ show ty ++ ")"
  show (Bind3 _ b e) = "bind <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Let3 _ b e) = "let <" ++ show b ++ "> in <" ++ show e ++ ">"
  show (Ap3 _ f e) = "(" ++ show f ++ ") (" ++ show e ++ ")"
```

**Construction**: bind is going to be treated just as a let with special evaluation rules. This means that de Bruijn indices will be able to refer to a bind or a let. Rather than have two separate counters for those, we’ll just put everything in the same namespace (index-space?).

```haskell
-- | Bind a collection of holes, if type-correct.
let3 :: [Int] -> Schema3 m -> Schema3 m -> Maybe (Schema3 m)
let3 is b e0 = Let3 (typeOf3 e0) b <$> letOrBind3 is (typeOf3 b) e0

-- | Monadically bind a collection of holes, if type-correct.
--
-- This has the same indexing behaviour as 'let3'.
bind3 :: forall m. Typeable m => [Int] -> Schema3 m -> Schema3 m -> Maybe (Schema3 m)
bind3 is b e0 = case (splitTyConApp (typeOf3 b), splitTyConApp (typeOf3 e0)) of
  ((btyCon, btyArgs), (etyCon, etyArgs))
    | btyCon == mtyCon && btyCon == etyCon &&
      not (null btyArgs) && not (null etyArgs) &&
      mtyArgs == init btyArgs && init btyArgs == init etyArgs ->
      Bind3 (typeOf3 e0) b <$> letOrBind3 is (last btyArgs) e0
  _ -> Nothing
  where
    (mtyCon, mtyArgs) = splitTyConApp (typeRep (Proxy :: Proxy m))

-- | A helper for 'bind3' and 'let3': bind holes to the top of the expression.
letOrBind3 :: [Int] -> TypeRep -> Exp3 m h -> Maybe (Exp3 m h)
letOrBind3 is boundTy e0 = fst <$> go 0 0 e0 where
  go n i (Var3 ty (Hole3 h))
    | i `elem` is = if boundTy == ty then Just (Var3 ty (Bound3 n), i + 1) else Nothing
    | otherwise = Just (Var3 ty (Hole3 h), i + 1)
  go n i (Bind3 ty b e) = do -- a new case for Bind3, the same as the case for Let3
    (b', i')  <- go n i b
    (e', i'') <- go (n+1) i' e
    Just (Bind3 ty b' e', i'')
  go n i (Let3 ty b e) = do
    (b', i')  <- go n i b
    (e', i'') <- go (n+1) i' e
    Just (Let3 ty b' e', i'')
  go n i (Ap3 ty f e) = do
    (f', i')  <- go n i f
    (e', i'') <- go n i' e
    Just (Ap3 ty f' e', i'')
  go _ i e = Just (e, i)
```

We could make `letOrBind3` also work for `name3` by carrying around a third state token and letting the `Var3` case apply an arbitrary function.

**Evaluation**: the new bind case, unfortunately, complicates things somewhat here. It’s *much* more awkward to deal with errors during evaluation, but fortunately the only errors that can actually arise are unbound named variables: the smart constructors ensure expressions are well-typed and have valid de Bruijn indices. This means we can just check the named variables for validity up front and then use `error` once we’re sure there actually are no errors.

```haskell
-- | Evaluate a term
eval3 :: forall m. (Monad m, Typeable m) => [(String, BDynamic)] -> Term3 m -> Maybe BDynamic
eval3 globals e0
  | all check (names3 e0) = Just (go [] e0)
  | otherwise = Nothing
  where
    go locals (Bind3 ty b e) =
      case (unwrapFunctor :: BDynamic -> Maybe (m BDynamic)) (go locals b) of
        Just mdyn -> unsafeWrapFunctor ty $ mdyn >>= \dyn ->
          case unwrapFunctor (go (dyn:locals) e) of
            Just dyn -> dyn
            Nothing -> error "type error I can't deal with here!" -- this is unreachable
        Nothing -> error "type error I can't deal with here!" -- this is unreachable
    go locals (Let3 _ b e) = go (go locals b : locals) e
    go locals v@(Var3 _ _) = case env locals v of
      Just dyn -> dyn
      Nothing -> error "environment error I can't deal with here!" -- this is unreachable
    go locals (Ap3 _ f e) = case go locals f `bdynApply` go locals e of
      Just dyn -> dyn
      Nothing -> error "type error I can't deal with here!" -- this is unreachable
    go _ (Lit3 _ dyn) = dyn

    env locals (Var3 _ (Bound3 n))
      | length locals > n = Just (locals !! n)
      | otherwise = Nothing
    env _ (Var3 ty (Named3 s)) = case lookup s globals of
      Just dyn | bdynTypeRep dyn == ty -> Just dyn
      _ -> Nothing
    env _ (Var3 _ (Hole3 v)) = absurd v

    check (s, ty) = case lookup s globals of
      Just dyn -> bdynTypeRep dyn == ty
      Nothing -> False
```

Now it becomes apparent why the monad type parameter is needed in the expression type: the evaluator uses `>>=`, and so it needs to know which monad to bind in. An alternative would be to use a type like the one below, but this still restricts you to using a single monad and so doesn’t gain anything:

```haskell
eval3alt :: (Monad m, Typeable m) => proxy m -> [(String, BDynamic)] -> Term3 -> Maybe BDynamic
```

Here’s a little example showing that side-effects do work (when I first did this, they didn’t, so it’s not quite trivial to get right):

```
λ> r <- newIORef (5::Int)
λ> let intHole = hole3 $ T.typeOf (5::Int)
λ> let plusLit = lit3 "+" . toBDynamic $ ((+) :: Int -> Int -> Int)
λ> let plusTwo = fromJust $ (fromJust $ plusLit `ap3` intHole) `ap3` intHole
λ> let pureInt = lit3 "pure" . toBDynamic $ (pure :: Int -> IO Int)
λ> let plusTwoIO = fromJust $ pureInt `ap3` plusTwo
λ> let intAndTimes = (lit3 "*2" . toBDynamic $ (modifyIORef r (*7) >> pure (7::Int))) :: Exp3 IO h
λ> let eval = fromJust $ (fromBDynamic :: BDynamic -> Maybe (IO Int)) =<< eval3 [] =<< toTerm3 =<< bind3 [0,1] intAndTimes plusTwoIO
λ> eval
14
λ> readIORef r
7
λ> eval
14
λ> readIORef r
49
```

Again, the rest of the code hasn’t changed much, but is included for completeness.

```haskell
data Var3 h = Hole3 h | Named3 String | Bound3 Int

instance Show (Var3 h) where
  show (Hole3 _) = "_"
  show (Named3 s) = s
  show (Bound3 i) = show i

-- | A schema is an expression which may contain holes.
type Schema3 m = Exp3 m ()

-- | A term is an expression with no holes.
type Term3 m = Exp3 m Void

-- | Convert a Schema into a Term if there are no holes.
toTerm3 :: Schema3 m -> Maybe (Term3 m)
toTerm3 (Lit3 s dyn) = Just (Lit3 s dyn)
toTerm3 (Var3 ty v) = case v of
  Hole3 _  -> Nothing
  Named3 s -> Just (Var3 ty (Named3 s))
  Bound3 i -> Just (Var3 ty (Bound3 i))
toTerm3 (Bind3 ty b e) = Bind3 ty <$> toTerm3 b <*> toTerm3 e
toTerm3 (Let3 ty b e) = Let3 ty <$> toTerm3 b <*> toTerm3 e
toTerm3 (Ap3 ty f e) = Ap3 ty <$> toTerm3 f <*> toTerm3 e

-- | Get the type of an expression.
typeOf3 :: Exp3 m h -> TypeRep
typeOf3 (Lit3 _ dyn) = bdynTypeRep dyn
typeOf3 (Var3 ty _) = ty
typeOf3 (Bind3 ty _ _) = ty
typeOf3 (Let3 ty _ _) = ty
typeOf3 (Ap3 ty _ _) = ty

-- | Get all the holes in an expression, identified by position.
holes3 :: Schema3 m -> [(Int, TypeRep)]
holes3 = fst . go 0 where
  go i (Var3 ty (Hole3 _)) = ([(i, ty)], i + 1) -- tag is ignored
  go i (Bind3 _ b e) = -- a new case for Bind3, the same as the case for Let3
    let (bhs, i')  = go i  b
        (ehs, i'') = go i' e
    in (bhs ++ ehs, i'')
  go i (Let3 _ b e) =
    let (bhs, i')  = go i  b
        (ehs, i'') = go i' e
    in (bhs ++ ehs, i'')
  go i (Ap3 _ f e) =
    let (fhs, i')  = go i  f
        (ehs, i'') = go i' e
    in (fhs ++ ehs, i'')
  go i _ = ([], i)

-- | Get all the named variables in an expression.
names3 :: Exp3 m h -> [(String, TypeRep)]
names3 = nub . go where
  go (Var3 ty (Named3 s)) = [(s, ty)]
  go (Bind3 _ b e) = go b ++ go e -- a new case for Bind3
  go (Let3 _ b e) = go b ++ go e
  go (Ap3 _ f e) = go f ++ go e
  go _ = []

-- | Construct a literal value.
lit3 :: String -> BDynamic -> Exp3 m h
lit3 = Lit3

-- | Construct a typed hole.
hole3 :: TypeRep -> Schema3 m
hole3 ty = Var3 ty (Hole3 ())

-- | Perform a function application, if type-correct.
ap3 :: Exp3 m h -> Exp3 m h -> Maybe (Exp3 m h)
ap3 f e = (\ty -> Ap3 ty f e) <$> typeOf3 f `funResultTy` typeOf3 e

-- | Give names to holes, if type-correct.
name3 :: [(Int, String)] -> Schema3 m -> Maybe (Schema3 m)
name3 is e0 = (\(e,_,_) -> e) <$> go [] 0 e0 where
  go env i n@(Var3 ty (Named3 s)) = case lookup s env of
    Just sty
      | ty == sty -> Just (n, env, i)
      | otherwise -> Nothing
    Nothing -> Just (n, env, i)
  go env i (Var3 ty (Hole3 h)) = case lookup i is of
    Just s -> case lookup s env of
      Just sty
        | ty == sty -> Just (Var3 ty (Named3 s), env, i + 1)
        | otherwise -> Nothing
      Nothing -> Just (Var3 ty (Named3 s), (s,ty):env, i + 1)
    Nothing -> Just (Var3 ty (Hole3 h), env, i + 1)
  go env i (Bind3 ty b e) = do -- a new case for Bind3, the same as the case for Let3
    (b', env', i')   <- go env i b
    (e', env'', i'') <- go env' i' e
    Just (Bind3 ty b' e', env'', i'')
  go env i (Let3 ty b e) = do
    (b', env', i')   <- go env i b
    (e', env'', i'') <- go env' i' e
    Just (Let3 ty b' e', env'', i'')
  go env i (Ap3 ty f e) = do
    (f', env', i')   <- go env i f
    (e', env'', i'') <- go env' i' e
    Just (Ap3 ty f' e', env'', i'')
  go env i e = Just (e, env, i)

-- | From a schema that may have holes, generate a list of terms with named variables
-- substituted instead.
terms3 :: (TypeRep -> Char) -> Schema3 m -> [Term3 m]
terms3 nf = mapMaybe toTerm3 . sortOn (Down . length . names3) . go where
  go e0 = case hs e0 of
    [] -> [e0]
    (chosen:_) -> concatMap go
      [ e
      | ps <- partitions chosen
      , let (((_,tyc):_):_) = ps
      , let vname i = if i == 0 then [nf tyc] else nf tyc : show i
      , let naming = concat $ zipWith (\i vs -> [(v, vname i) | (v,_) <- vs]) [0..] ps
      , e <- maybeToList (name3 naming e0)
      ]

  -- holes grouped by type
  hs = groupBy ((==) `on` snd) . sortOn snd . holes3

  -- all partitions of a list
  partitions (x:xs) =
    [[x]:p | p <- partitions xs] ++
    [(x:ys):yss | (ys:yss) <- partitions xs]
  partitions [] = [[]]
```

The representation so far is good, and lets us express everything we want, but it’s still not very friendly to use in one common case: polymorphic monadic functions.

There are many monadic operations of the type `Monad m => m a -> m ()`: the actual type of the first argument is ignored. At the moment, dealing with such terms requires either specialising that `a` to each concrete type used, or using something like `void` and specialising *that*.

Implementing full-blown Haskell polymorphism would be a pain, but this is a small and irritating enough case that it’s worth dealing with.

Presenting (trumpets please), the “ignore” type:

```haskell
-- | A special type for enabling basic polymorphism.
--
-- A function parameter of type @m Ignore@ unifies with values of any type @m a@, where @fmap
-- (const Ignore)@ is applied to the parameter automatically. This avoids the need to clutter
-- expressions with calls to 'void', or some other such function.
data Ignore = Ignore
  deriving (Bounded, Enum, Eq, Ord, Read, Show)
```

`Ignore` is going to give us our limited polymorphism, by changing the typing rules for `ap3` and the evaluation rules for `Ap3` slightly.

**Application**: function application is as normal, with the exception that if the formal parameter has type `m Ignore` and the actual parameter has type `m a`, for any `a`, then the application also succeeds:

```haskell
-- | Perform a function application, if type-correct.
--
-- There is a special case, see the comment of the 'Ignore' type.
ap3ig :: forall m h. (Applicative m, Typeable m) => Exp3 m h -> Exp3 m h -> Maybe (Exp3 m h)
ap3ig f e = case (splitTyConApp (typeOf3 f), splitTyConApp (typeOf3 e)) of
  ((_, [fargTy,fresTy]), (etyCon, etyArgs))
    -- check if the formal parameter is of type @m Ignore@ and the actual parameter is of type @m a@
    | fargTy == ignoreTy && etyCon == mtyCon &&
      not (null etyArgs) && mtyArgs == init etyArgs -> Just (Ap3 fresTy f e)
    -- otherwise try normal function application
    | otherwise -> (\ty -> Ap3 ty f e) <$> typeOf3 f `funResultTy` typeOf3 e
  _ -> Nothing
  where
    ignoreTy = typeOf (pure Ignore :: m Ignore)
    (mtyCon, mtyArgs) = splitTyConApp (typeRep (Proxy :: Proxy m))
```

**Evaluation**: evaluation of applications has an analogous case. When applying a function, the type of the formal parameter is checked and, if it’s `m Ignore`, the argument gets `fmap (const Ignore)` applied:

```haskell
-- | Evaluate a term
eval3ig :: forall m. (Monad m, Typeable m) => [(String, BDynamic)] -> Term3 m -> Maybe BDynamic
eval3ig globals e0
  | all check (names3 e0) = Just (go [] e0)
  | otherwise = Nothing
  where
    go locals (Bind3 ty b e) =
      case (unwrapFunctor :: BDynamic -> Maybe (m BDynamic)) (go locals b) of
        Just mdyn -> unsafeWrapFunctor ty $ mdyn >>= \dyn ->
          case unwrapFunctor (go (dyn:locals) e) of
            Just dyn -> dyn
            Nothing -> error "type error I can't deal with here!"
        Nothing -> error "type error I can't deal with here!"
    go locals (Let3 _ b e) = go (go locals b : locals) e
    go locals v@(Var3 _ _) = case env locals v of
      Just dyn -> dyn
      Nothing -> error "environment error I can't deal with here!"
    go locals (Ap3 _ f e) =
      let f' = go locals f
          e' = go locals e
      in case f' `bdynApply` (if hasIgnoreArg f' then ignore e' else e') of
        Just dyn -> dyn
        Nothing -> error "type error I can't deal with here!"
    go _ (Lit3 _ dyn) = dyn

    env locals (Var3 _ (Bound3 n))
      | length locals > n = Just (locals !! n)
      | otherwise = Nothing
    env _ (Var3 ty (Named3 s)) = case lookup s globals of
      Just dyn | bdynTypeRep dyn == ty -> Just dyn
      _ -> Nothing
    env _ (Var3 _ (Hole3 v)) = absurd v

    hasIgnoreArg fdyn =
      let (_, [fargTy,_]) = splitTyConApp (bdynTypeRep fdyn)
      in fargTy == ignoreTy

    ignore dyn = case (unwrapFunctor :: BDynamic -> Maybe (m BDynamic)) dyn of
      Just ma -> unsafeToBDynamic ignoreTy (const Ignore <$> ma)
      Nothing -> error "non-monadic value I can't deal with here!" -- this is unreachable

    ignoreTy = typeOf (pure Ignore :: m Ignore)

    check (s, ty) = case lookup s globals of
      Just dyn -> bdynTypeRep dyn == ty
      Nothing -> False
```

The final piece of the puzzle is this:

```haskell
-- | Convert an arbitrary value into a dynamic value, given its type.
--
-- This is unsafe because if the type is incorrect and the value is later used as that type, good
-- luck.
unsafeToBDynamic :: TypeRep -> a -> BDynamic
unsafeToBDynamic ty = BDynamic ty . unsafeCoerce
```

And a demo:

```
λ> r <- newIORef (5::Int)
λ> let double = lit3 "double" (toBDynamic ((\x -> x >> x >> pure ()) :: IO Ignore -> IO ())) :: Exp3 IO h
λ> let addOne = lit3 "addOne" (toBDynamic (modifyIORef r (+1))) :: Exp3 IO h
λ> let addTwo = fromJust $ double `ap3ig` addOne
λ> let eval = fromJust $ (fromBDynamic :: BDynamic -> Maybe (IO ())) =<< eval3ig [] =<< toTerm3 addTwo
λ> eval
λ> readIORef r
7
λ> eval
λ> readIORef r
9
```

- Run your concurrent program, recording an execution trace.
- Look back through the trace for pairs of actions which *might* give a different result if done in the other order. These are called *dependent actions*.
- For each pair of dependent actions, attempt to schedule the latter before the former by inserting a context switch.

It works pretty well. The space of potential schedules is cut down by a *huge* proportion. In papers presenting specific partial-order algorithms, the authors will typically give a little example like so:

We assume a core concurrent language consisting of reads and writes to shared variables. In this context, two reads are independent, but two writes, or a read and a write, are dependent if they are to the same variable.
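This core-language dependency relation, and the trace scan from the recipe above, can be sketched directly. This is a minimal illustration with made-up types, not dejafu’s actual API:

```haskell
-- Illustrative types, not dejafu's API: an action is a read or a write of a
-- named shared variable.
data Action = Read String | Write String
  deriving (Eq, Show)

-- | The variable an action touches.
var :: Action -> String
var (Read  v) = v
var (Write v) = v

-- | Two actions are dependent if they touch the same variable and at least
-- one is a write; two reads are always independent.
dependent :: Action -> Action -> Bool
dependent a b = var a == var b && (isWrite a || isWrite b)
  where
    isWrite (Write _) = True
    isWrite _         = False

-- | Positions of pairs of actions in a trace of (thread id, action) which
-- might give a different result if reordered.
dependentPairs :: [(Int, Action)] -> [(Int, Int)]
dependentPairs trace =
  [ (i, j)
  | (i, (t1, a1)) <- zip [0..] trace
  , (j, (t2, a2)) <- zip [0..] trace
  , i < j, t1 /= t2, dependent a1 a2
  ]
```

For the trace `[(0, Read "x"), (1, Read "x"), (1, Write "x")]`, the only candidate pair is the first thread’s read against the second thread’s later write.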

So here’s a little example: we have two threads, `T0` and `T1`, which are executing concurrently:

```
T0:          T1:
  read x       read x
               write x 1
```

Initially `x = 0`. If we take the result of this program to be the value of `x` that `T0` reads, then there are clearly two possible results, and here are possible traces which could lead to them:

```
T0: read x        T1: read x
T1: read x        T1: write x 1
T1: write x 1     T0: read x

result = 0        result = 1
```

If we run this program under a non-preemptive scheduler starting with `T0`, it’ll run `T0` until it terminates, then run `T1`, giving the left trace above. The algorithm will then note the dependency between `T1: write x 1` and `T0: read x`, and so schedule `T1` before `T0`, giving the right trace above. This will then give rise to a third execution:

```
T1: read x
T0: read x
T1: write x 1

result = 0
```

But it doesn’t change the result. You might think an obvious optimisation is to apply the context switch before as many independent actions as possible, which would then not give a new unique schedule to try. Unfortunately this isn’t sound in general: what if the write is conditional on the read? The execution trace only contains *visible* actions; if-statements and the like are not shown.
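To make that concrete, imagine a variant of `T1` where the write is guarded (illustrative pseudocode):

```
T1: read x
    if x == 0:
        write x 1
```

Whether the write, and hence the dependency, exists at all now depends on the value `T1` reads, so a reordering that looks redundant from one execution’s trace may behave quite differently when actually run.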

I made an assumption when I ran that program. Can you see what it is?

**[pause for dramatic effect]**

I am assuming that the program only terminates after every thread terminates! If the program instead terminates when `T0` terminates, we would get this trace:

```
T0: read x

result = 0
```

There is no dependency between reads, so `T1` would never be scheduled. In this context, `T1` is a *daemon thread*. Daemon threads do not block program termination; they are killed by the runtime after all non-daemon threads terminate. In Haskell, all threads other than the main thread are daemon threads. Variants of this problem have cropped up in a few places in dejafu and bitten me. I couldn’t find anything written about this online, so I decided to document it for posterity.

There are two solutions:

1. Make your dependency function *smarter*, by having a special case: two actions are dependent if one of them is the last action in the trace. This handles daemon threads, and also cases where the program did not terminate gracefully but was aborted by the testing framework: for instance, if you bound the length of an execution, you still want to explore different schedules within that bound.
2. Add a special “stop” action to the end of the main thread, which is dependent with everything. This does *not* handle the case where execution is aborted by the test framework, only when execution terminates gracefully.
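Solution 2 can be sketched in a few lines. The types here are made up for illustration; dejafu’s real stop action and dependency function are more involved:

```haskell
-- Illustrative types: reads and writes of named variables, plus a special
-- stop action appended to the end of the main thread.
data Action = Read String | Write String | Stop
  deriving (Eq, Show)

-- | The stop action is dependent with everything, so daemon-thread actions
-- always get a chance to be scheduled before the main thread terminates.
dependent :: Action -> Action -> Bool
dependent Stop _ = True
dependent _ Stop = True
dependent (Read  _) (Read  _) = False
dependent (Read  v) (Write w) = v == w
dependent (Write v) (Read  w) = v == w
dependent (Write v) (Write w) = v == w
```

In the example above, `T1: write x 1` is now dependent with the main thread’s stop action, so the schedule where `T1` runs first still gets explored.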

The dpor library implements a variant of case 1, to accommodate length-bounding. I’m not convinced it’s a great solution, as it leads to a lot of additional schedules being explored which are then quickly aborted, but I’m not sure of a better approach yet. The dejafu library implements case 2, because having a special “stop” action also gives you a convenient place to clean up everything associated with a thread.

So that’s it: daemon threads do work within the context of partial-order reduction, but you need to make sure you’re explicitly handling the termination of the main thread.

- How many times do we need to run it?
- Does the number of runs we need grow with the size of the program?
- Is this even a good approach?

Running the program lots of times might be ok *if every run uses a unique schedule*, otherwise you’ll explore less of the possibility space than you might think. Even if you can guarantee uniqueness of schedules, larger programs tend to have more possible schedules, so you need more runs to get the same coverage guarantee: but it’s difficult to know exactly *how many*.

So random testing is out. Let’s see what the alternatives are.

Completeness in concurrency testing is hard, we’d already given up on it when we considered random testing, so why not throw it out in a much more principled fashion?

Enter schedule bounding. Here, we define some bound function which, given a list of scheduling decisions, will determine if that is within the bound or not. We then test all schedules within the bound.

There are a few bound functions in common use:

- *Pre-emption bounding*: restricting the number of pre-emptive context switches.
- *Delay bounding*: restricting the number of deviations from an otherwise deterministic scheduler.
- *Fair bounding*: restricting the number of consecutive times a non-blocking loop accessing shared state (like a spinlock) is executed.

Furthermore, these bound functions are often *iterated*. For example, trying all schedules with 0 pre-emptions, then trying all schedules with 1 pre-emption, and so on, up to some limit.
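A bound function just inspects the scheduling decisions made so far. As a sketch, using the start/switch-to/continue vocabulary of scheduling decisions (simplified here to bare thread numbers, unlike dejafu’s real types), pre-emption bounding counts the pre-emptive switches:

```haskell
-- A sketch of a bound function. A decision is: start a thread after the
-- previous one blocked or terminated, pre-emptively switch to a thread, or
-- continue the current one. (Simplified: thread ids are bare Ints here.)
data Decision = Start Int | SwitchTo Int | Continue
  deriving (Eq, Show)

-- | Pre-emption bounding: a schedule is within the bound if it contains at
-- most n pre-emptive context switches.
preemptionBounded :: Int -> [Decision] -> Bool
preemptionBounded n ds = length [d | d@(SwitchTo _) <- ds] <= n
```

Iterating the bound then means testing all schedules satisfying `preemptionBounded 0`, then those with exactly one pre-emption, and so on up to the limit.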

Pre-emption bounding is a common one, and empirical studies (P. Thomson, A. F. Donaldson, and A. Betts. Concurrency Testing Using Schedule Bounding: an Empirical Study. In *Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*, pages 15–28. ACM, 2014) have found that test cases with as few as two threads and two pre-emptions are enough to find many concurrency bugs.

You may be wondering how schedule bounding can actually be *implemented*. Maybe you’re wondering if you simply monitor the execution and abort it if it exceeds the bound.

Well, that would certainly enforce the bound, but it wouldn’t give you many coverage guarantees in finite time.

By executing a program you can gather a lot of information: things like, what threads are runnable at each step (and what they would do if you scheduled them), what the thread that was scheduled did, the state of all shared variables (for CVars/MVars, this is whether they’re full or empty). You can use this to inform your initial set of scheduling decisions when starting a new execution, to systematically explore the possibility space.

You can store this information in a tree structure modified between executions: each edge represents a scheduling decision, each node contains information on the threads runnable at that point, and the alternative decisions still to make. This turns out to be quite a space-efficient representation, as schedule prefixes are explicitly shared as much as possible.
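The shape of such a tree, and the prefix sharing, can be sketched like this. Dejafu’s real structure (in Test.DejaFu.SCT.Internal.DPOR) stores considerably more per node; the types here are illustrative:

```haskell
import qualified Data.Map as M

-- A sketch of the exploration tree: each edge is a scheduling decision
-- (which thread to run), each node records the runnable threads and the
-- alternatives still to try. (Thread ids simplified to Int.)
data ExplTree = ExplTree
  { runnable :: [Int]              -- ^ threads runnable at this point
  , todo     :: [Int]              -- ^ decisions not yet explored from here
  , done     :: M.Map Int ExplTree -- ^ explored decisions, by thread
  } deriving Show

-- | Record an executed schedule: walk down the shared prefix, adding nodes
-- only where this schedule diverges from previously-seen ones.
addSchedule :: [Int] -> ExplTree -> ExplTree
addSchedule []     t = t
addSchedule (d:ds) t =
  t { done = M.insert d (addSchedule ds child) (done t) }
  where
    child = M.findWithDefault (ExplTree [] [] M.empty) d (done t)
```

Adding the schedules `[1,2]` and `[1,3]` produces one shared node for the `1` prefix with two children, which is where the space efficiency comes from.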

If we have a bunch of runnable threads, but some of them will block immediately without modifying any shared state, then we can restrict the set of runnable threads to those which won’t block.

This is safe when there’s no schedule bounding happening, or when it can’t result in otherwise legal schedules no longer being legal.

In the case of pre-emption bounding, this is safe because it doesn’t alter the pre-emption count of any schedules reachable from this state, as if a thread blocks then any other thread can be executed without incurring an extra pre-emption.

Eliminating schedules which obviously don’t change the state is a nice step, but it’s only a *first* step.

We can characterise the execution of a concurrent program by the ordering of dependent actions, such as reads and writes to the same variable. This is a *partial order* on the program actions, for which there may be many *total orders*. Ideally, we would only check one total order for each partial order, as different total orders will have the same result.
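For example, two threads writing to *different* variables give one partial order with two total orders, and either interleaving alone is enough:

```
T0: write x 1     T1: write y 1
T1: write y 1     T0: write x 1

result: x = 1, y = 1 in both cases
```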

Partial-order reduction (POR) can be implemented by only exploring multiple scheduling decisions (when there is a choice) if they can interfere with each other.

Unfortunately, POR isn’t quite that simple when using schedule bounding, as it can impose dependencies between previously-unrelated actions, as they can affect whether a state is reachable within the bound or not.

How this is solved depends on the specific bound function used. For pre-emption bounding, it suffices to try different possibilities when a context switch happens. Furthermore, when implementing blocking avoidance, don’t remove the context switch entirely, instead perform it earlier in the execution, where it won’t block.

Unfortunately, the context switches introduced by POR can still result in the same program state being reached by multiple different schedules.

Sleep sets are a complementary improvement, which do not require POR. The idea is that, if you have two or more decisions to make, and you have explored one possibility, there’s no point in making that same decision when trying the other possibility, unless something happens which can alter the result. If nothing has changed, you’ll get the same result.
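The sleep-set bookkeeping can be sketched in one function: when a decision is explored from a state, it goes into the sleep set for the sibling explorations, and it only wakes up when a dependent action is executed (types illustrative, not dejafu’s):

```haskell
-- | One step of sleep-set maintenance: after executing an action, wake up
-- (remove) any sleeping decision that is dependent with it, since its
-- result may now differ from the already-explored one.
wake :: (a -> a -> Bool)  -- ^ the dependency relation
     -> a                 -- ^ the action just executed
     -> [a]               -- ^ the current sleep set
     -> [a]
wake dep a sleep = [s | s <- sleep, not (dep s a)]
```

A decision still in the sleep set at a choice point is skipped: re-exploring it could only reproduce an already-seen result.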

Déjà Fu uses a combination of all of these approaches, including some tweaks to the order of schedule exploration to try to find bugs sooner rather than later, when there are any.

The algorithm used by the standard testing functions is pre-emption bounded partial-order reduction, with a bound of 2, using blocking avoidance (note that that doesn’t reduce the number of schedules when used in conjunction with bounded partial-order reduction!) and sleep sets. Enough of the internals are exposed to allow implementing your own bound function, such as fairness bounding or delay bounding.

To give some figures, here’s the effect of every improvement to the algorithm for `runPar $ parMap id [1..5]` with the Par monad’s “direct” scheduler:

- **Pre-emption bounding:** 12539 unique schedules,
- **+ blocking avoidance:** 11400,
- **+ partial-order reduction:** 8181,
- **+ sleep sets:** 2237.

As can be seen, sleep sets are a massive improvement in this case, and I would wager that just pre-emption bounding with sleep sets would also see a similar improvement. Obviously this is a very simple example, with little communication between threads, and so can’t really be generalised to other cases, but it’s a nice result.
