The C standard bakes in enough details about pointers such that the amount of memory a C program can access (even on a hypothetical infinite-memory machine) is bounded and statically known. Access to an unbounded amount of memory is necessary (but not sufficient) for Turing completeness. Therefore C is not Turing complete.

This is an argument about the *specification* of C, not any particular *implementation*. The fact that no real machine has unbounded memory is totally irrelevant. This is not a criticism of C.

A friend told me that C isn’t actually Turing-complete due to the semantics of pointers, so I decided to dig through the (C11) spec to find evidence for this claim. The two key bits are 6.2.6.1.4 and 6.5.9.5:

> Values stored in non-bit-field objects of any other object type consist of `n × CHAR_BIT` bits, where `n` is the size of an object of that type, in bytes. The value may be copied into an object of type `unsigned char [n]` (e.g., by `memcpy`); the resulting set of bytes is called the object representation of the value. Values stored in bit-fields consist of `m` bits, where `m` is the size specified for the bit-field. The object representation is the set of `m` bits the bit-field comprises in the addressable storage unit holding it. Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

The important bit is the use of the definite article in the first sentence, “where `n` is **the** size of an object of that type”: this means that every type has a statically known size.

> Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Pointers to distinct objects of the same type compare unequal. (Interestingly, you could have a distinct heap for every type, with overlapping pointer values, and this is totally fine according to the spec! It doesn’t help you, however, because the number of types is finite: they’re specified in the text of the program, which is necessarily finite.)

As pointers are fixed in size, this means that there’s only a finite number of them. You can take a pointer to any object (“The unary `&` operator yields the address of its operand”, first sentence of 6.5.3.2.3), therefore there are a finite number of objects that can exist at any one time!

However, C is slightly more interesting than a finite-state machine. We have one more mechanism to store values: the return value of a function! Fortunately, the C spec doesn’t impose a maximum stack depth (“Recursive function calls shall be permitted, both directly and indirectly through any chain of other functions.”, 6.5.2.2.11; nothing else is said on the matter), and so we can in principle implement a pushdown automaton.

Just an interesting bit of information about C, because it’s so common to see statements like “because C is Turing-complete…”. Of course, on a real computer, nothing is Turing-complete, but C doesn’t even manage it in theory.

In a discussion about this on Twitter, the possibility of doing some sort of virtual memory shenanigans to make a pointer see different things depending on its context of use came up. I believe that this is prohibited by the semantics of object lifetimes (6.2.4.2):

> The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

The lifetime for heap-allocated objects is from the allocation until the deallocation (7.22.3.1):

> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

I had a fun discussion on IRC, where someone argued that the definition of pointer equality does not mention the object representation, so the fixed object representation size is irrelevant: pointers could somehow carry extra information which is not part of the object representation.

It took a while to resolve, but I believe the final sentence of the object representation quote and the first clause of the pointer equality quote, together with the fact that pointers are values, resolve this:

1. Pointers are values.
2. “Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.”
3. Points (1) and (2) mean that pointers with the same object representation compare equal.
4. “Two pointers compare equal if and only if…”
5. The “only if” in (4) means that if two pointers compare equal, then the rest of the rules apply.
6. Points (3) and (5) mean that two pointers with the same object representation compare equal, and therefore point to the same object (or are both null pointers, etc.).

This means that there cannot be any further information than what is stored in the object representation.

Interestingly, I believe this forbids something I initially thought to be the case: I noted earlier that different types could have different heaps. They *could*, but that doesn’t let you use the same object representation for pointers of different types!

I recently implemented async-dejafu, a version of the async library using Deja Fu so programs written with it can be tested, and I was curious about checking the relevant typeclass laws automatically.

Checking typeclass laws has been done with QuickCheck before, but the difference here is that async uses *concurrency*! If only we had some way to test concurrent Haskell code! Oh, wait…

Specifically, I want to test the laws for the `Concurrently` type. `Concurrently` is a monad for expressing `IO` actions which should be run concurrently.

Firstly, we need some language extensions and imports:

```haskell
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns #-}

module Concurrently where

import Control.Applicative
import Control.Exception (SomeException)
import Control.Monad ((>=>), ap, liftM, forever)
import Control.Monad.Catch (onException)
import Control.Monad.Conc.Class
import Data.Maybe (isJust)
import Data.Set (Set, fromList)
import Test.DejaFu (Failure(..), defaultMemType)
import Test.DejaFu.Deterministic (ConcST, Trace)
import Test.DejaFu.SCT (sctBound, defaultBounds)
import Test.QuickCheck (Arbitrary(..))
import Test.QuickCheck.Function (Fun, apply)
import Unsafe.Coerce (unsafeCoerce)
```

I have sadly not managed to eliminate that `unsafeCoerce`: it shows up because of the use of higher-ranked types. If anyone knows how I can get rid of it, I would be very happy!

Now we need our `Concurrently` type. The original just uses `IO`, so we have to parameterise ours over the underlying monad:

```haskell
newtype Concurrently m a = Concurrently { runConcurrently :: m a }
```

We’ll also be using a `ConcST` variant for testing a lot, so here’s a type synonym for that:

```haskell
type CST t = Concurrently (ConcST t)
```

We also need some instances for `Concurrently` in order to make QuickCheck happy, but these aren’t terribly important:

```haskell
instance Show (Concurrently m a) where
  show _ = "<concurrently>"

instance (Arbitrary a, Applicative m) => Arbitrary (Concurrently m a) where
  arbitrary = Concurrently . pure <$> arbitrary
```

Ok, let’s get started!

`Functor` lets you apply a pure function to a value in a context.

```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```

A `Functor` should satisfy the identity law:

```haskell
fmap id = id
```

And the composition law:

```haskell
fmap f . fmap g = fmap (f . g)
```

The `Functor` instance for `Concurrently` just delegates the work to the instance for the underlying monad:

```haskell
instance MonadConc m => Functor (Concurrently m) where
  fmap f (Concurrently a) = Concurrently $ f <$> a
```

The composition law is a little awkward to express in a way that QuickCheck can deal with, as it involves arbitrary functions. QuickCheck has a `Fun` type, representing functions which can be serialised to a string. Bearing that in mind, here is how we can express those two laws as tests:

```haskell
prop_functor_id :: Ord a => CST t a -> Bool
prop_functor_id ca = ca `eq` (id <$> ca)

prop_functor_comp :: Ord c => CST t a -> Fun a b -> Fun b c -> Bool
prop_functor_comp ca (apply -> f) (apply -> g) =
  (g . f <$> ca) `eq` (g <$> (f <$> ca))
```

We’re using view patterns here to extract the actual function from the `Fun` value. Let’s see if the laws hold!

```
λ> quickCheck (prop_functor_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_functor_comp :: CST t Int -> Fun Int Integer -> Fun Integer String -> Bool)
+++ OK, passed 100 tests.
```

Cool! Wait, what’s that `eq` function?

I’ve decided to treat two concurrent computations as equal if the sets of values that they can produce are equal:

```haskell
eq :: Ord a => CST t a -> CST t a -> Bool
eq left right = runConcurrently left `eq'` runConcurrently right

eq' :: forall t a. Ord a => ConcST t a -> ConcST t a -> Bool
eq' left right = results left == results right where
  results cst = fromList . map fst $ sctBound' cst

  sctBound' :: ConcST t a -> [(Either Failure a, Trace)]
  sctBound' = unsafeCoerce $ sctBound defaultMemType defaultBounds
```

This is where the unfortunate `unsafeCoerce` comes in. The definition of `sctBound'` there doesn’t type-check without it, which is a shame. If anyone could offer a solution, I would be very grateful.

`Applicative` extends `Functor` with the ability to inject a value into a context without introducing any effects, and to apply a function in a context to a value in a context.

```haskell
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
```

An `Applicative` should satisfy the identity law:

```haskell
pure id <*> a = a
```

The homomorphism law, which says that applying a pure function to a pure value in a context is the same as just applying the function to the value and injecting the entire result into a context:

```haskell
pure (f a) = pure f <*> pure a
```

The interchange law, which says that when applying a function in a context to a pure value, the order in which each is evaluated doesn’t matter:

```haskell
u <*> pure y = pure ($ y) <*> u
```

And the composition law, which is a sort of associativity property:

```haskell
u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
```

Finally, there is a law relating `Applicative` to `Functor`, which says we can decompose `fmap` into two steps: injecting a function into a context, and then application within that context:

```haskell
f <$> x = pure f <*> x
```

This is where `Concurrently` gets its concurrency. `(<*>)` runs its two arguments concurrently, killing the other if one throws an exception.

```haskell
instance MonadConc m => Applicative (Concurrently m) where
  pure = Concurrently . pure

  Concurrently fs <*> Concurrently as =
    Concurrently $ (\(f, a) -> f a) <$> concurrently fs as

concurrently :: MonadConc m => m a -> m b -> m (a, b)
concurrently = ...
```

Armed with the knowledge of how to generate arbitrary functions, these are all fairly straightforward to test:

```haskell
prop_applicative_id :: Ord a => CST t a -> Bool
prop_applicative_id ca = ca `eq` (pure id <*> ca)

prop_applicative_homo :: Ord b => a -> Fun a b -> Bool
prop_applicative_homo a (apply -> f) = (pure $ f a) `eq` (pure f <*> pure a)

prop_applicative_inter :: Ord b => CST t (Fun a b) -> a -> Bool
prop_applicative_inter u y = (u' <*> pure y) `eq` (pure ($ y) <*> u') where
  u' = apply <$> u

prop_applicative_comp :: Ord c => CST t (Fun b c) -> CST t (Fun a b) -> CST t a -> Bool
prop_applicative_comp u v w = (u' <*> (v' <*> w)) `eq` (pure (.) <*> u' <*> v' <*> w) where
  u' = apply <$> u
  v' = apply <$> v

prop_applicative_fmap :: Ord b => Fun a b -> CST t a -> Bool
prop_applicative_fmap (apply -> f) a = (f <$> a) `eq` (pure f <*> a)
```

And indeed we see that the laws hold:

```
λ> quickCheck (prop_applicative_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_homo :: String -> Fun String Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_inter :: CST t (Fun Int String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_comp :: CST t (Fun Int String) -> CST t (Fun Char Int) -> CST t Char -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_applicative_fmap :: Fun Int String -> CST t Int -> Bool)
+++ OK, passed 100 tests.
```

`Alternative` is a kind of monoid over `Applicative`.

```haskell
class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

  -- These both have default definitions
  some :: f a -> f [a]
  many :: f a -> f [a]
```

An `Alternative` should satisfy the monoid laws. Namely, left and right identity:

```haskell
empty <|> x = x
x <|> empty = x
```

And associativity:

```haskell
(x <|> y) <|> z = x <|> (y <|> z)
```

The `Alternative` instance for `Concurrently` is used to express races, with `(<|>)` executing both of its arguments concurrently and returning the first to finish:

```haskell
instance MonadConc m => Alternative (Concurrently m) where
  empty = Concurrently $ forever yield

  Concurrently as <|> Concurrently bs =
    Concurrently $ either id id <$> race as bs

race :: MonadConc m => m a -> m b -> m (Either a b)
race = ...
```

Once again, the translation into QuickCheck properties is quite simple:

```haskell
prop_alternative_right_id :: Ord a => CST t a -> Bool
prop_alternative_right_id x = x `eq` (x <|> empty)

prop_alternative_left_id :: Ord a => CST t a -> Bool
prop_alternative_left_id x = x `eq` (empty <|> x)

prop_alternative_assoc :: Ord a => CST t a -> CST t a -> CST t a -> Bool
prop_alternative_assoc x y z = (x <|> (y <|> z)) `eq` ((x <|> y) <|> z)
```

And the laws hold!

```
λ> quickCheck (prop_alternative_right_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_alternative_left_id :: CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_alternative_assoc :: CST t Int -> CST t Int -> CST t Int -> Bool)
+++ OK, passed 100 tests.
```

There are also some laws relating `Alternative` to `Applicative`, but these are expressed in terms of `some` and `many`, which have default law-satisfying definitions.

`Monad` extends `Applicative` with the ability to squash nested monadic values together, and is commonly used to express sequencing.

```haskell
class Applicative m => Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
```

There are a few different formulations of the `Monad` laws; I prefer the one in terms of `(>=>)` (the fish operator), which is defined as:

```haskell
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >=> g = \x -> f x >>= g
```

Using this function the laws become simply the monoid laws:

```haskell
return >=> f = f
f >=> return = f
(f >=> g) >=> h = f >=> (g >=> h)
```

There are also a few laws relating `Monad` to `Applicative` and `Functor`:

```haskell
f <$> a = f `liftM` a
return = pure
(<*>) = ap
```

As with the `Functor`, the `Monad` instance just delegates the work:

```haskell
instance MonadConc m => Monad (Concurrently m) where
  return = pure

  Concurrently a >>= f = Concurrently $ a >>= runConcurrently . f
```

As these laws are mostly about function equality, a helper function to express that is used:

```haskell
eqf :: Ord b => (a -> CST t b) -> (a -> CST t b) -> a -> Bool
eqf left right a = left a `eq` right a
```

Given that, the translation is simple:

```haskell
prop_monad_left_id :: Ord b => Fun a (CST t b) -> a -> Bool
prop_monad_left_id (apply -> f) = f `eqf` (return >=> f)

prop_monad_right_id :: Ord b => Fun a (CST t b) -> a -> Bool
prop_monad_right_id (apply -> f) = f `eqf` (f >=> return)

prop_monad_comp :: Ord d => Fun a (CST t b) -> Fun b (CST t c) -> Fun c (CST t d) -> a -> Bool
prop_monad_comp (apply -> f) (apply -> g) (apply -> h) =
  ((f >=> g) >=> h) `eqf` (f >=> (g >=> h))

prop_monad_fmap :: Ord b => Fun a b -> CST t a -> Bool
prop_monad_fmap (apply -> f) a = (f <$> a) `eq` (f `liftM` a)

prop_monad_pure :: Ord a => a -> Bool
prop_monad_pure = pure `eqf` return

prop_monad_ap :: Ord b => Fun a b -> a -> Bool
prop_monad_ap (apply -> f) a = (pure f <*> pure a) `eq` (return f `ap` return a)
```

Are there any counterexamples? No there aren’t!

```
λ> quickCheck (prop_monad_left_id :: Fun Int (CST t String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_right_id :: Fun Int (CST t String) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_comp :: Fun Int (CST t String) -> Fun String (CST t Bool) -> Fun Bool (CST t Int) -> Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_fmap :: Fun Int String -> CST t Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_pure :: Int -> Bool)
+++ OK, passed 100 tests.
λ> quickCheck (prop_monad_ap :: Fun Int String -> Int -> Bool)
+++ OK, passed 100 tests.
```

So, it certainly *looks* like all the laws hold! Yay!

Consider the `eq'` function. This sort of “value-level” equality is good enough for most types, where any kind of effect is a value, but it doesn’t work so well when concurrency (or any sort of `IO`) is involved, where effects do not directly correspond to values.

There’s one type of effect we particularly care about for the case of `Concurrently`: namely, the amount of concurrency going on! To test this, we need to write our tests such that different amounts of concurrency can produce different results, which means our current `Arbitrary` instance for `Concurrently` isn’t good enough. We need interaction between different concurrent inputs.

So let’s try writing a test case for the `(<*>) = ap` law, but explicitly testing the amount of concurrency:

```haskell
prop_monad_ap2 :: forall a b. Ord b => Fun a b -> Fun a b -> a -> Bool
prop_monad_ap2 (apply -> f) (apply -> g) a = go (<*>) `eq'` go ap where
  go :: (CST t (a -> b) -> CST t a -> CST t b) -> ConcST t b
  go combine = do
    var <- newEmptyCVar

    let cf = do { res <- tryTakeCVar var; pure $ if isJust res then f else g }
    let ca = do { putCVar var (); pure a }

    runConcurrently $ Concurrently cf `combine` Concurrently ca
```

Here we have two functions, `f` and `g`, and are using whether a `CVar` is full or empty to choose between them. If the combining function executes its arguments concurrently, then we will see both cases; otherwise we’ll only see the `g` case. *If* the law holds, and `(<*>) = ap`, then we will see both cases for both of them!

```
λ> quickCheck (prop_monad_ap2 :: Fun Int String -> Fun Int String -> Int -> Bool)
*** Failed! Falsifiable (after 3 tests and 8 shrinks):
{_->""}
{_->"a"}
0
```

Oops! We found a counterexample! Let’s see what’s happening:

```
λ> results $ go (<*>) (\_ -> "") (\_ -> "a") 0
fromList [Right "",Right "a"]
λ> results $ go ap (\_ -> "") (\_ -> "a") 0
fromList [Right "a"]
```

If we look at the definition of `ap`, the problem becomes clear:

```haskell
ap :: Monad m => m (a -> b) -> m a -> m b
ap mf ma = mf >>= \f -> ma >>= \a -> return (f a)
```

The issue is that our definition of `(>>=)` is *sequential*, whereas `(<*>)` is *concurrent*. The `Monad` instance is not consistent with that `Applicative` *when there is interaction between actions*, as this shows!

So what’s the problem? It’s *close enough*, right? Well, close enough isn’t good enough when it comes to laws. This very issue caused breakage, and is the reason that the `Monad` instance for `Concurrently` got removed!

So what’s the point of this? Big deal, laws are important.

Well, that *is* the point. Laws *are* important, but often we don’t bother to test them. That’s possibly fine if the instances are simple and you can check the laws by just juggling definitions in your head, but when `IO` is involved, the situation becomes a bit more murky.

Code involving `IO` and concurrency is easy to get wrong, so when building up a monad or whatever based on it, why not *actually test* the laws, rather than just assume they’re right? Because if, as a library author, your assumption is wrong, your users will suffer for it.