The generalised algebraic data types (GADTs) GHC extension is really powerful, and let’s you write some neat code. This memo is a Literate Haskell file showing how you can use GADTs to write an interface for sets which works for types which only have an Eq
instance, but which enables faster implementations of methods if you have an Ord
instance around. It’s not complely transparent—functions which construct an entirely new set will have Eq
and Ord
variants—but it’s better than duplicating every single function.
The FlexiSet
type
I’m going to call my set type FlexiSet
, because it’s a flexible set1.
A FlexiSet
is either a list of values of a type which has an Eq
instance, or a set (from our old friend Data.Set
) of values of a type which has an Ord
instance:
{-# LANGUAGE GADTs #-}
import Prelude hiding (filter, null)
import qualified Data.List as L
import qualified Data.Set as S
-- | A flexible set: elements will have at least an 'Eq' instance,
-- maybe also an 'Ord' instance.
data FlexiSet a where
EqSet :: Eq a => [a] -> FlexiSet a
OrdSet :: Ord a => S.Set a -> FlexiSet a
-- | Get the values from a 'FlexiSet'. The order of the resultant
-- list is arbitrary.
toList :: FlexiSet a -> [a]
toList (EqSet as) = as
toList (OrdSet as) = S.toList as
The Eq
and Ord
constraints don’t leak outside the type, they’re entirely contained. When an EqSet
or OrdSet
value is constructed, the constraint dictionary is captured as well. So pattern matching on a FlexiSet
will bring the instance into scope, without needing to include the constraint in the function signature. Great! Leaky constraints are the main reason why people don’t like type-constrained data
declarations.
We do need Eq
-aware and Ord
-aware functions to construct a FlexiSet
:
-- | Construct a 'FlexiSet' for a type @a@ which has an 'Eq'
-- instance.
--
-- If @a@ has an 'Ord' instance, use 'makeOrdSet' instead.
makeEqSet :: Eq a => [a] -> FlexiSet a
makeEqSet = EqSet . L.nub
-- | Construct a 'FlexiSet' for a type @a@ which has an 'Ord'
-- instance.
makeOrdSet :: Ord a => [a] -> FlexiSet a
makeOrdSet = OrdSet . S.fromList
Sadly we also need specific functions for mapping, as we need to constrain the result type of the map function. This also means we can’t give FlexiSet
a Functor
instance (just like Set
can’t have one):
-- | Map a function over a 'FlexiSet'.
--
-- If the result type has an 'Ord' instance, use 'mapOrd' instead.
--
-- This is @O(n)@.
mapEq :: Eq b => (a -> b) -> FlexiSet a -> FlexiSet b
mapEq f = EqSet . L.nub . map f . toList
-- | Map a function over a 'FlexiSet'.
--
-- This is @O(n)@.
mapOrd :: Ord b => (a -> b) -> FlexiSet a -> FlexiSet b
mapOrd f = OrdSet . S.fromList . map f . toList
But we don’t need to know anything about constraints for filtering!2 This is because we’re not changing the type of value in the FlexiSet
, and by pattern matching on it we bring the instance into scope:
-- | Remove values from a 'FlexiSet' which fail to satisfy the given
-- predicate.
--
-- This is @O(n)@.
filter :: (a -> Bool) -> FlexiSet a -> FlexiSet a
filter f (EqSet as) = EqSet (L.filter f as)
filter f (OrdSet as) = OrdSet (S.filter f as)
Here are a few more functions which do something with a single FlexiSet
. Note how the OrdSet
ones can use the more efficient Data.Set
operations, but the EqSet
ones are stuck with slow linear-time list functions:
-- | Insert a value into a 'FlexiSet' if it's not already present.
--
-- This is @O(n)@ for 'Eq'-based sets and @O(log n)@ for 'Ord'-based
-- sets.
insert :: a -> FlexiSet a -> FlexiSet a
insert a (EqSet as) = EqSet (L.nub (a:as))
insert a (OrdSet as) = OrdSet (S.insert a as)
-- | Remove a value from a 'FlexiSet' if it's present
--
-- This is @O(n)@ for 'Eq'-based sets and @O(log n)@ for 'Ord'-based
-- sets.
delete :: a -> FlexiSet a -> FlexiSet a
delete a (EqSet as) = EqSet (L.filter (/=a) as)
delete a (OrdSet as) = OrdSet (S.delete a as)
-- | Check if a value is present in a 'FlexiSet'.
--
-- This is @O(n)@ for 'Eq'-based sets and @O(log n)@ for 'Ord'-based
-- sets.
member :: a -> FlexiSet a -> Bool
member a (EqSet as) = any (==a) as
member a (OrdSet as) = S.member a as
Sometimes it doesn’t matter whether we have an EqSet
or an OrdSet
:
-- | Check if a 'FlexiSet' is empty.
--
-- This is @O(1)@.
null :: FlexiSet a -> Bool
null (EqSet as) = L.null as
null (OrdSet as) = S.null as
Sometimes it matters a lot:
-- | Get the number of elements in a 'FlexiSet'.
--
-- This is @O(n)@ for 'Eq'-based sets and @O(1)@ for 'Ord'-based
-- sets.
size :: FlexiSet a -> Int
size (EqSet as) = length as
size (OrdSet as) = S.size as
We could improve this case by changing our EqSet
representation to also track the length.
Functions which combine two FlexiSet
values of the same type are interesting, as we get to “upgrade” from an EqSet
to an OrdSet
in some cases:
-- | Take the union of two 'FlexiSet' values.
--
-- This is @O(n)@ if both or either of the sets are 'Eq'-based and
-- @O(m*log(n/m + 1)), m <= n@ if both are 'Ord'-based.
--
-- If one set is 'Eq'-based and one is 'Ord'-based, the result will
-- be 'Ord'-based.
union :: FlexiSet a -> FlexiSet a -> FlexiSet a
union (EqSet as) (EqSet bs) = EqSet (L.nub (as ++ bs))
union (EqSet as) (OrdSet bs) = OrdSet (S.union (S.fromList as) bs)
union (OrdSet as) (EqSet bs) = OrdSet (S.union as (S.fromList bs))
union (OrdSet as) (OrdSet bs) = OrdSet (S.union as bs)
-- | Take the intersection of two 'FlexiSet' values.
--
-- This is @O(n)@ if both or either of the sets are 'Eq'-based and
-- @O(m*log(n/m + 1)), m <= n@ if both are 'Ord'-based.
--
-- If one set is 'Eq'-based and one is 'Ord'-based, the result will
-- be 'Ord'-based.
intersection :: FlexiSet a -> FlexiSet a -> FlexiSet a
intersection (EqSet as) (EqSet bs) = EqSet (L.filter (`elem` bs) as)
intersection (EqSet as) (OrdSet bs) = OrdSet (S.intersection (S.fromList as) bs)
intersection (OrdSet as) (EqSet bs) = OrdSet (S.intersection as (S.fromList bs))
intersection (OrdSet as) (OrdSet bs) = OrdSet (S.intersection as bs)
-- | Take the intersection of two 'FlexiSet' values.
--
-- This is @O(n)@ if both or either of the sets are 'Eq'-based and
-- @O(m*log(n/m + 1)), m <= n@ if both are 'Ord'-based.
--
-- If one set is 'Eq'-based and one is 'Ord'-based, the result will
-- be 'Ord'-based.
difference :: FlexiSet a -> FlexiSet a -> FlexiSet a
difference (EqSet as) (EqSet bs) = EqSet (L.filter (`notElem` bs) as)
difference (EqSet as) (OrdSet bs) = OrdSet (S.difference (S.fromList as) bs)
difference (OrdSet as) (EqSet bs) = OrdSet (S.difference as (S.fromList bs))
difference (OrdSet as) (OrdSet bs) = OrdSet (S.difference as bs)
But when we combine sets of different types, we have to “downgrade” from an OrdSet
to an EqSet
:
-- | Take the disjoint union of two 'FlexiSet' values.
--
-- This is @O(n)@ if both or either of the sets are 'Eq'-based and
-- @O(m*log(n/m + 1)), m <= n@ if both are 'Ord'-based.
--
-- If one set is 'Eq'-based and one is 'Ord'-based, the result will
-- be 'Eq'-based.
disjointUnion :: FlexiSet a -> FlexiSet b -> FlexiSet (Either a b)
disjointUnion (EqSet as) (EqSet bs) = EqSet (map Left as ++ map Right bs)
disjointUnion (EqSet as) (OrdSet bs) = EqSet (map Left as ++ map Right (S.toList bs))
disjointUnion (OrdSet as) (EqSet bs) = EqSet (map Left (S.toList as) ++ map Right bs)
disjointUnion (OrdSet as) (OrdSet bs) = OrdSet (S.disjointUnion as bs)
Wrap up
GADTs are a neat generalisation of regular Haskell data types which allow you to do all sorts of cool things.
For example, in dejafu, my concurrency testing library, I’m using GADTs to:
…unify a few different variations on the same type behind a common interface (source):
-- | A representation of a concurrent program for testing. -- -- To construct these, use the 'C.MonadConc' instance, or see -- 'Test.DejaFu.Conc.withSetup', 'Test.DejaFu.Conc.withTeardown', and -- 'Test.DejaFu.Conc.withSetupAndTeardown'. -- -- @since 2.0.0.0 data Program pty n a where ModelConc :: { runModelConc :: (a -> Action n) -> Action n } -> Program Basic n a WithSetup :: { wsSetup :: ModelConc n x , wsProgram :: x -> ModelConc n a } -> Program (WithSetup x) n a WithSetupAndTeardown :: { wstSetup :: ModelConc n x , wstProgram :: x -> ModelConc n y , wstTeardown :: x -> Either Condition y -> ModelConc n a } -> Program (WithSetupAndTeardown x y) n a
…hide a type variable which doesn’t need to be exposed (source):
-- | A buffered write is a reference to the variable, and the value to -- write. Universally quantified over the value type so that the only -- thing which can be done with it is to write it to the reference. data BufferedWrite n where BufferedWrite :: ThreadId -> ModelIORef n a -> a -> BufferedWrite n
…in a few places (source):
-- | How to explore the possible executions of a concurrent program. -- -- @since 0.7.0.0 data Way where Systematic :: Bounds -> Way Randomly :: RandomGen g => (g -> (Int, g)) -> g -> Int -> Way
In all these cases GADTs let me be more specific about what type information leaks out of a constructor, meaning I can have types which more precisely convey my intent, and not just types which are full of implementation details.