Almost a million years ago, I was dealing with some sinister bugs inside the
data structures in linear-base
and to stop the nightmares I decided to just
test the specification of the data structures themselves. I ended up using
something that I’ve been calling denotational homomorphic testing. In this
post, I’ll walk through how I ended up with this and why this is legitimately helpful.
Our Toy Example
Let’s consider a toy example which corresponds to one of the data structures I
was working with; see PR #263. How shall we test a specification of a simple Set
implementation shown below?
-- Constructors / Modifiers
empty :: Set a
insert :: Keyed a => a -> Set a -> Set a
delete :: Keyed a => a -> Set a -> Set a
intersect :: Keyed a => Set a -> Set a -> Set a
union :: Keyed a => Set a -> Set a -> Set a
-- Accessors
member :: Keyed a => a -> Set a -> Bool
size :: Keyed a => Set a -> Int
Specification via Axioms
My first idea was to create a specification for my data structures via axioms, and then property test those axioms.
I actually got this idea from my introductory computer science course. Our professor chose to introduce us to data structures by understanding axioms that functionally specify their behavior. Together, these axioms specified “simplification” rules that would define the value of the accessors on an arbitrary instance of that data structure. Think of it as defining rules to evaluate any well typed expression whose head is an accessor. The idea being that if you define the interface for an arbitrary use of the data structure by an external program, then you’ve defined what you want (functionally) from the data structure1.
In this system, we need at least one axiom for each accessor applied to each
constructor or modifier. If we had accessors and constructors or
modifiers, we would need at least axioms. For instance, for the accessor
member
, we would need at least 5 axioms, one for each of the 5 constructors
or modifiers, and these are some of the axioms we would need:
-- For all x, y != x,
member empty x == False
member (insert x s) x == True
member (insert x s) y == member s y
Intuitively, the axioms provide a way to evaluate any well typed expression starting
with member
to either True
or False
.
With this specification by axioms, I thought I could just property test each axiom
in hedgehog
and provided my samples were large enough and random enough, we’d have
a high degree of confidence in each axiom, and hence in the specification and everyone would
live happily ever after. It was a nice dream while it lasted.
It turns out that there are just too many axioms to test. For instance,
consider the axioms for size
on intersection
.
size (intersect s1 s2) = ...
You would effectively need axioms that perform the intersection on two terms
s1
and s2
. A few of simple axioms would include the following.
size (intersect empty s2) = 0
-- For x ∉ s1
size (intersect (insert x s1) s2) = boolToInt (member x s2) + size (intersect s1 s2)
So, if we can’t come up with a specification that’s easy to specify, much less easy to test, what are we to do?
Specification via Denotational Semantics
I looked at a test written by my colleague Utku Demir and it sparked an idea. Utku wrote some tests, actually for arrays, that translated operations on arrays to analogous operations on lists:
f <$> fromList xs == fromList (f <$> xs)
fromList :: [a] -> Array a
I asked myself whether we could test sets by testing whether they map onto lists
in a sensible way. I didn’t understand this at the time, but I was really testing
a semantic function that was itself a specification for Set
.
Denotational Semantics
Concretely, consider the following equations.
-- toList :: Set a -> [a]
-- toList (1) does not produce repeats and (2) is injective
-- Semantic Function
--------------------
toList empty == []
toList (insert x s) == listInsert x (toList s)
-- listInsert x l = nub (x : l)
toList (delete x s) == listDelete x (toList s)
-- listDelete x = filter (/= x)
toList (union s1 s2) == listUnion (toList s1) (toList s2)
-- listUnion a b = nub (a ++ b)
toList (intersect s1 s2) == listIntersect (toList s1) (toList s2)
-- listIntersect a b = filter (`elem` a) b
-- Accessor Axioms
------------------
member x s == elem x (toList s)
size s == length (toList s)
These effectively model a set as a list with no repeats. This model is a
specification for Set
. In other words, we can say our Set
implementation
is correct if and only if it corresponds to this model of lists without
repeats.
Consequences of Denotational Semantics
Why is this so, you ask me? Well, I’ll tell you. The first five equations cover
each constructor or modifier and hence fully characterise toList
. We
say that toList
is
a semantic function. Any expression using Set
either is a
Set
or is composed of operations that use the accessors. Thus, any
expression using Set
s can be converted to one using lists.
Moreover, since
toList
is injective, we can go back and forth between expressions in
the list model and expressions using Set
s.
For instance, we can prove that for any sets s1
and s2
,
insert x (union s1 s2) = union (insert x s1) s2
. First we have:
toList (insert x (union s1 s2)) = -- Function inverses
listInsert x (toList (union s1 s2)) = -- Semantic function
listInsert x (nub ((toList s1) ++ (toList s2))) = -- Semantic function
nub (listInsert x (toList s1) ++ (toList s2)) = -- List properties
nub (toList (insert x s1) ++ (toList s2)) = -- Semantic function
toList (union (insert x s1) s2) = -- Semantic function
Then by injectivity of toList
we deduced, as expected:
insert x (union s1 s2) =
union (insert x s1) s2
This actually means that our denotational specification subsumes the axiomatic specification that we were reaching for at the beginning of the post.
Now, there’s an interesting remark to make here. Our axiomatic system had axioms
that all began with an accessor. The accessors’ denotations
only have toList
on the right-hand side, removing the need for injectivity
of the semantic function.
For instance, member (insert s x) x = elem x (toList (insert s x))
which simplifies to elem x (nub (x:toList s))
which is clearly True
.
The injectivity of the semantic function is only needed
for equalities between sets like the one in the proof above.
Denotational Homomorphic Testing
In summary, we now have an equivalent way to give a specification of Set
in
terms of laws that we can property test. Namely, we could property test all the
laws that define the semantic function, like toList empty = []
, and
the accessors axioms, like member x s = elem x (toList s)
, and this would
test that our implementation of Set
met our specification. And to boot,
unlike before, we’d only write one property for each constructor, accessor or
modifier. If we had accessors and constructors and modifiers, we’d need
to test about laws.
Stated abstractly, we have the following technique. Provide a specification (à
la denotational semantics) of a data structure D
by modeling it by
another simpler data structure T
. The model consists of a semantic function
sem :: D -> T
and axioms that correspond constructors on D
to
constructors of T
and accessors on D
to accessors on
T
, e.g. acc someD = acc' (sem someD)
. Now, test the specification by
property testing each of the sem
axioms. These
axioms look like homomorphisms in that operations on D
s are recursively
translated to related operations on T
s, which preserves some semblance of
structure (more on this below). If our property testing goes well, we have a
strong way to test the implementation of D
.
Related Work
It’s critical that I note that providing and testing a specification for data structures is not at all a novel problem. It has been well studied in a variety of contexts. Two of the most relevant approaches to this technique are in Conal Elliot’s paper Denotational design with type class morphisms and the textbook Algebra Driven Design.
In Elliot’s paper, he outlines a design principle: provide denotational semantics
to data structures such that the semantic function is a homomorphism over any
type classes shared by the data structure at hand and the one mapped to.
Applying this to our example, this would be like creating an IsSet
type
class, having an instance for Ord a => IsSet [a]
and Ord a => IsSet (Set a)
, and verifying that if we changed each equation in the former to use
toList
, it would be the corresponding equation in the latter. So, for example
class IsSet s where
member :: a -> s a -> Bool
-- ... etc
-- Actual homomorphism property:
--
-- > member x (toList s) = member x s
--
where the property to test is actually a homomorphism.
This has the same underlying structure to our approach and the type class laws
that hold on the list instance of IsSet
prove the laws hold on the Set
instance, analogous to how facts on lists prove facts about Set
s.
In Algebra Driven Design, the approach is similar. First generate axioms on
a reference implementation RefSet a
(which might be backed by lists) via
quickspec
and test those. Then, property test the homomorphism laws from
Set
to RefSet
.
Concluding Thoughts
To conclude, it seems to me that denotational homomorphic testing is a pretty cool way to think about and test data structures. It gives us a way to abstract a complex data structure by making a precise correspondence to a simpler one. Facts we know about the simpler one prove facts about the complex one, and yet, the correspondence is itself the specification. Hence proving the correspondence would be a rigorous formal verification. Falling a few inches below that, we can property test the correspondence because it’s in the form of a small number of laws and this in turn gives us a high degree of confidence in our implementation.
-
Think about the batman line: “It’s not who I am underneath, but what I do that defines me”.
↩