Deconstructing classes

23 June 2021 — by Nicholas Clarke

What is a lawless type class, and should they be avoided? Are orphan instances inherently evil, or just misunderstood? Is global coherence a mistake or a fundamental requirement? In this post we argue that most of the arguments over these questions stem from the fact that type classes embody a number of disparate concepts, with each facet giving rise to perceived wisdom which does not necessarily map to other facets. We look at the various concepts which are combined in type classes, and posit how one might disentangle them.

I am large, I contain multitudes

Functions from Types to Values

Consider the following diagram:

Type ----> Type
 |          ▵
 |          |
 |          |
 ▿          |
Value ---> Value

Haskell has been steadily filling out the arrows here. From Value to Value we have regular functions. GADTs let us go from Value to Type.¹ Type families give us functions from Type to Type. Classes give us the final arrow, that from Type to Value.

Two examples of this spring to mind: the Default class from data-default gives us precisely this — a single value matched to each type, whereas the singletons library uses this functionality to effectively allow case analysis on types, through mapping types to values which, as constructors of a GADT, may then allow us to recover the type information.

Are there other ways we can go from types to values? The use of Typeable allows us to do this, albeit we cannot in such a case prove that we have a value for any given type (or, alternatively, we must provide a value for all types), since we lack the constraint also provided by the class. But note that Typeable itself is a class, whose instances are automatically generated by the compiler.

Seen from this perspective, a class should be seen as defining a function signature, and its collection of instances as collectively defining that function. It is necessarily global (as are all top-level function definitions — provided we have the function in scope, its body is uniquely defined!).

Modularity

What does modularity mean? In essence, just the ability to break something into pieces which may then be used together. Haskell has various flavours of modularity: its module system for one, and the natural composability of pure functions as an extremely powerful other. What we refer to here is something more akin to OCaml’s modules, or something like the classes of object-oriented languages — to wit, a class allows us to group a set of definitions (including type definitions, with associated type families) and refer to them together. In this way, each class instance acts as a module whose interface is given by the class definition.

We can use a type class purely in this fashion with what I call the “makeshift module” pattern:

class CatDB a where
  type family Config a :: Type
  type family Handle a :: Type

  openDB :: Config a -> IO (Handle a)
  getAll :: Handle a -> IO ([Cat])

data MockDB

instance CatDB MockDB where
  type instance Config MockDB = ()
  type instance Handle MockDB = ()

  openDB = pure
  getAll () = pure exampleCatList

In this pattern, the type to which the class is associated serves purely as a name for the module instance — it doesn’t even have any associated values. One may even use this to build OCaml-style functors (that is, modules themselves parametrised over other modules):

class CatDB (DB a) => CatApp a where
  type family DB a :: Type

  ...

Implicit resolution

The third facet to type classes is their use of implicit resolution and global scoping. When we ask for an instance of a class Foo, we do not explicitly provide such an instance but have it provided to us by the compiler, depending on the type.

Do we have any other means for implicit resolution? Indeed we do; the ImplicitParams extension gives us name-directed implicit resolution of parameters. Perhaps one should not be surprised to discover that implicit parameters are themselves implemented using the following class, under the covers:

class IP (x :: Symbol) a | x -> a where
  ip :: a
  {-# MINIMAL ip #-}

This class is imbued by GHC with some custom behaviour to allow local scoping.

From this perspective, we can see a class definition as akin to part of a function signature, identifying the type of an argument, and an instance as “filling in” that part of the signature with a required value.

Global scoping

Related to implicit resolution is the issue of global scoping. When trying to satisfy a constraint Foo a, GHC consults a global table of all type class instances. It expects that (demands!) a unique instance be found².

Global uniqueness is most often thought of in order to provide class coherence. This enables a particular use of type classes, as best evidenced by the use of the Ord constraint in containers such as Set:

fromList :: Ord a => [a] -> Set a
insert :: Ord a => a -> Set a -> Set a

In this snippet we rely on the fact that the means of ordering a given type a must remain the same in the place where we call fromList and the place where we call insert (or simply in successive calls to insert). Were the instance to change, due to local scoping, we could break the internal ordering and do things like accidentally lose elements.

From this angle, then, a class can be seen as imbuing a type with extra semantics (or, depending on one’s perspective, encoding a semantics it might platonically possess).

Many languages have a concept of global scoping more generally. In Bash and older versions of Perl, for example, all variables are global (although Perl has packages, which do provide some namespace division). It’s hard to see why we might want such a thing in Haskell, but could we acquire it? Well, yes - the above implementation of implicit parameters shows us how:

class Global (x :: Symbol) a | x -> a where
  global :: a

instance Global "foo" String where
  global = "Hello world"

Constrained polymorphism

Perhaps it is unfair to leave this facet until last, since in essence it is both the driver beyond all of the preceding sections and the unifier bringing them all together. Why do we have classes in the first place? Because they allow us to do constrained polymorphism, to say neither “this is valid for this type” nor “this is valid for all types”, but somewhere in between. And indeed, this requirement underlies all of the above use cases: it gives the domain of a function from types to values (and hence guarantees that the function is not used partially), it provides a means to depend only on a module interface, it demands implicit resolution so that the constraint can be dispatched by the compiler, rather than having to be proved by the user, and… well, arguably global scoping isn’t required here, but it does make sense from certain perspectives - more on that below.

Do we have other means of constraining polymorphism in Haskell? Only one - we can construct equality constraints of the form a ~ b. Other languages, however, do permit different types of constraint — the most obvious one being subtype polymorphism in object-oriented languages. It is from this perspective that global coherence makes the most sense, when we see a type as being given not just by its structure, but also by the set of operations defined on it (as in an object-oriented language, where a class will have not only fields but methods, as well as superclasses etc.).

From this perspective, both classes and instance declarations can be seen as something like a Prolog “rule”, a deductive technique which the compiler may use in attempting to satisfy a constraint. At the extreme end, it’s the use of this deduction engine that lets us start to do things like type-level programming in Haskell.

The means of dialectical evolution

Let’s go back to our original questions, with this framework in mind. When we talk about a lawless class, what we really talk about is a lawless module (perhaps combined with the fact that these modules will be implicitly and globally resolved). When using a class as a function from types to values, it makes little sense to talk about laws (at least any more than one might be concerned by the laws of a regular function). Likewise when used as a means to pass arguments implicitly.

Similarly, orphan instances mostly pertain to the aspect of global scoping, where we think of the class as adding semantics to the type. From this perspective, it makes sense that those semantics are grouped with the type; indeed, for the semantics of a type to change depending on what we have imported seems like a big bug! From the perspective of classes-as-implicits, though, this restriction seems strange. And where we are using classes purely as modules, it seems irrelevant to talk of orphan instances, since the type to which the instance is attached only serves as a name for the instance; that is to say, the instance is all there is. And since the name may be fully qualified, orphan instances would only be a concern if two people wanted to use the same name (including package name, given PackageImports!) for their module!

The case of orphan instances when viewing classes-as-functions is interesting, since it comes down to allowing open functions⁴. Going back to our diagram above, we can see that our arrows with Value as the domain are closed - one cannot have an open (regular) function or GADT — but the arrows going the other way are open. This is somewhat necessarily true, since our prototypical kind - Type - is itself open, whereas a given type is closed. The restriction against orphan instances then is basically one which helps ensure the function is well-defined.

As for global uniqueness, it goes naturally with global scoping, but is a bit of a weird beast otherwise. Our other perspectives on classes don’t really call for it (one might argue that classes-as-functions would desire this, but in such a case it’s only really important that a function is well-defined at its use site). However, the usage of classes in Haskell does rely heavily on global uniqueness.³

Conclusion

Type classes are powerful tools with various facets to them; we’ve shown how they can be viewed in multiple ways to exploit these individually, and to what extent the various received wisdom around classes maps to these facets.

The interpretation of GADTs as arrows from Value to Type may not be entirely obvious; consider the following types
```
data Even
data Odd

data SParity a where
‎  SEven :: SParity Even
‎  SOdd :: SParity Odd
```
We can think of SParity as mapping each constructor name SEven or SOdd to a type (Even and Odd). In this view, each constructor declaration reads like a branch in a case expression.
↩
Modulo Overlapping and similar pragmas.
↩
Though see Winant 2018 for a discussion of how to differentiate cases requiring global uniqueness from those not.
↩
Open functions - that is, functions whose body need not be defined in one place but could be spread out amongst multiple modules. Consider a function foo :: Int -> String whose value of even numbers is defined in Evens.hs and whose value on odd numbers is defines in Odds.hs.
↩

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← An AsciiDoc processor and Pandoc front-end in Haskell Building OCaml Projects with Bazel →