During my internship at Tweag, I got the opportunity to work on the GHC Haskell compiler, under the mentorship of Richard Eisenberg. For the first third of my internship, I tackled the implementation of Proposal 99. This proposal introduces additional syntax to the language, allowing programmers to manually annotate type variables with their specificity.
In this blog post, I will describe specificity, the proposal features, and why it could prove useful to you as a developer. In order to tackle this first question, let us look at the title of the proposal: “Explicit specificity in type variable binders”.
Specificity?
In order to explain what specificity means, we first have to take a step back and look at type applications.
Type Applications
The TypeApplications language extension was introduced in GHC 8.0.1, in 2016, and is currently used by over 900 packages on Hackage. The original paper by Richard Eisenberg et al. does a great job of motivating and explaining the feature, but it boils down to this (example taken from the paper):
Imagine we want to write a function normalize
which parses a String
and
then pretty-prints the result. The function could for example, remove any
redundant brackets in expressions. Writing this function however, is not
trivial. A first version could look like this, using the predefined
read :: forall a. Read a => String -> a
and
show :: forall a. Show a => a -> String
:
normalize :: forall a. (Show a, Read a) => String -> String
normalize s = show (read s)
However, this code is (rightly) rejected, as GHC can’t infer the output type
of read
, making the code ambiguous. We can solve this by manually
instantiating the polymorphic type of read
as follows:
normalize :: forall a. (Show a, Read a) => String -> String
normalize s = show (read @a s)
This instantiates the return type of read
to be the type variable a
as
declared in the type signature of normalize
.
Note that as the type has yet to be instantiated, visible type application is
required at the call site.
Specificity
So now that we know what visible type application looks like, let’s give it a shot for ourselves. We’ll just enable the language flag, write the first simple polymorphic function that comes to mind, and try instantiating it:
{-# LANGUAGE TypeApplications #-}
module Main where
id' x = x
id_int = id' @Int
main = putStrLn "Hi there!"
Let’s load it up in GHCi:
/home/gertjan/Desktop/TypeApp.hs:7:10: error:
• Cannot apply expression of type ‘p0 -> p0’
to a visible type argument ‘Int’
• In the expression: id' @Int
In an equation for ‘id_int’: id_int = id' @Int
|
7 | id_int = id' @Int
| ^^^^^^^^
Hmm, that’s a bit strange. Instantiating the predefined id
function works fine
though:
id_int = id @Int
So what is really going on here? Well, let’s ask GHCi to clarify by showing the
types, setting -fprint-explicit-foralls
so that we can see the
type variable binders.
*Main> :set -fprint-explicit-foralls
*Main> :info id
id :: forall a. a -> a
*Main> :info id'
id' :: forall {p}. p -> p
What’s happening here is that, because I did not provide a type annotation
for my id'
function, GHC effectively treats its type fundamentally differently
from the type of the (annotated) predefined id
function, defined as follows in
base
:
id :: a -> a
id x = x
Since the predefined version of id
has a type signature which explicitly abstracts
over the type variable a
, this variable is marked as specified.
On the other hand, our id'
function does not have a type signature, making
its type variable p
inferred,
as shown by the braces in the above example.
So why make this distinction? Because the lack of a type signature makes
inferred variable binders inherently a bit unstable. After all, who is to say
that the next update of GHC won’t alter the order of the inferred foralls, for
example? For this reason, type application is limited to specified variables
only: inferred type variables cannot be manually instantiated.
How do other programming languages handle this issue of unstable inferred type variables? Languages like Agda, Java and C++ do feature type instantiation, but as none of them automatically generalises functions without user-defined type signatures, generics or templates, respectively: they do not have inferred type variables. On the other hand, languages like ML do infer polymorphic types (with inferred type variables as a consequence), but do not have syntax for visible type application. Finally, Idris is the only language that I know of that both generalises and features type instantiation. As such, Idris takes a very similar approach to GHC’s: while it does not show this distinction to the programmer, the compiler differentiates between types variables that arise from a user annotation and those that do not, and only allows instantiation of the former. You can find an example of this here.
Explicit Specificity?
To recap: writing a type signature makes the type variables specified, and thus available for instantiation. While the rule seems sensible, this is not always what we might want.
Consider the following type signature:
{-# LANGUAGE PolyKinds, KindSignatures #-}
import Type.Reflection
typeRep :: Typeable (a :: k) => TypeRep (a :: k)
The example is adapted from A Reflection on Types by Peyton
Jones et al. For the purposes of this blog post, it is not important to go into
too much detail on typeRep
, besides looking at its type variables.
While writing this function, the programmer wants to annotate it as kind
polymorphic, by explicitly mentioning the k
kind variable.
Unfortunately, the full type of typeRep
now becomes
forall k (a :: k). Typeable a => TypeRep a
.
The k
variable has become specified, which means that in order to
instantiate a
, users of our function always have to instantiate k
first. Inferring the kind is trivial, which makes having to write something like
typeRep @Type @Int
or even typeRep @_ @Int
in order to instantiate a
quite silly.
The new explicit specificity extension allows us to manually annotate type variables we want to act as inferred variables. We do this by placing the braces around the variables we want to be inferred:
typeRep' :: forall {k} (a :: k). Typeable a => TypeRep a
In summary
Writing a type signature for your functions is always a good idea, as it
serves both as a check of your code and as documentation. Writing a
type signature is also necessary for -XScopedTypeVariables
.
However, writing a type signature, in current GHC, forces all the
variables occurring in the signature to be specified.
This alters the function’s interface! Explicit specificity remedies
this and lets you write exactly the type signature you want.
The original proposal features additional use cases. I encourage anyone who might be interested to have a look at the proposal.
How can I use this?
The code is currently
under review and will be released in a future GHC version.
If you want to try some examples out for yourself, we have a Docker container
available with this development build of GHC.
Just clone the container with
docker pull gertjanb/explicit-specificity-ghc
.
Once the container has finished downloading, you can launch your new
development GHC build using
docker run -it gertjanb/explicit-specificity-ghc
,
and start experimenting. For example, a good way to start is by defining a
simple function (don’t forget to enable the -XRankNTypes
extension):
Prelude> :set -XRankNTypes
Prelude> let foo :: forall a {b}. a -> b -> b ; foo x y = y
Play with the type signature, see what type GHCi assigns to foo
, and try
instantiating the type variables. Have a look at the GHC proposal
to find out more, and see where the new syntax is and isn’t allowed. Have fun!
Will my code break?
For regular Haskell code, the new feature is entirely backwards compatible with existing code bases, since it only introduces new syntax. The update won’t change the behaviour of your code in any way, unless you decide to utilise the new syntax.
The same can’t be said for Template Haskell code, unfortunately. As this new
syntax is available in Template Haskell as well, the language AST had to be
altered to allow for passing this additional information
around. Concretely, TyVarBndr
s are now annotated with a flag to store
additional information. In the case of forall-types and data constructors, they
are annotated with their Specificity
which is either SpecifiedSpec
or
InferredSpec
.
Updating your code should thus be as simple as following the type-checker and
updating the types you wrote with the correct flag.
Closing Remarks
I’m grateful to Tweag for supporting this internship, for encouraging me to work on exciting projects like this and for introducing me to interesting people who share my passion for functional programming. I’d also like to thank Richard Eisenberg for his mentorship, his insights and his enthusiasm for the topic. Finally, I hope you, the reader, enjoyed reading this post. Expect a follow-up post to be arriving soon! :)