In an earlier post about inline-java, a Haskell library for interoperating with Java, we discussed the hardships of coordinating the Haskell and Java runtimes and their respective garbage collectors. In this post I aim to walk you through the upcoming safe interface of the library, which allows detecting memory management mistakes at compile time using linear types.
Managing references in two languages
Recapping the discussion from the previous post, the most ubiquitous source of
mistakes when using inline-java
has been the management of Java references.
f :: IO Int32
f = do
it <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
x0 <- [java| $it.next() |]
deleteLocalRef it
return x0
In this program, it
is a reference to a Java iterator, created
by the first quasiquotation. The reference it
needs to be destroyed
eventually or the garbage collector in the Java Virtual Machine (JVM)
won’t be able to reclaim the memory it points to.
This requires a bit of care.
If the reference is collected prematurely, the behavior of the program
will become undefined when the reference is used after it is destroyed.
If the reference is collected too late, the garbage collector in
Java might not be able to prevent memory from being exhausted in some
executions of the program.
In general, the more resources a program has to manage, the more
opportunities there are to get it wrong. Memory is a case of a resource
which is used very often, and managing it is so difficult
that we reach for garbage collectors, or tools like Valgrind’s
memcheck
to cope with it. But no such tool exists for the case of
inline-java
.
A shared garbage collector for the Haskell and the Java runtimes would solve the problem. But alas, we have two garbage collectors instead. Therefore, we are reduced to request and destroy references explicitly from the JVM runtime.
A language solution
The appearance, on the horizon, of linear types for Haskell brings the opportunity to use the compiler as a tool for checking resource management. When a function has an argument declared as linear, the compiler will check and demand that the argument is used exactly once. It turns out that this restriction can be leveraged to enforce some operations to occur in a specific order.
I’ve added a new memory-safe interface to inline-java
based on linear
types. With this new interface, our example would become
f :: Linear.IO Int32
f = do
it <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
[java| $it.next() |]
In a linear setting, the monad in which these operations
run declares
that the bound variables are linear and therefore should be used
exactly once. This is the case of the reference it
, which we
originally used twice: once for reading the next element and once to
destroy it. Because the compiler doesn’t allow this anymore, we adjust
the meaning of the java
quasiquotation so it
deletes any references that it gets from Haskell before returning.
In this way, it is not possible to write a program that neglects to destroy a reference. Nor is it possible to write a program that destroys the reference prematurely.
f :: Linear.IO Int32
f = do
it <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
deleteLocalRef it
x0 <- [java| $it.next() |] -- compilation error: 'it' is used more than once
it2 <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
return x0 -- compilation error: 'it2' is used less than once
But what if we wanted to read the next element of an iterator a second time? The compiler would reject the following program.
f :: Linear.IO Int32
f = do
it <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
x0 <- [java| $it.next() |]
x1 <- [java| $it.next() |] -- compilation error: 'it' is used more than once
return (x0 + x1)
To placate the compiler, we can duplicate the reference ahead of using it.
f :: Linear.IO Int32
f = do
it1 <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
(it2, it3) <- newLocalRef it1
x0 <- [java| $it2.next() |]
x1 <- [java| $it3.next() |]
return (x0 + x1)
Now it1
is used exactly once to produce two other references it2
and it3
. They all refer to the same iterator, and each of it2
and
it3
can be used to read an element.
Duplicating references like this is, arguably, undesirable bookkeeping to please the compiler. Eventually, I hope to take this overhead away from the hands of the programmer, either via GHC plugins or further extensions to the language.
Escaping linearity
A nuance of using a linear monad in this setting is that demanding every bound variable to be used exactly once means that even the integers that we are reading from the iterator need to comply. Integers certainly take some memory, but they don’t affect the JVM runtime, and they are managed by the garbage collector in Haskell.
f :: Linear.IO Int32
f = do
it1 <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
x0 <- [java| $it1.next() |]
return (x0 + x0) -- compilation error: 'x0' must be used only once
The fix here is to use the Unrestricted
type, which allows the
integer to escape the linear restriction.
f :: Linear.IO Int32
f = do
it1 <- [java| java.util.Arrays.asList(0, 1, 2).iterator() |]
Unrestricted x0 <- [java| $it1.next() |]
return (x0 + x0)
The java
quasiquotations can both take unrestricted values as inputs
and produce unrestricted values as output.
Exceptions
Special provisions need to be taken to clean up resources in the presence of exceptions. Consider the following program, which uses an iterator with no elements to yield.
f :: Linear.IO (Unrestricted Int32)
f = do
it1 <- [java| java.util.Arrays.asList().iterator() |]
(it2, it3) <- newLocalRef it1
Unrestricted x0 <- [java| $it2.next() |]
Unrestricted x1 <- [java| $it3.next() |]
return $ Unrestricted (x0 + x1)
The first call to next
produces an exception complaining that
there are no more elements in the iterator. The last
quasiquotation will not execute. Even though it is responsible for cleaning
up the reference it3
! Without further measures, if the program
recovers from the exception, it3
won’t be destroyed, and yet
it will be unreachable to the Haskell runtime.
The current solution is to surround manually all uses of the linear
monad with a function withLocalFrame
coming from the jni
package.
import Foreign.JNI.Safe
(withLocalFrame) -- :: Linear.IO (Unrestricted a) -> IO a
main :: IO
main = withLocalFrame f >>= print
This function runs the argument in a local frame, a concept of the JNI interface. A local frame is a scope in a thread of the program. All local references created during the scope belong to the frame, and are destroyed automatically when the scope ends if they haven’t been destroyed yet.
In our running example, this means that it3
will stay alive until
the exception reaches withLocalFrame
, which will have it3
destroyed
before propagating the exception.
There are safer and more flexible solutions to deal with exceptions, like resourcet or type-level monadic regions. Let me defer refining this aspect, though, until we gain some more experience with the safe interface.
Final remarks
In this post I have shown how the safe interface of inline-java
helps avoiding the most common mistakes when managing references to
Java objects.
We depend on the implementation of the
LinearTypes proposal to
be merged into GHC. But the safe inline-java
can be used already
with the forked linear-types-enabled GHC.
Linear types are not the first solution proposed in Haskell for resource management, but as we have discussed in the past, we still believe it is the most effective solution for integrating Haskell with the JVM currently.