Tweag.io has a bit of a history with language interop. By this point, we created or collaborated with others in the community on HaskellR, inline-c, inline-java, and now inline-js. The original idea for this style of interop was realized in language-c-inline by Manuel Chakravarty a few years before joining, concurrently to HaskellR. Manuel wrote a blog post about the design principles that underpin all these different libraries. Others in the community have since created similar libraries such as clr-inline, inline-rust and more. In this post, we’ll present our latest contribution to the family: inline-js.
The tagline for inline-js: program Node.js from Haskell.
A quick taste of inline-js
Here is a quick demo of calling the Node.js DNS Promises API to resolve a domain:
import Data.Aeson
import GHC.Generics
import Language.JavaScript.Inline
data DNSRecord = DNSRecord
{ address :: String
, family :: Int
} deriving (FromJSON, Generic, Show)
dnsLookup :: String -> IO [DNSRecord]
dnsLookup hostname =
withJSSession
defJSSessionOpts
[block|
const dns = (await import("dns")).promises;
return dns.lookup($hostname, {all: true});
|]
To run it in ghci
:
*Blog> dnsLookup "tweag.io"
[DNSRecord {address = "104.31.68.163", family = 4},DNSRecord {address = "104.31.69.163", family = 4},DNSRecord {address = "2606:4700:30::681f:44a3", family = 6},DNSRecord {address = "2606:4700:30::681f:45a3", family = 6}]
We can see that the A/AAAA records of tweag.io
are returned as Haskell values.
This demo is relatively small, yet already enough to present some important features described below.
The QuasiQuoters
In the example above, we used block
to embed a JavaScript snippet. Naturally, two
questions arise: what content can be quoted, and what’s the generated
expression’s type?
block
quotes a series of JavaScript statements, and in-scope Haskell
variables can be referred to by prefixing their names with $
. Before
evaluation, we wrap the code in a JavaScript async function, and this clearly
has advantages against evaluating unmodified code:
- When different
block
s of code share aJSSession
, the local bindings in oneblock
don’t pollute the scope of anotherblock
. And it’s still possible to add global bindings by explicitly operating onglobal
; these global bindings will persist within the sameJSSession
. - We can
return
the result back to Haskell any time we want; otherwise we’ll need to ensure the last executed statement happens to be the result value itself, which can be tricky to get right. - Since it’s an async function, we have
await
at our disposal, so working with async APIs becomes much more pleasant.
When we call dnsLookup "tweag.io"
, the constructed JavaScript code looks like
this:
;(async $hostname => {
const dns = (await import("dns")).promises
return dns.lookup($hostname, { all: true })
})("tweag.io").then(r => JSON.stringify(r))
As we can see, the Haskell variables are serialized and put into the argument
list of the async function. Since we’re relying on FromJSON
to parse the
result in this case, the result of the async function is further mapped with
JSON.stringify
.
We also provide an expr
QuasiQuoter when the quoted code is expected to be a
single expression. Under the hood it adds return
and reuses the implementation
of block
, to save a few keystrokes for the user.
Haskell/JavaScript data marshaling
The type of block
’s generated expression is JSSession -> IO r
, with hidden
constraints placed on r
. In our example, we’re returning [DNSRecord]
which
has a FromJSON
instance, so that instance is picked up, and on the JavaScript
side, JSON.stringify()
is called automatically before returning the result
back to Haskell. Likewise, since hostname
is a String
which supports
ToJSON
, upon calling dnsLookup
, hostname
is serialized to a JSON to be
embedded in the JavaScript code.
For marshaling user-defined types, ToJSON
/FromJSON
is sufficient. This
is quite convenient when binding a JavaScript function, since the
ToJSON
/FromJSON
instances are often free due to Haskell’s amazing generics
mechanism. However, there are also a few other useful non-JSON types which are
supported here. These non-JSON types are:
- The
ByteString
types in thebytestring
package, including strict/lazy/short versions. It’s possible to pass a HaskellByteString
to JavaScript, which shows up as aBuffer
. Going in the other direction works too. - The
JSVal
type which is an opaque reference to a JavaScript value, described in later sections of this post. - The
()
type (only as a return value), meaning that the JavaScript return value is discarded.
Ensuring the expr
/block
QuasiQuoters work with both JSON/non-JSON types
involves quite a bit of type hackery, so we hide the relevant internal classes
and it’s currently not possible for inline-js
users to add new such non-JSON
types.
Importing modules & managing sessions
When prototyping inline-js
, we felt the need to support the importing of
modules, either built-in or user-supplied ones. Currently, there are two
different import mechanisms coexisting in Node.js: the old CommonJS-style
require()
and the new ECMAScript native import
. It’s quite non-trivial to
support both, and we eventually chose to support ECMAScript dynamic import()
since it works out-of-the-box on both web and Node, making it more future-proof.
Importing a built-in module is straightforward: import(module_name)
returns a
Promise
which resolves to that module’s namespace object. When we need to
import npm
-installed modules, we need to specify their location in the
settings to initialize JSSession
:
import Data.ByteString (ByteString)
import Data.Foldable
import Language.JavaScript.Inline
import System.Directory
import System.IO.Temp
import System.Process
getMagnet :: String -> FilePath -> IO ByteString
getMagnet magnet filename =
withSystemTempDirectory "" $ \tmpdir -> do
withCurrentDirectory tmpdir $
traverse_
callCommand
["npm init --yes", "npm install --save --save-exact webtorrent@0.103.1"]
withJSSession
defJSSessionOpts {nodeWorkDir = Just tmpdir}
[block|
const WebTorrent = (await import("webtorrent")).default,
client = new WebTorrent();
return new Promise((resolve, reject) =>
client.add($magnet, torrent =>
torrent.files
.find(file => file.name === $filename)
.getBuffer((err, buf) => (err ? reject(err) : resolve(buf)))
)
);
|]
Here, we rely on the webtorrent
npm package to implement a
simple BitTorrent client function getMagnet
, which fetches the file content
based on a magnet
URI and a filename. First, we allocate a temporary directory
and run npm install
in it; then we supply the directory path in the
nodeWorkDir
field of session config, so inline-js
knows where node_modules
is. And finally, we use the webtorrent
API to perform downloading, returning
the result as a Haskell ByteString
.
Naturally, running npm install
for every single getMagnet
call doesn’t sound
like a good idea. In a real world Haskell application which calls npm-installed
modules with inline-js
, the required modules shall be installed by the package
build process, e.g. by using Cabal hooks to install to the package’s data
directory, and getMagnet
can use the data directory as the working directory
of Node.
Now, it’s clear that all code created by the QuasiQuoters in inline-js
requires a JSSession
state, which can be created by newJSSession
or
withJSSession
. There are a couple of config fields available, which allows one
to specify the working directory of Node, pass extra arguments or redirect
back the Node process standard error output.
How it works
Interacting with Node from Haskell
There are multiple possible methods to interact with Node in other applications, including in particular:
- Whenever we evaluate some code, start a Node process to run it, and fetch the result either via standard output or a temporary file; persistent Node state can be serialized via structural cloning. This is the easiest way but also has the highest overhead.
- Use pipes/sockets for IPC, with
inline-js
starting a script to get the code, perform evaluation and return results, reusing the same Node process throughout the session. This requires more work and has less overhead than calling Node for each call. - Use the Node.js N-API to build a native addon, and whatever
Haskell application relying on
inline-js
gets linked with the addon, moving the program entry point to the Node side. We have ABI stability with N-API, and building a native addon is surely less troublesome than building the whole Node stack. Although the IPC overhead is spared, this complicates the Haskell build process. - Try to link with Node either as a static or dynamic library, then directly call internal functions. Given that the build system of Node and V8 is a large beast, we thought it would take a considerable amount of effort; even if it’s known to work for a specific revision of Node, there’s no guarantee later revisions won’t break it.
The current implementation uses the second method listed above.
inline-js
starts an “eval server” which passes binary messages
between Node and the host Haskell process via a pair of pipes. At the
cost of a bit of IPC-related overhead, we make inline-js
capable of
working with multiple installations of Node without recompiling. The
schema of binary messages and implementation of “eval server” is
hidden from users and thus can evolve without breaking the exposed API
of inline-js
.
The “eval server”
The JavaScript specification provides the eval()
function, allowing a
dynamically constructed code string to be run anywhere. However, it’s better to
use the built-in vm module of Node.js, since it’s possible to supply
a custom global
object where JavaScript evaluation happens, so we can prevent
the eval server’s declarations leaking into the global scope of the evaluated
code, while still being able to add custom classes or useful functions to the eval server.
Once started, the eval server accepts binary requests from the host Haskell
process and returns responses. Upon an “eval request” containing a piece of
UTF-8 encoded JavaScript code, it first evaluates the code, expecting a
Promise
to be returned. When the Promise
resolves with a final result, the
result is serialized and returned. Given the asynchronous nature of this
pipeline, it’s perfectly possible for the Haskell process to dispatch a batch of
eval requests, and the eval server to process them concurrently, therefore we
also export a set of “async” APIs in Language.JavaScript.Inline
which
decouples sending requests and fetching responses.
On the Haskell side, we use STM to implement send/receive queues, and they are
accompanied by threads which perform the actual sending/receiving. All user-facing
interfaces either enqueue a request or try to fetch the corresponding
response from a TVar
, blocked if the response is not ready yet. In this way,
we make almost all exposed interfaces of inline-js
thread-safe.
Marshaling data based on types
Typically, the JavaScript code sent to the eval server is generated by the QuasiQuoter’s returned code, potentially including some serialized Haskell variables in the code, and the raw binary data included in the eval response is deserialized into a Haskell value. So how are the Haskell variables recognized in quoted code, and how does the Haskell/JavaScript marshaling take place?
To recognize Haskell variables, it’s possible to simply use a simple regex to
parse whatever token starting with $
and assume it’s a captured Haskell
variables, yet this introduces a lot of false positives, e.g. "$not_var"
,
where $not_var
is actually in a string. So in the QuasiQuoters of inline-js
,
we perform JavaScript lexical analysis on quoted code, borrowing the lexer in
language-javascript
. After the Haskell variables are found, the QuasiQuoters
generate a Haskell expression including them as free variables, and at runtime,
they can be serialized as parts of the quoted JavaScript code.
To perform type-based marshaling between Haskell and JavaScript data, the
simplest thing to do is solely relying on aeson
’s FromJSON
/ToJSON
classes.
All captured variables should have a ToJSON
instance, serialized to JSON which
is also a valid piece of ECMAScript, and whatever returned value should also have
a FromJSON
instance. However, there are annoying exceptions which aren’t
appropriate to recover from FromJSON
/ToJSON
instances.
One such type is ByteString
. It’s very important to be able to support
Haskell ByteString
variables and expect them to convert to Buffer
on the
Node side (or vice versa). Unfortunately, the JSON spec doesn’t have a special
variant for raw binary data. While there are other cross-language serialization
schemes (e.g. CBOR) that support it, they introduce heavy npm dependencies to
the eval server. Therefore, a reasonable choice is: expect inline-js
users to
solely rely on FromJSON
/ToJSON
for their custom types, while also supporting
a few special types which have different serialization logic.
Therefore, we have a pair of internal classes for this purpose: ToJSCode
and
FromEvalResult
. All ToJSON
instances are also ToJSCode
instances, while
for ByteString
, we encode it with base64 and generate an expression which
recovers a Buffer
and is safe to embed in any JavaScript code. The
FromEvalResult
class contains two functions: one to generate a
“post-processing” JavaScript function that encodes the result to binary on the
Node side, another to deserialize from binary on the Haskell side. For the
instances derived from FromJSON
, the “post-processing” code is r => JSON.stringify(r)
,
and for ByteString
it’s simply r => r
.
To keep the public API simple, ToJSCode
and FromEvalResult
are not exposed, and
although type inference is quite fragile for QuasiQuoter output, everything
works well as long as the relevant variables and return values have explicit
type annotations.
Passing references to arbitrary JavaScript values
It’s also possible to pass opaque references to arbitrary JavaScript values
between Haskell and Node. On the Haskell side, we have a JSVal
type to
represent such references, and when the returned value’s type is annotated to be
a JSVal
, on the Node side, we allocate a JSVal
table slot for the result
and pass the table index back. JSVal
can also be included in quoted JavaScript
code, and they convert to JavaScript expressions which fetch the indexed value.
Exporting Haskell functions to the JavaScript world
Finally, here’s another important feature worth noting: inline-js
supports a
limited form of exporting Haskell functions to the JavaScript world! For
functions of type [ByteString] -> IO ByteString
, we can use exportHSFunc
to
get the JSVal
corresponding to a JavaScript wrapper function which calls this
Haskell function. When the wrapper function is called, it expects all parameters
to be convertible to Buffer
, then sends a request back to the Haskell process.
The regular response-processor Haskell thread has special logic to handle them;
it fetches the indexed Haskell function, calls it with the serialized JavaScript
parameters in a forked thread, then the result is sent back to the Node side.
The wrapper function is async and returns a Promise
which resolves once the
expected response is received from the Haskell side. Due to the async nature of
message processing on both the Node and Haskell side, it’s even possible for
an exported Haskell function to call into Node again, and it also works the
other way.
Normally, the JavaScript wrapper function is async, and async functions work
nicely for most cases. There are corner cases where we need the JavaScript
function to be synchronous, blocking when the Haskell response is not ready and
returning the result without firing a callback. One such example is WebAssembly
imports: the JavaScript embedding spec of WebAssembly doesn’t allow async
functions to be used as imports since this involves the “suspending” and
“resuming” of WebAssembly instance state, which might be not economical to
implement in today’s JavaScript engines. Therefore, we also provide
exportSyncHSFunc
which makes a synchronous wrapper function to be used in such
scenarios. Since it involves completely locking up the main thread in Node
with Atomics
, this is an extremely heavy hammer and should be used with much
caution. We also lose reentrancy with this “sync mode”; when the exported
Haskell function calls back into Node, the relevant request will be forever
stuck in the message queue, freezing both the Haskell/Node process.
Summary
We’ve presented how inline-js
allows JavaScript code to be used
directly from Haskell, and explained several key aspects of
inline-js
internals. The core ideas are quite simple, and the
potential use cases are potentially endless, given the enormous
ecosystem the Node.js community has accumulated over the past few
years. Even for development tasks that are not specifically tied to
Node.js, it is still nice to have the ability to easily call relevant
JavaScript libraries, to accelerate prototyping in Haskell and to
compare correctness/performance of Haskell/JavaScript implementations.
There are still potential improvements to make, e.g. implementing
type-based exporting of Haskell functions. But we decided that now is
a good time to announce the framework and collect some first-hand
user experience, spot more bugs and hear user opinions on how it can be
improved. When we get enough confidence from the feedback of seed
users, we can prepare an initial Hackage release. Please spread the
word, make actual stuff with inline-js
and tell us what you think :)