Services
BiotechFintechAutonomous Vehicles
Open sourceContactCareersTeamResearchBlog
10 July 2018 — by Divesh Otwani, Nicholas Clarke
Funflow Example: emulating Make
haskell

Funflow is a workflow management tool. It turns out that workflow management tools and build tools are closely related. So if you’re more familiar with the latter, this post might be of interest to you. We’ll be illustrating some of Funflow’s features by implementing Make on top.

All the code for this example as well as some documentation can be found here.

Today’s example: Make

Our example is a simple version of GNU’s Make restricted to building C files (though it could be generalized). It takes a makefile that looks like this

source-files: main.cpp factorial.cpp hello.cpp functions.h

hello: main.o factorial.o hello.o functions.h
    g++ main.o factorial.o hello.o -o hello

main.o: main.cpp functions.h
    g++ -c main.cpp

factorial.o: factorial.cpp functions.h
    g++ -c factorial.cpp

hello.o: hello.cpp functions.h
    g++ -c hello.cpp

and can build the hello executable:

$ ls
factorial.cpp  functions.h  hello.cpp  main.cpp  makefile
$ stack exec makefile-tool
$ ./hello
Hello World!
The factorial of 5 is 120

Because we used Funflow, our tool has several desirable properties, both of the tool and of the code itself:

  • No repeated builds: If we’ve built something before, we don’t build it again. Period. With make, if you make a tiny change and find out it isn’t what you want, when you revert back make will rebuild something it had built before. Our tool won’t.
  • Enforced target dependencies: We build each target file in a docker container with exactly the dependencies listed in the Makefile. There’s no opportunity for hidden dependencies or hidden requirements on the environment. Further, such ‘hidden’ preconditions are caught early and flagged making it easy to fix the Makefile.
  • Clean Sequencing Code: The function that makes a target file sequences file processing, recursive calls for making the dependencies of the given target, and running docker containers. Usually, this sequencing is messy and difficult to follow. With Funflow, however, we can inject these various forms of computation into Flows and sequence them seamlessly with arrow notation.
  • Concise Code: Because of the abstractions Funflow provides, we can focus on the essential algorithm and get some efficiency & safety for free.

No Repeat Builds: CAS Gives Us Free Dynamic Programming

The essential function is buildTarget which, given a Makefile and a specific rule in that Makefile, creates a flow to build the target specified by that rule. A rule specifies a target file by a list of file dependencies and a command to build that target file. Each dependency is either a source file or itself a target file.

Here is a slightly simplified version of that function:

-- For readability, we introduce a type alias for Flow
type a ==> b = Flow a b

buildTarget :: MakeFile -> MakeRule -> (() ==> (Content File))
buildTarget mkfile target@(MakeRule targetNm deps cmd) = proc _ -> do
     -- 1) Get source file contents
     contentSrcFiles <- grabSrcsFlow -< neededSources
     -- 2) Zip these with the names of each source file
     let fullSrcFiles = Map.fromList $ zip neededSources contentSrcFiles
     -- 3) Recursively build all dependent targets
     depFiles <- flowJoin depTargetFlows -< (replicate countDepFlows ())
     -- 4) Compile this file given
       -- a) the name of the target
       -- b) the dependencies that are source files
       -- c) the dependencies that were targets
       -- d) the gcc command to execute
    compiledFile <- compileFile -< (targetNm, fullSrcFiles, depFiles,cmd)
    returnA -< compiledFile
  where
    srcfiles = sourceFiles mkfile
    neededTargets = Set.toList $ deps `Set.difference` srcfiles
    neededSources  = Set.toList $ deps `Set.intersection` srcfiles
    depRules = (findRules mkfile neededTargets :: [MakeRule])
    depTargetFlows = map (buildTarget mkfile) depRules
    countDepFlows = length depTargetFlows
    grabSources srcs = traverse (readFile . ("./" ++)) srcs
    grabSrcsFlow = stepIO grabSources

The code for building the flow is straightforward.

First, we do some file processing to get the dependent source files (in steps 1 & 2). Then, we recursively call buildTarget to create flows for each of the dependent target files and then run those flows to build the dependent targets (in step 3). Finally, we put these dependencies in a docker container in which we run the gcc command and extract the produced target file (provided the compilation succeeded).

On the surface, this code appears inefficient. Step 3, the recursive building of dependent target files, looks like it repeats a lot of work since many of the recursive calls are identical. For example, if the rule for main.o in our sample Makefile listed factorial.o as a dependency, it looks like this code would build factorial.o twice -once as a dependency of hello and once as a dependency of main.o. If factorial.o took a long time to build, this would be disastrous.

Yet, this code isn’t inefficient? Why? This problem of ‘overlapping’ recursive calls is solved by dynamic programming, the algorithmic design pattern of saving the values of function calls in a dictionary and checking this dictionary to avoid repeated recursive calls. Funflow’s caching gives this to us for free. Hence, we can write DP-algorithms without dealing with the added complexity of keeping track of the dictionary ourselves. This is precisely what happens in buildTarget.

Each time a file is compiled with compileFile, a hash of the inputs and output is cached. So, on a repeated recursive call building say, target4.o, the use of compiledFile is constant time.

However, this goes a step further. Because of Funflow’s caching our tool, unlike GNU’s make, doesn’t re-build targets even after reverting back changes. Say factorial.cpp took a long time to build and was part of a larger project. Suppose further that to fix a bug in this large project, we tried something out in factorial.cpp and found out it didn’t work. When we revert back, our tool would build factorial.cpp in constant time whereas plain ol’ make would not.

Enforced Dependencies

makefile-tool also showcases Funflow’s docker capabilities. We compile each target in its own docker container with exactly its list of dependencies.

This prevents hidden preconditions. If some rule fails to mention a source file it needs, the docker container in which the rule’s target file is being compiled won’t have that missing source file. Hence, if a file dependency is missing from the dependency list of a certain target, that target won’t build. In the same vein, if there was a hidden requirement on the environment, like a custom gcc executable, the build would fail inside the docker container.

Further, this approach doesn’t merely prevent hidden preconditions; it flags them. For example, if our Makefile above had the line factorial.o : factorial.cpp instead of factorial.o : factorial.cpp functions.h, we would get an error from gcc indicating that it couldn’t find functions.h. With ordinary make, the build would succeed and this hidden precondition would last. Within the scope of a large project, this could be maddening.

This dependency enforcement could work for more than just C projects. As suggested earlier, the makefile-tool can be generalized. We could extend this tool by having the user provide a docker container for the build command and a way of directing the naming of the target file. (For C projects, this would be the gcc container with the -o command line argument.)

Clean Sequencing: Diverse Injection & Arrow Notation

This example also demonstrates Funflow’s ability to inject various forms of computation into Flows and sequence them in fancy ways. This makes the code readable and maintainable.

Injecting IO and External Steps

At one point, we needed to inject the IO action

parseRelFile :: String -> IO (Path Rel File)

and could easily do so:

flowStringToRelFile :: Flow String (Path Rel File)
flowStringToRelFile = stepIO parseRelFile

Then, inside the compileFile flow, we were able to sequence this with dockerFlow, a flow that was made from an external step. That is, dockerFlow was made from the Funflow library function docker:

import qualified Control.Funflow.External.Docker as Docker
import qualified Control.Funflow.ContentStore as CS

docker :: (a -> Docker.Config) -> Flow a CS.Item

dockerConfFn :: (Content Dir, Content File) -> Docker.Config

dockerFlow = docker dockerConfFn


data Config = Config
  { image      :: T.Text        --  e.g., "gcc", "ubuntu", "python".
  , optImageID :: Maybe T.Text
  , input      :: Bind          --  Files in the content store & locations to
                                --  put them in the docker container. This
                                --  includes a bash script to execute.
  , command    :: FilePath      --  A path to the bash script.
  , args       :: [T.Text]      --  Command line arguments to 'command'.
} deriving Generic

Then by using arrow notation, a generalization of ‘do notation’, we were able to elegantly and easily sequence these Flows. Without going into the details, we can see the sequencing is as readable and intuitive as ‘do notation’:

compiledFile <- dockerFlow -< (inputDir,compileScript)
relpathCompiledFile <- flowStringToRelFile -< tf
returnA -< (compiledFile :</> relpathCompiledFile)

Further, these properties persist when we need more sophisticated forms of sequencing.

Recursive Calls & Linearly Sequencing Similar Flows

The buildTarget function calls itself and recursively builds a list of Flows for the other targets:

depTargetFlows :: [Flow () (Content File)]
depTargetFlows = map (buildTarget mkfile) depRules

where

buildTarget :: MakeFile -> MakeRule -> (Flow () (Content File))
mkfile :: MakeFile
depRules :: [MakeRule]

Now, to actually use these we need to linearly sequence these Flows. (We can’t sequence them in parallel because then repeated recursive calls would repeat work if flows were distributed.) In other words, we need the power to combine an arbitrary amount of similar flows into one flow. Because Funflow is embedded in haskell, we can write the function flowJoin:

flowJoin :: [Flow a b] -> Flow [a] [b]
flowJoin [] = step (\_ -> [])          -- step :: (a -> b) -> Flow a b
flowJoin (f:fs) = proc (a:as) -> do
  b <- f -< a
  bs <- flowJoin fs -< as
  returnA -< (b:bs)

joinedDepTargets :: Flow [()] [Content File]
joinedDepTargets = flowJoin depTargetFlows

A careful programmer might note that the base case is ill defined. Indeed, the right approach is to use length indexed vectors and have

safeFlowJoin :: forall (n :: Nat) a b. Vec n (Flow a b) -> Flow (Vec n a) (Vec n b)

but even without this, flowJoin is a clean, simple, and powerful way of sequencing Flows.

Summary

makefile-tool is only 266 lines long of which 180 pertain to following a Makefile and yet,

  • the CAS gives us free dynamic programming,
  • we never rebuild targets even when reverting changes (which is an improvement over make), and,
  • the dependencies are checked and made explicit, which lends to a crisp functional model.
If you enjoyed this article, you might be interested in joining the Tweag team.
This article is licensed under a Creative Commons Attribution 4.0 International license.
Interested in working at Tweag?Join us
See our work
  • Biotech
  • Fintech
  • Autonomous vehicles
  • Open source
Tweag
Tweag HQ → 207 Rue de Bercy — 75012 Paris — France
hello@tweag.io
© Tweag I/O Limited. All rights reserved
Privacy Policy