
Package Environment Files Run Counter to Reproducibility

The "package environment file" feature (silently) introduced into GHC and cabal-install has caused a good deal of discussion [1,2] already. But it seems we have so far missed one more fundamental issue.

Core Issue With Package Environment Files

A compiler is conceptually a pure function. "Package environment files" are a feature that adds nothing useful to GHC, while introducing (persistent) state to GHC's UI and API.

This should be considered a conceptual type error, and should have been rejected accordingly. I am surprised that in Haskell, where purity and explicitness about effects are paramount, something like this got merged in the first place.

The feature detracts from the reproducibility of the whole build process, is thus a step in the wrong direction, and has already bitten numerous users [5,6,7,8,9]. It should be removed, or at the very least be made non-default.


Package Environment Files

Purpose of Package Environment Files

So we have a new feature with the potential to break things. There must be a good reason for this, right? From the GHC User's Guide [3]:

"It can be used to create environments for ghc or ghci that are local to a shell session or to some file system location. They are intended to be managed by build/package tools, to enable ghc and ghci to automatically use an environment created by the tool."

An Equivalence

If in Haskell you wrote

  putStrLn "hello"
  putStrLn "hello"

then you would expect both putStrLn invocations to have the same effect. Yet here it is suddenly a "feature" if in

$ ghc MyProgram.hs
$ some-tool some-arg
$ ghc MyProgram.hs

the second ghc invocation behaves differently from the first, even though the explicit inputs (the file contents of "MyProgram.hs") have not changed.

I know that ghc needs to do IO to do its job. It nonetheless can and should follow the conceptual "compiler is a pure function" idea. The fact that we often place inputs into the filesystem and pass a reference to those constants (aka paths) as input does not change this. The fact that there is a build directory does not change this, because it merely acts as a memoizer/cache of the pure function, nothing more.
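To make the distinction concrete, here is a minimal sketch in Haskell. It is a toy model, not GHC's actual architecture: the names Inputs, Output, compile, compileCached, Ambient and compileAmbient are all made up for illustration. The build directory is modelled as a memo table, and the hidden state as an extra argument that the caller never sees on the command line.

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for everything passed to the compiler explicitly.
data Inputs = Inputs
  { sourceText :: String      -- contents of MyProgram.hs
  , packageDbs :: [FilePath]  -- package databases given on the command line
  } deriving (Eq, Ord, Show)

newtype Output = Output String deriving (Eq, Show)

-- Toy "compiler": the result depends on the explicit inputs only.
compile :: Inputs -> Output
compile inp = Output (show (length (sourceText inp), packageDbs inp))

-- The build directory, modelled as a memo table for 'compile'.
type Cache = Map.Map Inputs Output

compileCached :: Cache -> Inputs -> (Output, Cache)
compileCached cache inp = case Map.lookup inp cache of
  Just out -> (out, cache)
  Nothing  -> let out = compile inp in (out, Map.insert inp out cache)

-- Hidden state, e.g. extra package databases picked up from a file
-- that some other tool wrote in the meantime.
newtype Ambient = Ambient [FilePath]

compileAmbient :: Ambient -> Inputs -> Output
compileAmbient (Ambient extra) inp =
  compile (inp { packageDbs = packageDbs inp ++ extra })

sample :: Inputs
sample = Inputs "main = putStrLn \"hello\"" ["global.db"]

main :: IO ()
main = do
  let (first, cache) = compileCached Map.empty sample
      (second, _)    = compileCached cache sample
  -- Cache or no cache, the same inputs give the same output.
  print (first == second)                                    -- True
  -- Same explicit inputs, different hidden state: the outputs differ.
  print (compileAmbient (Ambient []) sample
           == compileAmbient (Ambient ["stale.db"]) sample)  -- False
```

The cache changes how fast an answer arrives, never which answer arrives. The ambient argument changes the answer itself, and nothing in the explicit inputs tells you so.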

The expressed purpose of package environment files is to communicate between (build) tooling and ghc, in a persistent, stateful manner. Together with the fact that the files go stale, you end up with a "feature" that is counter to core concepts, counter to users' expectations (especially if things are not announced), that breaks people's attempts to ensure reproducible builds, and that generally invites usability problems.


But GHC Always Had "State" in That Sense

Well, that is no counterargument. Still, let's see:

Is the real problem not that cabal(-install) writes these files automatically?

Yes, and no. Thing is, package environment files have no valid, sensible purpose. You always had the ability to pass package(-db)s to ghc as explicit command-line arguments. Tools can use that feature, and always have, without any problem. So the best case is that nothing useful is added. The worst case is that tools such as cabal-install decide that breaking users without any sort of announcement is a clever move.
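As a rough sketch of what "explicit" means here, a tool can compute the full argument list itself and hand it to ghc on every invocation. The PackageSetup type, the ghcArgs function and the database path below are hypothetical; the flags -hide-all-packages, -package-db and -package are ordinary GHC flags.

```haskell
module Main where

-- Hypothetical description of the packages a build tool wants GHC to see.
data PackageSetup = PackageSetup
  { dbPaths  :: [FilePath]  -- package databases, in lookup order
  , packages :: [String]    -- packages to expose
  }

-- Turn the setup into explicit GHC arguments. Nothing is written to
-- disk, so nothing can go stale between invocations.
ghcArgs :: PackageSetup -> FilePath -> [String]
ghcArgs setup sourceFile =
     ["-hide-all-packages"]
  ++ concatMap (\db -> ["-package-db", db]) (dbPaths setup)
  ++ concatMap (\p -> ["-package", p]) (packages setup)
  ++ [sourceFile]

main :: IO ()
main = putStrLn (unwords ("ghc" : ghcArgs setup "MyProgram.hs"))
  where
    -- Placeholder path and package names, for illustration only.
    setup = PackageSetup ["/path/to/package.conf.d"] ["base", "containers"]
```

Running this prints the command line a tool would execute. The whole package setup is visible in that one line, and two invocations with the same line see the same packages.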

Further, consider any use-case where there is more than one tool writing these files. Whenever one tool sets up the environment in some specific manner, it breaks the setup of every other tool.

But This Feature Is Convenient

Global variables, side effects and dynamic typing can all be considered "convenient".

Convenience to a few users is not sufficient justification for changing default behaviour when it breaks many other users.



Posted by Lennart Spitzner on 2018-08-25. Feedback to (blog at thisdomain) welcome!