Package Environment Files Run Counter to Reproducibility
The “package environment file” feature (silently) introduced into GHC and cabal-install has caused a good deal of discussion [1,2] already. But it seems we have so far missed one more fundamental issue.
Core Issue With Package Environment Files
A compiler is conceptually a pure function. “Package environment files” are a feature that adds nothing useful to GHC, while introducing (persistent) state to GHC’s UI and API.
This should be considered a conceptual type error, and should have been rejected accordingly. I am surprised that in Haskell, where purity and explicitness about effects are paramount, something like this got merged in the first place.
The feature detracts from the reproducibility of the whole build process, is thus a step in the wrong direction, and has already bitten numerous users [5,6,7,8,9]. It should be removed, or at the very least be made non-default.
Package Environment Files
ghc considers package environment files that may make packages available for use.
ghc looks for these files in the current working directory and its parents.
Invalid and even just outdated package environment files make ghc fail.
This affects both the ghc UI and uses of the GHC API. E.g.
    > cat ./.ghc.environment.x86_64-linux-8.2.2
    outdated
    > echo "" | brittany
    <command line>: cannot satisfy -package-id outdated
        (use -v for more information)
(brittany here being a user of the GHC API)
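To check whether a directory is affected, the lookup described above can be approximated with a small shell function. This is only a sketch: the function name is made up for illustration, and it assumes that every relevant file matches the pattern `.ghc.environment.*`, as in the example file name shown here. It does not filter by architecture or GHC version, so it may list files that a given ghc would ignore.

```shell
#!/bin/sh
# List package environment files visible from a start directory, walking
# from that directory up to the filesystem root.
# Assumption: the files are named ".ghc.environment.*" (as in the example).
find_ghc_env_files() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        for f in "${dir%/}"/.ghc.environment.*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
        if [ "$dir" = "/" ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
    return 0
}

# Report every candidate file that could influence ghc in this directory.
find_ghc_env_files "$PWD"
```

Any path this prints is hidden state that a ghc invocation, or a GHC API user such as brittany, may pick up from that directory.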
Recent version(s) of cabal-install produce package environment files automatically, as a side effect of other commands.
A lot of this was not announced and remains mostly undocumented.
Purpose of Package Environment Files
So we have a new feature with the potential to break things. There must be a good reason for this, right? From the GHC User’s Guide:
“It can be used to create environments for ghc or ghci that are local to a shell session or to some file system location. They are intended to be managed by build/package tools, to enable ghc and ghci to automatically use an environment created by the tool.”
If in Haskell you wrote

    do
      putStrLn "hello"
      initializeMyGuiFramework
      putStrLn "hello"

then you would expect both putStrLn invocations to have the same effect. Yet here it is suddenly a “feature” if in

    $ ghc MyProgram.hs
    $ some-tool some-arg
    $ ghc MyProgram.hs

the second ghc invocation behaves differently from the first, even though the explicit inputs (the file contents of “MyProgram.hs”) have not changed.
I know that ghc needs to do IO to do its job. It nonetheless can and should follow the conceptual “compiler is a pure function” idea. The fact that we often place inputs into the filesystem and pass a reference to those constants (aka paths) as input does not change this. The fact that there is a build directory does not change this, because it merely acts as a memoizer/cache of the pure function, nothing more.
The expressed purpose of package environment files is to communicate between (build) tooling and ghc in a persistent, stateful manner. Together with the fact that the files go stale, you end up with a “feature” that is counter to core concepts, counter to users’ expectations (especially if things are not announced), that breaks people’s attempts to ensure reproducible builds, and that generally invites usability problems.
But GHC Always Had “State” in That Sense
Well, that is no counterargument. Still, let’s see:
Correct. These are stateful, and at least in the case of the user package database, they have formed a stateful aspect to GHC behaviour.
But you could not make this argument without acknowledging that it is a feature that has caused much grief to many, many users in the past. Sandboxes were introduced exactly to address this: localize the state and make it so that the package database must be explicitly passed to GHC via command-line argument or environment variable.
Further, I may point out that the whole point of nix-style building in cabal is to get rid of the state that package databases present. Instead, the new-build “store” is a memoizer over packages, just like build directories are a memoizer on the module level.
This is an implicit input. It does not become stale, and it is meant to be automatically updated. And even this leads to breakage, e.g. see .
I might let these count. However, there are some important differences. Most importantly, they are not persistent. If you open a new shell session, only the stuff explicitly placed into your shell init scripts will end up in your env. They also have the questionable upside of being relatively easy to observe and relatively well-known.
Is the real problem not that cabal(-install) writes these files automatically?
Yes, and no. The thing is, package environment files have no valid, sensible purpose. You always had the ability to pass package(-db)s to ghc as explicit command-line args. Tools can use that feature, and always have, without any problem. So the best case is that nothing useful is added. The worst case is that tools such as cabal-install decide that breaking users without any sort of announcement is a clever move.
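As a rough illustration of the explicit alternative, a tool can hand ghc its package inputs on the command line. `-package-db` and `-package-id` are existing GHC flags; the wrapper function, the `GHC` variable, the database path and the package id below are hypothetical placeholders chosen for this sketch, not anything a particular tool actually uses.

```shell
#!/bin/sh
# Invoke ghc with every package input spelled out on the command line.
# Usage: ghc_explicit <package-db> <package-id> <further ghc arguments...>
# GHC selects the compiler binary (illustrative; defaults to "ghc").
ghc_explicit() {
    db=$1
    pkg=$2
    shift 2
    "${GHC:-ghc}" -package-db "$db" -package-id "$pkg" "$@"
}

# Dry run: substitute echo for ghc to print the arguments that would be
# passed. The path and package id are made-up placeholders.
GHC=echo ghc_explicit ./dist/package.conf.d mylib-0.1.0.0-inplace MyProgram.hs
```

Everything that determines which packages are visible then appears in the invocation itself, so it can be logged, diffed and replayed.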
Further, consider any use-case where there is more than one tool writing these files. Whenever one sets up the environment in some specific manner, it breaks the setup of any other tools.
But this feature is convenient
Global variables, side effects and dynamic typing all can be considered “convenient”.
Convenience to a few users is not sufficient justification for changing default behaviour when it breaks many other users.