Throwing Clever Types at Program Configuration
(A similar approach was described in this post by Sandy Maguire which attributes the core technique to Travis Athougies. The CZipWith
abstraction however is new, as far as I can tell.)
The Problem
You have some non-trivial configuration data-type in your program and want to have its fields filled from several sources:
- the commandline
- config file(s)
- built-in default values
And you do not want to
- handle Maybes in config fields throughout the program;
- have redundant, very similar datatypes (PartialConfig + Config);
- write custom merging logic that scales in the size of your config data-type.
A simple example
data MyConfig = MyConfig
  { useColors :: Bool
  , applyFerbulator :: Bool
  , verbosity :: Int
  , dingUpperBound :: Float
  , ignoreWords :: [Text]
  }
Of course, this type alone will not be enough. We do not want to force the user to specify each field on the commandline, but simple defaults that we conditionally overwrite do not work either: what if values are also specified in the config file?
The correct behaviour seems to depend on the desired merging logic, which differs for the fields:
- For the first three fields, it’s a simple “take the first specified”: E.g. if the user specified “verbosity n” on the commandline, use that. Otherwise, look at the value in the config file. If nothing is specified, use some default value.
- For dingUpperBound, the default is some large value and commandline or file should overwrite it, but if both are specified, we want to respect the minimum.
- For ignoreWords the default is an empty list, and we want to append inputs and not do any overwriting.
Scope of Our Approach
We will present an approach to the merging logic and what data-types to use for it. But this will not be fully usable yet, because we will not discuss how to implement any parsing that matches these types.
A Failing First Attempt
data MyConfig = MyConfig
  { useColors :: Bool
  , applyFerbulator :: Bool
  , verbosity :: Int
  , dingUpperBound :: Float
  , ignoreWords :: [Text]
  }

data MyConfigPartial = MyConfigPartial
  { useColors :: Maybe Bool
  , applyFerbulator :: Maybe Bool
  , verbosity :: Maybe Int
  , dingUpperBound :: Maybe Float
  , ignoreWords :: [Text]
  }

-- insert some magic here to derive (Monoid MyConfigPartial)

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
-- (`default` is a reserved word in Haskell, hence the name `defaults`)
finalizePartialConfig defaults conf = _todo
This approach has two clear disadvantages: There is redundancy in the two data-types and we will have to implement the function by hand, and it will need to be adapted for every new field.
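To make the second disadvantage concrete, here is a minimal sketch of what the hand-written function could look like. It is not code from any real project: the `p`-prefixed field names are hypothetical (chosen only so that both records fit in one module), `String` stands in for `Text` to avoid an extra dependency, and the treatment of `ignoreWords` is one possible reading of the "append" rule.

```haskell
import Data.Maybe (fromMaybe)

data MyConfig = MyConfig
  { useColors :: Bool
  , applyFerbulator :: Bool
  , verbosity :: Int
  , dingUpperBound :: Float
  , ignoreWords :: [String] -- String instead of Text (assumption)
  } deriving (Eq, Show)

-- Hypothetical field names; the prefix only avoids a name clash.
data MyConfigPartial = MyConfigPartial
  { pUseColors :: Maybe Bool
  , pApplyFerbulator :: Maybe Bool
  , pVerbosity :: Maybe Int
  , pDingUpperBound :: Maybe Float
  , pIgnoreWords :: [String]
  } deriving (Eq, Show)

-- Every field needs its own line here, so each new config field
-- means touching this function again.
finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig def conf = MyConfig
  { useColors = fromMaybe (useColors def) (pUseColors conf)
  , applyFerbulator = fromMaybe (applyFerbulator def) (pApplyFerbulator conf)
  , verbosity = fromMaybe (verbosity def) (pVerbosity conf)
  , dingUpperBound = fromMaybe (dingUpperBound def) (pDingUpperBound conf)
  , ignoreWords = ignoreWords def ++ pIgnoreWords conf
  }
```

The function is trivial, but it grows linearly with the record, which is exactly the scaling problem listed at the start.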
So how do we abstract over these two data-types?
The Solution, Part One
Consider this definition:
data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [Text]
  }
where Last and Min are the newtypes from Semigroup. This now allows us these very simple definitions:
type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Option
where Option is the Maybe wrapper from Semigroup. What exactly does this give us, and why did we choose Semigroup stuff?
- Most importantly, the redundancy is gone. Yay!
- Consider the Monoid (<>) behaviour for Maybe a and contrast it to the Semigroup (<>) behaviour for Option (First a). They are the same! And the switch to Last only swaps the ordering, to more closely match how for example later commandline arguments overwrite earlier ones.
- The switch from Monoid/Maybe to Semigroup/Option essentially gives us one thing: It separates the handling of the “Nothing versus Just” question (or “was it defined or not”) from the “How are two defined things merged” question.
- We can derive a Semigroup instance for this MyConfigPartial, and it behaves exactly as we desire (!)
- We could do defaultConfig <> fileConfig <> commandlineConfig. Only downside is that this all works with (and returns) “partial” configs, where we would really prefer the non-partial one.
- There is one difference between the previous and this MyConfigPartial definition: We now have Option Nothing and Option (Just []) that roughly convey the same semantics. But that bit of semantic redundancy is worth it.
However we still lack one thing: the finalizePartialConfig function is still missing, or rather any non-boilerplaty conversion between the partial and the non-partial datatype. But how different are these types even now?
The Solution, Part Two
finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
-- without type aliases becomes
finalizePartialConfig :: MyConfigF Identity -> MyConfigF Option -> MyConfigF Identity
If only the kinds were a bit easier, this would need nothing more than some Applicative usage. But remember that our type argument has kind * -> *, not * like Applicative expects. Can we make this work nonetheless? Is there some “lifter Applicative” thingy that clever category theorists came up with? I don’t know the answer to the last question, but indeed, there is a way to make this work.
An Attempt at some Theory
(This is not precise by any means. If only interested in how to apply the solution, feel free to skip this section.)
Let us first abstract this problem a bit more, via some quick iteration:
finalizePartialConfig :: MyConfigF Identity -> MyConfigF Option -> MyConfigF Identity
-- first step: We want some mechanism that works for some class of types, not
-- just our particular `MyConfigF` - because we don't want to have to adapt it.
-- So let us just invent a new typeclass:
class StrangeLiftedApplicative c where
  combineFunc :: SomeNewTC c => c Identity -> c Option -> c Identity
-- And now replace the remaining non-abstract stuff:
combineFunc :: SomeNewTC c => c f -> c g -> c f
-- We won't be able to make that work if we know nothing about f or g.
-- But really we only need one thing: to compute an output `f a` from inputs
-- `f a` and `g a`, for any `a`. (Example `a`s from our example above would be
-- `Bool` or `Int` or `Text`, plus some newtype wrappers.)
-- Let us just expect that from the user:
combineFunc :: SomeNewTC c => (forall a . f a -> g a -> f a) -> c f -> c g -> c f
-- We can generalize this a bit:
combineFunc :: SomeNewTC c => (forall a . f a -> g a -> h a) -> c f -> c g -> c h
This deviates a bit from how one might write this if one started with the “lifted applicative” idea, but it still works and is intuitive to use. For those familiar with standard Prelude functions, this signature should remind you of something:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
We decided to let this lend its name to the typeclass, CZipWith, and to its method, cZipWith. We can give it these two laws:
cZipWith (\x _ -> x) g _ = g
cZipWith (\_ y -> y) _ h = h
The first question to ask next is: What instances does it have? Considering any “standard” data-types, not anything really - the kinds we are working with here are a bit exotic (note how c :: (* -> *) -> * there).
But we can define some new things, for example:
newtype CUnit a f = CUnit (f a) -- corresponding to Identity
data CPair a b f = CPair (f a) (f b) -- corresponding to 'data MonoPair a = MonoPair a a'
data CStream a f = CStream (f a) (CStream a f) -- corresponding to an infinite stream
But for example for non-infinite streams (aka lists) there is no instance.
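As an illustration, instances for these three types could be written as follows. This is a sketch under assumptions: the class is redefined locally from the signature derived above (it is not imported from any package), and `fillDefault` and `example` are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Local copy of the class, following the signature derived above.
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

newtype CUnit a f = CUnit (f a)
data CPair a b f = CPair (f a) (f b)
data CStream a f = CStream (f a) (CStream a f)

instance CZipWith (CUnit a) where
  cZipWith f (CUnit x) (CUnit y) = CUnit (f x y)

instance CZipWith (CPair a b) where
  cZipWith f (CPair x1 x2) (CPair y1 y2) = CPair (f x1 y1) (f x2 y2)

-- Both streams are infinite, so there is always a matching position.
instance CZipWith (CStream a) where
  cZipWith f (CStream x xs) (CStream y ys) =
    CStream (f x y) (cZipWith f xs ys)

-- Hypothetical combining function: keep the default unless a value is given.
fillDefault :: Identity a -> Maybe a -> Identity a
fillDefault d Nothing = d
fillDefault _ (Just x) = Identity x

-- The first slot is overridden, the second falls back to the default.
example :: CPair Int Bool Identity
example =
  cZipWith fillDefault
    (CPair (Identity 0) (Identity False))
    (CPair (Just 7) Nothing)
```

For a finite list there is no such instance because the two arguments may differ in length: any definition has to drop elements from the longer list, and then one of the two laws cannot hold.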
When trying to describe more generally what instances are possible, and considering other typeclasses, we noticed this description from the docs of Distributive:
To be distributable a container will need to have a way to consistently zip a potentially infinite number of copies of itself. This effectively means that the holes in all values of that type, must have the same cardinality, fixed sized vectors, infinite streams, functions, etc. and no extra information to try to merge together.
This comes very close!
But more important for this discussion is perhaps that our MyConfigF value has a (lawful) instance, and adding fields (and even nesting several datatypes) poses no problem.
instance CZipWith MyConfigF where
  cZipWith f (MyConfigF c1 f1 v1 d1 i1) (MyConfigF c2 f2 v2 d2 i2)
    = MyConfigF (f c1 c2) (f f1 f2) (f v1 v2) (f d1 d2) (f i1 i2)
Usage of the CZipWith typeclass
So there is a typeclass and instance for our data-type:
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

instance CZipWith MyConfigF where ..
And we can trivially express finalizePartialConfig in terms of cZipWith (details left as an exercise :p).
That is all well and good, but so far this type-class does not give us anything useful really. We still have to write that instance by hand, don’t we? Well, quite the contrary: The instances are boilerplate to such a degree that we can automate them easily with some Template Haskell (TH). We won’t look at the TH function here - the 50 lines of pattern matching and splicing are not that interesting.
Implementation and Example
The czipwith package, available on Hackage, contains the CZipWith type-class and the TH function, plus some other stuff on the same kind-level (CFunctor, CPointed, CZipWithM). Apart from that, you should only need Data.Semigroup, which has been in base for several base versions.
This approach is battle-tested in brittany. Relevant source-code that might be of interest are the nested CConfig type or these two lines that do all the merging/defaulting logic (same as mergePartialConfigs and finalizePartialConfig below).
The deriveCZipWith function also works when nesting data-types. Quoting the docs:
data A f = A
  { a_str :: f String
  , a_bool :: f Bool
  }

data B f = B
  { b_int :: f Int
  , b_float :: f Float
  , b_a :: A f
  }

deriveCZipWith ''A
deriveCZipWith ''B
produces the following instances
instance CZipWith A where
  cZipWith f (A x1 x2) (A y1 y2) = A (f x1 y1) (f x2 y2)

instance CZipWith B where
  cZipWith f (B x1 x2 x3) (B y1 y2 y3) =
    B (f x1 y1) (f x2 y2) (cZipWith f x3 y3)
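To see the nesting at work without any Template Haskell, the instances quoted above can be written out by hand and applied to a value. This is only a sketch: the class is defined locally rather than imported from czipwith, and `orDefault`, `defaults`, `overrides` and `resolved` are hypothetical names used for illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Local copy of the class (assumption: same shape as shown earlier).
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

data A f = A
  { a_str :: f String
  , a_bool :: f Bool
  }

data B f = B
  { b_int :: f Int
  , b_float :: f Float
  , b_a :: A f
  }

instance CZipWith A where
  cZipWith f (A x1 x2) (A y1 y2) = A (f x1 y1) (f x2 y2)

-- The nested field is zipped by recursing into the instance for A.
instance CZipWith B where
  cZipWith f (B x1 x2 x3) (B y1 y2 y3) =
    B (f x1 y1) (f x2 y2) (cZipWith f x3 y3)

orDefault :: Identity a -> Maybe a -> Identity a
orDefault d Nothing = d
orDefault _ (Just x) = Identity x

defaults :: B Identity
defaults = B (Identity 1) (Identity 1.5) (A (Identity "plain") (Identity False))

overrides :: B Maybe
overrides = B (Just 42) Nothing (A Nothing (Just True))

-- Overrides are applied at every level, including inside the nested A.
resolved :: B Identity
resolved = cZipWith orDefault defaults overrides
```

The same polymorphic function is passed down unchanged into the nested record, which is why adding another level of nesting costs nothing beyond one more instance.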
Summary
In summary, as the user of this machinery, what do we end up writing?
data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [Text]
  }
  deriving Generic

type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Option

instance Semigroup MyConfigPartial where
  (<>) = gmappend

deriveCZipWith ''MyConfigF

mergePartialConfigs :: [MyConfigPartial] -> MyConfigPartial
mergePartialConfigs = Semigroup.mconcat

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig = cZipWith f
  where
    f :: Identity a -> Option a -> Identity a
    f (Identity x) (Option Nothing ) = Identity x
    f (Identity _) (Option (Just x')) = Identity x'
and we are done in regards to merging logic. In total you will of course have some more code:
defaultConfig :: MyConfig
defaultConfig = .. -- you can make use of Data.Coerce.coerce to reduce some
                   -- newtype-writing-overhead here.

main = do
  configCmdline <- .. -- parsing logic
  configFile <- .. -- other parsing logic
  let merged = mergePartialConfigs [configFile, configCmdline]
  let config = finalizePartialConfig defaultConfig merged
  runActualInterestingStuff config
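For readers who want to experiment without installing anything, the whole pipeline can be approximated using only base. The sketch below makes several substitutions that are my assumptions, not the setup described above: `Maybe` replaces `Option`, `String` replaces `Text`, the `Semigroup` and `CZipWith` instances are written by hand instead of using `gmappend` and `deriveCZipWith`, and the class is defined locally. The names `emptyPartial`, `fileConfig`, `cmdlineConfig` and `exampleConfig` are hypothetical.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))
import Data.Semigroup (Last (..), Min (..))

class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [String]
  }

type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Maybe

-- Hand-written stand-in for the TH-derived instance.
instance CZipWith MyConfigF where
  cZipWith f (MyConfigF c1 a1 v1 d1 i1) (MyConfigF c2 a2 v2 d2 i2) =
    MyConfigF (f c1 c2) (f a1 a2) (f v1 v2) (f d1 d2) (f i1 i2)

-- Hand-written stand-in for the generically derived instance.
instance Semigroup (MyConfigF Maybe) where
  (<>) (MyConfigF c1 a1 v1 d1 i1) (MyConfigF c2 a2 v2 d2 i2) =
    MyConfigF (c1 <> c2) (a1 <> a2) (v1 <> v2) (d1 <> d2) (i1 <> i2)

emptyPartial :: MyConfigPartial
emptyPartial = MyConfigF Nothing Nothing Nothing Nothing Nothing

-- Later entries in the list take precedence for the Last fields.
mergePartialConfigs :: [MyConfigPartial] -> MyConfigPartial
mergePartialConfigs = foldr (<>) emptyPartial

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig = cZipWith orDefault
  where
    orDefault :: Identity a -> Maybe a -> Identity a
    orDefault d Nothing = d
    orDefault _ (Just x) = Identity x

defaultConfig :: MyConfig
defaultConfig =
  MyConfigF (Identity (Last True)) (Identity (Last False))
            (Identity (Last 0)) (Identity (Min 1000)) (Identity [])

-- Stand-ins for the results of the (omitted) parsing steps.
fileConfig, cmdlineConfig :: MyConfigPartial
fileConfig = emptyPartial
  { verbosity = Just (Last 1), dingUpperBound = Just (Min 50), ignoreWords = Just ["foo"] }
cmdlineConfig = emptyPartial
  { verbosity = Just (Last 3), dingUpperBound = Just (Min 20), ignoreWords = Just ["bar"] }

exampleConfig :: MyConfig
exampleConfig =
  finalizePartialConfig defaultConfig (mergePartialConfigs [fileConfig, cmdlineConfig])
```

In `exampleConfig` the commandline verbosity wins, the smaller of the two upper bounds is kept, both word lists are concatenated, and the untouched fields fall back to `defaultConfig`.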
We do not often encounter data types with strange kinds like (* -> *) -> *. For program configuration, driven by the desire to avoid redundancy, we naturally encounter such a type. We have to invent a new type-class to work with this kind, and write some TH, but this is a one-time cost that in turn will allow us to easily implement program configuration even for complex use-cases.