# Throwing Clever Types at Program Configuration

(A similar approach was described in this post by Sandy Maguire, which attributes the core technique to Travis Athougies. The `CZipWith` abstraction, however, is new as far as I can tell.)

## The Problem

You have some non-trivial configuration data-type in your program and want to have its fields filled from several sources:

- the commandline
- config file(s)
- built-in default values

And you do *not* want to

- handle `Maybe`s in config fields throughout the program;
- have redundant, very similar datatypes (`PartialConfig` + `Config`);
- write custom merging logic that scales in the size of your config data-type.

## A simple example

```
data MyConfig = MyConfig
  { useColors :: Bool
  , applyFerbulator :: Bool
  , verbosity :: Int
  , dingUpperBound :: Float
  , ignoreWords :: [Text]
  }
```

Of course this type will not work as it stands: we do not want to force the user to specify every field on the commandline, but simple defaults that we conditionally overwrite do not work either, because values may also be specified in the config file.

The correct behaviour seems to depend on the desired merging logic, which differs for the fields:

- For the first three fields, it's a simple "take the first specified": E.g. if the user specified "verbosity n" on the commandline, use that. Otherwise, look at the value in the config file. If nothing is specified, use some default value.
- For `dingUpperBound`, the default is some large value and commandline or file should overwrite it, but if both are specified, we want to respect the *minimum*.
- For `ignoreWords` the default is an empty list, and we want to append inputs and not do any overwriting.
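To make the three rules concrete, here is a minimal sketch of what they look like when written by hand, one function per field. The function names are hypothetical, each source is modelled as a `Maybe` (or a plain list), and `String` stands in for `Text` to keep the example within `base`. It is exactly this kind of per-field code that we would like to avoid writing.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (catMaybes, fromMaybe)

-- "Take the first specified": commandline wins over file, file over default.
pickVerbosity :: Int -> Maybe Int -> Maybe Int -> Int
pickVerbosity def file cmdline = fromMaybe def (cmdline <|> file)

-- Any specified value overrides the default; if several are specified,
-- the smallest one is used.
pickDingUpperBound :: Float -> Maybe Float -> Maybe Float -> Float
pickDingUpperBound def file cmdline =
  case catMaybes [file, cmdline] of
    [] -> def
    xs -> minimum xs

-- Inputs are appended; the default is the empty list.
pickIgnoreWords :: [String] -> [String] -> [String]
pickIgnoreWords file cmdline = file ++ cmdline

main :: IO ()
main = do
  print (pickVerbosity 1 (Just 2) Nothing)             -- 2
  print (pickDingUpperBound 1000 (Just 50) (Just 20))  -- 20.0
  print (pickIgnoreWords ["foo"] ["bar"])              -- ["foo","bar"]
```

Every new field means another function of this shape, plus the plumbing to call it.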

## Scope of Our Approach

We will present an approach to the *merging logic* and what data-types to use for it. But this will not be fully usable yet, because we will not discuss how to implement any parsing that matches these types.

## A Failing First Attempt

```
-- Both records reuse the same field names, so within a single module this
-- needs the DuplicateRecordFields extension.
data MyConfig = MyConfig
  { useColors :: Bool
  , applyFerbulator :: Bool
  , verbosity :: Int
  , dingUpperBound :: Float
  , ignoreWords :: [Text]
  }

data MyConfigPartial = MyConfigPartial
  { useColors :: Maybe Bool
  , applyFerbulator :: Maybe Bool
  , verbosity :: Maybe Int
  , dingUpperBound :: Maybe Float
  , ignoreWords :: [Text]
  }

-- insert some magic here to derive (Monoid MyConfigPartial)

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig def conf = _todo -- `default` is a reserved word
```

This approach has two clear disadvantages: There is redundancy in the two data-types and we will have to implement the function by hand, and it will need to be adapted for every new field.

So how do we abstract over these two data-types?

## The Solution, Part One

Consider this definition:

```
data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [Text]
  }
```

where `Last` and `Min` are the newtypes from `Semigroup`. This now allows us these very simple definitions:

```
type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Option
```

where `Option` is the `Maybe` wrapper from `Semigroup`. What exactly does this give us, and why did we choose `Semigroup` stuff?

- Most importantly, the redundancy is gone. Yay!
- Consider the `Monoid` `(<>)` behaviour for `Maybe a` under the `First` wrapper from `Data.Monoid` and contrast it to the `Semigroup` `(<>)` behaviour for `Option (First a)`. They are the same! And the switch to `Last` only swaps the ordering, to more closely match how for example later commandline arguments overwrite earlier ones.
- The switch from `Monoid`/`Maybe` to `Semigroup`/`Option` essentially gives us one thing: It separates the handling of the "Nothing versus Just" question (or "was it defined or not") from the "How are two defined things merged" question.
- We can derive a `Semigroup` instance for this `MyConfigPartial`, and it behaves exactly as we desire (!) We could do `defaultConfig <> fileConfig <> commandlineConfig`. The only downside is that this all works with (and returns) "partial" configs, where we would really prefer the non-partial one.
- There is one difference between the previous and this `MyConfigPartial` definition: We now have `Option Nothing` and `Option (Just [])` that roughly convey the same semantics. But that bit of semantic redundancy is worth it.
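The field-level behaviour described above can be tried out in isolation. The following is a small sketch under two assumptions: plain `Maybe` stands in for `Option` (`Option` was removed from `base` in version 4.16, and `Maybe a` has had a `Monoid` instance that only requires `Semigroup a` since `base` 4.11), and `String` stands in for `Text`. The sample values are made up.

```haskell
import Data.Semigroup (Last (..), Min (..))

-- Assumption: `Maybe` plays the role of `Option` here.
-- Each list is ordered from lowest to highest priority source.
verbosities :: [Maybe (Last Int)]
verbosities = [Nothing, Just (Last 1), Nothing, Just (Last 3)]

bounds :: [Maybe (Min Float)]
bounds = [Just (Min 50), Nothing, Just (Min 20)]

ignoreLists :: [Maybe [String]]
ignoreLists = [Just ["foo"], Nothing, Just ["bar"]]

main :: IO ()
main = do
  -- "Was it defined?" is handled by Maybe, "how to merge?" by the inner type.
  print (fmap getLast (mconcat verbosities))  -- Just 3
  print (fmap getMin (mconcat bounds))        -- Just 20.0
  print (mconcat ignoreLists)                 -- Just ["foo","bar"]
  -- Nothing specified anywhere stays Nothing.
  print (fmap getLast (mconcat [Nothing, Nothing :: Maybe (Last Int)]))  -- Nothing
```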

However we still lack one thing: the `finalizePartialConfig` function is still missing, or rather any non-boilerplaty conversion between the partial and the non-partial datatype. But how different are these types even now?

## The Solution, Part Two

```
finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
-- without type aliases becomes
finalizePartialConfig :: MyConfigF Identity -> MyConfigF Option -> MyConfigF Identity
```

If only the kinds were a bit easier, this would need nothing more than some `Applicative` usage. But remember that our type argument has kind `* -> *`, not `*` like `Applicative` expects. Can we make this work nonetheless? Is there some "lifter Applicative" thingy that clever category theorists came up with? I don't know the answer to the last question, but indeed, there is a way to make this work.

### An Attempt at some Theory

(This is not precise by any means. If only interested in how to apply the solution, feel free to skip this section.)

Let us first abstract this problem a bit more, via some quick iteration:

```
finalizePartialConfig :: MyConfigF Identity -> MyConfigF Option -> MyConfigF Identity
-- first step: We want some mechanism that works for some class of types, not
-- just our particular `MyConfigF` - because we don't want to have to adapt it.
-- So let us just invent a new typeclass:
class StrangeLiftedApplicative c where
  combineFunc :: SomeNewTC c => c Identity -> c Option -> c Identity
-- And now replace the remaining non-abstract stuff:
combineFunc :: SomeNewTC c => c f -> c g -> c f
-- We won't be able to make that work if we know nothing about f or g.
-- But really we only need one thing: to compute an output `f a` from inputs
-- `f a` and `g a`, for any `a`. (Example `a`s from our example above would be
-- `Bool` or `Int` or `Text`, plus some newtype wrappers.)
-- Let us just expect that from the user:
combineFunc :: SomeNewTC c => (forall a . f a -> g a -> f a) -> c f -> c g -> c f
-- We can generalize this a bit:
combineFunc :: SomeNewTC c => (forall a . f a -> g a -> h a) -> c f -> c g -> c h
```

This deviates a bit from how one might write this if one started with the "lifted applicative" idea, but it still works and is intuitive to use. For those familiar with standard `Prelude` functions, this signature should remind you of something:

`zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]`

And we decided to let this lend its name to the typeclass, calling it `CZipWith` and its method `cZipWith`. We can give it these two laws:

`cZipWith (\x _ -> x) g _ = g`

`cZipWith (\_ y -> y) _ h = h`

The first question to ask next is: What instances does it have? Considering any "standard" data-types, not anything really - the kinds we are working with here are a bit exotic (note how `c :: (* -> *) -> *` there).

But we can define some new things, for example:

```
newtype CUnit a f = CUnit (f a) -- corresponding to Identity
data CPair a b f = CPair (f a) (f b) -- corresponding to 'data MonoPair a = MonoPair a a'
data CStream a f = CStream (f a) (CStream a f) -- corresponding to an infinite stream
```

But for example for non-infinite streams (aka lists) there is no instance.
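As a sketch of what instances for these three types could look like, here is a self-contained version that defines the class locally instead of importing it from a package. The helper names `fillDefault`, `unPair`, `defaults`, `overrides` and `filledPair` are hypothetical and only serve the demonstration.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

-- Local copy of the class, so that the example only depends on base.
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

newtype CUnit a f = CUnit (f a)
data CPair a b f = CPair (f a) (f b)
data CStream a f = CStream (f a) (CStream a f)

instance CZipWith (CUnit a) where
  cZipWith f (CUnit x) (CUnit y) = CUnit (f x y)

instance CZipWith (CPair a b) where
  cZipWith f (CPair x1 x2) (CPair y1 y2) = CPair (f x1 y1) (f x2 y2)

-- Productive thanks to laziness: each step yields one constructor.
instance CZipWith (CStream a) where
  cZipWith f (CStream x xs) (CStream y ys) = CStream (f x y) (cZipWith f xs ys)

-- Keep the default unless an override is present.
fillDefault :: Identity a -> Maybe a -> Identity a
fillDefault d Nothing = d
fillDefault _ (Just x) = Identity x

unPair :: CPair a b Identity -> (a, b)
unPair (CPair (Identity x) (Identity y)) = (x, y)

defaults :: CPair Int Bool Identity
defaults = CPair (Identity 1) (Identity False)

overrides :: CPair Int Bool Maybe
overrides = CPair Nothing (Just True)

filledPair :: CPair Int Bool Identity
filledPair = cZipWith fillDefault defaults overrides

main :: IO ()
main = do
  print (unPair filledPair)  -- (1,True)
  -- First law, checked on one sample: projecting the left argument
  -- gives back the left structure.
  print (unPair (cZipWith (\x _ -> x) defaults overrides) == unPair defaults)  -- True
```

For a list-like type the same zipping would have to truncate to the shorter input, so projecting the left argument would not always give back the left structure, which is one way to see why the laws rule out such an instance.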

When trying to describe more generally what instances are possible, and when considering other typeclasses, we noticed this description from the docs of `Distributive`:

> To be distributable a container will need to have a way to consistently zip a potentially infinite number of copies of itself. This effectively means that the holes in all values of that type, must have the same cardinality, fixed sized vectors, infinite streams, functions, etc. and no extra information to try to merge together.

This comes very close!

But more important for this discussion is perhaps that our `MyConfigF` type has a (lawful) instance, and adding fields (and even nesting several datatypes) poses no problem.

```
instance CZipWith MyConfigF where
  cZipWith f (MyConfigF c1 f1 v1 d1 i1) (MyConfigF c2 f2 v2 d2 i2)
    = MyConfigF (f c1 c2) (f f1 f2) (f v1 v2) (f d1 d2) (f i1 i2)
```

### Usage of the CZipWith typeclass

So there is a typeclass and instance for our data-type:

```
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

instance CZipWith MyConfigF where ..
```

And we can trivially express `finalizePartialConfig` in terms of `cZipWith` (details left as an exercise :p).

That is all good and well, but so far this type-class does not really give us anything useful. We still have to write that instance by hand, don't we? Well, quite the contrary: The instances are boilerplate to such a degree that we can automate them easily with some Template Haskell (TH). We won't look at the TH function here - the 50 lines of pattern matching and splicing are not that interesting.

## Implementation and Example

The czipwith package, available on Hackage, contains the `CZipWith` type-class and the TH function, plus some other stuff on the same kind-level (`CFunctor`, `CPointed`, `CZipWithM`). Apart from that, you should only need `Data.Semigroup`, which has been part of `base` for several versions.

This approach is battle-tested in `brittany`. Relevant source-code that might be of interest are the nested `CConfig` type or these two lines that do all the merging/defaulting logic (same as `mergePartialConfigs` and `finalizePartialConfig` below).

The `deriveCZipWith` function also works when nesting data-types. Quoting the docs:

```
data A f = A
  { a_str :: f String
  , a_bool :: f Bool
  }

data B f = B
  { b_int :: f Int
  , b_float :: f Float
  , b_a :: A f
  }

deriveCZipWith ''A
deriveCZipWith ''B
```

produces the following instances

```
instance CZipWith A where
  cZipWith f (A x1 x2) (A y1 y2) = A (f x1 y1) (f x2 y2)

instance CZipWith B where
  cZipWith f (B x1 x2 x3) (B y1 y2 y3) =
    B (f x1 y1) (f x2 y2) (cZipWith f x3 y3)
```

## Summary

In summary, as the user of this machinery, what do we end up writing?

```
data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [Text]
  }
  deriving Generic

type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Option

instance Semigroup MyConfigPartial where
  (<>) = gmappend

deriveCZipWith ''MyConfigF

mergePartialConfigs :: [MyConfigPartial] -> MyConfigPartial
-- Only a Semigroup instance exists, so the list must be non-empty.
mergePartialConfigs = foldr1 (<>)

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig = cZipWith f
  where
    f :: Identity a -> Option a -> Identity a
    f (Identity x) (Option Nothing) = Identity x
    f (Identity _) (Option (Just x')) = Identity x'
```

and we are done with regard to the merging logic. In total you will of course have some more code:

```
defaultConfig :: MyConfig
defaultConfig = .. -- you can make use of Data.Coerce.coerce to reduce some
                   -- newtype-writing-overhead here.

main = do
  configCmdline <- .. -- parsing logic
  configFile <- .. -- other parsing logic
  let merged = mergePartialConfigs [configFile, configCmdline]
  let config = finalizePartialConfig defaultConfig merged
  runActualInterestingStuff config
```
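For readers who want to run the whole pipeline without any extra packages, the following is a rough, self-contained sketch of the same idea using only `base`. It makes several substitutions that are assumptions of this sketch and not part of the setup above: `Maybe` replaces `Option`, `String` replaces `Text`, the `Semigroup` and `CZipWith` instances are written by hand instead of using `gmappend` and `deriveCZipWith`, and the sample config values are invented.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))
import Data.Semigroup (Last (..), Min (..))

data MyConfigF f = MyConfigF
  { useColors :: f (Last Bool)
  , applyFerbulator :: f (Last Bool)
  , verbosity :: f (Last Int)
  , dingUpperBound :: f (Min Float)
  , ignoreWords :: f [String]
  }

type MyConfig = MyConfigF Identity
type MyConfigPartial = MyConfigF Maybe -- assumption: Maybe instead of Option

-- Local copy of the class; the czipwith package provides the real one.
class CZipWith k where
  cZipWith :: (forall a. g a -> h a -> i a) -> k g -> k h -> k i

instance CZipWith MyConfigF where
  cZipWith f (MyConfigF c1 f1 v1 d1 i1) (MyConfigF c2 f2 v2 d2 i2) =
    MyConfigF (f c1 c2) (f f1 f2) (f v1 v2) (f d1 d2) (f i1 i2)

-- Field-wise merge, written by hand in place of a generic derivation.
instance Semigroup (MyConfigF Maybe) where
  MyConfigF c1 f1 v1 d1 i1 <> MyConfigF c2 f2 v2 d2 i2 =
    MyConfigF (c1 <> c2) (f1 <> f2) (v1 <> v2) (d1 <> d2) (i1 <> i2)

finalizePartialConfig :: MyConfig -> MyConfigPartial -> MyConfig
finalizePartialConfig = cZipWith pick
  where
    pick :: Identity a -> Maybe a -> Identity a
    pick d Nothing = d
    pick _ (Just x) = Identity x

defaultConfig :: MyConfig
defaultConfig =
  MyConfigF (Identity (Last True)) (Identity (Last False))
            (Identity (Last 1)) (Identity (Min 1000)) (Identity [])

emptyPartial :: MyConfigPartial
emptyPartial = MyConfigF Nothing Nothing Nothing Nothing Nothing

-- Invented sample inputs standing in for the results of real parsing.
fileConfig, cmdlineConfig :: MyConfigPartial
fileConfig = emptyPartial
  { verbosity = Just (Last 2), dingUpperBound = Just (Min 50), ignoreWords = Just ["foo"] }
cmdlineConfig = emptyPartial
  { verbosity = Just (Last 3), dingUpperBound = Just (Min 80), ignoreWords = Just ["bar"] }

finalConfig :: MyConfig
finalConfig = finalizePartialConfig defaultConfig (fileConfig <> cmdlineConfig)

main :: IO ()
main = do
  print (getLast (runIdentity (useColors finalConfig)))      -- True (default)
  print (getLast (runIdentity (verbosity finalConfig)))      -- 3 (commandline wins)
  print (getMin (runIdentity (dingUpperBound finalConfig)))  -- 50.0 (minimum)
  print (runIdentity (ignoreWords finalConfig))              -- ["foo","bar"] (appended)
```

The hand-written instances are exactly the boilerplate that the generic `Semigroup` derivation and the TH function take care of in the real setup.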

We do not often encounter data types with strange kinds like `(* -> *) -> *`. For program configuration, driven by the desire to avoid redundancy, we naturally encounter such a type. We have to invent a new type-class to work with this kind, and write some TH, but this is a one-time cost that in turn allows us to easily implement program configuration even for complex use-cases.