Rationale behind the Brittany Formatter
Introduction
brittany is a haskell source code formatter. Whether the output from brittany
is “beautiful” or “generally most readable” will of course be subjective, but it follows certain principles laid out below.
This document is not (intended as):
- a haskell style guide or
- an exact, comprehensive description of layouting choices made by brittany.
Related Work
First, lets consider some haskell style guides:
- tibbe haskell-style-guide - set of rules for the most common style questions;
- haskell wiki - a bit random, more focus on coding style rather than layout
- ‘ian’ Good Haskell Style - focus on the tabs-vs-spaces issue
- opinionated importing style - thorough write-up regarding import statements
What is ‘Style’ Anyways?
Style can refer to several things, in no particular order:
- Layouting of the same source code (i.e. mostly usage of whitespace and newlines);
- Naming conventions (identifiers etc.);
- Large scope: Module and project structure (Central
Types.hs
module? etc.); - Choice between semantically equivalent expressions (point or free? etc.);
- Style of comments;
- Ordering (imports, pragmas above or below, etc.);
- More semantical topics: Which language features/extensions to use, which libraries (crypto and random number generation, String vs. Text vs ByteString, etc.), what general-purpose frameworks to use (lens, extensible-effects,
Foldable
instances, etc.)
and probably more.
This document will focus on 1).
Purposes of Style
- Readability - Allow different users to quickly correctly read the code and do not burden them with mental overhead for parsing.
- Preventing technical issues. Tabs that break code is a good example.
- Keeping diffs local and do not hinder refactoring. Good style will reduce the risk of merge conflicts during collaboration.
Conciseness and consistency are not included above, for the simple reason that i don’t view them as goals, even though they help to achieve them. Shorter code is nice, mostly because more of it can fit on one screen. But if “short” was your only goal, you end up with dense blocks of code that you cannot parse visually anymore - that does not help. Similarly, consistency aligns with 3) but can occasionally clash with readability.
Syntax Trees and Layout
The freedom we have in layouting is where to insert whitespace and newlines. If we consider what source code represents - a syntax tree - and that readability has a lot to do with parsing, the trivial idea is to align depth in the syntax tree and indentation in the source code. But if we end up using too many newlines, the source will become overly long and hard to read as well. I invite everyone to have a look at the syntax tree of some simple Haskell functions.
So instead we settle for a somewhat weaker compromise:
Stylistic Layouting Rule
Source code fragments corresponding to siblings in the syntax tree should either be part of the same (single) line or be aligned horizontally.
If appropriate, this rule should be extended to “higher-degree siblings” (nephews/nieces etc.).
If appropriate, indentation of children should be higher or equal to that of parents, that is: more depth in tree should mean more or equal indentation.
One can think of stronger rules, but as mentioned they often force more newlines than is “healthy”. And of course this rule will be hard to grasp for those not familiar with the syntax tree, but it should still makes sense and it seems to be a useful foundation.
On the plus-side:
The Haskell layouting rule is fairly close in spirit to this already: Elements of do, let and where are siblings and must have the same indentation already:
= do func same indentation
The layouting rule also enforces that indentation of these constructs only increases for children, which matches b.
Many of the items from the tibbe style guide can be derived from this rule:
Aligning
then
/else
branch but not necessarilyif
:then
andelse
are siblings, butif
is a parent.= if .. func then .. else ..
And yes, here we choose to break b) in favor of saving a newline.
Aligning the elements of a list (which are siblings);
let myList = [ firstExpr , secondExpr , thirdExpr ]
alternatively, if there is enough space in the line:
let myList = [firstExpr, secondExpr, thirdExpr]
Aligning the different constructors of a data-type (siblings)
data Clef = Violin | Alto | Tenor | Bass
Aligning the right-hand-sides of alternatives in a case-statement (only second-degree siblings, but it makes sense to apply the rule here):
= case something of foobar Just j -> foo Nothing -> bar
With the right interpretation, the rules also suggest to prefix punctuation and operators. As an example for punctuation, record syntax:
let myRecord = Record { somefield = "abc" = 14 , otherfield = False , lastfield }
The punctuation “{,}” introduces different fields that are on the same level, so it should align. And it separates fields, so should be seen more as a parent than a child, and so the punctuation should be written on the left.
For operators:
let myvalue = (some large arithmetic expression) + (some other arithmetic expression) + (a third arithmetic expression)
Here,
(+)
clearly is a parent in the syntax tree, so postfixing at the end of line would be against b without significant benefit.
I wish to stress the main goals are mentioned above - the rule is just an attempt to capture what makes code easy to parse, which of course will be subjective. And in my experience, being able to ignore everything “to the right” and to grasp the (upper part of) the syntactic structure is very helpful.
Context-Sensitivity versus Least-Newlines
Above we discuss when to align, but not where to indent. Often there are multiple options that are in line with the rule above, for example consider:
= [ myFunction1 arg1
myList
arg2
arg3
, myFunction2 arg4
arg5
arg6 ]
This uses least amount of newlines by making use of “hanging indentation”;
=
myList
[ myFunction1
arg1
arg2
arg3
, myFunction2
arg4
arg5
arg6 ]
This uses more lines, but generally behaves better when doing local changes (is less context sensitive).
Consider what happens when replacing arg2
with someReallyLongArgThatNeedsLotsOfSpace
. We might have to remove the hanging indentation from the first choice as to prevent overflowing our column limit, which affects the parent of the node which we modify. With the second choice, this cannot happen. And when your colleague replaces arg5
similarly in her branch, also forced to make a non-local change, you end up with a merge conflict.
It doesn’t even have to be a code change - adding a longer comment after one of those arguments can have the same result.
One might be inclined to call the context-insensitive approach “more consistent”, but i don’t think this is fair: It is not about whether we consistently apply rules, but how these rules are formed (and perhaps how complex the rules are).
Our goals 1 (less lines -> more readable) and 3 (context insensitive -> local changes stay local) clash. I am not sure if we can specify the best choice in general. Some people will prefer to entirely stop using hanging indentation, and will mention that shorter code is not actually any more readable. But i think that when you have sufficiently complex logic, and accordingly complex code, it does matter if it fits on one screen or not. In many cases i tend to overrule consistency. We’ll look at some such cases and the corresponding reasoning:
Function (and Operator) Application
This has several more complex cases. We have seen the hanging-indent question above already - i have no clear advice for that one. Brittany
currently uses hanging indent, but i may be convinced to change that.
Consider some other example case, where this format is not only not strictly consistent, but also chooses to violate b):
= RH.performEvent_ $ postBuild <&> \() -> liftIO $ do
mybinding >>= \case
runMaybeT postCliInit Nothing -> return ()
Just () -> do
<- forkIO $ postCliInitAsync `catch` \(e :: SomeException) ->
_ LogLevelError (show e)
writeLogS return ()
You can see how everything including and below the runMaybeT
line is part of the \() -> ..
, yet it is indented less than the start of the lambda. The same “issue” exists for \case
and its children. Also, the binary operators in the first line should not even go in one line, because the leftmost branch of the syntax tree extends over several lines.
But as contrast, some fully consistent layouting for the same tree:
=
mybinding
RH.performEvent_$ postBuild
<&>
->
\()
liftIO$ do
runMaybeT postCliInit>>=
case
\Nothing -> return ()
Just () -> do
<-
_
forkIO$ postCliInitAsync
`catch`
e :: SomeException) ->
\(LogLevelError (show e)
writeLogS return ()
Welp, how ugly. And merely very consistently applied rules.
The reasoning we follow instead, to arrive at the much more readable variant, is: If the last child is layouted in such a way that it has a simple “head” (first line) plus a multi-line “body”, then we allow to ignore the body when deciding on layouts for the parent. Example “heads” above are do
, \case
, and, by extension, liftIO $ do
and so on.
Type Signatures
Personal preference is either one-line or non-hanging multiline, that is:
plusOne :: Int -> Int
-- or
complex :: Text
-> (SourcePos, Wobble)
-> StateT TranslationState (Either (SourcePos, Text)) ()
This is one item where my recommendation differs from tibbe: No hanging indentation for the type signature. Refactoring function names forces re-indentation, which I dislike.
If you have grown used to grepping for “myFunction ::” to go to definition, “^myFunction” works with the recommended style.
I don’t see much use in also avoiding the one-line option, because the size of your diff cannot be smaller than one line if you only have on line.
Guarded Patterns
The context-insensitive approach for function definitions would be:
funcSome long pattern)
(Another long pattern)
(| some potentially long guard
= potentially
multiline body
But again, this would involve a lot of newlines for something simple like
| even x = x `div` 2
collatz x | otherwise = 3 * x + 1
This is another example where i think we want both context-sensitivity and even the hanging indentation for the guard.
Layout of Module Imports
For imports, the compact but context-sensitive layout is the following:
import Config (Config)
import qualified Config
import qualified Data.Either.Combinators as Either
import Lens.Micro ((^.))
import Path (Abs, Dir, File, Path, (</>))
import qualified Path.IO as Path
import qualified System.FilePath as F
import qualified System.Process as P
import qualified Text.PDF.Info as PDFI
while the less compact, but context-insensitive layout is:
import Config ( Config )
import qualified Config
import qualified Data.Either.Combinators as Either
import Lens.Micro ( (^.) )
import Path ( Abs
Dir
, File
, Path
, </>)
, (
)import qualified Path.IO as Path
import qualified System.FilePath as F
import qualified System.Process as P
import qualified Text.PDF.Info as PDFI
And there are somewhat strong arguments for both styles:
- The import section can easily get lengthy and can make browsing source files annoying. As a consequence, not wasting space by having almost empty lines is critical.
- The import section is easily subject to modifications from unrelated commits or changesets. To avoid merge conflicts it is critical to avoid any context sensitivity - for example to align things according to the longest import is a stupid idea because a change of the column will create a diff affecting all import items. This practically invites merge conflicts.
My personal judgement assigns a higher relevance to the second argument, because merge conflicts are more annoying than scrolling past some more lines. Also it is rare to consume source files top-to-bottom - most of the time I jump to some definition, fix some stuff, move on. I don’t even see the imports in many cases where I make changes to any source file.
There is also room for a compromise: Use paragraph-fill (or “flex-wrap” in css-speak) but alignment to a static column, i.e.
import Config ( Config )
import qualified Config
import qualified Data.Either.Combinators as Either
import Lens.Micro ( (^.) )
import Path ( Abs, Dir, File, Path
</>) )
, (import qualified Path.IO as Path
import qualified System.FilePath as F
import qualified System.Process as P
import qualified Text.PDF.Info as PDFI
which avoids both affect-all-imports changes and the wasted space of one-item lines, at the cost of risking merge conflicts for changes to the imports of the same module. The risk might be considered sufficiently low to make this a sensible trade-off.
Brittany currently implements (only) the context-insensitive approach. The paragraph-fill approach may be supported in the future, maybe even as the default, but is not implemented yet.
Horizontal Alignment in General
Horizontal alignment is context-sensitive. Consider
"" = True
go [] WildCard ] "" = True
go [WildCard :rest) (c:cs) = go rest (c : cs) || go (WildCard : rest) cs
go (Union globs:rest) cs = any (\glob -> go (glob ++ rest) cs) globs
go (:_) = False
go [] (_:_) "" = False go (_
What happens if we, say, refactor Union
to ThingUnion
? Every single line will be affected, one way or another. Nonetheless, alignment in such cases increases readability a lot.
Non-Newline Whitespace
Always use spaces around operators, even inside sections. Operators include symbols and infixified identifiers. Sometimes users deviate for arithmetic operators and for
(:)
, but I don’t think we can formalize these deviations.= (: []) (9 - f (x `min` y)) val
brittany
currently deviates for sections, where it does never uses spaces. This will probably be fixed in the future.This extends to “syntactic operators” as well, e.g.
->
,=>
,::
and even.
. I am aware some people disagree about the last one.For lists, tuples and other parenthesis: For one-line usage add no spaces on the inside and a space after the separator. For multi-line usage, add a space before the first element:
= [1, 3, 4] myList1 = myList2 [ some loooooooong element one , some loooooooong element three , some loooooooong element four ]
and the same for tuples, including the “1-tuple” case.
The extra space in multi-line layout gives proper alignment of the different elements, and keeps alignment at a multiple of 2.
Don’t use tabs. I think this is covered sufficiently in the referenced guides already.
Other Aspects
For all aspects not covered by the above, there is no one-fits-all rule. Only - Be consistent.
Some of my preferences are:
80 columns limit and 2-space-indentation.
modules look like this:
{-# LANGUAGE SomeExtension #-} {-# LANGUAGE OtherExtension #-} module Mod Data1 (..) ( Data2 (..) , , func1 , func2 )where import qualified X.Y as Y import Z data Data1 = Data1 Int data Data2 = Data2 Bool newtype Frequency = Frequency Double func1 :: Data1 -> Data2 = .. func1 func2 :: Int -> Data1 = .. func2
But the usage of blank lines is flexible. Sometimes i omit them between related data definitions. Sometimes it makes sense to group some functions, and put some comment above them. If you have many local functions in a
where
, it makes sense to add blank lines there too. Even inside statements in ado
-block:.. = do prepare x<- accessor1 y <- accessor2 z <- doSomethingImportant y z r postprocess (f r) cleanup (g r)
Implementation in the Brittany Formatter
Brittany implements the Stylistic Layouting Rule, yet it also implements all of the (“context-sensitive”) compromises discussed in the “Context-Sensitivity versus Least-Newlines” chapter to reduce the number of newlines. This includes hanging indentation for function application.
= RH.performEvent_ $ postBuild <&> \() -> liftIO $ do
mybinding >>= \case
runMaybeT postCliInit Nothing -> return ()
Just () -> do
<- forkIO $ postCliInitAsync `catch` \(e :: SomeException) ->
_ LogLevelError (show e)
writeLogS return ()
is the exact layout produced by brittany
.
Horizontal alignment is currently by default enabled, but only for alignment between consecutive lines.
It is planned to include configuration options to disable several of the potentially problematic context-sensitive layouts.
In several aspects brittany
is configurable, e.g. columns
(default 80) and indent
(ation) (default 2). Horizontal alignment can be turned off entirely, as well.