posts `for_` publish

Rationale behind the Brittany Formatter


brittany is a haskell source code formatter. Whether the output from brittany is "beautiful" or "generally most readable" will of course be subjective, but it follows certain principles laid out below.

This document is not (intended as):

First, lets consider some haskell style guides:

What is 'Style' Anyways?

Style can refer to several things, in no particular order:

  1. Layouting of the same source code (i.e. mostly usage of whitespace and newlines);
  2. Naming conventions (identifiers etc.);
  3. Large scope: Module and project structure (Central Types.hs module? etc.);
  4. Choice between semantically equivalent expressions (point or free? etc.);
  5. Style of comments;
  6. Ordering (imports, pragmas above or below, etc.);
  7. More semantical topics: Which language features/extensions to use, which libraries (crypto and random number generation, String vs. Text vs ByteString, etc.), what general-purpose frameworks to use (lens, extensible-effects, Foldable instances, etc.)
    and probably more.

This document will focus on 1).

Purposes of Style

  1. Readability - Allow different users to quickly correctly read the code and do not burden them with mental overhead for parsing.
  2. Preventing technical issues. Tabs that break code is a good example.
  3. Keeping diffs local and do not hinder refactoring. Good style will reduce the risk of merge conflicts during collaboration.

Conciseness and consistency are not included above, for the simple reason that i don't view them as goals, even though they help to achieve them. Shorter code is nice, mostly because more of it can fit on one screen. But if "short" was your only goal, you end up with dense blocks of code that you cannot parse visually anymore - that does not help. Similarly, consistency aligns with 3) but can occasionally clash with readability.

Syntax Trees and Layout

The freedom we have in layouting is where to insert whitespace and newlines. If we consider what source code represents - a syntax tree - and that readability has a lot to do with parsing, the trivial idea is to align depth in the syntax tree and indentation in the source code. But if we end up using too many newlines, the source will become overly long and hard to read as well. I invite everyone to have a look at the syntax tree of some simple Haskell functions.

So instead we settle for a somewhat weaker compromise:

Stylistic Layouting Rule

Source code fragments corresponding to siblings in the syntax tree should either be part of the same (single) line or be aligned horizontally.

  1. If appropriate, this rule should be extended to "higher-degree siblings" (nephews/nieces etc.).

  2. If appropriate, indentation of children should be higher or equal to that of parents, that is: more depth in tree should mean more or equal indentation.

One can think of stronger rules, but as mentioned they often force more newlines than is "healthy". And of course this rule will be hard to grasp for those not familiar with the syntax tree, but it should still makes sense and it seems to be a useful foundation.

On the plus-side:

  1. The Haskell layouting rule is fairly close in spirit to this already: Elements of do, let and where are siblings and must have the same indentation already:

    func = do

    The layouting rule also enforces that indentation of these constructs only increases for children, which matches b.

  2. Many of the items from the tibbe style guide can be derived from this rule:

    • Aligning then/else branch but not necessarily if: then and else are siblings, but if is a parent.

      func = if ..
        then ..
        else ..

      And yes, here we choose to break b) in favor of saving a newline.

    • Aligning the elements of a list (which are siblings);

      let myList =
            [ firstExpr
            , secondExpr
            , thirdExpr

      alternatively, if there is enough space in the line:

      let myList = [firstExpr, secondExpr, thirdExpr]
    • Aligning the different constructors of a data-type (siblings)

      data Clef
        = Violin
        | Alto
        | Tenor
        | Bass
    • Aligning the right-hand-sides of alternatives in a case-statement (only second-degree siblings, but it makes sense to apply the rule here):

      foobar = case something of
        Just j  -> foo
        Nothing -> bar
  3. With the right interpretation, the rules also suggest to prefix punctuation and operators. As an example for punctuation, record syntax:

    let myRecord = Record { somefield  = "abc"
                          , otherfield = 14
                          , lastfield  = False

    The punctuation "{,}" introduces different fields that are on the same level, so it should align. And it separates fields, so should be seen more as a parent than a child, and so the punctuation should be written on the left.

    For operators:

    let myvalue = (some large arithmetic expression)
                + (some other arithmetic expression)
                + (a third arithmetic expression)

    Here, (+) clearly is a parent in the syntax tree, so postfixing at the end of line would be against b without significant benefit.

I wish to stress the main goals are mentioned above - the rule is just an attempt to capture what makes code easy to parse, which of course will be subjective. And in my experience, being able to ignore everything "to the right" and to grasp the (upper part of) the syntactic structure is very helpful.

Context-Sensitivity versus Least-Newlines

Above we discuss when to align, but not where to indent. Often there are multiple options that are in line with the rule above, for example consider:

myList = [ myFunction1 arg1
         , myFunction2 arg4

This uses least amount of newlines by making use of "hanging indentation";

myList =
  [ myFunction1
  , myFunction2

This uses more lines, but generally behaves better when doing local changes (is less context sensitive).

Consider what happens when replacing arg2 with someReallyLongArgThatNeedsLotsOfSpace. We might have to remove the hanging indentation from the first choice as to prevent overflowing our column limit, which affects the parent of the node which we modify. With the second choice, this cannot happen. And when your colleague replaces arg5 similarly in her branch, also forced to make a non-local change, you end up with a merge conflict.
It doesn't even have to be a code change - adding a longer comment after one of those arguments can have the same result.

One might be inclined to call the context-insensitive approach "more consistent", but i don't think this is fair: It is not about whether we consistently apply rules, but how these rules are formed (and perhaps how complex the rules are).

Our goals 1 (less lines -> more readable) and 3 (context insensitive -> local changes stay local) clash. I am not sure if we can specify the best choice in general. Some people will prefer to entirely stop using hanging indentation, and will mention that shorter code is not actually any more readable. But i think that when you have sufficiently complex logic, and accordingly complex code, it does matter if it fits on one screen or not. In many cases i tend to overrule consistency. We'll look at some such cases and the corresponding reasoning:

Function (and Operator) Application

This has several more complex cases. We have seen the hanging-indent question above already - i have no clear advice for that one. Brittany currently uses hanging indent, but i may be convinced to change that.

Consider some other example case, where this format is not only not strictly consistent, but also chooses to violate b):

mybinding = RH.performEvent_ $ postBuild <&> \() -> liftIO $ do
  runMaybeT postCliInit >>= \case
    Nothing -> return ()
    Just () -> do
      _ <- forkIO $ postCliInitAsync `catch` \(e :: SomeException) ->
        writeLogS LogLevelError (show e)
      return ()

You can see how everything including and below the runMaybeT line is part of the \() -> .., yet it is indented less than the start of the lambda. The same "issue" exists for \case and its children. Also, the binary operators in the first line should not even go in one line, because the leftmost branch of the syntax tree extends over several lines.

But as contrast, some fully consistent layouting for the same tree:

mybinding =
  $ postBuild
      \() ->
        $ do
          runMaybeT postCliInit
              Nothing -> return ()
              Just () -> do
                _ <-
                  $ postCliInitAsync
                      \(e :: SomeException) ->
                        writeLogS LogLevelError (show e)
                return ()

Welp, how ugly. And merely very consistently applied rules.

The reasoning we follow instead, to arrive at the much more readable variant, is: If the last child is layouted in such a way that it has a simple "head" (first line) plus a multi-line "body", then we allow to ignore the body when deciding on layouts for the parent. Example "heads" above are do, \case, and, by extension, liftIO $ do and so on.

Type Signatures

Personal preference is either one-line or non-hanging multiline, that is:

plusOne :: Int -> Int
-- or
  :: Text
  -> (SourcePos, Wobble)
  -> StateT TranslationState (Either (SourcePos, Text)) ()

This is one item where my recommendation differs from tibbe: No hanging indentation for the type signature. Refactoring function names forces re-indentation, which I dislike.

If you have grown used to grepping for "myFunction ::" to go to definition, "^myFunction" works with the recommended style.

I don't see much use in also avoiding the one-line option, because the size of your diff cannot be smaller than one line if you only have on line.

Guarded Patterns

The context-insensitive approach for function definitions would be:

  (Some long pattern)
  (Another long pattern)
  | some potentially long guard
  = potentially

But again, this would involve a lot of newlines for something simple like

collatz x | even x    = x `div` 2
          | otherwise = 3 * x + 1

This is another example where i think we want both context-sensitivity and even the hanging indentation for the guard.

Horizontal Alignment in General

Horizontal alignment is context-sensitive. Consider

go []                 ""     = True
go [WildCard        ] ""     = True
go (WildCard   :rest) (c:cs) = go rest (c : cs) || go (WildCard : rest) cs
go (Union globs:rest) cs     = any (\glob -> go (glob ++ rest) cs) globs
go []                 (_:_)  = False
go (_:_)              ""     = False

What happens if we, say, refactor Union to ThingUnion? Every single line will be affected, one way or another. Nonetheless, alignment in such cases increases readability a lot.

Non-Newline Whitespace

Other Aspects

For all aspects not covered by the above, there is no one-fits-all rule. Only - Be consistent.

Some of my preferences are:

Implementation in the Brittany Formatter

Brittany implements the Stylistic Layouting Rule, yet it also implements all of the ("context-sensitive") compromises discussed in the "Context-Sensitivity versus Least-Newlines" chapter to reduce the number of newlines. This includes hanging indentation for function application.

mybinding = RH.performEvent_ $ postBuild <&> \() -> liftIO $ do
  runMaybeT postCliInit >>= \case
    Nothing -> return ()
    Just () -> do
      _ <- forkIO $ postCliInitAsync `catch` \(e :: SomeException) ->
        writeLogS LogLevelError (show e)
      return ()

is the exact layout produced by brittany.

Horizontal alignment is currently by default enabled, but only for alignment between consecutive lines.

It is planned to include configuration options to disable several of the potentially problematic context-sensitive layouts.

In several aspects brittany is configurable, e.g. columns (default 80) and indent(ation) (default 2). Horizontal alignment can be turned off entirely, as well.

Posted by Lennart Spitzner on 2017-10-28. Feedback to (blog at thisdomain) welcome!
Tags: . Source.