From ed1013717427ba0109f0d3eee29b24a1623ba15f Mon Sep 17 00:00:00 2001
From: Lennart Spitzner
Date: Mon, 6 Mar 2017 20:49:08 +0100
Subject: [PATCH] Add documentation around the BriDoc type/api

---
 README.md                            |   5 +
 docs/implementation/bridoc-api.md    | 203 +++++++++++++++++++++++
 docs/implementation/bridoc-design.md | 232 +++++++++++++++++++++++++++
 3 files changed, 440 insertions(+)
 create mode 100644 docs/implementation/bridoc-api.md
 create mode 100644 docs/implementation/bridoc-design.md

diff --git a/README.md b/README.md
index 0992220..79a7b74 100644
--- a/README.md
+++ b/README.md
@@ -101,3 +101,8 @@ stack build
 - -XTemplateHaskell
 - -XBangPatterns
 ~~~~
+
+# Implementation
+
+I have started adding documentation about the main data type `BriDoc`; see the
+"docs/implementation" folder. Start with "bridoc-design.md".
diff --git a/docs/implementation/bridoc-api.md b/docs/implementation/bridoc-api.md
new file mode 100644
index 0000000..c4c53e7
--- /dev/null
+++ b/docs/implementation/bridoc-api.md
@@ -0,0 +1,203 @@
+# BriDoc nodes/Smart constructors and their semantics
+
+At this point, you should have a rough idea of what the involved
+types mean. This leaves us to explain the different `BriDoc`
+(smart) constructors and their exact semantics.
+
+### Special nodes
+
+- docDebug/BDDebug
+
+  Like the `trace` statement of the `BriDoc` type. It does not affect the
+  normal output, but prints stuff to stderr when the transformation traverses
+  this node.
+
+- BDExternal is used for original-source reproduction.
+
+### Basic nodes
+
+- docEmpty/BDEmpty
+
+  ""
+
+  The empty document. Has empty output. Should never affect layouting.
+
+- docLit/BDLit Text
+
+  "a" "Maybe" "("
+
+  The most basic building block - a simple string. Has nothing to do with
+  literals in the parsing sense. Will always be produced as-is in the output.
+  It must be free of newline characters and should normally be free of any
+  spaces (because those would never be considered for line-breaking - but there
+  are cases where this makes sense still).
+
+- docSeq/BDSeq [BriDoc]
+
+  "func foo = 13"
+
+  An in-line/horizontal sequence of sub-docs. The sub-documents should not
+  contain any newlines, but there is an exception: The last element of the
+  sequence may be multi-line. In combination with `docSetBaseY` this allows
+  for example:
+
+  ~~~~.hs
+  foo | bar = 1
+      | baz = 2
+  ~~~~
+
+  which is represented roughly like
+
+  ~~~~
+  docSeq
+    "foo"
+    space
+    docSetBaseY
+      docLines
+        stuff that results in "| bar = 1"
+        stuff that results in "| baz = 2"
+  ~~~~
+
+  But in general it should be preferred to use `docPar` to handle multi-line
+  sub-nodes, where possible.
+
+- docAlt/BDAlt [BriDoc]
+
+  Specify multiple alternative layouts. Take care to appropriately maintain
+  sharing for the documents representing the children of the current node.
+
+- docAltFilter
+
+  Simple utility wrapper around `docAlt`: Each alternative is accompanied by
+  a boolean; if False the alternative is discarded.
+
+- docPar/BDPar
+
+  TODO
+
+- docLines/BDLines
+
+  TODO
+
+- docSeparator/BDSeparator
+
+  Adds a space, unless it is the last element in a line. Also merges with
+  other separators and has no effect if inserted right after inserting space
+  (e.g. in the start of a line when indented) or if already indented due to
+  horizontal alignment.
+
+### Creating horizontal alignment
+
+- docCols/BDCols ColSig [BriDoc]
+
+  This works like docSeq, but adds horizontal alignment if possible. The
+  implementation involves a lot of special-case trickeries and I assume that
+  it is impossible to specify the exact semantics. But the rough idea is:
+  If
+
+  1. horizontal alignment is not turned off via global config and
+  2. there are consecutive lines (created e.g. by docLines or docPar) and
+  3. both lines consist of docCols (where "consist" can ignore certain shallow
+     wrappers like `docAddBaseY`) and
+  4. the two ColSigs are equal and
+  5. the two docCols contain an equal number of children and
+  6. there is enough horizontal space to insert the additional spaces
+
+  then the contained docs will be aligned horizontally.
+
+  And further, if there are multiple lines so that consecutive pairs fulfill
+  these requirements, the whole block will be aligned to the same horizontal
+  tabs.
+
+  And further, if a docCols contains another docCols, and the docCols in the
+  next line also does, and the child docCols also match in ColSigs and have
+  the same number of arguments and so on, then the children's children are
+  also aligned horizontally.
+
+  And of course this nesting also works over blocks built of matching
+  consecutive pairs.
+
+  Wait, was this not supposed to be broadly simplifying? Well.. it is. uhm.
+  Let us just.. example.. an example seems fine.
+
+  Consider the following declaration/formatting:
+
+  ~~~~.hs
+  func (MyLongFoo abc def) = 1
+  func (Bar       a   d  ) = 2
+  func _                   = 3
+  ~~~~
+
+  Note how the "=" are aligned over all three lines, and the patterns in the
+  first two lines are as well, but the pattern in the third line is just a
+  structureless underscore?
+
+  The representation behind that source is something in the direction of this
+  (heavily simplified and not exact at all; e.g. spaces are not represented at
+  all):
+
+  ~~~~
+  docLines
+    docCols equation
+      "func"
+      docCols
+        "("
+        "MyLongFoo"
+        "abc"
+        "def"
+        ")"
+      docSeq
+        "="
+        "1"
+    docCols equation
+      "func"
+      docCols
+        "("
+        "Bar"
+        "a"
+        "d"
+        ")"
+      docSeq
+        "="
+        "2"
+    docCols equation
+      "func"
+      "_"
+      docSeq
+        "="
+        "3"
+  ~~~~
+
+### Controlling indentation level
+
+TODO
+
+- docAddBaseY/BDAddBaseY
+- docSetBaseY
+- docSetIndentLevel
+- docSetBaseAndIndent
+- docEnsureIndent
+
+### Controlling layouting
+
+TODO
+
+- docNonBottomSpacing
+- docSetParSpacing
+- docForceParSpacing
+- docForceSingleline
+- docForceMultiline
+
+### Inserting comments / Controlling comment placement
+
+TODO
+
+- docAnnotationPrior
+- docAnnotationKW
+- docAnnotationRest
+
+### Deprecated
+
+- BDForwardLineMode is unused and apparently should be deprecated.
+- BDProhibitMTEL is deprecated
+
diff --git a/docs/implementation/bridoc-design.md b/docs/implementation/bridoc-design.md
new file mode 100644
index 0000000..31103a3
--- /dev/null
+++ b/docs/implementation/bridoc-design.md
@@ -0,0 +1,232 @@
+# The BriDoc type and the to-BriDoc transformation
+
+The `BriDoc` type is the brittany equivalent of the `Doc` type from
+general-purpose formatting libraries such as the `pretty` package.
+It is specialized for this use case: Representing a formatted
+haskell source code document. As a consequence, it is a good amount
+more complex than the `Doc` type (which has 8, not directly exposed,
+constructors): The `BriDoc` type has ~25 constructors.
+(26, but one for debugging, two deprecated and so on.)
+Examples are `BDEmpty`, `BDSeq [BriDoc]` (inline sequence),
+and `BDAddBaseY BrIndent BriDoc` (add a certain type of indentation
+to the inner doc).
+
+The main bulk of code that makes brittany work is the translation
+of different syntactical constructs into a raw `BriDoc` value.
+(technically a `BriDocF` value, we'll explain soon.)
+
+The input of this translation is the syntax tree produced by
+GHC/ExactPrint. The ghc api exposes the syntax tree nodes, and
+ExactPrint adds certain annotations (e.g. information about
+in-source comments). The main thing that you will be looking
+at here is the ghc api documentation, for example
+https://downloads.haskell.org/~ghc/8.0.2/docs/html/libraries/ghc-8.0.2/HsDecls.html
+
+## Two examples of the process producing raw BriDoc
+
+1. For example, `Brittany.hs` contains the following code (shortened a bit):
+
+   ~~~~.hs
+   ppDecl d@(L loc decl) = case decl of
+     SigD sig -> [..] $ do
+       briDoc <- briDocMToPPM $ layoutSig (L loc sig)
+       layoutBriDoc d briDoc
+     ValD bind -> [..] $ do
+       briDoc <- [..] layoutBind (L loc bind)
+       layoutBriDoc d briDoc
+     _ -> briDocMToPPM (briDocByExactNoComment d) >>= layoutBriDoc d
+   ~~~~
+
+   which matches on the type of module top-level syntax node and
+   dispatches to `layoutSig`/`layoutBind` to layout type signatures
+   and equations. For all other constructs, it currently falls back to using
+   ExactPrint to reproduce the exact original.
+
+2. Let's look at a "lower" level fragment that actually produces BriDoc (from Type.hs):
+
+   ~~~~.hs
+   -- if our type is an application; think "HsAppTy Maybe Int"
+   HsAppTy typ1 typ2 -> do
+     typeDoc1 <- docSharedWrapper layoutType typ1 -- layout `Maybe`
+     typeDoc2 <- docSharedWrapper layoutType typ2 -- layout `Int`
+     docAlt -- produce two possible layouts
+       [ docSeq -- a single-line sequence, with a space in between
+         [ docForceSingleline typeDoc1 -- "Maybe Int"
+         , docLit $ Text.pack " "
+         , docForceSingleline typeDoc2
+         ]
+       , docPar -- a multi-line result, with the "child" indented.
+           typeDoc1                                   -- "Maybe\
+           (docEnsureIndent BrIndentRegular typeDoc2) --    Int"
+       ]
+   ~~~~
+
+   Here, all functions prefixed with "doc" produce new BriDoc(F) nodes.
+   I think this example can be understood already, even when many details
+   (what is `docSharedWrapper`?
+   What are the exact semantics of the different `doc..` functions?
+   Why do we need to wrap the `BriDoc` constructors behind those smart-constructor thingies?)
+   are not explained yet.
+
+## Size of BriDoc trees, Sharing and Complexity
+
+In order to explain the `BriDocF` type and the reasoning behind smart
+constructors, we need to consider the size of the `BriDoc` tree produced by
+this whole process.
+As seen above, we can have multiple alternative layouts (`docAlt`) for
+the same node.
+This means the number of nodes in the `BriDoc` value we produce in general is
+exponential in the number of syntax nodes of the input.
+
+But we are aiming for linear run-time, right? So what can save us here?
+You might think: We have sharing! For `let x = 3+3; (x, x)` we only have one
+`x` in memory ever. And indeed, we do the same above: `typeDoc1` and `2` are
+used in exactly that manner: Both are referenced once in each of the two
+alternatives.
+
+Unfortunately this does not mean that we can forget this issue entirely.
+The problem is that the BriDoc tree value will get transformed by multiple
+transformations. And this "breaks" sharing: If we take an exponential-sized
+tree that is linear-via-sharing and `fmap` some function `f` on it (think of
+some general-purpose tree that is Functor) then `f` will be evaluated an
+exponential number of times. And worse, the output will have lost any sharing.
+Sharing is not automatic memoization.
+And this holds for BriDoc, even when the transformations are not exactly
+`fmap`s.
+
+So.. we already mentioned "memoization" there, right?
+
+1. The bad news:
+   Any existing memoization utilities/approaches didn't work for one reason
+   or another. (I suspect that there is a bug in the GHC StableName
+   implementation, or I messed up..) After trying several memoization
+   approaches and wasting tons of time, I went with a manual approach,
+   and it worked more or less instantly. So that is where we are at.
+
+   Manual memoization means that we manually tag every node of the `BriDoc`
+   with a unique `Int`. This is rather annoying at places, but then again
+   we can abstract over that pretty well.
+
+2. The good news:
+   With manual memoization, creating an exponentially-sized tree is no
+   problem, presuming that it is linear-via-sharing. Not messing up this
+   property can take a bit of consideration - but otherwise we are set.
+   If the `BriDocF` tree is exponential, the transformations will still
+   do only a linear amount of "selection work" in order to convert into a
+   linear-sized `BriDoc` tree.
+
+   This property is the defining one that motivates the BriDoc
+   intermediate representation.
+
+## BriDocF
+
+The `BriDocF f` type encapsulates the idea that each subnode is wrapped
+in the `f` container. This notion gives us the following nice properties:
+
+`BriDocF Identity ~ BriDoc` and `BriDocF ((,) Int)` is the
+manual-memoization tree with labeled nodes. Abstractions, abstractions..
+
+Let's have a glance at related code/types we have so far:
+
+~~~~.hs
+-- The pure BriDoc: What we really want, but cannot use everywhere due
+-- to sharing issues.
+-- Isomorphic to `BriDocF Identity`. We still use this type, because
+-- then we have to unwrap the `Identity`s only in one place.
+data BriDoc
+  = BDEmpty
+  | BDLit !Text
+  | BDSeq [BriDoc]
+  | BDAddBaseY BrIndent BriDoc
+  | BDAlt [BriDoc]
+  .. [a good amount more]
+
+data BriDocF f
+  = BDFEmpty
+  | BDFLit !Text
+  | BDFSeq [f (BriDocF f)]
+  | BDFAddBaseY BrIndent (f (BriDocF f))
+  | BDFAlt [f (BriDocF f)]
+  .. [a good amount more]
+
+type BriDocFInt = BriDocF ((,) Int)
+type BriDocNumbered = (Int, BriDocFInt)
+
+-- drop the labels
+unwrapBriDocNumbered :: BriDocNumbered -> BriDoc
+unwrapBriDocNumbered = ..
+~~~~
+
+And, because we will need it below: The monadic context that the creation
+of the BriDocF tree uses:
+
+~~~~.hs
+-- If you are not familiar with the `multistate`
+-- package and RWS, this is somewhat similar to:
+-- ReaderT Config (ReaderT Anns (WriterT [LayoutError] (WriterT (Seq String) (State NodeAllocIndex))))
+-- i.e. it is basically an environment allowing:
+-- a) read access to global program config `Config` and the exactprint
+--    annotations `Anns` of given input;
+-- b) write access of errors and "good" output;
+-- c) a local/"State" "variable" `NodeAllocIndex`
+--    (yep, for the manual memoization node labels).
+type ToBriDocM = MultiRWSS.MultiRWS '[Config, Anns] '[[LayoutError], Seq String] '[NodeAllocIndex]
+~~~~
+
+We don't use this directly, but the code below uses this,
+and if the type `ToBriDocM` scared you, see how mundanely it
+is used here (`m` will be `ToBriDocM` mostly):
+
+~~~~
+allocNodeIndex :: MonadMultiState NodeAllocIndex m => m Int
+allocNodeIndex = do
+  NodeAllocIndex i <- mGet
+  mSet $ NodeAllocIndex (i + 1)
+  return i
+~~~~
+
+## The `doc..` smart constructors
+
+In most cases the smart constructors are fairly dumb: Their main purpose
+is to allocate the unique label for the current node, and return it
+together with the node itself. Let's look at two examples to get a
+feeling for the types involved:
+
+~~~~.hs
+docEmpty :: ToBriDocM BriDocNumbered
+docEmpty = allocateNode BDFEmpty -- what a "smart" constructor, right?
+
+docSeq :: [ToBriDocM BriDocNumbered] -> ToBriDocM BriDocNumbered
+docSeq l = allocateNode . BDFSeq =<< sequence l
+-- this is a bit more elaborate: In order to allow proper
+-- composition of these smart constructors, we accept a list of
+-- actions instead of just `BriDocNumbered`s, and use `sequence`
+-- to make it work. Nothing unusual otherwise.
+~~~~
+
+There is one rather special `doc..` function: `docSharedWrapper`.
+Let's consider the code first:
+
+~~~~.hs
+docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
+docSharedWrapper f x = return <$> f x
+~~~~
+
+How is this useful? Consider this: All the smart constructors
+expect as input actions returning (freshly labeled) nodes.
+But what if we want sharing? In those cases we do _not_ want
+fresh labels on multiple uses. Here `docSharedWrapper` comes
+into play: It executes the contained label-allocation once
+and returns a pure action via `return`; this pure action
+can then be passed e.g. to docSeq but does not do any new
+allocation. This gives us sharing in the cases where we
+want it.
+
+But wait, one more thing: Not all `BriDoc` constructors have
+an exactly matching smart constructor, and there are smart
+constructors that involve multiple BriDoc constructors behind
+the scenes. For this reason, we will focus on the smart
+constructors in the following, because they define the
+real interface to be used.
+
+You now might have a glance at "bridoc-api.md"
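
The sharing effect of `docSharedWrapper` described in bridoc-design.md can be sketched outside of brittany. This is only an illustration under stated assumptions: `Node`, `allocLabel`, `mkLit`, `labelsOfTwoUses`, `unsharedLabels` and `sharedLabels` are hypothetical stand-ins that do not exist in brittany, and plain `IO` with an `IORef` counter replaces `ToBriDocM` and `NodeAllocIndex`. Only `docSharedWrapper` itself is copied from the document.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- Hypothetical stand-in for a labeled node: (label, payload).
type Node = (Int, String)

-- Hypothetical stand-in for allocNodeIndex: returns the current
-- counter value and increments the counter.
allocLabel :: IORef Int -> IO Int
allocLabel ref = atomicModifyIORef' ref (\i -> (i + 1, i))

-- Hypothetical smart constructor: allocates a fresh label each time
-- the resulting action is run.
mkLit :: IORef Int -> String -> IO Node
mkLit ref s = do
  i <- allocLabel ref
  return (i, s)

-- Same definition as in the document.
docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
docSharedWrapper f x = return <$> f x

-- Runs the given action twice and returns the labels of both results.
labelsOfTwoUses :: IO Node -> IO (Int, Int)
labelsOfTwoUses act = do
  (a, _) <- act
  (b, _) <- act
  return (a, b)

-- Without the wrapper, every use allocates a new label.
unsharedLabels :: IO (Int, Int)
unsharedLabels = do
  ref <- newIORef 0
  labelsOfTwoUses (mkLit ref "Maybe")

-- With the wrapper, the label is allocated once and then reused.
sharedLabels :: IO (Int, Int)
sharedLabels = do
  ref <- newIORef 0
  shared <- docSharedWrapper (mkLit ref) "Maybe"
  labelsOfTwoUses shared
```

In this sketch `unsharedLabels` yields two different labels, while `sharedLabels` yields the same label twice. Identical labels are what allow a memoizing transformation to recognize both uses as one node.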