Add documentation around the BriDoc type/api
parent
cea81d5369
commit
ed10137174
|
@ -101,3 +101,8 @@ stack build
|
||||||
- -XTemplateHaskell
|
- -XTemplateHaskell
|
||||||
- -XBangPatterns
|
- -XBangPatterns
|
||||||
~~~~
|
~~~~
|
||||||
|
|
||||||
|
# Implementation
|
||||||
|
|
||||||
|
I have started adding documentation about the main data type `BriDoc`; see the
|
||||||
|
"docs/implementation" folder. Start with "bridoc-design.md".
|
||||||
|
|
|
@ -0,0 +1,203 @@
|
||||||
|
# BriDoc nodes/Smart constructors and their semantics
|
||||||
|
|
||||||
|
At this point, you should have a rough idea of what the involved
|
||||||
|
types mean. This leaves us to explain the different `BriDoc`
|
||||||
|
(smart) constructors and their exact semantics.
|
||||||
|
|
||||||
|
### Special nodes
|
||||||
|
|
||||||
|
- docDebug/BDDebug
|
||||||
|
|
||||||
|
Like the `trace` statement of the `BriDoc` type. It does not affect the
|
||||||
|
normal output, but prints stuff to stderr when the transformation traverses
|
||||||
|
this node.
|
||||||
|
|
||||||
|
- BDExternal is used for original-source reproduction.
|
||||||
|
|
||||||
|
### Basic nodes
|
||||||
|
|
||||||
|
- docEmpty/BDEmpty Text
|
||||||
|
|
||||||
|
""
|
||||||
|
|
||||||
|
The empty document. Has empty output. Should never affect layouting.
|
||||||
|
|
||||||
|
- docLit/BDLit
|
||||||
|
|
||||||
|
"a" "Maybe" "("
|
||||||
|
|
||||||
|
The most basic building block - a simple string. Has nothing to do with
|
||||||
|
literals in the parsing sense. Will always be produced as-is in the output.
|
||||||
|
It must be free of newline characters and should normally be free of any
|
||||||
|
spaces (because those would never be considered for line-breaking - but there
|
||||||
|
are cases where this makes sense still).
|
||||||
|
|
||||||
|
- docSeq/BDSeq [BriDoc]
|
||||||
|
|
||||||
|
"func foo = 13"
|
||||||
|
|
||||||
|
An in-line/horizontal sequence of sub-docs. The sub-documents should not
|
||||||
|
contain any newlines, but there is an exception: The last element of the
|
||||||
|
sequence may be multi-line. In combination with `docSetBaseY` this allows
|
||||||
|
for example:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
foo | bar = 1
    | baz = 2
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
which is represented roughly like
|
||||||
|
|
||||||
|
~~~~
|
||||||
|
docSeq
  "foo"
  space
  docSetBaseY
    docLines
      stuff that results in "| bar = 1"
      stuff that results in "| baz = 2"
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
But in general it should be preferred to use `docPar` to handle multi-line
|
||||||
|
sub-nodes, where possible.
|
||||||
|
|
||||||
|
- docAlt/BDAlt [BriDoc]
|
||||||
|
|
||||||
|
Specify multiple alternative layouts. Take care to appropriately maintain
|
||||||
|
sharing for the documents representing the children of the current node.
|
||||||
|
|
||||||
|
- docAltFilter
|
||||||
|
|
||||||
|
A simple utility wrapper around `docAlt`: Each alternative is accompanied by
|
||||||
|
a boolean; if False the alternative is discarded.
|
||||||
|
|
||||||
|
- docPar/BDPar
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
- docLines/BDLines
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
- docSeparator/BDSeparator
|
||||||
|
|
||||||
|
Adds a space, unless it is the last element in a line. Also merges with
|
||||||
|
other separators and has no effect if inserted right after inserting space
|
||||||
|
(e.g. in the start of a line when indented) or if already indented due to
|
||||||
|
horizontal alignment.
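
The filtering behaviour of `docAltFilter` described above can be sketched in a few lines. This is a minimal sketch, not the code from the repository: the `Doc` type and this `docAlt` are hypothetical stand-ins that ignore the monadic context and node labels of the real smart constructors.

```haskell
-- Hypothetical stand-in for the document type; only the shape matters here.
data Doc = Lit String | Alt [Doc]
  deriving (Eq, Show)

-- Stand-in for `docAlt`: collect the alternative layouts.
docAlt :: [Doc] -> Doc
docAlt = Alt

-- Keep only the alternatives whose flag is True, then defer to `docAlt`.
docAltFilter :: [(Bool, Doc)] -> Doc
docAltFilter = docAlt . map snd . filter fst
```

With this sketch, `docAltFilter [(True, Lit "a"), (False, Lit "b")]` keeps only the first alternative.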
|
||||||
|
|
||||||
|
### Creating horizontal alignment
|
||||||
|
|
||||||
|
- docCols/BDCols ColSig [BriDoc]
|
||||||
|
|
||||||
|
This works like docSeq, but adds horizontal alignment if possible. The
|
||||||
|
implementation involves a lot of special-case trickeries and I assume that
|
||||||
|
it is impossible to specify the exact semantics. But the rough idea is:
|
||||||
|
If
|
||||||
|
|
||||||
|
1. horizontal alignment is not turned off via global config
|
||||||
|
2. there are consecutive lines (created e.g. by docLines or docPar) and
|
||||||
|
3. both lines consist of docCols (where "consist" can ignore certain shallow
|
||||||
|
wrappers like `docAddBaseY`) and
|
||||||
|
4. the two ColSigs are equal and
|
||||||
|
5. the two docCols contain an equal number of children and
|
||||||
|
6. there is enough horizontal space to insert the additional spaces
|
||||||
|
|
||||||
|
then the contained docs will be aligned horizontally.
|
||||||
|
|
||||||
|
And further, if there are multiple lines so that consecutive pairs fulfill
|
||||||
|
these requirements, the whole block will be aligned to the same horizontal
|
||||||
|
tabs.
|
||||||
|
|
||||||
|
And further, if a docCols contains another docCols, and the docCols in the
|
||||||
|
next line also does, and the child docCols also match in ColSigs and have
|
||||||
|
the same number of arguments and so on, then the children's children are
|
||||||
|
also aligned horizontally.
|
||||||
|
|
||||||
|
And of course this nesting also works over blocks built of matching
|
||||||
|
consecutive pairs.
|
||||||
|
|
||||||
|
Wait, was this not supposed to be broadly simplifying? Well.. it is. uhm.
|
||||||
|
Let us just.. example.. an example seems fine.
|
||||||
|
|
||||||
|
Considering the following declaration/formatting:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
func (MyLongFoo abc def) = 1
func (Bar       a   d  ) = 2
func _                   = 3
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
Note how the "=" are aligned over all three lines, and the patterns in the
|
||||||
|
first two lines are as well, but the pattern in the third line is just a
|
||||||
|
structureless underscore?
|
||||||
|
|
||||||
|
The representation behind that source is something in the direction of this
|
||||||
|
(heavily simplified and not exact at all; e.g. spaces are not represented at
|
||||||
|
all):
|
||||||
|
|
||||||
|
~~~~
|
||||||
|
docLines
  docCols equation
    "func"
    docCols
      "("
      "MyLongFoo"
      "abc"
      "def"
      ")"
    docSeq
      "="
      "1"
  docCols equation
    "func"
    docCols
      "("
      "Bar"
      "a"
      "d"
      ")"
    docSeq
      "="
      "2"
  docCols equation
    "func"
    "_"
    docSeq
      "="
      "3"
|
||||||
|
~~~~
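
As a rough illustration of what "aligning matching columns over consecutive lines" means, here is a minimal sketch that pads plain strings. It is not brittany's algorithm, which works on `docCols` nodes and `ColSig`s and handles nesting; `alignRows` and its list-of-strings input are hypothetical, and the sketch assumes every row has the same number of columns (compare condition 5 above).

```haskell
import Data.List (transpose)

-- Pad every column to the width of its widest cell, then join the cells of
-- each row with single spaces. Trailing padding is dropped.
alignRows :: [[String]] -> [String]
alignRows rows = map renderRow rows
  where
    -- One width per column: the length of the longest cell in that column.
    widths = map (maximum . map length) (transpose rows)
    renderRow cells = trimEnd (unwords (zipWith pad widths cells))
    pad w s = s ++ replicate (w - length s) ' '
    trimEnd = reverse . dropWhile (== ' ') . reverse
```

Applied to the rows `["func", "(MyLongFoo abc def)", "=", "1"]` and `["func", "_", "=", "3"]`, the `=` signs end up in the same column.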
|
||||||
|
|
||||||
|
### Controlling indentation level
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
- docAddBaseY/BDAddBaseY
|
||||||
|
- docSetBaseY
|
||||||
|
- docSetIndentLevel
|
||||||
|
- docSetBaseAndIndent
|
||||||
|
- docEnsureIndent
|
||||||
|
|
||||||
|
### Controlling layouting
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
- docNonBottomSpacing
|
||||||
|
- docSetParSpacing
|
||||||
|
- docForceParSpacing
|
||||||
|
- docForceSingleline
|
||||||
|
- docForceMultiline
|
||||||
|
|
||||||
|
### Inserting comments / Controlling comment placement
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
- docAnnotationPrior
|
||||||
|
- docAnnotationKW
|
||||||
|
- docAnnotationRest
|
||||||
|
|
||||||
|
### Deprecated
|
||||||
|
|
||||||
|
- BDForwardLineMode is unused and apparently should be deprecated.
|
||||||
|
- BDProhibitMTEL is deprecated
|
||||||
|
|
|
@ -0,0 +1,232 @@
|
||||||
|
# The BriDoc type and the to-BriDoc transformation
|
||||||
|
|
||||||
|
The `BriDoc` type is the brittany equivalent of the `Doc` type from
|
||||||
|
general-purpose formatting libraries such as the `pretty` package.
|
||||||
|
It is specialized for this use case: Representing a formatted
Haskell source code document. As a consequence, it is a good amount
|
||||||
|
more complex than the `Doc` type (which has 8, not directly exposed,
|
||||||
|
constructors): The `BriDoc` type has ~25 constructors.
|
||||||
|
(26, but one for debugging, two deprecated and so on.)
|
||||||
|
Examples are `BDEmpty`, `BDSeq [BriDoc]` (inline sequence),
|
||||||
|
and `BDAddBaseY BrIndent BriDoc` (add a certain type of indentation
|
||||||
|
to the inner doc).
|
||||||
|
|
||||||
|
The main bulk of code that makes brittany work is the translation
|
||||||
|
of different syntactical constructs into a raw `BriDoc` value.
|
||||||
|
(technically a `BriDocF` value, we'll explain soon.)
|
||||||
|
|
||||||
|
The input of this translation is the syntax tree produced by
|
||||||
|
GHC/ExactPrint. The ghc api exposes the syntax tree nodes, and
|
||||||
|
ExactPrint adds certain annotations (e.g. information about
|
||||||
|
in-source comments). The main thing that you will be looking
|
||||||
|
at here is the ghc api documentation, for example
|
||||||
|
https://downloads.haskell.org/~ghc/8.0.2/docs/html/libraries/ghc-8.0.2/HsDecls.html
|
||||||
|
|
||||||
|
## Two examples of the process producing raw BriDoc
|
||||||
|
|
||||||
|
1. For example, `Brittany.hs` contains the following code (shortened a bit):
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
ppDecl d@(L loc decl) = case decl of
  SigD sig -> [..] $ do
    briDoc <- briDocMToPPM $ layoutSig (L loc sig)
    layoutBriDoc d briDoc
  ValD bind -> [..] $ do
    briDoc <- [..] layoutBind (L loc bind)
    layoutBriDoc d briDoc
  _ -> briDocMToPPM (briDocByExactNoComment d) >>= layoutBriDoc d
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
which matches on the type of module top-level syntax node and
|
||||||
|
dispatches to `layoutSig`/`layoutBind` to layout type signatures
|
||||||
|
and equations. For all other constructs, it currently falls back to using
|
||||||
|
ExactPrint to reproduce the exact original.
|
||||||
|
|
||||||
|
2. Let's look at a "lower" level fragment that actually produces BriDoc (from Type.hs):
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
-- if our type is an application; think "HsAppTy Maybe Int"
HsAppTy typ1 typ2 -> do
  typeDoc1 <- docSharedWrapper layoutType typ1 -- layout `Maybe`
  typeDoc2 <- docSharedWrapper layoutType typ2 -- layout `Int`
  docAlt -- produce two possible layouts
    [ docSeq -- a single-line sequence, with a space in between
      [ docForceSingleline typeDoc1 -- "Maybe Int"
      , docLit $ Text.pack " "
      , docForceSingleline typeDoc2
      ]
    , docPar -- a multi-line result, with the "child" indented.
      typeDoc1                                   -- "Maybe\
      (docEnsureIndent BrIndentRegular typeDoc2) --    Int"
    ]
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
Here, all functions prefixed with "doc" produce new BriDoc(F) nodes.
|
||||||
|
I think this example can be understood already, even when many details
|
||||||
|
(what is `docSharedWrapper`?
|
||||||
|
What are the exact semantics of the different `doc..` functions?
|
||||||
|
Why do we need to wrap the `BriDoc` constructors behind those smart-constructor thingies?)
|
||||||
|
are not explained yet.
|
||||||
|
|
||||||
|
## Size of BriDoc trees, Sharing and Complexity
|
||||||
|
|
||||||
|
In order to explain the `BriDocF` type and the reasoning behind smart
|
||||||
|
constructors, we need to consider the size of the `BriDoc` tree produced by
|
||||||
|
this whole process.
|
||||||
|
As seen above, we can have multiple alternative layouts (`docAlt`) for
|
||||||
|
the same node.
|
||||||
|
This means the number of nodes in the `BriDoc` value we produce is in general
exponential in the number of syntax nodes of the input.
|
||||||
|
|
||||||
|
But we are aiming for linear run-time, right? So what can save us here?
|
||||||
|
You might think: We have sharing! For `let x = 3+3; (x, x)` we only have one
|
||||||
|
`x` in memory ever. And indeed, we do the same above: `typeDoc1` and `2` are
|
||||||
|
used in exactly that manner: Both are referenced once in each of the two
|
||||||
|
alternatives.
|
||||||
|
|
||||||
|
Unfortunately this does not mean that we can forget this issue entirely.
|
||||||
|
The problem is that the BriDoc tree value will get transformed by multiple
|
||||||
|
transformations. And this "breaks" sharing: If we take an exponential-sized
|
||||||
|
tree that is linear-via-sharing and `fmap` some function `f` on it (think of
|
||||||
|
some general-purpose tree that is Functor) then `f` will be evaluated an
|
||||||
|
exponential number of times. And worse, the output will have lost any sharing.
|
||||||
|
Sharing is not automatic memoization.
|
||||||
|
And this holds for BriDoc, even when the transformations are not exactly
|
||||||
|
`fmap`s.
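
The claim that sharing does not save a traversal from exponential work can be made concrete with a small sketch. The `Tree` type and both functions are hypothetical and unrelated to the actual `BriDoc` definition; they only show a value that is linear in memory but exponential when unfolded.

```haskell
-- A plain binary tree without any labels.
data Tree = Leaf | Node Tree Tree

-- Build a tree of the given depth in which both children of every node are
-- the very same heap object, so memory use is linear in the depth.
shared :: Int -> Tree
shared 0 = Leaf
shared n = let t = shared (n - 1) in Node t t

-- A naive traversal cannot see the sharing: it walks every path and thus
-- performs a number of steps exponential in the depth.
countNodes :: Tree -> Integer
countNodes Leaf       = 1
countNodes (Node l r) = 1 + countNodes l + countNodes r
```

For depth `n` the traversal visits `2^(n+1) - 1` nodes, even though only `n + 1` distinct nodes exist in memory.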
|
||||||
|
|
||||||
|
So.. we already mentioned "memoization" there, right?
|
||||||
|
|
||||||
|
1. The bad news:
|
||||||
|
None of the existing memoization utilities/approaches worked, for one reason
or another. (I suspect that there is a bug in the GHC StableName
|
||||||
|
implementation, or I messed up..) After trying several memoization
|
||||||
|
approaches and wasting tons of time, I went with a manual approach,
|
||||||
|
and it worked more or less instantly. So that is where we are at.
|
||||||
|
|
||||||
|
Manual memoization means that we manually tag every node of the `BriDoc`
|
||||||
|
with a unique `Int`. This is rather annoying at places, but then again
|
||||||
|
we can abstract over that pretty well.
|
||||||
|
|
||||||
|
2. The good news:
|
||||||
|
With manual memoization, creating an exponentially-sized tree is no
|
||||||
|
problem, presuming that it is linear-via-sharing. Not messing up this
|
||||||
|
property can take a bit of consideration - but otherwise we are set.
|
||||||
|
If the `BriDocF` tree is exponential, the transformations will still
|
||||||
|
do only linear-amount of "selection work" in order to convert into a
|
||||||
|
linear-sized `BriDoc` tree.
|
||||||
|
|
||||||
|
This property is the defining one that motivates the BriDoc
|
||||||
|
intermediate representation.
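
To illustrate why the labels help, here is a minimal sketch of label-based memoization on a toy tree. It is an assumption-laden illustration, not brittany's transformation code: `LTree`, `sharedL` and `countMemo` are hypothetical, the cache is a plain `Data.Map` from the `containers` package, and the labels are assigned by hand instead of by `allocNodeIndex`.

```haskell
import qualified Data.Map.Strict as Map

-- A tree whose children carry manually assigned labels.
data LTree = LLeaf | LNode (Int, LTree) (Int, LTree)

-- Linear-via-sharing tree of the given depth. The depth doubles as the
-- label, so the two (identical) children always carry the same label.
sharedL :: Int -> (Int, LTree)
sharedL 0 = (0, LLeaf)
sharedL n = let t = sharedL (n - 1) in (n, LNode t t)

-- Count the nodes of the unfolded tree, caching one result per label.
-- Returns the count together with the updated cache.
countMemo :: Map.Map Int Integer -> (Int, LTree) -> (Integer, Map.Map Int Integer)
countMemo cache (lbl, t) = case Map.lookup lbl cache of
  Just r  -> (r, cache)
  Nothing ->
    let (r, cache') = case t of
          LLeaf -> (1, cache)
          LNode a b ->
            let (ra, c1) = countMemo cache a
                (rb, c2) = countMemo c1 b
            in  (1 + ra + rb, c2)
    in  (r, Map.insert lbl r cache')
```

Each label is computed once, so the work is linear in the number of distinct nodes even though the reported count is exponential in the depth.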
|
||||||
|
|
||||||
|
## BriDocF
|
||||||
|
|
||||||
|
The `BriDocF f` type encapsulates the idea that each subnode is wrapped
|
||||||
|
in the `f` container. This notion gives us the following nice properties:
|
||||||
|
|
||||||
|
`BriDocF Identity ~ BriDoc` and `BriDocF ((,) Int)` is the
|
||||||
|
manual-memoization tree with labeled nodes. Abstractions, abstractions..
|
||||||
|
|
||||||
|
Let's have a glance at related code/types we have so far:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
-- The pure BriDoc: What we really want, but cannot use everywhere due
-- to sharing issues.
-- Isomorphic to `BriDocF Identity`. We still use this type, because
-- then we have to unwrap the `Identity`s only in one place.
data BriDoc
  = BDEmpty
  | BDLit !Text
  | BDSeq [BriDoc]
  | BDAddBaseY BrIndent BriDoc
  | BDAlt [BriDoc]
  .. [a good amount more]

data BriDocF f
  = BDFEmpty
  | BDFLit !Text
  | BDFSeq [f (BriDocF f)]
  | BDFAddBaseY BrIndent (f (BriDocF f))
  | BDFAlt [f (BriDocF f)]
  .. [a good amount more]

type BriDocFInt = BriDocF ((,) Int)
type BriDocNumbered = (Int, BriDocFInt)

-- drop the labels
unwrapBriDocNumbered :: BriDocNumbered -> BriDoc
unwrapBriDocNumbered = ..
|
||||||
|
~~~~
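
Since the real definitions are elided above, the following reduced sketch shows how the two instantiations of the functor parameter relate. The types `Doc` and `DocF` and the functions are hypothetical, cut down to three constructors; the actual `unwrapBriDocNumbered` covers all constructors and may be written differently.

```haskell
import Data.Functor.Identity (Identity (..))

-- Reduced, hypothetical versions of `BriDoc` and `BriDocF`.
data Doc = DEmpty | DLit String | DSeq [Doc]
  deriving (Eq, Show)

data DocF f = DFEmpty | DFLit String | DFSeq [f (DocF f)]

-- The labelled tree: every node is paired with an Int.
type DocNumbered = (Int, DocF ((,) Int))

-- Drop the labels by recursing through the `(,) Int` wrapper.
unwrapNumbered :: DocNumbered -> Doc
unwrapNumbered (_, d) = case d of
  DFEmpty  -> DEmpty
  DFLit s  -> DLit s
  DFSeq ds -> DSeq (map unwrapNumbered ds)

-- The `Identity` instantiation carries no extra information at all.
unwrapIdentity :: DocF Identity -> Doc
unwrapIdentity d = case d of
  DFEmpty  -> DEmpty
  DFLit s  -> DLit s
  DFSeq ds -> DSeq (map (unwrapIdentity . runIdentity) ds)
```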
|
||||||
|
|
||||||
|
And, because we will need it below: The monadic context that the creation
|
||||||
|
of the BriDocF tree uses:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
-- If you are not familiar with the `multistate`
-- package and RWS, this is somewhat similar to:
-- ReaderT Config (ReaderT Anns (WriterT [LayoutError] (WriterT (Seq String) (State NodeAllocIndex))))
-- i.e. it is basically an environment allowing:
-- a) read access to global program config `Config` and the exactprint
--    annotations `Anns` of given input;
-- b) write access of errors and "good" output;
-- c) a local/"State" "variable" `NodeAllocIndex`
--    (yep, for the manual memoization node labels).
type ToBriDocM = MultiRWSS.MultiRWS '[Config, Anns] '[[LayoutError], Seq String] '[NodeAllocIndex]
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
We don't use this directly, but the code below uses this,
|
||||||
|
and if the type `ToBriDocM` scared you, see how mundanely it
is used here (`m` will be `ToBriDocM` mostly):
|
||||||
|
|
||||||
|
~~~~
|
||||||
|
allocNodeIndex :: MonadMultiState NodeAllocIndex m => m Int
allocNodeIndex = do
  NodeAllocIndex i <- mGet
  mSet $ NodeAllocIndex (i + 1)
  return i
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
## The `doc..` smart constructors
|
||||||
|
|
||||||
|
In most cases the smart constructors are fairly dumb: Their main purpose
|
||||||
|
is to allocate the unique label for the current node, and return it
|
||||||
|
together with the node itself. Let's look at two examples to get a
|
||||||
|
feeling for the types involved:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
docEmpty :: ToBriDocM BriDocNumbered
docEmpty = allocateNode BDFEmpty -- what a "smart" constructor, right?

docSeq :: [ToBriDocM BriDocNumbered] -> ToBriDocM BriDocNumbered
docSeq l = allocateNode . BDFSeq =<< sequence l
  -- this is a bit more elaborate: In order to allow proper
  -- composition of these smart constructors, we accept a list of
  -- actions instead of just `BriDocNumbered`s, and use `sequence`
  -- to make it work. Nothing unusual otherwise.
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
There is one rather special `doc..` function: `docSharedWrapper`.
|
||||||
|
Let's consider the code first:
|
||||||
|
|
||||||
|
~~~~.hs
|
||||||
|
docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
|
||||||
|
docSharedWrapper f x = return <$> f x
|
||||||
|
~~~~
|
||||||
|
|
||||||
|
How is this useful? Consider this: All the smart constructors
|
||||||
|
expect as input actions returning (freshly labeled) nodes.
|
||||||
|
But what if we want sharing? In those cases we do _not_ want
|
||||||
|
fresh labels on multiple uses. Here `docSharedWrapper` comes
|
||||||
|
into play: It executes the contained label-allocation once
|
||||||
|
and returns a pure action via `return`; this pure action
|
||||||
|
can then be passed e.g. to docSeq but does not do any new
|
||||||
|
allocation. This gives us sharing in the cases where we
|
||||||
|
want it.
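
The effect on labels can be seen in a small runnable sketch. Everything except the definition of `docSharedWrapper` is an assumption made for illustration: `Alloc` is a hand-written state monad over an `Int` counter standing in for `ToBriDocM`, and `labelled` is a hypothetical node builder in place of the real smart constructors.

```haskell
-- Minimal state monad over an Int counter (stand-in for `ToBriDocM`).
newtype Alloc a = Alloc { runAlloc :: Int -> (a, Int) }

instance Functor Alloc where
  fmap f (Alloc g) = Alloc $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Alloc where
  pure a = Alloc $ \s -> (a, s)
  Alloc f <*> Alloc g = Alloc $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in  (h a, s2)

instance Monad Alloc where
  Alloc g >>= k = Alloc $ \s -> let (a, s') = g s in runAlloc (k a) s'

-- Hand out the next free label and advance the counter.
allocNodeIndex :: Alloc Int
allocNodeIndex = Alloc $ \i -> (i, i + 1)

-- Same definition as in the text.
docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
docSharedWrapper f x = return <$> f x

-- Hypothetical node builder: a literal paired with a fresh label.
labelled :: String -> Alloc (Int, String)
labelled s = do
  i <- allocNodeIndex
  return (i, s)

-- Running the action twice allocates two different labels ...
unshared :: Alloc [(Int, String)]
unshared = sequence [labelled "x", labelled "x"]

-- ... while the wrapped action allocates once and is then reused.
sharedTwice :: Alloc [(Int, String)]
sharedTwice = do
  node <- docSharedWrapper labelled "x"
  sequence [node, node]
```

Starting from counter 0, `unshared` yields the labels 0 and 1, whereas `sharedTwice` yields label 0 for both uses.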
|
||||||
|
|
||||||
|
But wait, one more thing: Not all `BriDoc` constructors have
|
||||||
|
an exactly matching smart constructor, and there are smart
|
||||||
|
constructors that involve multiple BriDoc constructors behind
|
||||||
|
the scenes. For this reason, we will focus on the smart
|
||||||
|
constructors in the following, because they define the
|
||||||
|
real interface to be used.
|
||||||
|
|
||||||
|
You now might have a glance at "bridoc-api.md"
|