# The BriDoc type and the to-BriDoc transformation
The `BriDoc` type is brittany's equivalent of the `Doc` type from
general-purpose formatting libraries such as the `pretty` package.
It is specialized for this use case: representing a formatted
Haskell source code document. As a consequence, it is a good deal
more complex than the `Doc` type (which has 8 constructors, none of them directly exposed):
the `BriDoc` type has ~25 constructors
(26, but one is for debugging, two are deprecated, and so on).
Examples are `BDEmpty`, `BDSeq [BriDoc]` (inline sequence),
and `BDAddBaseY BrIndent BriDoc` (add a certain type of indentation
to the inner doc).

The main bulk of code that makes brittany work is the translation
of different syntactical constructs into a raw `BriDoc` value
(technically a `BriDocF` value; we'll explain soon).

The input of this translation is the syntax tree produced by
GHC/ExactPrint. The GHC API exposes the syntax tree nodes, and
ExactPrint adds certain annotations (e.g. information about
in-source comments). The main thing that you will be looking
at here is the GHC API documentation, for example
https://downloads.haskell.org/~ghc/8.0.2/docs/html/libraries/ghc-8.0.2/HsDecls.html
## Two examples of the process producing raw BriDoc
1. For example, `Brittany.hs` contains the following code (shortened a bit):
~~~~.hs
ppDecl d@(L loc decl) = case decl of
  SigD sig -> [..] $ do
    briDoc <- briDocMToPPM $ layoutSig (L loc sig)
    layoutBriDoc d briDoc
  ValD bind -> [..] $ do
    briDoc <- [..] layoutBind (L loc bind)
    layoutBriDoc d briDoc
  _ -> briDocMToPPM (briDocByExactNoComment d) >>= layoutBriDoc d
~~~~
which matches on the kind of top-level syntax node of the module and
dispatches to `layoutSig`/`layoutBind` to lay out type signatures
and equations. For all other constructs, it currently falls back to using
ExactPrint to reproduce the exact original.

2. Let's look at a "lower" level fragment that actually produces BriDoc (from `Type.hs`):
~~~~.hs
-- if our type is an application; think "HsAppTy Maybe Int"
HsAppTy typ1 typ2 -> do
  typeDoc1 <- docSharedWrapper layoutType typ1 -- layout `Maybe`
  typeDoc2 <- docSharedWrapper layoutType typ2 -- layout `Int`
  docAlt -- produce two possible layouts
    [ docSeq -- a single-line sequence, with a space in between
        [ docForceSingleline typeDoc1 -- "Maybe Int"
        , docLit $ Text.pack " "
        , docForceSingleline typeDoc2
        ]
    , docPar -- a multi-line result, with the "child" indented.
        typeDoc1 -- "Maybe\
        (docEnsureIndent BrIndentRegular typeDoc2) -- Int"
    ]
~~~~
Here, all functions prefixed with `doc` produce new BriDoc(F) nodes.
I think this example can be understood already, even though many details
are not explained yet (What is `docSharedWrapper`?
What are the exact semantics of the different `doc..` functions?
Why do we need to wrap the `BriDoc` constructors behind those smart-constructor thingies?).
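To make the idea of `docAlt` more concrete, here is a minimal, self-contained toy model. It assumes a much simpler document type than brittany's: the names `Doc`, `layouts`, `render` and `layoutApp` are hypothetical, and the greedy "first alternative whose lines all fit the width" rule is an assumption chosen for illustration, not brittany's actual selection algorithm. `layoutApp` only mirrors the shape of the `HsAppTy` fragment above: a single-line sequence first, then a layout with the argument indented on the next line.

```haskell
-- Toy document type, loosely inspired by a few BriDoc ideas.
-- All names are hypothetical; this is not brittany's BriDoc API.
data Doc
  = Lit String      -- literal text on a single line
  | Seq [Doc]       -- inline sequence of single-line parts
  | Par Doc Doc     -- first doc, then the second one on the following lines
  | Indent Int Doc  -- indent every line of the inner doc by n spaces
  | Alt [Doc]       -- alternative layouts, most preferred first
  deriving Show

-- Enumerate every possible layout of a doc, each as a list of lines.
layouts :: Doc -> [[String]]
layouts (Lit s) = [[s]]
layouts (Seq ds) =
  -- inline sequences only accept single-line parts (cf. docForceSingleline)
  [ [concatMap head parts]
  | parts <- mapM layouts ds
  , all ((== 1) . length) parts
  ]
layouts (Par d1 d2) = [l1 ++ l2 | l1 <- layouts d1, l2 <- layouts d2]
layouts (Indent n d) = [map (replicate n ' ' ++) ls | ls <- layouts d]
layouts (Alt ds) = concatMap layouts ds

-- Pick the first layout whose lines all fit into the given width,
-- falling back to the last layout if none fits.
-- Assumes the doc has at least one layout.
render :: Int -> Doc -> [String]
render width doc =
  case filter fits candidates of
    (ls : _) -> ls
    [] -> last candidates
  where
    candidates = layouts doc
    fits = all ((<= width) . length)

-- Same shape as the HsAppTy fragment: single-line first, then the
-- argument indented on the next line.
layoutApp :: Doc -> Doc -> Doc
layoutApp t1 t2 =
  Alt
    [ Seq [t1, Lit " ", t2]
    , Par t1 (Indent 2 t2)
    ]
```

With a width of 80, `render 80 (layoutApp (Lit "Maybe") (Lit "Int"))` yields the single line `Maybe Int`; with a width of 5 it falls through to the indented two-line layout. Note that `layouts` enumerates every combination of alternatives, so its result grows exponentially with nesting depth; this is precisely the cost discussed next.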
## Size of BriDoc trees, Sharing and Complexity
In order to explain the `BriDocF` type and the reasoning behind smart
constructors, we need to consider the size of the `BriDoc` tree produced by
this whole process.
As seen above, we can have multiple alternative layouts (`docAlt`) for
the same node.
This means that the number of nodes in the `BriDoc` value we produce is, in general,
exponential in the number of syntax nodes of the input.

But we are targeting linear run-time, right? So what can save us here?
You might think: We have sharing! For `let x = 3+3 in (x, x)` we only ever have one
`x` in memory. And indeed, we do the same above: `typeDoc1` and `typeDoc2` are
used in exactly that manner: Both are referenced once in each of the two
alternatives.

Unfortunately this does not mean that we can forget about this issue entirely.
The problem is that the BriDoc tree value will be processed by multiple
transformations. And this "breaks" sharing: If we take an exponential-sized
tree that is linear-via-sharing and `fmap` some function `f` over it (think of
some general-purpose tree that is a Functor), then `f` will be evaluated an
exponential number of times. And worse, the output will have lost any sharing.
Sharing is not automatic memoization.
And this holds for BriDoc, even though the transformations are not exactly
`fmap`s.

So.. we already mentioned "memoization" there, right?
1. The bad news:

   None of the existing memoization utilities/approaches worked, for one reason
   or another. (I suspect that there is a bug in the GHC StableName
   implementation, or I messed up..) After trying several memoization
   approaches and wasting tons of time, I went with a manual approach,
   and it worked more or less instantly. So that is where we are at.

   Manual memoization means that we manually tag every node of the `BriDoc`
   with a unique `Int`. This is rather annoying in places, but then again
   we can abstract over it pretty well.

2. The good news:

   With manual memoization, creating an exponentially-sized tree is no
   problem, provided that it is linear-via-sharing. Not messing up this
   property can take a bit of consideration, but otherwise we are set.
   If the `BriDocF` tree is exponential, the transformations will still
   do only a linear amount of "selection work" in order to convert it into a
   linear-sized `BriDoc` tree.

This property is the defining one that motivates the BriDoc
intermediate representation.
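The sharing problem and the label-based fix can be reproduced with a small toy tree. This is a sketch under assumptions: `Tree`, `sharedTree`, `naiveCount` and `distinctCount` are hypothetical names, an `Int` label per node plays the role of brittany's manual node labels, and `naiveCount` stands in for any label-unaware transformation (such as a plain `fmap`). It uses `Data.IntSet` from the `containers` package.

```haskell
import qualified Data.IntSet as IntSet

-- Toy tree whose nodes carry an Int label, mimicking manual memoization.
-- Hypothetical names; not brittany's types.
data Tree = Node Int [Tree]

-- Depth-d tree in which every level references the level below twice.
-- Thanks to sharing it occupies O(d) memory, but unfolded it has
-- 2^(d+1) - 1 nodes. Each distinct node gets a distinct label (its depth).
sharedTree :: Int -> Tree
sharedTree 0 = Node 0 []
sharedTree d =
  let child = sharedTree (d - 1) -- built once, referenced twice
   in Node d [child, child]

-- Label-unaware traversal (think of a plain fmap): it follows every
-- path, so its work is exponential in the depth.
naiveCount :: Tree -> Integer
naiveCount (Node _ children) = 1 + sum (map naiveCount children)

-- Label-aware traversal: a node whose label was already seen is skipped,
-- so each distinct node is processed only once.
distinctCount :: Tree -> Int
distinctCount tree = IntSet.size (visit tree IntSet.empty)
  where
    visit (Node lbl children) seen
      | lbl `IntSet.member` seen = seen
      | otherwise = foldr visit (IntSet.insert lbl seen) children
```

For a depth of 20, `naiveCount` visits 2,097,151 nodes, while `distinctCount` processes only the 21 distinct labels, even though both walk the same linear-in-memory value.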
## BriDocF
The `BriDocF f` type encapsulates the idea that each subnode is wrapped
in the `f` container. This notion gives us the following nice properties:

`BriDocF Identity` is isomorphic to `BriDoc`, and `BriDocF ((,) Int)` is the
manual-memoization tree with labeled nodes. Abstractions, abstractions..

Let's have a glance at the related code/types we have so far:
~~~~.hs
-- The pure BriDoc: What we really want, but cannot use everywhere due
-- to sharing issues.
-- Isomorphic to `BriDocF Identity`. We still use this type, because
-- then we have to unwrap the `Identity`s in only one place.
data BriDoc
  = BDEmpty
  | BDLit !Text
  | BDSeq [BriDoc]
  | BDAddBaseY BrIndent BriDoc
  | BDAlt [BriDoc]
  -- .. [a good amount more]

data BriDocF f
  = BDFEmpty
  | BDFLit !Text
  | BDFSeq [f (BriDocF f)]
  | BDFAddBaseY BrIndent (f (BriDocF f))
  | BDFAlt [f (BriDocF f)]
  -- .. [a good amount more]

type BriDocFInt = BriDocF ((,) Int)
type BriDocNumbered = (Int, BriDocFInt)

-- drop the labels
unwrapBriDocNumbered :: BriDocNumbered -> BriDoc
unwrapBriDocNumbered = ..
~~~~
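A cut-down, runnable version of these types may help to see why `BriDocF Identity` and `BriDoc` carry the same information, and what dropping the labels amounts to. This is only a sketch: it keeps three constructors, uses `String` instead of `Text`, and `unwrapNumbered`, `toIdentity` and `fromIdentity` are hypothetical helpers, not brittany's (elided) implementation.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical, cut-down versions of the two types above.
data BriDoc
  = BDEmpty
  | BDLit String
  | BDSeq [BriDoc]
  deriving (Eq, Show)

data BriDocF f
  = BDFEmpty
  | BDFLit String
  | BDFSeq [f (BriDocF f)]

-- Every node is paired with its Int label.
type BriDocNumbered = (Int, BriDocF ((,) Int))

-- Drop the labels, recursively.
unwrapNumbered :: BriDocNumbered -> BriDoc
unwrapNumbered (_, doc) = case doc of
  BDFEmpty -> BDEmpty
  BDFLit t -> BDLit t
  BDFSeq xs -> BDSeq (map unwrapNumbered xs)

-- Identity adds no information, so these two functions are mutually inverse.
toIdentity :: BriDoc -> BriDocF Identity
toIdentity BDEmpty = BDFEmpty
toIdentity (BDLit t) = BDFLit t
toIdentity (BDSeq ds) = BDFSeq (map (Identity . toIdentity) ds)

fromIdentity :: BriDocF Identity -> BriDoc
fromIdentity BDFEmpty = BDEmpty
fromIdentity (BDFLit t) = BDLit t
fromIdentity (BDFSeq ds) = BDSeq (map (fromIdentity . runIdentity) ds)
```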
And, because we will need it below, here is the monadic context that the creation
of the BriDocF tree uses:
~~~~.hs
-- If you are not familiar with the `multistate`
-- package and RWS, this is somewhat similar to:
-- ReaderT Config (ReaderT Anns (WriterT [LayoutError] (WriterT (Seq String) (State NodeAllocIndex))))
-- i.e. it is basically an environment allowing:
-- a) read access to the global program config `Config` and the exactprint
--    annotations `Anns` of the given input;
-- b) write access for errors and "good" output;
-- c) a local/"State" "variable" `NodeAllocIndex`
--    (yep, for the manual memoization node labels).
type ToBriDocM = MultiRWSS.MultiRWS '[Config, Anns] '[[LayoutError], Seq String] '[NodeAllocIndex]
~~~~
We don't use this directly, but the code below does,
and if the type `ToBriDocM` scared you, see how mundanely it
is used here (`m` will mostly be `ToBriDocM`):
~~~~.hs
allocNodeIndex :: MonadMultiState NodeAllocIndex m => m Int
allocNodeIndex = do
  NodeAllocIndex i <- mGet
  mSet $ NodeAllocIndex (i + 1)
  return i
~~~~
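For experimenting outside of brittany, the same counter can be written against the plain `State` monad from the `transformers` package instead of `multistate`. This is a sketch under assumptions: the `NodeAllocIndex` newtype is redefined here (assumed to wrap a plain `Int`), and `freshLabels` is a hypothetical helper.

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.State (State, evalState, get, put)

-- Stand-in for brittany's counter type (assumed to wrap a plain Int).
newtype NodeAllocIndex = NodeAllocIndex Int

-- Same shape as allocNodeIndex above, but using transformers' State
-- instead of multistate's MonadMultiState (mGet/mSet).
allocNodeIndex :: State NodeAllocIndex Int
allocNodeIndex = do
  NodeAllocIndex i <- get
  put $ NodeAllocIndex (i + 1)
  return i

-- Hypothetical helper: allocate n labels, starting at the given index.
freshLabels :: Int -> Int -> [Int]
freshLabels start n =
  evalState (replicateM n allocNodeIndex) (NodeAllocIndex start)
```

Each call returns the current index and bumps the counter, so `freshLabels 0 3` gives `[0, 1, 2]`: successive nodes always receive distinct labels.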
## The `doc..` smart constructors
In most cases the smart constructors are fairly dumb: Their main purpose
is to allocate the unique label for the current node and return it
together with the node itself. Let's look at two examples to get a
feeling for the types involved:
~~~~.hs
docEmpty :: ToBriDocM BriDocNumbered
docEmpty = allocateNode BDFEmpty -- what a "smart" constructor, right?

docSeq :: [ToBriDocM BriDocNumbered] -> ToBriDocM BriDocNumbered
docSeq l = allocateNode . BDFSeq =<< sequence l
-- this is a bit more elaborate: In order to allow proper
-- composition of these smart constructors, we accept a list of
-- actions instead of just `BriDocNumbered`s, and use `sequence`
-- to make it work. Nothing unusual otherwise.
~~~~
There is one rather special `doc..` function: `docSharedWrapper`.
Let's consider the code first:
~~~~.hs
docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
docSharedWrapper f x = return <$> f x
~~~~
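The effect of `docSharedWrapper` on labels can be observed in a toy setting. This is a minimal sketch under assumptions: a cut-down labelled document type `DocF` and a plain `State Int` counter (from `transformers`) replace `BriDocF ((,) Int)` and `ToBriDocM`; the definition of `allocateNode` used here is a guess at its behaviour, and `docLit`, `childLabels`, `unshared` and `shared` are hypothetical helpers. Reusing a plain action allocates a fresh label on every use, whereas reusing the wrapped action does not.

```haskell
import Control.Monad.Trans.State (State, evalState, get, put)

-- Toy labelled document: a hypothetical, cut-down BriDocF ((,) Int).
data DocF = DLit String | DSeq [(Int, DocF)]

type Numbered = (Int, DocF)
type M = State Int -- stand-in for ToBriDocM: only the label counter

allocNodeIndex :: M Int
allocNodeIndex = do
  i <- get
  put (i + 1)
  return i

-- Assumed behaviour: pair a node with a freshly allocated label.
allocateNode :: DocF -> M Numbered
allocateNode d = do
  i <- allocNodeIndex
  return (i, d)

docLit :: String -> M Numbered
docLit = allocateNode . DLit

docSeq :: [M Numbered] -> M Numbered
docSeq l = allocateNode . DSeq =<< sequence l

-- Same definition as in the excerpt above.
docSharedWrapper :: Monad m => (x -> m y) -> x -> m (m y)
docSharedWrapper f x = return <$> f x

-- Labels of the direct children of a node.
childLabels :: Numbered -> [Int]
childLabels (_, DSeq xs) = map fst xs
childLabels _ = []

-- Reusing a plain action runs its allocation twice: two different labels.
unshared :: M Numbered
unshared = do
  let lit = docLit "x"
  docSeq [lit, lit]

-- The wrapped action allocated once; both uses refer to the same label.
shared :: M Numbered
shared = do
  lit <- docSharedWrapper docLit "x"
  docSeq [lit, lit]
```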
How is this useful? Consider this: All the smart constructors
expect as input actions returning (freshly labeled) nodes.
But what if we want sharing? In those cases we do _not_ want
fresh labels on multiple uses. This is where `docSharedWrapper` comes
into play: It executes the contained label allocation once
and returns a pure action via `return`; this pure action
can then be passed e.g. to `docSeq`, but does not do any new
allocation. This gives us sharing in the cases where we
want it.

But wait, one more thing: Not all `BriDoc` constructors have
an exactly matching smart constructor, and there are smart
constructors that involve multiple BriDoc constructors behind
the scenes. For this reason, we will focus on the smart
constructors in the following, because they define the
real interface to be used.

You might now have a glance at "bridoc-api.md".