This defines a typeclass `Reducible` which allows us to "reduce" pandoc
Inlines and Blocks, like so
Emph [Strong [Str "foo", Space]] <++> Strong [Emph [Str "bar"]], Str
"baz"] =
[Strong [Emph [Str "foo", Space, Str "bar"], Space, Str "baz"]]
So adjacent formattings and strings are appropriately grouped.
Another set of operators for `(Reducible a) => (Many a)` are also
included.
The normalizing tests revealed a problem with unformatted spaces, brought about
by `spanTrim`. This fixes by not trimming the spaces out of spans until they
are in their final form.
There were some problems with the old str normalization. This fixes those
problems. Also, since it drills down on its own, it only needs to be
mapped over the blocks, not walked over the tree.
`<span style="font-variant:small-caps;">foo</span>` will be
parsed as a `SmallCaps` inline, and will work in all output
formats that support small caps.
Closes#1360.
When the `hard_line_breaks` option was specified, pandoc would
produce a spurious line break after a tight list item. This
patch solves the problem. Closes#1137.
`normalize` from Text.Pandoc.Shared is more general. In tests, though,
it more than doubles the run time. `strNormalize` does less, but it does
what we need. This comment is added for future maintainability.
Use a function `stripSpaces`, instead of recursion. Makes it a bit
easier to read and mantain, and simplify normalizing DefinitionList,
which was left out the first time.
This brings pandoc's rendering of haddock markup in line
with the new haddock.
Note that we preserve line breaks in `@` code blocks, unlike
the earlier version.
Modified tests pass. More tests would be good.
Closes#1236.
Note, this is a bit of a kludge, to work around the fact that xml-light
doesn't parse `<?asciidoc-br?>` correctly. We preprocess the input,
replacing that instruction with `<br/>`, and then parse that as a line
break. Other XML instructions are simply removed from the input stream.
Closes#1345. Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.
I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc. The previous behavior resulted in the
text not appearing in any output format.
`\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]`
so that we don't get things like `* hi *` in markdown output.
Also applies to textbf and some other constructions.
Closes#1146. (`--normalize` isn't touched by this, but
normalization should not generally be necessary with the
changes to the readers.)
This change rewrites `inlineLaTeXCommand` so that parsec will
know when input is being consumed. Previously a run-time
error would be produced with some input involving raw latex.
(I believe this does not affect the last release, as the inline
latex reading was added recently.)