Previously some displayed formulas would be floated above
a preceding text line. This is fixed by setting vertical-rel
to 'text' rather than 'paragraph-content'.
Closes#7777.
- With `--wrap=none`, we now output line breaks between
block-level elements. Previously they were omitted
entirely, so the whole document was on one line, unless
there were literal line breaks in pre sections. This makes
the HTML writer's behavior more consistent with that of
other writers.
- Put newline after `<dd>`.
- Put newlines after block-level elements in footnote section.
Previously the HTML writer was exceptional in not being
sensitive to the `--wrap` option. With this change `--wrap`
now works for HTML. The default (as with other formats) is
automatic wrapping to 72 columns.
A new internal module, T.P.Writers.Blaze, exports `layoutMarkup`.
This converts a blaze Html structure into a doclayout Doc Text.
In addition, we now add a line break between an `img` tag
and the associated `figcaption`.
Note: Output is never wrapped in `writeHtmlStringForEPUB`.
This accords with previous behavior since previously the HTML
writer was insensitive to `--wrap` settings. There's no real
need to wrap HTML inside a zipped container.
Note that the contents of script, textarea, and pre tags are
always laid out with the `flush` combinator, so that unwanted
spaces won't be introduced if these occur in an indented context
in a template.
Closes#7764.
The function behaves like the default `type` function from Lua's
standard library, but is aware of pandoc userdata types. A typical
use-case would be to determine the type of a metadata value.
Markua is a markdown variant used by Leanpub.
More information about Markua can be found at https://leanpub.com/markua/read.
Adds a new exported function `writeMarkua` from T.P.Writers.Markdown.
[API change]
Closes#1871.
Co-authored by Tim Wisotzki and Samuel Lemmenmeier.
Previously, both `fmt == f` case and Image have a rank of 1.
In the end, e.g. from ipynb to html conversion,
if both html and image exists, it actually prefers the image.
This commit changes this, so that fmt == f is always highest rank,
and rank never collides.
This is achieved by keeping fmt == f case having rank 1,
and every other rank increased by 1.
Property tests that roundtrip elements through the Lua stack are
performed in the test-suite of the pandoc-lua-marshal package. No need
to test this here as well.
Write RawBlock of markdown in code-cell output.
#7561 makes the ipynb reader reads code-cell output with mime
"text/markdown" to a RawBlock of markdown
This commit makes the ipynb writer writes this RawBlock of markdown
back inside a code-cell output with the same mime, preserving this
information in round-trip
Add tests of ipynb reader (#7561) and ipynb writer (#7563)'s ability to
handle a "text/markdown" mime type in a code-cell output
- `walk` methods are added to `Block` and `Inline` values; the methods
are similar to `pandoc.utils.walk_block` and
`pandoc.utils.walk_inline`, but apply to filter also to the element
itself, and therefore return a list of element instead of a single
element.
- Functions of name `Doc` are no longer accepted as alternatives for
`Pandoc` filter functions. This functionality was undocumented.
The new `pandoc.Inlines` function behaves identical on string input, but
allows other Inlines-like arguments as well.
The `pandoc.utils.text` function could be written as
function pandoc.utils.text (x)
assert(type(x) == 'string')
return pandoc.Inlines(x)
end
The marshaling functions for pandoc's AST are extracted into a separate
package. The package comes with a number of changes:
- Pandoc's List module was rewritten in C, thereby improving error
messages.
- Lists of `Block` and `Inline` elements are marshaled using the new
list types `Blocks` and `Inlines`, respectively. These types
currently behave identical to the generic List type, but give better
error messages. This also opens up the possibility of adding
element-specific methods to these lists in the future.
- Elements of type `MetaValue` are no longer pushed as values which
have `.t` and `.tag` properties. This was already true for
`MetaString` and `MetaBool` values, which are still marshaled as Lua
strings and booleans, respectively. Affected values:
+ `MetaBlocks` values are marshaled as a `Blocks` list;
+ `MetaInlines` values are marshaled as a `Inlines` list;
+ `MetaList` values are marshaled as a generic pandoc `List`s.
+ `MetaMap` values are marshaled as plain tables and no longer
given any metatable.
- The test suite for marshaled objects and their constructors has
been extended and improved.
- A bug in Citation objects, where setting a citation's suffix
modified it's prefix, has been fixed.
We were including the ams environment type in addition
to the number. This is proper behavior for `\cref` but
not for `\ref`. To support `\cref` we need to store
the environment label separately.
Fixed calculation of maximum column widths in pipe tables.
It is now based on the length of the markdown line, rather
than a "stringified" version of the parsed line. This should
be more predictable for users. In addition, we take into account
double-wide characters such as emojis.
Closes#7713.
The function converts a string to `Inlines`, treating interword spaces
as `Space`s or `SoftBreak`s. If you want a `Str` with literal spaces,
use `pandoc.Str`.
Closes: #7709
Using a Lua string where a list of inlines is expected will cause the
string to be split into words, replacing spaces and tabs into
`pandoc.Space()` elements and newlines into `pandoc.SoftBreak()`.
The previous behavior was to treat the string `s` as `{pandoc.Str(s)}`.
The old behavior can be recovered by wrapping the string into a table
`{s}`.
We need to generate a span when the header's ID doesn't match
the one MediaWiki would generate automatically. But MediaWiki's
generation scheme is different from ours (it uses uppercase letters,
and `_` instead of `-`, for example).
This means that in going from markdown -> mediawiki, we'll now get
spans before almost every heading, unless explicit identifiers are
used that correspond to the ones MediaWiki auto-generates.
This is uglier output but it's necessary for internal links to
work properly.
See #7697.
The `lpeg` and `re` modules are loaded into globals of the respective
name, but they are not necessarily registered as loaded packages. This
ensures that
- the built-in library versions are preferred when setting the globals,
- a shared library is used if pandoc has been compiled without `lpeg`,
and
- the `require` mechanism can be used to load the shared library if
available, falling back to the internal version if possible and
necessary.
Reader options can now be passed as an optional third argument to
`pandoc.read`. The object can either be a table or a ReaderOptions value
like `PANDOC_READER_OPTIONS`. Creating new ReaderOptions objects is
possible through the new constructor `pandoc.ReaderOptions`.
Closes: #7656
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests
Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
Comparisons of Citation values are performed in Haskell; values are
equal if they represent the same Haskell value. Converting a Citation
value to a string now yields its native Haskell string representation.
Reasons:
- Performance: HsYAML is around 20 times slower in parsing
large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
any attention. HsYAML seems borderline unmaintained; it hasn't
had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
from C dependencies (#4535). But I don't see a better alternative
until a better pure Haskell parser is available.
Closes#6084.
Notes:
- We've removed the FromYAML instances for all types that had
them, since this is a HsYAML-specific typeclass [API change].
(The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
Users may need to quote these when they are meant to be
interpreted as strings. Similarly, 'null' is parsed as
a YAML null value (and will be treated as an empty string
by pandoc rather than the string 'null'). Quoting it will
force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
escaping errors: instead of just falling back on treating
the section as a table, it raises a YAML parsing error.
Properties of Block values are marshalled lazily, which generally
improves performance considerably. Script users may also notice the
following differences:
- Block element properties can no longer be accessed by numerical
indexing of the `.c` field. The `.c` property now serves as an alias
for `.content`, so some filter that used this undocumented method
for property access may continue to work, while others will need to
be updated and use proper property names.
- The marshalled Block elements now have a `show` method, and a
`__tostring` metamethod. Both return the Haskell string
representation of the element.
- Block values now have the Lua type `userdata` instead of `table`.
- Adds a new `pandoc.AttributeList()` constructor, which creates the
associative attribute list that is used as the third component of
`Attr` values. Values of this type can often be passed to constructors
instead of `Attr` values.
- `AttributeList` values can no longer be indexed numerically.
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.
Furthermore, new abstractions allow to document the code and exposed
functions.
Previously pandoc would parse
[link to (@a)](url)
as a citation; similarly
[(@a)]{#ident}
This is undesirable. One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.
Closes#7632.
Some fields only have an instrText and no content, Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.
To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
This commit changes the `marL` and `indent` values used for plain
paragraphs and numbered lists, and changes the spacing defined in the
reference doc master for bulleted lists.
For paragraphs, there is now a left-indent taken from the `otherStyle`
in the master. For numbered lists, the number is positioned where the
text would be if this were a plain paragraph, and the text is indented
to the next level. This means that continuation paragraphs line up
nicely with numbered lists.
It also /mostly/ matches the observed PowerPoint behaviour when
inserting paragraphs and numbered lists: the only difference is that
PowerPoint was using a different margin value for the first level
numbered lists – I’ve changed this to match the other levels, as I don’t
think it makes the spacing unappealing and it allows continuation
paragraphs at any level to line up.
With bulleted lists, I’m keeping the observed PowerPoint behaviour of
specifying only a level, letting `marL` and `indent` be automatically
taken from `bodyStyle`. To that end, this commit changes the `bodyStyle`
spacing in the master of the default reference doc, to:
- line up the text of the first paragraph in each bullet with any
continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the
first paragraph, matching lists and plain paragraphs
This does mean the continuation paragraphs still won’t line up for
anyone using their own reference doc where they haven’t matched the
`otherStyle` and `bodyStyle` indent levels, but I think people in that
situation will be able to troubleshoot.
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.
At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.
This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.
- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
This fixes a regression in #7604, which modernized
babel usage but omitted to load babel for pdflatex,
with the result that even simple documents could no
longer be produced.
Closes#7627.
AsciiDoctor allows to request line numbering on code blocks by
using a switch on the `source` block, such as in:
```
[source%linesnum,haskell]
----
some Haskell code here
----
```
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
When I added the tests for moved layouts and deleted layouts, I added
them to all tests. However, this doesn’t really give a lot more info
than having single tests, and the extra tests take up time and disk
space.
This commit removes the moved-layouts and deleted-layouts tests, in
favour of a single test for each of those scenarios.