* Replace org-mode’s verbatim from code to codeWith.
This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.
* Alter test for org-mode’s verbatim change.
See previous commit for further detail on the new implementation.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables. This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.
Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.
Closes#6996.
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.
Closes: #6977
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
The raw text is now included verbatim in the output. Previously is was parsed
into XML elements, which prevented the inclusion of partial XML snippets.
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
- An image alone in its paragraph (but not a figure) is now
rendered as an independent image, with an `alt` attribute
if a description is supplied.
- An inline image that is not alone in its paragraph will
be rendered, as before, using a substitution.
Such an image cannot have a "center", "left", or
"right" alignment, so the classes `align-center`,
`align-left`, or `align-right` are ignored.
However, `align-top`, `align-middle`, `align-bottom`
will generate a corresponding `align` attribute.
Closes#6948.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used. This commit causes the prefixes
to be retained, and also avoids invald duplicate
attributes.
Closes#6938.
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.
Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This allows to
recover and fix broken or unknown links with filters.
See: #6916
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.
Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.
For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.
In order for numbers and reference text to be updated the generated
document must be refreshed.
Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Previously bold and italics didn't work properly in LTR
text. This commit causes the w:bCs and w:iCs attributes
to be used, in addition to w:b and w:i, for bold and
italics respectively.
Closes#6911.
Fix appearance of bullets/numbered lists (the first level is slightly
indented to the right instead of right on the margin).
New golden files have been tested using Word 2010 on Windows 10.
- `<tfoot>` elements are no longer added to the table body but used as
table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
elements are preserved.
We need it for checkboxes in todo lists, and maybe for
other things. In this location it seems compatible
with the cases that propmted #6469 and PR #6762.
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.
Closes#6890.
Babel defines "shorthands" for some languages, and these can
produce unexpected results. For example, in Spanish, `1.22`
gets rendered as `122`, and `et~al.` as `etal`.
One would think that babel's `shorthands=off` option (which
we were using) would disable these, but it doesn't. So we
remove `shorthands=off` and add some code that redefines
the shorthands macro. Eventually this will be fixed in babel,
I hope, and we can revert to something simpler.
Closes#6817, closes#6887.
The 3 native table test cases are normalized so that it will looks exactly like it is written by some pandoc writers.
Note that apart from white space normalization, it includes other normalization such as `[Str "Nordic countries"] to [Str "Nordic",Space,Str "countries"]`.
+ Remove the `\strut` that was added at the end of minipage
environments in cells.
+ Replace `\tabularnewline` with `\\ \addlinespace`.
Closes#6842, closes#6860.
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, <?dbfo table-width="50%"?> .
Implement support for this instruction in the DocBook reader.
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.
Closes#6869.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.
Attempts to add the `autolink_identifiers` extension to
`commonmark` will now fail with an error.
Closes#6863.
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.
We also skip blank lines like groff itself.
Closes#6858.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes#6662.
Background: syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).
This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.
This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`. This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.
Closes#6796.
These changes restore the 20px font size while increasing readibility by
reducing line width. (The number of words per line is now similar to
that of pandoc's default LaTeX/PDF output.) With the narrower lines, we
also need less interline and interparagraph space, so the content
becomes more compact and skimmable:
- Change default font size back to 20px.
- Set font-size for print media to 12pt.
- Reduce interline space.
- Reduce interparagraph space.
- Reduce line width.
- Remove the special `line-height: 1` for table cells,
which I had suggested but which now seems a mistake.
- Remove the special line-height for pre.
- Ensure that there is a bit more space before a heading
than after.
- Slightly reduced space after title header.
- Fix margin before codeblock
- Add `monobackgroundcolor` variable, making the background color
and padding of code optional.
- Ensure that backgrounds from highlighting styles take precedence over
monobackgroundcolor
- Remove list markers from TOC
- Add margin-bottom where needed
- Remove italics from blockquote styling
- Change borders and spacing in tables to be more consistent with other
output formats
- Style h5, h6
- Decrease root font-size to 18px
- Update tests for styles.html changes
- Add CSS example to MANUAL
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`.
Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.
This gives better results. Note that normal citations are still
rendered in parentheses.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.
The default template has been modified to load the calc
package if tables are used.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes#6768.
This fixes a problem with author-in-text citations for references
including both an author and an editor. Previously, both were
included in the text, but only the author should be.
Closes#6765. Added a test.
This fixes the citation number issue with ieee.csl and other
styles that do not explicitly sort bibliographies. (Pandoc
was numbering them by their order in the bibliography file,
rather than the order cited, as required by the CSL spec.)
Closes#6741.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.
Closesjgm/citeproc#17.
For security reasons, some legal firms delete the date from comments and
tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.
Closes#5948.
if they come before csl-right-inline. This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.
Fix that by starting such lists with an `\item[]`.
* Fix hlint suggestions, update hlint.yaml
Most suggestions were redundant brackets. Some required
LambdaCase.
The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
Screen readers read an image's `alt` attribute and the figure caption,
both of which come from the same source in pandoc. The figure caption is
hidden from screen readers with the `aria-hidden` attribute. This
improves accessibility.
For HTML4, where `aria-hidden` is not allowed, pandoc still uses an
empty `alt` attribute to avoid duplicate contents.
Closes: #6491
The reader now parses the contents of the markdown cell to a Pandoc
structure, but *also* stores the raw markdown in a `source`
attribute on the cell Div. When we convert back to markdown,
this attribute is stripped off and the original source is used.
When we convert to other formats, the attribute is usually
ignored (though it will come through in HTML as a `data-source`
attribute, not unhelpfully).
I'll note some potential drawbacks of this approach:
- It makes it impossible to use pandoc to clean up or
change the contents of markdown cells, e.g.
going from `+smart` to `-smart`.
- There may be formats where the addition of the `source`
attribute is problematic. I can't think of any, though.
Closes#5408.
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer
read as metadata, but kept as raw `org` blocks. This ensures that more
information is retained when round-tripping org-mode files;
additionally, this change makes it possible to support non-standard org
extensions via filters.
The behavior of the `#+AUTHOR` and `#+KEYWORD` export settings has
changed: Org now allows multiple such lines and adds a space between the
contents of each line. Pandoc now always parses these settings as meta
inlines; setting values are no longer treated as comma-separated lists.
Note that a Lua filter can be used to restore the previous behavior.
`#+DESCRIPTION` lines are treated as text with markup. If multiple such
lines are given, then all lines are read and separated by soft
linebreaks.
Closes: #6485
The `tex` export option can be set with `#+OPTION: tex:nil` and allows
three settings:
- `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX,
- `nil` removes all LaTeX fragments from the document, and
- `verbatim` treats LaTeX as text.
The default is `t`.
Closes: #4070
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.
Package jira-wiki-markup must be version 1.3.2 or later.
Fixes: #6478
Closes#6385. (The summary element needs to be the first
child of details and should not be enclosed by p tags.)
NOTE: you need to include a blank line before the closing
`</details>`, if you want the last part of the content to
be parsed as a paragraph.
Some CSS to ensure that display math is
displayed centered and on a new line is now included
in the default HTML-based templates; this may be
overridden if the user wants a different behavior.
This change will not have any effect with the default style.
However, it enables users to use a style (via a reference.docx)
that turns on row and/or column bands.
Closes#6371.
This reverts a change in the last release; the Div is
no longer needed, because we can now put the id right in
the Table's attributes. However, writers may still need
to be modified to do something with the id in a Table
(e.g. create an anchor), so in the short term we may lose
the ability to link to tables in some writers.
The PandocError type is used throughout the Lua subsystem, all Lua
functions throw an exception of this type if an error occurs. The
`LuaException` type is removed and no longer exported from
`Text.Pandoc.Lua`. In its place, a new constructor `PandocLuaError` is
added to PandocError.
Now a cell with dimension (h, w) will be cut up into h*w cells of
dimension (1,1), all in the same grid position, with the upper-left
holding the original cell contents and the rest being empty.
The Builder.simpleTable now only adds a row to the TableHead when the
given header row is not null. This uncovered an inconsistency in the
readers: some would unconditionally emit a header filled with empty
cells, even if the header was not present. Now every reader has the
conditional behaviour. Only the XWiki writer depended on the header
row being always present; it now pads its head as necessary.
- Writers.Native is now adapted to the new Table type.
- Inline captions should now be conditionally wrapped in a Plain, not
a Para block.
- The toLegacyTable function now lives in Writers.Shared.
Icons are now converted as follows: `(/)` to ✔, `(x)` to ❌, `(!)` to
❗, `(+)` to ➕, `(-)` to ➖, `(off)` to 🌙, and `(*)` to ☆. The new
icons render well in most fonts. Furthermore, the UTF-8 characters all
fit into 4-bytes.
Closes: #6264
- SmallCaps instead of Span for the part after the initial capital.
- Ensure that both arguments are parsed, so that in Markdown both
are treated as raw LateX. (Closes #6258.)
Nested syntaxes are specified like this:
{{{sql
SELECT * FROM table
}}}
The preformatted code block parser has been extended to check if the
first attribute of the block is not a `key=value` pair, and in that case
it will be considered as a class.
Closes#6256.
Jira text which is marked as `+inserted+` is converted into pandoc's
default representation for underlined text: a span with class
`underline`. Previously, the span was marked with the non-standard class
`inserted`.
Closes: #6237
Jira images attributes as in `!image.jpg|align=right!` are retained as
key-value pairs. Thumbnail images, such as `!example.gif|thumbnail!`,
are marked by a `thumbnail` class in their attributes.
Related to #6234.
* Simplify by collapsing a do block into a single <$>
* Remove an unnecessary variable: `all` takes any Foldable, so only blocksToInlines needs toList.
A bug was fixed which caused faulty parsing if a table was not preceded
by a newline and the first table cell had no space after the initial `|`
characters.
Fixes: #6198
A bug was fixed which caused non-emphasized text containing digits and/or
non-special symbols (like dots) to sometimes be parsed incorrectly.
Fixes: #6196
* Support for colored inlines has been added.
* Lists are now allowed to be indented; i.e., lists are still recognized
if list markers are preceded by spaces.
Closes: #6183, #6184
This reverts commit 3493d6afaa.
This might be worth considering in the future, but let's not do
it yet...the additional complexity needs a better justification.
This is experimental. Normally metadata values are interpreted
as markdown, but if the !!literal tag is used they will be interpreted
as plain strings.
We need to consider whether this can still be implemented if
we switch back from HsYAML to yaml for performance reasons.
Closes#5849. Previously biblatex citations were only grouped if
there was no prefix. This patch allows them to be grouped in
subgroups split by prefixes and suffixes, which allows better citation
sorting.
New formats:
- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."
The "jats" output format is now an alias for "jats_archiving".
Closes: #6014
* Use concatMap instead of reimplementing it
* Replace an unnecessary multi-way if with a regular if
* Use sortOn instead of sortBy and comparing
* Use guards instead of lots of indents for if and else
* Remove redundant do blocks
* Extract common functions from both branches of maybe
Whenever both the Nothing and the Just branch of maybe do the same
function, do that function on the result of maybe instead.
* Use fmap instead of reimplementing it from maybe
* Use negative forms instead of negating the positive forms
* Use mapMaybe instead of mapping and then using catMaybes
* Use zipWith instead of mapping over the result of zip
* Use unwords instead of reimplementing it
* Use <$ instead of <$> and const
* Replace case of Bool with if and else
* Use find instead of listToMaybe and filter
* Use zipWithM instead of mapM and zip
* Inline lambda wrappers into the real functions
* We get zipWithM from Text.Pandoc.Writers.Shared
* Use maybe instead of fromMaybe and fmap
I'm not sure how this one slipped past me.
* Increase a bit of indentation
Lists of Inline and Block elements can now be filtered via `Inlines` and
`Blocks` functions, respectively. This is helpful if a filter conversion
depends on the order of elements rather than a single element.
For example, the following filter can be used to remove all spaces
before a citation:
function isSpaceBeforeCite (spc, cite)
return spc and spc.t == 'Space'
and cite and cite.t == 'Cite'
end
function Inlines (inlines)
for i = #inlines-1,1,-1 do
if isSpaceBeforeCite(inlines[i], inlines[i+1]) then
inlines:remove(i)
end
end
return inlines
end
Closes: #6038
The functions `table.insert`, `table.remove`, and `table.sort` are added
to pandoc.List elements. They can be used as methods, e.g.
local numbers = pandoc.List {2, 3, 1}
numbers:sort() -- numbers is now {1, 2, 3}
If parsing fails in a raw environment (e.g. due to special
characters like unescaped `_`), try again as a verbatim
environment, which is less sensitive to special characters.
This allows us to capture special environments that change
catcodes as raw tex when `-f latex+raw_tex` is used.
Closes#6034.
The fix to #6030 actually changed behavior, so that the
2D nesting occurred at slide level N-1 and N, instead of
at the top-level section. This commit restores the 2.7.3 behavior.
If there are more than 2 levels, the top level is horizontal
and the rest are collapsed to vertical.
Closes#6032.
The current DTD for the JATS writer template is for Journal Publishing (JATS-journalpublishing1.dtd), which does not permit ext-link as a valid child (https://jats.nlm.nih.gov/publishing/tag-library/1.1/element/publisher-name.html).
This update modifies the default output template to be the less restrictive JATS archiving and interchange DTD which systems like PubMed use internally to represent their articles.
Pandoc's AST is translated into the Jira AST, which is then rendered by
the dedicated Jira printer.
The following improvements are included in this change:
- non-jira raw blocks are fully discarded instead of showing as blank
lines;
- table cells can contain multiple blocks;
- unnecessary blank lines are removed from the output;
- markup chars within words are properly surrounded by braces;
- preserving soft linebreaks via `--wrap=preserve` is supported.
Note that backslashes are rendered as HTML entities, as there appears no
alternative to produce a plain backslash if it is followed by markup.
This may cause problems when used with confluence, where rendering seems
to fail in this case.
Closes: #5926
Added `\setupinterlinespace` to `title`, `subtitle`, `date` and `author` elements.
Otherwise longer titles that run over multiple lines will look squashed as
`\tfd` etc. won't adapt the line spacing to the font size.
With positive heading shifts, starting in 2.8 this option caused
metadata titles to be removed and changed to regular headings.
This behavior is incompatible with the old behavior of
`--base-header-level` and breaks old workflows, so with this
commit we are rolling back this change.
Now, there is an asymmetry in positive and negative heading
level shifts:
+ With positive shifts, the metadata title stays the same and
does not get changed to a heading in the body.
+ With negative shifts, a heading can be converted into the
metadata title.
I think this is a desirable combination of features, despite
the asymmetry. One might, e.g., want to have a document
with level-1 section headigs, but render it to HTML with
level-2 headings, retaining the metadata title (which pandoc
will render as a level-1 heading with the default template).
Closes#5957.
Revises #5615.
Backslash-escaping is used instead of HTML entities, as escaped
characters are easier to read this way. Furthermore, Confluence, which
seems to use a subset of Jira markup, seems to get confused by HTML
entities.
Previously they appeared centered at the top of the page;
now we put them centered at the bottom, unless the `pagenumbering`
variable is set (this gives users full control over page
number format and position,
https://wiki.contextgarden.net/Command/setuppagenumbering)
All headings now have a uniform color.
Level-1 headings no longer set `w:themeShade="B5"`.
Level-2 headings are now 14 point rather than 16 point.
Level-3 headings are now 12 point rather than 14 point.
Level-4 headings are italic rather than bold.
Closes#5820.
If a simple table would be too wide, we use a grid table.
The code for generating grid tables has been adjusted to
give more intelligent column widths when widths aren't
given. (This also affects the markdown writer.)
Closes#5899.
`Nothing` means: nothing specified.
`Just []` means: an empty list specified (e.g. in defaults).
Potentially these could lead to different behavior: see #5888.
The check whether a complex inline element following a string must be
escaped, now depends on the last character of the string instead of the
first.
Fixes: #5906
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
`manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
`mantyUntilChar` have been added: these are like their
unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
to be intended as an interface to Glob, which uses strings.
It seems to be used only once in the package, in the EPUB writer,
so that is not hard to change.
Superscripts and subscripts cannot contain spaces,
but newlines were previously allowed (unintentionally).
This led to bad interactions in some cases with footnotes.
E.g.
```
foo^[note]
bar^[note]
```
With this change newlines are also not allowed inside
super/subscripts.
Closes#5878.
* Add HTML Reader support for `<dfn>`, parsing this as a Span with class `dfn`.
* Change `htmlSpanLikeElements` implementation to retain classes,
attributes and inline content.
Previously, if a document contained two YAML metadata blocks
that set the same field, the conflict would be resolved in favor
of the first. Now it is resolved in favor of the second (due to
a change in pandoc-types).
This makes the behavior more uniform with other things in pandoc
(such as reference links and `--metadata-file`).
The first list item of a sublist should not resume numbering
from the number of the last sublist item of the same level,
if that sublist was a sublist of a different list item.
That is, we should not get:
```
1. one
1. sub one
2. sub two
2. two
3. sub one
```
* Set dbBook to true when traversing a chapter too.
Currently, a `<title/>` in a chapter and in a `<section/>` below that
chapter have the same level if they're not inside a `<book/>`.
This can happen in a multi-file book project. Also see the example at
https://tdg.docbook.org/tdg/4.5/chapter.html
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
* Add docbook-chapter test
This tests nested `<section/>` and makes sure `<title/>` in the first
`<section/>` below `<chapter/>` is one level deeper than the `<chapter/>`'s
`<title/>`, also when not inside a `<book/>`.
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
The new version of doctemplates adds many features to pandoc's
templating system, while remaining backwards-compatible.
New features include partials and filters. Using template filters,
one can lay out data in enumerated lists and tables.
Templates are now layout-sensitive: so, for example, if a
text with soft line breaks is interpolated near the end of
a line, the text will break and wrap naturally. This makes
the templating system much more suitable for programatically
generating markdown or other plain-text files from metadata.
This package is needed for proper handling of image filenames
containing periods (in addition to the period before the
extension).
Unfortunately, grffile breaks in the latest texlive update.
Until a fix is released (see ho-tex/oberdiek#73) it seems best
to remove this from the default template.
This may cause problems if you have filenames with periods.
The workaround is to put `\usepackage{grffile}` in header-includes,
and be sure you're using an older version of texlive packages.
See #5848. We will leave that issue open to remind us to
check upstream, and restore grffile when it's possible to
do so.
When a div surrounds multiple sections at the same level,
or a section of highre level followed by one of lower level,
then we just leave it as a div and create a new div for the
section.
Closes#5846, closes#5761.
Comment lines in Org-mode can be completely empty; both of these line
should produce no output:
# a comment
#
The reader used to produce a wrong result for the latter, but ignores
that line as well now.
Fixes: #5856
* HTML reader: Handle cite attribute for quotes. If a `<q>` tag has a `cite` attribute, we interpret it as a Quoted element with an inner Span. Closes#5798
* Refactor url canonicalization into a helper function
* Modify HTML writer to handle quote with cite.
[0]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/q
Parse <mark> elements from HTML as HTML span like elements, with a
single class matching the tag name `mark`. Mark elements are rendered to
HTML using the native <mark> element.
Fixes https://github.com/jgm/pandoc/issues/5797.
* Text.Pandoc.Shared: export `htmlSpanLikeElements` [API change]
This commit also introduces a mapping of HTML span like elements that
are internally represented as a Span with a single class, but that are
converted back to the original element by the html writer. As of now,
only the kbd element is handled this way. Ideally these elements should
be handled as plain AST values, but since that would be a breaking
change with a large impact, we revert to this stop-gap solution.
Fixes https://github.com/jgm/pandoc/issues/5796.
We change to use 0.5pt rather than `\linethickness`, which
apparently only ever worked "by accident" and no longer works
with recent updates to texlive.
Closes#5801.
...even when we must lose width information.
All in all this seems to be people's preferred behavior, even though it
is slightly lossier.
Closes#2608.
Closes#4497.
on conflicting fields. This changes earlier behavior (but not in
a release), where first took precedence.
Note that this may seem inconsistent with the behavior of
multiple YAML blocks within a document, where the first takes
precedence. Still, it is convenient to be able to override
defaults with options later on the command line.
If this is present on a heading with the 'unnumbered' class,
the heading won't appear in the TOC. This class has no
effect if 'unnumbered' is not also specified.
This affects HTML-based writers (including slide shows
and epub), LateX (including beamer), RTF, and PowerPoint.
Other writers do not yet support `unlisted`.
Closes#1762.
Motivation: in a man page there's not much use for relative URLs,
which you can't follow. Absolute URLs are still useful. We previously
suppressed relative URLs starting with '#' (purely internal links),
but it makes sense to go a bit farther.
Closes#5770.
We were requiring consistent indentation, but this
isn't required by RST, as long as each nonblank
line of the block has *some* indentation.
Closes#5753.
Previously we used the following Project Gutenberg conventions
for plain output:
- extra space before and after level 1 and 2 headings
- all-caps for strong emphasis `LIKE THIS`
- underscores surrounding regular emphasis `_like this_`
This commit makes `plain` output plainer. Strong and Emph
inlines are rendered without special formatting. Headings
are also rendered without special formatting, and with only
one blank line following.
To restore the former behavior, use `-t plain+gutenberg`.
API change: Add `Ext_gutenberg` constructor to `Extension`.
See #5741.
Notice this commit updates lists.docx. The old test file contained
references to "ListParagraph" style, which should never leak
outside of pandoc, so I'm not sure what that was supposed to test
for exactly.
Motivating issues: #5523, #5052, #5074
Style name comparisons are case-insensitive, since those are
case-insensitive in Word.
w:styleId will be used as style name if w:name is missing (this should
only happen for malformed docx and is kept as a fallback to avoid
failing altogether on malformed documents)
Block quote detection code moved from Docx.Parser to Readers.Docx
Code styles, i.e. "Source Code" and "Verbatim Char" now honor style
inheritance
Docx Reader now honours "Compact" style (used in Pandoc-generated docx).
The side-effect is that "Compact" style no longer shows up in
docx+styles output. Styles inherited from "Compact" will still
show up.
Removed obsolete list-item style from divsToKeep. That didn't
really do anything for a while now.
Add newtypes to differentiate between style names, ids, and
different style types (that is, paragraph and character styles)
Since docx style names can have spaces in them, and pandoc-markdown
classes can't, anywhere when style name is used as a class name,
spaces are replaced with ASCII dashes `-`.
Get rid of extraneous intermediate types, carrying styleId information.
Instead, styleId is saved with other style data.
Use RunStyle for inline style definitions only (lacking styleId and styleName);
for Character Styles use CharStyle type (which is basicaly RunStyle with styleId
and StyleName bolted onto it).
This commit prevents custom styles on divs and spans from overriding
styles on certain elements inside them, like headings, blockquotes,
and links. On those elements, the "native" style is required for the
element to display correctly. This change also allows nesting of
custom styles; in order to do so, it removes the default "Compact"
style applied to Plain blocks, except when inside a table.
Attr values can now be given as normal Lua tables; this can be used as a
convenient alternative to define Attr values, instead of constructing
values with `pandoc.Attr`. Identifiers are taken from the *id* field,
classes must be given as space separated words in the *class* field. All
remaining fields are included as misc attributes.
With this change, the following lines now create equal elements:
pandoc.Span('test', {id = 'test', class = 'a b', check = 1})
pandoc.Span('test', pandoc.Attr('test', {'a','b'}, {check = 1}))
This also works when using the *attr* setter:
local span = pandoc.Span 'text'
span.attr = {id = 'test', class = 'a b', check = 1}
Furthermore, the *attributes* field of AST elements can now be a plain
key-value table even when using the `attributes` accessor:
local span = pandoc.Span 'test'
span.attributes = {check = 1} -- works as expected now
Closes: #5744
Deprecate --base-heading-level.
The new option does everything the old one does, but also
allows negative shifts. It also promotes the document
metadata (if not null) to a level-1 heading with a +1 shift,
and demotes an initial level-1 heading to document metadata
with a -1 shift. This supports converting documents that
use an initial level-1 heading for the document title.
Closes#5615.
* Org reader: allow the `-i` switch to ignore leading spaces.
* Org reader: handle awkwardly-aligned code blocks within lists.
Code blocks in Org lists must have their #+BEGIN_ aligned in a
reasonable way, but their other components can be positioned otherwise.
Text.Pandoc.Shared:
+ Remove `Element` type [API change]
+ Remove `makeHierarchicalize` [API change]
+ Add `makeSections` [API change]
+ Export `deLink` [API change]
Now that we have Divs, we can use them to represent the structure
of sections, and we don't need a special Element type.
`makeSections` reorganizes a block list, adding Divs with
class `section` around sections, and adding numbering
if needed.
This change also fixes some longstanding issues recognizing
section structure when the document contains Divs.
Closes#3057, see also #997.
All writers have been changed to use `makeSections`.
Note that in the process we have reverted the change
c1d058aeb1
made in response to #5168, which I'm not completely
sure was a good idea.
Lua modules have also been adjusted accordingly.
Existing lua filters that use `hierarchicalize` will
need to be rewritten to use `make_sections`.
Revert "hierarchicalize: ensure that sections get ids..."
This reverts commit 212406a61d.
Revert "Improve detection of headings in Divs by hierarchicalize."
This reverts commit 6e2cfd6c97.
Revert "Shared.hierarchicalize: improve handling of div and section structure."
This reverts commit 345b33762e.
The structure
```
<h1>one</h1>
<div>
<h1>two</h1>
</div>
```
should create two coordinate sections, not a section with
a subsection. Now it does.
Extends #3057.
Previously Divs were opaque to hierarchicalize, so headings
inside divs didn't get into the table of contents, for
example (#3057).
Now hierarchicalize treats Divs as sections when appropriate.
For example, these structures both yield a section and a
subsection:
``` html
<div>
<h1>one</h1>
<div>
<h2>two</h2>
</div>
</div>
```
``` html
<div>
<h1>one</h1>
<div>
<h1>two</h1>
</div>
</div>
```
Note that
``` html
<h1>one</h1>
<div>
<h2>two</h2>
</div>
<h1>three</h1>
```
gets parsed as the structure
one
two
three
which may not always be desirable.
Closes#3057.
Previously we used named character references with html5 output.
But these aren't valid XML, and we aim to produce html5 that is
also valid XHTML (polyglot markup). (This is also needed for
epub3.)
Closes#5718.
+ Remove Text.Pandoc.Pretty; use doclayout instead. [API change]
+ Text.Pandoc.Writers.Shared: remove metaToJSON, metaToJSON'
[API change].
+ Text.Pandoc.Writers.Shared: modify `addVariablesToContext`,
`defField`, `setField`, `getField`, `resetField` to work with
Context rather than JSON values. [API change]
+ Text.Pandoc.Writers.Shared: export new function `endsWithPlain` [API
change].
+ Use new templates and doclayout in writers.
+ Use Doc-based templates in all writers.
+ Adjust three tests for minor template rendering differences.
+ Added indentation to body in docbook4, docbook5 templates.
The main impact of this change is better reflowing of content
interpolated into templates. Previously, interpolated variables
were rendered independently and intepolated as strings, which could lead
to overly long lines. Now the templates interpolated as Doc values
which may include breaking spaces, and reflowing occurs
after template interpolation rather than before.
Changed optMetadataFile from `Maybe FilePath` to `[FilePath]`. This allows
for multiple YAML metadata files to be added. The new default value has
been changed from `Nothing` to `[]`.
To account for this change in `Text.Pandoc.App`, `metaDataFromFile` now
operates on two `mapM` calls (for `readFileLazy` and `yamlToMeta`) and a fold.
Added a test (command/5700.md) which tests this functionality and
updated MANUAL.txt, as per the contributing guidelines.
With the current behavior, using `foldr1 (<>)`, values within files
specified first will be used over those in later files. (If the reverse
of this behavior would be preferred, it should be fixed by changing
foldr1 to foldl1.)
Traversal methods are updated to use the new Walk module such that
sequences with nested Inline (or Block) elements are traversed in the
order in which they appear in the linearized document.
Fixes: #5667
the token string is modified by a parser (e.g. accent when
it only takes part of a Word token).
Closes#5686. Still not ideal, because we get the whole
`\t0BAR` and not just `\t0` as a raw latex inline command.
But I'm willing to let this be an edge case, since you
can easily work around this by inserting a space, braces,
or raw attribute. The important thing is that we no longer
drop the rest of the document after a raw latex inline
command that gobbles only part of a Word token!
* Require recent doctemplates. It is more flexible and
supports partials.
* Changed type of writerTemplate to Maybe Template instead
of Maybe String.
* Remove code from the LaTeX, Docbook, and JATS writers that looked in
the template for strings to determine whether it is a book or an
article, or whether csquotes is used. This was always kludgy and
unreliable. To use csquotes for LaTeX, set `csquotes` in your
variables or metadata. It is no longer sufficient to put
`\usepackage{csquotes}` in your template or header includes.
To specify a book style, use the `documentclass` variable or
`--top-level-division`.
* Change template code to use new API for doctemplates.
The web service passed in to `--webtex` may render formulas using inline
or display style by default. Prefixing formulas with the appropriate
command ensures they are rendered correctly.
This is a followup to the discussion in #5656.
Previously we just omitted these. Now we render them
using `\hfill\break` instead of `\\`. This is a revision
of a PR by @sabine (#5591) who should be credited with the
idea.
Closes#3324.
Workaround for Word Online shortcomming. Fixes#5645
Also, make list para properties go first.
This reordering of properties shouldn't be necessary but
it seems Word Online does not understand the docx correctly otherwise.
The `raw_attribute` will be used to mark raw bits, even HTML
and LaTeX, and even when `raw_html` and `raw_tex` are enabled,
as they are by default.
To get the old behavior, disable `raw_attribute` in the writer.
Closes#4311.
* Let the user choose type of PDF/A generated with ConTeXt (closes#5608)
* Updated ConTeXt test documents for changes in tagging
* Updated color profile settings in accordance with ConTeXt wiki
* Made ICC profile and output intent for PDF/A customizable
* Read pdfa variable from meta (and updated manual)
Now we boldface code but not other things. This matches the
most common style in man pages (particularly option lists).
Also, remove a regression in the last commit in which 'nowrap'
was removed.