This is just a small improvement in terms of performance,
but it's simpler and more direct code.
Also, we avoid parsing interparagraph spaces in balanced
brackets, as the original did.
This ensures that when `SOURCE_DATE_EPOCH` is set, the
modification times of files taken from the reference.docx will
be set deterministically, allowing for reproducible builds.
Closes#7654.
Comparisons of Citation values are performed in Haskell; values are
equal if they represent the same Haskell value. Converting a Citation
value to a string now yields its native Haskell string representation.
E.g. "Y", "yes", which are now (with yaml library) considered
boolean values, as well as "null".
This fixes a bug with roundtripping markdown -> markdown:
```
---
foo: "true"
...
```
- For LineEnding use lowercase constructors, e.g. `crlf`, `native`.
This was the original intent, but there was a bug in the
implementation.
- For HTMLSlideVariant use lowercase constructors.
- For ReaderOptions use e.g. `default-image-extension`
instead of `readerDefaultImageExtension` for field names.
- For Extension, use e.g. `tex_math_dollars` instead of
`Ext_tex_math_dollars` as constructor.
- For Extensions, use an array of Extensions, instead of
an object wrapping the tag `Extensions` and an integer.
(The representation is not supposed to be part of the
public API.)
- For Opt, use field names like `tab-stop` instead of `optTabStop`.
Reasons:
- Performance: HsYAML is around 20 times slower in parsing
large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
any attention. HsYAML seems borderline unmaintained; it hasn't
had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
from C dependencies (#4535). But I don't see a better alternative
until a better pure Haskell parser is available.
Closes#6084.
Notes:
- We've removed the FromYAML instances for all types that had
them, since this is a HsYAML-specific typeclass [API change].
(The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
Users may need to quote these when they are meant to be
interpreted as strings. Similarly, 'null' is parsed as
a YAML null value (and will be treated as an empty string
by pandoc rather than the string 'null'). Quoting it will
force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
escaping errors: instead of just falling back on treating
the section as a table, it raises a YAML parsing error.
Properties of Block values are marshalled lazily, which generally
improves performance considerably. Script users may also notice the
following differences:
- Block element properties can no longer be accessed by numerical
indexing of the `.c` field. The `.c` property now serves as an alias
for `.content`, so some filter that used this undocumented method
for property access may continue to work, while others will need to
be updated and use proper property names.
- The marshalled Block elements now have a `show` method, and a
`__tostring` metamethod. Both return the Haskell string
representation of the element.
- Block values now have the Lua type `userdata` instead of `table`.
This includes the following user-facing changes:
- Deprecated inline constructors are removed. These are `DoubleQuoted`,
`SingleQuoted`, `DisplayMath`, and `InlineMath`.
- Attr values are no longer normalized when assigned to an Inline
element property.
- It's no longer possible to access parts of Inline elements via
numerical indexes. E.g., `pandoc.Span('test')[2]` used to give
`pandoc.Str 'test'`, but yields `nil` now. This was undocumented
behavior not intended to be used in user scripts. Use named properties
instead.
- Accessing `.c` to get a JSON-like tuple of all components no longer
works. This was undocumented behavior.
- Only known properties can be set on an element value. Trying to set a
different property will now raise an error.
- Adds a new `pandoc.AttributeList()` constructor, which creates the
associative attribute list that is used as the third component of
`Attr` values. Values of this type can often be passed to constructors
instead of `Attr` values.
- `AttributeList` values can no longer be indexed numerically.
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.
Furthermore, new abstractions allow to document the code and exposed
functions.
All the code we needed to put most styles and scripts into
inline style and script tags was there, but because of the
order of pattern matching, it was never being called.
Putting the catch-all clause at the end fixes the bug.
Closes#7635, closes#7367. See also #3423.
Previously pandoc would parse
[link to (@a)](url)
as a citation; similarly
[(@a)]{#ident}
This is undesirable. One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.
Closes#7632.
Some fields only have an instrText and no content, Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.
To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
This commit changes the `marL` and `indent` values used for plain
paragraphs and numbered lists, and changes the spacing defined in the
reference doc master for bulleted lists.
For paragraphs, there is now a left-indent taken from the `otherStyle`
in the master. For numbered lists, the number is positioned where the
text would be if this were a plain paragraph, and the text is indented
to the next level. This means that continuation paragraphs line up
nicely with numbered lists.
It also /mostly/ matches the observed PowerPoint behaviour when
inserting paragraphs and numbered lists: the only difference is that
PowerPoint was using a different margin value for the first level
numbered lists – I’ve changed this to match the other levels, as I don’t
think it makes the spacing unappealing and it allows continuation
paragraphs at any level to line up.
With bulleted lists, I’m keeping the observed PowerPoint behaviour of
specifying only a level, letting `marL` and `indent` be automatically
taken from `bodyStyle`. To that end, this commit changes the `bodyStyle`
spacing in the master of the default reference doc, to:
- line up the text of the first paragraph in each bullet with any
continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the
first paragraph, matching lists and plain paragraphs
This does mean the continuation paragraphs still won’t line up for
anyone using their own reference doc where they haven’t matched the
`otherStyle` and `bodyStyle` indent levels, but I think people in that
situation will be able to troubleshoot.
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.
At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.
This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.
- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
AsciiDoctor allows to request line numbering on code blocks by
using a switch on the `source` block, such as in:
```
[source%linesnum,haskell]
----
some Haskell code here
----
```
We used to attempt automatic sentence splitting in man and ms
output, since sentence-ending periods need to be followed by
two spaces or a newline in these formats.
But it's difficult to do this reliably at the level of
`[Inline]`.
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
This adds the ability to specify EPUB 3 `authority` and `term` specific
refinements to the `subject` tag. Specifying a plain `subject` tag in
metadata will function as before.
Update tests.
Reason: it turns out that the native output generated by
pretty-simple isn't always readable by the native reader.
According to https://github.com/cdepillabout/pretty-simple/issues/99
it is not a design goal of the library that the rendered values
be readable using 'read'. This makes it unsuitable for our
purposes.
pretty-show is a bit slower and it uses 4-space indents
(non-configurable), but it doesn't have this serious drawback.
Previously we used our own homespun formatting. But this
produces over-long lines that aren't ideal for diffs in tests.
Easier to use something off-the-shelf and standard.
Closes#7580.
Performance is slower by about a factor of 10, but this isn't
really a problem because native isn't suitable as a serialization
format. (For serialization you should use json, because the reader
is so much faster than native.)
Previously polyglossia worked better with xelatex, but
that is no longer the case, so we simplify the code so that
babel is used with all latex engines.
This involves a change to the default LaTeX template.
This only affects output with bracketed_spans enabled.
The markdown reader parses spans with either `.ul` or `.underline` as
Underline elements, but we're moving towards preferring the latter.
In PowerPoint, it’s possible to specify footers across all slides,
containing a date (optionally automatically updated to today’s date),
the slide number (optionally starting from a higher number than 1), and
static text. There’s also an option to hide the footer on the title
slide.
Before this commit, none of that footer content was pulled through from
the reference doc: this commit supports all the functionality listed
above.
There is one behaviour which may not be immediately obvious: if the
reference doc specifies a fixed date (i.e. not automatically updating),
and there’s a date specified in the metadata for the document, the
footer date is replaced by the metadata date.
- Include date, slide number, and static footer content from reference
doc
- Respect “slide number starts from” option
- Respect “Don’t show on title slide” option
- Add tests
We previously indented them by two spaces, following a
common convention. Since the convention is fading, and
the indentation is inconvenient for copy/paste, we are
discontinuing this practice.
Closes#5440.
In the reveal-js output, it’s possible to use reveal’s
`data-background-image` class on a slide’s title to specify a background
image for the slide.
With this commit, it’s possible to use `background-image` in the same
way for pptx output. Only the “stretch” mode is supported, and the
background image is centred around the slide in the image’s larger axis,
matching the observed default behaviour of PowerPoint.
- Support `background-image` per slide.
- Add tests.
- Update manual.
- Support -i option
- Support incremental/noincremental divs
- Support older block quote syntax
- Add tests
One thing not clear from the manual is what should happen when the input
uses a combination of these things. For example, what should the
following produce?
```md
::: {.incremental .nonincremental}
- are
- these
- incremental?
:::
::: incremental
::::: nonincremental
- or
- these?
:::::
:::
::: nonincremental
> - how
> - about
> - these?
:::
```
In this commit I’ve taken the following approach, matching the observed
behaviour for beamer and reveal.js output:
- if a div with both classes, incremental wins
- the innermost incremental/nonincremental div is the one which takes
effect
- a block quote containing a list as its first element inverts whether
the list is incremental, whether or not the quote is inside an
incremental/non-incremental div
I’ve added some tests to verify this behaviour.
This commit closes issue #5689
(https://github.com/jgm/pandoc/issues/5689).
There was a mistake in the logic used to choose between the Comparison
and Two Content layouts: if one column contained only non-text (an image
or a table) and the other contained only text, the Comparison layout was
chosen instead of the desired Two Content layout.
This commit fixes that logic:
> If either column contains text followed by non-text, use Comparison.
Otherwise, use Two Content.
It also adds a test asserting this behaviour.
If the image has the id IMAGEID, then we use the id ref_IMAGEID
for the figure number. Closes#7551.
This allows one to create a filter that adds a figure number
with figure name, e.g.
<w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple>
For this to be possible it must be possible to predict the
figure number id from the image id.
If images lack an id, an id of the form `ref_fig1` is used.
for raw cell output
BREAKING CHANGE:
The Jupyter ecosystem, including nbconvert, lab and notebook,
deviated from their own spec in nbformat,
where they used the key `raw_mimetype` instead of `format`.
Moreover, the mime-type of rst used in Jupyter
deviated from that suggested by
https://docutils.sourceforge.io/FAQ.html
and is defined as `text/restructuredtext`
when chosen from "Raw NBConvert Format" in Jupyter.
So while this is backward-compatible,
it should matches the real world usage better,
hence improving the round-trip "identity" in raw-cell.
See #229, jupyter/nbformat#229.
We already copy the relationships and elements in presentation.xml for
embedded fonts, so at the moment using a reference doc with embedded
fonts is broken, producing a pptx that PowerPoint says needs repairing.
This commit copies the fonts over, which I believe is all that’s needed
to work correctly with reference docs with embedded fonts.
Before now, the numbering of rIds was inconsistent when making the
presentation XML and when making the presentation relationships XML.
For the relationships, the slides were inserted into the rId order after
the first master, and everything else was moved up out of the way.
However, this change was then missed in the presentation XML, I think
because `envSlideOffset` was never set. The result was that any slide
masters after the first would have the wrong rIds in the presentation
XML, clashing with the slides, which would lead PowerPoint to view
produced files as corrupt. As well, other relationships (like embedded
fonts) would have their rId changed in the relationships XML but not in
the presentation XML.
This commit:
- Removes `envSlideOffset` in favour of directly passed function
arguments
- Inserts the slides into the rId order after all masters rather than
after the first
- Updates any other rIds in presentation.xml that need to be changed
- Accept test changes: they’re adding the second theme (for all tests
not containing speaker notes), or changing its position in the
XML (for the ones containing speaker notes).
Before now, for any layouts added to the output from the default
reference doc, the relationships were unconditionally added to the
output. However, if there was already a layout in slideMaster1 at the
same index then that results in duplicate relationships.
This commit checks first, and only adds the relationship if it doesn’t
already exist.
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations. EPUB and HTML slide
show formats are also affected by this change.
This works similarly to the markdown writer, but with special care
taken to skipping section divs with what regards to the block level.
The change also takes care to not modify the output if `EndOfDocument`
is used.
We now ensure that groups starting with `\*` never cause
text to be added to the document.
In addition, bookmarks now create a span between the start
and end of the bookmark, rather than an empty span.
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Column” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).
This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.
- Support “Comparison” slide layout
This layout is used when a slide contains at least two columns, at
least one of which contains some text followed by some non-text (e.g.
an image or table). The text in each column is inserted into the
“body” placeholder for that column, and the non-text is inserted into
the ObjType placeholder. Any extra content after the non-text is
overlaid on top of the preceding content, rather than dropping it
completely (as currently happens for the two-column layout).
+ Accept straightforward test changes
Adding the new layout means the “-deleted-layouts” tests have an
additional layout added to the master and master rels.
+ Add new tests for the comparison layout
+ Add new tests to pandoc.cabal
- Support “Content with Caption” slide layout
This layout is used when a slide’s body contains some text, followed by
non-text (e.g. and image or a table). Before now, in this case the image
or table would break onto a new slide: to get that output again, users
can add a horizontal rule before the image or table.
+ Accept straightforward tests
The “-deleted-layouts” tests all have an extra layout and relationship
in the master for the Content with Caption layout.
+ Accept remove-empty-slides test
Empty slides are still removed, but the Content with Caption layout is
now used.
+ Change slide-level-0/h1-h2-with-text description
This test now triggers the content with caption layout, giving a
different (but still correct) result.
+ Add new tests for the new layout
+ Add new tests to the cabal file
- Support “Blank” slide layout
This layout is used when a slide contains only blank content (e.g.
non-breaking spaces). No content is inserted into any placeholders in
the layout.
Fixes#5097.
+ Accept straightforward test changes
Blank layout now copied over from reference doc as well, when
layouts have been deleted.
+ Add some new tests
A slide should use the blank layout if:
- It contains only speaker notes
- It contains only an empty heading with a body of nbsps
- It contains only a heading containing only nbsps
- Change ContentType -> Placeholder
This type was starting to have a constructor for each placeholder on
each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
instead to identify a placeholder by type and index, as I think that’s
clearer and less redundant.
- Describe layout-choosing logic in manual
+ Add sandbox feature for readers. When this option is used,
readers and writers only have access to input files (and
other files specified directly on command line). This restriction
is enforced in the type system.
+ Filters, PDF production, custom writers are unaffected. This
feature only insulates the actual readers and writers, not
the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandboxed` is specified, readers won't have
access to the resource path, nor will anything have access to
the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining
`sandbox`. Exported via Text.Pandoc.Class. [API change]
Closes#5045.
This change has several parts:
- In Text.Pandoc.App, if the writer is docx, we fill the media
bag and attempt to convert any SVG images to PNG, adding these
to the media bag. The PNG backups have the same filenames as
the SVG images, but with an added .png extension. If the conversion
cannot be done (e.g. because rsvg-convert is not present),
a warning is omitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016's syntax for
including SVG images. If a PNG fallback is present in the media bag,
we include a link to that too.
It would be helpful if someone with an old Word version could test
to see that the documents we produce can be opened and viewed with
the PNG fallbacks. If not, then perhaps we can eliminate the
slightly complex code for producing these fallbacks.
Closes#4058.
Before this commit, the pptx writer adds a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide is a heading at the slide level. In
that case, the item and heading are kept on the same slide, and the
heading is used as the slide title (inserted into the layout’s “title”
placeholder).
However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.
This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, if the heading’s the
only thing before the item on the same slide.
Previously we used liftIO fairly liberally. The code has
been restructured to avoid this.
A small behavior change is that pandoc will now fall back
to latin1 encoding for inputs that can't be read as UTF-8.
This is what it did previously for content fetched from
the web and not marked as to content type. It makes sense
to do the same for local files.
to any instance of PandocMonad and MonadIO.
This involves an API change, since the type of
runLua is now
(PandocMonad m, MonadIO m) => Lua a -> m (Either PandocError a)
The docx reader made a couple assumptions about how docx
containers were laid out that were not always true, with
the result that some images in documents did not get
found/extracted.
Closes#7511.
The image title (i.e. `![alt text](link "title")`) was previously
ignored when writing to pptx. This commit includes it in PowerPoint's
description of the image, along with the link (which was already
included).
Fixes 7352.
Linkification of URLs in the bibliography is now done in
the citeproc library, depending on the setting of an option.
We set that option depending on the value of the metadata
field `link-bibliography` (defaulting to true, for consistency
with earlier behavior, though the new behavior includes the
CSL draft recommendation of hyperlinking the title or the whole
entry if a DOI, PMID, PMCID, or URL field is present but not
explicitly rendered).
These changes implement the following recommendations from the
draft CSL v1.0.2 spec (Appendix VI):
> The CSL syntax does not have support for configuration of links.
> However, processors should include links on bibliographic references,
> using the following rules:
> If the bibliography entry for an item renders any of the following
> identifiers, the identifier should be anchored as a link, with the
> target of the link as follows:
> - url: output as is
> - doi: prepend with "`https://doi.org/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> If the identifier is rendered as a URI, include rendered URI components
> (e.g. "`https://doi.org/`") in the link anchor. Do not include any other
> affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: ").
> If the bibliography entry for an item does not render any of
> the above identifiers, then set the anchor of the link as the item
> title. If title is not rendered, then set the anchor of the link as the
> full bibliography entry for the item. Set the target of the link as one
> of the following, in order of priority:
>
> - doi: prepend with "`https://doi.org/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - url: output as is
>
> If the item data does not include any of the above identifiers, do not
> include a link.
>
> Citation processors should include an option flag for calling
> applications to disable bibliography linking behavior.
Thanks to Benjamin Bray for getting this all working.
@undergroundquizscene - I think TemplateWarning
is apt to be confusing, since this actually doesn't have
anything to do with what we call 'templates' in pandoc.
Hence the change to a powerpoint-specific name.
Until now, users had to make sure that their reference doc contains
layouts in a specific order: the first four layouts in the file had to
have a specific structure, or else pandoc would error (or sometimes
successfully produce a pptx file, which PowerPoint would then fail to
open).
This commit changes the layout selection to use the layout names rather
than order: users must make sure their reference doc contains four
layouts with specific names, and if a layout with the right name isn’t
found pandoc will output a warning and use the corresponding layout from
the default reference doc as a fallback.
I believe the use of names rather than order will be clearer to users,
and the clearer errors will help them troubleshoot when things go wrong.
- Add tests for moved layouts
- Add tests for deleted layouts
- Add newly included layouts to slideMaster1.xml to fix tests
Any literal backslash needs to be escaped: these are currently showing
up as “‘r’” instead of “‘\r’”.
Co-authored-by: Emily Bourke <undergroundquizscene@protonmail.com>
Added an extension `short_subsuperscripts` which modifies the behavior
of `subscript` and `superscript`, allowing subscripts or superscripts containing only
alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is
combustible`). This improves support for multimarkdown. Closes#5512.
Add `Ext_short_subsuperscripts` constructor to `Extension` [API change].
This is enabled by default for `markdown_mmd`.
Figure and table numbers are now only included if `native_numbering`
is enabled. (By default it is disabled.) This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.
The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.
Closes#7499.
before passing them off to citeproc.
This ensures that we get proper localization and flipflopping
if, e.g., quotes are used in titles.
Closesjgm/citeproc#87.
citeproc changes allow us to ignore Quoted elements;
citeproc now uses its own method for represented quoted
things, and only localizes and flipflops quotes it adds itself.
See #87.
The one thing left to do is to convert Quoted elements in
bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])`
before passing them to citeproc, IF the localized quotes
for the quote type match the standard inverted commas.
Using a code block containing `\end{verbatim}`, one could
inject raw TeX into a LaTeX document even when `raw_tex`
is disabled. Thanks to Augustin Laville for noticing the
bug.
Closes#7497.
We were having trouble building on ghc 8.4 because of
the lack of a Foldable instance for (Alt Maybe) in
base < 4.12.
Mystery: for some reason our builds were failing for gitit
but not in the pandoc CI.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.
We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one. Only the top of the stack
is queried.
This commit adds a parameter for scope to the Macro constructor
(not exported).
Closes#7494.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
macros (so are equivalent to `\gdef` and `\xdef`). This should be
fixed by scoping macro definitions to groups, in a future commit.
Closes#7474.
When the slide level is set to 0, headings won't be used at all
in splitting the document into slides. Horizontal rules must be
used to separate slides.
Closes#7476.
We only depend on the urlEncode function in the package, which is also
provided by http-types. The HTTP package also depends on the network
package, which has difficulty building on ghcjs.
Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
Previously we just set the source name to "chunk" when parsing
from strings, to avoid misleading source positions.
This had the side effect that `rebase_relative_paths` would break
inside sections that were parsed as strings.
So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk".
Closes#7464.
ulem is conditionally included already when the `strikeout`
variable is set, so we set this when there is underlined text,
and use `\uline` instead of `\underline`.
This fixes wrapping for underlined text.
Closes#7351.
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element.
These days, refentry is more general, for example the element documentation pages linked below are each a refentry.
As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot.
https://tdg.docbook.org/tdg/5.1/citerefentry.htmlhttps://tdg.docbook.org/tdg/5.1/manvolnum.htmlhttps://tdg.docbook.org/tdg/5.1/refentry.html
This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
We now use source positions from the token stream to tell us
how much of the text stream to consume. Getting this to
work required a few other changes to make token source positions
accurate.
Closes#7434.
Just like it is possible to avoid incorporating an image in EPUB by
passing `data-external="1"` to a raw HTML snippet, this makes the same
possible for native Images, by looking for an associated `external`
attribute.
Preserve all attributes in img tags. If attributes have a `data-`
prefix, it will be stripped. In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
Use latest citeproc, which uses a Span with a class rather
than a Note for notes. This helps us distinguish between
user notes and citation notes.
Don't put citations at the beginning of a note in parentheses.
(Closes #7394.)
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there aer no citations. This has been changed;
now, punctuation moving is limited to citations.
In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
In a previous commit we used strings because boolean False
wouldn't render as `false`. This is changed in the dev
version ofdoctemplates, so we can go back to the more
straightforward approach.
This ensures that we have proper spacing before the next
line (which might e.g. be a table bottom border).
This gives better results in cases like test/command/7272.md.
Previously the parser would accept characters in domains
that are illegal in domains, and this sometimes caused it
to gobble bits of the following text.
Closes#7398.
Note that this change, by itself, caused some txt2tag reader
tests to fail. txt2tags allows bare email addresses with
a following form query. So, in addition to the change
to emailAddress, we modify the txt2tags parser so it can
still handle these cases.
Previously it was impossible to specify false values for
options that default to true; setting the option to false
just caused the portion of the template setting the option
to be omitted.
Now we prepopulate all the variables with their default
values, including them unconditionally and allowing them
to be overridden.
Long URLs cannot be treated as mediaPaths, but System.FilePath's
`isRelative` often returns True for them. So we add a check
for an absolute URL. We also ensure that extensions are derived
only from the path portion of URLs (previously a following query
was being included).
Closes#7391.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.
Closes#7376.
With the 2.14 release `--extract-media` stopped working as before;
there could be mismatches between the paths in the rendered document and
the extracted media.
This patch makes several changes (while keeping the same API).
The `mediaPath` in 2.14 was always constructed from the SHA1 hash of
the media contents. Now, we preserve the original path unless it's
an absolute path or contains `..` segments (in that case we use a path
based on the SHA1 hash of the contents).
When constructing a path from the SHA1 hash, we always use the
original extension, if there is one. Otherwise we look up an
appropriate extension for the mime type.
`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather
than the mediabag key, for the first component of the tuple.
This makes more sense, I think, and fits with the documentation
of these functions; eventually, though, we should rework the API so that
`mediaItems` returns both the keys and the MediaItems.
Rewriting of source paths in `extractMedia` has been fixed.
`fillMediaBag` has been modified so that it doesn't modify
image paths (that was part of the problem in #7345).
We now do path normalization (e.g. `\` separators on Windows) only
in writing the media; the paths are left unchanged in the image
links (sensibly, since they might be URLs and not file paths).
These changes should restore the original behavior from before 2.14.
Closes#7345.
When we do a reverse lookup in the MIME table, we just get the
last match, so when the same mime type is associated with several
different extensions, we sometimes got weird results, e.g. `.vs`
for `text/plain`. These special cases help us get the most standard
extensions for mime types like `text/plain`.
In recent versions the table headers were no longer bottom-aligned
(if more than one line). This patch fixes that by using minipages
for table headers in non-simple tables.
Closes#7347.
Previously pipe tables with empty headers (that is, a header
line with all empty cells) would be rendered as headerless
tables. This broke in 2.11.4.
The fix here is to produce an AST with an empty table head
when a pipe table has all empty header cells.
Closes#7343.
Generally we allow optional starred variants of LaTeX commands
(since many allow them, and if we don't accept these explicitly,
ignoring the star usually gives acceptable results). But we
don't want to do this for `\(*\)` and similar cases.
Closes#7340.
* Column spans
* Row spans
- The spec says that if the `val` attribute is ommitted, its value
should be assumed to be `continue`, and that its values are
restricted to {`restart`, `continue`}. If the value has any other
value, I think it seems reasonable to default it to `continue`. It
might cause problems if the spec is extended in the future by adding
a third possible value, in which case this would probably give
incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
- The table description element is like alt text for a table (along
with the table caption element). It seems like we should include
this somewhere, but I’m not 100% sure how – I’m pairing it with the
simple caption for the moment. (Should it maybe go in the block
caption instead?)
* Detect table captions
- Check for caption paragraph style /and/ either the simple or
complex table field. This means the caption detection fails for
captions which don’t contain a field, as in an example doc I added
as a test. However, I think it’s better to be too conservative: a
missed table caption will still show up as a paragraph next to the
table, whereas if I incorrectly classify something else as a table
caption it could cause havoc by pairing it up with a table it’s
not at all related to, or dropping it entirely.
* Update tests and add new ones
Partially fixes: #6316
- Recognize locators spelled with a capital letter.
Closes#7323.
- Add a comma and a space in front of the suffix if it doesn't start
with space or punctuation. Closes#7324.
- Add manual entry for (non-default) extension
`rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
in Text.Pandoc.Extensions [API change]. When enabled, this
extension rewrites relative image and link paths by prepending
the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.
Closes#3752.
NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
- Improve parsing of `\def` macros. We previously set "verbatim mode"
even for parsing the initial `\def`; this caused problems for things
like
```
\def\foo{\def\bar{BAR}}
\foo
\bar
```
- Implement `\newif`.
- Add tests.