Commit graph

731 commits

Author SHA1 Message Date
John MacFarlane
619dfa2a2a Markdown reader: don't allow ^ at beginning of link or image label.
This is reserved for footnotes.
Fixes a regression introduced by 0a93acf.

Closes #7723.
2021-11-30 12:53:54 -08:00
John MacFarlane
0d25232bbf LaTeX reader: Fix semantics of \ref.
We were including the ams environment type in addition
to the number. This is proper behavior for `\cref` but
not for `\ref`.  To support `\cref` we need to store
the environment label separately.
2021-11-24 19:48:56 -08:00
John MacFarlane
7726b69cd3 LaTeX reader: omit visible content for \label{...}.
Previously we included the text of the label in square brackets,
but this is undesirable in many cases.

See discussion in
<https://github.com/jgm/pandoc/issues/813#issuecomment-978232426>.
2021-11-24 14:47:00 -08:00
John MacFarlane
6072bdcec9 HTML reader: parse attributes on links and images.
Closes #6970.
2021-11-24 11:01:55 -08:00
John MacFarlane
79e6f8db13 Improve detection of pipe table line widths.
Fixed calculation of maximum column widths in pipe tables.
It is now based on the length of the markdown line, rather
than a "stringified" version of the parsed line.  This should
be more predictable for users. In addition, we take into account
double-wide characters such as emojis.

Closes #7713.
2021-11-23 13:29:25 -08:00
Albert Krewinkel
96a4bbe264
Capture alt-text in JATS figures (#7703)
Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-11-20 09:48:01 -08:00
John MacFarlane
4f2eac88aa MediaWiki writer: fix code for generating spans for header IDs.
We need to generate a span when the header's ID doesn't match
the one MediaWiki would generate automatically.  But MediaWiki's
generation scheme is different from ours (it uses uppercase letters,
and `_` instead of `-`, for example).

This means that in going from markdown -> mediawiki, we'll now get
spans before almost every heading, unless explicit identifiers are
used that correspond to the ones MediaWiki auto-generates.
This is uglier output but it's necessary for internal links to
work properly.

See #7697.
2021-11-19 09:05:19 -08:00
John MacFarlane
df5ae1c186 HTML writer: Don't create invalid data- attribute...
for empty attribute key. (It would be better to make these
unrepresentable in the type system, but for now this is
an improvement.)

Closes #7546.
2021-11-19 08:50:18 -08:00
John MacFarlane
25bba0cc62 MediaWiki writer: use HTML spans for anchors when header has id.
Closes #7697.
2021-11-18 21:15:51 -08:00
John MacFarlane
c19f063420 Markdown writer: don't create autolinks when this loses information.
Previously we sometimes lost attributes when rendering links as autolinks.

Closes #7692.
2021-11-15 11:06:50 -08:00
Albert Krewinkel
ea268fd8a7
LaTeX reader: add rudimentary support for \autoref (#7693) 2021-11-15 09:40:50 -08:00
John MacFarlane
0a45f2600a Properly handle commented lines in BibTeX/BibLaTeX.
Closes #7668.
2021-11-08 10:15:53 -08:00
John MacFarlane
cc46667953 LaTeX reader: add 'uri' class when parsing \url.
Closes #7672.
2021-11-07 21:08:20 -08:00
John MacFarlane
d226a35c0a Switch back from HsYAML to yaml.
Reasons:

- Performance: HsYAML is around 20 times slower in parsing
  large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
  any attention.  HsYAML seems borderline unmaintained; it hasn't
  had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
  from C dependencies (#4535).  But I don't see a better alternative
  until a better pure Haskell parser is available.

Closes #6084.

Notes:

- We've removed the FromYAML instances for all types that had
  them, since this is a HsYAML-specific typeclass [API change].
  (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
  parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
  Users may need to quote these when they are meant to be
  interpreted as strings.  Similarly, 'null' is parsed as
  a YAML null value (and will be treated as an empty string
  by pandoc rather than the string 'null').  Quoting it will
  force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
  escaping errors: instead of just falling back on treating
  the section as a table, it raises a YAML parsing error.
2021-10-27 12:50:51 -07:00
John MacFarlane
c712d13b67 Org reader: allow an initial :PROPERTIES: drawer to add to metadata.
Closes #7520.
2021-10-22 22:10:25 -07:00
John MacFarlane
0a93acf91a Markdown reader: don't parse links or bracketed spans as citations.
Previously pandoc would parse

    [link to (@a)](url)

as a citation; similarly

    [(@a)]{#ident}

This is undesirable.  One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.

Closes #7632.
2021-10-20 10:34:47 -07:00
John MacFarlane
49c4e1d014 Fix markdown parsing bug for math in bracketed spans and links.
This affects math with unbalanced brackets (e.g. `$(0,1]$`)
inside links, images, bracketed spans.

Closes #7623.
2021-10-13 08:59:37 -07:00
John MacFarlane
c72277e986 LaTeX reader: Properly handle \^ followed by group closing.
Closes #7615.
2021-10-10 11:24:28 -07:00
John MacFarlane
92abe45863 Further test updates for switch to pretty-show. 2021-09-29 08:28:54 -07:00
John MacFarlane
0bdcf415e4 Switch from pretty-simple to pretty-show for native output.
Update tests.

Reason:  it turns out that the native output generated by
pretty-simple isn't always readable by the native reader.
According to https://github.com/cdepillabout/pretty-simple/issues/99
it is not a design goal of the library that the rendered values
be readable using 'read'.  This makes it unsuitable for our
purposes.

pretty-show is a bit slower and it uses 4-space indents
(non-configurable), but it doesn't have this serious drawback.
2021-09-28 21:17:53 -07:00
John MacFarlane
665e6d3d94 BibTeX parser: fix expansion of special strings in series...
e.g. `newseries` or `library`.  Expansion should not happen
when these strings are protected in braces, or when they're
capitalized.

Closes #7591.
2021-09-23 22:21:05 -07:00
John MacFarlane
aa89f6be18 HTML reader: handle empty tbody element in table.
Closes #7589.
2021-09-23 09:25:37 -07:00
John MacFarlane
c266734448 Use pretty-simple to format native output.
Previously we used our own homespun formatting.  But this
produces over-long lines that aren't ideal for diffs in tests.
Easier to use something off-the-shelf and standard.

Closes #7580.

Performance is slower by about a factor of 10, but this isn't
really a problem because native isn't suitable as a serialization
format. (For serialization you should use json, because the reader
is so much faster than native.)
2021-09-21 12:37:42 -07:00
John MacFarlane
5f7e7f539a Add missing % on command tests.
This prevented `--accept` from working properly.
2021-09-21 10:42:24 -07:00
John MacFarlane
57d93cca56 Org writer: don't indent contents of code blocks.
We previously indented them by two spaces, following a
common convention.  Since the convention is fading, and
the indentation is inconvenient for copy/paste, we are
discontinuing this practice.

Closes #5440.
2021-09-17 09:41:34 -07:00
John MacFarlane
a07d955d6f Fix code blocks using --preserve-tabs.
Previously they did not behave as the equivalent input
with spaces would.  Closes #7573.
2021-09-16 20:46:05 -07:00
John MacFarlane
a3162d341b RST reader: handle escaped colons in reference definitions.
Cloess #7568.
2021-09-13 22:57:08 -07:00
John MacFarlane
8beca46611 Fix command test for #7557. 2021-09-10 12:07:11 -07:00
John MacFarlane
0216a2f504 Org reader: don't parse a list as first item in a list item.
Closes #7557.
2021-09-10 09:50:05 -07:00
Francesco Mazzoli
99a4d1d0b0
Support --reference-location for HTML output (#7461)
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations.  EPUB and HTML slide
show formats are also affected by this change.

This works similarly to the markdown writer, but with special care
taken to skipping section divs with what regards to the block level.

The change also takes care to not modify the output if `EndOfDocument`
is used.
2021-09-10 09:30:05 -07:00
John MacFarlane
5dcd4610e2 Improve asciidoc escaping for -- in URLs. Closes #7529. 2021-08-29 10:12:20 -07:00
John MacFarlane
7ff06a8c43 Fix test for #7521. 2021-08-24 12:56:31 -07:00
John MacFarlane
3f9b7a10ad Markdown reader: fix interaction of --strip-comments and list
parsing.  Use of `--strip-comments` was causing tight lists
to be rendered as loose (as if the comment were a blank line).
Closes #7521.
2021-08-23 22:06:39 -07:00
Simon Schuster
591cdca38b LaTeX-parser: restrict \endinput to current file 2021-08-21 18:08:27 -07:00
John MacFarlane
07d847a910 RST reader: Fix :literal: includes.
These should create code blocks, not insert raw RST.
Closes #7513.
2021-08-20 09:54:42 -07:00
John MacFarlane
fd99fe4d7e Revise citeproc code to fit new citeproc 0.5 API.
Linkification of URLs in the bibliography is now done in
the citeproc library, depending on the setting of an option.
We set that option depending on the value of the metadata
field `link-bibliography` (defaulting to true, for consistency
with earlier behavior, though the new behavior includes the
CSL draft recommendation of hyperlinking the title or the whole
entry if a DOI, PMID, PMCID, or URL field is present but not
explicitly rendered).

These changes implement the following recommendations from the
draft CSL v1.0.2 spec (Appendix VI):

> The CSL syntax does not have support for configuration of links.
> However, processors should include links on bibliographic references,
> using the following rules:

> If the bibliography entry for an item renders any of the following
> identifiers, the identifier should be anchored as a link, with the
> target of the link as follows:

> - url: output as is
> - doi: prepend with "`https://doi.org/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"

> If the identifier is rendered as a URI, include rendered URI components
> (e.g. "`https://doi.org/`") in the link anchor. Do not include any other
> affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: ").
> If the bibliography entry for an item does not render any of
> the above identifiers, then set the anchor of the link as the item
> title. If title is not rendered, then set the anchor of the link as the
> full bibliography entry for the item. Set the target of the link as one
> of the following, in order of priority:
>
> - doi: prepend with "`https://doi.org/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - url: output as is
>
> If the item data does not include any of the above identifiers, do not
> include a link.
>
> Citation processors should include an option flag for calling
> applications to disable bibliography linking behavior.

Thanks to Benjamin Bray for getting this all working.
2021-08-17 15:34:23 -07:00
John MacFarlane
2c466a15af Remove misleading description from command/citeproc-87 test. 2021-08-15 09:26:30 -07:00
John MacFarlane
82638ad53b Convert Quoted in bib entries to special Spans...
before passing them off to citeproc.
This ensures that we get proper localization and flipflopping
if, e.g., quotes are used in titles.

Closes jgm/citeproc#87.
2021-08-13 19:25:29 -07:00
John MacFarlane
15683bb607 Citeproc: avoid odd handling of quotes.
citeproc changes allow us to ignore Quoted elements;
citeproc now uses its own method for represented quoted
things, and only localizes and flipflops quotes it adds itself.

See #87.

The one thing left to do is to convert Quoted elements in
bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])`
before passing them to citeproc, IF the localized quotes
for the quote type match the standard inverted commas.
2021-08-13 18:13:06 -07:00
John MacFarlane
418155aa95 Fix raw LaTeX injection issue (LaTeX writer).
Using a code block containing `\end{verbatim}`, one could
inject raw TeX into a LaTeX document even when `raw_tex`
is disabled.  Thanks to Augustin Laville for noticing the
bug.

Closes #7497.
2021-08-13 11:27:04 -07:00
John MacFarlane
dd1a956a8a LaTeX reader: Support \global before \def, \let, etc.
See #7494.
2021-08-11 16:28:53 -07:00
John MacFarlane
e3a263df46 Fix scope for LaTeX macros.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.

We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one.  Only the top of the stack
is queried.

This commit adds a parameter for scope to the Macro constructor
(not exported).

Closes #7494.
2021-08-11 16:14:34 -07:00
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat commments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Cloes #7482.
2021-08-10 12:50:23 -07:00
Peter Fabinski
8667ba2bcc
LaTeX table writer: Increase column width precision (#7466)
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
2021-08-03 15:34:39 -06:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes #7434.
2021-07-11 13:50:28 -07:00
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes #7436.  Added regression test.
2021-07-09 12:27:41 -07:00
John MacFarlane
3a31fe68ef Add command test for #7394.
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
Aner Lucero
cb038bb312 HTML5 writer, remove aria-hidden when explicit atl text is provided. 2021-07-02 13:02:52 -07:00
John MacFarlane
b7572db224 Use dev version of citeproc.
This eliminates double hyperlinks in author-in-text citations.
Author-only citations are no longer hyperlinked.
See jgm/citeproc#77.
2021-06-29 09:18:49 -07:00