333 commits

Author SHA1 Message Date
John MacFarlane
d532eb14eb HTML reader: allow tfoot before body rows.
Closes #5079.
2018-11-16 11:29:15 -08:00
John MacFarlane
e61f632531 HTML reader: parse <small> as a Span with class "small".
Closes #5080.
2018-11-15 22:36:01 -08:00
John MacFarlane
e61d1d0da9 Asciidoc writer: Render Spans using [#id .class]#contents#.
See #5080.
2018-11-15 22:29:15 -08:00
John MacFarlane
1a102c11a9 Fix test case for #5014. 2018-11-13 14:50:26 -08:00
John MacFarlane
1cfdd3662f HTML reader: allow thead containing a row with td rather than th.
See #5014.

Note that this doesn't address the original issue in #5014,
only an unrelated side-issue.
2018-11-13 14:49:12 -08:00
John MacFarlane
52a57a5362 LaTeX writer: don't emit [<+->] unless beamer output,
even if `writerIncremental` is True.

See #5072.
2018-11-12 09:43:12 -08:00
John MacFarlane
5bc38a741b Exactly match GitHub's identifier generating algorithm.
See #5057.
2018-11-11 20:45:38 -08:00
John MacFarlane
a36d202e86 Text.Pandoc.Shared: add parameter to uniqueIdent, inlineListToIdentifier.
The parameter is Extensions. This allows these functions to
be sensitive to the settings of `Ext_gfm_auto_identifiers` and
`Ext_ascii_identifiers`.

This allows us to use `uniqueIdent` in the CommonMark reader,
replacing some custom code.

It also means that `gfm_auto_identifiers` can now be used
in all formats.

Semantically, `gfm_auto_identifiers` is now a modifier of
`auto_identifiers`; for identifiers to be set, `auto_identifiers`
must be turned on, and then the type of identifier produced
depends on how `gfm_auto_identifiers` and `ascii_identifiers` are set.

Closes #5057.
2018-11-11 13:46:23 -08:00
John MacFarlane
5f030f3c2c Add command test for #5050. 2018-11-06 22:57:11 -08:00
quasicomputational
a747268823 CommonMark writer: respect --ascii (#5043) 2018-11-05 09:33:10 -08:00
John MacFarlane
511d647290 XML: toHtml5Entities: prefer shorter entities...
when there are several choices for a particular character.
2018-11-04 22:15:53 -08:00
John MacFarlane
805b9f8a12 Roff reader: Improved handling of custom strings as arguments.
Added test.
2018-11-02 21:35:49 -07:00
John MacFarlane
26341c1632 Implement --ascii for Markdown writer. 2018-11-01 16:31:04 -07:00
John MacFarlane
f379edc4ad HTML writer: use character entity references when possible for HTML5. 2018-11-01 16:08:27 -07:00
John MacFarlane
e0290fd18b LaTeX writer: add newline if math ends in a comment.
This prevents the closing delimiter from being swallowed
up in the comment.

Closes #4880.
2018-10-31 21:51:20 -07:00
John MacFarlane
c51be5dfc8 LaTeX reader: allow space at end of math after \.
Closes #5010.

Expose trimMath from T.P.Shared.
2018-10-29 22:20:14 -07:00
Albert Krewinkel
096cbe6987 Lua: allow access to pandoc state (#5015)
* Lua: allow access to pandoc state

Lua filters and custom writers now have read-only access to most fields
of pandoc's internal state via the global variable `PANDOC_STATE`.

* Lua: allow iterating through fields of PANDOC_STATE

* Lua filters doc: describe CommonState

* Lua filters doc: mention global variable PANDOC_STATE

* Lua: add access to logs

Log messages can currently only be printed, but not decomposed.
2018-10-25 22:12:14 -07:00
John MacFarlane
8efb8975ed Groff writer character escaping changes.
T.P.GroffChar:  replaced `essentialEscapes` with `manEscapes`,
which includes all the escapes mentioned in the groff_man manual.

T.P.Writers.Groff: removed escapeCode; changed parameter on
escapeString from Bool to new type `EscapeMode`.
Rewrote `escapeString`.
2018-10-23 21:44:07 -07:00
Brian Leung
7eea5c62ed LaTeX reader: add support for nolinkurl command. (#4992) 2018-10-22 23:36:44 -07:00
John MacFarlane
efbb329f1a Groff escaping changes.
- `--ascii` is now turned on automatically for man output, for
  portability.  All man output will be escaped to ASCII.
- In T.P.Writers.Groff, `escapeChar`, `escapeString`, and
  `escapeCode` now take a boolean parameter that selects
  ascii-only output.  This is used by the Ms writer for
  `--ascii`, instead of doing an extra pass after writing
  the document.
- In ms output without `--ascii`, unicode is used whenever
  possible (e.g. for double quotes).
- A few escapes are changed: e.g. `\[rs]` instead of `\\` for
  backslash, and `\[ga]` instead of `` \` `` for backtick.
2018-10-18 10:21:34 -07:00
John MacFarlane
f48960b75f Move common groff functions to Text.Pandoc.Writers.Groff
(unexported module).  These are used in both the man and ms
writers.

Moved groffEscape out of Text.Pandoc.Writers.Shared [cancels earlier
API change from adding it, which was after last release].

This fixes strong/code combination on man (should be `\f[CB]` not
`\f[BC]`), mentioned in #4973.

Updated tests.

Closes #4975.
2018-10-17 17:26:37 -07:00
Alexander Krotov
b3feaba6af Man writer: use \f[R] instead of \f[] to reset font
Fixes #4973
2018-10-17 18:29:07 +03:00
John MacFarlane
6f6ad0514d LaTeX reader: make macroDef polymorphic and allow in inline context.
Otherwise we can't parse something like
```
\lowercase{\def\x{Foo}}
```
I have actually seen tex like this in the wild.
2018-10-15 11:46:31 -07:00
John MacFarlane
22f81f78bd Added failing test case for macros. 2018-10-15 00:37:17 -07:00
John MacFarlane
88faa45f1d Markdown writer: ensure blank between raw block and normal content.
Otherwise a raw block can prevent a paragraph from being
recognized as such.

Closes #4629.
2018-10-14 17:12:06 -07:00
John MacFarlane
cf8224045b Markdown reader: Fix awkward soft break movements before abbreviations.
Closes #4635.
2018-10-14 13:02:36 -07:00
John MacFarlane
f5c64c3060 HTML reader: fix htmlTag and isInlineTag to accept processing instructions.
Fixes regression #3123 (since 2.0). Added regression test.
2018-10-11 09:58:25 -07:00
John MacFarlane
a92e43575f LaTeX writer: with --biblatex, use \autocite when possible.
`\autocites{a1}{a2}{a3}` will not collapse the entries.
So, if we don't have prefixes and suffixes, we use instead
`\autocite{a1;a2;a3}`.

Closes #4960.
2018-10-08 20:47:09 -07:00
John MacFarlane
145710c4c3 RST reader: don't allow single-dash separator in headerless table.
Closes #4382.
2018-10-07 12:37:08 -07:00
John MacFarlane
b806bff5b4 LaTeX reader: fix bugs omitting raw tex.
The default is `-raw_tex`, so no raw tex should result
unless we explicitly say `+raw_tex`.  Previously some
raw commands did make it through.

Closes #4527.
2018-10-07 12:21:43 -07:00
John MacFarlane
08fef6b210 RST reader: pass through fields in unknown directives as div attributes.
This commit also adds support for `class` and `name` attributes to
directives in general.

Closes #4715.
2018-10-07 11:44:11 -07:00
Brian Leung
e257b54124 Org reader: fix behavior for successive calls of #+EXCLUDE_TAGS. (#4951)
Calling `#+EXCLUDE_TAGS` multiple times should preserve the status of
the previously declared tags.
2018-10-05 22:21:20 -07:00
quasicomputational
6207bdeb68 CommonMark writer: add plain text fallbacks. (#4531)
Previously, the writer would unconditionally emit HTMLish output for
subscripts, superscripts, strikeouts (if the strikeout extension is
disabled) and small caps, even with raw_html disabled.

Now there are plain-text (and, where possible, fancy Unicode)
fallbacks for all of these corresponding (mostly) to the Markdown
fallbacks, and the HTMLish output is only used when raw_html is
enabled.

This commit adds exported functions `toSuperscript` and
`toSubscript` to `Text.Pandoc.Writers.Shared`.  [API change]

Closes #4528.
2018-10-05 21:33:14 -07:00
Brian Leung
a26b3a2d6a Org reader: Add partial support for #+EXCLUDE_TAGS option. (#4950)
Closes #4284.

Headers with the corresponding tags should not appear in the output.

If one or more of the specified tags contains a non-tag character
like `+`, Org-mode will not treat that as a valid tag, but will
nonetheless continue scanning for valid tags. That behavior is not
replicated in this patch; entering `cat+dog` as one of the entries in
`#+EXCLUDE_TAGS` and running the file through Pandoc will cause the
parser to fail and result in the only excluded tag being the default, `noexport`.
2018-10-05 14:28:17 -07:00
John MacFarlane
36f1846cc3 Implement --ascii (writerPreferAscii) in writers, not App.
Now the `write*` functions for Docbook, HTML, ICML, JATS,
Man, Ms, OPML are sensitive to `writerPreferAscii`.  Previously
the to-ascii translation was done in Text.Pandoc.App, and
thus not available to those using the writer functions
directly.

In addition, the LaTeX writer is now sensitive to
`writerPreferAscii` and to `--ascii`.  100% ASCII
output can't be guaranteed, but the writer will use
commands like `\"{a}` and `\l` whenever possible,
to avoid emitting a non-ASCII character.

A new unexported module, Text.Pandoc.Groff, has been
added to store functions used in the different groff-based
writers.
2018-09-30 22:32:00 -07:00
John MacFarlane
190ee279c9 LaTeX reader: allow verbatim blocks ending with blank lines.
Closes #4624.
2018-09-29 10:57:11 -07:00
leungbk
6e8f31dab1 Force inline code blocks to honor export options.
`exportsCode` is moved from `Blocks.hs` to `Shared.hs` and exported accordingly.
2018-09-26 08:49:13 +02:00
Brian Leung
72363cd2fc Add support for multiprenote and multipostnote arguments in LaTeX. (#4930)
* Add support for multiprenote and multipostnote arguments.

The multiprenotes occur before the first prefix of a
multicite, and the multipostnotes follow the last suffix.

* Add test for multiprenote and multipostnote.
2018-09-25 20:49:13 -07:00
John MacFarlane
37c6f6adfe RST reader: fix bug with internal link targets.
They were gobbling up indented content underneath.
Closes #4919.
2018-09-20 11:15:03 -07:00
John MacFarlane
136bf901aa Markdown reader: distinguish autolinks in the AST.
With this change, autolinks are parsed as Links with
the `uri` class. (The same is true for bare links, if
the `autolink_bare_uris` extension is enabled.)  Email
autolinks are parsed as Links with the `email` class.
This allows the distinction to be represented in the
URI.

Formerly the `uri` class was added to autolinks by
the HTML writer, but it had to guess what was an autolink
and could not distinguish `[http://example.com](http://example.com)`
from `<http://example.com>`.  It also incorrectly recognized
`[pandoc](pandoc)` as an autolink.  Now the HTML writer
simply passes through the `uri` attribute if it is present,
but does not add anything.

The Textile writer has been modified so that the `uri`
class is not explicitly added for autolinks, even if it
is present.

Closes #4913.
2018-09-19 14:53:29 -07:00
John MacFarlane
44e4f7b292 Markdown reader: example_lists should work without startnum.
Closes #4908.
2018-09-16 20:40:32 -07:00
mb21
5347e9454f add test for --metadata-file 2018-09-15 17:06:10 +02:00
mb21
bd5500ba7f add test yaml-metadata-blocks.md 2018-09-15 12:10:10 +02:00
John MacFarlane
fa4ebd71a3 LaTeX reader: resolve \ref for figure numbers. 2018-09-09 22:53:18 -07:00
John MacFarlane
a211edc819 HTML reader: parse <script type="math/tex"> tags as math.
These are used by MathJax.

Closes #4877.
2018-09-07 09:41:17 -07:00
John MacFarlane
85ed24e849 RST reader: don't skip link definitions after comments.
Closes #4860.
2018-08-29 14:40:04 -07:00
John MacFarlane
a2c4261b32 HTML reader: allow enabling raw_tex extension.
This now allows raw LaTeX environments, `\ref`, and `\eqref` to
be parsed (which is helpful for translating HTML documents that use
MathJax).

Closes #1126.
2018-08-24 18:04:00 -07:00
Alexander Krotov
937b92cd30 HTML reader: extract spaces inside links instead of trimming them
Fixes #4845
2018-08-22 12:43:15 +03:00
John MacFarlane
3b5949e8f2 LaTeX reader: support blockcquote, foreignblockquote from csquotes.
Also foreigncblockquote, hyphenblockquote, hyphencblockquote.

Closes #4848. But note: currently foreignquote will be
parsed as a regular Quoted inline (not using the quotes
appropriate to the foreign language).
2018-08-21 21:03:43 -07:00
John MacFarlane
a733068ebf LaTeX reader: support enquote*, foreignquote, hyphenquote...
from csquotes.  See #4848.  Still TBD: blockquote, blockcquote,
foreignblockquote.
2018-08-21 17:39:27 -07:00