Commit graph

1925 commits

Author SHA1 Message Date
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would lead to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
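A minimal sketch, in Python rather than pandoc's Haskell, of the idea described in the MediaBag commit above: the canonical name stays whatever the document links to, while the stored filename is derived from a SHA1 hash of the contents. The helper name `hashed_media_path` and the rule of keeping the original extension are assumptions made for illustration, not pandoc's actual implementation.

```python
import hashlib
import os


def hashed_media_path(canonical_name: str, contents: bytes) -> str:
    """Return a filename derived from the SHA1 hash of the contents.

    Hypothetical helper: keeping the extension of the canonical name
    is an assumption made for this sketch.
    """
    digest = hashlib.sha1(contents).hexdigest()
    _, ext = os.path.splitext(canonical_name)
    return digest + ext


# Two links with different canonical names but identical contents
# map to the same stored filename, so nothing is duplicated.
a = hashed_media_path("images/logo.png", b"same bytes")
b = hashed_media_path("figures/copy-of-logo.png", b"same bytes")
```

Because the filename depends only on the contents (and, in this sketch, the extension), extracting the same image twice cannot produce two differently named copies.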
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
John MacFarlane
1af2cfb287 Handle relative lengths (e.g. 2*) in HTML column widths.
See <https://www.w3.org/TR/html4/types.html#h-6.6>.

"A relative length has the form "i*", where "i" is an integer. When
allotting space among elements competing for that space, user agents
allot pixel and percentage lengths first, then divide up remaining
available space among relative lengths. Each relative length receives a
portion of the available space that is proportional to the integer
preceding the "*". The value "*" is equivalent to "1*". Thus, if 60
pixels of space are available after the user agent allots pixel and
percentage space, and the competing relative lengths are 1*, 2*, and 3*,
the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and
the 3* will be alloted 30 pixels."

Closes #4063.
2021-05-22 22:03:54 -07:00
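The arithmetic in the quoted HTML4 passage can be sketched as follows. This is an illustration in Python with hypothetical function names (`parse_relative`, `allot_relative`), not the code pandoc uses; it assumes the pixel and percentage lengths have already been subtracted from the available space.

```python
from typing import List


def parse_relative(length: str) -> int:
    """Parse a relative length "i*" into i; a bare "*" counts as "1*"."""
    digits = length.rstrip("*")
    return int(digits) if digits else 1


def allot_relative(available: float, weights: List[int]) -> List[float]:
    """Divide `available` among relative lengths in proportion to weights."""
    total = sum(weights)
    return [available * w / total for w in weights]


# The example from the specification: 60 pixels shared by 1*, 2* and 3*.
shares = allot_relative(60, [parse_relative(s) for s in ["1*", "2*", "3*"]])
```

With the specification's numbers, the weights sum to 6, so each unit of weight is worth 10 pixels.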
John MacFarlane
07d299d353 DocBook reader: ensure that first and last names are separated.
Closes #6541.
2021-05-20 18:45:39 -07:00
John MacFarlane
d7b5def287 Ms writer: handle tables with multiple paragraphs.
Previously they overflowed the table cell width.
We now set line lengths per-cell and restore them
after the table has been written.

Closes #7288.
2021-05-20 17:12:38 -07:00
John MacFarlane
bb11f5fb86 LaTeX reader: More siunitx improvements. Closes #6658.
There's still one slight divergence from the siunitx behavior:
we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm
not going to worry about that.
2021-05-20 15:30:31 -07:00
John MacFarlane
4e990a8cf9 LaTeX/siunitx: fix parsing of \cubic etc. See #6658. 2021-05-20 10:13:20 -07:00
John MacFarlane
bc5058234f LaTeX reader siunitx: fix + sign on ang. 2021-05-20 10:13:20 -07:00
John MacFarlane
5dc917da3e LaTeX reader siunitx: add leading 0 to numbers starting with . 2021-05-20 10:13:20 -07:00
Denis Maier
183ce58477
ConTeXt writer: improve ordered lists (#7304)
Closes #5016 

- change ordered list from itemize to enumerate
- add new itemgroup for ordered lists
- add fontfeature for table figures
- remove width from itemize in context writer
2021-05-20 09:59:53 -07:00
John MacFarlane
a366bd6abc LaTeX reader: Fix parsing of +- in siunitx numbers.
See #6658.
2021-05-20 09:03:29 -07:00
John MacFarlane
8437a4a002 LaTeX reader: support \pm in SI{..}.
Closes #6620.
2021-05-20 08:16:46 -07:00
Albert Krewinkel
b6239f4150
ZimWiki writer: allow links and emphasis in headers
The latest version of ZimWiki supports this.

Closes: #6605
2021-05-20 12:48:05 +02:00
John MacFarlane
5736b331d8 LaTeX reader: better support for \xspace.
Previously we only supported it in inline contexts; now
we support it in all contexts, including math.

Partially addresses #7299.
2021-05-19 16:14:49 -07:00
Albert Krewinkel
eb3dff148e
LaTeX writer: separate successive quote chars with thin space
Successive quote characters are separated with a thin space to improve
readability and to prevent unwanted ligatures. Detection of these quotes
sometimes failed if the second quote was nested in a span element.

Closes: #6958
2021-05-18 22:55:47 +02:00
Albert Krewinkel
1843a8793a
HTML reader: keep attributes from code nested below pre tag.
If a code block is defined with `<pre><code
class="language-x">…</code></pre>`, where the `<pre>` element has no
attributes, then the attributes from the `<code>` element are used
instead. Any leading `language-` prefix in the code's *class*
attribute is dropped to improve syntax highlighting.

Closes: #7221
2021-05-17 18:08:02 +02:00
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d3ca48656f
ConTeXt writer tests: keep code lines below 80 chars. 2021-05-17 13:11:33 +02:00
John MacFarlane
cc088687b4 LaTeX template: move title, author, date up to top of preamble.
This allows header-includes to use them, and puts them
in a position where you can see them immediately.
Closes #7295.
2021-05-16 14:35:13 -07:00
John MacFarlane
5a6399d9f6 Markdown writer: fewer unneeded escapes for #.
See #6259.
2021-05-16 12:23:34 -07:00
John MacFarlane
0a4c6925b6 Docx writer: copy over more settings from reference.docx.
From settings.xml in the reference-doc, we now include:
`zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`,
`drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`,
`displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`,
`characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`,
`decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`.

Closes #7240.
2021-05-15 15:40:49 -07:00
John MacFarlane
2cf971cf56 docx writer: Remove rsids from settings.docx.
Word will add these when revisions are made.  But it's
pointless to start out with a set of them.
2021-05-15 10:54:05 -07:00
Albert Krewinkel
0794862aac
HTML reader: parse <header> as a Div
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-15 16:46:02 +02:00
Albert Krewinkel
013e4a3164
HTML reader: keep h1 tags as normal headers (#7274)
The tags `<title>` and `<h1 class="title">` often contain the same
information, so the latter was dropped from the document. However, as
this can lead to loss of information, the heading is now always
retained.

Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document
title, or a filter to restore the previous behavior.

Closes: #2293
2021-05-14 12:31:24 -07:00
John MacFarlane
76a4e7127b Beamer writer: support exampleblock and alertblock.
A block will be rendered as an exampleblock if the heading
has class `example` and alertblock if it has class `alert`.

Closes #7278.
2021-05-14 10:09:46 -07:00
Albert Krewinkel
17d96404f5
Docx writer: allow multirow table headers 2021-05-14 16:19:20 +02:00
Albert Krewinkel
875f8f3654
HTML reader: don't fail on unmatched closing "script" tag.
Prevent the reader from crashing if the HTML input contains an unmatched
closing `</script>` tag.

Fixes: #7282
2021-05-14 12:13:40 +02:00
John MacFarlane
3f09f53459 Implement curly-brace syntax for Markdown citation keys.
The change provides a way to use citation keys that contain
special characters not usable with the standard citation
key syntax.  Example: `@{foo_bar{x}'}` for the key `foo_bar{x}'`.
Closes #6026.

The change requires adding a new parameter to the `citeKey`
parser from Text.Pandoc.Parsing [API change].

Markdown reader: recognize @{..} syntax for citations.

Markdown writer:  use @{..} syntax for citations when needed.

Update manual with curly-brace syntax for citations.

Closes #6026.
2021-05-13 21:59:32 -07:00
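A rough Python sketch of how a `@{..}` citation key might be recognized. It assumes that everything between the outer braces is the key and that nested braces must balance; both are assumptions made for this illustration, and the function `parse_braced_citekey` is hypothetical rather than pandoc's `citeKey` parser.

```python
def parse_braced_citekey(text: str, start: int = 0):
    """Parse `@{...}` at `start`; return (key, end_index) or None."""
    if not text.startswith("@{", start):
        return None
    depth = 0
    i = start + 1  # index of the opening brace
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Key is everything between the outer braces.
                return text[start + 2:i], i + 1
        i += 1
    return None  # unbalanced braces: not a citation key


result = parse_braced_citekey("see @{foo_bar{x}'} for details", 4)
```

Counting brace depth lets the key itself contain braces, which is the kind of special character the standard key syntax cannot express.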
John MacFarlane
0217ae2a4f Handle 'annote' field in bibtex/biblatex writer.
Closes #7266.
2021-05-12 11:05:55 -07:00
John MacFarlane
5eb7ad7d1e Improve integration of settings from reference.docx.
The settings we can carry over from a reference.docx are
autoHyphenation, consecutiveHyphenLimit, hyphenationZone,
doNotHyphenateCap, evenAndOddHeaders, and proofState.

Previously this was implemented in a buggy way, so that the
reference doc's values AND the new values were included.

This change allows users to create a reference.docx that
sets w:proofState for spelling or grammar to "dirty,"
so that spell/grammar checking will be triggered on the
generated docx.

Closes #1209.
2021-05-11 22:31:38 -06:00
John MacFarlane
2bd5d0cafb LaTeX writer: better handling of line breaks in simple tables.
Now we also handle the case where they're embedded in other
elements, e.g. spans. Closes #7272.
2021-05-11 07:52:05 -06:00
John MacFarlane
6e45607f99 Change reader types, allowing better tracking of source positions.
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.

As a result, the readers had no way of knowing which file
was the source of any particular bit of text.  This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).

Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class.  A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`.  The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).

Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides.  Modified parsers to use a `Sources` as stream
[API change].

The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.

In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.

T.P.Error: showPos, do not print "-" as source name.
2021-05-09 19:11:34 -06:00
Albert Krewinkel
8357b835d9
App: allow tabs expansion even if file-scope is used
Tabs in plain-text inputs are now handled correctly, even if the
`--file-scope` flag is used.

Closes: #6709
2021-05-05 19:09:21 +02:00
Albert Krewinkel
ddbf83f62c
Docx writer: support colspans and rowspans in tables
See: #6315
2021-05-01 18:52:24 +02:00
mbrackeantidot
b6a65445e1
Docx reader: add handling of vml image objects (jgm#4735) (#7257)
They represent images, the same way as other images in vml format.
2021-04-29 09:11:44 -07:00
John MacFarlane
d14c5f94df Further improvements in smart quotes.
Improves heuristic for detection of an "open double quote."
Closes #2103.
2021-04-29 08:48:49 -07:00
John MacFarlane
80e2e88287 Smarter smart quotes.
Treat a leading " with no closing " as a left curly quote.
This supports the practice, in fiction, of continuing
paragraphs quoting the same speaker without an end quote.
It also helps with quotes that break over lines in line
blocks.

Closes #7216.
2021-04-28 23:32:37 -07:00
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00
Albert Krewinkel
0921b82d98
Docx writer: autoset table width if no column has an explicit width. 2021-04-27 13:27:20 +02:00
Jan Tojnar
e9c0f9f97b
Markdown writer: Cleaner (code)blocks with single class (#7242)
When a block only has a single class and no other attributes,
it is not necessary to wrap the class attribute in curly braces –
the class name can be placed after the opening mark as is.

This will result in a bit cleaner output when pandoc is used
as a markdown pretty-printer.
2021-04-25 10:36:06 -07:00
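The rule described in the commit above can be sketched in Python. The function `fence_info` is a hypothetical helper for illustration, and the exact attribute formatting (`#id`, `.class`, `key="value"`) is assumed from pandoc's Markdown attribute syntax rather than taken from the writer's source.

```python
def fence_info(classes, identifier="", keyvals=()):
    """Render the attribute part that follows an opening code fence."""
    if len(classes) == 1 and not identifier and not keyvals:
        return classes[0]  # single class only: bare name, no braces
    parts = []
    if identifier:
        parts.append("#" + identifier)
    parts.extend("." + c for c in classes)
    parts.extend('%s="%s"' % (k, v) for k, v in keyvals)
    return "{" + " ".join(parts) + "}" if parts else ""


simple = fence_info(["haskell"])
full = fence_info(["haskell", "numberLines"], identifier="lst1")
```

Here `simple` is the bare class name, while `full` needs the braced form because there is more than one attribute.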
John MacFarlane
547bc2cdf8 Add quotes properly in markdown YAML metadata fields.
This fixes a bug, which caused the writer to look at the LAST
rather than the FIRST character in determining whether quotes
were needed.  So we got spurious quotes in some cases and
didn't get necessary quotes in others.

Closes #7245.  Updated a number of test cases accordingly.
2021-04-25 10:31:33 -07:00
John MacFarlane
7f4850c9de Remove biblatex-nussbaum.md test.
It is basically the same as biblatex-quotes.md.
2021-04-25 10:29:03 -07:00
John MacFarlane
73d394ca2a Use MetaInlines not MetaBlocks for multimarkdown metadata fields.
This gives better results in converting to e.g. pandoc markdown.

Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>
2021-04-18 22:01:12 -07:00
John MacFarlane
a478a5c4c8 Update to released unicode-collation, latest citeproc dev version.
Update citeproc test.
2021-04-17 16:15:14 -07:00
John MacFarlane
099ac9985b Use BCP47 language codes in citeproc tests. 2021-04-17 16:15:14 -07:00
John MacFarlane
ff5a504809 Use new citeproc + unicode-collation.
Add command test for unicode-collation.
2021-04-17 16:15:13 -07:00
Albert Krewinkel
5f79a66ed6
JATS writer: reduce unnecessary use of <p> elements for wrapping
The `<p>` element is used for wrapping in cases where the contents would
otherwise not be allowed in a certain context. Unnecessary wrapping is
avoided, especially around quotes (`<disp-quote>` elements).

Closes: #7227
2021-04-16 22:47:37 +02:00
Albert Krewinkel
2d60524de4
JATS writer: convert spans to <named-content> elements
Spans with attributes are converted to `<named-content>` elements
instead of being wrapped with `<milestone-start/>` and `<milestone-end>`
elements. Milestone elements are not allowed in documents using the
articleauthoring tag set, so this change ensures the creation of valid
documents.

Closes: #7211
2021-04-10 11:49:18 +02:00
Albert Krewinkel
051b7ffeaf
JATS writer: add footnote number as label in backmatter
Footnotes in the backmatter are given the footnote's number as a label.
The articleauthoring output is unaffected by this change, as footnotes
are placed inline there.

Closes: #7210
2021-04-10 10:57:06 +02:00
John MacFarlane
20cd33e5a4 Fix regression in grid tables for wide characters.
In the translation from String to Text, a char-width-sensitive
splitAt' was dropped.  This commit reinstates it.
Closes #7214.
2021-04-08 14:48:29 -07:00
John MacFarlane
60974538b2 Commonmark writer: Use backslash escapes for < and |...
instead of entities.  Closes #7208.
2021-04-05 23:29:22 -07:00
Albert Krewinkel
038261ea52
JATS writer: escape disallowed chars in identifiers
XML identifiers must start with an underscore or letter, and can contain
only a limited set of punctuation characters. Any IDs not adhering to
these rules are rewritten by writing the offending characters as Uxxxx,
where `xxxx` is the character's hex code.
2021-04-05 21:55:54 +02:00
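A hedged Python sketch of the rewriting rule described above. The set of characters accepted after the first position (letters, digits, `_`, `-`, `.`) and the four-digit uppercase hex padding are assumptions made for this example; the commit message only says that offending characters become `Uxxxx`. The function `escape_id` is hypothetical.

```python
def escape_id(ident: str) -> str:
    """Rewrite characters not allowed in an XML identifier as Uxxxx."""

    def allowed(ch: str, first: bool) -> bool:
        if ch == "_" or ch.isalpha():
            return True
        # Assumed set of characters allowed after the first position.
        return not first and (ch.isdigit() or ch in "-.")

    return "".join(
        ch if allowed(ch, i == 0) else "U%04X" % ord(ch)
        for i, ch in enumerate(ident)
    )


escaped = escape_id("1st section")
unchanged = escape_id("_ok-id.1")
```

In this sketch the leading digit and the space are replaced, while an identifier that already follows the rules passes through untouched.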
tecosaur
4371223d13
Org writer: Use LaTeX-style math delimiters (#7196)
Org works better with LaTeX-style delimiters.
2021-04-01 23:36:02 +02:00
niszet
40da6c402b
Treat tabs as spaces in ODT Reader. (#7185) 2021-03-31 16:44:34 -07:00
John MacFarlane
56ce1fc126 Fix DocBook reader mathml regression...
...caused by the switch in XML libraries.
Also fixed a similar issue in JATS.
Closes #7173.
2021-03-24 12:04:33 -07:00
Erik Rask
82e8c29cb0 Include Header.Attr.attributes as XML attributes on section
Add key-value pairs found in the attributes list of Header.Attr as
XML attributes on the corresponding section element.

Any key name not allowed as an XML attribute name is dropped, as
are keys with invalid values where they are defined as enums in
DocBook, and xml:id (for DocBook 5)/id (for DocBook 4) so as not to
interfere with computed identifiers.
2021-03-20 21:29:17 +01:00
John MacFarlane
ceadf33246 Tests: Use getExecutablePath from base...
avoiding the need to depend on the executable-path package.
2021-03-19 23:35:47 -07:00
John MacFarlane
dc94601eb5 Tests: factor out setupEnvironment in Test.Helpers.
This avoids code duplication between Command and Old.
2021-03-19 21:17:13 -07:00
John MacFarlane
2ca1b20a85 Fix finding of data files from test programs.
Apparently Cabal sets a `pandoc_datadir` environment variable
so that the data files will be sought in the source directory
rather than in the final destination (where they aren't yet
installed).

So we no longer need to set `--data-dir` in the tests. We just
need to make sure `pandoc_datadir` is set in the environment
when we call the program in the test suite.

This will fix the issue with loading of pandoc.lua when
pandoc is built with `-embed_data_files`, reported in #7163.

Closes #7163.
2021-03-19 18:57:13 -07:00
John MacFarlane
c3f9e8c122 Docx writer: make nsid in abstractNum deterministic.
Previously we assigned a random number (though in a deterministic
way).  But changes in the random package mean we get different
results now on different architectures, even with the same random
seed. We don't need random values; so now we just assign a value
based on the list number id, which is guaranteed to be unique
to the list marker.
2021-03-17 22:31:20 -07:00
John MacFarlane
e66bf891ec Add test for #7155. 2021-03-17 09:10:37 -07:00
John MacFarlane
63a6059790 Update tests for new texmath. 2021-03-15 18:22:38 -07:00
John MacFarlane
35b66a7671 MediaWiki reader: Allow block-level content in notes (ref).
Closes #7145.
2021-03-13 12:50:44 -08:00
John MacFarlane
eed18d231c Use integral values for w:tblW in docx.
Closes #7141.
2021-03-13 12:05:52 -08:00
Albert Krewinkel
f8b49e77f8
Use jira-wiki-markup 1.3.4
Jira reader:

* Fixed parsing of autolinks (i.e., of bare URLs in the text).
  Previously an autolink would take up the rest of a line, as spaces
  were allowed characters in these items.

* Emoji character sequences no longer cause parsing failures. This was
  due to missing backtracking when emoji parsing fails.

Jira writer:

* Block quotes are only rendered as `bq.` if they do not contain a
  linebreak.
2021-03-13 14:53:58 +01:00
Albert Krewinkel
00e8d0678e
Jira reader: mark divs created from panels with class "panel".
Closes: tarleb/jira-wiki-markup#2
2021-03-13 14:29:47 +01:00
Albert Krewinkel
a8aa301428
Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
John MacFarlane
5608dc01e5 HTML writer: Add warnings on duplicate attribute values.
This prevents emitting invalid HTML.

Ultimately it would be good to prevent this in the types
themselves, but this is better for now.

T.P.Logging: Add DuplicateAttribute constructor to LogMessage.
[API change]
2021-03-10 10:19:40 -08:00
John MacFarlane
1c23e3a824 RST reader: fix logic for ending comments.
Previously comments sometimes got extended too far.  Closes #7134.
2021-03-09 13:03:27 -08:00
Albert Krewinkel
b9b2586ed3
Org writer: prevent unintended creation of ordered list items
Adjust line wrapping if default wrapping would cause a line to be read
as an ordered list item.

Fixes #7132
2021-03-09 18:14:54 +01:00
Albert Krewinkel
eb184d9148
Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
John MacFarlane
5aa73bd0a2 LaTeX reader: handle table cells containing & in \verb.
Closes #7129.
2021-03-07 15:49:02 -08:00
Albert Krewinkel
e1454fe0d0
Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
12b47656d4 Remove superfluous imports. 2021-02-28 22:57:36 -08:00
John MacFarlane
7e38b8e55a T.P.Readers.LaTeX: Don't export tokenize, untokenize.
[API change]

These were only exported for testing, which seems the
wrong thing to do.  They don't belong in the public
API and are not really usable as they are, without access
to the Tok type which is not exported.

Removed the tokenize/untokenize roundtrip test.

We put a quickcheck property in the comments which
may be used when this code is touched (if it is).
2021-02-28 22:53:42 -08:00
John MacFarlane
a9cc5d2616 Update tests for changes to https URLs. 2021-02-26 18:00:45 -08:00
Salim B
fae6a204f1
Fix/update URLs and use HTTP**S** where possible (#7122) 2021-02-26 17:56:04 -08:00
John MacFarlane
f0a991a22b T.P.CSV: fix parsing of unquoted values.
Previously we didn't allow unescaped quotes in unquoted values,
but they are allowed. Closes #7112.
2021-02-22 21:18:04 -08:00
Albert Krewinkel
00e4bb51e4
tests: print accurate location if a test fails
Ensures that tasty-hunit reports the location of the failing test
instead of the location of the helper `test` function.
2021-02-22 23:56:04 +01:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`,
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emitting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
005344fb18 Revert "LaTeX template: disable ?` and !` ligatures."
This reverts commit 24d7cd539b.
2021-02-18 17:03:11 -08:00
John MacFarlane
24d7cd539b LaTeX template: disable ?` and !` ligatures.
These are often triggered by accident in languages that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
Albert Krewinkel
743f7216de
Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows listing multiple citations separated by commas.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms |  9 ms |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms |  5 ms |  4 ms |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
b5b576184c
JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358
JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
to render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The task lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00
John MacFarlane
59875185b3 Add command test for #7092 2021-02-12 19:04:14 -08:00
Albert Krewinkel
8ffd4159d6
Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases from being treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
John MacFarlane
8e9131db4e Markdown reader: improved handling of mmd link attributes in references.
Previously they only worked for links that had titles.  Closes #7080.
2021-02-06 21:52:12 -08:00
Andrew Dunning
4de9edb8e8
LaTeX template: Update to iftex package (#7073)
Load the iftex package directly rather than via the ifxetex and ifluatex compatibility
wrappers, which have been merged into a single package that is part of the LaTeX core.
The capitalization of the commands has been changed for compatibility with older
versions of TeX Live that have the version of iftex by the Persian TeX Group. This had
been removed in
<2845794c0c>
for compatibility with BasicTeX, but that is no longer an issue.
2021-02-03 08:54:11 -08:00
John MacFarlane
e6c7fcc598 Fixed some compiler warnings in tests. 2021-02-02 21:09:10 -08:00
Albert Krewinkel
6f79042502 Add tests for search_path_separator 2021-02-02 21:04:30 -08:00
Albert Krewinkel
e0bf4bfe82 Check that all documented functions are present.
Rely on tests in the module package to check the correctness of each
function.
2021-02-02 21:04:30 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows working with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
2415b2680a Test suite: a more robust way of testing the executable.
Many of our tests require running the pandoc
executable. This is problematic for a few different reasons.
First, cabal-install will sometimes run the test suite
after building the library but before building the executable,
which means the executable isn't in place for the tests.
One can work around that by first building, then building
and running the tests, but that's fragile.  Second,
we have to find the executable. So far, we've done that
using a function findPandoc that attempts to locate it
relative to the test executable (which can be located
using findExecutablePath).  But the logic here is delicate
and doesn't work with every combination of options.

To solve both problems, we add an `--emulate` option to
the `test-pandoc` executable.  When `--emulate` occurs
as the first argument passed to `test-pandoc`, the
program simply emulates the regular pandoc executable,
using the rest of the arguments (after `--emulate`).
Thus,

    test-pandoc --emulate -f markdown -t latex

is just like

    pandoc -f markdown -t latex

Since all the work is done by library functions,
implementing this emulation just takes a couple lines
of code and should be entirely reliable.

With this change, we can test the pandoc executable
by running the test program itself (locatable using
findExecutablePath) with the `--emulate` option.
This removes the need for the fragile `findPandoc`
step, and it means we can run our integration tests
even when we're just building the library, not the
executable.

Part of this change involved simplifying some complex
handling to set environment variables for dynamic
library paths.  I have tested a build with
`--enable-dynamic-executable`, and it works, but
further testing may be needed.
2021-02-02 20:36:51 -08:00
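The `--emulate` dispatch described above amounts to a check on the first argument. The following Python sketch shows the shape of that logic; `dispatch`, `run_pandoc` and `run_tests` are hypothetical stand-ins for the library entry points, which in pandoc are Haskell functions.

```python
def dispatch(argv, run_pandoc, run_tests):
    """Act as pandoc when the first argument is `--emulate`."""
    if argv and argv[0] == "--emulate":
        # Pass the remaining arguments through unchanged.
        return run_pandoc(argv[1:])
    return run_tests(argv)


# Record which entry point was chosen and with which arguments.
calls = []
dispatch(["--emulate", "-f", "markdown", "-t", "latex"],
         run_pandoc=lambda args: calls.append(("pandoc", args)),
         run_tests=lambda args: calls.append(("tests", args)))
```

Only a leading `--emulate` switches modes, so test-runner options that happen to follow other arguments are never misread as a request to emulate pandoc.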
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
d1875b69ec RST reader: fix handling of header in CSV tables.
The interpretation of this line is not affected
by the delim option. Closes #7064.
2021-01-31 12:05:46 -08:00
John MacFarlane
9223788a05 Markdown writer: handle math right before digit.
We insert an HTML comment to avoid a `$` right before
a digit, which pandoc will not recognize as a math delimiter.
2021-01-29 18:29:17 -08:00
John MacFarlane
98c2a52b4e Clean up BibTeX parsing.
Previously there was a messy code path that gave strange
results in some cases, not passing through raw tex but
trying to extract a string content.  This was an artefact
of trying to handle some special bibtex-specific commands
in the BibTeX reader. Now we just handle these in the
LaTeX reader and simplify parsing in the BibTeX reader.
This does mean that more raw tex will be passed through
(and currently this is not sensitive to the `raw_tex`
extension; this should be fixed).

Closes #7049.
2021-01-26 22:45:57 -08:00
John MacFarlane
198ce0cde9 ImageSize: use viewBox for svg if no length, width.
This change allows pandoc to extract size information
from more SVGs.  Closes #7045.
2021-01-22 20:49:41 -08:00
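A minimal Python sketch of the fallback described above: when an SVG has no explicit width and height, take the size from the last two numbers of `viewBox`. The function `svg_size` is hypothetical and simplified; it assumes unitless numeric attributes and is not how pandoc's ImageSize module is written.

```python
import xml.etree.ElementTree as ET


def svg_size(svg_text: str):
    """Return (width, height) as floats, falling back to viewBox."""
    root = ET.fromstring(svg_text)
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        # Sketch only: assumes unitless numbers, no "px"/"cm" suffixes.
        return float(width), float(height)
    view_box = root.get("viewBox")
    if view_box is None:
        return None
    # viewBox is "min-x min-y width height", separated by spaces or commas.
    parts = view_box.replace(",", " ").split()
    if len(parts) != 4:
        return None
    return float(parts[2]), float(parts[3])


size = svg_size('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 120 80"/>')
```

Explicit dimensions still win when both are present; the `viewBox` is consulted only as a fallback.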
Albert Krewinkel
b4b3560191
JATS writer: allow to use element-citation 2021-01-22 19:35:08 +01:00
John MacFarlane
5f98ac62e3 JATS writer: Ensure that disp-quote is always wrapped in p.
Closes #7041.
2021-01-19 20:39:58 -08:00
John MacFarlane
c841bcf3b0 Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"
This reverts commit 6efd3460a7.

Since this extension is designed to be used with
GitHub markdown (gfm), we need to implement the parser
as a commonmark extension (commonmark-extensions),
rather than in pandoc's markdown reader.  When that is
done, we can add it here.
2021-01-16 16:22:04 -08:00
Gautier DI FOLCO
6efd3460a7
Markdown reader: support GitHub wiki's internal links (#2923) (#6458)
Changes overview:

 * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
 * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`
 * Add tests.
2021-01-16 16:15:33 -08:00
John MacFarlane
3d6ebc9051 Use dev version of citeproc.
Change a citation test which had wrong disambiguation
(see jgm/citeproc#44).
2021-01-15 11:51:07 -08:00
John MacFarlane
c451207b08 Docx writer: handle table header using styles.
Instead of hard-coding the border and header cell vertical alignment,
we now let this be determined by the Table style, making use of
Word's "conditional formatting" for the table's first row.
For headerless tables, we use the tblLook element to tell Word
not to apply conditional first-row formatting.

Closes #7008.
2021-01-12 09:49:10 -08:00
Albert Krewinkel
68fa437999
JATS writer: fix citations (#7018)
* JATS writer: keep code lines at 80 chars or below

* JATS writer: fix citations
2021-01-10 15:35:48 -08:00
Albert Krewinkel
fe1378227b
Org reader: allow multiple pipe chars in todo sequences
Additional pipe chars, used to separate "action" state from "no further
action" states, are ignored. E.g., for the following sequence, both
`DONE` and `FINISHED` are states with no further action required.

    #+TODO: UNFINISHED | DONE | FINISHED

Previously, parsing of the todo sequence failed if multiple pipe chars
were included.

Closes: #7014
2021-01-09 13:40:31 +01:00
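A rough Python sketch of the todo-sequence rule from the Org reader commit above: states before the first pipe require action, and every state after it counts as done, however many further pipes appear. The function name is hypothetical, and the no-pipe branch follows the Org-mode convention that the last state is then the done state; that branch is not described in the commit.

```python
def parse_todo_sequence(line: str):
    """Split a `#+TODO:` line into (action states, no-further-action states)."""
    prefix = "#+TODO:"
    if not line.upper().startswith(prefix):
        raise ValueError("not a #+TODO line")
    groups = [group.split() for group in line[len(prefix):].split("|")]
    active = groups[0]
    # Additional pipes are ignored: everything after the first one is "done".
    done = [state for group in groups[1:] for state in group]
    if len(groups) == 1 and active:
        # Org-mode convention (assumption here): without a pipe,
        # the last state is the done state.
        done = [active.pop()]
    return active, done
```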
Albert Krewinkel
4f34345867
Update copyright notices for 2021 (#7012) 2021-01-08 09:38:20 -08:00
John MacFarlane
327e1428c5 gfm/commonmark writer: implement start number on ordered lists.
Previously they always started at 1, but according to the spec
the start number is respected. Closes #7009.
2021-01-07 16:42:05 -08:00
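To illustrate the gfm/commonmark writer change above, here is a small Python sketch that numbers list items from a given start value instead of always from 1. The function name and the `N. ` marker style are assumptions for illustration.

```python
def render_ordered_list(items, start=1):
    """Render items as an ordered list whose first marker is `start`."""
    return "\n".join(
        f"{start + offset}. {item}" for offset, item in enumerate(items)
    )
```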
John MacFarlane
c0d8b186d1 T.P.Parsing: modify gridTableWith' for headerless tables.
If the table lacks a header, the header row should be an empty
list. Previously we got a list of empty cells, which caused
an empty header to be emitted instead of no header.  In LaTeX/PDF
output that meant we got a double top line with space between.

@tarleb @despres - please let me know if this is problematic
for some reason I'm not grasping.
2021-01-07 11:07:03 -08:00
John MacFarlane
533b2edd51 Remove \setupthinrules from default context template.
The width parameter this used is not actually supported,
and the command didn't do anything.
2021-01-06 14:39:44 -08:00
John MacFarlane
15ba184e6e HTML writer: fix implicit_figure at end of footnotes.
Closes #7006.
2021-01-05 12:07:02 -08:00
David Martschenko
385b6a3b21
Implement defaults file inheritance (#6924)
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
2021-01-05 10:15:59 -08:00
John MacFarlane
ea479bf28a LaTeX reader: handle filecontents environment.
Closes #7003.
2021-01-04 14:05:03 -08:00
Dimitri Sabadie
57b1094152
Org reader: mark verbatim code with class "verbatim". (#6998)
* Replace org-mode’s verbatim from code to codeWith.

This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.

* Alter test for org-mode’s verbatim change.

See previous commit for further detail on the new implementation.
2021-01-03 08:57:47 +01:00
John MacFarlane
9a18cf4b59 LaTeX writer: revert table line height increase in 2.11.3.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables.  This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.

Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.

Closes #6996.
2021-01-02 07:56:07 -08:00
Albert Krewinkel
17e3efc785
Org reader: restructure output of captioned code blocks
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.

Closes: #6977
2021-01-01 11:18:36 +01:00
John MacFarlane
23f964b907 Mediawiki reader: allow space around strong/emph delimiters.
Closes #6993.
2020-12-30 21:31:28 -08:00
John MacFarlane
ee7cef7624 Update ms table tests. 2020-12-30 13:40:11 -08:00
John MacFarlane
a7a162ea55 Update test for new citeproc and require it in cabal. 2020-12-28 14:40:23 -08:00
timo-a
668596cc89
Add support for writing nested tables to asciidoc (#6972)
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
2020-12-27 18:42:28 -08:00
Albert Krewinkel
dcd89413f3 Powerpoint writer: allow arbitrary OOXML in raw inline elements
The raw text is now included verbatim in the output. Previously it was parsed
into XML elements, which prevented the inclusion of partial XML snippets.
2020-12-27 23:18:54 +01:00
Albert Krewinkel
8f402beab9
LaTeX writer: support colspans and rowspans in tables. (#6950)
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
2020-12-20 18:04:54 -08:00
John MacFarlane
95b15fe6d3 Remove some test files that are no longer used. 2020-12-18 11:22:29 -08:00
John MacFarlane
ff58237d2a Update test with new skylighting CSS. 2020-12-17 11:10:00 -08:00
John MacFarlane
b4b4e32307 Properly handle boolean values in writing YAML metadata.
(Markdown writer.)
This requires doctemplates >= 0.9.
Closes #6388.
2020-12-15 23:45:34 -08:00
John MacFarlane
7d799bfcda Allow both inline and external references to be used
with `--citeproc`.  This fixes a regression, since pandoc-citeproc
allowed these to be combined.

Closes #6951.
2020-12-15 08:51:43 -08:00
John MacFarlane
c43e2dc0f4 RST writer: better image handling.
- An image alone in its paragraph (but not a figure) is now
  rendered as an independent image, with an `alt` attribute
  if a description is supplied.
- An inline image that is not alone in its paragraph will
  be rendered, as before, using a substitution.
  Such an image cannot have a "center", "left", or
  "right" alignment, so the classes `align-center`,
  `align-left`, or `align-right` are ignored.
  However, `align-top`, `align-middle`, `align-bottom`
  will generate a corresponding `align` attribute.

Closes #6948.
2020-12-13 15:25:46 -08:00
Albert Krewinkel
00031fc809
Docx writer: keep raw openxml strings verbatim.
Closes: #6933
2020-12-13 14:09:59 +01:00
mb21
208cb96196 ICML writer: fix image bounding box for custom widths/heights
fixes #6936
2020-12-12 14:49:11 +01:00
John MacFarlane
0a502e5ff5 HTML reader: retain attribute prefixes and avoid duplicates.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used.  This commit causes the prefixes
to be retained, and also avoids invalid duplicate
attributes.

Closes #6938.
2020-12-10 15:44:10 -08:00
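The behavior described in the HTML reader commit above can be sketched in Python as below. Prefixed names such as `xml:lang` are kept as written, and repeated names are dropped. Keeping the first occurrence is an assumption; the commit message does not say which duplicate wins.

```python
def dedupe_attributes(attrs):
    """Keep attribute names verbatim (prefix included) and drop repeats."""
    seen = set()
    result = []
    for name, value in attrs:
        if name not in seen:  # assumption: the first occurrence is kept
            seen.add(name)
            result.append((name, value))
    return result
```

Because prefixes are no longer stripped, `xml:lang` and `lang` count as different names and both survive.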
John MacFarlane
810df00cf5
Merge pull request #6922 from jtojnar/db-writer-admonitions
Docbook writer: handle admonitions
2020-12-07 08:48:02 -08:00
Jan Tojnar
70c7c5703a
Docbook writer: Handle admonition titles from Markdown reader
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.

Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
2020-12-07 07:28:39 +01:00
Jan Tojnar
dc6856530c
Docbook writer: handle admonitions
Similarly to d6fdfe6f2b,
we should handle admonitions.
2020-12-07 06:23:25 +01:00
Albert Krewinkel
acf932825b
Org reader: preserve targets of spurious links
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This makes it
possible to recover and fix broken or unknown links with filters.

See: #6916
2020-12-05 22:37:48 +01:00
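A filter along the lines suggested in the Org reader commit above might look like this Python sketch, which walks pandoc's JSON AST. The function name and the `known_targets` mapping are hypothetical, and the sketch assumes the usual JSON encoding of `Span` (`[attr, inlines]`) and `Link` (`[attr, inlines, [url, title]]`).

```python
def fix_spurious_links(node, known_targets):
    """Turn `spurious-link` Spans back into Links when the target is known."""
    if isinstance(node, list):
        return [fix_spurious_links(child, known_targets) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Span":
            (ident, classes, attrs), inlines = node["c"]
            target = dict(attrs).get("target")
            if "spurious-link" in classes and target in known_targets:
                url = known_targets[target]  # hypothetical target-to-URL map
                content = fix_spurious_links(inlines, known_targets)
                return {"t": "Link", "c": [[ident, [], []], content, [url, ""]]}
        return {key: fix_spurious_links(value, known_targets)
                for key, value in node.items()}
    return node
```

Spans whose target is not in the mapping are left unchanged.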
Nils Carlson
c161893f44
OpenDocument writer: Allow references for internal links (#6774)
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.

Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.

For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.

In order for numbers and reference text to be updated the generated
document must be refreshed.

Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
2020-12-05 10:00:04 -08:00
John MacFarlane
ddb76cb356 LaTeX reader: don't apply theorem default styling to a figure inside.
If we put an image in italics, then when rendering to Markdown
we no longer get an implicit figure.

Closes #6925.
2020-12-05 09:53:39 -08:00
John MacFarlane
dc3ef5201f Markdown writer: ensure that a new csl-block begins on a new line.
This just looks better and doesn't affect the semantics.
See #6921.
2020-12-04 10:55:48 -08:00
John MacFarlane
7199d68ba0 EPUB writer: include title page in landmarks.
Closes #6919.

Note that the toc is also included if `--toc` is specified.
2020-12-03 21:39:44 -08:00
John MacFarlane
5bbd5a9e80 Docx writer: Support bold and italic in "complex script."
Previously bold and italics didn't work properly in LTR
text.  This commit causes the w:bCs and w:iCs attributes
to be used, in addition to w:b and w:i, for bold and
italics respectively.

Closes #6911.
2020-12-03 09:51:23 -08:00
Albert Krewinkel
8c38390038
HTML reader tests: improve test coverage of new features 2020-11-27 21:21:25 +01:00
Albert Krewinkel
a9c766291f
HTML reader: support body headers, row head columns
Closes: #6312
2020-11-27 10:36:13 +01:00
cholonam
5f4deb5455 Docx writer: Fix bullets/lists indentation
Fix appearance of bullets/numbered lists (the first level is slightly
indented to the right instead of right on the margin).

New golden files have been tested using Word 2010 on Windows 10.
2020-11-26 12:11:26 -08:00
Igor Pashev
630b1bff2b
LaTeX reader: preserve center environment (#6852)
The contents of the `center` environment are put in a `Div`
with class `center`.
2020-11-26 12:04:31 -08:00
Albert Krewinkel
07919e1b22
HTML reader: improve support for table headers, footer, attributes
- `<tfoot>` elements are no longer added to the table body but used as
  table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
  elements are preserved.
2020-11-26 07:22:01 +01:00
John MacFarlane
ea8097e412 latex template: put back amssymb
We need it for checkboxes in todo lists, and maybe for
other things.  In this location it seems compatible
with the cases that prompted #6469 and PR #6762.
2020-11-25 16:08:10 -08:00
John MacFarlane
70a7c2446e Update tests for LaTeX table changes. 2020-11-25 15:49:17 -08:00
John MacFarlane
b50ac3a95b LaTeX tables: Fix calculation of column spacing.
See #6883.
2020-11-25 14:41:28 -08:00
John MacFarlane
815976d537 Fix truncation of [Citation] list in Cite inside footnotes...
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.

Closes #6890.
2020-11-25 09:10:10 -08:00
John MacFarlane
e26d31d56b latex template: disable language-specific shorthands in babel.
Babel defines "shorthands" for some languages, and these can
produce unexpected results. For example, in Spanish, `1.22`
gets rendered as `122`, and `et~al.` as `etal`.

One would think that babel's `shorthands=off` option (which
we were using) would disable these, but it doesn't.  So we
remove `shorthands=off` and add some code that redefines
the shorthands macro.  Eventually this will be fixed in babel,
I hope, and we can revert to something simpler.

Closes #6817, closes #6887.
2020-11-25 08:25:30 -08:00
Kolen Cheung
bfb2a492c8
test/tables/*.native: normalized (#6888)
The three native table test cases are normalized so that they look exactly as if they had been written by a pandoc writer.

Note that, apart from whitespace normalization, this includes other normalization, such as `[Str "Nordic countries"]` to `[Str "Nordic",Space,Str "countries"]`.
2020-11-24 22:33:36 -08:00
Albert Krewinkel
0eedbd0a3d
HTML reader tests: disable round-trip testing for tables
Information for cell alignment in a column is not preserved during
round-trips.
2020-11-24 15:46:11 +01:00
Albert Krewinkel
c9f98e2bf5
HTML reader: support row or column-spanning table cells 2020-11-24 14:17:35 +01:00
Nils Carlson
75c881e2d9
OpenDocument Writer: Implement Div and Span ident support (#6755)
Spans and Divs containing an ident in the Attr will become bookmarks
or sections with idents in OpenDocument format.
2020-11-22 22:23:30 -08:00
John MacFarlane
b5b5ef92cb LaTeX writer: Improve table spacing.
+ Remove the `\strut` that was added at the end of minipage
  environments in cells.

+ Replace `\tabularnewline` with `\\ \addlinespace`.

Closes #6842, closes #6860.
2020-11-22 10:54:42 -08:00
Albert Krewinkel
5344dab8eb
Org reader: parse #+LANGUAGE into lang metadata field
Fixes: #6845
2020-11-22 12:53:05 +01:00
Nils Carlson
ae52918faa
OpenDocument writer: Table text width support (#6792)
Support for table width as a percentage of text width by summing
width of columns and verifying that the sum is > 0 and <= 1.
2020-11-21 12:42:43 -08:00
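The check described in the OpenDocument writer commit above amounts to something like the following Python sketch. The function name and the rounding are assumptions; column widths are taken to be fractions of the text width.

```python
def table_width_percent(column_widths):
    """Return table width as a percentage of text width, or None if invalid."""
    total = sum(column_widths)
    # Only a sum in the interval (0, 1] is treated as a usable relative width.
    if 0 < total <= 1:
        return round(total * 100, 2)
    return None
```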
John MacFarlane
7db2cf5d2f LaTeX reader: more robust parsing of bracketed options.
Improves on 9a40976.  Closes #6873.
2020-11-21 12:24:37 -08:00
Nils Carlson
56ceaf49dc
DocBook reader: Table text width support (#6791)
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, `<?dbfo table-width="50%"?>`.
Implement support for this instruction in the DocBook reader.
2020-11-20 16:05:56 -08:00
John MacFarlane
9a4097640f Improve LaTeX option parsing...
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.

Closes #6869.
2020-11-20 13:40:26 -08:00
John MacFarlane
c647948ff1 commonmark_x: replace auto_identifiers with gfm_auto_identifiers.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.

Attempts to add the `auto_identifiers` extension to
`commonmark` will now fail with an error.

Closes #6863.
2020-11-20 09:17:14 -08:00
Albert Krewinkel
d286242131 JATS writer: support advanced table features 2020-11-19 22:09:52 +01:00
John MacFarlane
0962b30d84 Man reader: improve handling of .IP.
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.

We also skip blank lines like groff itself.

Closes #6858.
2020-11-18 22:44:32 -08:00
Albert Krewinkel
023468ea2d
JATS writer: wrap all tables
All `<table>` elements are put inside `<table-wrap>` elements, as the
former are not valid as immediate child elements of `<body>`.
2020-11-18 18:10:17 +01:00
TEC
0306eec5fa Replace org #+KEYWORDS with #+keywords
As of ~2 years ago, lower case keywords became the standard (though they
are handled case insensitive, as always):
13424336a6

Upper case keywords are exclusive to the manual:
- https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/
- https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-11-18 14:48:56 +01:00
John MacFarlane
bf3fea0a8c Markdown reader: fix regression with example list references.
This affects example list references followed by dashes.
Introduced by commit b8d17f7.
Closes #6855.
2020-11-17 20:36:59 -08:00
John MacFarlane
5271c6b3fb Improve fix to siunitx numbers with minus.
- use real minus sign
- use tests contributed by Igor Pashev.
2020-11-16 16:36:16 -08:00
John MacFarlane
734b4c26a9 LaTeX reader: Fix negative numbers in siunitx commands.
The commit a157e1a broke negative numbers, e.g.
`\SI{-33}{\celsius}` or `\num{-3}`. This fixes the regression.
2020-11-16 14:08:29 -08:00
John MacFarlane
d7f905fb63 Markdown reader: fix detection of locators following in-text citations.
Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be
incorrectly parsed as a prefix of `@bar` rather than a suffix
of `@foo`.
2020-11-15 17:51:03 -08:00
Aner Lucero
f63b76e169 Markdown writer: default to using ATX headings.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).

* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add the `ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.

Closes #6662.
2020-11-14 21:33:32 -08:00
John MacFarlane
b8d17f7ae8 Markdown reader: don't increment stateNoteNumber for example refs.
Background:  syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).

This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.

This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
2020-11-14 15:00:17 -08:00
John MacFarlane
68b298ed9a Improve period suppression algorithm for citations in notes...
in note citation styles.  See #6835.
2020-11-13 10:52:21 -08:00
John MacFarlane
efe74746d8 DokuWiki writer: translate language names for code elements...
...and improve whitespace.  Closes #6807.
2020-11-04 22:38:53 -08:00
John MacFarlane
8f75a53542 Properly support optional cite argument for \blockquote.
(LaTeX reader)

Closes #6802.
2020-11-03 10:25:56 -08:00
John MacFarlane
6cbe5efd56 LaTeX reader: fix bug parsing macro arguments.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`.  This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.

Closes #6796.
2020-11-02 15:04:16 -08:00
John MacFarlane
5760cd850f Default CSS tweaks.
These changes restore the 20px font size while increasing readability by
reducing line width. (The number of words per line is now similar to
that of pandoc's default LaTeX/PDF output.)  With the narrower lines, we
also need less interline and interparagraph space, so the content
becomes more compact and skimmable:

- Change default font size back to 20px.
- Set font-size for print media to 12pt.
- Reduce interline space.
- Reduce interparagraph space.
- Reduce line width.
- Remove the special `line-height: 1` for table cells,
  which I had suggested but which now seems a mistake.
- Remove the special line-height for pre.
- Ensure that there is a bit more space before a heading
  than after.
- Slightly reduced space after title header.
2020-11-02 10:51:18 -08:00
John MacFarlane
ea45837372 Default CSS: avoid padding and color if monobackgroundcolor not given.
This makes the default more austere, while putting the padded,
colored code elements within easy reach.
2020-11-01 14:29:03 -08:00
Mauro Bieg
95d8713633
Updates to default CSS (#6786)
- Fix margin before codeblock
- Add `monobackgroundcolor` variable, making the background color
  and padding of code optional.
- Ensure that backgrounds from highlighting styles take precedence over
  monobackgroundcolor
- Remove list markers from TOC
- Add margin-bottom where needed
- Remove italics from blockquote styling
- Change borders and spacing in tables to be more consistent with other
   output formats
- Style h5, h6
- Decrease root font-size to 18px
- Update tests for styles.html changes
- Add CSS example to MANUAL
2020-11-01 14:22:58 -08:00
John MacFarlane
6051c751ce Citeproc: use comma for in-text citations inside footnotes.
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with:  `AUTHOR NAME + COMMA + SPACE + REST`.

Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.

This gives better results.  Note that normal citations are still
rendered in parentheses.
2020-11-01 10:48:47 -08:00
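The two renderings contrasted in the Citeproc commit above can be written out as a short Python sketch. The function and parameter names are hypothetical, and using the parenthesized form outside footnotes is an assumption.

```python
def render_author_in_text(author: str, rest: str, in_footnote: bool) -> str:
    """Format an author-in-text citation such as `@foo`."""
    if in_footnote:
        # New behavior inside footnotes: AUTHOR NAME + COMMA + SPACE + REST
        return f"{author}, {rest}"
    # Parenthesized form: AUTHOR NAME + SPACE + "(" + REST + ")"
    return f"{author} ({rest})"
```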
Albert Krewinkel
7af608b214
JATS templates: ensure jats_publishing output is valid 2020-10-31 15:20:30 +01:00
Andy Morris
f1f2728259 Fix duplicate "class" attribute in HTML writer 2020-10-30 16:38:59 +01:00
John MacFarlane
3e6d009c6b Use new citeproc; do note capitalization here, not in citeproc. 2020-10-29 21:53:02 -07:00
John MacFarlane
bd7c9eb32b LaTeX writer: Improved calculation of table column widths.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.

The default template has been modified to load the calc
package if tables are used.
2020-10-29 12:10:05 -07:00
John MacFarlane
517c55dae7 Use latest citeproc. Closes #6783. 2020-10-27 22:21:03 -07:00
Nils Carlson
dd3d920ba0
DocBook Reader: fix duplicate bibliography bug (#6773)
Also add unit test to ensure the behavior stays consistent.
2020-10-26 12:49:03 -07:00
John MacFarlane
efc6994c8a Commonmark writer: fix regression with fenced divs.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output.  This is a regression
due to our transition from cmark-gfm.  This commit fixes it.

Closes #6768.
2020-10-23 09:25:07 -07:00
Denis Maier
4bce33e899
ConTeXt template: adds \setupinterlinespace to fonts larger than normal (#6763) 2020-10-22 17:24:27 -07:00
John MacFarlane
b876793910 Use latest citeproc.
This fixes a problem with author-in-text citations for references
including both an author and an editor. Previously, both were
included in the text, but only the author should be.

Closes #6765. Added a test.
2020-10-21 23:14:17 -07:00
John MacFarlane
f9c6167ad1 citeproc - improved removal of final period...
...in citations inside notes in note-based styles.
These citations are put in parentheses, but the final
period must be removed.

See jgm/citeproc#20
2020-10-21 22:23:21 -07:00
John MacFarlane
3734a02054 Update tests for latex template changes. 2020-10-19 16:32:39 -07:00
John MacFarlane
eb3307da4e Fix handling of xdata in bibtex/biblatex bibliographies.
Closes #6752.
2020-10-15 17:41:45 -07:00
Albert Krewinkel
7f57546345
Fix remaining typos in tests
See: #6738
2020-10-14 22:39:29 +02:00
Albert Krewinkel
90af138443
Fix typos in comments, doc strings, error messages, and tests
Typos reported by
https://fossies.org/linux/test/pandoc-master.tar.gz/codespell.html

See: #6738
2020-10-14 22:26:51 +02:00
John MacFarlane
229e763646 Depend on latest citeproc.
This fixes the citation number issue with ieee.csl and other
styles that do not explicitly sort bibliographies. (Pandoc
was numbering them by their order in the bibliography file,
rather than the order cited, as required by the CSL spec.)

Closes #6741.
2020-10-13 14:52:09 -07:00
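The numbering rule mentioned in the citeproc commit above (number by order cited, not by order in the bibliography file) can be illustrated with a small Python sketch. The function name is hypothetical and this is not citeproc's actual code.

```python
def number_by_first_citation(cited_ids):
    """Assign citation numbers in the order references are first cited."""
    numbers = {}
    for ref_id in cited_ids:
        if ref_id not in numbers:
            numbers[ref_id] = len(numbers) + 1
    return numbers
```

A reference cited several times keeps the number it received at its first citation.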
John MacFarlane
6fce81fb61 Use latest citeproc (better grouping/collapsing behavior with prefixes). 2020-10-13 11:06:02 -07:00
John MacFarlane
12ff835a8a Commonmark reader: add pipe_table extension after defaults.
Otherwise we get bad results for non-table, non-paragraph
lines containing pipe characters.

Closes #6739.

See also jgm/commonmark-hs#52.
2020-10-12 21:24:26 -07:00
John MacFarlane
2007cff203 Markdown writer: Fix autolinks rendering for gfm.
Previously, autolinks rendered as raw HTML, due to the
`class="uri"` added by pandoc's markdown reader.

Closes #6740.
2020-10-12 18:57:04 -07:00
John MacFarlane
0b5e2601f5 LaTeX reader: allow blank lines inside \author. 2020-10-10 16:28:52 -07:00
John MacFarlane
270b4bc2bc Update s5-fancy.html test with new mathjax URL. 2020-10-10 11:15:35 -07:00
John MacFarlane
9a6c42590f LaTeX reader: Fix parsing of "show name" in newtheorem.
Previously we were just treating it as a string and
ignoring accents and formatting.  See #6734.
2020-10-08 22:47:32 -07:00
John MacFarlane
2d4214fa31 Extend fix to #6719 to JATS reader 2020-10-08 21:36:08 -07:00
John MacFarlane
f19286cf12 DocBook reader: don't squelch space at end of emphasis element.
Instead, include it after the emphasis.  Closes #6719.

Same fix was made for other inline elements, e.g. strikethrough.
2020-10-08 21:27:52 -07:00
John MacFarlane
0cfba4e36e Fixed some bibtex comments in tests (closing }). 2020-10-08 20:42:36 -07:00
John MacFarlane
2054bcbff6 Fix custom writer test.
The custom writer is now less aggressive about escaping `"`.
2020-10-08 12:32:42 -07:00
John MacFarlane
641849b70a Be less aggressive about using quotes for YAML values.
We need quotes if `[` or `{` or `'` is at the beginning of
the line, but not otherwise.
2020-10-08 10:54:53 -07:00
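As a sketch of the quoting rule in the YAML commit above, the Python function below quotes a value only when it begins with `[`, `{`, or `'`. The function name and the escaping of backslashes and double quotes are assumptions, and real YAML has further cases that call for quoting, which this sketch ignores.

```python
def yaml_scalar(value: str) -> str:
    """Quote a YAML value only if it starts with `[`, `{`, or `'`."""
    if value[:1] in ("[", "{", "'"):
        # Assumption: double-quoted style with backslash escapes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value
```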
John MacFarlane
1be0f0fba8 Use double quotes for YAML metadata.
Closes #6727.
2020-10-07 23:05:51 -07:00
John MacFarlane
a520181cdb Use golden test framework for command tests.
This means that `--accept` can be used to update expected output.
2020-10-07 22:33:44 -07:00
John MacFarlane
1742821c3e Fix URL prefixes in citations also when they occur in notes.
Update chicago-fullnote-bibliography.csl and adjust tests.

Closes #6723.
2020-10-07 11:23:28 -07:00
John MacFarlane
fd3809c33f Unescape entities in writing CSL JSON.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.

Closes jgm/citeproc#17.
2020-10-06 22:29:25 -07:00
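The unescaping step described in the CSL JSON commit above could look like this in Python. The function name is hypothetical; the three entity names shown are the standard HTML ones for `<`, `>`, and `&`.

```python
def unescape_csl_json(text: str) -> str:
    """Undo entity escaping of `<`, `>`, and `&` for CSL JSON output."""
    # `&amp;` is replaced last so that "&amp;lt;" becomes "&lt;", not "<".
    return (text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"))
```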
Diego Balseiro
eda5540719
DOCX reader: Allow empty dates in comments and tracked changes (#6726)
For security reasons, some legal firms delete the date from comments and
tracked changes.

* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
2020-10-06 21:03:00 -07:00
John MacFarlane
7d54e79091 Use latest citeproc.
Update chicago-fullnote-bibliography test, which is now correct.
2020-10-03 16:07:55 -07:00
Michael Hoffmann
74bd5a4f47
Docx writer: better handle list items whose contents are lists (#6522)
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.

Closes #5948.
2020-10-02 09:30:05 -07:00
John MacFarlane
27b4c21f72 Update to latest citeproc 2020-10-01 22:07:55 -07:00
Nils Carlson
ae4dcc0d4a
OpenDocument Writer: Implement table cell alignment (#6700)
Co-authored-by: Mauro Bieg <mb21@users.noreply.github.com>
2020-09-27 11:21:53 -07:00
John MacFarlane
a822067903 Fix short-title.
We were getting null short-titles generated, and that
was creating wrong citations in some cases.

Close #6702.
2020-09-26 14:28:28 -07:00
John MacFarlane
188c444990 RST reader: apply .. class:: directly to following Header.
rather than creating a surrounding Div.

Closes #6699.
2020-09-25 09:06:15 -07:00
Nils Carlson
1ad7a047d5
DocBook reader: Implement table cell alignment (#6698) 2020-09-24 17:43:43 -07:00
John MacFarlane
810ea6fdf8 Citeproc: Insert space after csl-left-margin span contents...
if they come before csl-right-inline.  This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
2020-09-24 09:57:55 -07:00
Nils Carlson
4f13c0e25e
OpenDocument writer: New table cell support with row and column spans (#6682)
Unit tests only verify column spans at this point.

Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
2020-09-24 09:31:47 -07:00
John MacFarlane
e0984a43a9 Add built-in citation support using new citeproc library.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.

* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
  modules under Text.Pandoc.Citeproc).  Exports `processCitations`.
  [API change]
* Add data files needed for Text.Pandoc.Citeproc:  default.csl
  in the data directory, and a citeproc directory that is just
  used at compile-time.  Note that we've added file-embed as a mandatory
  rather than a conditional dependency, because of the biblatex
  localization files. We might eventually want to use readDataFile
  for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
  in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
  some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
  release binary build instructions.  We will no longer distribute
  pandoc-citeproc.
* Markdown reader: tweak abbreviation support.  Don't insert a
  nonbreaking space after a potential abbreviation if it comes right before
  a note or citation.  This messes up several things, including citeproc's
  moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
  to convert between `csljson` and other bibliography formats,
  and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
  change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
  change]
* Added `bibtex`, `biblatex` as input formats.  This allows pandoc
  to convert between BibLaTeX and BibTeX and other bibliography formats,
  and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
  `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
  This is needed because pandoc readers for bibliography formats put
  the bibliographic information in the `references` field of metadata;
  and unless standalone is specified, metadata gets ignored.
  (TODO: This needs improvement. We should trigger standalone for the
  reader when the input format is bibliographic, and for the writer
  when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`.  This was just
  ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
  [API change] This runs the processCitations transformation.
  We need to treat it like a filter so it can be placed
  in the sequence of filter runs (after some, before others).
  In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
  so this special filter may be specified either way in a defaults file
  (or by `citeproc: true`, though this gives no control of positioning
  relative to other filters).  TODO: we need to add something to the
  manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
  This behaves like a filter and will be positioned
  relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
  section which also includes some information formerly found in
  the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
  directory.  This changes the old pandoc-citeproc behavior, which looked
  in `~/.csl`.  Users can simply symlink `~/.csl` to the `csl`
  subdirectory of their pandoc user data directory if they want
  the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
  Ms writers.  Added CSL-related CSS to styles.html.
2020-09-21 10:15:50 -07:00
John MacFarlane
a59ae96062 Markdown reader: Set citationNoteNum accurately in citations.
This also changes stateLastNoteNumber -> stateNoteNumber.
2020-09-21 10:10:37 -07:00
Albert Krewinkel
acbea6b8c6
Lua filters: add SimpleTable for backwards compatibility (#6575)
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.

Old filters using tables now require minimal changes and can use,
e.g.,

    if PANDOC_VERSION > {2,10,1} then
      pandoc.Table = pandoc.SimpleTable
    end

and

    function Table (tbl)
      tbl = pandoc.utils.to_simple_table(tbl)
      …
      return pandoc.utils.from_simple_table(tbl)
    end

to work with the current pandoc version.
2020-09-20 15:48:31 -07:00
argent0
ba9bedef23
Asciidoctor images (#6671)
Support `Asciidoctor`'s block figures.

Closes #6538.
2020-09-19 18:22:52 -07:00
Mauro Bieg
caa225ad82
Add CSS to default HTML template (#6601) 2020-09-19 16:13:50 -07:00
John MacFarlane
a26ec96d89 LaTeX writer: fix spacing issue with list in definition list.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.

Fix that by starting such lists with an `\item[]`.
2020-09-15 17:59:03 -07:00
Christian Despres
a2d343420f
LaTeX reader: fix improper empty cell filtering (#6689) 2020-09-15 13:36:11 -07:00
Albert Krewinkel
34151e8da8
HTML writer: support intermediate table headers
Closes: #6314
2020-09-13 23:23:11 +02:00
Albert Krewinkel
8711640512
HTML writer: support attributes on all table elements
Add attributes to tbody and tr elements.
2020-09-13 20:26:06 +02:00
Christian Despres
cae155b095
Fix hlint suggestions, update hlint.yaml (#6680)
* Fix hlint suggestions, update hlint.yaml

Most suggestions were redundant brackets. Some required
LambdaCase.

The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
2020-09-13 07:48:14 -07:00
Albert Krewinkel
a400d0dc62
HTML writer: render table footers if present
Part of: #6314
2020-09-12 21:49:01 +02:00
Christian Despres
22babd5382
[API change] Rename Writers.Tables and its contents (#6679)
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
2020-09-12 08:50:36 -07:00
Leonard Rosenthol
55e5ad2d8f
Changed default link state to invisible (#6676) 2020-09-10 22:58:53 -07:00
John MacFarlane
623ce89e0e Improved uncertainty handling in siunitx. 2020-09-10 14:48:35 -07:00
John MacFarlane
a03160fb0d LaTeX reader: support parenthesized uncertainties in siunitx. 2020-09-10 13:07:31 -07:00
Albert Krewinkel
9423b4b7d9
Support colspans and rowspans in HTML tables (#6644)
* HTML writer: add support for row headers, colspans, rowspans
* Add planet table tests

See #6312
2020-09-10 09:47:40 -07:00
Leonard Rosenthol
ef4f514359
Implement support for internal document links in ICML (#6606)
Closes #5541.
2020-09-10 09:40:35 -07:00
Nils Carlson
96a0f3c7af
docbook reader: Implement column span support for tables (#6492)
Implement column span support for tables in the DocBook reader.

Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
2020-09-10 09:11:52 -07:00
Christian Despres
10c6c411f9
Add Writers.Tables helper functions and types, add tests for those (#6655)
Add Writers.Tables helper functions and types, add tests for those

The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.

The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.

Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
2020-09-05 14:36:51 -07:00
John MacFarlane
529eb696dc LaTeX reader: Support squared, cubed, tothe in siunitx.
Closes #6657.
2020-09-02 11:06:26 -07:00
Albert Krewinkel
3c07b1d9b6
Fix tests for skylighting 0.10 2020-08-31 21:22:03 +02:00
John MacFarlane
93e3d463fd Docx writer: separate adjacent tables.
Word combines adjacent tables, so to prevent this we insert
an empty paragraph between two adjacent tables.
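
The separation step can be illustrated with a minimal Python sketch. This is
not the Docx writer's Haskell code; `is_table` and `empty_paragraph` are
hypothetical stand-ins for the writer's own block test and empty paragraph.

```python
def separate_adjacent_tables(blocks, is_table, empty_paragraph):
    """Return a copy of `blocks` with `empty_paragraph` between adjacent tables.

    `is_table` is a caller-supplied predicate (hypothetical); pandoc's real
    writer works on its own block types, not on arbitrary Python values.
    """
    result = []
    for block in blocks:
        # Only insert a separator when the previous emitted block and the
        # current block are both tables.
        if result and is_table(result[-1]) and is_table(block):
            result.append(empty_paragraph)
        result.append(block)
    return result
```

Tables that are already separated by another block are left untouched.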

Closes #4315.
2020-08-24 09:31:39 -07:00
Laurent P. René de Cotret
482a2e5079
[Latex Reader] Fixing issues with \multirow and \multicolumn table cells (#6608)
* Added test to replicate (#6596)

* Table cell reader not consuming spaces correctly (#6596)

* Prevented wrong nesting of \multicolumn and \multirow table cells (#6603)

* Parse empty table cells (#6603)

* Support full prototype for multirow macro (#6603)

Closes #6603
2020-08-15 11:40:10 -07:00
Emerson Harkin
6cfb31bbe2
Change SIRange to SIrange (#6617) 2020-08-14 11:30:17 -07:00
Laurent P. René de Cotret
499fc11fca
[Latex Reader] Table cell parser not consuming spaces correctly (#6597)
* Added test to replicate (#6596)

* Table cell reader not consuming spaces correctly (#6596)
2020-08-07 22:45:47 -07:00
Albert Krewinkel
6b08a37bbd
Org writer: don't force blank line after headers
Closes: #6554
2020-07-31 16:04:18 +02:00
Emerson Harkin
1b8f161198
Minimal support for SIRange in LaTeX reader (#6418)
Add support for `\SIRange{firstnumber}{secondnumber}{unit}` provided by siunitx.

An en-dash is used instead of localized "to".
2020-07-23 16:47:32 -07:00
Laurent P. René de Cotret
8c3b5dd3ae
Col-span and row-span in LaTeX reader (#6470)
Add multirow and multicolumn support in LaTeX reader.
Partially addresses #6311.
2020-07-23 11:23:21 -07:00
John MacFarlane
a0e3172a0b Further improvements to ams theorem support, and a test.
See #1608.
2020-07-23 11:11:28 -07:00
John MacFarlane
1e84178431 Docx writer: support --number-sections.
Closes #1413.
2020-07-22 11:53:31 -07:00
John MacFarlane
942e3ee1f9 RST reader: fix csv tables with multiline cells.
Closes #6549.
2020-07-21 10:20:15 -07:00
John MacFarlane
d6b7b1dc77 Remove use of cmark-gfm for commonmark/gfm rendering.
Instead rely on the markdown writer with appropriate extensions.

Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
2020-07-19 22:51:59 -07:00
John MacFarlane
8ede05161f
Merge pull request #6495 from tarleb/html5-figure-accessiblity
HTML writer: improve alt-text/caption handling for HTML5
2020-07-19 11:24:54 -07:00
Albert Krewinkel
36fede2b02
Jira writer: keep image caption as alt attribute
Fixes #6529
2020-07-17 16:02:40 +02:00
John MacFarlane
06d834caaa Use selnolig to selectively suppress ligatures with lualatex.
Closes #6534
2020-07-15 13:28:44 -07:00
John MacFarlane
c3b170be1c
Merge pull request #6513 from brisad/master
Escape starting periods in ms writer code blocks
2020-07-12 17:02:06 -07:00
Michael Hoffmann
09ea10e2b1 Escape starting periods in ms writer code blocks
If a line of ms code block output starts with a period (.), it should
be prepended by '\&' so that it is not interpreted as a roff command.
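
A minimal Python sketch of the escaping rule, for illustration only (the
actual change lives in pandoc's Haskell ms writer; the function name here is
hypothetical):

```python
def escape_leading_periods(code: str) -> str:
    """Prefix each line that starts with '.' with the roff escape '\\&'."""
    escaped = []
    for line in code.split("\n"):
        # A leading period would otherwise be read as a roff request.
        if line.startswith("."):
            line = "\\&" + line
        escaped.append(line)
    return "\n".join(escaped)
```

Lines that do not begin with a period pass through unchanged.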

Fixes #6505
2020-07-08 23:52:28 +02:00
Nikolay Yakimov
48cef91d18 [Docx Reader] Refactor/update smushInlines 2020-07-07 09:04:38 +03:00
John MacFarlane
804e8eeed2 Revert "Ipynb: allow lossless round-tripping of markdown cell content."
This reverts commit efbc205031.
2020-07-02 09:03:56 -07:00
Albert Krewinkel
b894de6426
HTML writer: improve alt-text/caption handling for HTML5
Screen readers read an image's `alt` attribute and the figure caption,
both of which come from the same source in pandoc. The figure caption is
hidden from screen readers with the `aria-hidden` attribute. This
improves accessibility.

For HTML4, where `aria-hidden` is not allowed, pandoc still uses an
empty `alt` attribute to avoid duplicate contents.

Closes: #6491
2020-07-01 14:54:52 +02:00
Albert Krewinkel
ccf9889c2c
Org reader: respect tables-excluding export setting
Tables can be removed from the final document with the `#+OPTIONS:
|:nil` export setting.
2020-07-01 09:28:24 +02:00
Albert Krewinkel
d6711bd7d9
Org reader: respect export setting disabling footnotes
Footnotes can be removed from the final document with the `#+OPTIONS:
f:nil` export setting.
2020-06-30 22:30:15 +02:00
John MacFarlane
efbc205031 Ipynb: allow lossless round-tripping of markdown cell content.
The reader now parses the contents of the markdown cell to a Pandoc
structure, but *also* stores the raw markdown in a `source`
attribute on the cell Div.  When we convert back to markdown,
this attribute is stripped off and the original source is used.
When we convert to other formats, the attribute is usually
ignored (though it will come through in HTML as a `data-source`
attribute, not unhelpfully).

I'll note some potential drawbacks of this approach:

- It makes it impossible to use pandoc to clean up or
  change the contents of markdown cells, e.g.
  going from `+smart` to `-smart`.

- There may be formats where the addition of the `source`
  attribute is problematic.  I can't think of any, though.

Closes #5408.
2020-06-30 12:32:44 -07:00
Albert Krewinkel
7c207c3051
Org reader: respect export setting which disables entities
MathML-like entities, e.g., `\alpha`, can be disabled with the
`#+OPTIONS: e:nil` export setting.
2020-06-30 11:39:32 +02:00
Albert Krewinkel
5ef315cc6d
Org reader: keep unknown keyword lines as raw org
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer
read as metadata, but kept as raw `org` blocks. This ensures that more
information is retained when round-tripping org-mode files;
additionally, this change makes it possible to support non-standard org
extensions via filters.
2020-06-29 21:19:34 +02:00
Albert Krewinkel
90ac70c79c
Org reader: unify keyword handling
Handling of export settings and other keywords (like `#+LINK`) has been
combined and unified.
2020-06-29 20:53:25 +02:00
Albert Krewinkel
1480606174
Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settings
These export settings are treated like their non-extra counterparts,
i.e., the values are added to the `header-includes` metadata list.
2020-06-29 17:04:29 +02:00
Albert Krewinkel
d17b257c89
Org reader: allow multiple #+SUBTITLE export settings
The values of all lines are read as inlines and collected in the
`subtitle` metadata field.
2020-06-29 17:03:33 +02:00
Albert Krewinkel
19175af811
JATS reader: parse abstract element into metadata field of same name (#6482)
Closes: #6480
2020-06-28 10:35:50 -07:00
Albert Krewinkel
d2d5eb8a99
Org reader: read #+INSTITUTE values as text with markup
The value is stored in the `institute` metadata field and used in the
default beamer presentation template.
2020-06-28 19:25:57 +02:00
Albert Krewinkel
b7a8620b43
Org tests: group export settings test for Org reader 2020-06-28 19:25:57 +02:00
Albert Krewinkel
e3a6d651e1
Org reader: update behavior of author, keywords export settings
The behavior of the `#+AUTHOR` and `#+KEYWORDS` export settings has
changed: Org now allows multiple such lines and adds a space between the
contents of each line. Pandoc now always parses these settings as meta
inlines; setting values are no longer treated as comma-separated lists.
Note that a Lua filter can be used to restore the previous behavior.
2020-06-28 18:01:30 +02:00
Albert Krewinkel
8dce28d949
Org reader: read description lines as inlines
`#+DESCRIPTION` lines are treated as text with markup. If multiple such
lines are given, then all lines are read and separated by soft
linebreaks.

Closes: #6485
2020-06-27 09:11:00 +02:00
Albert Krewinkel
9e6e9a7221
Org reader: honor tex export option
The `tex` export option can be set with `#+OPTIONS: tex:nil` and allows
three settings:

 - `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX,
 - `nil` removes all LaTeX fragments from the document, and
 - `verbatim` treats LaTeX as text.

The default is `t`.

Closes: #4070
2020-06-25 20:31:33 +02:00
John MacFarlane
9b7282bb0f LaTeX reader: Retain the Div around tables with attributes.
We'll need this to store table attributes until all writers
are adjusted to react to attributes on the Table element.
2020-06-23 11:12:40 -07:00
John MacFarlane
90b2c5a5e4 Add test for #6481. 2020-06-23 08:27:19 -07:00
John MacFarlane
7f8105159c Handle native Underline in Powerpoint writer.
(Instead of old Span with underline class.
Spans with `underline` will no longer be rendered
as underlined text.)
2020-06-22 17:56:28 -07:00
John MacFarlane
b1561d8e47 Use native Underline instead of Span in Jira 2020-06-22 17:55:57 -07:00
Albert Krewinkel
064303e2c9
Jira writer: always escape braces
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.

Package jira-wiki-markup must be version 1.3.2 or later.

Fixes: #6478
2020-06-22 16:30:11 +02:00
Albert Krewinkel
f5d7d41cbd
Recognize images with uppercase extensions
Fixes: #6472
2020-06-20 18:14:18 +02:00
Mathieu Boespflug
bbf04df900
Docbook reader: implement <procedure> (#6442)
A `<procedure>` contains a sequence of `<step>`'s, or `<substeps>`
that themselves contain `<step>`'s.
2020-06-14 10:45:52 -07:00
Mathieu Boespflug
12a35dd0d0
Docbook: map <simplesect> to unnumbered section (#6436)
A <simplesect> is a section like any other, except that it never
contains a subsection, and is typically rendered unnumbered.
2020-06-14 10:40:00 -07:00
John MacFarlane
f6dfacf9d6 Add "summary" to list of block-level HTML tags.
Closes #6385.  (The summary element needs to be the first
child of details and should not be enclosed by p tags.)

NOTE:  you need to include a blank line before the closing
`</details>`, if you want the last part of the content to
be parsed as a paragraph.
2020-05-20 07:45:14 -07:00
Lila
c04800305e
Propagate (DY)LD_LIBRARY_PATH in tests (#6376) 2020-05-18 22:46:14 -07:00
Lila
f4185fcef0
Use CSS in favor of <br> for display math (#6372)
Some CSS to ensure that display math is
displayed centered and on a new line is now included
in the default HTML-based templates; this may be
overridden if the user wants a different behavior.
2020-05-18 22:45:44 -07:00
John MacFarlane
5a20cc07dd Docx writer: enable column and row bands for tables.
This change will not have any effect with the default style.
However, it enables users to use a style (via a reference.docx)
that turns on row and/or column bands.

Closes #6371.
2020-05-16 15:50:59 -07:00
John MacFarlane
be9e93d4ae LaTeX writer: create hypertarget for links with identifier.
Closes #6360.
2020-05-12 14:37:07 -07:00
John MacFarlane
46179d5b3e Use latest skylighting.
This adds `aria-hidden="true"` to the empty a elements, which
helps people who use screen readers.
2020-05-12 14:37:07 -07:00
Albert Krewinkel
9c76c52e9b
Lua: fix regression in package searcher
This caused `require 'module'` to fail for third party packages.

Fixes: #6361
2020-05-12 17:10:30 +02:00
andrebauer
97fe2ea16c
LaTeX Writer: Add support for customizable alignment of columns in beamer (#6331)
Add support for customizable alignment of columns in beamer.
Closes #4805, closes #4150.
2020-05-02 17:08:16 -07:00
Vaibhav Sagar
9c2b659eeb
Support new Underline element in readers and writers (#6277)
Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.
2020-04-28 07:53:06 -07:00
John MacFarlane
f268ae3035 RST writer: properly handle images with same alt text.
Previously we created duplicate references for these
in rendering RST.  Closes #6194.
2020-04-24 16:54:52 -07:00
John MacFarlane
6baacb51bb AsciiDoc writer: add blank line after Div.
Closes #6308.
2020-04-22 23:04:43 -07:00
Joe Hermaszewski
fd5994cc5e Haddock Writer: Support Haddock tables
See this PR on Haddock for details on the table format:
https://github.com/haskell/haddock/pull/718
2020-04-20 13:57:36 +08:00
John MacFarlane
aff2500d46 More fixes for round-trip tests of HTML reader.
We exclude tables that have default widths but non-simple
content, as these can't really round-trip.
2020-04-19 17:21:19 -07:00
John MacFarlane
573214a06a Fixed round-trip HTML tests.
Exclude tables with cells with line breaks because they don't
currently round-trip.  (Table goes from being simple to having
explicit widths.)
2020-04-18 20:57:28 -07:00
John MacFarlane
9a809d4d01 Markdown writer: avoid unnecessary escapes before intraword _
when `intraword_underscores` extension is enabled.
Closes #6296.
2020-04-17 22:42:21 -07:00
John MacFarlane
0d2b8e3fe1
Merge pull request #6211 from tarleb/lua-pandocerror
API change: create PandocLua type, use PandocError for exceptions
2020-04-17 18:02:25 -07:00
John MacFarlane
8f40b4ba14 LaTeX reader: don't put surrounding Div around Table.
This reverts a change in the last release; the Div is
no longer needed, because we can now put the id right in
the Table's attributes.  However, writers may still need
to be modified to do something with the id in a Table
(e.g. create an anchor), so in the short term we may lose
the ability to link to tables in some writers.
2020-04-17 13:04:15 -07:00
Albert Krewinkel
fb54f3d679
API change: use PandocError for exceptions in Lua subsystem
The PandocError type is used throughout the Lua subsystem, all Lua
functions throw an exception of this type if an error occurs. The
`LuaException` type is removed and no longer exported from
`Text.Pandoc.Lua`. In its place, a new constructor `PandocLuaError` is
added to PandocError.
2020-04-17 21:52:48 +02:00
despresc
2fc11f3b1e Modify toLegacyTable to cut up cells, add tests
Now a cell with dimension (h, w) will be cut up into h*w cells of
dimension (1,1), all in the same grid position, with the upper-left
holding the original cell contents and the rest being empty.
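
The cutting rule can be sketched in Python as follows. This is an
illustration under assumptions, not the Haskell `toLegacyTable` code: the
function name, the row-major list-of-lists layout, and the `empty` marker are
all hypothetical.

```python
def cut_cell(contents, height, width, empty=None):
    """Split one (height, width) cell into height*width unit cells.

    Returns a row-major grid covering the original cell's area. Only the
    upper-left entry keeps `contents`; every other entry is `empty`.
    """
    grid = [[empty for _ in range(width)] for _ in range(height)]
    if height > 0 and width > 0:
        grid[0][0] = contents
    return grid
```

For example, a cell spanning 2 rows and 3 columns becomes six unit cells, of
which five are empty.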
2020-04-15 23:03:22 -04:00
despresc
c7814f31e1 Use the new builders, modify readers to preserve empty headers
The Builder.simpleTable now only adds a row to the TableHead when the
given header row is not null. This uncovered an inconsistency in the
readers: some would unconditionally emit a header filled with empty
cells, even if the header was not present. Now every reader has the
conditional behaviour. Only the XWiki writer depended on the header
row being always present; it now pads its head as necessary.
2020-04-15 23:03:22 -04:00
despresc
d368536a4e Adapt to the removal of the RowSpan, ColSpan, RowHeadColumns accessors 2020-04-15 23:03:22 -04:00
despresc
4e34d366df Adapt to the newest Table type, fix some previous adaptation issues
- Writers.Native is now adapted to the new Table type.

- Inline captions should now be conditionally wrapped in a Plain, not
  a Para block.

- The toLegacyTable function now lives in Writers.Shared.
2020-04-15 23:03:22 -04:00
despresc
7254a2ae0b Implement the new Table type 2020-04-15 23:03:22 -04:00
Nikolay Yakimov
83c1ce1d77
Markdown Reader: Fix inline code in lists (#6284)
Closes #6284.

Previously inline code containing list markers was sometimes parsed incorrectly.
2020-04-15 16:20:01 -07:00
John MacFarlane
71c4857464 JATS reader: handle "label" element in section title.
Closes #6288.
2020-04-15 09:23:04 -07:00
John MacFarlane
9187b4bca9 LaTeX writer: ensure that -M csquotes works even in fragment mode.
Closes #6265.
2020-04-11 10:40:59 -07:00
Tristan de Cacqueray
dd06d63540
HTML reader: support <bdo> (#6271)
See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo

Closes #5794
2020-04-11 09:57:59 -07:00
Albert Krewinkel
c09a3448d1
Jira reader: improve icon conversion
Icons are now converted as follows: `(/)` to ✔, `(x)` to , `(!)` to
, `(+)` to , `(-)` to , `(off)` to 🌙, and `(*)` to ☆. The new
icons render well in most fonts. Furthermore, the UTF-8 characters all
fit into 4 bytes.

Closes: #6264
2020-04-09 16:21:45 +02:00
John MacFarlane
11df2a3c0f LaTeX reader: better handling of \lettrine.
- SmallCaps instead of Span for the part after the initial capital.
- Ensure that both arguments are parsed, so that in Markdown both
  are treated as raw LaTeX. (Closes #6258.)
2020-04-07 09:25:52 -07:00
Vlad Hanciuta
8dbd4938f2
Vimwiki reader: Add nested syntax highlighting (#6257)
Nested syntaxes are specified like this:
{{{sql
SELECT * FROM table
}}}

The preformatted code block parser has been extended to check if the
first attribute of the block is not a `key=value` pair, and in that case
it will be considered as a class.
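
A simplified Python sketch of that check, for illustration only. It is not the
Vimwiki reader's parser: the function name is hypothetical, and the sketch
assumes whitespace-separated tokens with no spaces inside quoted values.

```python
def parse_block_attributes(header: str):
    """Split the text after '{{{' into (classes, key-value pairs).

    The first token is taken as a class when it is not a key=value pair.
    Later tokens without '=' are ignored in this sketch.
    """
    classes, keyvals = [], []
    for index, token in enumerate(header.split()):
        if "=" in token:
            key, _, value = token.partition("=")
            keyvals.append((key, value.strip('"')))
        elif index == 0:
            classes.append(token)
    return classes, keyvals
```

With the header `sql` from the example above, the sketch yields the class
`sql` and no key-value pairs.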

Closes #6256.
2020-04-06 16:41:28 -07:00
Albert Krewinkel
663a5a9b7f
test/writer.jira: fix links, skip alias if it equals the target 2020-04-04 15:03:13 +02:00
Albert Krewinkel
c3f539364a
Jira: support citations, attachment links, and user links
Closes: #6231
Closes: #6238
Closes: #6239
2020-04-04 14:27:27 +02:00
Albert Krewinkel
d867cac8ca
Jira reader: resolve parsing issues of blockquote, color
Parsing problems occurring with block quotes and colored text have been
resolved.

Fixes: #6233
Fixes: #6235
2020-04-03 13:25:52 +02:00
John MacFarlane
92e0801daa Add test fixes for docbook writer changes. 2020-04-01 23:09:14 -07:00
Albert Krewinkel
7df0710094
Jira reader: use span with class underline for inserted text
Jira text which is marked as `+inserted+` is converted into pandoc's
default representation for underlined text: a span with class
`underline`. Previously, the span was marked with the non-standard class
`inserted`.

Closes: #6237
2020-03-31 10:04:55 +02:00
Albert Krewinkel
ff9be6b384
Jira writer: convert spans with class underline to inserted text
Spans with class `underline` are converted into Jira text marked as
`+inserted+`, i.e. surrounded by plus-signs.
2020-03-31 09:57:59 +02:00
Albert Krewinkel
9a42bec7fc
Jira writer tests: update image in test/writer.jira 2020-03-31 08:18:41 +02:00
Albert Krewinkel
69a3fa5708
Jira reader: retain image attributes
Jira image attributes, as in `!image.jpg|align=right!`, are retained as
key-value pairs. Thumbnail images, such as `!example.gif|thumbnail!`,
are marked by a `thumbnail` class in their attributes.

Related to #6234.
2020-03-30 22:03:52 +02:00
Joseph C. Sible
7233a7a932
More cleanup (#6209)
* Simplify by collapsing a do block into a single <$>
* Remove an unnecessary variable: `all` takes any Foldable, so only blocksToInlines needs toList.
2020-03-28 22:48:47 -07:00
Albert Krewinkel
44f8c2725e
Jira reader: fix parsing of tables without preceding blankline
A bug was fixed which caused faulty parsing if a table was not preceded
by a newline and the first table cell had no space after the initial `|`
characters.

Fixes: #6198
2020-03-19 21:27:35 +01:00
Albert Krewinkel
81d46435f6
Jira reader: fix parsing of strikeout, emphasis
A bug was fixed which caused non-emphasized text containing digits and/or
non-special symbols (like dots) to sometimes be parsed incorrectly.

Fixes: #6196
2020-03-18 21:32:05 +01:00
Albert Krewinkel
11b5f1e40b
Update copyright year (#6186)
* Update copyright year

* Copyright: add notes for Lua and Jira modules
2020-03-13 09:52:47 -07:00
Albert Krewinkel
7eb9914841
Jira reader: support colored inline text, indented lists
* Support for colored inlines has been added.
* Lists are now allowed to be indented; i.e., lists are still recognized
  if list markers are preceded by spaces.

Closes: #6183, #6184
2020-03-13 09:52:28 +01:00
John MacFarlane
87875763c8 Ms writer: fix definition lists so indent even when...
paragraph indent is set to 0 (as is the default).

Also ensure indent for display math that falls back
to TeX.
2020-03-07 21:54:29 -08:00
John MacFarlane
2ab83a56e6 Ms writer: use .QS/.QE instead of .RS/.RE for block quotes. 2020-03-06 09:45:46 -08:00
John MacFarlane
aaf296508a Fix man reader test for previous change. 2020-03-05 19:34:17 -08:00
John MacFarlane
0edc084c50 Revert "Allow specifying string value in metadata using !!literal tag."
This reverts commit 3493d6afaa.

This might be worth considering in the future, but let's not do
it yet...the additional complexity needs a better justification.
2020-02-17 15:58:21 -08:00
John MacFarlane
3493d6afaa Allow specifying string value in metadata using !!literal tag.
This is experimental.  Normally metadata values are interpreted
as markdown, but if the !!literal tag is used they will be interpreted
as plain strings.

We need to consider whether this can still be implemented if
we switch back from HsYAML to yaml for performance reasons.
2020-02-17 09:53:36 -08:00
Ethan Riley
daf770c1e9
Fixes: group biblatex citations even with prefix and suffix (#6058)
Closes #5849.  Previously biblatex citations were only grouped if
there was no prefix.  This patch allows them to be grouped in
subgroups split by prefixes and suffixes, which allows better citation
sorting.
2020-02-14 08:44:40 -08:00
Lucas Escot
29c2670da2
Add highlight directive to the rST reader (#6140) 2020-02-13 10:27:34 -08:00
Albert Krewinkel
f5ea5f0aad
Introduce new format variants for JATS (#6067)
New formats:

- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."

The "jats" output format is now an alias for "jats_archiving".

Closes: #6014
2020-02-12 20:36:02 -08:00
John MacFarlane
3a79f37d88 LaTeX reader: improve caption and label parsing.
- Don't emit empty Span elements for labels.
- Put tables with labels in a surrounding Div.
2020-02-12 17:43:55 -08:00
John MacFarlane
3fbee8c6ed LaTeX reader: resolve \ref to table numbers.
Closes #6137.
2020-02-11 22:28:06 -08:00
John MacFarlane
114d77c2ab Fix spurious dots in markdown_mmd metadata output
Closes #6133 (regression).
2020-02-10 09:00:21 -08:00
John MacFarlane
5f0bd52221 reveal.js: ensure that pauses work even in title slides.
Closes #5819.
2020-02-08 09:38:07 -08:00
Joseph C. Sible
12c75701be
Use <$> instead of >>= and return (#6128) 2020-02-08 09:12:01 -08:00
John MacFarlane
4c3db9273f Apply linter suggestions. Add fix_spacing to lint target in Makefile. 2020-02-07 09:08:22 -08:00
Joseph C. Sible
a5a3ac9946
Various minor cleanups and refactoring (#6117)
* Use concatMap instead of reimplementing it

* Replace an unnecessary multi-way if with a regular if

* Use sortOn instead of sortBy and comparing

* Use guards instead of lots of indents for if and else

* Remove redundant do blocks

* Extract common functions from both branches of maybe

Whenever both the Nothing and the Just branch of maybe do the same
function, do that function on the result of maybe instead.

* Use fmap instead of reimplementing it from maybe

* Use negative forms instead of negating the positive forms

* Use mapMaybe instead of mapping and then using catMaybes

* Use zipWith instead of mapping over the result of zip

* Use unwords instead of reimplementing it

* Use <$ instead of <$> and const

* Replace case of Bool with if and else

* Use find instead of listToMaybe and filter

* Use zipWithM instead of mapM and zip

* Inline lambda wrappers into the real functions

* We get zipWithM from Text.Pandoc.Writers.Shared

* Use maybe instead of fromMaybe and fmap

I'm not sure how this one slipped past me.

* Increase a bit of indentation
2020-02-07 08:38:24 +01:00
John MacFarlane
0a4f49d370 MediaWiki writer: prevent triple [[[.
This confuses mediawiki's parser.  So we insert a `<nowiki/>`
no-op between a literal `[` and a link.  Closes #6119.
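
A rough Python sketch of the idea, under stated assumptions: rendered inlines
arrive as hypothetical `(kind, text)` pairs, and a link is assumed to be
problematic when its rendering starts with `[[`. The real logic is in the
Haskell MediaWiki writer and may differ.

```python
def join_inlines(pieces):
    """Concatenate rendered inlines, guarding against a literal '[' + link.

    `pieces` is a list of (kind, rendered_text) pairs, where kind is
    "text" or "link" (a hypothetical representation for this sketch).
    """
    out = ""
    for kind, rendered in pieces:
        # '[' followed by '[[...]]' would produce '[[[', so separate them
        # with the no-op <nowiki/> tag.
        if kind == "link" and out.endswith("[") and rendered.startswith("[["):
            out += "<nowiki/>"
        out += rendered
    return out
```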
2020-02-05 10:08:18 -08:00