Commit graph

289 commits

Author SHA1 Message Date
Nikolay Yakimov
c113ca6717 [Docx Reader] Use style names, not ids, for assigning semantic meaning
Motivating issues: #5523, #5052, #5074

Style name comparisons are case-insensitive, since those are
case-insensitive in Word.

w:styleId will be used as style name if w:name is missing (this should
only happen for malformed docx and is kept as a fallback to avoid
failing altogether on malformed documents)

Block quote detection code moved from Docx.Parser to Readers.Docx

Code styles, i.e. "Source Code" and "Verbatim Char" now honor style
inheritance

Docx Reader now honours "Compact" style (used in Pandoc-generated docx).
The side-effect is that "Compact" style no longer shows up in
docx+styles output. Styles inherited from "Compact" will still
show up.

Removed obsolete list-item style from divsToKeep. That didn't
really do anything for a while now.

Add newtypes to differentiate between style names, ids, and
different style types (that is, paragraph and character styles)

Since docx style names can have spaces in them, and pandoc-markdown
classes can't, anywhere when style name is used as a class name,
spaces are replaced with ASCII dashes `-`.

Get rid of extraneous intermediate types, carrying styleId information.
Instead, styleId is saved with other style data.

Use RunStyle for inline style definitions only (lacking styleId and styleName);
for Character Styles use CharStyle type (which is basicaly RunStyle with styleId
and StyleName bolted onto it).
2019-09-21 11:18:15 -07:00
John MacFarlane
45b7636307 Revert "FB2 reader test: better diagnostics on failure."
This reverts commit c65af7d1a2.
2019-09-15 10:27:19 -07:00
John MacFarlane
c65af7d1a2 FB2 reader test: better diagnostics on failure. 2019-09-15 09:06:38 -07:00
John MacFarlane
88a0327579 FB2 reader test: Another attempt to fix test failure on GitHub CI. 2019-09-14 10:37:19 -07:00
John MacFarlane
7ecae69e27 Revert "FB2 reader test: filter CRs."
This reverts commit e35147d715.
2019-09-13 22:08:42 -07:00
John MacFarlane
e35147d715 FB2 reader test: filter CRs.
This may help with the test failure on GitHub CI.

b59e6d0376/checks
2019-09-13 16:50:00 -07:00
John MacFarlane
e4cca4cf67 Roff readers: better parsing of groups.
We now allow groups where the closing `\\}` isn't at the
beginning of a line.

Closes #5410.
2019-09-04 09:24:42 -07:00
John MacFarlane
b35fae6511 Use doctemplates 0.3, change type of writerTemplate.
* Require recent doctemplates.  It is more flexible and
  supports partials.
* Changed type of writerTemplate to Maybe Template instead
  of Maybe String.
* Remove code from the LaTeX, Docbook, and JATS writers that looked in
  the template for strings to determine whether it is a book or an
  article, or whether csquotes is used. This was always kludgy and
  unreliable.  To use csquotes for LaTeX, set `csquotes` in your
  variables or metadata. It is no longer sufficient to put
  `\usepackage{csquotes}` in your template or header includes.
  To specify a book style, use the `documentclass` variable or
  `--top-level-division`.
* Change template code to use new API for doctemplates.
2019-07-28 19:25:45 -07:00
Albert Krewinkel
63c65c89da
Org reader: accept ATTR_LATEX in block attributes
Attributes for LaTeX output are accepted as valid block attributes;
however, their values are ignored.

Fixes: #5648
2019-07-22 08:12:22 +02:00
Alexander Krotov
0713cb65bc Muse: add RTL support
Closes #5551
2019-07-14 18:22:52 +03:00
John MacFarlane
7bc9eab846
Merge pull request #5589 from blmage/fix-3992
Add support for EPUB2 covers (fix #3992)
2019-07-13 16:48:09 -07:00
martinfrancois
4b73544087 add tests for EPUB2 and EPUB3 cover reader 2019-06-22 22:07:26 +02:00
blmage
449c133406 Add a test for MathML formulas in ODT documents 2019-06-20 21:55:31 +02:00
Alexander Krotov
814c3af4df Muse reader: test that links inside image descriptions work 2019-06-09 14:08:37 +03:00
Alexander Krotov
f807f5b383 Muse reader: allow images inside link descriptions 2019-05-25 19:17:16 +03:00
Albert Krewinkel
8b00bc6029
Org reader: fix planning elements in headers level 3 and higher
Planning info is now always placed before the subtree contents.
Previously, the planning info was placed after the content if the
header's subtree was converted to a list, which happens with headers of
level 3 and higher per default.

Fixes: #5494
2019-05-13 22:55:13 +02:00
Albert Krewinkel
00ef03827e
Org reader: omit, but warn about unknown export options
Unknown export options are properly ignored and omitted from the output.
2019-05-13 22:25:04 +02:00
Alexander Krotov
5c7ad59ffe FB2 reader: add notes parsing test 2019-05-11 12:10:20 +00:00
Albert Krewinkel
33e2d46dbe
Org reader: prefer plain symbols over math symbols
Symbols like `\alpha` are output plain and unemphasized, not as math.

Fixes: #5483
2019-05-05 14:48:37 +02:00
Albert Krewinkel
7e7bc3493e
Org reader: recognize emphasis after TODO/DONE keyword
Fixes: #5484
2019-05-05 13:53:11 +02:00
John MacFarlane
052684712b HTML reader: read data-foo attribute into foo.
The HTML writer adds the `data-` prefix for HTML5
for nonstandard attributes.  But the attributes are
represented in the AST without the `data-` prefix,
so we should strip this when reading HTML.

Closes #5392.
2019-03-25 08:43:59 -07:00
Jesse Rosenthal
9a1a3fe482 Docx reader: add tests for trimming last inline. 2019-02-18 15:49:00 -05:00
Alexander Krotov
c4814ea965 Muse reader: add secondary note support 2019-02-18 15:21:32 +03:00
Jesse Rosenthal
332e2ba5b6 Docx reader: Add test for reading sdts in footnotes. 2019-02-12 17:26:37 -05:00
Jesse Rosenthal
1847bdbb83 Docx reader: Tests for alternate document.xml 2019-02-06 21:14:46 -05:00
Alexander Krotov
59fa4eb17e Muse reader: test that block level markup does not break <verbatim> 2019-02-06 02:25:24 +03:00
Albert Krewinkel
37a82b0b11 Add missing copyright notices and remove license boilerplate (#5112)
Quite a few modules were missing copyright notices.

This commit adds copyright notices everywhere via haddock module
headers.  The old license boilerplate comment is redundant with this and has
been removed.

Update copyright years to 2019.

Closes #4592.
2019-02-04 13:52:31 -08:00
Brian Leung
35971495ab RST reader: change treatment of number-lines directives. (#5207)
Directives of this type without numeric inputs should not have a
`startFrom` attribute; with a blank value, the writers can produce
extra whitespace.
2019-01-09 22:19:26 -08:00
John MacFarlane
8673eb079b Removed superfluous sourceCode class on code blocks.
* These were added by the RST reader and, for literate Haskell,
  by the Markdown and LaTeX readers.  There is no point to
  this class, and it is not applied consistently by all readers.
  See #5047.

* Reverse order of `literate` and `haskell` classes on code blocks
  when parsing literate Haskell. Better if `haskell` comes first.
2019-01-08 11:36:33 -08:00
Brian Leung
9dbcf16161 Org reader: handle minlevel option differently. (#5190)
When `minlevel` exceeds the original minimum level observed in the
file to be included, every heading should be shifted rightward.
2019-01-07 20:28:47 -08:00
Alexander
40c30a9d88 Add DokuWiki reader (#5108)
Closes #1792
2019-01-06 15:06:32 -08:00
Albert Krewinkel
2f92261d87
Org reader: fix self-link parsing regression
Fixes a regression introduced by the previous commit.
2019-01-01 22:06:44 +01:00
Albert Krewinkel
c0caaaeabb
Org reader: fix treatment of links to images
Links with descriptions which are pointing to images are no longer read
as inline images, but as proper links.

Fixes: #5191
2019-01-01 21:03:38 +01:00
Alexander Krotov
5101f4324b Muse reader tests: test #cover directive 2018-12-25 15:23:02 +03:00
Jesse Rosenthal
0f736d778f Docx: add test for lists with level overrides. 2018-12-10 19:24:56 -05:00
Alexander Krotov
367e8cac18 Muse reader: trim whitespace before parsing grid table cells 2018-11-14 19:17:05 +03:00
Alexander Krotov
c61b67410a Muse reader: add grid tables support 2018-11-14 17:58:44 +03:00
Yan Pashkovsky
43a0734f62 table tests 2018-11-02 22:42:51 -07:00
John MacFarlane
3305a018bc Roff reader: properly handle unknown backslash escapes.
They are treated as regular characters, according to groff 7.

Cloess #5034.
2018-10-30 15:54:29 -07:00
John MacFarlane
8d55dc10cd Roff tokenizer: better handling of escapes. 2018-10-28 21:37:57 -07:00
John MacFarlane
22755a35b7 Roff tokenizer: revamped font parsing using escapeArg.
Add support for \C'...' escapes.
2018-10-28 18:06:34 -07:00
Alexander Krotov
f8ca36525d Muse: Make tables round-trip 2018-10-28 03:52:35 +03:00
Alexander Krotov
e34a0703f5 Muse reader: try to parse lists before trying to parse table
This ensures that tables inside lists are parsed correctly.
2018-10-28 03:52:25 +03:00
Alexander Krotov
d8135b2e67 Remove misleading comment from Muse reader tests
pandoc follows Text::Amuse rules instead of being bug compatible with Emacs Muse
2018-10-27 23:43:23 +03:00
Alexander Krotov
d28dca57db Muse reader: forbid whitespace after opening and before closing markup elements
See https://github.com/melmothx/text-amuse/issues/44 for discussion on these rules
2018-10-27 23:35:11 +03:00
Alexander Krotov
1ca320e249 Muse reader: parse page breaks 2018-10-26 16:30:15 +03:00
John MacFarlane
0327226d4c Man reader: don't parse \[ul] as unicode escape. 2018-10-22 12:05:34 -07:00
Alexander Krotov
875e33ecf6 Muse reader: allow footnotes to start with empty line
A space character was required after footnote marker, now newline is allowed.
2018-10-22 03:05:17 +03:00
John MacFarlane
2b7a541dd0 Man reader: Fixed handling of nested fonts.
Closes #4978.
2018-10-20 22:41:39 -07:00
Alexander Krotov
8df59952bf Muse reader: allow empty headers
Previously empty headers caused parser to terminate without parsing the rest of the document.
2018-10-21 06:42:00 +03:00