Commit graph

1568 commits

Author SHA1 Message Date
Erik Rask
82e8c29cb0 Include Header.Attr.attributes as XML attributes on section
Add key-value pairs found in the attributes list of Header.Attr as
XML attributes on the corresponding section element.

Any key name not allowed as an XML attribute name is dropped, as
are keys with invalid values where they are defined as enums in
DocBook, and xml:id (for DocBook 5)/id (for DocBook 4) to not
intervene with computed identifiers.
2021-03-20 21:29:17 +01:00
John MacFarlane
ceadf33246 Tests: Use getExecutablePath from base...
avoiding the need to depend on the executable-path package.
2021-03-19 23:35:47 -07:00
John MacFarlane
dc94601eb5 Tests: factor out setupEnvironment in Test.Helpers.
This avoids code duplication between Command and Old.
2021-03-19 21:17:13 -07:00
John MacFarlane
2ca1b20a85 Fix finding of data files from test programs.
Apparently Cabal sets a `pandoc_datadir` environment variable
so that the data files will be sought in the source directory
rather than in the final destination (where they aren't yet
installed).

So we no longer need to set `--data-dir` in the tests. We just
need to make sure `pandoc_datadir` is set in the environment
when we call the program in the test suite.

This will fix the issue with loading of pandoc.lua when
pandoc is built with `-embed_data_files`, reported in #7163.

Closes #7163.
2021-03-19 18:57:13 -07:00
John MacFarlane
c3f9e8c122 Docx writer: make nsid in abstractNum deterministic.
Previously we assigned a random number (though in a deterministic
way).  But changes in the random package mean we get different
results now on different architectures, even with the same random
seed. We don't need random values; so now we just assign a value
based on the list number id, which is guaranteed to be unique
to the list marker.
2021-03-17 22:31:20 -07:00
John MacFarlane
e66bf891ec Add test for #7155. 2021-03-17 09:10:37 -07:00
John MacFarlane
63a6059790 Update tests for new texmath. 2021-03-15 18:22:38 -07:00
John MacFarlane
35b66a7671 MediaWiki reader: Allow block-level content in notes (ref).
Closes #7145.
2021-03-13 12:50:44 -08:00
John MacFarlane
eed18d231c Use integral values for w:tblW in docx.
Cloess #7141.
2021-03-13 12:05:52 -08:00
Albert Krewinkel
f8b49e77f8
Use jira-wiki-markup 1.3.4
Jira reader:

* Fixed parsing of autolinks (i.e., of bare URLs in the text).
  Previously an autolink would take up the rest of a line, as spaces
  were allowed characters in these items.

* Emoji character sequences no longer cause parsing failures. This was
  due to missing backtracking when emoji parsing fails.

Jira writer:

* Block quotes are only rendered as `bq.` if they do not contain a
  linebreak.
2021-03-13 14:53:58 +01:00
Albert Krewinkel
00e8d0678e
Jira reader: mark divs created from panels with class "panel".
Closes: tarleb/jira-wiki-markup#2
2021-03-13 14:29:47 +01:00
Albert Krewinkel
a8aa301428
Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
John MacFarlane
5608dc01e5 HTML writer: Add warnings on duplicate attribute values.
This prevents emitting invalid HTML.

Ultimately it would be good to prevent this in the types
themselves, but this is better for now.

T.P.Logging: Add DuplicateAttribute constructor to LogMessage.
[API change]
2021-03-10 10:19:40 -08:00
John MacFarlane
1c23e3a824 RST reader: fix logic for ending comments.
Previously comments sometimes got extended too far.  Closes #7134.
2021-03-09 13:03:27 -08:00
Albert Krewinkel
b9b2586ed3
Org writer: prevent unintended creation of ordered list items
Adjust line wrapping if default wrapping would cause a line to be read
as an ordered list item.

Fixes #7132
2021-03-09 18:14:54 +01:00
Albert Krewinkel
eb184d9148
Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
John MacFarlane
5aa73bd0a2 LaTeX reader: handle table cells containing & in \verb.
Closes #7129.
2021-03-07 15:49:02 -08:00
Albert Krewinkel
e1454fe0d0
Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
12b47656d4 Remove superfluous imports. 2021-02-28 22:57:36 -08:00
John MacFarlane
7e38b8e55a T.P.Readers.LaTeX: Don't export tokenize, untokenize.
[API change]

These were only exported for testing, which seems the
wrong thing to do.  They don't belong in the public
API and are not really usable as they are, without access
to the Tok type which is not exported.

Removed the tokenize/untokenize roundtrip test.

We put a quickcheck property in the comments which
may be used when this code is touched (if it is).
2021-02-28 22:53:42 -08:00
John MacFarlane
a9cc5d2616 Update tests for changes to https URLs. 2021-02-26 18:00:45 -08:00
Salim B
fae6a204f1
Fix/update URLs and use HTTP**S** where possible (#7122) 2021-02-26 17:56:04 -08:00
John MacFarlane
f0a991a22b T.P.CSV: fix parsing of unquoted values.
Previously we didn't allow unescaped quotes in unquoted values,
but they are allowed. Closes #7112.
2021-02-22 21:18:04 -08:00
Albert Krewinkel
00e4bb51e4
tests: print accurate location if a test fails
Ensures that tasty-hunit reports the location of the failing test
instead of the location of the helper `test` function.
2021-02-22 23:56:04 +01:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`.
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emiting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
005344fb18 Revert "LaTeX template: disable ` ? ` and ! `` ligatures."
This reverts commit 24d7cd539b.
2021-02-18 17:03:11 -08:00
John MacFarlane
24d7cd539b LaTeX template: disable ` ? ` and ! `` ligatures.
These are often triggered by accident in languagegs that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
Albert Krewinkel
743f7216de
Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows to list multiple citations separated by comma.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms | 5  ms  | 4 ms  |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
b5b576184c
JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358
JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The tasks lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00
John MacFarlane
59875185b3 Add command test for #7092 2021-02-12 19:04:14 -08:00
Albert Krewinkel
8ffd4159d6
Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases to be treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
John MacFarlane
8e9131db4e Markdown reader: improved handling of mmd link attributes in references.
Previously they only worked for links that had titles.  Closes #7080.
2021-02-06 21:52:12 -08:00
Andrew Dunning
4de9edb8e8
LaTeX template: Update to iftex package (#7073)
Load the iftex package directly rather than via the ifxetex and ifluatex compatibility
wrappers, which have been merged into a single package that is part of the LaTeX core.
The capitalization of the commands has been changed for compatibility with older
versions of TeX Live that have the version of iftex by the Persian TeX Group. This had
been removed in
<2845794c0c>
for compatibility with BasicTeX, but that is no longer an issue.
2021-02-03 08:54:11 -08:00
John MacFarlane
e6c7fcc598 Fixed some compiler warnings in tests. 2021-02-02 21:09:10 -08:00
Albert Krewinkel
6f79042502 Add tests for search_path_separator 2021-02-02 21:04:30 -08:00
Albert Krewinkel
e0bf4bfe82 Check that all documented functions are present.
Rely on tests in the module package to check the correctness of each
function.
2021-02-02 21:04:30 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows to work with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
2415b2680a Test suite: a more robust way of testing the executable.
Mmny of our tests require running the pandoc
executable. This is problematic for a few different reasons.
First, cabal-install will sometimes run the test suite
after building the library but before building the executable,
which means the executable isn't in place for the tests.
One can work around that by first building, then building
and running the tests, but that's fragile.  Second,
we have to find the executable. So far, we've done that
using a function findPandoc that attempts to locate it
relative to the test executable (which can be located
using findExecutablePath).  But the logic here is delicate
and work with every combination of options.

To solve both problems, we add an `--emulate` option to
the `test-pandoc` executable.  When `--emulate` occurs
as the first argument passed to `test-pandoc`, the
program simply emulates the regular pandoc executable,
using the rest of the arguments (after `--emulate`).
Thus,

    test-pandoc --emulate -f markdown -t latex

is just like

    pandoc -f markdown -t latex

Since all the work is done by library functions,
implementing this emulation just takes a couple lines
of code and should be entirely reliable.

With this change, we can test the pandoc executable
by running the test program itself (locatable using
findExecutablePath) with the `--emulate` option.
This removes the need for the fragile `findPandoc`
step, and it means we can run our integration tests
even when we're just building the library, not the
executable.

Part of this change involved simplifying some complex
handling to set environment variables for dynamic
library paths.  I have tested a build with
`--enable-dynamic-executable`, and it works, but
further testing may be needed.
2021-02-02 20:36:51 -08:00
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
d1875b69ec RST reader: fix handling of header in CSV tables.
The interpretation of this line is not affected
by the delim option. Closes #7064.
2021-01-31 12:05:46 -08:00
John MacFarlane
9223788a05 Markdown writer: handle math right before digit.
We insert an HTML comment to avoid a `$` right before
a digit, which pandoc will not recognize as a math delimiter.
2021-01-29 18:29:17 -08:00