Commit graph

7312 commits

Author SHA1 Message Date
John MacFarlane
a31731b8e2 Docx reader: Don't reimplement NonEmpty. 2021-03-19 10:11:08 -07:00
John MacFarlane
3428248deb Use minimumDef instead of minimum (partial function). 2021-03-18 23:01:12 -07:00
John MacFarlane
f0e4b9cc3c Require safe >= 0.3.18 and remove cpp. 2021-03-18 21:37:56 -07:00
John MacFarlane
1da6208315 Rewrite a foldl1 as a foldl'. 2021-03-18 21:30:59 -07:00
John MacFarlane
67e173bda1 Remove another foldr1 partial function use. 2021-03-18 21:10:22 -07:00
John MacFarlane
fd76e605cd T.P.Readers.Odt.StyleReader: rewrite foldr1 use as foldr.
This avoids a partial function.
2021-03-18 21:02:05 -07:00
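To illustrate the kind of rewrite fd76e605cd describes, here is a minimal sketch. The names `longer`, `longestPartial`, and `longestTotal` are hypothetical and are not taken from the StyleReader module. The point is only that `foldr1` fails on an empty list, while `foldr` with an explicit starting value does not.

```haskell
-- Hypothetical example; not the code from T.P.Readers.Odt.StyleReader.

-- Pick the longer of two strings (ties go to the first argument).
longer :: String -> String -> String
longer a b = if length a >= length b then a else b

-- Partial: foldr1 throws an exception when given an empty list.
longestPartial :: [String] -> String
longestPartial = foldr1 longer

-- Total: foldr takes an explicit starting value, so [] is handled.
longestTotal :: [String] -> String
longestTotal = foldr longer ""
```

Loaded in GHCi, `longestTotal []` evaluates to `""`, whereas `longestPartial []` raises an exception.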
John MacFarlane
c3f9e8c122 Docx writer: make nsid in abstractNum deterministic.
Previously we assigned a random number (though in a deterministic
way).  But changes in the random package mean we get different
results now on different architectures, even with the same random
seed. We don't need random values; so now we just assign a value
based on the list number id, which is guaranteed to be unique
to the list marker.
2021-03-17 22:31:20 -07:00
John MacFarlane
7bf4be04b0 Fix regression with tex_math_backslash in Markdown reader.
Added regression test.  Closes #7155.
2021-03-17 09:10:44 -07:00
John MacFarlane
87538966a0 Removed unused LANGUAGE pragmas. 2021-03-16 13:05:29 -07:00
John MacFarlane
805d12ac9c Remove an unneeded import 2021-03-15 14:21:52 -07:00
John MacFarlane
24191a2a27 Use foldl' instead of foldl everywhere. 2021-03-15 10:37:35 -07:00
John MacFarlane
3622097da3 Handle 'nocite' better with --biblatex and --natbib.
Previously the nocite metadata field was ignored with
these formats.  Now it populates a `nocite-ids` template
variable and causes a `\nocite` command to be issued.

Closes #4585.
2021-03-14 00:10:37 -08:00
Albert Krewinkel
35688c4262 T.P.App.FormatHeuristics: shorten code, improve docs. 2021-03-13 22:06:43 +01:00
John MacFarlane
35b66a7671 MediaWiki reader: Allow block-level content in notes (ref).
Closes #7145.
2021-03-13 12:50:44 -08:00
John MacFarlane
eed18d231c Use integral values for w:tblW in docx.
Closes #7141.
2021-03-13 12:05:52 -08:00
Albert Krewinkel
00e8d0678e Jira reader: mark divs created from panels with class "panel".
Closes: tarleb/jira-wiki-markup#2
2021-03-13 14:29:47 +01:00
Albert Krewinkel
a8aa301428 Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
John MacFarlane
894ed8ebb0 Citeproc: apply fixLinks correctly.
This is code that incorporates a prefix like `https://doi.org/`
into a following link when appropriate.  But it didn't work because
we were walking with a `[Inline] -> [Inline]` function on an `Inlines`.
Changed the point of application of `fixLink` to resolve the issue.
Closes #7130.
2021-03-12 11:58:52 -08:00
John MacFarlane
92ffd37475 Simplify compactDL. 2021-03-12 11:58:52 -08:00
John MacFarlane
5608dc01e5 HTML writer: Add warnings on duplicate attribute values.
This prevents emitting invalid HTML.

Ultimately it would be good to prevent this in the types
themselves, but this is better for now.

T.P.Logging: Add DuplicateAttribute constructor to LogMessage.
[API change]
2021-03-10 10:19:40 -08:00
John MacFarlane
1c23e3a824 RST reader: fix logic for ending comments.
Previously comments sometimes got extended too far.  Closes #7134.
2021-03-09 13:03:27 -08:00
Albert Krewinkel
d7f8fbf04b Org writer: fix operator precedence mistake in previous commit 2021-03-09 21:16:11 +01:00
Albert Krewinkel
b9b2586ed3 Org writer: prevent unintended creation of ordered list items
Adjust line wrapping if default wrapping would cause a line to be read
as an ordered list item.

Fixes #7132
2021-03-09 18:14:54 +01:00
Albert Krewinkel
eb184d9148 Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
John MacFarlane
5aa73bd0a2 LaTeX reader: handle table cells containing & in \verb.
Closes #7129.
2021-03-07 15:49:02 -08:00
John MacFarlane
c652dcc16b LaTeX reader: support hyperref command.
Closes #7127.
2021-03-07 13:22:00 -08:00
John MacFarlane
735a69de6b Allow --resource-path to accumulate.
Previously, if `--resource-path` were used multiple times, the last
resource path would replace the others.

With this change, each time `--resource-path` is used, it prepends
the specified path components to the existing resource path.

Similarly, when `resource-path` is specified in a defaults file,
the paths provided will be prepended to the existing resource
path.

This change also allows one to avoid using the OS-specific path
separator; instead, one can simply use `--resource-path`
a number of times with single paths. This form of command
will not have an OS-dependent behavior.

This change facilitates the use of multiple, small defaults
files: each can specify a directory containing its own
resources without clobbering the resource paths set by
the others.

Closes #6152.
2021-03-06 10:32:51 -08:00
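The accumulation behavior described in 735a69de6b can be sketched roughly as follows. This is not pandoc's implementation: `addResourcePath` and `resolveResourcePath` are hypothetical names, and the starting path of `["."]` (the working directory) is an assumption made for the example. `splitSearchPath` from `System.FilePath` handles the OS-specific separator when one is used.

```haskell
import Data.List (foldl')
import System.FilePath (splitSearchPath)

-- Hypothetical helper: split one --resource-path argument into its
-- components and prepend them to the resource path built so far.
addResourcePath :: String -> [FilePath] -> [FilePath]
addResourcePath arg existing = splitSearchPath arg ++ existing

-- Apply every occurrence of the option in command-line order.
-- Assumption: the initial resource path is the working directory.
resolveResourcePath :: [String] -> [FilePath]
resolveResourcePath = foldl' (flip addResourcePath) ["."]
```

Under these assumptions, passing the option twice with the single paths `images` and `fonts` yields `["fonts", "images", "."]`: later uses take precedence, and no OS-specific separator is needed.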
John MacFarlane
df00cf05cb Allow ${.} in defaults files paths...
to refer to the directory where the defaults file is.
This will make it possible to create moveable
"packages" of resources in a directory.

Closes #5871.
2021-03-05 11:56:41 -08:00
John MacFarlane
6dd7520cc4 Implement environment variable interpolation in defaults files.
This allows the syntax `${HOME}` to be used, in fields that expect
file paths only.  Any environment variable may be interpolated
in this way. A warning will be raised for undefined variables.
The special variable `USERDATA` is automatically set to the
user data directory in force when the defaults file is parsed.
(Note: it may be different from the eventual user data directory,
if the defaults file or further command line options change that.)

Closes #5982.
Closes #5977.
Closes #6108 (path not taken).
2021-03-05 10:46:01 -08:00
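As a rough illustration of the `${VAR}` interpolation described in 6dd7520cc4, the sketch below substitutes variables from a lookup table and collects the names that were undefined, so that a caller could warn about them. It is not pandoc's code: the function name `interpolate`, the use of `String` and `Data.Map`, and the choice to replace an undefined variable with the empty string are all assumptions for this example.

```haskell
import qualified Data.Map as M

-- Hypothetical sketch: replace each ${NAME} with its value from env.
-- Returns the interpolated string and the list of undefined names.
-- Assumption: an undefined variable is replaced by the empty string.
interpolate :: M.Map String String -> String -> (String, [String])
interpolate env = go
  where
    go ('$' : '{' : rest) =
      case break (== '}') rest of
        (name, '}' : after) ->
          let (out, missing) = go after
          in case M.lookup name env of
               Just val -> (val ++ out, missing)
               Nothing  -> (out, name : missing)
        -- No closing brace: keep the text literally.
        _ -> let (out, missing) = go rest in ("${" ++ out, missing)
    go (c : cs) = let (out, missing) = go cs in (c : out, missing)
    go [] = ([], [])
```

A special entry such as `USERDATA` could simply be added to the table before calling `interpolate`.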
John MacFarlane
a832469006 Add fields for CSL options to Opt.
* Add `optCSL`, `optBibliography`, `optCitationAbbreviations` to
  `Opt` [API change].
* Move `addMeta` from T.P.App.Opt to T.P.App.CommandLineOptions.
2021-03-05 10:42:33 -08:00
John MacFarlane
ccc530c588 Logging: Add EnvironmentVariableUndefined constructor to LogMessage.
[API change]
2021-03-05 10:28:46 -08:00
John MacFarlane
5f9327cfc8 Shared: Change defaultUserDataDirs -> defaultUserDataDir.
Rationale: the manual says that the XDG data directory will
be used if it exists, otherwise the legacy data directory.
So we should just determine this and use this directory,
rather than having a search path which could cause some
things to be taken from one data directory and others from
others.

[API change]
2021-03-05 10:25:18 -08:00
John MacFarlane
030209fc29 Revert "Revert "Relax --abbreviations rules so that a period isn't required.""
This reverts commit 916ce4d511.

I was confused in thinking it wouldn't work.
2021-03-04 16:25:13 -08:00
John MacFarlane
916ce4d511 Revert "Relax --abbreviations rules so that a period isn't required."
This reverts commit e461b7dd45.

Ill-advised change.  This doesn't work because we parse
strings in chunks.
2021-03-04 16:22:08 -08:00
John MacFarlane
e461b7dd45 Relax --abbreviations rules so that a period isn't required.
Partially addresses #7124.
2021-03-04 16:02:46 -08:00
John MacFarlane
92ea8a0cb6 Revert "Add T.P.Readers.LaTeX.Include."
This reverts commit b569b0226d.

Memory usage improvement in compilation wasn't very significant.
2021-03-03 19:07:16 -08:00
John MacFarlane
b569b0226d Add T.P.Readers.LaTeX.Include. 2021-03-03 18:47:17 -08:00
John MacFarlane
33e4c8dd6c Remove T.P.Readers.LaTeX.Accent.
Incorporate accentCommands into T.P.Readers.LaTeX.Inline.
2021-03-03 18:21:32 -08:00
John MacFarlane
da5e9e5956 Move enquote commands to T.P.LaTeX.Lang. 2021-03-03 11:22:42 -08:00
John MacFarlane
044bc44fc6 Moved more into T.P.Readers.LaTeX.Lang. 2021-03-03 11:08:02 -08:00
John MacFarlane
bbcc1501a5 Split out T.P.Readers.LaTeX.Inline. 2021-03-03 10:34:10 -08:00
John MacFarlane
e8e5ffe1f4 Split out T.P.Writers.LaTeX.Util. 2021-03-02 22:40:45 -08:00
John MacFarlane
fe483c653b Split out T.P.Writers.LaTeX.Citation. 2021-03-02 21:57:37 -08:00
John MacFarlane
827ecdd2de Split out T.P.Writers.LaTeX.Lang. 2021-03-02 21:33:58 -08:00
John MacFarlane
2097411e4f Split up T.P.Writers.Markdown...
with T.P.Writers.Markdown.Types and T.P.Writers.Markdown.Inline.
The module was difficult to compile on low-memory systems.
2021-03-02 21:08:13 -08:00
John MacFarlane
7f1b933aaa Make T.P.Readers.LaTeX.Types an unexported module.
[API change]

This is really an implementation detail that shouldn't be
exposed in the public API.
2021-03-01 09:46:43 -08:00
John MacFarlane
382f0e23d2 Factor out T.P.Readers.LaTeX.Macro. 2021-03-01 09:46:43 -08:00
Albert Krewinkel
e1454fe0d0 Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
3793ed8beb Removed unnecessary pragmas. 2021-02-28 23:43:55 -08:00
John MacFarlane
6a6291d9e3 Change T.P.Readers.LaTeX.SIunitx to export a command map...
instead of individual commands.
2021-02-28 23:05:35 -08:00
John MacFarlane
7e38b8e55a T.P.Readers.LaTeX: Don't export tokenize, untokenize.
[API change]

These were only exported for testing, which seems the
wrong thing to do.  They don't belong in the public
API and are not really usable as they are, without access
to the Tok type which is not exported.

Removed the tokenize/untokenize roundtrip test.

We put a quickcheck property in the comments which
may be used when this code is touched (if it is).
2021-02-28 22:53:42 -08:00
John MacFarlane
2463fbf61d LaTeX writer: use function instead of map for accent lookup. 2021-02-28 21:43:11 -08:00
John MacFarlane
d2bb0c7c8d Factor out T.P.Readers.LaTeX.Math. 2021-02-28 21:05:25 -08:00
John MacFarlane
36456070c4 Fix bug in last commit. 2021-02-28 15:36:46 -08:00
John MacFarlane
7229d068c9 Markdown reader efficiency improvements.
Benchmarks show that these make the reader 13-17% faster,
depending on extensions.
2021-02-28 15:18:31 -08:00
John MacFarlane
cc543cf5b6 LaTeX reader: another small efficiency improvement. 2021-02-28 14:34:04 -08:00
John MacFarlane
f6cf03857b LaTeX reader efficiency improvements.
In conjunction with other changes this makes the reader
almost twice as fast on our benchmark as it was on Feb. 10.
2021-02-28 12:52:41 -08:00
John MacFarlane
564c39beef Move setDefaultLanguage to T.P.Readers.LaTeX.Lang. 2021-02-28 09:49:34 -08:00
John MacFarlane
5e571d9635 LaTeX reader: remove two unnecessary parsers in inline.
These are handled anyway by regularSymbol.
2021-02-28 09:39:01 -08:00
John MacFarlane
2faa57e8e9 Factor out T.P.Readers.LaTeX.Citation. 2021-02-28 09:12:09 -08:00
John MacFarlane
08231f5cdd Factor out T.P.Readers.LaTeX.Table. 2021-02-27 21:40:56 -08:00
John MacFarlane
925815bb33 Split off T.P.Readers.LaTeX.Accent.
To help reduce memory demands compiling the main LaTeX reader.
2021-02-27 17:02:44 -08:00
Albert Krewinkel
3327b225a1 Lua: use strict evaluation when retrieving AST value from the stack
Fixes: #6674
2021-02-27 21:57:12 +01:00
Salim B
fae6a204f1 Fix/update URLs and use HTTP**S** where possible (#7122) 2021-02-26 17:56:04 -08:00
John MacFarlane
f0a991a22b T.P.CSV: fix parsing of unquoted values.
Previously we didn't allow unescaped quotes in unquoted values,
but they are allowed. Closes #7112.
2021-02-22 21:18:04 -08:00
John MacFarlane
d30791a381 Fall back to latin1 if UTF-8 decoding fails...
...when handling URL argument served with no charset in the mime type.
The assumption is that most pages that don't specify a charset
in the mime type are either UTF-8 or latin1.  I think that's a good
assumption, though I'm not sure.
2021-02-22 14:17:22 -08:00
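The fallback described in d30791a381 can be sketched with the `text` library's `decodeUtf8'` and `decodeLatin1`. The function name `decodeBody` is hypothetical, and this is a minimal illustration of the idea rather than the code pandoc uses.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Hypothetical helper: try strict UTF-8 decoding first; if the bytes
-- are not valid UTF-8, reinterpret them as latin1 (ISO-8859-1).
decodeBody :: B.ByteString -> T.Text
decodeBody bs = either (const (decodeLatin1 bs)) id (decodeUtf8' bs)
```

Because every byte sequence is valid latin1, the fallback never fails, which is why it is only a reasonable choice when latin1 is the likely alternative.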
John MacFarlane
5a73c5d3f8 When downloading content from URL arguments, be sensitive to...
the character encoding.  We can properly handle UTF-8 and
latin1 (ISO-8859-1); for others we raise an error.
See #5600.
2021-02-22 14:01:10 -08:00
John MacFarlane
bafccd5aa2 T.P.Error: Add PandocUnsupportedCharsetError constructor...
...for PandocError.  [API change]
2021-02-22 14:01:04 -08:00
John MacFarlane
4617f229ea Text.Pandoc.MIME: add exported function getCharset.
[API change]
2021-02-22 13:28:47 -08:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`,
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emitting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
2b37ed9f21 LaTeX reader: further optimizations in satisfyTok.
Benchmarks show 2/3 of the run time and 2/3 of the allocation
of the Feb. 10 benchmarks.
2021-02-21 11:30:17 -08:00
John MacFarlane
db4f882315 LaTeX reader: removed sExpanded in state.
This isn't actually needed and checking it doesn't change
anything.

Also remove an unnecessary `doMacros` before `satisfyTok`,
which does it anyway.
2021-02-21 11:24:04 -08:00
John MacFarlane
f43cb5ddcf LaTeX reader: further performance optimization.
Avoid unnecessary 'doMacros'.
2021-02-21 10:58:42 -08:00
John MacFarlane
c0c8865eaa HTML reader: small performance tweak. 2021-02-20 23:40:02 -08:00
John MacFarlane
d8ef383692 T.P.Shared: remove some obsolete functions [API change].
Removed:

- `splitByIndices`
- `splitStringByIndicies`
- `substitute`
- `underlineSpan`

None of these are used elsewhere in the code base.
2021-02-20 23:02:10 -08:00
John MacFarlane
321343b2cf HTML reader: small efficiency improvements.
Also, remove exported class NamedTag(..) [API change].
This was just intended to smooth over the transition from String to Text
and is no longer needed.

The functions isInlineTag and isBlockTag are no longer
polymorphic.
2021-02-20 22:49:20 -08:00
John MacFarlane
cec541e54c LaTeX reader: Another small improvement to macro handling. 2021-02-20 22:14:31 -08:00
John MacFarlane
31b8f60ea8 LaTeX reader: avoid macro resolution code if no macros defined. 2021-02-20 22:03:29 -08:00
John MacFarlane
0f955b10b4 T.P.Readers.LaTeX.Parsing: improve braced'.
Remove the parameter, have it parse the opening brace,
and make it more efficient.
2021-02-20 18:57:46 -08:00
John MacFarlane
13847267e9 HTML reader: efficiency improvements.
Do a lookahead to find the right parser to use.

Benchmarks from 34ms to 23ms, with less allocation.
Also speeds up the epub reader.
2021-02-20 00:07:38 -08:00
John MacFarlane
98d26c2345 DocBook, JATS, OPML readers: performance optimization.
With the new XML parser, we can avoid the expensive tree
normalization step we used to do.

This gives a significant speed boost in docbook and JATS
parsing (e.g. 9.7 to 6 ms).
2021-02-18 21:24:31 -08:00
John MacFarlane
ef642e2bbc T.P.XML Improve fromEntities. 2021-02-18 18:11:27 -08:00
John MacFarlane
0f5c56dfb1 T.P.PDF: disable smart when building PDF via LaTeX.
This is to prevent accidental creation of ligatures like
`` ?` `` and `` !` `` (especially in languages with quotations
like German), and similar ligature issues.

See jgm/citeproc#54.
2021-02-18 17:11:53 -08:00
John MacFarlane
53cf8295a4 LaTeX writer: adjust hypertargets to beginnings of paragraphs.
Use `\vadjust pre` so that the hypertarget takes you to the
beginning of the paragraph rather than one line down.

Closes #7078.

This makes a particular difference for links to citations
using `--citeproc` and `link-citations: true`.
2021-02-18 14:34:38 -08:00
John MacFarlane
9e728b40f3 T.P.Shared: cleanup.
Cleaned up some functions and added deprecation pragmas
to functions no longer used in the code base.
2021-02-18 13:12:15 -08:00
Albert Krewinkel
743f7216de Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows listing multiple citations separated by commas.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
John MacFarlane
73add05789 Docx reader: use Map instead of list for Namespaces.
This gives a speedup of about 5-10%.

The reader is now approximately twice as fast as in the last
release.
2021-02-17 09:54:39 -08:00
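The data-structure change in 73add05789 can be illustrated with a small sketch. The type synonyms and function names below are hypothetical, and the real reader's definitions may differ (for example in using `Text` rather than `String`). The sketch only shows why a `Map` helps: looking up a prefix in an association list scans the list, while `Data.Map` lookups are logarithmic in the number of entries.

```haskell
import qualified Data.Map as M

-- Hypothetical types; not the Docx reader's actual definitions.
type Prefix = String
type URI = String

-- Association list: each lookup scans the list, O(n) per query.
lookupInList :: Prefix -> [(Prefix, URI)] -> Maybe URI
lookupInList = lookup

-- Map: each lookup is O(log n) per query.
lookupInMap :: Prefix -> M.Map Prefix URI -> Maybe URI
lookupInMap = M.lookup
```

Both functions return the same answers for a list with unique prefixes; only the cost per lookup differs.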
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
4af378702a Add orig copyright/license info for code derived from xml-light. 2021-02-16 18:44:38 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
...and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms |  5 ms  |  4 ms |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
8621ed600a T.P.Error: remove unused variables 2021-02-14 15:49:12 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
to render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The task lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
2d60a5127c T.P.Shared: export handleTaskListItem. [API change] 2021-02-13 13:00:37 -08:00
John MacFarlane
6323250bad LaTeX reader: remove unnecessary line 2021-02-13 00:22:22 -08:00
John MacFarlane
25b7df7c2a Remove Ext_fenced_code_attributes from allowed commonmark attributes.
This attribute was listed as allowed, but it didn't actually
do anything. Use `attributes` for code attributes and more.

Closes #7097.
2021-02-13 00:18:40 -08:00
John MacFarlane
eb0c63b002 Avoid an unnecessary withRaw. 2021-02-12 19:29:48 -08:00