Commit graph

14393 commits

Author SHA1 Message Date
John MacFarlane
f43cb5ddcf LaTeX reader: further performance optimization.
Avoid unnecessary 'doMacros'.
2021-02-21 10:58:42 -08:00
John MacFarlane
c0c8865eaa HTML reader: small performance tweak. 2021-02-20 23:40:02 -08:00
John MacFarlane
d8ef383692 T.P.Shared: remove some obsolete functions [API change].
Removed:

- `splitByIndices`
- `splitStringByIndicies`
- `substitute`
- `underlineSpan`

None of these are used elsewhere in the code base.
2021-02-20 23:02:10 -08:00
John MacFarlane
321343b2cf HTML reader: small efficiency improvements.
Also, remove exported class NamedTag(..) [API change].
This was just intended to smooth over the transition from String to Text
and is no longer needed.

The functions isInlineTag and isBlockTag are no longer
polymorphic.
2021-02-20 22:49:20 -08:00
John MacFarlane
cec541e54c LaTeX reader: Another small improvement to macro handling. 2021-02-20 22:14:31 -08:00
John MacFarlane
31b8f60ea8 LaTeX reader: avoid macro resolution code if no macros defined. 2021-02-20 22:03:29 -08:00
John MacFarlane
0f955b10b4 T.P.Readers.LaTeX.Parsing: improve braced'.
Remove the parameter, have it parse the opening brace,
and make it more efficient.
2021-02-20 18:57:46 -08:00
maurerle
6b7d614888 revealjs writer: add 'center' option for vertical slide centering.
Closes #7104.
2021-02-20 10:17:31 -08:00
John MacFarlane
36e745c678 Benchmark improvements.
+ Run writer benchmarks for binary formats too.
+ Alphabetize benchmarks.
+ Don't run benchmarks for bibliography formats
  (yet; we need a special input for them).
2021-02-20 00:28:10 -08:00
John MacFarlane
13847267e9 HTML reader: efficiency improvements.
Do a lookahead to find the right parser to use.

Benchmark time drops from 34 ms to 23 ms, with less allocation.
Also speeds up the epub reader.
2021-02-20 00:07:38 -08:00
John MacFarlane
fc335801ef MANUAL: block-level formatting is not allowed in line blocks.
Closes #7107.
2021-02-19 10:22:54 -08:00
John MacFarlane
b745bf3938 make bench: compare against a baseline, use datestamps for bench results. 2021-02-19 10:22:54 -08:00
Lorenzo
d68bbae552 Update default ODT style
As of now, the default style for ODT documents has a "First paragraph" style that inherits from the "Standard" style and has no top or bottom margin. All subsequent paragraphs have the "Text_20_body" style, which inherits from "Standard" and adds "0.0598in" margins on top and bottom. This makes the final document a bit ugly, since the first paragraph has a small gap ("0.0598in") before the second one, and all subsequent paragraphs have double that.

The proposed fix makes "First paragraph" inherit from "Text_20_body" instead so that it also has a consistent margin.

Another approach would be to inherit "Text_20_body" and add a 0 margin on top.
2021-02-19 10:18:49 -08:00
John MacFarlane
5eedbb6e8e Clarify tex_math_dollars extension.
Note that no blank lines are allowed between the delimiters
in display math.
2021-02-19 09:22:17 -08:00
John MacFarlane
b2b32d9bb2 'make bench': Create csv files for comparison. 2021-02-18 23:22:18 -08:00
John MacFarlane
98d26c2345 DocBook, JATS, OPML readers: performance optimization.
With the new XML parser, we can avoid the expensive tree
normalization step we used to do.

This gives a significant speed boost in DocBook and JATS
parsing (e.g. from 9.7 ms to 6 ms).
2021-02-18 21:24:31 -08:00
John MacFarlane
ef642e2bbc T.P.XML: improve fromEntities. 2021-02-18 18:11:27 -08:00
John MacFarlane
0f5c56dfb1 T.P.PDF: disable smart when building PDF via LaTeX.
This is to prevent accidental creation of ligatures like
`` ?` `` and `` !` `` (especially in languages with quotations
like German), and similar ligature issues.

See jgm/citeproc#54.
2021-02-18 17:11:53 -08:00
John MacFarlane
005344fb18 Revert "LaTeX template: disable `` ?` `` and `` !` `` ligatures."
This reverts commit 24d7cd539b.
2021-02-18 17:03:11 -08:00
John MacFarlane
24d7cd539b LaTeX template: disable `` ?` `` and `` !` `` ligatures.
These are often triggered by accident in languages that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
John MacFarlane
53cf8295a4 LaTeX writer: adjust hypertargets to beginnings of paragraphs.
Use `\vadjust pre` so that the hypertarget takes you to the
beginning of the paragraph rather than one line down.

Closes #7078.

This makes a particular difference for links to citations
using `--citeproc` and `link-citations: true`.
2021-02-18 14:34:38 -08:00
John MacFarlane
9e728b40f3 T.P.Shared: cleanup.
Cleaned up some functions and added deprecation pragmas
to functions no longer used in the code base.
2021-02-18 13:12:15 -08:00
Albert Krewinkel
743f7216de Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows listing multiple citations separated by commas.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
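
The comma handling described in 743f7216de can be illustrated with a minimal sketch. This is not pandoc's Haskell parser: it is written in Python for illustration only, the name `parse_org_ref` and the regular expression are hypothetical, and it assumes the simple `cite:key1,key2` form of org-ref links.

```python
import re

# One or more citation ids separated by commas. An id may not contain
# whitespace or a comma, so a comma always ends the current id.
_CITE = re.compile(r"cite:([^\s,]+(?:,[^\s,]+)*)")


def parse_org_ref(text):
    """Return the list of citation ids in a `cite:` link, or [] if none."""
    match = _CITE.match(text)
    if match is None:
        return []
    return match.group(1).split(",")


parse_org_ref("cite:doe2020,roe2021")  # → ['doe2020', 'roe2021']
```

If commas were accepted as part of an id, the same input would come back as the single id `doe2020,roe2021`, which is the behavior the commit message describes as the bug.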
Dmitrii Kovanikov
ef741f3842 Allow base64-bytestring-1.2.* 2021-02-18 18:07:23 +01:00
John MacFarlane
73add05789 Docx reader: use Map instead of list for Namespaces.
This gives a speedup of about 5-10%.

The reader is now approximately twice as fast as in the last
release.
2021-02-17 09:54:39 -08:00
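
As a rough illustration of why 73add05789 helps, the sketch below contrasts a lookup in a list of pairs with a lookup in a map. It is written in Python, not taken from the Docx reader; `lookup_in_list` and the variable names are hypothetical. Python's `dict` is a hash table, whereas Haskell's `Data.Map` is a balanced tree, but both avoid the linear scan.

```python
def lookup_in_list(pairs, prefix):
    """Linear scan: the cost grows with the number of namespaces."""
    for key, uri in pairs:
        if key == prefix:
            return uri
    return None


# Two namespace prefixes commonly declared in docx files.
namespace_list = [
    ("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
    ("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships"),
]

# Build the map once; each later lookup no longer scans every pair.
namespace_map = dict(namespace_list)

# Both structures give the same answers; only the lookup cost differs.
same_hit = lookup_in_list(namespace_list, "r") == namespace_map.get("r")
same_miss = lookup_in_list(namespace_list, "x") == namespace_map.get("x")
```

Because namespace prefixes are resolved for many elements while reading a document, a cheaper lookup can add up to a measurable difference.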
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
4af378702a Add orig copyright/license info for code derived from xml-light. 2021-02-16 18:44:38 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms |  9 ms |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms |  5 ms |  4 ms |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
b5b576184c JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358 JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
Albert Krewinkel
8621ed600a T.P.Error: remove unused variables 2021-02-14 15:49:12 +01:00
Albert Krewinkel
1942dc5611 Allow tasty 1.4.* 2021-02-14 14:43:32 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
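
The skip-or-warn behavior listed in d84a6041e1 can be sketched as follows. This is an illustration in Python, not the HTML reader's code; `handle_iframe` and the `fetch` callback are hypothetical names, and the warning goes through Python's `warnings` module rather than pandoc's own logging.

```python
import warnings


def handle_iframe(src, fetch):
    """Return the fetched iframe content, or None if the iframe is skipped."""
    if not src.strip():
        # Empty src: skip the iframe without any message.
        return None
    try:
        return fetch(src)
    except Exception as err:
        # Invalid or unreachable src: warn and skip instead of failing.
        warnings.warn(f"Could not fetch iframe source {src!r}: {err}")
        return None
```

The point of the design is that a single bad iframe no longer aborts the whole conversion; the caller just gets nothing for that element.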
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
to render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The task lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
2d60a5127c T.P.Shared: export handleTaskListItem. [API change] 2021-02-13 13:00:37 -08:00
John MacFarlane
6323250bad LaTeX reader: remove unnecessary line 2021-02-13 00:22:22 -08:00
John MacFarlane
25b7df7c2a Remove Ext_fenced_code_attributes from allowed commonmark attributes.
This attribute was listed as allowed, but it didn't actually
do anything. Use `attributes` for code attributes and more.

Closes #7097.
2021-02-13 00:18:40 -08:00
John MacFarlane
1954e894b4 Clean up benchmark code.
Now we can select benchmarks by pattern using `-p blah`.
2021-02-13 00:14:49 -08:00
John MacFarlane
eb0c63b002 Avoid an unnecessary withRaw. 2021-02-12 19:29:48 -08:00
John MacFarlane
d9322629a3 LaTeX reader improvements.
* Rewrote `withRaw` so it doesn't rely on fragile assumptions
  about token positions (which break when macros are expanded).
  This requires the addition of `sEnableWithRaw` and `sRawTokens`
  in `LaTeXState`, and a new combinator `disablingWithRaw` to
  disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
  up the new raw token collecting.
* These changes slightly increase allocations and have a small
  performance impact, but it's minor.

Closes #7092.
2021-02-12 19:04:14 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00
John MacFarlane
59875185b3 Add command test for #7092 2021-02-12 19:04:14 -08:00
Albert Krewinkel
8ffd4159d6 Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases from being treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8c2618ed81 Add MANUAL section on reproducible builds. 2021-02-11 15:17:56 -08:00
John MacFarlane
390d5e65b2 Use getTimestamp instead of getCurrentTime in writers.
Setting SOURCE_DATE_EPOCH will allow reproducible builds.

Partially addresses #7093.  This does not suffice to fully enable
reproducible builds for EPUB, since a unique id is generated for each
build.
2021-02-11 14:55:03 -08:00
John MacFarlane
3c4a58bad0 T.P.Class: Add getTimestamp [API change].
This attempts to read the SOURCE_DATE_EPOCH environment variable
and parse a UTC time from it (treating it as a unix date stamp,
see https://reproducible-builds.org/specs/source-date-epoch/).
If the variable is not set or can't be parsed as a unix date
stamp, then the function returns the current date.
2021-02-11 14:54:28 -08:00
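
The fallback logic described in 3c4a58bad0 can be sketched roughly as below. This is a Python illustration, not the Haskell `getTimestamp` from T.P.Class; the function name `get_timestamp` and its optional `environ` parameter are assumptions made so the example can be run without touching the real environment.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=None):
    """Return a UTC time from SOURCE_DATE_EPOCH, else the current time."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # The variable holds seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw.strip()), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            pass  # not a usable Unix timestamp: fall back to "now"
    return datetime.now(timezone.utc)
```

With `SOURCE_DATE_EPOCH=0` this returns 1970-01-01 00:00:00 UTC on every run, which is what makes timestamped output repeatable.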
John MacFarlane
acc9afaf6f Correctly parse "raw" date value in markdown references metadata.
See jgm/citeproc#53.
2021-02-11 09:16:25 -08:00