Commit graph

14099 commits

Author SHA1 Message Date
John MacFarlane
4617f229ea Text.Pandoc.MIME: add exported function getCharset.
[API change]
2021-02-22 13:28:47 -08:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`.
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emiting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
607c014e9d Update changelog. 2021-02-21 22:05:50 -08:00
John MacFarlane
aae1e617a6 Fix changelog-helper.sh 2021-02-21 19:02:28 -08:00
John MacFarlane
2b37ed9f21 LaTeX reader: further optimizations in satisfyTok.
Benchmarks show 2/3 of the run time and 2/3 of the allocation
of the Feb. 10 benchmarks.
2021-02-21 11:30:17 -08:00
John MacFarlane
db4f882315 LaTeX reader: removed sExpanded in state.
This isn't actually needed and checking it doesn't change
anything.

Also remove an unnecessary `doMacros` before `satisfyTok`,
which does it anyway.
2021-02-21 11:24:04 -08:00
John MacFarlane
f43cb5ddcf LaTeX reader: further performance optimization.
Avoid unnecessary 'doMacros'.
2021-02-21 10:58:42 -08:00
John MacFarlane
c0c8865eaa HTML reader: small performance tweak. 2021-02-20 23:40:02 -08:00
John MacFarlane
d8ef383692 T.P.Shared: remove some obsolete functions [API change].
Removed:

- `splitByIndices`
- `splitStringByIndicies`
- `substitute`
- `underlineSpan`

None of these are used elsewhere in the code base.
2021-02-20 23:02:10 -08:00
John MacFarlane
321343b2cf HTML reader: small efficiency improvements.
Also, remove exported class NamedTag(..) [API change].
This was just intended to smooth over the transition from String to Text
and is no longer needed.

The functions isInlineTag and isBlockTag are no longer
polymorphic.
2021-02-20 22:49:20 -08:00
John MacFarlane
cec541e54c LaTeX reader: Another small improvement to macro handling. 2021-02-20 22:14:31 -08:00
John MacFarlane
31b8f60ea8 LaTeX reader: avoid macro resolution code if no macros defined. 2021-02-20 22:03:29 -08:00
John MacFarlane
0f955b10b4 T.P.Readers.LaTeX.Parsing: improve braced'.
Remove the parameter, have it parse the opening brace,
and make it more efficient.
2021-02-20 18:57:46 -08:00
maurerle
6b7d614888
revealjs writer: add 'center' option for vertical slide centering.
Closes #7104.
2021-02-20 10:17:31 -08:00
John MacFarlane
36e745c678 Benchmark improvements.
+ Run writer benchmarks for binary formats too.
+ Alphabetize benchmarks.
+ Don't run benchmarks for bibliography formats
  (yet; we need a special input for them).
2021-02-20 00:28:10 -08:00
John MacFarlane
13847267e9 HTML reader: efficiency improvements.
Do a lookahead to find the right parser to use.

Benchmarks from 34ms to 23ms, with less allocation.
Also speeds up the epub reader.
2021-02-20 00:07:38 -08:00
John MacFarlane
fc335801ef MANUAL: block-level formatting is not allowed in line blocks.
Closes #7107.
2021-02-19 10:22:54 -08:00
John MacFarlane
b745bf3938 make bench: compare against a baseline, use datestamps for bench results. 2021-02-19 10:22:54 -08:00
Lorenzo
d68bbae552 Update default ODT style
As of now, the default style for ODT documents has a "First paragraph" style that inherits from "Standard" style and has no top or bottom margin. All subsequent paragraphs have "Text_20_body" style that inherits from "Standard" and add "0.0598in" margins on top and bottom. This makes the final document a bit ugly since the first paragraph has a small gap ("0.0598in") towards the second one, and all subsequent have double that.

The proposed fix makes "First paragraph" inherit from "Text_20_body" instead so that it also has a consistent margin.

Another approach would be to inherit "Text_20_body" and add a 0 margin on top.
2021-02-19 10:18:49 -08:00
John MacFarlane
5eedbb6e8e Clarify tex_math_dollars extension.
Note that no blank lines are allowed between the delimiters
in display math.
2021-02-19 09:22:17 -08:00
John MacFarlane
b2b32d9bb2 'make bench': Create csv files for comparison. 2021-02-18 23:22:18 -08:00
John MacFarlane
98d26c2345 DocBook, JATS, OPML readers: performance optimization.
With the new XML parser, we can avoid the expensive tree
normalization step we used to do.

This gives a significant speed boost in docbook and JATS
parsing (e.g. 9.7 to 6 ms).
2021-02-18 21:24:31 -08:00
John MacFarlane
ef642e2bbc T.P.XML Improve fromEntities. 2021-02-18 18:11:27 -08:00
John MacFarlane
0f5c56dfb1 T.P.PDF: disable smart when building PDF via LaTeX.
This is to prevent accidental creation of ligatures like
`` ?` `` and `` !` `` (especially in languages with quotations
like German), and similar ligature issues.

See jgm/citeproc#54.
2021-02-18 17:11:53 -08:00
John MacFarlane
005344fb18 Revert "LaTeX template: disable ` ? ` and ! `` ligatures."
This reverts commit 24d7cd539b.
2021-02-18 17:03:11 -08:00
John MacFarlane
24d7cd539b LaTeX template: disable ` ? ` and ! `` ligatures.
These are often triggered by accident in languagegs that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
John MacFarlane
53cf8295a4 LaTeX writer: adjust hypertargets to beginnings of paragraphs.
Use `\vadjust pre` so that the hypertarget takes you to the
beginning of the paragraph rather than one line down.

Closes #7078.

This makes a particular difference for links to citations
using `--citeproc` and `link-citations: true`.
2021-02-18 14:34:38 -08:00
John MacFarlane
9e728b40f3 T.P.Shared: cleanup.
Cleanup up some functions and added deprecation pragmas
to funtions no longer used in the code base.
2021-02-18 13:12:15 -08:00
Albert Krewinkel
743f7216de
Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows to list multiple citations separated by comma.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
Dmitrii Kovanikov
ef741f3842 Allow base64-bytestring-1.2.* 2021-02-18 18:07:23 +01:00
John MacFarlane
73add05789 Docx reader: use Map instead of list for Namespaces.
This gives a speedup of about 5-10%.

The reader is now approximately twice as fast as in the last
release.
2021-02-17 09:54:39 -08:00
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
4af378702a Add orig copyright/license info for code derived from xml-light. 2021-02-16 18:44:38 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms | 5  ms  | 4 ms  |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
b5b576184c
JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358
JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
Albert Krewinkel
8621ed600a
T.P.Error: remove unused variables 2021-02-14 15:49:12 +01:00
Albert Krewinkel
1942dc5611
Allow tasty 1.4.* 2021-02-14 14:43:32 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The tasks lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
2d60a5127c T.P.Shared: export handleTaskListItem. [API change] 2021-02-13 13:00:37 -08:00
John MacFarlane
6323250bad LaTeX reader: remove unnecessary line 2021-02-13 00:22:22 -08:00
John MacFarlane
25b7df7c2a Remove Ext_fenced_code_attributes from allowed commonmark attributes.
This attribute was listed as allowed, but it didn't actually
do anything. Use `attributes` for code attributes and more.

Closes #7097.
2021-02-13 00:18:40 -08:00
John MacFarlane
1954e894b4 Clean up benchmark code.
Now we can do patterns using `-p blah'.
2021-02-13 00:14:49 -08:00
John MacFarlane
eb0c63b002 Avoid an unnecessary withRaw. 2021-02-12 19:29:48 -08:00
John MacFarlane
d9322629a3 LaTeX reader improvements.
* Rewrote `withRaw` so it doesn't rely on fragile assumptions
  about token positions (which break when macros are expanded).
  This requires the addition of `sEnableWithRaw` and `sRawTokens`
  in `LaTeXState`, and a new combinator `disablingWithRaw` to
  disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
  up the new raw token collecting.
* These changes slightly increase allocations and have a small
  performance impact, but it's minor.

Closes #7092.
2021-02-12 19:04:14 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00