pandoc

Author	SHA1	Message	Date
John MacFarlane	7e38b8e55a	T.P.Readers.LaTeX: Don't export tokenize, untokenize. [API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).	2021-02-28 22:53:42 -08:00
John MacFarlane	2463fbf61d	LaTeX writer: use function instead of map for accent lookup.	2021-02-28 21:43:11 -08:00
John MacFarlane	d2bb0c7c8d	Factor out T.P.Readers.LaTeX.Math.	2021-02-28 21:05:25 -08:00
John MacFarlane	36456070c4	Fix bug in last commit.	2021-02-28 15:36:46 -08:00
John MacFarlane	7229d068c9	Markdown reader efficiency improvements. Benchmarks show that these make the reader 13-17% faster, depending on extensions.	2021-02-28 15:18:31 -08:00
John MacFarlane	cc543cf5b6	LaTeX reader: another small efficiency improvement.	2021-02-28 14:34:04 -08:00
John MacFarlane	f6cf03857b	LaTeX reader efficiency improvements. In conjunction with other changes this makes the reader almost twice as fast on our benchmark as it was on Feb. 10.	2021-02-28 12:52:41 -08:00
John MacFarlane	564c39beef	Move setDefaultLanguage to T.P.Readers.LaTeX.Lang.	2021-02-28 09:49:34 -08:00
John MacFarlane	5e571d9635	LaTeX reader: remove two unnecessary parsers in inline. These are handled anyway by regularSymbol.	2021-02-28 09:39:01 -08:00
John MacFarlane	2faa57e8e9	Factor out T.P.Readers.LaTeX.Citation.	2021-02-28 09:12:09 -08:00
John MacFarlane	08231f5cdd	Factor out T.P.Readers.LaTeX.Table.	2021-02-27 21:40:56 -08:00
John MacFarlane	925815bb33	Split off T.P.Readers.LaTeX.Accent. To help reduce memory demands compiling the main LaTeX reader.	2021-02-27 17:02:44 -08:00
Albert Krewinkel	3327b225a1	Lua: use strict evaluation when retrieving AST value from the stack. Fixes: #6674	2021-02-27 21:57:12 +01:00
Salim B	fae6a204f1	Fix/update URLs and use HTTPS where possible (#7122)	2021-02-26 17:56:04 -08:00
John MacFarlane	f0a991a22b	T.P.CSV: fix parsing of unquoted values. Previously we didn't allow unescaped quotes in unquoted values, but they are allowed. Closes #7112.	2021-02-22 21:18:04 -08:00
John MacFarlane	d30791a381	Fall back to latin1 if UTF-8 decoding fails... ...when handling URL argument served with no charset in the mime type. The assumption is that most pages that don't specify a charset in the mime type are either UTF-8 or latin1. I think that's a good assumption, though I'm not sure.	2021-02-22 14:17:22 -08:00
John MacFarlane	5a73c5d3f8	When downloading content from URL arguments, be sensitive to... the character encoding. We can properly handle UTF-8 and latin1 (ISO-8859-1); for others we raise an error. See #5600.	2021-02-22 14:01:10 -08:00
John MacFarlane	bafccd5aa2	T.P.Error: Add PandocUnsupportedCharsetError constructor... ...for PandocError. [API change]	2021-02-22 14:01:04 -08:00
John MacFarlane	4617f229ea	Text.Pandoc.MIME: add exported function getCharset. [API change]	2021-02-22 13:28:47 -08:00
John MacFarlane	80fde18fb1	Text.Pandoc.UTF8: change IO functions to return Text, not String. [API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.	2021-02-22 11:30:07 -08:00
John MacFarlane	2b37ed9f21	LaTeX reader: further optimizations in satisfyTok. Benchmarks show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.	2021-02-21 11:30:17 -08:00
John MacFarlane	db4f882315	LaTeX reader: removed sExpanded in state. This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.	2021-02-21 11:24:04 -08:00
John MacFarlane	f43cb5ddcf	LaTeX reader: further performance optimization. Avoid unnecessary 'doMacros'.	2021-02-21 10:58:42 -08:00
John MacFarlane	c0c8865eaa	HTML reader: small performance tweak.	2021-02-20 23:40:02 -08:00
John MacFarlane	d8ef383692	T.P.Shared: remove some obsolete functions [API change]. Removed: - `splitByIndices` - `splitStringByIndicies` - `substitute` - `underlineSpan` None of these are used elsewhere in the code base.	2021-02-20 23:02:10 -08:00
John MacFarlane	321343b2cf	HTML reader: small efficiency improvements. Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed. The functions isInlineTag and isBlockTag are no longer polymorphic.	2021-02-20 22:49:20 -08:00
John MacFarlane	cec541e54c	LaTeX reader: Another small improvement to macro handling.	2021-02-20 22:14:31 -08:00
John MacFarlane	31b8f60ea8	LaTeX reader: avoid macro resolution code if no macros defined.	2021-02-20 22:03:29 -08:00
John MacFarlane	0f955b10b4	T.P.Readers.LaTeX.Parsing: improve braced'. Remove the parameter, have it parse the opening brace, and make it more efficient.	2021-02-20 18:57:46 -08:00
John MacFarlane	13847267e9	HTML reader: efficiency improvements. Do a lookahead to find the right parser to use. Benchmarks from 34ms to 23ms, with less allocation. Also speeds up the epub reader.	2021-02-20 00:07:38 -08:00
John MacFarlane	98d26c2345	DocBook, JATS, OPML readers: performance optimization. With the new XML parser, we can avoid the expensive tree normalization step we used to do. This gives a significant speed boost in docbook and JATS parsing (e.g. 9.7 to 6 ms).	2021-02-18 21:24:31 -08:00
John MacFarlane	ef642e2bbc	T.P.XML: Improve fromEntities.	2021-02-18 18:11:27 -08:00
John MacFarlane	0f5c56dfb1	T.P.PDF: disable `smart` when building PDF via LaTeX. This is to prevent accidental creation of ligatures like `` ?` `` and `` !` `` (especially in languages with quotations like German), and similar ligature issues. See jgm/citeproc#54.	2021-02-18 17:11:53 -08:00
John MacFarlane	53cf8295a4	LaTeX writer: adjust hypertargets to beginnings of paragraphs. Use `\vadjust pre` so that the hypertarget takes you to the beginning of the paragraph rather than one line down. Closes #7078. This makes a particular difference for links to citations using `--citeproc` and `link-citations: true`.	2021-02-18 14:34:38 -08:00
John MacFarlane	9e728b40f3	T.P.Shared: cleanup. Cleaned up some functions and added deprecation pragmas to functions no longer used in the code base.	2021-02-18 13:12:15 -08:00
Albert Krewinkel	743f7216de	Org reader: fix bug in org-ref citation parsing. The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101	2021-02-18 21:59:18 +01:00
John MacFarlane	73add05789	Docx reader: use Map instead of list for Namespaces. This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.	2021-02-17 09:54:39 -08:00
John MacFarlane	80a1d5c9b6	Revert "Add T.P.XML.Light.Cursor." This reverts commit `d8fc497186`.	2021-02-16 19:18:01 -08:00
John MacFarlane	d8fc497186	Add T.P.XML.Light.Cursor.	2021-02-16 18:51:41 -08:00
John MacFarlane	4af378702a	Add orig copyright/license info for code derived from xml-light.	2021-02-16 18:44:38 -08:00
John MacFarlane	d7a4996b1e	Split up T.P.XML.Light into submodules.	2021-02-16 18:40:06 -08:00
John MacFarlane	967e7f5fb9	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light... ...and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to `8ca191604d` (Feb 8); B = as of `8ca191604d` (Feb 8); C = this commit. \| Reader \| A \| B \| C \| \| ------- \| ----- \| ------ \| ----- \| \| docbook \| 18 ms \| 12 ms \| 10 ms \| \| opml \| 65 ms \| 62 ms \| 35 ms \| \| jats \| 15 ms \| 11 ms \| 9 ms \| \| docx \| 72 ms \| 69 ms \| 44 ms \| \| odt \| 78 ms \| 41 ms \| 28 ms \| \| epub \| 64 ms \| 61 ms \| 56 ms \| \| fb2 \| 14 ms \| 5 ms \| 4 ms \|	2021-02-16 16:55:20 -08:00
Albert Krewinkel	8621ed600a	T.P.Error: remove unused variables	2021-02-14 15:49:12 +01:00
John MacFarlane	d84a6041e1	HTML reader: fix bad handling of empty src attribute in iframe. - If src is empty, we simply skip the iframe. - If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error. - Closes #7099.	2021-02-13 13:08:34 -08:00
John MacFarlane	6e73273916	T.P.Error: export `renderError`. Refactor `handleError` to use `renderError`. This allows us to render error messages without exiting.	2021-02-13 13:08:34 -08:00
Albert Krewinkel	a3beed9db8	Org: support task_lists extension. The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336	2021-02-13 13:00:37 -08:00
Albert Krewinkel	2d60a5127c	T.P.Shared: export `handleTaskListItem`. [API change]	2021-02-13 13:00:37 -08:00
John MacFarlane	6323250bad	LaTeX reader: remove unnecessary line	2021-02-13 00:22:22 -08:00
John MacFarlane	25b7df7c2a	Remove Ext_fenced_code_attributes from allowed commonmark attributes. This attribute was listed as allowed, but it didn't actually do anything. Use `attributes` for code attributes and more. Closes #7097.	2021-02-13 00:18:40 -08:00
John MacFarlane	eb0c63b002	Avoid an unnecessary withRaw.	2021-02-12 19:29:48 -08:00

7312 commits