pandoc

Author	SHA1	Message	Date
Tristan Stenner	2444cbc668	Docx writer: add IDs to native_numbering test	2021-10-29 08:40:20 -07:00
Tristan Stenner	31d6a494de	Update test golden master for docx native numbering	2021-10-29 08:40:20 -07:00
Milan Bracke	465c28d28e	Docx reader: fix handling of empty fields Some fields only have an instrText and no content, Pandoc didn't understand these, causing other fields to be misunderstood because it seemed like a field was still open when it wasn't.	2021-10-18 19:15:40 -07:00
Milan Bracke	6acc82c5d2	Docx parser: implement PAGEREF fields These fields, often used in tables of contents, can be a hyperlink.	2021-10-18 19:15:40 -07:00
Milan Bracke	193f6bfeba	Docx reader: fix handling of nested fields Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field. To fix this issue, fields needed to be considered containing ParParts instead of Runs, since a Run can't represent complex enough structures. This also impacted Hyperlinks since they can originate from a field.	2021-10-18 19:15:40 -07:00
Milan Bracke	0f98cbff4b	Avoid blockquote when parent style has more indent When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.	2021-10-10 16:27:32 -07:00
Ezwal	472b33095e	Docx reader: Add placeholder for word diagram	2021-09-30 12:44:44 -07:00
John MacFarlane	6271b09c50	Docx writer: make id used in native_numbering predictable. If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. Closes #7551. This allows one to create a filter that adds a figure number with figure name, e.g. <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple> For this to be possible it must be possible to predict the figure number id from the image id. If images lack an id, an id of the form `ref_fig1` is used.	2021-09-12 15:30:29 -07:00
John MacFarlane	e4d7a6177f	Ensure we have unique ids for wp:docPr and pic:cNvPr elements. This will, I hope, fix #7527 and #7503.	2021-08-27 09:42:59 -07:00
John MacFarlane	0948af9cc5	Docx writer: Add table numbering for captioned tables. The numbers are added using fields, so that Word can create a list of tables that will update automatically.	2021-06-29 11:15:40 -07:00
John MacFarlane	a3d745e485	Docx writer: support figure numbers. These are set up in such a way that they will work with Word's automatic table of figures. Closes #7392.	2021-06-29 09:56:21 -07:00
Emily Bourke	56b211120c	Docx reader: Support new table features. * Column spans * Row spans - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the value has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error. * Allow multiple header rows * Include table description in simple caption - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?) * Detect table captions - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely. * Update tests and add new ones Partially fixes: #6316	2021-05-28 20:15:23 +02:00
Emily Bourke	44484d0dee	Docx reader: Read table column widths.	2021-05-28 20:15:23 +02:00
John MacFarlane	0a4c6925b6	Docx writer: copy over more settings from reference.docx. From settings.xml in the reference-doc, we now include: `zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`, `drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`, `displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`, `characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`, `decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`. Closes #7240.	2021-05-15 15:40:49 -07:00
John MacFarlane	2cf971cf56	docx writer: Remove rsids from settings.docx. Word will add these when revisions are made. But it's pointless to start out with a set of them.	2021-05-15 10:54:05 -07:00
Albert Krewinkel	17d96404f5	Docx writer: allow multirow table headers	2021-05-14 16:19:20 +02:00
John MacFarlane	5eb7ad7d1e	Improve integration of settings from reference.docx. The settings we can carry over from a reference.docx are autoHyphenation, consecutiveHyphenLimit, hyphenationZone, doNotHyphenateCap, evenAndOddHeaders, and proofState. Previously this was implemented in a buggy way, so that the reference doc's values AND the new values were included. This change allows users to create a reference.docx that sets w:proofState for spelling or grammar to "dirty," so that spell/grammar checking will be triggered on the generated docx. Closes #1209.	2021-05-11 22:31:38 -06:00
Albert Krewinkel	ddbf83f62c	Docx writer: support colspans and rowspans in tables See: #6315	2021-05-01 18:52:24 +02:00
mbrackeantidot	b6a65445e1	Docx reader: add handling of vml image objects (jgm#4735) (#7257 ) They represent images, the same way as other images in vml format.	2021-04-29 09:11:44 -07:00
Albert Krewinkel	0921b82d98	Docx writer: autoset table width if no column has an explicit width.	2021-04-27 13:27:20 +02:00
John MacFarlane	c3f9e8c122	Docx writer: make nsid in abstractNum deterministic. Previously we assigned a random number (though in a deterministic way). But changes in the random package mean we get different results now on different architectures, even with the same random seed. We don't need random values; so now we just assign a value based on the list number id, which is guaranteed to be unique to the list marker.	2021-03-17 22:31:20 -07:00
John MacFarlane	eed18d231c	Use integral values for w:tblW in docx. Closes #7141.	2021-03-13 12:05:52 -08:00
John MacFarlane	967e7f5fb9	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light... ..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to `8ca191604d` (Feb 8) B = as of `8ca191604d` (Feb 8) C = this commit \| Reader \| A \| B \| C \| \| ------- \| ----- \| ------ \| ----- \| \| docbook \| 18 ms \| 12 ms \| 10 ms \| \| opml \| 65 ms \| 62 ms \| 35 ms \| \| jats \| 15 ms \| 11 ms \| 9 ms \| \| docx \| 72 ms \| 69 ms \| 44 ms \| \| odt \| 78 ms \| 41 ms \| 28 ms \| \| epub \| 64 ms \| 61 ms \| 56 ms \| \| fb2 \| 14 ms \| 5 ms \| 4 ms \|	2021-02-16 16:55:20 -08:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
John MacFarlane	c451207b08	Docx writer: handle table header using styles. Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.	2021-01-12 09:49:10 -08:00
Albert Krewinkel	00031fc809	Docx writer: keep raw openxml strings verbatim. Closes: #6933	2020-12-13 14:09:59 +01:00
John MacFarlane	5bbd5a9e80	Docx writer: Support bold and italic in "complex script." Previously bold and italics didn't work properly in LTR text. This commit causes the w:bCs and w:iCs attributes to be used, in addition to w:b and w:i, for bold and italics respectively. Closes #6911.	2020-12-03 09:51:23 -08:00
cholonam	5f4deb5455	Docx writer: Fix bullets/lists indentation Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.	2020-11-26 12:11:26 -08:00
Diego Balseiro	eda5540719	DOCX reader: Allow empty dates in comments and tracked changes (#6726 ) For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests	2020-10-06 21:03:00 -07:00
Michael Hoffmann	74bd5a4f47	Docx writer: better handle list items whose contents are lists (#6522 ) If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.	2020-10-02 09:30:05 -07:00
John MacFarlane	93e3d463fd	Docx writer: separate adjacent tables. Word combines adjacent tables, so to prevent this we insert an empty paragraph between two adjacent tables. Closes #4315.	2020-08-24 09:31:39 -07:00
John MacFarlane	1e84178431	Docx writer: support --number-sections. Closes #1413.	2020-07-22 11:53:31 -07:00
Nikolay Yakimov	48cef91d18	[Docx Reader] Refactor/update smushInlines	2020-07-07 09:04:38 +03:00
John MacFarlane	5a20cc07dd	Docx writer: enable column and row bands for tables. This change will not have any effect with the default style. However, it enables users to use a style (via a reference.docx) that turns on row and/or column bands. Closes #6371.	2020-05-16 15:50:59 -07:00
Vaibhav Sagar	9c2b659eeb	Support new Underline element in readers and writers (#6277 ) Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.	2020-04-28 07:53:06 -07:00
despresc	c7814f31e1	Use the new builders, modify readers to preserve empty headers The Builder.simpleTable now only adds a row to the TableHead when the given header row is not null. This uncovered an inconsistency in the readers: some would unconditionally emit a header filled with empty cells, even if the header was not present. Now every reader has the conditional behaviour. Only the XWiki writer depended on the header row being always present; it now pads its head as necessary.	2020-04-15 23:03:22 -04:00
despresc	d368536a4e	Adapt to the removal of the RowSpan, ColSpan, RowHeadColumns accessors	2020-04-15 23:03:22 -04:00
despresc	4e34d366df	Adapt to the newest Table type, fix some previous adaptation issues - Writers.Native is now adapted to the new Table type. - Inline captions should now be conditionally wrapped in a Plain, not a Para block. - The toLegacyTable function now lives in Writers.Shared.	2020-04-15 23:03:22 -04:00
despresc	7254a2ae0b	Implement the new Table type	2020-04-15 23:03:22 -04:00
John MacFarlane	41d1ae0fdd	Change styles in reference.docx. All headings now have a uniform color. Level-1 headings no longer set `w:themeShade="B5"`. Level-2 headings are now 14 point rather than 16 point. Level-3 headings are now 12 point rather than 14 point. Level-4 headings are italic rather than bold. Closes #5820.	2019-11-16 09:48:05 -08:00
John MacFarlane	e8de53ce4a	Change reference.docx to use more normal block quotes. Indented left and right, same font and size. Previously it was unindented, smaller font and different typeface. See #5820.	2019-11-14 22:20:58 -08:00
John MacFarlane	530bfe5f5a	Docx reader: fix list number resumption for sublists. Closes #4324 . The first list item of a sublist should not resume numbering from the number of the last sublist item of the same level, if that sublist was a sublist of a different list item. That is, we should not get: ``` 1. one 1. sub one 2. sub two 2. two 3. sub one ```	2019-11-03 12:54:42 -08:00
Nikolay Yakimov	5c5d1a65d9	[Docx Reader] Update tests Notice this commit updates lists.docx. The old test file contained references to "ListParagraph" style, which should never leak outside of pandoc, so I'm not sure what that was supposed to test for exactly.	2019-09-21 11:37:21 -07:00
Nikolay Yakimov	c113ca6717	[Docx Reader] Use style names, not ids, for assigning semantic meaning Motivating issues: #5523, #5052, #5074 Style name comparisons are case-insensitive, since those are case-insensitive in Word. w:styleId will be used as style name if w:name is missing (this should only happen for malformed docx and is kept as a fallback to avoid failing altogether on malformed documents) Block quote detection code moved from Docx.Parser to Readers.Docx Code styles, i.e. "Source Code" and "Verbatim Char" now honor style inheritance Docx Reader now honours "Compact" style (used in Pandoc-generated docx). The side-effect is that "Compact" style no longer shows up in docx+styles output. Styles inherited from "Compact" will still show up. Removed obsolete list-item style from divsToKeep. That didn't really do anything for a while now. Add newtypes to differentiate between style names, ids, and different style types (that is, paragraph and character styles) Since docx style names can have spaces in them, and pandoc-markdown classes can't, anywhere when style name is used as a class name, spaces are replaced with ASCII dashes `-`. Get rid of extraneous intermediate types, carrying styleId information. Instead, styleId is saved with other style data. Use RunStyle for inline style definitions only (lacking styleId and styleName); for Character Styles use CharStyle type (which is basically RunStyle with styleId and StyleName bolted onto it).	2019-09-21 11:18:15 -07:00
Ben Steinberg	7389919bb4	Preserve built-in styles in DOCX with custom style (#5670 ) This commit prevents custom styles on divs and spans from overriding styles on certain elements inside them, like headings, blockquotes, and links. On those elements, the "native" style is required for the element to display correctly. This change also allows nesting of custom styles; in order to do so, it removes the default "Compact" style applied to Plain blocks, except when inside a table.	2019-09-20 22:13:29 -07:00
Agustín Martín Barbero	bd69218451	Change order of ilvl and numId in document.xml (#5647 ) Workaround for Word Online shortcoming. Fixes #5645 Also, make list para properties go first. This reordering of properties shouldn't be necessary but it seems Word Online does not understand the docx correctly otherwise.	2019-07-19 09:32:43 -07:00
John MacFarlane	66e5f0ff8d	Docx writer: Use w:br without attributes for line breaks. We previously added the attribute `type="textWrapping"`, but this causes problems on Word Online. Closes #5377.	2019-03-21 09:28:16 -07:00
John MacFarlane	b7cbd7b8c9	docx writer: avoid extra copy of abstractNum and num elements... ...in numbering.xml. This caused pandoc-produced docx files to be uneditable using Word Online. The problem was that recent versions of reference.docx include samples of various kinds of text, including lists. The numbering elements for these were getting copied over to the new docx, where they clashed with the autogenerated elements produced by pandoc. This didn't confuse Desktop Word, but it did confuse Word Online. Closes #5358.	2019-03-11 22:09:21 -07:00
Jesse Rosenthal	83d2a5131d	Docx reader tests: fix test file with trailing space. This failed due to the fix of #5273.	2019-02-18 15:49:36 -05:00
Jesse Rosenthal	9a1a3fe482	Docx reader: add tests for trimming last inline.	2019-02-18 15:49:00 -05:00


84 commits