42 commits

Author SHA1 Message Date
Nikolay Yakimov
5c5d1a65d9 [Docx Reader] Update tests
Notice this commit updates lists.docx. The old test file contained
references to "ListParagraph" style, which should never leak
outside of pandoc, so I'm not sure what that was supposed to test
for exactly.
2019-09-21 11:37:21 -07:00
Nikolay Yakimov
c113ca6717 [Docx Reader] Use style names, not ids, for assigning semantic meaning
Motivating issues: #5523, #5052, #5074

Style name comparisons are case-insensitive, since style names are
case-insensitive in Word.

w:styleId will be used as the style name if w:name is missing (this should
only happen for malformed docx and is kept as a fallback to avoid
failing altogether on malformed documents).

Block quote detection code moved from Docx.Parser to Readers.Docx.

Code styles, i.e. "Source Code" and "Verbatim Char", now honor style
inheritance.

Docx Reader now honours the "Compact" style (used in pandoc-generated docx).
As a side effect, the "Compact" style no longer shows up in
docx+styles output. Styles inherited from "Compact" will still
show up.

Removed obsolete list-item style from divsToKeep. It hadn't
really done anything for a while.

Add newtypes to differentiate between style names, ids, and
different style types (that is, paragraph and character styles).

Since docx style names can have spaces in them, and pandoc-markdown
classes can't, wherever a style name is used as a class name,
spaces are replaced with ASCII dashes `-`.

Get rid of extraneous intermediate types carrying styleId information.
Instead, styleId is saved with other style data.

Use RunStyle for inline style definitions only (lacking styleId and styleName);
for Character Styles use CharStyle type (which is basically RunStyle with styleId
and styleName bolted onto it).
2019-09-21 11:18:15 -07:00
Ben Steinberg
7389919bb4 Preserve built-in styles in DOCX with custom style (#5670)
This commit prevents custom styles on divs and spans from overriding
styles on certain elements inside them, like headings, blockquotes,
and links. On those elements, the "native" style is required for the
element to display correctly. This change also allows nesting of
custom styles; in order to do so, it removes the default "Compact"
style applied to Plain blocks, except when inside a table.
2019-09-20 22:13:29 -07:00
Agustín Martín Barbero
bd69218451 Change order of ilvl and numId in document.xml (#5647)
Workaround for a Word Online shortcoming. Fixes #5645.

Also, make list para properties go first.

This reordering of properties shouldn't be necessary but
it seems Word Online does not understand the docx correctly otherwise.
2019-07-19 09:32:43 -07:00
John MacFarlane
66e5f0ff8d Docx writer: Use w:br without attributes for line breaks.
We previously added the attribute `type="textWrapping"`, but
this causes problems on Word Online.

Closes #5377.
2019-03-21 09:28:16 -07:00
John MacFarlane
b7cbd7b8c9 docx writer: avoid extra copy of abstractNum and num elements...
...in numbering.xml.  This caused pandoc-produced docx files to
be uneditable using Word Online.

The problem was that recent versions of reference.docx include
samples of various kinds of text, including lists.  The
numbering elements for these were getting copied over to
the new docx, where they clashed with the autogenerated
elements produced by pandoc.  This didn't confuse Desktop
Word, but it did confuse Word Online.

Closes #5358.
2019-03-11 22:09:21 -07:00
Jesse Rosenthal
83d2a5131d Docx reader tests: fix test file with trailing space.
This failed due to the fix of #5273.
2019-02-18 15:49:36 -05:00
Jesse Rosenthal
9a1a3fe482 Docx reader: add tests for trimming last inline.
2019-02-18 15:49:00 -05:00
Jesse Rosenthal
332e2ba5b6 Docx reader: Add test for reading sdts in footnotes.
2019-02-12 17:26:37 -05:00
Jesse Rosenthal
1847bdbb83 Docx reader: Tests for alternate document.xml
2019-02-06 21:14:46 -05:00
Agustín Martín Barbero
9894d05fe3 Improve writing metadata for docx, pptx and odt (#5252)
* docx writer: support custom properties.  Solves the writer part of #3024.
  Also supports additional core properties:  `subject`, `lang`, `category`,
  `description`.

* odt writer: improve standard properties, including the following core properties:
  `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`,
  `initial-creator` (from authors), `creation-date` (actual creation date).
  Also fix date.

* pptx writer: support custom properties.  Also supports additional core
  properties: `subject`, `category`, `description`.

* Includes golden tests.

* MANUAL: document metadata support for docx, odt, pptx writers
2019-01-26 16:14:35 -08:00
Jesse Rosenthal
0f736d778f Docx: add test for lists with level overrides.
2018-12-10 19:24:56 -05:00
John MacFarlane
d333c283cc Docx writer: Fix bookmarks to headers with long titles.
Word has a 40-character limit for bookmark names.  In
addition, bookmarks must begin with a letter.  Since
pandoc's auto-generated identifiers may not respect
these constraints, some internal links did not work.

With this change, pandoc uses a bookmark name based
on the SHA1 hash of the identifier when the identifier
isn't a legal bookmark name.

Closes #5091.
2018-11-20 23:43:21 -05:00
John MacFarlane
30033f417f Docx writer: added framework for custom properties.
So far, we don't actually write any custom properties,
but we have the infrastructure to add this.

See #3034.
2018-10-09 10:38:50 -07:00
John MacFarlane
40603dd4cd Support underline in docx writer.
Updated golden test and confirmed validity of file.

Closes #4633.
2018-05-08 10:17:51 -07:00
John MacFarlane
97916f0881 Remove nonfree ICC profiles from thumbnails in test docx files.
Closes #4588.
2018-04-25 17:00:21 -07:00
Jesse Rosenthal
c5d8fab058 Docx reader tests: Test for combining adjacent code blocks.
2018-04-17 09:29:54 -04:00
John MacFarlane
7e99178a09 Changes to tests to accommodate changes in pandoc-types.
In https://github.com/jgm/pandoc-types/pull/36 we changed
the table builder to pad cells.  This commit changes tests
(and two readers) to accord with this behavior.
2018-04-05 10:14:06 -07:00
Jesse Rosenthal
85a65c6a51 Docx reader: add tests for nested smart tags.
2018-03-13 22:16:54 -04:00
Jesse Rosenthal
7d3e7a5a6d Docx reader: Handle nested sdt tags.
Previously we had only unwrapped one level of sdt tags. Now we recurse
if we find them.

Closes: #4415
2018-02-28 16:32:20 -05:00
Jesse Rosenthal
5ada5cceac Docx reader: Don't look up dependent run styles if +styles is enabled.
It makes more sense not to interpret -- otherwise using the original
document as the reference-doc would produce two of everything: the
interpreted version and the uninterpreted style version.
2018-02-23 14:35:30 -05:00
laptop1\Andrew
aadac3c891 Docx test: adjust test for fix of bug
This commit adjusts the test cases for the Docx writer after the fix of #3930.

- Adjusted test cases with inline images. The inline images now have the correct sizing, title and description.
- Modified the test case to include an image multiple times with different sizing each time.
- Tested on Windows 8.1 with Word 2007 (12.0.6705.5000). The files are not corrupted and display exactly what is expected.
2018-02-23 11:50:33 -05:00
Jesse Rosenthal
8b7df2d915 Docx reader: Move pandoc inline styling inside custom-style span
Previously Emph, Strong, etc. were outside the custom-style span. This
moves them inside in order to make it easier to write filters that act
on the formatting in these contents.

Tests and MANUAL example are changed to match.
2018-02-22 13:41:02 -05:00
Jesse Rosenthal
87e0728b87 Docx reader: Avoid repeated spans in custom styles.
The previous commit had a bug where custom-style spans would be read
with every recursion. This fixes that, and changes the example given
in the manual.
2018-02-22 13:27:34 -05:00
Jesse Rosenthal
ffcecfacb1 Docx reader tests: test custom style extension.
2018-02-22 13:05:44 -05:00
danse
e6ff7f7986 Docx reader: Pick table width from the longest row or header
This change is intended to preserve as much of the table content as
possible.

Closes #4360
2018-02-15 15:06:01 -05:00
Jesse Rosenthal
ebcd04f57a Docx writer tests: Add tests for custom styles
2018-01-27 11:46:41 -05:00
Jesse Rosenthal
b3449a84aa Docx writer tests: Use new golden framework
These are based off the reader tests, with some removed (where the
reader output was identical, based on different docx inputs). There
are still more to be added. In particular, tests for custom-styles
need to be added.

All golden docx files have been checked in MS Word
2013 (Windows). There is no corruption.

There is questionable output in the `tables` test: the three tables
seemed to be joined. This will be addressed in a future commit, and
the golden docx file will be changed.
2018-01-27 08:08:25 -05:00
Jesse Rosenthal
004f60bf26 Docx reader: Add test for hyperlinks in instrText tag
This is difficult to recreate with a modern version of Word, so I'm
using the file submitted with the bug report. It would be preferable
to find a smaller example with Latin characters, though, so as not to
confuse the issue being tested.
2018-01-16 13:22:02 -05:00
Jesse Rosenthal
a5b71a3c7f Docx reader: Add tests for paragraph insertion/deletion.
2018-01-02 11:32:48 -05:00
Jesse Rosenthal
3f30455b49 Docx reader: tests for overlapping targets (anchor spans).
2017-12-31 09:36:42 -05:00
Jesse Rosenthal
836153de43 Docx Reader: Combine adjacent anchors.
There isn't any reason to have numerous anchors in the same place,
since we can't maintain docx's non-nesting overlapping. So we reduce
to a single anchor, and have all links pointing to one of the
overlapping anchors point to that one. This changes the behavior from
commit e90c714c7 slightly (use the first anchor instead of the last)
so we change the expected test result.

Note that because this produces a state that has to be set after every
invocation of `parPartToInlines`, we make the main function into a
primed subfunction `parPartToInlines'`, and make `parPartToInlines` a
wrapper around that.
2017-12-31 09:29:51 -05:00
Jesse Rosenthal
475b0dcb66 Docx reader: tests for removing unused anchors.
2017-12-30 22:43:33 -05:00
Jesse Rosenthal
4fc3f51186 Docx reader: Read multiple children of `w:sdtContents`
Previously we had only read the first child of an sdtContents tag. Now
we replace sdt with all children of the sdtContents tag.

This changes the expected test result of our nested_anchors test,
since now we read docx's generated TOCs.
2017-12-30 08:21:42 -05:00
Jesse Rosenthal
d71165c8e2 Docx reader: add tests for structured document tags unwrapping.
2017-12-27 10:03:00 -05:00
Jesse Rosenthal
440533643e Docx writer: Add tests for list continuation.
2017-12-13 15:16:44 -05:00
John MacFarlane
ae60e0196c Add empty_paragraphs extension.
* Deprecate `--strip-empty-paragraphs` option.  Instead we now
  use an `empty_paragraphs` extension that can be enabled on
  the reader or writer.  By default, disabled.

* Add `Ext_empty_paragraphs` constructor to `Extension`.

* Revert "Docx reader: don't strip out empty paragraphs."
  This reverts commit d6c58eb836.

* Implement `empty_paragraphs` extension in docx reader and writer,
  opendocument writer, html reader and writer.

* Add tests for `empty_paragraphs` extension.
2017-12-04 14:56:57 -08:00
John MacFarlane
d6c58eb836 Docx reader: don't strip out empty paragraphs.
We now have the `--strip-empty-paragraphs` option for that,
if you want it.  Closes #2252.

Updated docx reader tests.

We use stripEmptyParagraphs to avoid changing too
many tests.  We should add new tests for empty paragraphs.
2017-12-02 16:51:31 -08:00
John MacFarlane
a2a14f9029 Removed old adjacent_links test for docx reader.
See #2270 for background -- this test blocked the consistent
underline change and was hard to revise, so for now we are
removing it.
2017-10-27 16:09:44 -07:00
hftf
7f8a3c6cb7 Consistent underline for Readers (#2270)
* Added underlineSpan builder function.  This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.

* Docx Reader: Use underlineSpan and update test

* Org Reader: Use underlineSpan and add test

* Textile Reader: Use underlineSpan and add test case

* Txt2Tags Reader: Use underlineSpan and update test

* HTML Reader: Use underlineSpan and add test case
2017-10-27 18:45:00 -04:00
Jesse Rosenthal
a67a96b932 Docx reader: Add tests for avoiding zero-level header.
2017-08-06 19:36:25 -07:00
John MacFarlane
18ab864269 Moved tests/ -> test/.
2017-02-04 12:56:30 +01:00