Commit graph

6355 commits

Author SHA1 Message Date
Jesse Rosenthal
4cce0efa48 Docx reader: Dynamically determine document.xml path.
The desktop Word program places the main document file in
"word/document.xml", but Word Online places it in
"word/document2.xml". This file path is actually stated in the root
"_rels/.rels" file, in the "Relationship" element with an
"http://../officedocument" type.

Closes #5277
2019-02-06 21:14:46 -05:00
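
The lookup described in this commit can be sketched as follows. This is a minimal illustration in Python, not pandoc's Haskell implementation: the function name `main_document_path`, the fallback default, and the placeholder type URI in the sample are assumptions, and the relationship type is matched only by its "/officedocument" suffix because the commit message does not spell out the full URI.

```python
import xml.etree.ElementTree as ET

# Namespace of relationship parts such as "_rels/.rels".
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def main_document_path(rels_xml, default="word/document.xml"):
    """Return the target of the officeDocument relationship, or the default."""
    root = ET.fromstring(rels_xml)
    for rel in root.iter(RELS_NS + "Relationship"):
        # Match on the type suffix only (case-insensitive); the full URI is
        # elided in the commit message, so it is not hard-coded here.
        if rel.get("Type", "").lower().endswith("/officedocument"):
            return rel.get("Target", default).lstrip("/")
    return default


# Sample input with a placeholder type URI (hypothetical, for illustration).
SAMPLE_RELS = """
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://example.invalid/relationships/officeDocument"
      Target="word/document2.xml"/>
</Relationships>
"""

print(main_document_path(SAMPLE_RELS))
```

Falling back to "word/document.xml" when no matching relationship is found keeps the desktop Word layout working even if the relationships file is missing the entry.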
John MacFarlane
2b003d4a6b Handle Word files generated by Microsoft Word Online.
For some reason, Word in Office 365 Online uses `document2.xml`
for the content, instead of `document.xml`.  This causes pandoc
not to be able to parse docx.

This quick fix has the parser check for both `document.xml`
and `document2.xml`.

Addresses #5277, but a more robust solution would be to
get the name of the main document dynamically (who knows
whether it might change again?).
2019-02-06 09:01:26 -08:00
Albert Krewinkel
37a82b0b11 Add missing copyright notices and remove license boilerplate (#5112)
Quite a few modules were missing copyright notices.

This commit adds copyright notices everywhere via haddock module
headers.  The old license boilerplate comment is redundant with this and has
been removed.

Update copyright years to 2019.

Closes #4592.
2019-02-04 13:52:31 -08:00
John MacFarlane
4b89311081 More carefully groom ipynb default extensions. 2019-02-04 11:11:38 -08:00
John MacFarlane
977a88f92d Add all_symbols_escapable to githubMarkdownExtensions. 2019-02-04 11:11:13 -08:00
John MacFarlane
ccf4e23ee1 Markdown reader: add newline when parsing blocks in YAML.
Otherwise last block gets parsed as a Plain rather than
a Para.

This is a regression in pandoc 2.x.  This patch restores
pandoc 1.19 behavior.

Closes #5271.
2019-02-04 10:22:02 -08:00
John MacFarlane
ca4d308b60 ipynb reader: handle images referring to attachments.
Previously we didn't strip off the attachment: prefix,
so even though the attachment is available in the mediabag,
pandoc couldn't find it.
2019-02-02 18:22:43 -08:00
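
A small sketch of the prefix handling described above, written in Python for illustration. The names `media_key` and `mediabag` are hypothetical stand-ins; the real reader stores attachments in pandoc's mediabag, which this dictionary only imitates.

```python
ATTACHMENT_PREFIX = "attachment:"


def media_key(image_src):
    """Strip a leading 'attachment:' so the name matches the stored attachment."""
    if image_src.startswith(ATTACHMENT_PREFIX):
        return image_src[len(ATTACHMENT_PREFIX):]
    return image_src


# Hypothetical stand-in for the mediabag: attachment name -> raw bytes.
mediabag = {"plot.png": b"\x89PNG"}

src = "attachment:plot.png"
print(media_key(src) in mediabag)  # lookup succeeds only once the prefix is gone
```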
John MacFarlane
b062117ef4 HTML5 writer: implement WAI-ARIA roles for (end)notes.
See #4213.
2019-02-02 16:14:58 -08:00
John MacFarlane
00cd11c6e2 Shared: withTempDir is no longer used in the codebase.
Add comment to remove it in next major release.
2019-02-02 12:36:32 -08:00
John MacFarlane
cb1ede5b08 PDF: More conservative solution to #777.
Now, instead of always creating temp dirs in the home
directory on Windows, we only do it if the system tempdir
name contains tildes.  (This will be the case for longer
usernames only.)

Closes #1192.
2019-02-02 12:35:27 -08:00
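
The decision described in this commit could look roughly like the sketch below. It is an assumption-laden illustration in Python: `temp_dir_base` and its parameters are hypothetical names, and the only rule taken from the commit message is "use the home directory on Windows when the system temp dir name contains a tilde".

```python
def temp_dir_base(system_tmp, home, is_windows):
    """Pick the directory in which the temporary build directory is created."""
    # Tildes appear when Windows shortens a long user name to an 8.3-style
    # name; only in that case fall back to the home directory.
    if is_windows and "~" in system_tmp:
        return home
    return system_tmp


print(temp_dir_base(r"C:\Users\LONGUS~1\AppData\Local\Temp",
                    r"C:\Users\longusername", True))
```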
John MacFarlane
737c0a684e PDF: use system temp dir and set TEXMFOUTPUT.
Previously the temp directory was created inside the working
directory, so that programs like epstopdf.pl would be allowed
to run in restricted mode.  However, setting TEXMFOUTPUT allows
these programs to run in the tmpdir inside the system temp
directory.

This is a better solution than cd51983.  Using the system
temp dir prevents problems when pandoc is run inside a synced
directory (e.g. dropbox).

Partially addresses #1192.
2019-02-02 11:31:29 -08:00
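
As a rough illustration of the TEXMFOUTPUT idea, the sketch below builds the environment that a TeX engine would be started with. It is written in Python rather than pandoc's Haskell, and `tex_environment` is a hypothetical helper; the only detail taken from the commit message is that TEXMFOUTPUT points at the temporary directory.

```python
import os


def tex_environment(tmpdir, base_env=None):
    """Return a copy of the environment with TEXMFOUTPUT set to tmpdir."""
    env = dict(os.environ if base_env is None else base_env)
    # Helper programs running in restricted mode may write to the directory
    # named by TEXMFOUTPUT, so the temp dir no longer has to live inside the
    # working directory.
    env["TEXMFOUTPUT"] = tmpdir
    return env


env = tex_environment("/tmp/pandoc-build", {"PATH": "/usr/bin"})
print(env["TEXMFOUTPUT"])
```

The returned dictionary would be passed as the environment of the subprocess that runs the TeX engine.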
Mauro Bieg
9225583ccf MIME: add WebP.
Fixes #5267.
2019-02-02 10:05:06 +01:00
John MacFarlane
a6e3f1c775 LaTeX writer: use right fold for escapeString.
This is more elegant than the explicit recursion
we were using.
2019-02-01 22:12:54 -08:00
John MacFarlane
f5ebe98773 LaTeX writer: code simplification in escaping. 2019-02-01 21:59:58 -08:00
John MacFarlane
20a0b4433f Markdown writer: use markdown="1" when appropriate for Divs.
When `native_divs` and `markdown_in_html_blocks` are disabled
but `raw_html` and `markdown_attribute` are enabled...
2019-02-01 21:49:02 -08:00
John MacFarlane
633a9ecfec LaTeX writer: avoid {} after control sequences when escaping.
`\ldots{}.` doesn't behave as well as `\ldots.` with the latex
ellipsis package.  This patch causes pandoc to avoid emitting
the `{}` when it is not necessary.  Now `\ldots` and other
control sequences used in escaping will be followed by either
a `{}`, a space, or nothing, depending on context.

Thanks to Elliott Slaughter for the suggestion.
2019-02-01 21:17:46 -08:00
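
The context-dependent choice between `{}`, a space, and nothing fits a right fold, since each character is escaped with the already-escaped remainder in view. The sketch below is a Python illustration under stated assumptions, not the LaTeX writer's code: the two small escape tables are hypothetical subsets, and the exact rules (a space before a letter, `{}` before whitespace or at the end of the string, nothing otherwise) are one plausible reading of the commit message.

```python
from functools import reduce

# Hypothetical subsets of the escape tables, for illustration only.
SIMPLE_ESCAPES = {"&": r"\&", "%": r"\%"}
CONTROL_SEQUENCES = {"\u2026": r"\ldots"}


def escape_char(ch, rest):
    """Combine one input character with the already-escaped remainder."""
    if ch in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[ch] + rest
    if ch in CONTROL_SEQUENCES:
        cs = CONTROL_SEQUENCES[ch]
        if rest[:1].isalpha():
            return cs + " " + rest   # a letter would otherwise extend the name
        if rest == "" or rest[:1].isspace():
            return cs + "{}" + rest  # keep a following space from being eaten
        return cs + rest             # punctuation etc.: no terminator needed
    return ch + rest


def escape_string(s):
    # Right fold: start from "" and consume the input from its last character.
    return reduce(lambda rest, ch: escape_char(ch, rest), reversed(s), "")


print(escape_string("Wait\u2026."))
print(escape_string("a\u2026 b"))
```

Note that the sketch inspects the first character of the escaped remainder rather than the next raw input character; for the tables used here the two give the same decision.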
John MacFarlane
e752669e50 LaTeX reader: don't let \egroup match {.
`braced` now actually requires nested braces.
Otherwise some legitimate command and environment
definitions can break (see test/command/tex-group.md).
2019-01-31 22:50:51 -08:00
John MacFarlane
51f042279c Update copyright year in version. 2019-01-30 14:45:35 -08:00
leungbk
ac83b9c37c Org reader: add support for #+SELECT_TAGS. 2019-01-30 18:27:38 +01:00
leungbk
dc43174573 Org reader: separate filtering logic from conversion function. 2019-01-30 18:27:38 +01:00
John MacFarlane
c9454a4176 Add cpp to avoid warning. 2019-01-28 16:50:47 -08:00
John MacFarlane
2932ac8574 Add isPrefixOf to imports. 2019-01-27 12:27:24 -08:00
Agustín Martín Barbero
9894d05fe3 Improve writing metadata for docx, pptx and odt (#5252)
* docx writer: support custom properties.  Solves the writer part of #3024.
  Also supports additional core properties:  `subject`, `lang`, `category`,
  `description`.

* odt writer: improve standard properties, including the following core properties:
  `generator` (Pandoc/VERSION), `description`, `subject`, `keywords`,
  `initial-creator` (from authors), `creation-date` (actual creation date).
  Also fix date.

* pptx writer: support custom properties.  Also supports additional core
  properties: `subject`, `category`, `description`.

* Includes golden tests.

* MANUAL: document metadata support for docx, odt, pptx writers
2019-01-26 16:14:35 -08:00
John MacFarlane
ff0aaa549d Normalize Windows paths to account for change in ghc 8.6.
When pandoc is compiled with ghc 8.6, Windows paths are treated
differently, and paths beginning `\\server` no longer work.
This commit rewrites such paths to `\\?\UNC\server` which works.

The change operates at the level of argument parsing, so it
only affects the command line program.

See #5127 and the discussion there.
2019-01-26 16:07:39 -08:00
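
The rewrite can be sketched as a small string transformation. This Python version is only an illustration of the rule stated above; `normalize_unc` is a hypothetical name, and leaving paths that already start with `\\?\` or `\\.\` untouched is an assumption not mentioned in the commit message.

```python
def normalize_unc(path):
    """Rewrite \\\\server\\... to \\\\?\\UNC\\server\\...; leave other paths alone."""
    # A plain UNC path starts with two backslashes followed by a server name.
    is_plain_unc = path.startswith("\\\\") and path[2:3] not in ("?", ".", "")
    if is_plain_unc:
        return "\\\\?\\UNC\\" + path[2:]
    return path


print(normalize_unc(r"\\server\share\doc.md"))
```

Because the commit applies the change at the level of argument parsing, a helper like this would be mapped over the file arguments before any file is opened.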
John MacFarlane
446583e322 Texinfo writer: use header identifier for anchor if present.
Previously we were overwriting an existing identifier
with a new one.  Closes #4731.
2019-01-25 17:11:28 -08:00
John MacFarlane
a5ac58f82f MediaWiki reader: use _ instead of - in auto-identifiers.
Partially addresses #4731.
We may still not be exactly matching MediaWiki's algorithm
for identifiers.
2019-01-25 17:10:49 -08:00
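
To make the separator change concrete, here is a deliberately simplified Python sketch. It only joins whitespace-separated words with a configurable separator; `auto_identifier` is a hypothetical helper, and it does not attempt to reproduce either pandoc's or MediaWiki's full identifier algorithm (case handling and punctuation are left untouched).

```python
def auto_identifier(heading, separator="_"):
    """Join the words of a heading with the given separator."""
    return separator.join(heading.split())


print(auto_identifier("Early life and education"))                  # underscore style
print(auto_identifier("Early life and education", separator="-"))  # hyphen style
```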
John MacFarlane
2e7cfe1bba LaTeX writer: add # to special characters for listings.
This character needs special handling in lstinline.
Closes #4939.
2019-01-25 16:49:31 -08:00
John MacFarlane
2f54470266 Ipynb: Put all jupyter metadata under 'jupyter' key. 2019-01-24 16:51:56 -08:00
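
A sketch of why nesting avoids key clashes, using plain Python dictionaries as a stand-in for document metadata. The function name `nest_jupyter_metadata` and the sample keys are illustrative assumptions; only the idea of placing all notebook metadata under a single 'jupyter' key comes from the commit.

```python
def nest_jupyter_metadata(notebook_meta, document_meta=None):
    """Place all notebook metadata under one 'jupyter' key."""
    merged = dict(document_meta or {})
    merged["jupyter"] = dict(notebook_meta)
    return merged


notebook_meta = {"toc": {"sideBar": True}, "kernelspec": {"name": "python3"}}
meta = nest_jupyter_metadata(notebook_meta, {"toc": True})
print(meta["toc"], meta["jupyter"]["toc"])
```

The document-level `toc` and the notebook's `toc` now live at different levels, so neither overwrites the other.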
John MacFarlane
7167330a2a Revert "Prepend jupyter_ to jupyter metadata keys."
This reverts commit 5eaff399d5.
2019-01-24 16:33:03 -08:00
John MacFarlane
b08c8627d3 Allow some command line options to take URL in addition to FILE.
`--include-in-header`, `--include-before-body`, `--include-after-body`
2019-01-24 16:21:57 -08:00
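
One way to decide whether such an option value should be fetched or read from disk is to look at its scheme. The Python sketch below is an assumption about how the distinction might be drawn, not a description of pandoc's code; `is_url` is a hypothetical helper, and restricting it to http and https is a simplification that keeps Windows drive letters from being mistaken for schemes.

```python
from urllib.parse import urlparse


def is_url(arg):
    """Treat the argument as a URL only if it has an http(s) scheme."""
    return urlparse(arg).scheme in ("http", "https")


for arg in ("https://example.com/header.html", "header.html", r"C:\tmpl\header.html"):
    print(arg, "->", "fetch" if is_url(arg) else "read file")
```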
John MacFarlane
22b09d88ff Ms writer: ensure we have a newline after .EN in display math.
Closes #5251.
2019-01-24 16:09:14 -08:00
John MacFarlane
5eaff399d5 Prepend jupyter_ to jupyter metadata keys.
This avoids conflicts with things like 'toc'.
2019-01-24 09:35:42 -08:00
John MacFarlane
09b6dca763 Removed superfluous import. 2019-01-23 10:08:08 -08:00
John MacFarlane
395ea03069 Support ipynb (Jupyter notebook) as input and output format.
[API change]

* Depend on ipynb library.

* Add `ipynb` as input and output format.

* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).

* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).

* Added ipynb readers and writers to T.P.Readers,
  T.P.Writers, and T.P.Extensions.  Register the
  file extension .ipynb for this format.

* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.

* Note: there is no template for ipynb.
2019-01-22 21:45:59 -08:00
John MacFarlane
5ddd7b121e LaTeX reader: support \endinput. Closes #5233. 2019-01-22 21:39:26 -08:00
Brian Leung
509336d866 Man reader: fix typo. (#5245) 2019-01-22 20:50:25 -08:00
John MacFarlane
f86ac89383 HTML and markdown: treat textarea as a verbatim environment.
We don't want to parse its contents as Markdown or HTML.

Closes #5241.
2019-01-21 20:54:12 -08:00
John MacFarlane
11810edb2f LaTeX reader: allow includes with dots like cc_by_4.0.
Previously the `.0` was interpreted as a file extension,
leading pandoc not to add `.tex` (and thus not to find the
file).

The new behavior matches tex more closely.
2019-01-20 18:22:19 -08:00
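
The extension handling can be illustrated with a short Python sketch. It assumes that only a small set of recognized extensions counts as "already has an extension"; the function name `include_filename` and the contents of `known_exts` are assumptions, since the commit message does not list which extensions are recognized.

```python
import os.path


def include_filename(name, default_ext=".tex", known_exts=(".tex", ".sty")):
    """Append the default extension unless a recognized one is present."""
    _, ext = os.path.splitext(name)
    if ext in known_exts:
        return name
    # Something like ".0" in "cc_by_4.0" is part of the name, not an extension.
    return name + default_ext


print(include_filename("cc_by_4.0"))
print(include_filename("chapter.tex"))
```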
John MacFarlane
26dfab2e61 LaTeX reader: cleaned up 'input' code. 2019-01-20 17:35:51 -08:00
Agustín Martín Barbero
fb1f76ddee odt writer: fix typo in custom properties (#5231)
fixes #2839
2019-01-17 16:09:25 -08:00
John MacFarlane
24a0d613a8 Make raw content marked beamer work in beamer output.
See pandoc/lua-filters#40.
2019-01-10 12:00:34 -08:00
John MacFarlane
dfd1796cf2 Make 'plain' RawBlocks work for 'plain' output. 2019-01-10 11:55:21 -08:00
Brian Leung
35971495ab RST reader: change treatment of number-lines directives. (#5207)
Directives of this type without numeric inputs should not have a
`startFrom` attribute; with a blank value, the writers can produce
extra whitespace.
2019-01-09 22:19:26 -08:00
John MacFarlane
7e481d73cf Beamer writer: avoid duplicated fragile property in some cases.
Closes #5208.
2019-01-09 08:36:24 -08:00
John MacFarlane
253f342a80 EPUB writer: ensure that picture transforms are done on metadata too. 2019-01-08 16:19:54 -08:00
John MacFarlane
8673eb079b Removed superfluous sourceCode class on code blocks.
* These were added by the RST reader and, for literate Haskell,
  by the Markdown and LaTeX readers.  There is no point to
  this class, and it is not applied consistently by all readers.
  See #5047.

* Reverse order of `literate` and `haskell` classes on code blocks
  when parsing literate Haskell. Better if `haskell` comes first.
2019-01-08 11:36:33 -08:00
John MacFarlane
230e07ddfc RST reader: handle sourcecode directive as synonym for code.
Closes #5204.
2019-01-08 11:11:48 -08:00
John MacFarlane
599327bee1 Asciidoc writer: shorter delimiters for tables, blockquotes.
This matches asciidoctor reference docs.

Closes #4364.
2019-01-07 22:10:34 -08:00
John MacFarlane
c1d058aeb1 revealjs writer: fix some section nesting corner cases.
* Ensure that we don't get > 2 levels of section nesting,
  even with slide level > 2.
* If slide level == N but there is no N-level header, make
  sure the next header with level > N gets treated as a slide
  and put in a section, rather than remaining loose.

Closes #5168.
2019-01-07 21:54:14 -08:00
John MacFarlane
710a22e5ac Org reader: allow for case of :minlevel == 0.
See #5190.
2019-01-07 20:39:40 -08:00