Commit graph

3247 commits

Author SHA1 Message Date
John MacFarlane
0baaa1080a Pipe tables: allow indented columns.
Previously the left-hand column could not be indented by 4 or
more spaces.  This was inconvenient for right-aligned
left columns.

Note that the first line (the header row) must still have 3 or fewer
spaces of indentation, or the table will be treated as an indented
code block.
2015-07-27 10:24:06 -07:00
John MacFarlane
defcb5b6b1 Merge pull request #1689 from kuribas/master
Use '=' instead of '#' for atx-style headers in markdown+lhs.
2015-07-25 10:21:02 -07:00
Ophir Lifshitz
7ef8700734 HTML Reader: Parse <ol> type, class, and inline list-style(-type) CSS 2015-07-24 02:53:17 -04:00
MarLinn
f068093555 Added odt reader
Fully implemented features:

* Paragraphs
* Headers
* Basic styling
* Unordered lists
* Ordered lists
* External Links
* Internal Links
* Footnotes, Endnotes
* Blockquotes

Partly implemented features:

* Citations
  Very basic, but pandoc can't do much more
* Tables
  No headers, no sizing, limited styling
2015-07-23 15:37:01 -07:00
John MacFarlane
8390d935d8 Updated tests and removed a skipSpaces....
we no longer need it with the change to toKey, and it
is expensive to skip spaces after every inline.
2015-07-23 15:35:18 -07:00
John MacFarlane
35e6c893ec Parsing: toKey: strip off outer brackets.
This makes keys with extra space at the beginning and end
work:  e.g.

    [foo]: bar

    [ foo ]

will now be a link to bar (it wasn't before).
2015-07-23 15:34:27 -07:00
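
The key normalization described in 35e6c893ec can be sketched as follows. This is an illustrative Python sketch, not pandoc's Haskell `toKey`: the function name `to_key` is hypothetical, and the whitespace collapsing and case folding are assumptions about how reference keys are compared.

```python
def to_key(raw: str) -> str:
    """Normalize a reference label into a lookup key (hypothetical helper)."""
    s = raw.strip()
    # Strip one pair of outer brackets, if present.
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    # Assumption: keys ignore case and runs of whitespace.
    return " ".join(s.split()).lower()


# "[ foo ]" and "[foo]" now normalize to the same key.
print(to_key("[ foo ]") == to_key("[foo]"))
```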
John MacFarlane
5db4787330 Merge pull request #2323 from hftf/implicit-header-refs
Fix implicit header refs for headers with extra spaces
2015-07-23 14:46:38 -07:00
John MacFarlane
66a72b8eec LaTeX reader: support abstract environment.
The abstract populates an "abstract" metadata field.
2015-07-23 09:31:46 -07:00
Ophir Lifshitz
42c139d302 Markdown Reader: Skip spaces in headers 2015-07-23 02:29:37 -04:00
John MacFarlane
fa2c008ae5 Fix regression: allow HTML comments containing --.
Technically this isn't allowed in an HTML comment, but
we've always allowed it, and so do most other implementations.
It is handy if e.g. you want to put command line arguments
in HTML comments.
2015-07-21 22:44:18 -07:00
John MacFarlane
ec5960ab11 Use newManager instead of withManager in recent http-client.
This avoids a deprecation warning.
2015-07-21 16:32:44 -07:00
John MacFarlane
450bef90e0 DZSlides: Add role="note" for speaker notes.
Closes #1693.
2015-07-21 14:54:43 -07:00
John MacFarlane
da0842b5b5 HTML reader: handle type attribute on ol.
E.g. `<ol type="i">`.

Closes #2313.
2015-07-21 13:07:52 -07:00
John MacFarlane
f6ad9e263f LaTeX reader: properly handle booktabs lines.
Lines aren't part of the pandoc table model, but we can just
ignore them.

Closes #2307.
2015-07-21 10:26:29 -07:00
John MacFarlane
6166cd7559 Removed unneeded import. 2015-07-16 17:06:57 -07:00
John MacFarlane
075ad9a406 LaTeX writer: Fixed detection of 'chapters' from template.
If a documentclass isn't specified in metadata, but the
template has a hardwired bookish documentclass, act as if
`--chapters` was used.  This was the default in earlier
versions, but it has been broken for a little while.
2015-07-16 15:52:38 -07:00
John MacFarlane
c2ab44af84 --self-contained: Fixed overaggressive CSS minimization.
Previously `--self-contained` wiped out all spaces in CSS,
including semantically significant spaces!

Closes #2301.
Closes #2286.
2015-07-15 08:16:42 -07:00
John MacFarlane
6c32afc3c4 Updated to use cmark >= 0.4. 2015-07-14 22:51:23 -07:00
John MacFarlane
9e0fb844a9 Markdown reader: don't allow bare URI links or autolinks in link label.
Added test cases.

Closes #2300.
2015-07-14 13:16:40 -07:00
John MacFarlane
9cdfd4f649 Improved bare autolink detection.
Previously we disallowed `-` at the end of an autolink,
and disallowed the combination `=-`.

This commit liberalizes the rules for allowing punctuation in
a bare URI.

Added test cases.

One potential drawback is that you can no longer put a bare
URI in em dashes like this

    this uri---http://example.com---is an example.

But in this respect we now match GitHub's treatment of bare URIs.

Closes #2299.
2015-07-14 10:24:39 -07:00
John MacFarlane
b8634b9f75 HTML writer: support speaker notes in dzslides.
With this change `<div class="notes">` and also `<div class="notes"
role="note">` will be output if `-t dzslides` is used. So we can
have speaker notes in dzslides too.

Thanks to maybegeek.
2015-07-13 22:50:17 -07:00
Tiziano Müller
f464e49142 DokuWiki: write $..$ instead of <math>..</math>
MathJax seems currently to be the only maintained math rendering
extension for DokuWiki and it uses $..$ instead of <math>..</math>.
2015-07-13 14:19:48 +02:00
John MacFarlane
2df3dfe883 Changed hierarchicalize so it treats references div as top-level header.
Fixes a bug with `--section-divs`, where the final references section
added by pandoc-citeproc, enclosed in its own div, got put in the
div for the section previous to it.

This fixes #2294.  Longer term, we might think about how hierarchicalize
should interact with Div elements.
2015-07-12 13:58:28 -07:00
John MacFarlane
99fe8594d9 Avoid parsing partial URLs as HTML tags.
Closes #2277.
2015-07-10 10:33:27 -07:00
John MacFarlane
b587acb224 Merge pull request #2266 from PromyLOPh/fieldinline
RST: Support inline markup for field list names
2015-07-08 22:45:06 -07:00
John MacFarlane
ac79429a12 PDF: Make sure --latex-engine-opt goes before the filename...
on the command line.  LaTeX needs the argument to come after
the options.

Closes #1779 - again!  Thanks to squisher for pointing
out the problem.
2015-07-08 17:37:54 -07:00
Andrew Dunning
4850aaf046 Correct superscript/subscript. 2015-07-08 13:57:04 -04:00
John MacFarlane
9e528f4c0c Fixed email javascript obfuscation with mailto: URLs.
This fixes a potential security issue.  Because single quotes weren't
being escaped in the link portion, a specially crafted email address
could allow javascript code injection.

    [Jim'+alert('hi')+'OBrien](mailto:me@example.com)

Closes #2280.
2015-07-07 11:15:40 -07:00
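
The injection fixed in 9e528f4c0c comes from placing unescaped text inside a single-quoted JavaScript string. A minimal Python sketch of the escaping step is shown below; the helper name `escape_js_single_quoted` is hypothetical and this is not pandoc's actual code, which may escape more characters.

```python
def escape_js_single_quoted(text: str) -> str:
    """Escape text for use inside a single-quoted JavaScript string literal."""
    # Backslashes first, so the backslashes added for quotes are not doubled.
    return text.replace("\\", "\\\\").replace("'", "\\'")


label = "Jim'+alert('hi')+'OBrien"
# With the quotes escaped, the label stays inside one string literal.
print("'" + escape_js_single_quoted(label) + "'")
```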
Lars-Dominik Braun
b2adf44e75 Readers.RST: Factor out inline markup string parsing 2015-07-03 16:42:51 +02:00
Lars-Dominik Braun
68b6b9f652 Readers.RST: Parse field list name
“Inline markup is parsed in field names.” [1]

[1] http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#field-lists
2015-07-03 16:41:28 +02:00
John MacFarlane
e0a88df686 ConTeXt: use \goto for internal links. 2015-07-01 12:11:45 -07:00
John MacFarlane
b5d3b4f608 Merge pull request #2255 from mchladek/odt_linebreak
Fix #2254 : OpenDocument writer adds space with hard line break
2015-07-01 11:49:58 -07:00
John MacFarlane
8e747004e6 ConTeXt writer: Added a % at end for \reference to avoid spurious space. 2015-07-01 11:30:28 -07:00
John MacFarlane
a04c15a422 New method for building man pages.
+ Removed `--man1`, `--man5` options (breaking change).
+ Removed `Text.Pandoc.ManPages` module (breaking API change).
+ Version bump to 1.15 because of the breaking changes, even
  though they involve features that have only been in pandoc
  for a day.
+ Makefile target for `man/man1/pandoc.1`.  This uses pandoc to
  create the man page from README using a custom template and filters.
+ Added `man/` directory with template and filters needed to build
  man page.
+ We no longer have two man pages: pandoc.1 and pandoc_markdown.5.
  Now there is just pandoc.1, which has all the content from README.
  This change was needed because of the extensive cross-references
  between parts of the README.
+ Removed old `data/pandoc.1.template` and
  `data/pandoc_markdown.5.template`.
2015-07-01 11:27:15 -07:00
Michael Chladek
125b0c7359 Do not add a carriage return after a hard line break in OpenDocument writer and reflect change in tests. 2015-07-01 09:43:36 -05:00
John MacFarlane
226a5cd6a9 Merge pull request #2250 from PromyLOPh/rsttarget
Fix RST reference names with special characters
2015-06-29 10:21:40 -07:00
John MacFarlane
754d1cef7b LaTeX reader: Allow _ and ^ as regular inline text.
Normally these will cause an error in LaTeX, but there
are contexts (e.g. `alltt` environments) where they are
okay.  Now that we aren't treating them as super/subscript
outside of math mode, it seems okay to parse them as regular
text.
2015-06-29 10:20:08 -07:00
John MacFarlane
457fbebabc LaTeX reader: don't parse _,^ as super/sub outside math mode. 2015-06-29 09:46:57 -07:00
Lars-Dominik Braun
3b2c50ed93 Fix RST reference names with special characters 2015-06-29 18:34:45 +02:00
John MacFarlane
27754e170b Removed unneeded import. 2015-06-28 23:59:10 -07:00
John MacFarlane
7c6277d2c1 Added a needed import in Shared. 2015-06-28 23:43:17 -07:00
John MacFarlane
36baded572 Make sure we use dist version of reference.docx for some things.
Taking some values from a user-supplied reference.docx
tends to lead to corruption.

This fixes a regression from the last release. Closes #2249.
2015-06-28 23:25:55 -07:00
John MacFarlane
de184a80ec Let reference.docx/odt behave as if they are virtual data files.
Now they are constructed on the fly from their components,
but we now allow them to be printed with `--print-default-data-file`
and to override the defaults if placed in the user data directory.

Shared now exports getDefaultReferenceDocx and getDefaultReferenceODT
(API change).

These functions have been removed from the Docx and ODT writers.

Shared.readDataFile has been modified so that requests to read
a reference.odt or reference.docx will use these functions to
generate the files.
2015-06-28 22:38:13 -07:00
John MacFarlane
7bbb007359 Minor fixes to previous commit.
* Instead of defining readmeFile in Text.Pandoc.Data (which we forgot
  to export anyway), we simply add a record for "README" to the
  `dataFiles` lookup table.  This allows simplifying some of the code
  for `readDefaultDataFile` in Shared.

* As a bonus, `pandoc --print-default-data-file README` now works.
2015-06-28 20:59:18 -07:00
John MacFarlane
fe625e053d New method for producing man pages.
This change adds `--man1` and `--man5` options to pandoc, so
pandoc can generate its own man pages.

It removes the old overly complex method of building a separate
executable (but not installing it) just to create the man pages.

The man pages are no longer automatically created in the build
process.

The man/ directory has been removed.  The man page templates
have been moved to data/.

New unexported module:  Text.Pandoc.ManPages.

Text.Pandoc.Data now exports readmeFile, and `readDataFile`
knows how to find README.

Closes #2190.
2015-06-28 14:39:17 -07:00
John MacFarlane
ed9a118b54 Fixed regression in CSS parsing with --self-contained.
In 1b44acf0c5 we replaced some
hackish CSS parsing with css-text, which I thought was a complete
CSS parser.  It turns out that it is very buggy, which results
in lots of things being silently dropped from CSS when
`--self-contained` is used (#2224).

This commit replaces the use of css-text with a small but
more principled css preprocessor, which only removes whitespace
and replaces URLs with base 64 data when possible.

Closes #2224.
2015-06-28 11:54:18 -07:00
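
The URL-replacement half of the preprocessor described in ed9a118b54 might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the names `to_data_uri` and `inline_css_urls` and the `resources` mapping (URL string to a `(bytes, mime)` pair) are hypothetical, and pandoc's real implementation is written in Haskell and fetches the resources itself.

```python
import base64
import re

# Matches url(...) with an optional pair of matching quotes around the target.
URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def to_data_uri(payload: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_css_urls(css: str, resources: dict) -> str:
    """Replace url(...) targets found in `resources` with data URIs."""
    def repl(match):
        target = match.group(2).strip()
        if target in resources:
            data, mime = resources[target]
            return f"url({to_data_uri(data, mime)})"
        # Unknown or unfetchable URLs are left exactly as written.
        return match.group(0)
    return URL_RE.sub(repl, css)
```

Everything outside the matched `url(...)` spans passes through unchanged, so nothing is silently dropped from the stylesheet.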
John MacFarlane
2b1fd52403 Removed unused import. 2015-06-27 20:41:00 -07:00
John MacFarlane
7df4d86006 Textile writer: escape + and - as entities.
Closes #2225.
2015-06-27 20:30:20 -07:00
John MacFarlane
fce3ebb8e0 Plain writer: don't use symbols for super/subscript.
Simplified code by using plainExtensions from Options.

Closes #2237.
2015-06-27 20:19:04 -07:00
John MacFarlane
177533d3f8 Options: Export plainExtensions.
These are the extensions used in `plain` output.
2015-06-27 20:18:14 -07:00
mb21
82e363a727 DocBook reader mediaobjects and figures, closes #2184 2015-06-21 18:36:47 +02:00
gohai
f51757bd16 Fix InDesign crash with URLs containing more than one colon character
Colons are valid characters in URLs, and used e.g. by the Internet Archive's Wayback Machine - a popular resource amongst researchers. When InDesign encounters a HyperlinkURLDestination with more than one colon character in it, it crashes when placing the ICML. (This was tested against CS6.) The IDML specification hints at this requirement in section 6.4.1: "The colon appears in the Name attribute of the style, but is encoded as %3a when it appears in the Self attribute". Follow this example for all colon characters in URLs.
2015-06-09 15:46:23 +02:00
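
The encoding rule quoted in f51757bd16 amounts to a plain substitution. The Python sketch below uses a hypothetical helper name, `encode_colons`, and does not reproduce how the ICML writer builds the attribute.

```python
def encode_colons(url: str) -> str:
    """Encode every colon as %3a, as the IDML spec does for Self attributes."""
    return url.replace(":", "%3a")


# A Wayback Machine style URL contains more than one colon.
print(encode_colons("http://web.archive.org/web/2015/http://example.com"))
```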
John MacFarlane
7b4f077652 DokuWiki writer: Use proper <code> tags for code blocks.
Closes #2213.
2015-06-07 11:29:47 -07:00
John MacFarlane
cb7dd4469d HTML reader: allow <body> to close <head>. 2015-06-04 10:23:38 +02:00
John MacFarlane
3534ad3666 Custom writer: fixed some compiler warnings for ghc < 7.10. 2015-05-31 21:18:26 +02:00
John MacFarlane
c327b283c1 Allow building with hslua 0.4. 2015-05-31 13:57:14 +02:00
John MacFarlane
b241472a90 Better fix for #2187.
* Reverted kludgy change to make-windows-installer.bat.
* Removed make-reference-files.hs.
* Moved the individual ingredients of reference.docx and
  reference.odt to the data directory.
* Removed reference.docx and reference.odt from data directory.
* We now build the reference archives from their ingredient pieces
  in the docx and odt writers, instead of having a reference.docx
  or reference.odt intermediary.

This should fix #2187.

It also simplifies the build procedure.

The one thing users may notice is different is that you can
no longer get the reference.docx or reference.odt using
`--print-default-data-file`.  Instead, simply generate a
docx or odt using pandoc with a blank or minimal input,
and use that (or a customized version) with `--reference-docx`
or `--reference-odt`.
2015-05-28 18:15:01 -07:00
John MacFarlane
c1f6d5e31f ConTeXt writer: Add reference anchors to Div with ids.
This is useful for pandoc-citeproc linked citations.
2015-05-28 17:46:26 -07:00
John MacFarlane
e54c8613e8 Removed tab chars in Textile reader source. 2015-05-28 13:07:52 -07:00
John MacFarlane
4112fb8358 Texinfo writer: Removed tabs from source. 2015-05-28 13:01:43 -07:00
John MacFarlane
4a9aaf6fd6 LaTeX/beamer: added setotherlanguages in polyglossia.
This uses an `otherlang` variable that takes a list of languages.

As requested in #2174.
2015-05-27 12:15:50 -07:00
John MacFarlane
c841b15d25 LaTeX writer: Make mainlang work when lang is in metadata.
Closes #2174.
2015-05-27 12:01:49 -07:00
John MacFarlane
adfb217622 Fixed svg handling in EPUB writer.
This is a crude workaround for #2183.
A correct fix would require having openURL and fetchItem return
a content encoding as well as a content type.
2015-05-27 11:46:02 -07:00
John MacFarlane
c99e057d8e Fixed compiler warning. 2015-05-27 11:22:39 -07:00
John MacFarlane
734b0bc2fb Revealjs: allow 'center' to be set to false. 2015-05-27 11:04:38 -07:00
John MacFarlane
dbf2a63669 EPUB writer: Improved chapter splitting and internal link rewriting.
Closes #1887.
Closes #2163.
Closes #2162.
2015-05-27 09:39:05 -07:00
John MacFarlane
24bfc8274e Merge pull request #2170 from tarleb/org-generalize-result-block
Org generalize result block
2015-05-26 17:14:42 -07:00
John MacFarlane
fe66122b61 Merge pull request #2169 from tarleb/org-header-tags
Org reader: put header tags into empty spans
2015-05-26 17:14:29 -07:00
John MacFarlane
c3cb27f2f2 Merge pull request #2141 from DigitalPublishingToolkit/icml-images
Fix image URIs in ICML output
2015-05-26 17:14:10 -07:00
Albert Krewinkel
385dcf5b99 Org reader: drop trees with a :noexport: tag
Trees having a `:noexport:` tag set are not exported.  This mirrors
default Emacs Org-Mode behavior.
2015-05-23 14:23:16 +02:00
Albert Krewinkel
d8e4a8bc10 Org reader: put header tags into empty spans
Org mode allows headers to be tagged:

``` org-mode
* Headline         :TAG1:TAG2:
```

Instead of being interpreted as part of the headline, the tags are now
put into the attributes of empty spans.  Spans without textual content
won't be visible by default, but they are detectable by filters.  They
can also be styled using CSS when written as HTML.

This fixes #2160.
2015-05-23 14:06:32 +02:00
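
To illustrate the tag handling in d8e4a8bc10, the following Python sketch separates trailing tags from the headline text. The regular expression and the name `split_headline` are assumptions made for illustration; the Org reader itself is a Haskell parser and may accept a different tag character set.

```python
import re

# Trailing tags look like ":TAG1:TAG2:" and are preceded by whitespace.
TAGS_RE = re.compile(r"^(?P<title>.*?)\s+:(?P<tags>[\w@#%]+(?::[\w@#%]+)*):\s*$")


def split_headline(text: str):
    """Return (title, tags) for headline text with the leading stars removed."""
    match = TAGS_RE.match(text)
    if not match:
        return text.strip(), []
    return match.group("title").strip(), match.group("tags").split(":")


print(split_headline("Headline         :TAG1:TAG2:"))
```

In the reader each tag then ends up in the attributes of an empty span instead of in the headline text.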
Albert Krewinkel
b61355cecd Org reader: generalize code block result parsing
Code blocks can be followed by optional result blocks, representing the
output generated by running the code in the code block.  It is possible
to choose whether one wants to export the code, the result, both or
none.

This patch allows any kind of `Block` as the result.  Previously, only
example code blocks were recognized.
2015-05-23 13:22:07 +02:00
Albert Krewinkel
40fb102417 Reorder block arguments parsing code
Group code used to parse block arguments together in one place.  This
seems better than having part of the code mixed between unrelated
parsing state changing functions.
2015-05-23 13:17:10 +02:00
John MacFarlane
d5f367d04b EPUB writer: Split references into separate chapter.
Previously the div-enclosed reference section produced
by pandoc-citeproc would not be split into its own chapter,
which caused various problems.

See #2162, #2163.

I'm not sure this is a complete fix.  I note that the bibliography
doesn't appear in nav or toc, which seems bad.
2015-05-21 00:45:57 -07:00
John MacFarlane
57e16e3287 PDF writer: Print temp dir on --verbose.
This might help diagnose #777.
2015-05-20 15:43:42 -07:00
John MacFarlane
f532dd69c9 DocBook writer: add id to para if in Div with id element.
This makes the writer work properly with linked bibliographic
items with pandoc-citeproc.

Closes jgm/pandoc-citeproc#132.
2015-05-20 10:55:06 -07:00
John MacFarlane
24ee1ab4f7 Markdown reader: Made implicit header references case-insensitive.
Added `stateHeaderKeys` to `ParserState`; this is a `KeyTable`
like `stateKeys`, but it only gets consulted if we don't find
a match in `stateKeys`, and if `Ext_implicit_header_references`
is enabled.

Closes #1606.
2015-05-13 23:12:58 -07:00
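
The two-table lookup described in 24ee1ab4f7 can be sketched in Python as below. The dictionaries stand in for `stateKeys` and `stateHeaderKeys`, the flag stands in for `Ext_implicit_header_references`, and the function name `lookup_reference` is hypothetical; keys are assumed to be stored lowercased.

```python
def lookup_reference(label, explicit_keys, header_keys, implicit_header_refs=True):
    """Resolve a reference label, preferring explicitly defined keys."""
    key = label.strip().lower()  # case-insensitive match
    if key in explicit_keys:
        return explicit_keys[key]
    # Header keys are consulted only as a fallback, and only if enabled.
    if implicit_header_refs and key in header_keys:
        return header_keys[key]
    return None


explicit = {"foo": "http://example.com"}
headers = {"my header": "#my-header"}
print(lookup_reference("My Header", explicit, headers))
```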
John MacFarlane
e06810499e HTML reader: Support base tag.
We only support the href attribute, as there's no place for
"target" in the Pandoc document model for links.

Added HTML reader test module, with tests for this feature.

Closes #1751.
2015-05-13 20:53:19 -07:00
John MacFarlane
75cfa7b462 Beamer: mark slide as [fragile] if header has fragile class.
Closes #2119.
2015-05-13 20:10:54 -07:00
John MacFarlane
0fa753b999 EPUB writer: Properly handle image URLs without an extension.
We now look at the mime type from the server and attach an
appropriate extension.

Closes #1855.
2015-05-13 14:52:51 -07:00
John MacFarlane
c9cb313a47 Fixed regression in charsInBalancedBrackets.
Introduced by e9d7504.

This regression caused link and image references containing
raw tex not to parse correctly.

Added test.

Closes #2150.
2015-05-13 10:16:06 -07:00
John MacFarlane
4560447041 Don't use sup element for epub footnotes.
Instead, just use an a element with class `footnoteRef`.
This allows more styling options, and provides better results
in some readers (e.g. iBooks, where anything inside the a
tag breaks popup footnotes).

Closes #1995.
2015-05-11 21:58:01 -07:00
John MacFarlane
9857aa866a HTML reader: Fixed detection of self-closing tags.
Earlier versions had a bug and would wrongly think
opening tags containing attributes with slashes in them
were self-closing.

Closes #2146.
2015-05-11 16:17:20 -07:00
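
The bug fixed in 9857aa866a is easy to reproduce with a naive check. The Python sketch below contrasts a slash-anywhere test with one that looks only at the end of the tag; both function names are hypothetical and neither is pandoc's actual code.

```python
def naive_is_self_closing(tag: str) -> bool:
    """Buggy check: any slash makes the tag look self-closing."""
    return "/" in tag


def is_self_closing(tag: str) -> bool:
    """Check only whether the tag ends with "/>"."""
    return tag.rstrip().endswith("/>")


opening = '<a href="/docs/index.html">'
print(naive_is_self_closing(opening), is_self_closing(opening))
```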
gohai
8af168a7fe Fix image URIs in ICML output (v2)
InDesign expects LinkResourceURI to start with "file:" for local filenames, and won't render/link the image without.
2015-05-11 15:49:36 +02:00
John MacFarlane
4b251e93b4 ImageSize: fixed some exif parsing bugs.
Closes #1834.  The image originally supplied works fine now
with pandoc.
2015-05-10 16:52:37 -07:00
John MacFarlane
60bf4a8bfb Improved warnings when image size can't be determined.
Closes #1834.
2015-05-09 23:56:53 -07:00
John MacFarlane
31b3f2ef88 ImageSize: Use runGetOrFail with binary 0.7+. 2015-05-09 22:28:49 -07:00
John MacFarlane
a60c65c4e9 ImageSize: make jpeg header parsing routines return Either.
See #1834.
2015-05-09 21:55:19 -07:00
John MacFarlane
6fe243abbd ImageSize: make imageSize return an Either, not a Maybe.
This will give us better error reporting options.
This is part of a fix for #1834.
2015-05-09 21:32:31 -07:00
John MacFarlane
7920a1a469 Revert "EPUB writer: stylesheet changes. Closes #2040."
This reverts commit 1c2951dfd9.

See #2040.

The semantics was too squishy.  `--css` takes a URL, but
for EPUB we need files that we can read.  I prefer keeping
the old system for now, with `--epub-stylesheet`.
2015-05-09 00:07:27 -07:00
John MacFarlane
1c2951dfd9 EPUB writer: stylesheet changes. Closes #2040.
* Allow `--css` to be used to specify stylesheets.
* Deprecated `--epub-stylesheet` and made it a synonym of
  `--css`.
* If a code block with class "css" is given as contents of the
  `stylesheet` metadata field, use its literal code as contents of
  the epub stylesheet.  Otherwise, treat it as a filename and
  read the file.
* Note: `--css` and `stylesheet` in metadata are not compatible.
  `stylesheet` takes precedence.
2015-05-08 23:47:50 -07:00
John MacFarlane
472c1424ba Deal with deprecation warning in Custom. 2015-05-05 12:46:20 -07:00
John MacFarlane
d19a347fd5 UTF8: Better handling of bare CRs in input files.
Previously we just stripped them out; now we convert
other line ending styles to LF line endings.

Closes #2132.
2015-05-05 12:42:50 -07:00
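
The line-ending handling described in d19a347fd5 can be sketched in Python as follows. The function name `normalize_newlines` is hypothetical; pandoc's own conversion lives in its Haskell UTF8 module.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    # Handle CRLF first so it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines("a\r\nb\rc\nd")))
```

Stripping bare CRs, as the earlier behavior did, would instead join `b` and `c` onto one line.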
John MacFarlane
1b44acf0c5 SelfContained: properly handle data URIs in css urls.
Also use a proper css parser (adds dependency on css-text).

Closes #2129.
2015-05-04 16:00:28 -07:00
John MacFarlane
64b1394fe2 Make sure a closing </div> doesn't get included in a defn list item.
Closes #2127.
2015-05-03 15:06:40 -07:00
John MacFarlane
8a77eb4c9c LaTeX writer: Add a \label in \hyperdef for Div, Span.
Otherwise links don't work.
2015-05-02 17:58:16 -07:00
John MacFarlane
f1aaad9e86 EPUB writer: Use plain writer for metadata dc: fields.
This gives better results when we have, e.g. multiple paragraphs.
Note that tags aren't allowed in these fields.

Closes #2121.
2015-05-01 22:36:38 -07:00
John MacFarlane
9b2f645e2a SelfContained: cssURLs no longer tries to fetch fragment URLs.
The current test is: does the URL start with a `#`?
Closes #2121.
2015-05-01 22:15:43 -07:00
Alfred Wechselberger
7031748a43 Added woff2 to MIME types 2015-04-29 14:10:30 -07:00
John MacFarlane
55b7afc674 HTML reader: Allow multiple colgroups in table.
Closes #2122.
2015-04-29 12:05:38 -07:00