3913 commits

Author SHA1 Message Date
John MacFarlane
00b11bcbcf Fixed exponential parsing bug in textile reader.
Closes #3020.
2016-07-14 08:42:38 -07:00
Albert Krewinkel
529146decf Org reader: fix parsing of verbatim inlines
Org rules for allowed characters before or after markup chars were not
checked for verbatim text.  This resulted in wrong parsing outcomes
if the verbatim text contained e.g. space enclosed markup characters as
part of the text (`=is_substr = True=`).  Forcing the parser to update
the positions of allowed/forbidden markup border characters fixes this.

This fixes #3016.
2016-07-14 13:33:25 +02:00
John MacFarlane
e2659a46db Merge pull request #3014 from tarleb/org-writer-div
Org writer: improve Div handling
2016-07-05 12:46:13 -07:00
Albert Krewinkel
5378b7c5bd
Org writer: improve Div handling
Div blocks handling is changed to make the output look more like
idiomatic org mode:

  - Div-wrapped content is output as-is if the div's attribute is the
    null attribute.
  - Div containers with an id but neither classes nor key-value pairs
    are unwrapped and the id is added as an anchor.
  - Divs with classes associated with greater block elements are
    wrapped in a `#+BEGIN`...`#+END` block.
  - The old behavior for Divs with more complex attributes is kept.
2016-07-05 11:49:45 +02:00
Albert Krewinkel
f417fecf5f
Org reader: replace ugly code with view pattern
Some less-than-smart code required a pragma switching off overlapping
pattern warnings in order to compile seamlessly.  Using view patterns
makes the code easier to read and also doesn't require overlapping
pattern checks to be disabled.
2016-07-04 11:20:05 +02:00
John MacFarlane
e548b8df07 Merge pull request #3010 from tarleb/org-header-tree
Org reader: support archived trees, headline levels export setting
2016-07-03 22:57:22 -07:00
John MacFarlane
4099b2dca4 Odt reader: Removed redundant Monoid constraints. 2016-07-03 22:47:32 -07:00
John MacFarlane
b203a31ba7 Fix warning for parseUrl import. 2016-07-03 22:26:08 -07:00
John MacFarlane
261c3af053 CPP workaround for deprecation of parseUrl in http-client. 2016-07-03 21:29:47 -07:00
Albert Krewinkel
5ffa4abf72
Org reader: support headline levels export setting
The depths of headlines can be modified using the `H` option.  Deeper
headlines will be converted to lists.
2016-07-03 23:28:45 +02:00
John MacFarlane
40caf516aa Allow 'standout' as a beamer frame option.
## Slide title {.standout}

Closes #3007.
2016-07-03 11:56:03 -07:00
Albert Krewinkel
c1f6bd2640
Org reader: put export setting parser into module
Export option parsing is distinct enough from general block parsing to
justify putting it into a separate module.
2016-07-02 13:14:09 +02:00
John MacFarlane
e0cc9e4463 LaTeX reader: strip off double quotes around image source if present.
Avoids interpreting these as part of the literal filename.
See #2825.
2016-07-01 15:47:42 -07:00
John MacFarlane
7e712abfa6 LaTeX writer: don't URI-escape image source.
Usually this is a local file, and replacing spaces with `%20`
ruins things.  Closes #2825.
2016-07-01 15:41:33 -07:00
Albert Krewinkel
c4cf6d237f
Org reader: support archived trees export options
Handling of archived trees can be modified using the `arch` option.
Archived trees are either dropped, exported completely, or collapsed to
include just the header when the `arch` option is nil, non-nil, or
`headline`, respectively.
2016-07-01 23:05:33 +02:00
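
The three outcomes described for the `arch` option can be sketched as a small lookup. This is only an illustration in Python, not the reader's Haskell code; the function names `parse_options` and `archived_tree_action` and the returned labels are hypothetical, and the sketch assumes the option arrives as a `key:value` token on an `#+OPTIONS:` line.

```python
def parse_options(line):
    """Split an '#+OPTIONS:' line into a dict of key -> value strings."""
    prefix = "#+OPTIONS:"
    stripped = line.strip()
    if not stripped.upper().startswith(prefix):
        return {}
    tokens = stripped[len(prefix):].split()
    return dict(tok.split(":", 1) for tok in tokens if ":" in tok)


def archived_tree_action(arch_value):
    """Map the value of the `arch` option to a (hypothetical) action label."""
    if arch_value == "nil":
        return "drop"          # archived trees are dropped
    if arch_value == "headline":
        return "header-only"   # collapsed to include just the header
    return "export"            # any other non-nil value: exported completely


opts = parse_options("#+OPTIONS: arch:headline H:3")
print(archived_tree_action(opts["arch"]))
```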
Albert Krewinkel
1ebaf6de11
Org reader: refactor comment tree handling
Comment trees were handled after parsing, as pattern matching on lists
is easier than matching on sequences.  The new method of reading
documents as trees allows for more elegant subtree removal.
2016-07-01 23:05:32 +02:00
Albert Krewinkel
17484ed01a
Org reader: parse as headlines, convert to blocks
Emacs org-mode is based on outline-mode, which treats documents as trees
with headlines as nodes.  The reader is refactored to parse into a
similar tree structure.  This simplifies transformations acting on
document (sub-)trees.
2016-07-01 23:05:32 +02:00
Albert Krewinkel
2f8d6755f4
Org reader: improve tag and properties type safety
Specific newtype definitions are used to replace stringly typing of tags
and properties.  Type safety is increased while readability is improved.
2016-07-01 23:05:32 +02:00
John MacFarlane
b3382cf377 ZimWiki writer: removed commented out code that confused Haddock.
See https://travis-ci.org/jgm/pandoc/jobs/141542247
2016-07-01 10:39:32 -07:00
Alex Ivkin
a73c95f61d Added Zim Wiki writer, template and tests. 2016-06-30 23:59:43 -07:00
Jesse Rosenthal
b103f829f0 Docx writer: set paragraph to FirstPara after display math
We treat display math like block quotes, and apply FirstParagraph style
to paragraphs that follow them. These can be styled as the user
wishes. (But, when the user is using indentation, this allows for
paragraphs to continue after display math without indentation.)
2016-07-01 01:14:16 -04:00
Jesse Rosenthal
2c62f0e122 Writers: treat SoftBreak as space for stripping
In Writers.Shared, we strip leading and trailing spaces for display
math. Since SoftBreaks are treated as spaces, we should strip those
too.
2016-07-01 00:52:52 -04:00
John MacFarlane
3429fa6438 LaTeX reader: fixed \cite so it is a NormalCitation not AuthorInText. 2016-06-29 07:59:00 -07:00
John MacFarlane
a349814665 Merge pull request #3001 from tarleb/org-figure-label
Org reader: support figure labels
2016-06-26 17:51:51 -07:00
Albert Krewinkel
0f3f5ce1a1 Org reader: support figure labels
Figure labels given as `#+LABEL: thelabel` are used as the ID of the
respective image.  This allows e.g. the LaTeX writer to add proper `\label`
markup.

This fixes half of #2496 and #2999.
2016-06-26 20:42:22 +02:00
John MacFarlane
38c97320ef Textile reader: Fix overly aggressive interpretation as images.
Spaces are not allowed in the image URL in textile.

Closes #2998.
2016-06-25 14:04:47 -07:00
John MacFarlane
d283f9c864 Fixed RST links with no explicit link text.
The link

    `<foo>`_

should have `foo` as both its link text and its URL.

See RST spec at
<http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#embedded-uris-and-aliases>

"The reference text may also be omitted, in which case the URI will be
duplicated for use as the reference text. This is useful for relative
URIs where the address or file name is also the desired reference text:

See `<a_named_relative_link>`_ or `<an_anonymous_relative_link>`__
for details."

Closes Debian #828167 -- reported by Christian Heller.
2016-06-25 10:56:37 -07:00
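
The rule quoted from the RST spec can be sketched with a regular expression. This is a simplified Python illustration, not the reader's parser; the name `parse_rst_link` is hypothetical, and the pattern only covers single embedded-URI references, not aliases or escapes.

```python
import re

# `text <url>`_  or  `<url>`_  (one or two trailing underscores)
LINK_RE = re.compile(r"`(?:(?P<text>[^`<]*?)\s*)?<(?P<url>[^`>]+)>`__?")


def parse_rst_link(source):
    """Return (link_text, url), reusing the URL when no text is given."""
    m = LINK_RE.fullmatch(source.strip())
    if m is None:
        return None
    url = m.group("url")
    text = (m.group("text") or "").strip() or url
    return text, url


print(parse_rst_link("`<foo>`_"))
print(parse_rst_link("`Python <http://python.org>`_"))
```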
John MacFarlane
a4294800bf Make --webtex work with the Markdown writer.
Closes #1177.  This is a convenient option for people using
websites whose Markdown flavors don't provide for math.
2016-06-24 14:57:21 -07:00
John MacFarlane
69e59e7f29 Process markdown extensions on command line in L->R order.
Previously they were processed, very unintuitively, in R->L
order, so that `markdown-tex_math_dollars+tex_math_dollars`
had `tex_math_dollars` disabled.

Closes #2995.
2016-06-23 23:04:42 -07:00
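
A minimal sketch of left-to-right processing, written in Python for illustration rather than taken from pandoc's source. The function name `apply_extensions` is hypothetical; the point is only that a later `+ext` or `-ext` overrides an earlier one.

```python
import re


def apply_extensions(spec, defaults=frozenset()):
    """Split a spec like 'markdown-foo+bar' and apply modifiers left to right."""
    name = re.match(r"[^+-]+", spec).group(0)
    enabled = set(defaults)
    for sign, ext in re.findall(r"([+-])([^+-]+)", spec[len(name):]):
        if sign == "+":
            enabled.add(ext)       # later modifiers win over earlier ones
        else:
            enabled.discard(ext)
    return name, enabled


name, exts = apply_extensions("markdown-tex_math_dollars+tex_math_dollars")
print(name, "tex_math_dollars" in exts)
```

With this order, the example from the commit message ends with `tex_math_dollars` enabled, because the `+` comes last.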
John MacFarlane
a820c1bd1c Textile reader: fixed attributes.
Attributes can't be followed by a space.

So,

    _(class)emph_

but

    _(noclass) emph_

Closes #2984.
2016-06-23 10:28:54 -07:00
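
The distinction between the two cases can be sketched as follows. This Python snippet is an illustration only: `split_textile_attr` is a hypothetical helper that looks at the text between the `_` markers and handles just the `(class)` form of Textile attributes.

```python
import re


def split_textile_attr(inner):
    """Return (class_name, rest); class_name is None when no attribute applies."""
    # The closing parenthesis must be followed directly by a non-space character.
    m = re.match(r"\(([^()\s]+)\)(?=\S)", inner)
    if m:
        return m.group(1), inner[m.end():]
    return None, inner


print(split_textile_attr("(class)emph"))
print(split_textile_attr("(noclass) emph"))
```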
John MacFarlane
139d418d4b Markdown writer: use raw HTML for simple, pipe tables with linebreaks.
Markdown line breaks involve a newline, and simple and pipe
tables can't contain one.

Closes #2993.
2016-06-23 10:00:33 -07:00
Jesse Rosenthal
032ba8dd0c Docx reader: Add warning for advanced comment formatting.
We can't guarantee we'll convert every comment correctly, though we'll
do the best we can. This warns if the comment includes something other
than Para or Plain.
2016-06-23 10:50:46 -04:00
Jesse Rosenthal
5f0cd89129 docx reader: enable warnings in top-level reader
Previously we had only allowed for warnings in the parser. Now we allow
for them in the `Docx.hs` as well. The warnings are simply concatenated.
2016-06-23 10:50:46 -04:00
Jesse Rosenthal
8bb739f7ff Docx reader: add simple comment functionality.
This adds simple track-changes comment parsing to the docx reader. It is
turned on with `--track-changes=all`. All comments are converted to
inlines, which can list some information. In the future a warning will
be added for comments with formatting that seems like it will be
excessively denatured.

Note that comments can extend across blocks. For that reason there are
two spans: `comment-start` and `comment-end`. `comment-start` will
contain the comment. `comment-end` will always be empty. The two will be
associated by a numeric id.
2016-06-23 10:50:46 -04:00
Jesse Rosenthal
cbc2c15f0f Shared: Add BlockQuote to blocksToInlines 2016-06-23 10:50:46 -04:00
Jesse Rosenthal
2b701f9389 Shared: introduce blocksToInlines function
This is a lossy function for converting `[Block] -> [Inline]`. Its main
use, at the moment, is for docx comments, which can contain arbitrary
blocks (except for footnotes), but which will be converted to spans.

This is, at the moment, pretty useless for everything but the basic
`Para` and `Plain` comments. It can be improved, but the docx reader
should probably emit a warning if the comment contains more than this.
2016-06-23 10:50:46 -04:00
John MacFarlane
319a56aefc Merge pull request #2992 from tarleb/org-partial-functions
Org reader: remove partial functions
2016-06-22 12:51:37 -07:00
John MacFarlane
ba7868765a HTML writer: Better support for raw LaTeX environments.
Previously we just passed all raw TeX through when MathJax
was used for HTML math.  This passed through too much.
With this patch, only raw LaTeX environments that MathJax
can handle get passed through.

This patch also causes raw LaTeX environments to be treated
as math, when possible, with MathML and WebTeX output.

Closes #2758.
2016-06-22 11:47:44 -07:00
Albert Krewinkel
7df656089f Org reader: remove partial functions
Partial functions like `head` lead to avoidable errors and should be
avoided.  They are replaced with total functions.

This fixes #2991.
2016-06-21 23:51:15 +02:00
John MacFarlane
58d60b1c85 Changed email-obfuscation default to no obfuscation.
- `writerEmailObfuscation` in `defaultWriterOptions` is now
  `NoObfuscation`
- the default for the command-line `--email-obfuscation` option is
  now `none`.

Closes #2988.
2016-06-20 10:37:23 -07:00
Albert Krewinkel
29552eff3e Org reader: support arbitrary raw inlines
Org mode allows arbitrary raw inlines ("export snippets" in Emacs
parlance) to be included as `@@format:raw foreign format text@@`.

Support for this feature is added to the Org reader.
2016-06-13 23:53:14 +02:00
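
As a rough illustration of the `@@format:raw foreign format text@@` syntax, the following Python sketch extracts snippets with a regular expression. It is not the reader's implementation; `find_export_snippets` is a hypothetical name, and the set of characters allowed in the format name is an assumption.

```python
import re

# Format name, a colon, then the raw text up to the next '@@'.
SNIPPET_RE = re.compile(r"@@([A-Za-z0-9-]+):(.*?)@@", re.DOTALL)


def find_export_snippets(text):
    """Return a list of (format, raw_text) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in SNIPPET_RE.finditer(text)]


print(find_export_snippets("a @@html:<b>x</b>@@ b"))
```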
Albert Krewinkel
cf2502de8f Org writer: support arbitrary raw inlines
Org mode allows arbitrary raw inlines ("export snippets" in Emacs
parlance) to be included as `@@format:raw foreign format text@@`.

Support for this feature is added to the Org writer.
2016-06-13 23:13:05 +02:00
Ivo Clarysse
240cdfd1b3 Docbook writer: Declare xlink namespace in Docbook5 output 2016-06-07 06:03:06 -07:00
Albert Krewinkel
8a9f5915ab Org reader: add support for "Berkeley-style" cites
A specification for an official Org-mode citation syntax was drafted by
Richard Lawrence and enhanced with the help of others on the orgmode
mailing list.  Basic support for this citation style is added to the
reader.

This closes #1978.
2016-06-05 11:28:57 +02:00
Albert Krewinkel
06dfe3276d Org reader: add semicolon to list of special chars
Semicolons are used as special characters in citation syntax.  This
ensures the correct parsing of Pandoc-style citations:

    [prefix; @key; suffix]

Previously, parsing would have failed unless there was a space or other
special character as the last <prefix> character.
2016-06-05 11:28:57 +02:00
Albert Krewinkel
f56792927f
Org reader: support special strings export option
Parsing of special strings (like '...' as ellipsis or '--' as en dash)
can be toggled using the `-` option.
2016-06-03 11:41:23 +02:00
Albert Krewinkel
d4de8451b9 Org reader: support emphasized text export option
Parsing of emphasized text can be toggled using the `*` option.  This
influences parsing of text marked as emphasized, strong, strikeout, and
underline.  Parsing of inline math, code, and verbatim text is not
affected by this option.
2016-06-03 11:17:02 +02:00
Albert Krewinkel
952a7dac58 Org reader: support smart quotes export option
Reading of smart quotes can be toggled using the `'` option.
2016-06-03 11:16:35 +02:00
Albert Krewinkel
729fca311f Org reader: drop unused field from parser state
The `OrgParserState` contained both an `orgStateMeta` and
`orgStateMeta'` field, the former for plain meta information and the
latter for F-monad wrapped meta info.  The plain meta info is only used
to make `OrgParserState` an instance of the `HasMeta` class, which in
turn is never used in the reader.  The (F Meta) version is hence renamed
to the "un-primed" version while the other one is dropped.
2016-06-02 15:30:21 +02:00
Albert Krewinkel
512bf2eebf Org reader: undo code duplication
Some code was duplicated (copy-pasted) or placed in an inappropriate
module during the modularization refactoring.  Those functions are moved
into a `Shared` module, as was originally intended but forgotten.
Better documentation of the respective functions is a positive
side-effect.
2016-06-02 15:30:20 +02:00
John MacFarlane
061bc60f70 Merge pull request #2950 from tarleb/org-ref-support
Org reader: support org-ref style citations
2016-05-31 12:44:29 -07:00
John MacFarlane
669ecbd4ab Merge pull request #2954 from tarleb/org-export-blocks
Org export blocks
2016-05-31 11:16:08 -07:00
John MacFarlane
561afac0bc brazilian -> brazil for polyglossia.
Closes #2953.
2016-05-31 11:15:21 -07:00
Albert Krewinkel
c17c62a2c7 Org reader: support new syntax for export blocks
Org-mode version 9 uses a new syntax for export blocks.  Instead of
`#+BEGIN_<FORMAT>`, where `<FORMAT>` is the format of the block's
content, the new format uses `#+BEGIN_export <FORMAT>` instead.  Both
types are supported.
2016-05-29 21:08:50 +02:00
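
Recognizing both header forms can be sketched like this. The Python code is illustrative only; `export_block_format` is a hypothetical name, and the `known_formats` tuple is an assumption used to tell old-style export blocks apart from other blocks such as `#+BEGIN_SRC`.

```python
import re

BEGIN_RE = re.compile(r"^\s*#\+BEGIN_(\w+)(?:\s+(\w+))?\s*$", re.IGNORECASE)


def export_block_format(line, known_formats=("html", "latex")):
    """Return the export format named by a block header line, or None."""
    m = BEGIN_RE.match(line)
    if m is None:
        return None
    kind, arg = m.group(1).lower(), m.group(2)
    if kind == "export":                    # new syntax: #+BEGIN_export <FORMAT>
        return arg.lower() if arg else None
    if kind in known_formats:               # old syntax: #+BEGIN_<FORMAT>
        return kind
    return None


print(export_block_format("#+BEGIN_HTML"))
print(export_block_format("#+BEGIN_export html"))
```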
Albert Krewinkel
4f84cf02c7 Org reader: refactor BEGIN…END block parsing
- Reorder functions, grouping related functions together.

- Demote simple functions to local functions if they are used just once.

- Rename and document functions to increase code readability.

- Fix handling of whitespace in blocks, allowing content to be indented
  less than the block header.
2016-05-29 21:08:36 +02:00
Albert Krewinkel
bc82123122 Org reader: rename parseInlines to inlines
Having a function starting with `parse` in a parsing library is overly
redundant.  Let's use a nicer, shorter name more in line with the rest
of the library.
2016-05-29 21:08:31 +02:00
Albert Krewinkel
f226cb88b0 Org reader: support org-ref style citations
The *org-ref* package is an org-mode extension commonly used to manage
citations in org documents.  Basic support for the `cite:citeKey` and
`[[cite:citeKey][prefix text::suffix text]]` syntax is added.
2016-05-27 21:19:28 +02:00
Albert Krewinkel
eea6d6568f Org reader: extract blocks parser to module
Block parsing code is moved to a separate module.

This is part of the Org-mode reader cleanup effort.
2016-05-25 23:21:40 +02:00
Albert Krewinkel
39e8b4276e Org reader: extract inline parser to module
Inline parsing code is moved to a separate module.  Parsers for block
starts are extracted as well, as those are used in the `endline` parser.

This is part of the Org-mode reader cleanup effort.
2016-05-25 22:54:45 +02:00
Albert Krewinkel
a340c7249f Org reader: extract parsing function to module
The Org-mode reader uses many functions defined in the
`Text.Pandoc.Parsing` utility module.  Some of the functions are
overwritten with versions adapted to Org-mode idiosyncrasies.  These
special functions, as well as the normal Pandoc versions, are combined
in a single module to increase the ease of use.

This leads to decoupling of Org-mode and Pandoc and hence to slightly
cleaner code.  The downside is code-bloat due to repeated import/export
statements.
2016-05-25 22:53:55 +02:00
mb21
340f0aaef8 EPUB Reader: normalise Link id as well 2016-05-24 17:42:37 +02:00
Carlos Sosa
5667e0959a Org writer: add drawer capability
For the implementation of the Drawer element in the Org Writer, we make
use of a generic Block container with attributes.  The presence of a
`drawer` class defines that the `Div` constructor is a drawer. The first
class defines the drawer name to use. The key-value list in the
attributes defines the keys to add inside the Drawer. Lastly, the list
of Block elements contains miscellaneous blocks elements to add inside
of the Drawer.

Signed-off-by: Albert Krewinkel <albert@zeitkraut.de>
2016-05-23 10:00:14 +02:00
Albert Krewinkel
a4717c2fc5 Org reader: respect drawer export setting
The `d` export option can be used to control which drawers are exported
and which are discarded.  Basic support for this option is added here.
2016-05-23 09:44:37 +02:00
Albert Krewinkel
f3d27e4c80 Org reader/writer: use CUSTOM_ID in properties
The `ID` property is reserved for internal use by Org-mode and should
not be used.  The `CUSTOM_ID` property is to be used instead; it is
converted to the `ID` property for certain export formats.

The reader and writer erroneously used `ID`.  This is corrected by using
`CUSTOM_ID` where appropriate.
2016-05-22 23:01:47 +02:00
John MacFarlane
446cf6a1cf HTML reader: fixed bug in pClose.
This caused exponential parsing behavior in documents
with unclosed tags in dl, dd, dt.
2016-05-21 23:05:00 -07:00
Albert Krewinkel
cd3282b08d Org writer: add :PROPERTIES: drawer support
This allows header attributes to be added to org documents in the form
of `:PROPERTIES:` drawers.  All available attributes are stored as
key/value pairs.  This reflects the way the org reader handles
`:PROPERTIES:` blocks.

This closes #1962.
2016-05-20 17:01:50 +02:00
Albert Krewinkel
68d388f833 Org reader: add :PROPERTIES: drawer support
Headers can have optional `:PROPERTIES:` drawers associated with them.
These drawers contain key/value pairs like the header's `id`.  The
reader adds all listed pairs to the header's attributes; `id` and
`class` attributes are handled specially to match the way `Attr` are
defined.

This also changes behavior of how drawers of unknown type are handled.
Instead of including all unknown drawers, those are not read/exported,
thereby matching current Emacs behavior.

This closes #1877.
2016-05-20 17:01:26 +02:00
John MacFarlane
0958f2f5d0 Merge pull request #2927 from tarleb/org-attr-html
Org reader support for ATTR_HTML statements
2016-05-19 10:44:11 -07:00
Albert Krewinkel
16e233475a Org reader: add support for ATTR_HTML attributes
Arbitrary key-value pairs can be added to some block types using a
`#+ATTR_HTML` line before the block.  Emacs Org-mode only includes these
when exporting to HTML, but since we cannot make this distinction here,
the attributes are always added.

The functionality is now supported for figures.

This closes #1906.
2016-05-19 09:55:12 +02:00
Albert Krewinkel
26e8d98be2 Org reader: use custom anyLine
Additional state changes need to be made after a newline is parsed,
otherwise markup may not be recognized correctly.

This fixes a bug where markup after certain block-types would not be
recognized. E.g. `/emph/` in the following snippet was not parsed as
emphasized.

    foo
    # comment
    /emph/
2016-05-19 09:35:47 +02:00
Albert Krewinkel
1dda535378 Org reader: refactor block attribute handling
A parser state attribute was used to keep track of block attributes
defined in meta-lines.  Global state is undesirable, so block attributes
are no longer saved as part of the parser state.  Old functions and the
respective part of the parser state are removed.
2016-05-19 09:33:51 +02:00
John MacFarlane
847167804a EPUB reader: unescape URIs in spine.
This should fix #2924.

Testing on the epub that caused the problem originally
would be welcome.
2016-05-17 09:38:52 -07:00
John MacFarlane
7be30a40f1 LaTeX writer: Don't escape underscore in labels.
Previously they were escaped as ux5f.

Closes #2921.
2016-05-17 09:18:52 -07:00
John MacFarlane
344412cba8 Merge pull request #2894 from sid-kap/rst-code-class
Add class option for code block in RST reader
2016-05-12 00:03:14 -07:00
John MacFarlane
609fb33302 Merge pull request #2913 from jlduran/strut-minipage-tables
Retake on strut with \minipage inside tables
2016-05-11 23:57:47 -07:00
John MacFarlane
3800cb3d42 Merge pull request #2912 from tarleb/org-export-settings
Org reader: basic support for export settings
2016-05-11 13:36:02 -07:00
Albert Krewinkel
be5cccf248 Org reader: parse but ignore export options
All known export options are parsed but ignored.
2016-05-11 19:13:43 +02:00
Albert Krewinkel
76143de97e Org reader: add support for sub/superscript export options
Org-mode allows export settings to be specified via `#+OPTIONS` lines.
Disabling simple sub- and superscripts is one of these export options;
this option is now supported.
2016-05-11 19:13:43 +02:00
Albert Krewinkel
7a0729ea09 Org reader: move parser state into separate module
The org reader code has become large and confusing.  Extracting smaller
parts into submodules should help to clean things up.
2016-05-11 19:13:42 +02:00
Jose Luis Duran
ec2fc30288 Retake on strut with \minipage inside tables
Reimplement on 4c684561ee

The problem with 4c68456 was a space between the cell contents and the
`\strut` that affected the alignment.
2016-05-11 14:02:09 -03:00
John MacFarlane
f7601297f0 Avoid lazy foldl in LaTeX writer. 2016-05-09 18:25:57 -07:00
John MacFarlane
fd9ec835ec Merge pull request #2907 from tarleb/org-fixes
Org fixes (reader and writer)
2016-05-09 10:17:56 -07:00
Albert Krewinkel
d32878b84b Org writer: print empty table rows
Empty table rows should not be dropped from the output, so row-height is
always set to be at least 1.
2016-05-09 19:06:24 +02:00
Albert Krewinkel
10a809f126 Org reader: fix inline-LaTeX regression
The last fix for whitespace handling of inline LaTeX commands was
incorrect, preventing correct recognition of inline LaTeX commands which
contain spaces.  This fix ensures that only trailing whitespace is cut
off.
2016-05-09 19:06:04 +02:00
roblabla
acd492c7f4 Allow spaces before '!' in MediaWiki table header 2016-05-09 17:54:40 +02:00
John MacFarlane
21d1a3b57c Merge pull request #2898 from tarleb/org-table-refactoring
Org reader: table parsing code refactoring and fixes
2016-05-05 16:22:56 -07:00
Albert Krewinkel
405c3e9c36 Org reader: fix spacing after LaTeX-style symbols
The org-reader was dropping space after unescaped LaTeX-style symbol
commands: `\ForAll \Auml` resulted in `∀Ä` but should give `∀ Ä`
instead.  This seems to be because the LaTeX-reader treats the
command-terminating space as part of the command.  Dropping the trailing
space from the symbol-command fixes this issue.
2016-05-04 23:16:23 +02:00
Albert Krewinkel
2d825603c6 Org reader: fix handling of empty table cells, rows
This fixes Org mode parsing of some corner cases regarding empty cells
and rows.  Empty cells weren't parsed correctly, e.g. `|||` should be
two empty cells, but would be parsed as a single cell containing a pipe
character.  Empty rows were parsed as alignment rows and dropped from
the output.

This fixes #2616.
2016-05-04 16:02:03 +02:00
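
The `|||` case can be checked with a small sketch. This Python helper is hypothetical (`split_org_row` is not part of pandoc) and assumes every row both starts and ends with a pipe character.

```python
def split_org_row(line):
    """Split an Org table row such as '| a | b |' into its cell strings."""
    inner = line.strip()
    if len(inner) < 2 or not (inner.startswith("|") and inner.endswith("|")):
        raise ValueError("not a table row")
    # Drop the outer pipes, then split on the remaining separators.
    return [cell.strip() for cell in inner[1:-1].split("|")]


print(split_org_row("|||"))
print(split_org_row("| a | b |"))
```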
Albert Krewinkel
a51e4e8215 Org reader: refactor rows-to-table conversion
This refactors the code converting a list of table lines to an org table
ADT.  The old code was simplified and is now slightly less ugly.
2016-05-04 16:01:22 +02:00
Albert Krewinkel
d5e4bc179c Org reader: stop padding short table rows
Emacs Org-mode doesn't add any padding to table rows.  The first
row (header or first body row) is used to determine the column count, no
other magic is performed.

The org reader was padding rows to the length of the longest table row.
This was done due to a misunderstanding of how Org handles tables.  This
feature reflected how Org-mode handles tables when pressing <TAB>.  The
Org exporter however, which is what the reader should implement, doesn't
do any of this.  So this was a mis-feature that made the reader more
complex and reduced comparability.  It was hence removed.
2016-05-04 15:48:07 +02:00
John MacFarlane
ee4e863225 Merge pull request #2890 from bcdevices/docbook5-writer
Docbook5 write support
2016-05-01 22:43:38 -07:00
Sidharth Kapur
490c2b543d Add class option for code block in RST reader
According to http://docutils.sourceforge.net/docs/ref/rst/directives.html#code,
the code directive supports the ":class:" option.
2016-05-01 21:42:58 -05:00
Jesse Rosenthal
99eac312fe Binary fmts throw PandocError on zip-archive fail
Commit 91dc3342 made `readDocx` throw PandocError if there was an
unarchiving error. This extends that fix to `readOdt` and `readEPUB`.
2016-05-01 18:27:20 -04:00
John MacFarlane
1fbe79db05 LaTeX writer: use {} around options containing special chars.
Closes #2892.
2016-05-01 11:20:26 -07:00
Jesse Rosenthal
91dc334249 Docx Reader: Throw PandocError on unzip failure
Previously, readDocx would error out if zip-archive failed. We change
the archive extraction step from `toArchive` to `toArchiveOrFail`, which
returns an Either value.
2016-05-01 12:17:12 -04:00
Ivo Clarysse
fd36e6b64a Docbook5 writer: Properly handle ulink/link 2016-04-29 16:06:55 -07:00
Ivo Clarysse
987ec3a752 Write out Docbook 5 namespace 2016-04-29 15:43:15 -07:00
John MacFarlane
aa4a1d527a HTML writer: ensure mathjax link is added when math appears in footnote.
Previously if a document only had math in a footnote,
the MathJax link would not be added.

Closes #2881.
2016-04-29 14:54:54 -07:00
Ivo Clarysse
271cb4d845 Add docbook5 writer support 2016-04-29 14:00:46 -07:00
John MacFarlane
32f1b0a5f1 Revert "LaTeX writer: Add \strut to fix multiline tables"
This reverts commit 4c684561ee.

See
https://groups.google.com/d/msg/pandoc-discuss/u6J-_aCProU/UufN3IYRAgAJ

This should fix uneven spacing issues in multiline tables.
2016-04-27 17:25:45 -07:00
John MacFarlane
ece215ed7d Merge pull request #2735 from mb21/patch-1
LaTeX Writer: fix polyglossia to babel env mapping
2016-04-26 23:09:02 -07:00
John MacFarlane
cc0527bf31 Merge pull request #2829 from adunning/patch-1
LaTeX writer: Add missing languages.
2016-04-26 23:08:15 -07:00
John MacFarlane
cc82851a6a Merge pull request #2876 from shosti/org-code-indent
Ignore leading space in org code blocks
2016-04-26 23:07:29 -07:00
John MacFarlane
1985164816 LaTeX writer: ignore --incremental unless -t beamer.
Closes #2843.
2016-04-26 21:50:37 -07:00
Emanuel Evans
1bfe39e24c
Ignore leading space in org code blocks
Fixes #2862

Also fix up tab handling for leading whitespace in code blocks.
2016-04-26 10:29:59 -07:00
Jesse Rosenthal
a385ee1d4f Docx Reader: parse moveTo and moveFrom
`moveTo` and `moveFrom` are track-changes tags that are used when a
block of text is moved in the document. We now recognize these tags and
treat them the same as `insert` and `delete`, respectively. So,
`--track-changes=accept` will show the moved version, while
`--track-changes=reject` will show the original version.
2016-04-15 14:09:18 -04:00
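
The mapping described above can be sketched as a filter over tagged runs. The Python below is an illustration under stated assumptions: `keep_run` is a hypothetical helper, the tag names follow the commit message rather than the docx XML, and the `all` mode is assumed to keep every run.

```python
def keep_run(tag, mode):
    """Decide whether a tracked-change run is kept for a --track-changes mode."""
    # moveTo behaves like insert, moveFrom like delete.
    kind = {"moveTo": "insert", "moveFrom": "delete"}.get(tag, tag)
    if mode == "accept":
        return kind != "delete"   # show the moved (new) version
    if mode == "reject":
        return kind != "insert"   # show the original version
    return True                   # assumed behaviour for 'all'


print(keep_run("moveTo", "accept"), keep_run("moveFrom", "accept"))
```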
John MacFarlane
4b49f923cb Markdown reader: Fix pandoc title blocks with lines ending in 2 spaces.
Closes #2799.

Also added -s to markdown-reader-more test.
2016-04-10 09:13:53 -07:00
John MacFarlane
773bbb8fc7 Markdown + HTML readers: be more forgiving about unescaped &.
We are now more forgiving about parsing invalid HTML with
unescaped `&` as raw HTML.  (Previously any unescaped `&`
would cause pandoc not to recognize the string as raw HTML.)

Closes #2410.
2016-04-10 07:39:36 -07:00
Andrew Dunning
9765ef2ce6 LaTeX writer: Add missing languages.
Updates the list from the hyphenation files at <http://mirror.ctan.org/language/hyph-utf8/tex/generic/hyph-utf8/loadhyph/>.
2016-04-01 16:47:33 +01:00
Andrew Dunning
0c37a7c488 Recognize la-x-classic as Classical Latin.
This allows one to access the hyphenation patterns at <http://mirrors.ctan.org/language/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-la-x-classic.tex>, using its private language tag.
2016-03-30 14:15:47 +01:00
John MacFarlane
f74498cb47 EPUB writer: set 'navpage' variable on nav page.
This allows templates to treat it differently.
2016-03-26 13:14:50 -07:00
John MacFarlane
9742c48647 Removed two superfluous lines. 2016-03-25 09:05:38 -07:00
John MacFarlane
f47b369f37 LaTeX writer: better positioning for hypertarget in figures.
Closes #2813.
2016-03-24 16:44:33 -07:00
John MacFarlane
bb6897a13e LaTeX writer: Fixed position of label in figures.
Partially addresses #2813.

This isn't perfect, because now the hypertarget is in the
wrong place -- when you link to the figure, the screen
is positioned with the caption at the top, and most of
the figure off screen.

So this needs a bit more tweaking.
2016-03-24 09:41:45 -07:00
John MacFarlane
499985c1a3 Updated copyright dates to include 2016. 2016-03-22 17:20:39 -07:00
John MacFarlane
b1ffdf3b01 Fixed bug in Markdown raw HTML parsing.
This was a regression, with the rewrite of `htmlInBalanced`
(from `Text.Pandoc.Readers.HTML`) in 1.17.

It caused newlines to be omitted in raw HTML blocks.

Closes #2804.
2016-03-22 16:56:10 -07:00
Mauro Bieg
44f95484a4 LaTeX Writer: fix polyglossia to babel env mapping
allow for optional argument in square brackets, closes #2728
2016-03-20 19:17:30 +01:00
John MacFarlane
3af753de47 Merge pull request #2637 from mb21/latex-figure-label
LaTeX writer: figure label
2016-03-19 13:56:14 -07:00
John MacFarlane
976e7e2054 ConTeXt writer: fix whitespace at line beginning in line blocks.
Add a `\strut` after `\crlf` before space.
Closes #2744, #2745.  Thanks to @c-foster.
This uses the fix suggested by @c-foster.

Mid-line spaces are still not supported, because of limitations
of the Markdown parser.
2016-03-18 16:36:56 -07:00
John MacFarlane
e821b05125 LaTeX writer: Avoid double toprule in headerless table with caption.
Closes #2742.
2016-03-18 16:16:18 -07:00
Jesse Rosenthal
28c7617f19 Docx reader: Handle alternate content
Some word functions -- especially graphics -- give various choices for
content so there can be backwards compatibility. This follows the
largely undocumented feature by working through the choices until we
find one that works.

Note that we had to split out the processing of child elems of runs into
a separate function so we can recurse properly. Any processing of an
element *within* a run (other than a plain run) should go into
`childElemToRun`.
2016-03-18 09:38:26 -04:00
Jesse Rosenthal
855c8b43f0 Docx reader: Don't make numbered heads into lists.
Word uses list numbering styles to number its headings. We only call
something a numbered list if it does not also have a heading style.
2016-03-16 12:50:32 -04:00
Jesse Rosenthal
5c055b4cf3 Introduce file-scope parsing (parse-before-combine)
Traditionally pandoc operates on multiple files by first concatenating
them (around extra line breaks) and then processing the joined file. So
it only parses a multi-file document at the document scope. This has the
benefit that footnotes and links can be in different files, but it also
introduces a couple of difficulties:

  - it is difficult to join files with footnotes without some sort of
    preprocessing, which makes it difficult to write academic documents
    in small pieces.

  - it makes it impossible to process multiple binary input files, which
    can't be catted.

  - it makes it impossible to process files from different input
    formats.

This commit introduces an alternative method. Instead of catting the files
first, it parses the files first, and then combines the parsed
output. This makes it impossible to have links across multiple files,
and auto-identified headers won't work correctly if headers in multiple
files have the same name. On the other hand, footnotes across multiple
files will work correctly and will allow more freedom for input formats.

Since ByteStringReaders can currently only read one binary file, and
will ignore subsequent files, we also change the behavior to
automatically parse before combining if using the ByteStringReader. If
we use one file, it will work as normal. If there is more than one file
it will combine them after parsing (assuming that the format is the
same).

Note that this is intended to be an optional method, defaulting to
off. Turn it on with `--file-scope`.
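
The difference between the two strategies can be sketched with a toy footnote resolver. This is only an illustration in Python, not pandoc's implementation: `parse`, `resolve`, `concat_then_parse` and `parse_then_combine` are hypothetical names, and the rule that a later definition of a label overrides an earlier one is an assumption of this toy, not something the commit message states.

```python
import re

def parse(text):
    """Toy parser: split text into (body, notes), notes mapping label -> text."""
    notes = dict(re.findall(r"^\[\^(\w+)\]: (.*)$", text, flags=re.M))
    body = re.sub(r"^\[\^\w+\]: .*\n?", "", text, flags=re.M)
    return body.strip(), notes

def resolve(body, notes):
    """Replace each [^label] reference with its note text in parentheses."""
    return re.sub(r"\[\^(\w+)\]", lambda m: "(" + notes[m.group(1)] + ")", body)

def concat_then_parse(files):
    # Document scope: labels from all files share one namespace, so two
    # files that both define [^1] clash (here the later definition wins).
    return resolve(*parse("\n\n".join(files)))

def parse_then_combine(files):
    # File scope: each file's labels are resolved against that file only.
    return "\n\n".join(resolve(*parse(f)) for f in files)

files = [
    "One[^1].\n\n[^1]: first note",
    "Two[^1].\n\n[^1]: second note",
]
print(concat_then_parse(files))
print(parse_then_combine(files))
```

In the toy, only the file-scope variant keeps both notes, which is the case the commit message describes as difficult without preprocessing.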
2016-03-15 12:52:51 -04:00
Jesse Rosenthal
68fd333ec4 Add a general ByteStringReader with warnings.
Have docx reader use it.
2016-03-12 17:08:20 -05:00
Jesse Rosenthal
ee03e954d0 Add readDocxWithWarnings
The regular readDocx just becomes a special case.
2016-03-12 17:08:20 -05:00
Jesse Rosenthal
102ba9ecb8 Docx Reader: Add state to the parser, for warnings
In order to be able to collect warnings during parsing, we add a state
monad transformer to the D monad. At the moment, this only includes a
list of warning strings (nothing currently triggers them, however). We
use StateT instead of WriterT to correspond more closely with the
warnings behavior in T.P.Parsing.
2016-03-12 17:08:20 -05:00
John MacFarlane
a485c42d78 Fixed behavior of base tag.
+ If the base path does not end with slash, the last component
  will be replaced.  E.g. base = `http://example.com/foo`
  combines with `bar.html` to give `http://example.com/bar.html`.
+ If the href begins with a slash, the whole path of the base
  is replaced.  E.g. base = `http://example.com/foo/` combines
  with `/bar.html` to give `http://example.com/bar.html`.

Closes #2777.
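
Both cases match standard relative-reference resolution. As a quick cross-check (using Python's `urllib.parse.urljoin`, not pandoc's own code), the two examples above behave as follows:

```python
from urllib.parse import urljoin

def resolve_href(base, href):
    """Resolve href against a base URL using standard URL joining."""
    return urljoin(base, href)

# Base does not end with a slash: the last path component is replaced.
print(resolve_href("http://example.com/foo", "bar.html"))
# Href begins with a slash: the whole path of the base is replaced.
print(resolve_href("http://example.com/foo/", "/bar.html"))
```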
2016-03-10 19:59:55 -08:00
mb21
139fa54d48 Docx Writer: handle image alt text
closes #2754
2016-03-10 08:56:08 +01:00
John MacFarlane
2b55b76ebe Markdown reader: Improved pipe table parsing.
Fixes #2765.
Added test case.
2016-03-09 11:46:00 -08:00
John MacFarlane
54a68616d7 Markdown reader: Clean up pipe table parsing. 2016-03-09 10:11:32 -08:00
John MacFarlane
6e950a8eb5 Markdown reader: allow + separators in pipe table cells.
We already allowed them in the header, but not in the body
rows, for some reason.  This gives compatibility with org-mode
tables.
2016-03-09 08:44:31 -08:00
John MacFarlane
4ed64835cb Markdown reader: don't cross line boundary parsing pipe table row.
Previously an emph element could be parsed across the newline
at the end of the pipe table row.

I thought this would help with #2765, but it doesn't.
2016-03-09 08:33:13 -08:00
John MacFarlane
6bfaa5ad15 DokuWiki writer: use $$ for display math. 2016-03-08 10:08:14 -08:00
Jesse Rosenthal
0b9c54d9f3 Docx reader: update feature checklist.
The feature checklist in the source code was out of date. Update.
2016-03-08 00:36:13 -05:00
John MacFarlane
7c6a3c0f69 LaTeX reader: handle interior $ characters in math.
e.g. `$$\hbox{$i$}$$`.

Partially addresses #2743.
2016-02-28 11:14:03 -08:00
Jesse Rosenthal
a7a0b452a5 Docx Reader: Get rid of Modifiable typeclass.
The docx reader used to use a Modifiable typeclass to combine both
Blocks and Inlines. But all the work was in the inlines. So most of the
generality was wasted, at the expense of making the code harder to
understand. This gets rid of the generality, and adds functions for
Blocks and Inlines. It should be a bit easier to work with going forward.
2016-02-26 08:57:53 -05:00
John MacFarlane
f2bd6fd37c Make protocol-relative URIs work again.
Closes #2737.
2016-02-23 21:58:10 -08:00
John MacFarlane
04d1e40f37 Markdown reader: use htmlInBalanced for rawVerbatimBlock.
This should give better performance.

See #2730.
2016-02-21 07:56:41 -08:00
John MacFarlane
9693de7f59 Fixed some linter warnings. 2016-02-20 22:16:39 -08:00
John MacFarlane
29706ee02d Merge pull request #2646 from tarleb/org-figure-with-no-name
Prefix even empty figure names with "fig:"
2016-02-20 21:44:39 -08:00
John MacFarlane
649cfb61b8 Merge pull request #2668 from monofon/fix/yaml-metadata-block-bottom-line
Markdown writer: Use hyphens for yaml metadata block bottom line
2016-02-20 21:43:15 -08:00
John MacFarlane
e369e60fb4 Merge pull request #2691 from tarleb/org-image-file-links
Org reader: Refactor link-target processing
2016-02-20 21:42:12 -08:00
John MacFarlane
1534052dd9 HTML reader: rewrote htmlInBalanced.
This version avoids an exponential performance problem with `<script>` tags,
and it should be faster in general.

Closes #2730.
2016-02-20 15:00:31 -08:00
Jesse Rosenthal
4438ff17fb LaTeX writer: clean up options parser.
Make sure that we require the closing bracket.
2016-02-18 23:35:38 -05:00
Jesse Rosenthal
4112b321cd LaTeX writer: treat memoir template with article opt as article
We currently treat all memoir templates as books. This means that pandoc
will infer the `--chapters` argument, even if the `article` option is
set for memoir.

This commit makes pandoc treat the document as an article if there is
an article option (i.e., `\documentclass[12pt,article]{memoir}`).

Note that this refactors out the parsec parsers for document class and
options, to make it a little clearer what's going on.
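
The decision can be sketched roughly as below. This is a Python illustration, not the parsec parser from the commit: `is_book_class` and the `BOOK_CLASSES` set are hypothetical, and the real list of book-like classes may differ.

```python
import re

# Optional [options] (closing bracket required), then {class}.
DOCCLASS = re.compile(r"\\documentclass\s*(?:\[([^\]]*)\])?\s*\{([^}]*)\}")

# Illustrative assumption: which classes count as book-like.
BOOK_CLASSES = {"book", "memoir"}

def is_book_class(template):
    """Return True if the template's document class should be treated as a book."""
    m = DOCCLASS.search(template)
    if m is None:
        return False
    options = [o.strip() for o in (m.group(1) or "").split(",")]
    cls = m.group(2).strip()
    if cls == "memoir" and "article" in options:
        return False  # memoir with the article option is treated as an article
    return cls in BOOK_CLASSES
```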
2016-02-18 22:32:38 -05:00
John MacFarlane
b8dadc608a HTML reader: properly handle an empty cell in a simple table.
Closes #2718.
2016-02-16 11:05:51 -08:00
John MacFarlane
bbc67dee36 Removed tex_math_single_backslash from markdown_github options.
Closes #2707.
2016-02-09 22:30:52 -08:00
John MacFarlane
a692bd2872 Custom writer: Pass attributes parameter to CaptionedImage.
Closes #2697.
2016-02-05 16:49:27 -08:00
John MacFarlane
6cb4991f6b Markdown reader: Fixed bug with smart quotes around tex math.
Previously smart quotes were incorrect in the following:

    '$\neg(x \in x)$'.

(because of the following period).  This commit fixes the problem,
which was introduced by commit 4229cf2d92.
2016-02-04 12:09:26 -08:00
John MacFarlane
93a05dffd3 HTML writer: don't include alignment attribute for default table columns.
Previously these were given "left" alignment.  Better to leave off
alignment attributes altogether.

Closes #2694.
2016-02-03 13:31:21 -08:00
Jesse Rosenthal
2ee7752d14 Docx reader: Add a "Link" modifier to Reducible
We want to make sure that links have their spaces removed, and are
appropriately smushed together.

This closes #2689
2016-02-02 14:40:09 -05:00
Albert Krewinkel
92e6ae47f6 Org reader: Refactor link-target processing
Cleanup of the code for link target handling.  Most notably, the
canonicalization of a link is handled by a separate function.

This fixes #2684.
2016-01-31 23:23:09 +01:00
John MacFarlane
18745585c1 LaTeX reader: inlineCommand now gobbles an empty {} after any command.
This gives better results when people write e.g. `\TeX{}` in Markdown.

    \TeX{} and \LaTeX{}

now works as expected with `pandoc -f markdown -t latex`.

Closes #2687.
2016-01-31 10:52:46 -08:00
John MacFarlane
a02c26d9f4 HTML reader: handle multiple meta tags with same name.
Put them in a list in the metadata so they are all
preserved, rather than (as before) throwing out all
but one.
2016-01-29 11:51:01 -08:00
John MacFarlane
76983c31f2 Properly handle LaTeX "math" environment as inline math.
See #2171.
2016-01-29 10:11:45 -08:00
John MacFarlane
a1021bdda6 Textile reader: Support >, <, =, <> text alignment attributes.
Closes #2674.
2016-01-25 09:34:49 -08:00
John MacFarlane
11c5831a1f Make language extensions trigger highlighting.
For example, `py` will now work as well as `python`.
Closes jgm/highlighting-kate#83.
2016-01-24 14:15:06 -08:00
John MacFarlane
20170c328f Changed type of Shared.uniqueIdent argument from [String] to Set String.
This avoids performance problems in documents with many identically
named headers.

Closes #2671.
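
Why the set helps can be sketched as follows. This is a Python analogue, not the Haskell function: `unique_ident` here is a hypothetical stand-in, and the `-1`, `-2` suffix scheme is assumed for illustration. Each membership test against a set takes roughly constant time, whereas a list must be scanned.

```python
def unique_ident(base, used):
    """Return base, or base-1, base-2, ..., whichever is not yet in `used`."""
    if base not in used:
        return base
    n = 1
    while f"{base}-{n}" in used:
        n += 1
    return f"{base}-{n}"

used = set()
for _ in range(3):
    ident = unique_ident("intro", used)
    used.add(ident)  # a set keeps every lookup above cheap
    print(ident)
```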
2016-01-22 10:16:47 -08:00
John MacFarlane
3b39b16a4b Merge pull request #2638 from c-forster/teiwriter
Add TEI Writer.
2016-01-21 15:23:50 -08:00
Henrik Tramberend
556d0c1810 Markdown writer: Use hyphens for yaml metadata block bottom line 2016-01-21 12:44:16 +01:00
Henrik Tramberend
7a18879a36 LaTeX writer: Allow more flexible table alignment 2016-01-20 13:21:26 +01:00
John MacFarlane
4d74a966c4 Added some entity tests in Markdown reader tests.
Change types of divs.

From Docbook "sect#" and "simplesect" to "level#" and
"section."

Add tests.

Add mention of TEI to README.

Small changes to TEI writer.
2016-01-19 14:03:57 -05:00
csforste
25a9ca697a Add TEI Writer. 2016-01-19 14:03:57 -05:00
John MacFarlane
f2c0974a26 HTML writer: harmless code simplification.
Since the 'math' is only put into the template if stMath is
set anyway, there's no need for this conditional.
2016-01-14 10:55:04 -08:00
John MacFarlane
f45a8e1d3b Org writer - pass through RawInline with format "org". 2016-01-13 23:01:51 -08:00
Albert Krewinkel
fabbd1aa79 Prefix even empty figure names with "fig:"
The convention used by pandoc for figures is to mark them by prefixing
the name with "fig:".  The org reader failed to do this if a figure had
no name.  The test for this was broken as well.

This fixes #2643.
2016-01-11 22:23:59 +01:00
John MacFarlane
f34382ef2c Depend on deepseq rather than deepseq-generics.
See fpco/stackage#1096.
2016-01-11 12:49:28 -08:00
John MacFarlane
8611ac56a6 Fixed regression in latex smart quote parsing.
Closes #2645.

In cases where a match was not found for a quote, everything
from the open quote to the end of the paragraph was being dropped.
2016-01-11 12:17:49 -08:00
mb21
1fde92053f LaTeX writer: figure label 2016-01-10 13:30:32 +01:00
John MacFarlane
1506e62f48 LaTeX writer: restore old treatment of Span.
A Span is rendered with surrounding {braces}.

This was a regression in 1.16.  Closes #2624.
2016-01-09 12:16:24 -08:00
John MacFarlane
729911ad74 Fixed shadowing warning. 2016-01-08 20:20:37 -08:00
John MacFarlane
5884ff6994 Work around tagsoup bug - not allowing uppercase x in hex entities.
Issue submitted at tagsoup.
2016-01-08 17:33:32 -08:00
John MacFarlane
12a5bd3c8d Entity handling fixes:
- Text.Pandoc.XML.fromEntities:  handle entities without a
  semicolon. Always lookup character references with the
  trailing ';', even if it wasn't present.  And never add
  it when looking up numerical entities.  (This is what
  tagsoup seems to require.)
- Text.Pandoc.Parsing.characterReference:  Always lookup
  character references with the trailing ';', and leave off
  the ';' when looking up numerical entities.

This fixes a regression for e.g. `&lang;`.
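
The lookup rule can be sketched as below, with Python's `html.entities.html5` table standing in for tagsoup's lookup (an assumption for illustration; `lookup_named` and `lookup_numeric` are hypothetical helpers). Named references are always looked up with the trailing `;`, numeric ones without it, and an uppercase `X` is accepted for hex.

```python
from html.entities import html5

def lookup_named(name):
    """Look up a named reference, always using the key with a trailing ';'."""
    key = name if name.endswith(";") else name + ";"
    return html5.get(key)

def lookup_numeric(body):
    """Decode '#955', '#x3bb' or '#X3BB', with or without a trailing ';'."""
    digits = body.rstrip(";").lstrip("#")
    if digits[:1] in ("x", "X"):
        return chr(int(digits[1:], 16))
    return chr(int(digits))
```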
2016-01-08 17:08:01 -08:00
John MacFarlane
9320d359a2 Merge pull request #2629 from tarleb/org-noexport-fix
Fix function dropping subtrees tagged :noexport:
2016-01-07 11:34:27 -08:00
Albert Krewinkel
b3b00da43d Fix function dropping subtrees tagged :noexport:
Continue scanning for comment subtrees beyond only the first block.

Note to self: when writing a recursive function, don't forget to, you
know, actually recurse.

Shout to @mrvdb for noticing this.

This fixes #2628.
2016-01-07 19:56:44 +01:00
John MacFarlane
c4fdf28815 Markdown reader: renormalize table column widths if they exceed 100%.
Closes #2626.
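
A minimal sketch of the idea, assuming widths are fractions of the page width and that renormalizing means dividing by their sum (`renormalize` is a hypothetical helper, not the reader's code):

```python
def renormalize(widths):
    """Scale relative column widths down so they sum to at most 1.0 (100%)."""
    total = sum(widths)
    if total > 1.0:
        return [w / total for w in widths]
    return list(widths)

print(renormalize([0.75, 0.75]))  # widths exceeding 100% are scaled down
print(renormalize([0.25, 0.25]))  # widths within 100% are left alone
```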
2016-01-07 10:40:30 -08:00
John MacFarlane
a796538d84 RST, Markdown writers: Fixed rendering of grid tables with blank rows.
Closes #2615.
2016-01-05 14:04:10 -08:00
John MacFarlane
4990350fc7 Fixed v1.16 regression with --latex-engine.
In 1.16 --latex-engine raises an error if a full path is
given. This commit fixes this regression. Closes #2618.
2016-01-04 22:44:50 -08:00
John MacFarlane
97c9691696 Textile reader: don't allow block HTML tags in inline contexts.
The reader previously did allow this, following redcloth,
which happily parses

    Html blocks can be <div>inlined</div> as well.

as

    <p>Html blocks can be <div>inlined</div> as well.</p>

This is invalid HTML, and this kind of thing can lead
to parsing problems (stack overflows) as well.  So this
commit undoes this behavior.  The above sample now produces:

    <p>Html blocks can be</p>
    <div>
    <p>inlined</p>
    </div>
    <p>as well.</p>
2016-01-02 22:34:06 -08:00
John MacFarlane
75695b1817 MediaWiki reader: interpret markup inside <tt>, <code>.
Closes #2607.
2016-01-02 12:26:16 -08:00
John MacFarlane
a68e072bac MediaWiki writer: fix spacing issues.
+ Start cell on new line unless it's a single Para or Plain.
+ For single Para or Plain, insert a space after the `|` to
  avoid problems when the text begins with a character like
  `-`.

Closes #2604, closes #2606.
2016-01-02 12:14:12 -08:00
John MacFarlane
b27783e2ec Use cmark 0.5.
Closes #2605.
2015-12-29 19:52:06 -08:00
John MacFarlane
297345098d ConTeXt writer: set default layout based on margin-left, etc.
This sets up `\setuplayout` based on the variables `margin-left`,
`margin-right`, `margin-bottom`, and `margin-top`, if no layout
is given.
2015-12-22 13:28:11 -08:00
John MacFarlane
f9202f5d39 LaTeX writer: create defaults for geometry using margin-left etc.
If `geometry` has no value, but `margin-left`, `margin-right`,
`margin-top`, and/or `margin-bottom` are given, a default value
for `geometry` is created from these.

Note that these variables already affect PDF production via HTML5
with wkhtmltopdf.
2015-12-22 13:10:46 -08:00
John MacFarlane
35e0544977 LaTeX reader: allow blank space between braced arguments of commands.
For example

    \foo
    {bar}
    {baz}

Closes #2592.
2015-12-22 11:06:06 -08:00
John MacFarlane
46e38d0a0a Improved treatment of margins in wkhtmltopdf. 2015-12-21 23:47:03 -08:00
John MacFarlane
8b8bdca56a Allow setting margins from metadata variables for wkhtmltopdf.
Variables margin-top, margin-bottom, margin-left, margin-right.
Setting them with css inside @page doesn't seem to work, at least
with the released wkhtmltopdf.
2015-12-21 22:59:01 -08:00
John MacFarlane
0596b65a74 pdf via wkhtmltopdf: take title and page-size from metadata.
Adjusted default `page-size` to `letter`, to match current LaTeX
template.
2015-12-21 22:13:44 -08:00
John MacFarlane
0a768f1cc5 Added preliminary support for PDF creation via wkhtmltopdf.
To use this:

    pandoc -t html5 -o result.pdf

(and add `--mathjax` if you have math.)
2015-12-21 17:22:12 -08:00
John MacFarlane
28b2d86b21 LaTeX/Beamer template changes (Thomas Hodgson):
* Added `thanks` variable
* Use `parskip.sty` when `indent` isn't set (fall
  back to using `setlength` as before if `parskip.sty`
  isn't available).
* Use `biblio-style` with biblatex.
* Added `biblatexoptions` variable.
* Added `section-titles` variable (defaults to true)
  to enable/suppress section title pages in beamer
  slide shows.
* Moved beamer themes after fonts, so that themes can
  change fonts.  (Previously the fonts set were being
  clobbered by lmodern.sty.)
2015-12-19 18:50:45 -08:00
John MacFarlane
9333814254 Added needed import of FromJSON.
Fixes build failure.
2015-12-19 17:54:20 -08:00
John MacFarlane
770641f741 Fix language code for Czech (cs not cz)
Closes #2597.
2015-12-19 17:54:02 -08:00
John MacFarlane
4c103f67f9 Merge branch 'master' of https://github.com/AndreasLoow/pandoc into AndreasLoow-master 2015-12-19 00:07:28 -08:00
John MacFarlane
e20f433f38 Markdown reader: fixed parsing bug with macros.
Previously macro definitions in indented code blocks
were being parsed as macro definitions, not code.
2015-12-19 00:00:04 -08:00
mb21
1ead1f39ad ICML writer: intersperse line breaks
instead of appending them to every ParagraphStyleRange
closes #2501
2015-12-17 10:26:59 +01:00
mb21
f3a9bdafef ICML writer: added figure handling, closes #2590 2015-12-16 11:07:23 +01:00
John MacFarlane
9f43acb5d2 ICML writer: removed redundant import. 2015-12-13 22:18:23 -08:00
John MacFarlane
f3133a8e9e Merge pull request #2570 from mb21/rst-reader-imgattrs
Image attributes
2015-12-13 20:29:13 -08:00
John MacFarlane
a924a3f43d Fixed ICML image syntax for local files.
`file:filename` rather than `file://./filename`.

I think this is right; it matches what we had before
with people actually using the ICML writer, and seems
to match examples in the spec.  I don't
have a copy of InDesign I can test on, though.
@DigitalPublishingToolkit and @mb21, can you have
a look?
2015-12-13 20:19:34 -08:00
John MacFarlane
90b8024fac Use posix path separators in ICML link URIs.
Closes #2589.
2015-12-13 17:40:24 -08:00
mb21
df68f25459 ODT/OpenDocument writer: improved image attributes
- support for percentage widths/heights
- use Attr instead of title to get dimensions from ODT walker to writeOpenDocument
2015-12-13 21:40:13 +01:00
mb21
37931cb0c5 Docx reader: image attributes 2015-12-13 21:40:13 +01:00
mb21
2060f5fe83 new function to extract multiple properties at once in CSS.hs
and use it in Textile reader
2015-12-13 21:40:12 +01:00
mb21
30644b291b RST reader: image attributes 2015-12-13 21:40:12 +01:00
John MacFarlane
e4b3da6929 AsciiDoc writer: support anchors in spans with id elements. 2015-12-13 09:02:37 -08:00
John MacFarlane
3e079a25bc AsciiDoc writers: Add anchors on Div elements.
This partially addresses jgm/pandoc-citeproc#143.

It does not use the native asciidoc syntax for citations,
but it does get the links to individual citations working.
2015-12-13 08:56:22 -08:00
John MacFarlane
44120ea716 Implemented east_asian_line_breaks extension.
Text.Pandoc.Options: Added `Ext_east_asian_line_breaks` constructor to
`Extension` (API change).

This extension is like `ignore_line_breaks`, but smarter -- it
only ignores line breaks between two East Asian wide characters.
This makes it better suited for writing with a mix of East Asian
and non-East Asian scripts.

Closes #2586.
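
The rule can be sketched in Python as below. This is an illustration only: treating the Unicode East Asian Width classes `W` and `F` as "wide" is an assumption, and `join_lines` is a hypothetical helper rather than the reader's code.

```python
import unicodedata

def is_wide(ch):
    """True for characters classed as East Asian Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

def join_lines(text):
    """Drop a newline only when it sits between two wide characters."""
    out = []
    for i, ch in enumerate(text):
        if (ch == "\n" and 0 < i < len(text) - 1
                and is_wide(text[i - 1]) and is_wide(text[i + 1])):
            continue  # ignore the break between two wide characters
        out.append(ch)
    return "".join(out)
```

A break between two Latin-script words, or between a wide character and a Latin letter, is kept, which is what distinguishes this from `ignore_line_breaks`.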
2015-12-12 17:28:52 -08:00
John MacFarlane
af7e782436 Modified readers to emit SoftBreak when appropriate. 2015-12-12 09:31:51 -08:00
John MacFarlane
47cc5ad6e0 Restore no wrapping of XML in Docx, ODT.
It's possible that wrapping causes problems; safer to
turn it off.
2015-12-12 00:28:47 -08:00
John MacFarlane
28a2f4c2a4 Fixed cite key parsing regression.
We were capturing final colons as in [@foo: bar];
the citation id was being parsed as "@foo:".

Closes jgm/pandoc-citeproc#201.
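
One way to express the intended behavior is to allow punctuation inside a key only when an alphanumeric character follows it. The regular expression below is a Python sketch under that assumption; the exact set of permitted punctuation is illustrative, and `cite_keys` is a hypothetical helper.

```python
import re

# Internal punctuation is accepted only if followed by an alphanumeric
# character, so a final ':' or '.' is never captured as part of the key.
CITE_KEY = re.compile(
    r"@([A-Za-z0-9_](?:[A-Za-z0-9_]|[:.#$%&+?<>~/-](?=[A-Za-z0-9_]))*)"
)

def cite_keys(text):
    """Return the citation ids found in text, without the leading '@'."""
    return CITE_KEY.findall(text)

print(cite_keys("[@foo: bar]"))
print(cite_keys("see @doe:2015."))
```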
2015-12-12 00:27:08 -08:00
John MacFarlane
1b0e0998fa FB2 writer: support SoftBreak.
This was omitted earlier.
2015-12-12 00:13:58 -08:00
John MacFarlane
536b6bf538 Implemented SoftBreak and new --wrap option.
Added threefold wrapping option.

* Command line option: deprecated `--no-wrap`, added
  `--wrap=[auto|none|preserve]`
* Added WrapOption, exported from Text.Pandoc.Options
* Changed type of writerWrapText in WriterOptions from
  Bool to WrapOption.
* Modified Text.Pandoc.Shared functions for SoftBreak.
* Supported SoftBreak in writers.
* Updated tests.
* Updated README.

Closes #1701.
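
How a writer might treat a soft break under each setting can be sketched as follows. This is a Python toy, not the Haskell API: `Wrap`, `SOFT_BREAK` and `render` are hypothetical stand-ins for `WrapOption`, `SoftBreak` and a writer.

```python
from enum import Enum
import textwrap

class Wrap(Enum):
    AUTO = "auto"          # re-wrap to the column width
    NONE = "none"          # no wrapping at all
    PRESERVE = "preserve"  # keep the line breaks of the source

SOFT_BREAK = object()  # stand-in for a SoftBreak inline

def render(inlines, wrap, columns=72):
    """Render a list of strings and SOFT_BREAK markers as plain text."""
    if wrap is Wrap.PRESERVE:
        return "".join("\n" if x is SOFT_BREAK else x for x in inlines)
    text = "".join(" " if x is SOFT_BREAK else x for x in inlines)
    if wrap is Wrap.AUTO:
        return textwrap.fill(text, width=columns)
    return text
```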
2015-12-11 23:55:08 -08:00
John MacFarlane
63d875c6cb Markdown reader: parse soft break as SoftBreak. 2015-12-11 15:33:53 -08:00
John MacFarlane
09958d7f95 Fixed Emoji character definitions.
There were many bugs in the definitions.

Closes #2523.
2015-12-04 09:38:58 -08:00
John MacFarlane
dd8df6cfbc Markdown reader: Improved pipe table relative widths.
Previously pipe table columns got relative widths (based
on the header underscore lines) when the source of one of the rows was
greater in width than the column width.  This gave bad results in some
cases where much of the width of the row was due to nonprinting
material (e.g. link URLs).  Now pandoc only looks at printable
width (the width of a plain string version of the source), which
should give better results.

Thanks to John Muccigrosso for bringing up the issue.
2015-12-03 11:02:45 -08:00
Raniere Silva
13f74d018b Add support to GAP 2015-12-03 08:23:26 -02:00
mb21
d901a3da03 Textile Reader: image attributes
closes #2515
2015-12-03 00:06:18 +01:00
mb21
1f379da94b Parse CSS that doesn't contain the optional semicolon 2015-12-02 23:56:44 +01:00
John MacFarlane
622f09617e Docx writer: better handling of PDF images.
Previously we tried to get the image size from the image even
if an explicit size was specified.  Since we still can't get
image size for PDFs, this made it impossible to use PDF images
in docx.

Now we don't try to get the image size when a size is already
explicitly specified.
2015-12-01 00:23:03 -08:00
John MacFarlane
6d91fb2563 Markdown writer: use raw HTML for link/image attributes when
the `link_attributes` extension is unset and `raw_html` is set.

Closes #2554.
2015-11-24 23:28:52 -08:00
John MacFarlane
33d328f1cf Allow pipe tables with no body rows.
Previously this raised a runtime error.

Closes #2556.
2015-11-24 20:23:06 -08:00
John MacFarlane
c73ae81628 LaTeX reader: Improved smart quote parsing.
This fixes rendering of unmatched quotes.
Closes #2555.
2015-11-24 17:20:15 -08:00
John MacFarlane
ce5583460c Improved fetchItem so that C:/Blah/Blah.jpg isn't treated as URL.
The Haskell URI parsing routines will accept "C:" as a scheme,
so we rule that out manually.

This helps with `--self-contained` and absolute Windows paths.
See
http://stackoverflow.com/questions/33899126/rchart-in-markdown-doesnt-render-due-to-invalidurlexception-from-pandoc
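
A sketch of such a check in Python, assuming the rule is simply "a single-letter scheme is a Windows drive letter, not a URL scheme" (`looks_like_url` is a hypothetical helper, and the actual test in `fetchItem` may be phrased differently):

```python
import re

# Generic URI scheme syntax: a letter, then letters, digits, '+', '.', '-'.
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

def looks_like_url(s):
    """True if s starts with a URI scheme longer than one character."""
    m = SCHEME.match(s)
    return m is not None and len(m.group(1)) > 1

print(looks_like_url("C:/Blah/Blah.jpg"))
print(looks_like_url("http://example.com/a.jpg"))
```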
2015-11-24 11:05:31 -08:00
John MacFarlane
2eb5d2dc42 LaTeX reader: Use curly quotes for unmatched `.
Partially addresses #2555.

Note that there's still a problem with the code sample given.
2015-11-23 23:44:39 -08:00
John MacFarlane
2633dc2f5e Beamer writer: mark frame as fragile when it contains verbatim.
Closes #1613.
2015-11-23 23:07:56 -08:00
John MacFarlane
b20ecbedc4 AsciiDoc writer: Fixed code blocks.
Closes #1861.
2015-11-23 21:29:21 -08:00
John MacFarlane
4361dc0245 Define a meta-json variable for all writers.
This contains a JSON version of all the metadata, in the
format selected for the writer.

So, for example, to get just the YAML metadata, you can
run pandoc with the following custom template:

    $meta-json$

Closes #2019.  The intent is to make it easier for static
site generators and other tools to get at the metadata.
2015-11-23 20:40:27 -08:00
Jesse Rosenthal
07b8a456b1 Docx Reader: Remove DummyListItem type
Change 5527465c introduced a `DummyListItem` type in Docx/Parse.hs. In
retrospect, this seems like it mixes parsing and interpretation
excessively. What's *really* going on is that we have a list item
without an associated level or numeric info. We can decide what to do
with that in Docx.hs (treat it like a list paragraph), but the parser
shouldn't make that decision.

This commit makes what is going on a bit more explicit. `LevelInfo` is
now a Maybe value in the `ListItem` type. If it's a Nothing, we treat
it as a ListParagraph. If it's a Just, it's a normal list item.
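
The split between parsing and interpretation can be sketched in Python with an optional field. The field names and the tuple shape of `level_info` are hypothetical; only the Nothing/Just distinction comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ListItem:
    text: str
    # None plays the role of Nothing; a tuple plays the role of Just.
    level_info: Optional[Tuple[int, str]] = None

def interpret(item):
    """Interpretation step: decide what a parsed list item means."""
    if item.level_info is None:
        return ("ListParagraph", item.text)
    level, numbering = item.level_info
    return ("ListItem", level, numbering, item.text)
```

The parser only records whether level information was present; the decision to treat a missing value as a list paragraph lives in the interpreting code.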
2015-11-23 11:50:49 -05:00
John MacFarlane
a008e57ddf hlint fixes 2015-11-22 07:43:48 -08:00
John MacFarlane
f7e37141e5 hlint fixes 2015-11-22 07:42:11 -08:00
John MacFarlane
bbb3d8d442 hlint changes 2015-11-22 07:40:26 -08:00
John MacFarlane
a7f6241f50 hlint fixes. 2015-11-22 07:38:51 -08:00
John MacFarlane
4b293a6a54 hlint fixes. 2015-11-22 07:37:51 -08:00
John MacFarlane
c5b9ae3060 ImageSize: use safeRead instead of readMaybe.
readMaybe is only provided in base 4.6+.
2015-11-21 08:46:01 -08:00
John MacFarlane
73e2d7976c Renamed link attribute extensions.
* Old `link_attributes` -> `mmd_link_attributes`
* Recently added `common_link_attributes` -> `link_attributes`

Note: this change could break some existing workflows.
2015-11-19 23:17:50 -08:00
John MacFarlane
244cd5644b Merge branch 'new-image-attributes' of https://github.com/mb21/pandoc into mb21-new-image-attributes
* Bumped version to 1.16.
* Added Attr field to Link and Image.
* Added `common_link_attributes` extension.
* Updated readers for link attributes.
* Updated writers for link attributes.
* Updated tests
* Updated stack.yaml to build against unreleased versions of
  pandoc-types and texmath.
* Fixed various compiler warnings.

Closes #261.

TODO:

* Relative (percentage) image widths in docx writer.
* ODT/OpenDocument writer (untested, same issue about percentage widths).
* Update pandoc-citeproc.
2015-11-19 23:14:23 -08:00
John MacFarlane
1ad296dc69 Merge pull request #2532 from michaelbeaumont/fix-2530
Interpret pauses correctly for all headers
2015-11-19 21:06:53 -08:00
John MacFarlane
fdc81be7d2 Merge pull request #2506 from adunning/patch-1
Remove redundant `center` variable for reveal.js.
2015-11-19 21:03:31 -08:00
John MacFarlane
ed1173ace6 Rationalized behavior of --no-tex-ligatures and --smart.
This change makes `--no-tex-ligatures` affect the LaTeX reader
as well as the LaTeX and ConTeXt writers.  If it is used,
the LaTeX reader will parse characters `` ` ``, `'`, and `-`
literally, rather than parsing ligatures for quotation marks
and dashes.  And the LaTeX writer will print unicode quotation
mark and dash characters literally, rather than converting
them to the standard ASCII ligatures.

Note that `--smart` has no effect on the LaTeX reader.

`--smart` is still the default for all input formats when
LaTeX or ConTeXt is the output format, *unless* `--no-tex-ligatures`
is used.

Some examples to illustrate the logic:

```
% echo "'hi'" | pandoc -t latex
`hi'
% echo "'hi'" | pandoc -t latex --no-tex-ligatures
'hi'
% echo "'hi'" | pandoc -t latex --no-tex-ligatures --smart
‘hi’
% echo "'hi'" | pandoc -f latex --no-tex-ligatures
<p>'hi'</p>
% echo "'hi'" | pandoc -f latex
<p>’hi’</p>
```

Closes #2541.
2015-11-19 20:30:41 -08:00
Jesse Rosenthal
da4103bc42 Docx reader: Clean up commented-out function
A residue of a recent change was left around in the form of a
commented-out function. Let's clean that up.
2015-11-18 14:06:13 -05:00
Jesse Rosenthal
5527465c77 Docx reader: Handle dummy list items.
These come up when people create a list item and then delete the
bullet. It doesn't refer to any real list item, and we used to ignore
it.

We handle it with a DummyListItem type, which, in Docx.hs, is turned
into a normal paragraph with a "ListParagraph" class. If it follows
another list item, it is folded as another paragraph into that item. If
it doesn't, it's just its own (usually indented, and therefore
block-quoted) paragraph.
2015-11-18 13:02:57 -05:00
John MacFarlane
995f28ff07 Haddock writer: omit formatting inside links.
It isn't supported by Haddock.

Closes #2515.
2015-11-16 20:53:01 -08:00
John MacFarlane
469338a272 Textile reader: skip over attribute in image source.
We don't have a place yet for styles or sizes on images, but
we can skip the attributes rather than incorrectly taking them
to be part of the filename.

Closes #2515.
2015-11-16 20:43:07 -08:00
John MacFarlane
f096f032f0 ICML writer: better handling of math.
Instead of just printing the raw tex, we now try to fake
it with unicode characters.
2015-11-16 20:24:34 -08:00
John MacFarlane
74cf52728e HTML writer: Include example class for example lists.
Closes #2524.
2015-11-16 09:57:28 -08:00
John MacFarlane
593cbd8142 Docx writer: insert space between footnote ref and footnote.
This matches Word's default behavior.  Closes #2527.
2015-11-15 07:53:40 -08:00
John MacFarlane
8f5ff7075c Derive Generic instances for types in Text.Pandoc.Options. 2015-11-14 17:46:55 -08:00
John MacFarlane
420c86b69a Allow more customization of opendocument styles.
Automatic styles can now be inserted in the template,
since the template, not the writer, now provides the
enclosing `<office:automatic-styles>` tags.

Closes #2520.
2015-11-14 17:19:25 -08:00
michaelbeaumont
8b289326a7 Interpret pauses correctly for all headers
Previously, when using headers below the slide level, pauses were left
uninterpreted. In my opinion this is unexpected behavior, although it
looks intentional in the code.

Fixes #2530
2015-11-15 01:37:39 +01:00
Jesse Rosenthal
e5b374e2ca Follow relationships correctly in foot/endnotes.
There are separate relationship (link) files for foot and
endnotes. These had previously been grouped together which led to
links not working correctly in notes. This should finally fix that.
2015-11-14 13:41:34 -05:00
John MacFarlane
37285b432c Text.Pandoc.Emoji: use hex escapes instead of Unicode in source.
Some of the unicode characters cause ghc parse errors in older
ghc versions.
2015-11-13 14:18:02 -08:00
John MacFarlane
73e6333fae Merge pull request #2526 from tarleb/org-definition-lists-fix
Org reader: Require whitespace around def list markers
2015-11-13 14:07:56 -08:00
Albert Krewinkel
67cb2809fd Org reader: Require whitespace around def list markers
Definition list markers (i.e. double colons `::`) must be surrounded by
whitespace to start a definition item.  This rule was not checked
before, resulting in bugs with footnotes and some link types.

Thanks to @conklech for noticing and reporting this issue.

This fixes #2518.
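
The whitespace rule can be sketched as a regular expression. This is a Python illustration, not the reader's parser; limiting bullets to `-` and `+` is a simplifying assumption and `parse_def_item` is a hypothetical helper.

```python
import re

# The '::' marker needs whitespace before it and whitespace or the end of
# the line after it.
DEF_ITEM = re.compile(r"^\s*[-+]\s+(.+?)\s+::(?:\s+(.*))?$")

def parse_def_item(line):
    """Return (term, definition) for a definition item, else None."""
    m = DEF_ITEM.match(line)
    return (m.group(1), m.group(2) or "") if m else None

print(parse_def_item("- term :: definition"))
print(parse_def_item("- see [[info::node]]"))  # '::' inside a link target
```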
2015-11-13 22:04:17 +01:00
John MacFarlane
028a605bf8 Merge pull request #2525 from tarleb/org-smart-fixes
Org reader: Fix emphasis rules for smart parsing
2015-11-13 12:27:23 -08:00
John MacFarlane
0a6aaf5e1b Added emoji extension to Markdown.
This is enabled by default in `markdown_github`.
Added `Ext_emoji` to `Extension` in `Text.Pandoc.Options` (API change).

Closes #2523.
2015-11-13 12:14:24 -08:00
Albert Krewinkel
220f3d12b8 Org reader: Fix emphasis rules for smart parsing
Smart quotes, ellipses, and dashes should behave like normal quotes,
single dashes, and dots with respect to text markup parsing.  The parser
state was not updated properly in all cases, which has been fixed.

Thanks to @conklech for reporting this issue.

This fixes #2513.
2015-11-13 20:43:46 +01:00
John MacFarlane
d8080db7f7 Allow :// in citation keys.
Closes jgm/pandoc-citeproc#166.
2015-11-13 11:00:56 -08:00
John MacFarlane
c80c0df1fe EPUB writer: don't download linked media when data-external attribute set.
By default pandoc downloads all linked media and includes it in the
EPUB container.  This can be disabled by setting `data-external`
on the tags linking to media that should not be downloaded.

Example:

    <audio controls="1">
     <source src="http://www.sixbarsjail.it/tmp/bach_toccata.mp3"
     data-external="1" type="audio/mpeg"></source>
    </audio>

Closes #2473.
2015-11-12 13:27:41 -08:00
John MacFarlane
83b1aa042d LaTeX writer: set colorlinks...
if `linkcolor`, `urlcolor`, `citecolor`, or `toccolor` is set.

Closes #2508.
2015-11-12 12:37:20 -08:00
John MacFarlane
64b32e1e81 Fixed shadowing error. 2015-11-09 11:25:05 -08:00
John MacFarlane
c1e474f005 Restored Text.Pandoc.Compat.Monoid.
Don't use custom prelude for latest ghc.

This is a better approach to making 'stack ghci' and 'cabal repl'
work.  Instead of using NoImplicitPrelude, we only use the custom
prelude for older ghc versions.  The custom prelude presents a
uniform API that matches the current base version's prelude.
So, when developing (presumably with latest ghc), we don't
use a custom prelude at all and hence have no trouble with ghci.

The custom prelude no longer exports (<>):  we now want to
match the base 4.8 prelude behavior.
2015-11-09 11:19:25 -08:00
John MacFarlane
23b693c029 Revert "Use -XNoImplicitPrelude and 'import Prelude' explicitly."
This reverts commit c423dbb5a3.
2015-11-09 10:08:22 -08:00
Andrew Dunning
6c27e5f2c1 Remove redundant center variable for reveal.js.
This is no longer needed with the updates to the template in da139313d2
2015-11-09 09:38:12 -05:00
John MacFarlane
5a9ca26172 Merge pull request #2502 from minoki/latex-comment-environment
LaTeX reader: Handle `comment` environment.
2015-11-08 17:57:52 -08:00
John MacFarlane
237c10992e Merge pull request #2505 from tarleb/org-header-markup-fix
Org reader: fix markup parsing in headers
2015-11-08 17:17:19 -08:00
John MacFarlane
c423dbb5a3 Use -XNoImplicitPrelude and 'import Prelude' explicitly.
This is needed for ghci to work with pandoc, given that we
now use a custom prelude.

Closes #2503.
2015-11-08 16:56:59 -08:00
Albert Krewinkel
e3b4844bd7 Org reader: fix markup parsing in headers
Markup as the very first item in a header wasn't recognized.  This was
caused by an incorrect parser state: positions at which inline markup
can start need to be marked explicitly by changing the parser state.
This wasn't done for headers.  The proper function to update the state
is now called at the beginning of the header parser, fixing this issue.

This fixes #2504.
2015-11-08 22:32:00 +01:00
ARATA Mizuki
94500f7469 LaTeX reader: Handle comment environment.
The `comment` environment is handled in a similar way to the `verbatim` environment, except that its content is discarded.
2015-11-08 02:24:25 +09:00
John MacFarlane
411a25306c LaTeX writer: properly handle footnotes in captions.
Closes #1506.
2015-11-01 15:30:05 -08:00
John MacFarlane
c4ea64203a LaTeX writer: avoid footnotes in list of figures.
Footnotes aren't allowed in the list of figures.  This
patch causes footnotes to be stripped from captions when
entered into the list of figures.

Footnotes still don't actually WORK in captions in latex/pdf,
but at least an error is no longer raised.

See #1506.
2015-11-01 13:42:36 -08:00
John MacFarlane
eb8aee477d Pipe tables with long lines now get relative cell widths.
If a pipe table contains a line longer than the column
width (as set by `--columns` or 80 by default), relative
widths are computed based on the widths of the separator lines
relative to the column width.

This should solve persistent problems with long pipe tables in
LaTeX/PDF output, and give more flexibility for determining
relative column widths in other formats, too.

For narrower pipe tables, column widths of 0 are used,
telling pandoc not to specify widths explicitly in output
formats that permit this.

Closes #2471.
2015-10-30 12:37:08 -07:00
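The width rule described in commit eb8aee477d can be illustrated with a small Python sketch. This is not pandoc's implementation: the function name `pipe_table_widths`, the way the separator line is split, and the exact ratio (separator segment length divided by `columns`) are assumptions made for the example.

```python
def pipe_table_widths(lines, columns=80):
    """Return one relative width per column of a pipe table.

    `lines` is the table as a list of strings; the second entry is
    assumed to be the separator line (e.g. "|-----|----------|").
    """
    # Split the separator line into one segment per column.
    segments = lines[1].strip().strip("|").split("|")

    # Narrow table: report widths of 0, meaning "do not specify a width".
    if max(len(line) for line in lines) <= columns:
        return [0.0] * len(segments)

    # Some line exceeds the column width: each column's relative width
    # is the length of its separator segment relative to `columns`.
    return [len(segment) / columns for segment in segments]
```

Under these assumptions, lengthening the dashes under one column is how an author would give that column a larger share of the page.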
John MacFarlane
7843b5759a HTML writer: use width on whole table if col widths sum to < 100%.
Otherwise some browsers display the table with the columns
separated far apart.
2015-10-30 12:36:36 -07:00
John MacFarlane
532ae22c29 Textile reader: don't do smart punctuation unless explicitly asked.
Closes #2480.

Note that although smart punctuation is part of the textile
spec, it's not always wanted when converting from textile
to, say, Markdown.  So it seems better to make this an option.
2015-10-30 10:54:07 -07:00
John MacFarlane
fb1843ecde Fixed omitted url(...) in CSS data-uri with --self-contained.
Fixes #2489.
2015-10-28 10:06:40 -07:00
John MacFarlane
1d53d452c3 LaTeX writer: add \protect to \hyperlink.
Thanks to Hadrien Mary for the problem and solution.

Closes #2490.
2015-10-28 09:37:05 -07:00
John MacFarlane
d5efa9b35c LaTeX writer: Use \hypertarget and \hyperlink for links.
This works correctly to link to Div or Span elements.
We now don't bother defining `\label` for Div or Span
elements.

Closes jgm/pandoc-citeproc#174.
2015-10-27 14:08:35 -07:00
John MacFarlane
3ea444666a Markdown reader: improved parser for mmd_title_block.
We now allow blank metadata fields.  These were explicitly
disallowed before.

For background see #2026.  The issue in #2026 has since
been fixed in another way, so there is no need to forbid
blank metadata fields.
2015-10-26 22:06:34 -07:00
nickbart1980
143093eabd Added de-CH-1901, fixed el-polyton
el-polyton, not el-poly, see http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
2015-10-26 10:34:32 +00:00
John MacFarlane
89d399b6b1 Merge pull request #2481 from mb21/textarabic
LaTeX writer: \textarabic fix
2015-10-25 12:42:41 -07:00
John MacFarlane
45df87cf15 Merge pull request #2477 from tarleb/org-toggling-header-args
Org reader: allow toggling header args
2015-10-25 12:10:07 -07:00
mb21
f3f6483510 LaTeX writer: \textarabic fix 2015-10-25 18:31:35 +01:00
Albert Krewinkel
27a8603278 Org reader: allow toggling header args
Org-mode allows skipping the argument of a code block header argument if
it's toggling a value.  Argument-less headers are now recognized,
avoiding weird parsing errors.

The fixes are not exactly pretty, but neither is the code that was
fixed.  So I guess it's about par for the course.  However, a rewrite of
the header parsing code wouldn't hurt in the long run.

Thanks to @jo-tham for filing the bug report.

This fixes #2269.
2015-10-25 08:54:00 +01:00
Albert Krewinkel
b27366780f Org reader: fix paragraph/list interaction
Paragraphs can be followed by lists, even if there is no blank line
between the two blocks.  However, this should only be true if the
paragraph is not within a list, where the preceding block should be
parsed as a plain instead of a paragraph (to allow for compact lists).

Thanks to @rgaiacs for bringing this up.

This fixes #2464.
2015-10-24 19:05:56 +02:00
John MacFarlane
a7150bb6b6 Fixed over-eager raw HTML inline parsing.
Tightened up the inline HTML parser so it disallows
TagWarnings.

This only affects the markdown reader when the `markdown_in_html_blocks`
option is disabled.

Closes #2469.
2015-10-22 21:18:06 -07:00
John MacFarlane
a21833b638 Avoid compiler warning for unused identifier. 2015-10-22 21:05:52 -07:00
John MacFarlane
317d9eea17 Changed § to % in operators from Odt.Arrows.Utils.
This prevents problems building haddocks with "C" locale.

Closes #2457.
2015-10-22 17:54:58 -07:00
John MacFarlane
48b68aac43 Textile writer: support start number in ordered lists.
e.g. `#3`.

Partially addresses #2465.
TBD: reader support.
2015-10-22 12:37:40 -07:00
John MacFarlane
8193ebcd99 Allow use of ConTeXt to generate PDFs.
    pandoc my.md -t context -o my.pdf

will now create a PDF using ConTeXt rather than LaTeX.

Closes #2463.
2015-10-20 08:16:17 -07:00
mb21
9328f4cd3d LaTeX and ConTeXt writers: support lang attribute on divs and spans
For LaTeX, also collect lang and dir attributes on spans and divs to set the lang,
otherlangs and dir variables if they aren’t set already. See #895.
2015-10-18 17:01:37 +02:00
John MacFarlane
7f4b78c064 Text.Pandoc.Data: store paths in dataFiles using posix separators.
This way we have uniform separators, whether on Windows or Linux.

This should solve a problem where on some Windows versions
the data files weren't being found.

Closes #2459.
2015-10-17 22:04:02 -07:00
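The idea behind commit 7f4b78c064, storing every data-file path with the same separator regardless of platform, can be sketched in Python. The helper name `to_posix_key` and the sample paths are hypothetical; pandoc itself does this in Haskell.

```python
from pathlib import PureWindowsPath


def to_posix_key(path):
    """Normalize a relative path so it always uses '/' as separator.

    PureWindowsPath accepts both '\\' and '/' as separators, so the
    same key is produced whichever platform built the path.
    """
    return PureWindowsPath(path).as_posix()
```

With keys normalized this way, a lookup table built on Windows and one built on Linux contain identical entries.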
John MacFarlane
34d53aff6e Remove compiler warning with embed_data_files. 2015-10-17 21:21:52 -07:00
Andreas Lööw
f0c47907ca Consider header files when determining whether to use csquotes. 2015-10-17 23:04:15 +02:00
John MacFarlane
2357e61748 LaTeX reader: fixed longtable support. 2015-10-15 23:15:40 -07:00
John MacFarlane
504bf3f8e7 Support all frame attributes in Beamer. 2015-10-15 15:11:07 -07:00
John MacFarlane
047cb32dfc Use unicode super/subscripts for digits in plain output. 2015-10-15 14:35:01 -07:00
John MacFarlane
6dc3b6585d More changes to avoid compiler warnings on ghc 7.10.
* CPP around deprecated `parseTime`.
* Text.Pandoc.Compat.Locale -> Text.Pandoc.Compat.Time,
  now exports Data.Time.
2015-10-14 10:06:18 -07:00
John MacFarlane
82b3e0ab97 Use custom Prelude to avoid compiler warnings.
- The (non-exported) prelude is in prelude/Prelude.hs.
- It exports Monoid and Applicative, like base 4.8 prelude,
  but works with older base versions.
- It exports (<>) for mappend.
- It hides 'catch' on older base versions.

This allows us to remove many imports of Data.Monoid
and Control.Applicative, and remove Text.Pandoc.Compat.Monoid.

It should allow us to use -Wall again for ghc 7.10.
2015-10-14 09:09:10 -07:00
John MacFarlane
198862ee40 LaTeX writer: add \protect to \hyperdef in inline context.
This way we don't get an error when this is used as a moveable
argument.

Closes #2136.
2015-10-13 21:48:14 -07:00
John MacFarlane
25e0e0bd2a epub with --webtex: include image file rather than data: URI.
Closes #2363.
2015-10-13 21:19:43 -07:00
John MacFarlane
24f68654e9 RST writer: do header normalization only in "standalone" mode.
If we're producing a fragment, just skip normalization.
After all, the fragment might be somewhere in the middle
of the document.  It's more important for fragments to
have consistency in rendering (so they can be pieced
together) than to normalize.

This closes #2394.  It's simpler and more robust than
my earlier fix.
2015-10-12 23:00:27 -07:00
John MacFarlane
fb51077712 Revert "RST writer: tweaks to header normalization."
This reverts commit 476b383c57.
2015-10-12 22:44:37 -07:00
John MacFarlane
476b383c57 RST writer: tweaks to header normalization.
These changes are intended to make the writer more
useful to people who are processing small fragments,
which may for example look like this:

    ### third level header from previous section

    ## second level header

Previously such fragments got turned into two
headers of the same level.  The new algorithm
avoids doing any normalization until we hit the
minimal-level header in the fragment (here, the
second level header).

Closes #2394.
2015-10-12 22:04:40 -07:00
John MacFarlane
0b91c73456 Removed unnecessary import. 2015-10-11 17:27:00 -07:00
John MacFarlane
1e8a25ad69 Percent-encode more special characters in URLs.
HTML, LaTeX writers adjusted.
The special characters are '<','>','|','"','{','}','[',']','^', '`'.

Closes #1640, #2377.
2015-10-11 17:12:50 -07:00
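A minimal Python sketch of the escaping listed in commit 1e8a25ad69. Only the ten characters named in the message are handled; the function name `escape_special_chars` is hypothetical, and whatever other escaping the writers perform is outside the scope of this example.

```python
# The characters named in the commit message.
SPECIAL_CHARS = '<>|"{}[]^`'


def escape_special_chars(url):
    """Percent-encode the listed special characters, leaving the rest as-is."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in SPECIAL_CHARS else ch
        for ch in url
    )
```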
John MacFarlane
04307a1554 Define Typeable and Exception instances for PandocError.
Closes #2386.
2015-10-11 15:50:41 -07:00
John MacFarlane
0e78eba791 HTML reader/writer: better handling of "section" elements.
Previously `<section>` tags were just parsed as raw HTML
blocks.  With this change, section elements are parsed as
Div elements with the class "section".  The HTML writer will
use `<section>` tags to render these Divs in HTML5; otherwise
they will be rendered as `<div class="section">`.

Closes #2438.
2015-10-11 15:25:49 -07:00
John MacFarlane
60dcaa37d5 Native writer: format Div properly, with blocks separated. 2015-10-11 15:14:35 -07:00
John MacFarlane
72b038d201 Merge pull request #2412 from frerich/reader/docbook/xref_support
Added support for <xref> tag in DocBook reader
2015-10-10 14:18:28 -07:00
John MacFarlane
3e4713c2de Merge pull request #2441 from mb21/polyglossia-lang
Change variable to polyglossia-lang.name and .options
2015-10-10 13:52:36 -07:00
John MacFarlane
5e57beac8d Re-export pandocVersions from Text.Pandoc.
The actual definition has been moved to Text.Pandoc.Shared,
but to avoid breaking changes we reexport it here.
2015-10-10 13:42:02 -07:00
John MacFarlane
4aabcf3d4e Merge pull request #2426 from alexvong1995/better-man-writer
Better man writer (revised)
2015-10-10 13:11:21 -07:00
John MacFarlane
114103d67f LaTeX reader: don't eat excess whitespace after macros.
Really close #2446.
2015-10-09 14:39:42 -07:00
John MacFarlane
1af8bc6f4d LaTeX reader: don't eat whitespace after macro with only opt arg.
Closes #2446.
2015-10-09 10:32:31 -07:00
mb21
80b851a4cf Change variable to polyglossia-lang.name and .options
closes #2437
2015-10-07 22:53:09 +02:00
Ophir Lifshitz
0b899ce7ef Docx Reader: Parse soft, no-break hyphen elements 2015-10-04 06:11:07 -04:00
John MacFarlane
421845202d Fixed typo: Ext_superscript, Ext_subscript. 2015-10-03 16:03:40 -07:00
John MacFarlane
68c02e1d01 For markdown_mmd, add: implicit_figures, superscripts, subscripts.
See #2401.
2015-10-03 15:32:01 -07:00
Alex Vong
319832cc19 Set the template variable $pandoc-version$ to pandocVersion by default.
* src/Text/Pandoc/Writers/Man.hs: Set $pandoc-version$ to be pandocVersion.
2015-10-01 02:24:34 +08:00
Alex Vong
d7a19c22be Move the variable pandocVersion from src/Text/Pandoc.hs to
`src/Text/Pandoc/Shared.hs`, so that all Writers can access this variable
without importing `src/Text/Pandoc.hs`, preventing circular import.

* pandoc.hs: Import pandocVersion from `Text.Pandoc.Shared`.
* src/Text/Pandoc.hs: Remove the definition of pandocVersion
 and relevant import.
* src/Text/Pandoc/Shared.hs: Add the definition of pandocVersion
 and relevant import.
2015-10-01 02:24:34 +08:00
Alex Vong
f5e33e0dce Set the template variable $hyphenate$ to true by default
* src/Text/Pandoc/Writers/Man.hs: Set $hyphenate$ to be true.
2015-10-01 02:24:34 +08:00
John MacFarlane
af8fb5e792 Removed unneeded imports. 2015-09-26 22:56:13 -07:00
John MacFarlane
6532950b26 MediaBag: ensure that / is always used as path separator. 2015-09-26 22:40:58 -07:00
John MacFarlane
fdfc961284 Merge pull request #2419 from mb21/bidi
Support bidirectional text output with XeLaTeX, ConTeXt and HTML
2015-09-26 17:06:56 -07:00
mb21
7b0c1e0d37 Support bidirectional text output with XeLaTeX, ConTeXt and HTML
closes #2191
2015-09-26 22:22:24 +02:00
John MacFarlane
29668552c8 Removed unneeded import. 2015-09-26 10:27:55 -07:00
John MacFarlane
da1b599c96 Correctly recognize book documentclass in metadata.
Closes #2395.
2015-09-25 23:28:38 -07:00
John MacFarlane
dcb0b02aa3 Markdown reader: handle 'id' and 'class' in parsing key/value attrs.
    # Header {id="myid" class="foo bar"}

is now equivalent to

    # Header {#myid .foo .bar}

Closes #2396.
2015-09-25 23:01:34 -07:00
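The equivalence stated in commit dcb0b02aa3 amounts to folding `id` and `class` keys into the identifier and class list of an attribute. A hedged Python sketch follows; the function name `normalize_attrs` and the `(id, classes, key-values)` tuple shape are assumptions chosen for illustration, not pandoc's Haskell code.

```python
def normalize_attrs(pairs):
    """Fold 'id' and 'class' key/value pairs into (id, classes, rest).

    `pairs` is a list of (key, value) tuples as they might come out of
    parsing {key="value" ...}.
    """
    identifier = ""
    classes = []
    rest = []
    for key, value in pairs:
        if key == "id":
            identifier = value
        elif key == "class":
            # A class value may name several space-separated classes.
            classes.extend(value.split())
        else:
            rest.append((key, value))
    return identifier, classes, rest
```

So `{id="myid" class="foo bar"}` and `{#myid .foo .bar}` both end up as the identifier `myid` with classes `foo` and `bar`.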
Frerich Raabe
eee992520c Improve text generated for <xref> by employing docbook-xsl heuristics
docbook-xsl, a set of XSLT scripts to generate HTML out of DocBook,
tries harder to generate a nice xref text. Depending on the element
being linked to, it looks at the title or other descriptive child
elements. Let's do that, too.
2015-09-24 18:28:51 +02:00
Frerich Raabe
35f12b5095 Added proper support for DocBook 'xref' elements
'xref' is used to create cross references to other parts of the
document. It is an empty element - the cross reference text depends on
various attributes. Quoting 'DocBook: The Definitive Guide':

  1. If the endterm attribute is specified on xref, the content of the
  element pointed to by endterm will be used as the text of the
  cross-reference.

  2. Otherwise, if the object pointed to has a specified XRefLabel, the
  content of that attribute will be used as the cross-reference text.
2015-09-24 18:26:55 +02:00
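The two lookup rules quoted in commit 35f12b5095 can be sketched in Python with the standard library's ElementTree. The function name `xref_text` is hypothetical, and returning `None` when neither rule applies is an assumption of this sketch; the commit message does not say what the reader falls back to.

```python
import xml.etree.ElementTree as ET


def xref_text(root, xref):
    """Resolve the text of an <xref> element using the two quoted rules."""
    # Index every element of the document by its 'id' attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

    # Rule 1: 'endterm' points at the element whose content is the text.
    endterm = xref.get("endterm")
    if endterm in by_id:
        return "".join(by_id[endterm].itertext())

    # Rule 2: otherwise use the XRefLabel of the element linked to.
    target = by_id.get(xref.get("linkend"))
    if target is not None and target.get("xreflabel"):
        return target.get("xreflabel")

    # Neither rule applies (fallback is an assumption of this sketch).
    return None
```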
Frerich Raabe
f6538144f0 Pass the parsed DocBook content along the state of readDocBook
Having access to the entire document will be needed when handling
elements which refer to other elements. This is needed for e.g. <xref>
or <link>, both of which reference other elements (by the 'id'
attribute) for the label text.

I suppose that in practice, the [Content] returned by parseXML always
only contains one 'Elem' value -- the document element. However, I'm not
totally sure about it, so let's just pass all the Content along.
2015-09-23 19:31:25 +02:00
Frerich Raabe
3564cd82ca Minor refactoring to readDocBook
I plan to use the parsed and normalized XML tree read in readDocBook in
other places - prepare that commit by factoring this code out into a
separate, shared, definition.
2015-09-23 19:25:58 +02:00
John MacFarlane
72e71a1dad LaTeX reader: support longtable.
Closes #2411.
2015-09-23 08:34:37 -07:00
John MacFarlane
f232a0a720 Merge pull request #2369 from mb21/language-variables
`lang` variable is now in BCP47 format
2015-09-22 22:21:06 -07:00
John MacFarlane
9b033672e4 Merge pull request #2406 from tarleb/org-verse-fix
Make sure verse blocks can contain empty lines
2015-09-20 13:02:47 -07:00
Albert Krewinkel
8007dd97b5 Make sure verse blocks can contain empty lines
The previous verse parsing code made the faulty assumption that empty
strings are valid (and empty) inlines.  This isn't the case, so lines
are changed to contain at least a newline.

It would generally be nicer and faster to keep the newlines while
splitting the string.  However, this would require more code, which
seems unjustified for a simple (and fairly rare) block as *verse*.

This fixes #2402.
2015-09-19 22:02:43 +02:00
Nikolay Yakimov
5788f62ef5 [RST Writer] Don't normalize heading levels below input minimum 2015-09-19 17:45:54 +03:00
John MacFarlane
4d49f76dbb Markdown writer: in TOC, add links to headers.
Closes #829.
2015-09-17 11:41:05 -07:00
John MacFarlane
bee255cbfe Use user data directory for reference docx archive.
This allows the test suite to work without installing pandoc first.
It also brings the docx writer in line with the odt writer.
2015-09-09 10:16:45 -07:00
mb21
622df7034c lang variable is now in BCP47 format
strings are converted for LaTeX and ConTeXt output, closes #1614
2015-08-20 23:17:47 +02:00
John MacFarlane
761e1edc30 Merge pull request #2364 from gbataille/bugDoc
[BUG] Haddock : * and ^ to be escaped in docs
2015-08-17 12:15:32 -07:00
Grégory Bataille
0dff30271c [BUG] Haddock : * and ^ to be escaped in docs 2015-08-17 09:03:33 +02:00
John MacFarlane
1f00a5395f RST reader: better handling of indirect roles.
Previously the parser failed on this kind of case

    .. role:: indirect(code)

    .. role:: py(indirect)
       :language: python

    :py:`hi`

Now it correctly recognizes `:py:` as a code role.

The previous test for this didn't work, because the
name of the indirect role was the same as the language
defined in its parent, so it didn't really test for this
behavior.  Updated test.
2015-08-15 10:22:47 -07:00
John MacFarlane
8c579a5daa Merge pull request #2360 from jg/issue-2354
Org reader: add auto identifiers if not present on headers
2015-08-15 09:47:56 -07:00
Juliusz Gonera
f1c87ed164 Org reader: add auto identifiers if not present on headers
Refs #2354

This should also fix the table of contents (--toc) when generating an HTML file
from org input.
2015-08-15 07:57:48 +02:00
John MacFarlane
0302330a27 RST writer: ensure that \ is inserted when needed...
...before Cite and Span elements that begin with a "complex"
element.  Closes jgm/pandoc-citeproc#157.
2015-08-13 23:20:22 -07:00
John MacFarlane
c82f3ad61e RST writer: Don't insert \ when complex expression in matched pairs.
E.g. `` [:sup:`3`] `` is okay; you don't need `` [:sup:`3`\ ] ``.
2015-08-12 21:08:13 -07:00
John MacFarlane
9894012776 EPUB TOC: replace literal "<br/>" with space.
Closes #2105.
2015-08-10 16:58:47 -07:00
John MacFarlane
7b8c005d07 EPUB reader: stop mangling external URLs.
Closes #2284.

Note the changes to the test suite. In each case, a mangled
external link has been fixed, so these are all positive.
2015-08-10 16:35:43 -07:00
John MacFarlane
0ad576eb1a Docx writer: Moved invalid character stripping to formattedString.
This avoids an inefficient generic traversal.

Updates f3aa03e.

Closes #2356.
2015-08-10 10:49:18 -07:00
John MacFarlane
06d69fe215 Text.Pandoc: disable auto_identifiers for epub.
The epub writer inserts its own auto identifiers;
this is more complex due to splitting into "chapter" files.
2015-08-08 21:05:43 -07:00
John MacFarlane
467e3be700 MediaWiki reader: handle unquoted table attributes.
Closes #2355.
2015-08-08 20:55:00 -07:00
John MacFarlane
2eec8cf61b HTML reader: add auto identifiers if not present on headers.
This makes TOC linking work properly.

The same thing needs to be done to the org reader to fix #2354;
in addition, `Ext_auto_identifiers` should be added to the list
of default extensions for org in Text.Pandoc.
2015-08-08 11:20:15 -07:00
John MacFarlane
eaccef1491 DocBook reader: handle informalexample.
It is parsed into a Div with class `informalexample`.

Closes #2319.
2015-08-08 10:43:25 -07:00
John MacFarlane
bf7d858a9e LaTeX reader: Implement \Cite.
See #2335.
2015-08-08 10:32:03 -07:00
John MacFarlane
74c31abb1a Merge pull request #2327 from hftf/list-style
HTML Reader: Correctly parse inline list-style(-type) for <ol>
2015-08-07 11:08:53 -07:00
mb21
a010b83a75 Updated readers, writers and README for link attribute 2015-08-07 12:38:37 +02:00
John MacFarlane
92d48fa65b Updated readers and writers for new image attribute parameter.
(mb21)
2015-08-07 12:37:12 +02:00
John MacFarlane
9deb335ca5 ICML writer: changed type of writeICML.
API change:  It is now `WriterOptions -> Pandoc -> IO String`.

Also handle new image attributes.

(mb21)
2015-08-05 16:08:46 +02:00
John MacFarlane
4391c5f34c ICML writer: Add Cite style to citations.
(mb21)
2015-08-05 16:08:46 +02:00
John MacFarlane
12df4054ad PDF: Modified for new image size attributes parameter.
(mb21)
2015-08-05 16:08:46 +02:00
John MacFarlane
76f0708ef5 Parsing: Add extractIdClass, modified type of KeyTable.
(mb21)
2015-08-05 16:08:46 +02:00
John MacFarlane
878ab00233 ImageSize: Added functions for converting between image dimensions.
(mb21)
2015-08-05 16:08:46 +02:00
Sergei Trofimovich
ab7c5f2221 fix build failure with --flags=-https
The issue was originally reported by CasperVector as
    https://github.com/gentoo-haskell/gentoo-haskell/issues/427

Manifests itself as a build failure full of missing zip-archive
names:

    src/Text/Pandoc/Shared.hs:756:49:
        Not in scope: type constructor or class ‘Archive’
    src/Text/Pandoc/Shared.hs:777:38: Not in scope: ‘toEntry’
    src/Text/Pandoc/Shared.hs:786:19:
        Not in scope: ‘toArchive’
        Perhaps you meant ‘mbArchive’ (line 778)

Included Codec.Archive.Zip unconditionally.

Signed-off-by: Sergei Trofimovich <siarheit@google.com>
2015-07-30 22:39:25 +01:00
Ophir Lifshitz
18b1b21a6a HTML Reader: Detect font-variant with pickStyleAttrProps 2015-07-27 20:08:04 -04:00
John MacFarlane
5df099957e Text.Pandoc.Options: modifications for image attributes.
* Added `Ext_common_link_attributes` constructor to `Extension`
  (for link and image attributes).
* Added this to `pandocExtensions` and `phpMarkdownExtraExtensions`.
* Added `writerDpi` to `WriterOptions`.
* pandoc.hs:  Added `--dpi` option.
* Updated README for `--dpi` and `common_link_attributes` extension.

Patch due to mb21, with some modifications: `writerDpi` is now an
`Int` rather than a `Double`.
2015-07-27 21:52:43 +02:00
John MacFarlane
0baaa1080a Pipe tables: allow indented columns.
Previously the left-hand column could not start with 4 or
more spaces indent.  This was inconvenient for right-aligned
left columns.

Note that the first line (the header row) must still have 3 or fewer
spaces of indentation, or the table will be treated as an indented
code block.
2015-07-27 10:24:06 -07:00
John MacFarlane
defcb5b6b1 Merge pull request #1689 from kuribas/master
Use '=' instead of '#' for atx-style headers in markdown+lhs.
2015-07-25 10:21:02 -07:00
John MacFarlane
2e8064346d Pretty: comment fix (mb21). 2015-07-25 15:51:55 +02:00
Ophir Lifshitz
7ef8700734 HTML Reader: Parse <ol> type, class, and inline list-style(-type) CSS 2015-07-24 02:53:17 -04:00
MarLinn
f068093555 Added odt reader
Fully implemented features:

* Paragraphs
* Headers
* Basic styling
* Unordered lists
* Ordered lists
* External Links
* Internal Links
* Footnotes, Endnotes
* Blockquotes

Partly implemented features:

* Citations
  Very basic, but pandoc can't do much more
* Tables
  No headers, no sizing, limited styling
2015-07-23 15:37:01 -07:00
John MacFarlane
8390d935d8 Updated tests and removed a skipSpaces....
we no longer need it with the change to toKey, and it
is expensive to skip spaces after every inline.
2015-07-23 15:35:18 -07:00
John MacFarlane
35e6c893ec Parsing: toKey: strip off outer brackets.
This makes keys with extra space at the beginning and end
work:  e.g.

    [foo]: bar

    [ foo ]

will now be a link to bar (it wasn't before).
2015-07-23 15:34:27 -07:00
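The effect of commit 35e6c893ec can be shown with a short Python sketch of reference-key normalization. Stripping the outer brackets is what the commit describes; collapsing internal whitespace and lowercasing are assumptions of this sketch about how labels are usually compared, and the name `to_key` is only borrowed from the commit subject.

```python
def to_key(label):
    """Normalize a reference label so equivalent labels compare equal."""
    # Strip one pair of outer brackets, if present.
    if label.startswith("[") and label.endswith("]"):
        label = label[1:-1]
    # Assumed normalization: trim and collapse whitespace, ignore case.
    return " ".join(label.split()).lower()
```

Because the brackets are removed before whitespace is trimmed, `[ foo ]` and `[foo]` produce the same key, which is why the reference in the example above now resolves.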
John MacFarlane
5db4787330 Merge pull request #2323 from hftf/implicit-header-refs
Fix implicit header refs for headers with extra spaces
2015-07-23 14:46:38 -07:00
John MacFarlane
66a72b8eec LaTeX reader: support abstract environment.
The abstract populates an "abstract" metadata field.
2015-07-23 09:31:46 -07:00
Ophir Lifshitz
42c139d302 Markdown Reader: Skip spaces in headers 2015-07-23 02:29:37 -04:00
John MacFarlane
fa2c008ae5 Fix regression: allow HTML comments containing --.
Technically this isn't allowed in an HTML comment, but
we've always allowed it, and so do most other implementations.
It is handy if e.g. you want to put command line arguments
in HTML comments.
2015-07-21 22:44:18 -07:00
John MacFarlane
ec5960ab11 Use newManager instead of withManager in recent http-client.
This avoids a deprecation warning.
2015-07-21 16:32:44 -07:00
John MacFarlane
450bef90e0 DZSlides: Add role="note" for speaker notes.
Closes #1693.
2015-07-21 14:54:43 -07:00
John MacFarlane
da0842b5b5 HTML reader: handle type attribute on ol.
E.g. `<ol type="i">`.

Closes #2313.
2015-07-21 13:07:52 -07:00
John MacFarlane
f6ad9e263f LaTeX reader: properly handle booktabs lines.
Lines aren't part of the pandoc table model, but we can just
ignore them.

Closes #2307.
2015-07-21 10:26:29 -07:00
John MacFarlane
6166cd7559 Removed unneeded import. 2015-07-16 17:06:57 -07:00
John MacFarlane
075ad9a406 LaTeX writer: Fixed detection of 'chapters' from template.
If a documentclass isn't specified in metadata, but the
template has a hardwired bookish documentclass, act as if
`--chapters` was used.  This was the default in earlier
versions, but it has been broken for a little while.
2015-07-16 15:52:38 -07:00
John MacFarlane
c2ab44af84 --self-contained: Fixed overaggressive CSS minimization.
Previously `--self-contained` wiped out all spaces in CSS,
including semantically significant spaces!

Closes #2301.
Closes #2286.
2015-07-15 08:16:42 -07:00
John MacFarlane
6c32afc3c4 Updated to use cmark >= 0.4. 2015-07-14 22:51:23 -07:00
John MacFarlane
9e0fb844a9 Markdown reader: don't allow bare URI links or autolinks in link label.
Added test cases.

Closes #2300.
2015-07-14 13:16:40 -07:00
John MacFarlane
9cdfd4f649 Improved bare autolink detection.
Previously we disallowed `-` at the end of an autolink,
and disallowed the combination `=-`.

This commit liberalizes the rules for allowing punctuation in
a bare URI.

Added test cases.

One potential drawback is that you can no longer put a bare
URI in em dashes like this

    this uri---http://example.com---is an example.

But in this respect we now match github's treatment of bare URIs.

Closes #2299.
2015-07-14 10:24:39 -07:00
John MacFarlane
b8634b9f75 HTML writer: support speaker notes in dzslides.
With this change `<div class="notes">` and also `<div class="notes"
role="note">` will be output if `-t dzslides` is used. So we can
have speaker notes in dzslides too.

Thanks to maybegeek.
2015-07-13 22:50:17 -07:00
Tiziano Müller
f464e49142 DokuWiki: write $..$ instead of <math>..</math>
MathJax seems currently to be the only maintained math rendering
extension for DokuWiki and it uses $..$ instead of <math>..</math>.
2015-07-13 14:19:48 +02:00
John MacFarlane
2df3dfe883 Changed hierarchicalize so it treats references div as top-level header.
Fixes a bug with `--section-divs`, where the final references section
added by pandoc-citeproc, enclosed in its own div, got put in the
div for the section previous to it.

This fixes #2294.  Longer term, we might think about how hierarchicalize
should interact with Div elements.
2015-07-12 13:58:28 -07:00
John MacFarlane
99fe8594d9 Avoid parsing partial URLs as HTML tags.
Closes #2277.
2015-07-10 10:33:27 -07:00
John MacFarlane
b587acb224 Merge pull request #2266 from PromyLOPh/fieldinline
RST: Support inline markup for field list names
2015-07-08 22:45:06 -07:00
John MacFarlane
ac79429a12 PDF: Make sure --latex-engine-opt goes before the filename...
on the command line.  LaTeX needs the argument to come after
the options.

Closes #1779 - again!  Thanks to squisher for pointing
out the problem.
2015-07-08 17:37:54 -07:00
Andrew Dunning
4850aaf046 Correct superscript/subscript. 2015-07-08 13:57:04 -04:00
John MacFarlane
9e528f4c0c Fixed email javascript obfuscation with mailto: URLs.
This fixes a potential security issue.  Because single quotes weren't
being escaped in the link portion, a specially crafted email address
could allow javascript code injection.

    [Jim'+alert('hi')+'OBrien](mailto:me@example.com)

Closes #2280.
2015-07-07 11:15:40 -07:00
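The class of bug fixed in commit 9e528f4c0c, unescaped single quotes inside a single-quoted JavaScript string, can be illustrated in Python. This is a sketch of the general technique, not pandoc's code: the helper name `js_single_quoted` is hypothetical, and a production escaper would also need to handle newlines and other special sequences.

```python
def js_single_quoted(text):
    """Wrap `text` in single quotes as a JavaScript string literal.

    Backslashes are escaped first so that the backslashes added for
    the quotes are not themselves doubled.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"
```

With the quotes escaped, the link text from the example above stays inside one string literal instead of terminating it and injecting a call to `alert`.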
Lars-Dominik Braun
b2adf44e75 Readers.RST: Factor out inline markup string parsing 2015-07-03 16:42:51 +02:00
Lars-Dominik Braun
68b6b9f652 Readers.RST: Parse field list name
“Inline markup is parsed in field names.” [1]

[1] http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#field-lists
2015-07-03 16:41:28 +02:00
John MacFarlane
e0a88df686 ConTeXt: use \goto for internal links. 2015-07-01 12:11:45 -07:00
John MacFarlane
b5d3b4f608 Merge pull request #2255 from mchladek/odt_linebreak
Fix #2254 : OpenDocument writer adds space with hard line break
2015-07-01 11:49:58 -07:00
John MacFarlane
8e747004e6 ConTeXt writer: Added a % at end for \reference to avoid spurious space. 2015-07-01 11:30:28 -07:00
John MacFarlane
a04c15a422 New method for building man pages.
+ Removed `--man1`, `--man5` options (breaking change).
+ Removed `Text.Pandoc.ManPages` module (breaking API change).
+ Version bump to 1.15 because of the breaking changes, even
  though they involve features that have only been in pandoc
  for a day.
+ Makefile target for `man/man1/pandoc.1`.  This uses pandoc to
  create the man page from README using a custom template and filters.
+ Added `man/` directory with template and filters needed to build
  man page.
+ We no longer have two man pages: pandoc.1 and pandoc_markdown.5.
  Now there is just pandoc.1, which has all the content from README.
  This change was needed because of the extensive cross-references
  between parts of the README.
+ Removed old `data/pandoc.1.template` and
  `data/pandoc_markdown.5.template`.
2015-07-01 11:27:15 -07:00