Commit graph

2923 commits

Author SHA1 Message Date
Matthew Pickering
48e2586ec8 Merge pull request #1746 from shelf/dw-ext-images
DokuWiki writer: fix external images
2014-12-08 23:55:36 +00:00
Nikolay Yakimov
f7b265e2ff Fix for #1641 (docx table captions above tables)
Word doesn't really treat table captions as something special. A caption is just a paragraph with a special style, nothing more, so simply reversing the output order in the writer works fine.
2014-12-08 22:50:57 +00:00
Daniel Bergey
87e536b438 RST Reader: Warn about skipped directives
move `addWarning` to Parsing.hs, so it can be used by Markdown & RST readers.
2014-12-08 14:43:04 +00:00
Matthew Pickering
068bdbbc91 Merge pull request #1716 from lierdakil/issue1607-pullreq
First step to fixing internationalisation problems with docx output
2014-12-07 16:44:24 +00:00
Matthew Pickering
9761283c8f Text.Pandoc.Pretty: Improve performance of realLength
Eliminates memory usage and gives a twofold increase in speed.
2014-12-06 22:58:40 +00:00
Daniel Bergey
74c1b547c2 parse RST class directives
The class directive accepts one or more class names, and creates a Div
value with those classes.  If the directive has an indented body, the
body is parsed as the children of the Div.  If not, the first block
following the directive is made a child of the Div.

This differs from the behavior of rst2xml, which does not create a Div
element.  Instead, the specified classes are applied to each child of
the directive.  However, most Pandoc Block constructors do not take an
Attr argument, so we can't duplicate this behavior.
2014-12-01 18:22:03 +00:00
Daniel Bergey
2cdfa5eb20 parse RST quoted literal blocks
closes #65
RST quoted literal blocks are the same as indented literal blocks (which
pandoc already supports) except that the quote character is preserved in
each line.

This includes test cases for the quoted literal block, as well as
additional tests for line blocks and indented literal blocks, to verify
that these are unaffected by the changes.
2014-12-01 18:22:03 +00:00
John MacFarlane
4d296f70df ICML writer: Don't force all citations into footnotes. 2014-11-30 22:30:04 -08:00
John MacFarlane
d8fde9547e Reverted "omit blank lines after list items," better fix for #1777.
Now we do as before, including blank lines after list items in
loose lists (even though RST doesn't care -- this is just a matter
of visual appeal).  But we chomp any excess whitespace after the
last list item, which solves #1777.
2014-11-25 12:34:44 -08:00
John MacFarlane
25e2c42347 RST writer: Omit blank lines after list items.
They are optional in RST (except after the last list item,
of course).

Fixes #1777.
2014-11-25 12:24:33 -08:00
John MacFarlane
dc92a62883 RST writer: Ensure blank line after figure. 2014-11-25 12:24:14 -08:00
John MacFarlane
6c0943000d LaTeX reader: support \smartcite and \Smartcite from biblatex.
See jgm/pandoc-citeproc#26.
2014-11-25 10:03:43 -08:00
John MacFarlane
159c711675 Fixed double-rendering of footnotes in RST tables.
Closes #1769.
2014-11-19 16:20:07 -08:00
John MacFarlane
7a5cb29319 Really fix #1758. Add id="cover" to body on cover page.
Not title page!
2014-11-17 15:43:40 -08:00
John MacFarlane
91a26fcdc4 Use regular page template for nav.xhtml.
This includes the HTML doctype.

Closes #1759.
2014-11-16 21:43:05 -08:00
John MacFarlane
4aadcd51b5 Make embed tag either block or inline.
Closes #1756.
2014-11-16 20:51:35 -08:00
John MacFarlane
333fc60684 Changed mime type for otf to application/vnd.ms-opentype.
Closes #1761.  This is needed for epub3 validation.
See http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm#Section2.3.1
2014-11-16 20:29:38 -08:00
John MacFarlane
46d343f474 Fixed bug in org with bulleted lists:
- a
   - b
   * c

was being parsed as a list, even though an unindented `*`
should make a heading.  See
<http://orgmode.org/manual/Plain-lists.html#fn-1>.
2014-11-13 23:40:18 -08:00
Caleb McDaniel
196c4f2343 Account for external link URLs with anchors
Previously, if a URL had an anchor, such as

    http://johnmacfarlane.net/pandoc/README.html#synopsis

the reader would incorrectly identify it as an internal link
and return "#synopsis" for the link in output.
2014-11-13 00:42:58 -05:00
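The distinction described in this commit can be illustrated with a short sketch. This is not pandoc's code: `link_target` is a hypothetical helper, and treating "only a fragment, nothing else" as the definition of an internal link is an assumption made for the illustration.

```python
from urllib.parse import urlsplit


def link_target(url: str) -> str:
    """Return "#fragment" for purely internal links, the full URL otherwise."""
    parts = urlsplit(url)
    # Internal means: no scheme, no host, no path, only a fragment.
    is_internal = bool(parts.fragment) and not (
        parts.scheme or parts.netloc or parts.path
    )
    return "#" + parts.fragment if is_internal else url
```

With this check, `http://johnmacfarlane.net/pandoc/README.html#synopsis` is returned unchanged, while a bare `#synopsis` stays an internal link.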
John MacFarlane
43c1978fae Merge pull request #1645 from neongreen/issue1636
Fix 'Ext_lists_without_preceding_blankline' bug.
2014-11-12 09:05:29 -08:00
Timothy Humphries
ffae1567fd DokuWiki writer: fix external images
Handles #1739. Preface relative links with ":", absolute URIs without.
2014-11-09 00:35:29 -05:00
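A minimal sketch of the rule stated above (relative links get a leading `:`, absolute URIs do not). The helper name `dokuwiki_image` is hypothetical, and using the presence of a URI scheme as the test for "absolute" is an assumption; the actual writer is Haskell and may decide differently.

```python
from urllib.parse import urlsplit


def dokuwiki_image(src: str) -> str:
    """Render an image reference in DokuWiki's {{...}} syntax."""
    is_absolute = bool(urlsplit(src).scheme)  # assumed test for "absolute URI"
    target = src if is_absolute else ":" + src
    return "{{" + target + "}}"
```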
Albert Krewinkel
e6cd8c9077 Org reader: allow empty links for gitit interop
While empty links are not allowed in Emacs org-mode,  Pandoc org-mode
should support them: gitit relies on empty links as they are used to
create wiki links.

Fixes jgm/gitit#471
2014-11-05 23:15:28 +01:00
Albert Krewinkel
daaf635806 Org reader: absolute, relative paths in links
The org reader was too restrictive when parsing links: some relative
links and links to files given as absolute paths were not recognized
correctly.  The org reader's link parsing function was amended to handle
such cases properly.

This fixes #1741
2014-11-05 22:27:25 +01:00
John MacFarlane
f3ac41937d DokuWiki writer: Better handling of block quotes.
This change ensures that multiple paragraph blockquotes are
rendered using native `>` rather than as HTML.

Closes #1738.
2014-11-04 14:52:19 -08:00
John MacFarlane
1d268876f8 Merge pull request #1726 from AlexanderS/twiki-parser
TWiki Reader: add new twiki reader
2014-10-31 12:02:14 -07:00
John MacFarlane
deaefb18ca ODT writer: Correctly handle images without extensions.
Closes #1729.
2014-10-30 15:54:04 -07:00
Alexander Sulfrian
c3780992ab TWiki Reader: add new twiki reader 2014-10-30 19:54:48 +01:00
Todd Sifleet
8ad321d0d4 Strip querystring in ODT write
* Resolve #1682
* Strip querystring from filename before rendering ODT files, ODT cannot
handle querystrings in files.
2014-10-28 18:45:36 -07:00
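The stripping step can be sketched in a few lines. `strip_query` is a hypothetical name, and cutting at the first `?` is an assumed reading of "strip querystring from filename".

```python
def strip_query(filename: str) -> str:
    """Drop everything from the first '?' onward, if there is one."""
    return filename.split("?", 1)[0]
```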
Nikolay Yakimov
96c4b9e2e6 Docx reader: fix for Issue #1692 (i18n styles)
This patch builds paragraph styles tree, then checks if paragraph has
style.styleId or style/name.val matching predetermined patterns.
Works with "Heading#" (name.val="heading #") for headings and
"Quote"|"BlockQuote"|"BlockQuotation" (name.val="Quote"|"Block Text")
for block quotes.
2014-10-25 15:54:44 -04:00
Nikolay Yakimov
3c894987b2 Docx Writer: Partial fix for #1607
International heading styles are inferred based on `<w:name val="heading #">` fallback, if there are no en-US "Heading#" styles
2014-10-24 23:57:06 +04:00
John MacFarlane
e16683b539 HTML writer: Make header attributes work outside top level.
Previously they only appeared on top level header elements.
Now they work e.g. in blockquotes.

Closes #1711.
2014-10-23 10:27:14 -07:00
John MacFarlane
e4f3475eaa DOCX writer: Look in user data dir for archive reference.docx. 2014-10-21 14:33:15 -07:00
John MacFarlane
78bdc08de7 Merge pull request #1706 from tarleb/org-symbol-entities
Org reader: parse LaTeX-style MathML entities
2014-10-21 10:11:19 -07:00
John MacFarlane
0714a363c6 Merge pull request #1668 from gbataille/widthFromRef2
Getting the page width from the reference file
2014-10-21 10:05:51 -07:00
John MacFarlane
7f6bbfadf4 Pretty: Make CR + BLANKLINE = BLANKLINE.
This fixes an extra blank line we were getting at the end
of markdown fragments (as well as rst, org, etc.)

Closes #1705.
2014-10-20 20:26:08 -07:00
Albert Krewinkel
a5eb02f6a7 Org reader: parse LaTeX-style MathML entities
Org supports special symbols which can be included using LaTeX syntax,
but are actually MathML entities.  Examples for this are
`\nbsp` (non-breaking space), `\Aacute` (the letter A with accent acute)
or `\copy` (the copyright sign ©).

This fixes #1657.
2014-10-20 22:57:36 +02:00
John MacFarlane
d7169c715d Parsing: fixed inlineMath so it handles \text{..} containing $.
For example: `$x = \text{the $n$th root of $y$}$`.  Closes #1677.
2014-10-19 16:42:56 -07:00
John MacFarlane
328ff8e71f Markdown reader: allow startnum to work without fancy_lists.
Formerly `pandoc -f markdown-fancy_lists+startnum` did not work
properly.
2014-10-18 13:58:08 -07:00
John MacFarlane
84f6b1e41a Merge pull request #1680 from shelf/master
Respect indent when parsing Org bullet lists
2014-10-18 13:20:27 -07:00
John MacFarlane
31713d572a Merge pull request #1700 from tarleb/org-emphasis-fix
Org reader: fix rules for emphasis recognition
2014-10-18 13:19:42 -07:00
Albert Krewinkel
e3c36ed6ce Org reader: Drop COMMENT document trees
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported.  Those trees are dropped
silently.

This closes #1678.
2014-10-18 22:11:53 +02:00
Albert Krewinkel
d571bec454 Org reader: fix rules for emphasis recognition
Things like `/hello,/` or `/hi'/` were falsely recognized as emphasised
strings.  This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text.  This patch
makes the reader match the reference implementation in that it
reads the above strings as plain text.
2014-10-18 12:47:59 +02:00
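The border rule can be sketched as a simple predicate. This is an illustration only: `is_emphasis` is a hypothetical helper, it checks a single already-isolated token, and the inclusion of whitespace in the forbidden set is an assumption beyond the `,` and `'` named in the commit message.

```python
# ',' and "'" come from the commit message; whitespace is an assumed addition.
FORBIDDEN_BORDER = set(",' \t\n")


def is_emphasis(token: str, marker: str = "/") -> bool:
    """True if token looks like marker + text + marker with legal inner borders."""
    if len(token) < 3 or token[0] != marker or token[-1] != marker:
        return False
    inner = token[1:-1]
    return inner[0] not in FORBIDDEN_BORDER and inner[-1] not in FORBIDDEN_BORDER
```

Under this rule `/hello/` counts as emphasis, while `/hello,/` and `/hi'/` are left as plain text.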
Timothy Humphries
f1f56e8533 Fix indent issue for definition lists
Tidy up fix for #1650, #1698 as per comments in #1680.
Fix same issue for definition lists with the same method.
2014-10-17 20:06:25 -04:00
Bjorn Buckwalter
7960013cd4 Escape spaces. Fixes jgm/pandoc#1694. 2014-10-15 21:06:41 +02:00
Timothy Humphries
4f4b0f031d Respect indent when parsing Org bullet lists
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue #1650.
2014-10-12 03:18:36 -04:00
John MacFarlane
8b60d430f2 Merge pull request #1674 from freiric/master
fix inDirectory to reset to the original directory in case an exception ...
2014-10-08 15:48:59 -07:00
John MacFarlane
2eaa0f6ab1 EPUB reader: Further URI handling improvements.
Now we outsource most of the work to `fetchItem'`.
Also, do not include queries in file extensions.

Improves fix to #1671.

It is possible that this will have some unexpected effects, so
further testing would be good.
2014-10-08 15:45:50 -07:00
John MacFarlane
f8087b6c43 EPUB writer: correctly resolve relative URIs. (Closes #1671.) 2014-10-08 15:19:27 -07:00
John MacFarlane
a4d28cdd6d Fixed absolute URI detection in EPUB writer. Closes #1672. 2014-10-08 14:54:03 -07:00
Freiric Barral
24231623f3 fix inDirectory to reset to the original directory in case an exception occurs 2014-10-08 23:25:01 +02:00
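The general pattern behind this fix (always restore the working directory, even when the wrapped action throws) looks like this in Python. It is an analogy, not the Haskell code from the commit; `in_directory` is a hypothetical name. In Haskell the analogous tool is `Control.Exception.bracket`.

```python
import os
from contextlib import contextmanager


@contextmanager
def in_directory(path):
    """Run the body with `path` as the working directory, then restore it."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        os.chdir(original)
```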
John MacFarlane
d60707eed0 EPUB writer: Don't add sourceURL to absolute URIs!
Closes #1669.

If there are further issues, please open a new, targeted issue on the
tracker.  Some notes on the further issues you gestured at:

Data URIs are indeed dereferenced, but why is this a problem?
(The function being used to fetch from URLs is used for many different
formats.  Preserving data URIs would make sense in EPUBs, but not
for e.g. PDF output.  And by dereferencing we can get a smaller,
more efficient EPUB, with the data stored as bytes in a file rather
than encoded in textual representation.)

"absolute uris are not recognized" -- I assume that is the problem
just fixed.  If not, please open a new issue.

"relative uris are resolved (wrongly) like file paths" -- can you
give an example?

`<base>` tag is ignored.  Yes. I didn't know about the base tag.  Could
you open a new issue just for this?
2014-10-08 11:52:47 -07:00
Grégory Bataille
8a1a5948be Getting the page width from the reference file
Uses it to scale images that are too large.
When there is no reference file, default to a US letter portrait size
to scale the images
2014-10-05 14:53:06 +02:00
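The scaling itself is simple arithmetic, sketched below. `fit_to_width` is a hypothetical helper and the units are whatever the caller uses consistently; how the usable page width is derived from the reference file is not shown here.

```python
def fit_to_width(width: float, height: float, max_width: float):
    """Shrink (width, height) proportionally if width exceeds max_width."""
    if width <= max_width:
        return width, height  # already fits, leave untouched
    scale = max_width / width
    return max_width, height * scale
```

An image of 12 by 6 on a page with 6 units of usable width would come out as 6 by 3, keeping its aspect ratio.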
Jason Ronallo
3dc58090d2 add mime type for WebVTT 2014-10-04 22:40:02 -04:00
John MacFarlane
bf00556c72 Added track to list of tags treated by --self-contained.
Closes #1664.
2014-10-04 11:39:08 -07:00
Wikiwide
678aa31561 cref, sep
Adding inlineCommands
2014-10-03 11:33:02 +10:00
John MacFarlane
08ac33815b RST writer: Wrap line blocks with spaces before continuations.
Improves on fix to #1656.
2014-09-30 09:25:54 -07:00
John MacFarlane
29e1c9529f Don't wrap lines in rST line blocks.
Closes #1656.

Fixing pandoc to wrap the lines but insert spaces would be much
more complicated.  This at least makes the output semantically
correct.
2014-09-29 21:48:59 -07:00
John MacFarlane
fe6d43b3e0 Merge pull request #1601 from jkr/windowsfix
Fix path-slashes inside archive for windows
2014-09-27 16:21:17 -07:00
John MacFarlane
9c4e33f085 Merge pull request #1589 from mszep/master
Add function to sanitize ConTeXt labels
2014-09-27 16:20:56 -07:00
Matthew Pickering
5cb475c374 Org Reader: Parse multi-inline terms correctly in definition list
Closes #1649
2014-09-27 22:40:25 +01:00
Artyom
bc115ffc2d Fix 'Ext_lists_without_preceding_blankline' bug.
* Fixes #1636.
  * Adds a test.
2014-09-26 13:32:08 +04:00
mpickering
6740a9592a HTML Reader: Recognise <br> tags inside <pre> blocks
Closes #1620
2014-09-25 19:20:12 +01:00
mpickering
1f0ba8ec11 HTML Writer: Don't double render when email-obfuscation=none
Closes #1625
2014-09-25 18:46:36 +01:00
mpickering
515a120d04 Add support for KaTeX HTML math
Closes #1626
2014-09-25 18:32:42 +01:00
mpickering
575c76e36b HTML Writer: MathML now outputted with tex annotation.
Closes #1635
2014-09-25 15:28:50 +01:00
mpickering
cc07d0c6bf Shared: Make collapseFilePath OS-agnostic 2014-09-25 12:42:53 +01:00
mpickering
56e4ecab20 MediaBag: Fixes Windows specific path problems
Changes the internal representation to fix the problem.

I haven't tested this on windows.

Closes #1597
2014-09-25 12:19:52 +01:00
Mark Szepieniec
84b75a1c2a ConTeXt writer: add function toLabel
This function can be used to sanitize reference labels so that
they do not contain any of the illegal characters \#[]",{}%()|= .

Currently only Links have their labels sanitized, because they
are the only Elements that use passed labels.
2014-09-18 23:27:14 +02:00
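A rough sketch of such a sanitizer follows. The set of illegal characters is taken from the commit message; the replacement scheme (`ux` followed by the character's hex code) and the name `to_label` are assumptions for illustration and may not match what the writer actually emits.

```python
# Characters the commit message lists as illegal in ConTeXt labels.
ILLEGAL = set('\\#[]",{}%()|=')


def to_label(label: str) -> str:
    """Replace each illegal character with an assumed 'ux<hex>' escape."""
    return "".join(
        "ux{:x}".format(ord(c)) if c in ILLEGAL else c for c in label
    )
```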
Jesse Rosenthal
020a527c15 Docx writer: Renumber header and footer relationships to avoid collisions.
We previously took the old relationship names of the headers and footers in
secptr. That led to collisions. We now make a map of available names in the
relationships file, and then rename in secptr.
2014-09-11 15:11:44 -04:00
Jesse Rosenthal
1326a41780 LaTeX writer: Protect graphics in headers.
Graphics in `\section`/`\subsection` etc titles need to be `\protect`ed.

This adds a state value and manually turns it on before every invocation
of `sectionHeader` and manually turns it off after. Using a writer value
and applying `local` would probably be cleaner, but this fits with the
current style.
2014-09-09 10:56:37 -04:00
Jesse Rosenthal
132814aeb6 Docx Reader: Remove header class properly in other langs
When we encounter one of the polyglot header styles, we want to remove
that from the par styles after we convert to a header. To do that, we
have to keep track of the style name, and remove it appropriately.
2014-09-06 07:53:29 -04:00
Jesse Rosenthal
71452946d9 Docx reader: Use polyglot header list.
We're just keeping a list of header formats that different languages use
as their default styles. At the moment, we have English, German, Danish,
and French. We can continue to add to this.

This is simpler than parsing the styles file, and perhaps less
error-prone, since there seems to be some variations, even within a
language, of how a style file will define headers.
2014-09-05 21:59:58 -04:00
Jesse Rosenthal
13fefd7959 Docx Reader: Start list of polyglot section headers. 2014-09-05 17:31:24 -04:00
Jesse Rosenthal
73b887e2df Org reader: Added state changing blanklines.
This allows us to emphasize at the beginning of a new paragraph (or, in
general, after blank lines).
2014-09-04 19:55:53 -04:00
Jesse Rosenthal
ac8ed1fa93 Docx reader: Rewrite rewriteLink to work with new headers.
There could be new top-level headers after making lists, so we have to
rewrite links after that.
2014-09-04 16:44:21 -04:00
Jesse Rosenthal
7fe54505df Docx reader: Single-item headers in ordered lists are headers.
When users number their headers, Word understands that as a single item
enumerated list. We make the assumption that such a list is, in fact, a header.
2014-09-04 16:35:57 -04:00
Jesse Rosenthal
4ef850ded5 Docx reader: Fix window path for image lookup.
Don't use os-sensitive "combine", since we always want the paths in our
zip-archive to use forward-slashes.
2014-09-02 13:45:01 -04:00
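The same pitfall exists in other languages: an OS-sensitive join produces backslashes on Windows, which do not match entry names inside a zip archive. A Python analogy, with a hypothetical helper name:

```python
import posixpath


def archive_path(*parts: str) -> str:
    """Join path components with '/', regardless of the host OS."""
    return posixpath.join(*parts)
```

`os.path.join` would give `word\media\image1.png` on Windows, whereas `posixpath.join` gives `word/media/image1.png` everywhere.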
John MacFarlane
db90667a79 EPUB writer: Don't include nav node in spine unless --toc was requested.
Previously we included it in the spine with `linear="no"`, leading
to odd results in some readers.

Closes #1593.
2014-09-01 16:31:32 -07:00
John MacFarlane
cb1a8da01c LaTeX writer: Avoid using reserved characters as \lstinline delimiters.
Closes #1595.
2014-09-01 10:11:09 -07:00
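One way to pick a safe delimiter is to walk a list of candidates and take the first one that does not occur in the code. The sketch below assumes its own candidate list, which simply leaves out LaTeX's reserved characters; it is not the list used by the writer, and `lstinline` is a hypothetical helper.

```python
# Assumed candidates; none of LaTeX's reserved characters
# (# $ % & ~ _ ^ \ { }) appear here.
CANDIDATES = "!\"'()*,-./:;?@"


def lstinline(code: str) -> str:
    """Wrap code in \\lstinline using the first delimiter not present in it."""
    for delim in CANDIDATES:
        if delim not in code:
            return "\\lstinline" + delim + code + delim
    raise ValueError("no free delimiter for this code span")
```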
John MacFarlane
43ebb0229f EPUB writer: Fixed typo. 2014-09-01 08:04:21 -07:00
John MacFarlane
3533218d6d Merge pull request #1594 from jkr/itemFix
Item fix
2014-08-31 19:31:38 -07:00
John MacFarlane
01b7957812 EPUB writer: Extract title even from structured title.
Added docTitle'.
2014-08-31 14:47:07 -07:00
Jesse Rosenthal
d3053807a8 LaTeX writer: Put ~ before header in item text.
Because of the built-in line skip, LaTeX can't handle a section header
as the first element in a list item. (To be precise, it can't handle it
if the list immediately follows a section header, but the instance is
rare enough that we can afford to be a bit more general). This puts a
non-breaking space before the header to solve this problem. We won't see
this space, since the header skips a line before printing anyway.

The output is ugly in LaTeX and this structure seems like it should
probably be avoided. But it is valid HTML and native pandoc, so we
should have some sort of typesettable representation in LaTeX.
2014-08-31 16:05:09 -04:00
John MacFarlane
598d3ee23b Markdown reader: better handling of paragraph in div.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.

Test case:

    <div class="first">
    This is a paragraph.

    This is another paragraph.
    </div>

Closes #1591.
2014-08-31 12:55:47 -07:00
John MacFarlane
54df49335a EPUB writer: Don't use opf:title-type for epub2.
It is not supported and epubcheck complains.
2014-08-31 12:08:17 -07:00
John MacFarlane
611bc27862 Shared: Moved import of toChunks outside of conditional.
Closes #1590.
2014-08-31 11:17:53 -07:00
John MacFarlane
374bb3c147 DokuWiki writer: Make tables prettier by aligning columns.
Also cleaned up crufty code and added tests.
2014-08-30 21:24:33 -07:00
John MacFarlane
d97aed3903 DokuWiki writer: Handle table cell alignments.
Closes #1566.
2014-08-30 20:54:33 -07:00
John MacFarlane
a8273009ba Textile reader: Improved table support.
We can now handle all different alignment types, for simple
tables only (no captions, no relative widths, cell contents just
plain inlines).  Other tables are still handled using raw HTML.

Addresses #1585 as far as it can be addressed, I believe.
2014-08-30 20:34:42 -07:00
John MacFarlane
b50927527c PDF: Catch errors in conversion of images and display message.
See #1582.
2014-08-30 18:45:58 -07:00
John MacFarlane
f70e3c3297 Merge branch 'mime' of https://github.com/Aelve/John into Aelve-mime
Conflicts:
	src/Text/Pandoc/Writers/Docx.hs
2014-08-30 11:49:50 -07:00
John MacFarlane
8b09d954f9 Merge pull request #1580 from jkr/stringCellDokuWiki
DokuWiki writer: Backslash newlines in table cells
2014-08-30 09:22:38 -07:00
Jesse Rosenthal
ccda2a902c DokuWiki writer: Use backslash newlines in table cells.
Write out strings in table cells with backslash linebreaks in place of
newlines. We also want to remove the first two spaces of an indent in lists.
2014-08-30 07:17:55 -04:00
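DokuWiki forces a line break with two backslashes followed by whitespace, so the newline handling for table cells can be sketched as a plain substitution. `cell_text` is a hypothetical name, and the list-indent adjustment mentioned in the message is not covered.

```python
def cell_text(text: str) -> str:
    """Replace newlines with DokuWiki's forced line break ('\\\\' + space)."""
    return text.replace("\n", "\\\\ ")
```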
John MacFarlane
218633548f Merge pull request #1574 from jlduran/latex-horizontal-rule
LaTeX writer: Make Horizontal Rules more flexible
2014-08-29 21:40:55 -07:00
John MacFarlane
017d44af1d Merge branch 'ugly-tables' of https://github.com/jlduran/pandoc into jlduran-ugly-tables 2014-08-29 21:24:36 -07:00
Jose Luis Duran
4c684561ee LaTeX writer: Add \strut to fix multiline tables
See: http://tex.stackexchange.com/questions/34971
2014-08-29 13:54:08 +00:00
Jesse Rosenthal
c931be24e1 Docx Reader: Read single para in table cell as plain
This makes the docx reader's native output fit with the way the markdown
reader understands its markdown output. I.e., as far as table cells go:

docx -> native == docx -> native -> markdown -> native

(This identity isn't true for other things outside of table cells, of
course).
2014-08-28 14:35:33 -04:00
Jose Luis Duran
1fc665c07d LaTeX writer: Make Horizontal Rules more flexible
Currently, pandoc has hard-coded the following in order to make horizontal
rules in LaTeX:

```hs
"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
```

Which is fine, but does not allow customizations.  It also does not take into
consideration the current line width.

I'm proposing this change:

```diff
@@ In Writers/LaTeX.hs:
-"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
+"\\begin{center}\\rule{0.5\\linewidth}{\\linethickness}\\end{center}"
```
2014-08-28 03:12:37 +00:00
Jose Luis Duran
f1d330b7b5 LaTeX writer: Fix tables
- [x] Fix a bug introduced in 66378062b6, which
  causes the table caption to repeat across all pages
- [x] Address the issues discussed
  [here](https://groups.google.com/forum/#!msg/pandoc-discuss/qMu6_5lYy0o/ZAU7lzAIKw0J)
  regarding the extra vertical space.
  - [ ] NOTE: This will cause multiline table cells to appear unpadded. See
    http://tex.stackexchange.com/questions/34971
  - [x] Use [`\tabularnewline`](http://tex.stackexchange.com/questions/78796)
    instead of `\\`.
2014-08-28 02:02:20 +00:00
Matthew Pickering
404a58f456 DokuWiki Writer: Refactor to use Reader monad 2014-08-27 14:29:09 +01:00