Commit graph

206 commits

Author SHA1 Message Date
Nikolay Yakimov
c1ff165154 MD Reader: Tests for links/footnotes after citations
In-text citation suffix clashes with links and footnotes
2015-04-20 01:31:45 +03:00
John MacFarlane
343b6051da Added test case for #2062. 2015-04-18 19:00:18 -07:00
John MacFarlane
d3544dc6f7 Markdown definition lists: don't require indent for first line.
Previously the body of the definition (after the `:` or `~` marker)
needed to be in column 4.  This commit relaxes that requirement,
to better match the behavior of PHP Markdown Extra.  So, now
this is a valid definition list:

    foo
    : bar

This patch also helps resolve a potentially ambiguity with table
captions:

    foo

      : bar

      -----
      table
      -----

Is "bar" a definition, or the caption for the table?  We'll count
it as a caption for the table.

Closes #2087.
2015-04-18 10:13:32 -07:00
John MacFarlane
10e28ef750 More principled fix for #1820.
If the tag parses as a comment, we check to see if the
input starts with `<!--`. If not, it's bogus comment mode
and we fail htmlTag.

Includes test case.  Closes #1820.
2015-04-17 22:56:33 -07:00
John MacFarlane
28ca8566ab Merge pull request #1954 from mcmtroffaes/feature/citekey-firstchar-alphanum
Allow digit as first character of a citation key.
2015-04-17 19:10:37 -07:00
Nikolay Yakimov
94e4a5ec44 MD Reader: Test for smart ' after inline math 2015-04-18 00:53:20 +03:00
Nikolay Yakimov
251ce0738d LaTeX Reader: Test for ^^ character escapes 2015-04-13 03:22:39 +03:00
John MacFarlane
2d2e4c9ab2 Merge branch 'master' of https://github.com/rootzlevel/pandoc into rootzlevel-master
Conflicts:
	src/Text/Pandoc/Readers/Org.hs
2015-03-28 21:09:38 -07:00
John MacFarlane
6a3a04c428 Merge branch 'errortype' of https://github.com/mpickering/pandoc into mpickering-errortype
Conflicts:
	benchmark/benchmark-pandoc.hs
	src/Text/Pandoc/Readers/Markdown.hs
	src/Text/Pandoc/Readers/Org.hs
	src/Text/Pandoc/Readers/RST.hs
	tests/Tests/Readers/LaTeX.hs
2015-03-28 12:12:48 -07:00
Craig S. Bosma
513221f822 Org reader: add support for smart punctuation 2015-03-09 07:11:53 -05:00
Mathias Schenner
12bf0ff3e5 LaTeX reader: allow non-empty colsep in tables
The `tabular` environment allows non-empty column separators
with the "@{...}" syntax. Previously, pandoc would fail to
parse tables if a non-empty colsep was present. With this
commit, these separators are still ignored, but the table gets
parsed. A test case is included.
2015-03-08 15:47:39 +01:00
Mathias Schenner
1e3ef0e36f LaTeX reader: allow valign argument in tables
The `tabular` environment takes an optional parameter for
vertical alignment. Previously, pandoc would fail to parse
tables if this parameter was present. With this commit,
the parameter is still ignored, but the table gets
parsed. A test case is included.
2015-03-08 15:39:18 +01:00
Mathias Schenner
4f9a10619f LaTeX reader: add some test cases for simple tables 2015-03-08 15:17:09 +01:00
Hans-Peter Deifel
5871955169 Org reader: Add test for image links
Tests for image links with non-image targets, as introduced in
commit 2ca5101.
2015-02-26 13:11:50 +01:00
Jesse Rosenthal
9654514e8a Docx reader: add test for verbatim in sub/superscript. 2015-02-21 08:45:38 -05:00
Jesse Rosenthal
2995526772 Docx reader: Add tests for new list style parsing. 2015-02-19 00:24:04 -05:00
Matthew Pickering
1a7a99161a Update tests 2015-02-18 21:09:07 +00:00
Matthias C. M. Troffaes
dccd408a9c Allow digit as first character of a citation key.
* Update parser to recognize citation keys starting with a digit.
* Update documentation accordingly.
* Test case added.

See https://github.com/jgm/pandoc-citeproc/issues/97
2015-02-18 15:30:17 +00:00
Jesse Rosenthal
616e211f36 Docx reader: test lists in table cells. 2015-02-13 09:08:07 -05:00
Jesse Rosenthal
e88119f2d1 Docx Reader: Add test for VML images.
Since images are often visually (not structurally) placed on the page,
people might not always get the results they're looking for here.
2015-01-21 13:41:16 -05:00
John MacFarlane
a864e9a348 Merge pull request #1805 from bergey/rst
RST Reader - Improved Role Support
2014-12-15 09:06:45 -08:00
John MacFarlane
1d3ca088f2 Merge pull request #1813 from tarleb/file-links
Org reader: properly handle links to `file:target`
2014-12-14 13:36:34 -08:00
Albert Krewinkel
4d85b17fc5 Org reader: properly handle links to file:target
Org links like `[[file:target][title]]` were not handled correctly,
parsing the link target verbatim.  The org reader is changed such that
the leading `file:` is dropped from the link target.

This is related to issues #756 and #1812.
2014-12-14 21:30:10 +01:00
John MacFarlane
2b08e32a90 Fixe autolinks with following punctuation.
Closes #1811.
The price of this is that autolinked bare URIs can no longer
contain `>` characters, but this is not a big issue.
2014-12-14 12:20:33 -08:00
Daniel Bergey
689fb112bf RST Reader: compute Attrs when role is defined
Move recursive role lookup from renderRole to addNewRole.  The Attr value
will be the same for every occurance of this role, so there's no reason
to compute it every time.  This allows simplifying the
stateRstCustomRoles map considerably.

We could go even further, and remove the fmt and attr arguments to
renderRole, which are null except for custom roles.
2014-12-12 14:45:45 +00:00
Daniel Bergey
4e040160e0 WIP: tests for RST roles 2014-12-12 14:45:45 +00:00
Daniel Bergey
74c1b547c2 parse RST class directives
The class directive accepts one or more class names, and creates a Div
value with those classes.  If the directive has an indented body, the
body is parsed as the children of the Div.  If not, the first block
folowing the directive is made a child of the Div.

This differs from the behavior of rst2xml, which does not create a Div
element.  Instead, the specified classes are applied to each child of
the directive.  However, most Pandoc Block constructors to not take an
Attr argument, so we can't duplicate this behavior.
2014-12-01 18:22:03 +00:00
Daniel Bergey
2cdfa5eb20 parse RST quoted literal blocks
closes #65
RST quoted literal blocks are the same as indented literal blocks (which
pandoc already supports) except that the quote character is preserved in
each line.

This includes test cases for the quoted literal block, as well as
additional tests for line blocks and indented literal blocks, to verify
that these are unaffected by the changes.
2014-12-01 18:22:03 +00:00
John MacFarlane
46d343f474 Fixed bug in org with bulleted lists:
- a
   - b
   * c

was being parsed as a list, even though an unindented `*`
should make a heading.  See
<http://orgmode.org/manual/Plain-lists.html#fn-1>.
2014-11-13 23:40:18 -08:00
John MacFarlane
43c1978fae Merge pull request #1645 from neongreen/issue1636
Fix 'Ext_lists_without_preceding_blankline' bug.
2014-11-12 09:05:29 -08:00
Albert Krewinkel
e6cd8c9077 Org reader: allow empty links for gitit interop
While empty links are not allowed in Emacs org-mode,  Pandoc org-mode
should support them: gitit relies on empty links as they are used to
create wiki links.

Fixes jgm/gitit#471
2014-11-05 23:15:28 +01:00
Albert Krewinkel
daaf635806 Org reader: absolute, relative paths in links
The org reader was to restrictive when parsing links, some relative
links and links to files given as absolute paths were not recognized
correctly.  The org reader's link parsing function was amended to handle
such cases properly.

This fixes #1741
2014-11-05 22:27:25 +01:00
Jesse Rosenthal
60846471a3 Docx test: Remove Danish header test.
Redundant, now that we're testing for a more generalized sort of
internationalized blocks.
2014-10-25 16:02:31 -04:00
Jesse Rosenthal
c0ddcb359e Docx reader: add tests for i18n headers.
This tests blockquotes and headers in Russian. Previous tests make sure
that this doesn't produce a regression in en-us Header and Blockquotes.
2014-10-25 16:00:27 -04:00
Albert Krewinkel
a5eb02f6a7 Org reader: parse LaTeX-style MathML entities
Org supports special symbols which can be included using LaTeX syntax,
but are actually MathML entities.  Examples for this are
`\nbsp` (non-breaking space), `\Aacute` (the letter A with accent acute)
or `\copy` (the copyright sign ©).

This fixes #1657.
2014-10-20 22:57:36 +02:00
John MacFarlane
84f6b1e41a Merge pull request #1680 from shelf/master
Respect indent when parsing Org bullet lists
2014-10-18 13:20:27 -07:00
John MacFarlane
31713d572a Merge pull request #1700 from tarleb/org-emphasis-fix
Org reader: fix rules for emphasis recognition
2014-10-18 13:19:42 -07:00
Albert Krewinkel
e3c36ed6ce Org reader: Drop COMMENT document trees
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported.  Those trees are dropped
silently.

This closes #1678.
2014-10-18 22:11:53 +02:00
Albert Krewinkel
d571bec454 Org reader: fix rules for emphasis recognition
Things like `/hello,/` or `/hi'/` were falsy recognized as emphasised
strings.  This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text.  This patch
enables the reader to matches the reference implementation in that it
reads the above strings as plain text.
2014-10-18 12:47:59 +02:00
Timothy Humphries
f1f56e8533 Fix indent issue for definition lists
Tidy up fix for #1650, #1698 as per comments in #1680.
Fix same issue for definition lists with the same method.
2014-10-17 20:06:25 -04:00
Timothy Humphries
4f4b0f031d Respect indent when parsing Org bullet lists
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue #1650.
2014-10-12 03:18:36 -04:00
John MacFarlane
fe6d43b3e0 Merge pull request #1601 from jkr/windowsfix
Fix path-slashes inside archive for windows
2014-09-27 16:21:17 -07:00
Matthew Pickering
fa2d11c954 Update tests for #1649 2014-09-27 22:40:25 +01:00
Artyom
bc115ffc2d Fix 'Ext_lists_without_preceding_blankline' bug.
* Fixes #1636.
  * Adds a test.
2014-09-26 13:32:08 +04:00
mpickering
c0b9ad4c5d EPUB Tests: Seperating image testing from other features 2014-09-25 13:33:25 +01:00
Jesse Rosenthal
f56e0e958a Docx reader: Add test for polyglot headers.
Only Danish at the moment.
2014-09-05 22:07:06 -04:00
Jesse Rosenthal
313355e373 Org reader: Update Tests
Test for markup after blank line.
2014-09-04 19:55:53 -04:00
Jesse Rosenthal
08359c44e4 Docx Reader: Add tests for numbered headers. 2014-09-04 19:39:49 -04:00
Jesse Rosenthal
a6eead7f26 Docx reader: Modify mediabag test accordingly. 2014-09-02 14:05:54 -04:00
John MacFarlane
598d3ee23b Markdown reader: better handling of paragraph in div.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.

Test case:

    <div class="first">
    This is a paragraph.

    This is another paragraph.
    </div>

Closes #1591.
2014-08-31 12:55:47 -07:00
mpickering
2cd049a1bf Txt2Tags reader: Header is now parsed only if standalone flag is set 2014-08-20 18:11:37 +01:00
Jesse Rosenthal
180f5cbe63 Docx reader: Test for character styles. 2014-08-16 14:05:56 -04:00
John MacFarlane
40e67b8737 Revised tests directory.
Renamed some tests, introducing subsidiary directories
for fb2, docx, epub.

Cleaned up tests in cabal file.

Combined dokuwiki-writer and dokuwiki_inline_formatting tests.
2014-08-13 11:16:50 -07:00
Jesse Rosenthal
0808449547 Docx: Add dropcap tests. 2014-08-11 23:10:50 -04:00
Matthew Pickering
f33ae631f3 Improved EPUB Tests
Rewrote features test to remove all unimplemented features.

There are now all three examples of where an image can be included in
the test.
  1. Cover image
  2. As a spine elemnt
  3. In the document

Tests have also been added to make sure that the mediabag contains all
these images after processing.
2014-08-10 14:58:53 +01:00
Jesse Rosenthal
98d14b2b2a Docx reader: Test inline image code. 2014-08-07 15:34:49 -04:00
Jesse Rosenthal
ed71e9b31d Docx tests: rewrite mediabag tests.
This will allow us to test the whole mediabag (making sure, for example,
that images are added with the correct keys) instead of just individual
extracted images. We compare each entry in the media bag to an image
extracted on the fly from the docx. As a result, we only need one file
to test with.

The image in the current tests was also replaced with a smaller one.
2014-07-31 15:47:45 -04:00
John MacFarlane
6dd2418476 New module, Text.Pandoc.MediaBag.
Moved `MediaBag` definition and functions from Shared:
`lookupMedia`, `mediaDirectory`, `insertMedia`, `extractMediaBag`.
Removed `emptyMediaBag`; use `mempty` instead, since `MediaBag`
is a Monoid.
2014-07-31 12:00:21 -07:00
John MacFarlane
00662faefb Made MediaBag a newtype, and added mime type information to media.
Shared now exports functions for interacting with a MediaBag:

- `emptyMediaBag`
- `lookuMedia`
- `insertMedia`
- `mediaDirectory`
- `extractMediaBag`
2014-07-31 11:05:35 -07:00
Jesse Rosenthal
4d1d8a4b6f Docx test: Test image from media bag. 2014-07-30 22:32:55 -04:00
Jesse Rosenthal
16f88edb3b Docx tests: Added media test comparison function.
Also tell pandoc.cabal that we'll be needing base64, since we want to
compare strings here.
2014-07-30 22:31:38 -04:00
Jesse Rosenthal
941df1b0de Docx reader: change tests to make use of media bag. 2014-07-30 12:46:53 -04:00
Jesse Rosenthal
54708da371 Add and update docx tests in pandoc.cabal. 2014-07-29 13:05:19 -04:00
Jesse Rosenthal
840108a9c1 Docx reader: Make metavalues out of styled paragraphs.
This will make paragraphs styled with `Author`, `Title`, `Subtitle`,
`Date`, and `Abstract` into pandoc metavalues, rather than text. The
implementation only takes those elements from the beginning of the
document (ignoring empty paragraphs).

Multiple paragraphs in the `Author` style will be made into a metaList,
one paragraph per item. Hard linebreaks (shift-return) in the paragraph
will be maintained, and can be used for institution, email, etc.
2014-07-29 13:03:01 -04:00
Matthew Pickering
e340a7da02 Txt2Tags Reader: Added tests 2014-07-27 00:12:57 +01:00
John MacFarlane
4af8eed764 Markdown reader: revised definition list syntax (closes #1429).
* This change brings pandoc's definition list syntax into alignment
  with that used in PHP markdown extra and multimarkdown (with the
  exception that pandoc is more flexible about the definition markers,
  allowing tildes as well as colons).

* Lazily wrapped definitions are now allowed; blank space is required
  between list items; and the space before definition is used to
  determine whether it is a paragraph or a "plain" element.

* For backwards compatibility, a new extension,
  `compact_definition_lists`, has been added that restores the behavior
  of pandoc 1.12.x, allowing tight definition lists with no blank space
  between items, and disallowing lazy wrapping.
2014-07-20 16:33:59 -07:00
John MacFarlane
87096c64f8 Org reader: text adjacent to a list yields a Plain, not Para.
This gives better results for tight lists.  Closes #1437.

An alternative solution would be to use Para everywhere, and
never Plain.  I am not sufficiently familiar with org to know
which is best.  Thoughts, @tarleb?
2014-07-20 12:56:01 -07:00
Craig S. Bosma
1bb4f0c497 Org reader: Respect :exports header arguments on code blocks
Adds support to the org reader for conditionally exporting either the code block,
results block immediately following, both, or neither, depending on the value
of the `:exports` header argument. If no such argument is supplied, the default
org behavior (for most languages) of exporting code is used.
2014-07-17 10:23:22 -05:00
Jesse Rosenthal
643435f1de Docx reader: Add test
Test auto ident header anchors with pandoc-generated pandoc.
2014-07-15 18:32:19 +01:00
John MacFarlane
ff86702a95 Added failing test for issue #1121. 2014-07-10 14:23:20 -07:00
John MacFarlane
d1ac594d4a Added test for issue #1330. 2014-07-07 22:27:28 -06:00
John MacFarlane
f96a2b91f5 Reorganized some markdown tests. 2014-07-07 22:21:04 -06:00
John MacFarlane
e4263d306e Revamped raw HTML block parsing in markdown.
- We no longer include trailing spaces and newlines in the
  raw blocks.
- We look for closing tags for elements (but without backtracking).
- Each block-level tag is its own RawBlock; we no longer try to
  consolidate them (though `--normalize` will do so).

Closes #1330.
2014-07-07 15:53:59 -06:00
Jesse Rosenthal
1405e7b709 Docx reader: Add tests for hanging indent handline.
We want to treat it as a plain paragraph if the hanging amount is
greater to or equal to the left indent---i.e., if the first line has
zero indentation. But we still want it to be a block quote if it starts
to the right of the margin. Someone might format verse with wrapping
lines with a hanging indent, for example.
2014-06-29 23:37:00 -04:00
Jesse Rosenthal
afdc0af779 Track changes tests. 2014-06-25 16:13:59 -04:00
Jesse Rosenthal
a2b6ab847c Docx reader: Add tests for basic track changes
This is what seems like the sensible default: read in insertions, and
ignore deletions. In the future, it would be good if options were
available for either taking in deletions or keeping both in some
scriptable format.
2014-06-25 11:09:28 -04:00
Jesse Rosenthal
2621482d69 Docx Reader: add failing defintion list tests. 2014-06-24 12:11:57 -04:00
Jesse Rosenthal
21295c5ab5 Docx reader: add failing tests for inline code and code blocks. 2014-06-24 10:33:49 -04:00
Jesse Rosenthal
9b954fa855 Add test for correctly trimming spaces in formatting.
This used to be fixed in the tree-walking. We need to make sure we're doing it
right now.
2014-06-23 17:08:26 -04:00
Jesse Rosenthal
ed43513087 Docx reader tests: add tests for normalization deep in blocks. 2014-06-22 01:58:41 -04:00
Jesse Rosenthal
ca4add679c Add normalization test.
Add torture-test for new normalization functions.

One problem that this test demonstrates is that word has a tendency to
turn off formatting at a space, and then turn it back on after. I'm not
sure yet whether this is something we should fix.
2014-06-22 00:46:19 -04:00
Jesse Rosenthal
a4508d7fcf Docx reader tests: Introduce NoNormPandoc type.
This is just a wrapper around Pandoc that doesn't normalize with
`toString`. We want to make sure that our own normalization process
works. If, in the future, we are able to hook into the builder's
normalization, this will be removed.
2014-06-20 18:37:52 -04:00
Jesse Rosenthal
da0d1d27ac Add tabs tests. 2014-06-19 19:33:22 -04:00
Jesse Rosenthal
ceb742b124 Add ReaderOptions to the docx tests
This will allow for testing different media embedding (in addition to
any other applicable options.)
2014-06-19 12:16:53 -04:00
John MacFarlane
bbe99003f8 Naming: Use Docx instead of DocX.
For consistency with the existing writer.
2014-06-16 22:44:40 -07:00
John MacFarlane
bec9f3c641 Merge branch 'docx' of https://github.com/jkr/pandoc into jkr-docx 2014-06-16 22:16:45 -07:00
John MacFarlane
78ee2416d1 Org reader: make tildes create inline code.
Closes #1345.  Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.

I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc.  The previous behavior resulted in the
text not appearing in any output format.
2014-06-16 22:03:26 -07:00
Jesse Rosenthal
f928e4c8dc Add DocX automated tests.
Note this makes use of input and output files in the tests/ dir.
2014-06-16 07:18:40 -04:00
Albert Krewinkel
3238a2f919 Org reader: support for inline LaTeX
Inline LaTeX is now accepted and parsed by the org-mode reader.  Both,
math symbols (like \tau) and LaTeX commands (like \cite{Coffee}), can be
used without any further escaping.
2014-05-20 22:29:21 +02:00
Albert Krewinkel
ceeb701c25 Org reader: support Pandocs citation extension
Citations are defined via the "normal citation" syntax used in markdown,
with the sole difference that newlines are not allowed between "[...]".
This is for consistency, as org-mode generally disallows newlines
between square brackets.

The extension is turned on by default and can be turned off via the
default syntax-extension mechanism, i.e. by specifying "org-citation" as
the input format.
Move `citeKey` from Readers.Markdown into Parsing

The function can be used by other readers, so it is made accessible for
all parsers.
2014-05-14 15:00:26 +02:00
Albert Krewinkel
c5fd631b55 Org reader: Fix block parameter reader, relax constraints
The reader produced wrong results for block containing non-letter chars
in their parameter arguments.  This patch relaxes constraints in that it
allows block header arguments to contain any non-space character (except
for ']' for inline blocks).

Thanks to Xiao Hanyu for noticing this.
2014-05-10 11:35:54 +02:00
Albert Krewinkel
07694b3018 Org reader: Fix parsing of blank lines within blocks
Blank lines were parsed as two newlines instead of just one.
Thanks to Xiao Hanyu (@xiaohanyu) for pointing this out.
2014-05-09 18:23:23 +02:00
Albert Krewinkel
757c4f68f3 Org reader: Support arguments for code blocks
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader.  This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.

This closes #1286.
2014-05-09 18:08:30 +02:00
Albert Krewinkel
71bd4fb2b3 Org reader: Read inline code blocks
Org's inline code blocks take forms like `src_haskell(print "hi")` and
are frequently used to include results from computations called from
within the document.  The blocks are read as inline code and marked with
the special class `rundoc-block`.  Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
work in progress.

This closes #1278.
2014-05-06 13:21:26 +02:00
John MacFarlane
1e50424892 Added test for #1154. 2014-05-04 08:19:48 -07:00
Albert Krewinkel
8726eebcd3 Org reader: Add support for custom link types
Org allows users to define their own custom link types.  E.g., in a
document with a lot of links to Wikipedia articles, one can define a
custom wikipedia link-type via

    #+LINK: wp https://en.wikipedia.org/wiki/

This allows to write [[wp:Org_mode][Org-mode]] instead of the
equivallent [[https://en.wikipedia.org/wiki/Org_mode][Org-mode]].
2014-05-01 11:50:32 +02:00
Albert Krewinkel
2eec20d92f Org reader: Enable internal links
Internal links in Org are possible by using an anchor-name as the target
of a link:

[[some-anchor][This]] is an internal link.

It links <<some-anchor>> here.
2014-04-25 15:29:28 +02:00
Albert Krewinkel
c128daba9d Org reader: Recognize plain and angle links
This adds support for plain links (like http://zeitlens.com) and angle
links (like <http://moltkeplatz.de>).
2014-04-24 17:55:24 +02:00
Albert Krewinkel
8276449520 Org reader: Allow for compact definition lists
Use `Text.Pandoc.Shared.compactify'DL` to allow for compact definition
lists.
2014-04-19 15:13:16 +02:00
Albert Krewinkel
8e91d362a3 Org reader: Fix parsing of footnotes
Footnotes can consist of multiple blocks and end only at a header or at
the beginning of another footnote.  This fixes the previous behavior,
which restricted notes to a single paragraph.
2014-04-19 14:40:46 +02:00