1927 commits

Author SHA1 Message Date
John MacFarlane
0487eae7ee Markdown reader: Fixed bug in code block attribute parser.
Previously the ID attribute got lost if it didn't come first.
Now attributes can come in any order.
2012-01-28 12:36:51 -08:00
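The order-independent attribute parsing described in 0487eae7ee can be illustrated with a minimal Python sketch. This is not pandoc's Haskell parser: the function name `parse_attributes` is hypothetical, and the sketch assumes attribute values contain no spaces.

```python
def parse_attributes(text):
    """Parse '{#id .class key=val}' into (identifier, classes, keyvals).

    Tokens may appear in any order; the identifier is kept wherever it occurs.
    Simplification: tokens are split on whitespace, so quoted values that
    contain spaces are not supported.
    """
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attributes must be wrapped in braces")
    identifier, classes, keyvals = "", [], []
    for token in inner[1:-1].split():
        if token.startswith("#"):
            identifier = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            keyvals.append((key, value.strip('"')))
    return identifier, classes, keyvals
```

With this sketch, `{.haskell #mycode}` and `{#mycode .haskell}` yield the same identifier and class list.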
John MacFarlane
d1ded4b026 Support github syntax for fenced code blocks.
You can now write

    ```ruby
    x = 2
    ```

instead of

    ~~~ {.ruby}
    x = 2
    ~~~~
2012-01-28 12:25:24 -08:00
John MacFarlane
ff93a8e789 Fixed table parsing with wide or combining characters.
Closes #348.  Closes #108.
2012-01-27 00:39:00 -08:00
John MacFarlane
1ce7c38bc4 LaTeX reader: Handle \@. 2012-01-26 11:52:25 -08:00
John MacFarlane
ba81cda7f1 Added Docx writer.
* New module `Text.Pandoc.Docx`.
* New output format `docx`.
* Added reference.docx.
* New option `--reference-docx`.

The writer includes support for highlighted code blocks
and math (which is converted from TeX to OMML using
texmath's new OMML module).
2012-01-19 12:10:49 -08:00
John MacFarlane
a3988d89c8 Added "title" to list of docbook block-level tags. 2012-01-12 21:13:52 -08:00
John MacFarlane
5b49c47414 Markdown reader: fixed bug in table/hrule parsing.
Top line of table must not be followed by a blank line.
This bug caused slowdown on some files with hrules and tables,
because pandoc tried to interpret the hrules as the tops of
multiline tables.
2012-01-10 12:45:19 -08:00
John MacFarlane
0ee49911f6 Markdown reader: Allow links in image captions.
This change also means that

[link with [link](/url)](/url)

will turn into

<p><a href="/url">link with link</a></p>

instead of

<p><a href="/url">link with [link](/url)</a></p>
2012-01-08 09:52:39 -08:00
John MacFarlane
5b7c209373 Markdown reader: Fix parsing of consecutive lists.
Pandoc previously behaved like Markdown.pl for consecutive
lists of different styles. Thus, the following would be parsed
as a single ordered list, rather than an ordered list followed
by an unordered list:

    1. one
    2. two

    - one
    - two

This patch makes pandoc behave more sensibly, parsing this as
two lists.  Any change in list type (ordered/unordered) or in
list number style will trigger a new list. Thus, the following
will also be parsed as two lists:

    1. one
    2. two

    a. one
    b. two

Since we regard this as a bug in Markdown.pl, and not something
anyone would ever rely on, we do not preserve the old behavior
even when `--strict` is selected.
2012-01-02 17:04:59 -08:00
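The list-splitting rule from 5b7c209373 can be sketched in Python as follows. This is an illustration only, not pandoc's parser: the names `marker_style` and `split_lists` are hypothetical, only top-level one-line items are handled, and roman numerals and parenthesized markers are ignored.

```python
import re

# Assumed marker shapes for this sketch: "1.", "a.", "A." and "-", "*", "+".
ORDERED = re.compile(r"^(\d+|[a-z]|[A-Z])\.\s")
BULLET = re.compile(r"^[-*+]\s")

def marker_style(line):
    """Return the list style a line starts, or None if it starts no item."""
    if BULLET.match(line):
        return "bullet"
    match = ORDERED.match(line)
    if not match:
        return None
    label = match.group(1)
    if label.isdigit():
        return "decimal"
    return "lower-alpha" if label.islower() else "upper-alpha"

def split_lists(lines):
    """Group item lines into lists, starting a new list on any style change."""
    lists = []
    current_style = None
    for line in lines:
        style = marker_style(line)
        if style is None:
            continue  # blank lines and other text are skipped in this sketch
        if style != current_style:
            lists.append((style, []))
            current_style = style
        lists[-1][1].append(line)
    return lists
```

Both examples from the commit message come out as two lists: decimal followed by bullet, and decimal followed by lower-alpha.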
John MacFarlane
da8425598a New treatment of dashes in --smart mode.
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.

Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably.  This
change gives users greater control.  The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
2012-01-01 13:48:28 -08:00
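The fixed dash rules from da8425598a are simple enough to sketch in a few lines of Python. The function name `smart_dashes` is hypothetical, and the sketch works on plain strings, so it does not skip code spans the way a real reader would.

```python
EM_DASH = "\u2014"
EN_DASH = "\u2013"

def smart_dashes(text):
    """Map '---' to an em-dash and '--' to an en-dash; leave '-' alone."""
    # Replace the longer sequence first so '---' is not consumed as '--' + '-'.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)
```

A single hyphen is never converted, which matches the decision to stop guessing when `-` should be an en-dash.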
John MacFarlane
3cf60c7306 Support for math in RST reader and writer.
Inline math uses the :math:`...` construct.

Display math uses

  .. math:: ...

or, if multiline,

  .. math::

     ...

These seem to be supported now by rst2latex.py.
2011-12-31 11:40:47 -08:00
John MacFarlane
d8272d0356 Support Sphinx style math in RST reader.
Inline:  :math:`E=mc^2`

Block:

.. math:: E = mc^2

.. math::

   E = mc^2

   a = b^2

(This latter will turn into a paragraph with two
display math elements.)

Closes #117.
2011-12-30 23:46:43 -08:00
John MacFarlane
925a4c5164 Better smart quote parsing.
* Added stateLastStrPos to ParserState. This lets us keep track
  of whether we're parsing the position immediately after a 'str'.
  If we encounter a ' in such a location, it must be an apostrophe,
  and can't be a single quote start.

* Set this in the markdown, textile, html, and rst str parsers.

* Closes #360.
2011-12-29 23:44:12 -08:00
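The idea behind stateLastStrPos in 925a4c5164 can be approximated in Python without any parser state. This is a rough sketch under a stated assumption: "immediately after a str" is modelled as "the previous character is alphanumeric". The name `quote_may_open` is hypothetical.

```python
def quote_may_open(text, index):
    """Return True if the ' at `index` could start a single-quoted span.

    A ' that directly follows a letter or digit is treated as an apostrophe
    (or a closing quote) and can never open a quotation.
    """
    if text[index] != "'":
        raise ValueError("no single quote at this index")
    return index == 0 or not text[index - 1].isalnum()
```

For `D'oh l'*aide*` neither `'` may open a quote, while the first `'` in `*'this is quoted'*` still may.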
John MacFarlane
a579e2c892 Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings. 2011-12-27 15:45:34 -08:00
John MacFarlane
8838f473a8 LaTeX reader: Return Str instead of Apostrophe. 2011-12-27 11:19:23 -08:00
John MacFarlane
4f76fe5a1d Markdown reader: Improved previous patch to allow unicode apostrophe. 2011-12-27 11:01:34 -08:00
John MacFarlane
dd96267626 Modified str parser to capture apostrophes in smart mode.
This solves a problem stemming from the fact that a parser
doesn't know what came *before* in the input stream.

Previously pandoc would parse

D'oh l'*aide*

as containing a single quoted "oh l", when both `'`s should
be apostrophes.  (Issue #360.)  There are two issues here.

(a) It is obvious that the first `'` is not an open quote,
because of the preceding `D`. This patch solves the problem.

(b) It is obvious to us that the second `'` is not an
open quote, because we see that *aide* is some text.
But getting a good algorithm that has good performance is
a bit tricky.  You can't assume that `'` followed by `*`
is always an apostrophe:

*'this is quoted'*

This patch does not fix (b).
2011-12-26 23:04:45 -08:00
John MacFarlane
9f9a57de19 Markdown reader: Fixed backslash escapes in reference links.
Closes #312.
2011-12-05 21:33:47 -08:00
John MacFarlane
26371975f8 Markdown: Better handling of escapes in link URLs and titles. 2011-12-05 21:13:24 -08:00
John MacFarlane
d34f85613a Changes to fit new charsInBalanced. 2011-12-05 20:55:23 -08:00
John MacFarlane
c39cdc15ba Markdown reader: internal changes.
Refactored escapedChar into escapedChar', escapedChar.
2011-12-05 20:27:10 -08:00
John MacFarlane
7b971517b0 Parsing: Changed type of escaped to return Char 2011-12-05 20:22:27 -08:00
John MacFarlane
bf4f8ffe55 LaTeX reader: Don't crash on commands like \itemsep.
Closes #314.
2011-11-12 13:20:29 -08:00
John MacFarlane
da57775171 LaTeX reader: Ignore empty groups {}, { }.
Closes #322.
2011-11-12 13:03:11 -08:00
John MacFarlane
d74e8d14a5 Markdown citations: don't strip off initial space in locator.
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere
else", or, with the new citeproc-hs, "and nowhere else".
Now it yields " and nowhere else".
2011-11-09 13:18:01 -08:00
John MacFarlane
c2f7ba3b69 TeXMath writer: Use unicode thin spaces for thin spaces.
Partially resolves issue #333.
2011-11-08 18:22:28 -08:00
John MacFarlane
dd6ed88707 Markdown reader: allow punctuation only internally in cite keys.
The characters '.',':',';','$','<','>','~','#','-','_' can
be used only between two letters or digits in a citation key.

This means that '@item1.' will be parsed as a citation, 'item1',
followed by a period, instead of a citation 'item1.', as was the
case previously.

Thanks to David Sanson for alerting us to the problem.
2011-11-06 16:00:23 -08:00
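The cite-key rule from dd6ed88707 can be expressed as a regular expression. The Python sketch below is illustrative, not pandoc's parser: `CITE_KEY` and `find_cite_keys` are hypothetical names, and the sketch assumes keys consist of ASCII letters and digits plus the listed punctuation.

```python
import re

# Punctuation allowed only between two letters or digits.
INTERNAL_PUNCT = r".:;$<>~#\-_"

# One run of letters/digits, then any number of (punctuation + run) pairs,
# so a key can never end in punctuation.
CITE_KEY = re.compile(
    r"@([A-Za-z0-9]+(?:[" + INTERNAL_PUNCT + r"][A-Za-z0-9]+)*)"
)

def find_cite_keys(text):
    """Return all citation keys found in `text`, without the leading '@'."""
    return CITE_KEY.findall(text)
```

Under this pattern, `@item1.` gives the key `item1`, and the period stays in the surrounding text.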
John MacFarlane
1b81981c5f HTML reader now recognizes DocBook block and inline tags.
It was always possible to include raw DocBook tags in a markdown
document, but now pandoc will be able to distinguish block from
inline tags and behave accordingly. Thus, for example,

    <sidebar>
    hello
    </sidebar>

will not be wrapped in `<para>` tags.
2011-10-25 12:44:20 -07:00
takahashim
724de8314c allow footnotes followed by newline without space chars 2011-08-23 09:56:58 +09:00
John MacFarlane
6c639d3420 HTML reader: Fixed bug parsing tables w both thead and tbody.
See bug #274, which was not completely fixed by the last patch.
2011-08-01 11:56:15 -07:00
John MacFarlane
8be6cc210c Added PRAGMA needed for ghc 6.12. 2011-07-30 19:58:46 -07:00
John MacFarlane
81381a9305 Removed applicative stuff in Markdown reader.
It requires parsec 3, and currently pandoc can build with parsec 2.
2011-07-30 19:43:20 -07:00
John MacFarlane
b66b7a791c Markdown reader: Improved emph/strong parsing.
Ported code from pandoc2.
Now all tests pass.
2011-07-30 18:08:49 -07:00
John MacFarlane
35cef01659 RST reader: Partial support for labeled footnotes.
Also made simpleReferenceName parser more accurate, which
affects several other parsers.
2011-07-23 18:51:02 -07:00
John MacFarlane
6424e7d02c Properly handle characters in the 128..159 range.
These aren't valid in HTML, but many HTML files produced by
Windows tools contain them.  We substitute correct unicode
characters.
2011-07-23 12:43:01 -07:00
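One common way to handle the 128..159 range described in 6424e7d02c is to read those code points as Windows-1252 bytes. The Python sketch below assumes that mapping, which the commit message does not spell out. The name `fix_c1_codepoint` is hypothetical.

```python
def fix_c1_codepoint(codepoint):
    """Map a code point in 128..159 to the character Windows tools meant.

    Assumption: the intended characters follow Windows-1252. Five bytes in
    that range are undefined there, so they become U+FFFD in this sketch.
    """
    if 128 <= codepoint <= 159:
        try:
            return bytes([codepoint]).decode("cp1252")
        except UnicodeDecodeError:
            return "\ufffd"
    return chr(codepoint)
```

For instance, code point 147 becomes a left double quotation mark and 150 becomes an en-dash.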
John MacFarlane
fe14bf9447 LaTeX reader: Handle \subtitle command.
If there's a subtitle, it is added to the title,
separated by a colon and linebreak.  Closes #280.
2011-07-21 13:33:51 -07:00
John MacFarlane
6c029621ed LaTeX reader & writer: Use \and to separate authors.
Closes #279.
2011-07-21 10:09:51 -07:00
John MacFarlane
dd59cd2341 HTML reader: treat Plain as Para when needed.
For example, in

    Just a few glitches remaining.
    <ul><li> In this situation, one loses the list.
    </ul>
    And in this, the preformatting.
    <pre>Preformatted text not starting with its own blank line.
    </pre>

Thanks to Dirk Laurie for noticing the issue.
2011-07-16 09:42:16 -07:00
John MacFarlane
934867f858 HTML reader: Handle tbody, thead in simple tables.
Closes #274.
2011-07-15 21:16:49 -07:00
John MacFarlane
b30afc2009 Merge pull request #273 from qerub/master
Textile reader: Make it possible to have colons after links.
2011-07-11 08:31:29 -07:00
John MacFarlane
c83b578f58 LaTeX reader: Gobble option & space after linebreak \\[10pt]. 2011-07-10 19:07:40 -07:00
John MacFarlane
4134dad500 Make HTML reader more forgiving of bad HTML.
* Skip spaces after <b>, <emph>, etc.
* Convert Plain elements into Para when they're in a list
  item with Para, Pre, BlockQuote, CodeBlock.

An example of HTML that pandoc handles better now:

~~~~
<h4> Testing html to markdown </h4>
<ul>
<li>
<b> An item in a list </b>
<p> An introductory sentence.
<pre>
Some preformatted text
at this stage comes next.

But alas! much havoc
is wrought by Pandoc.
</pre>
</ul>
~~~~

Thanks to Dirk Laurie for reporting the issues.
2011-07-10 16:54:46 -07:00
Christoffer Sawicki
8fa4e8bff1 Textile reader: Make it possible to have colons after links. 2011-07-10 16:30:14 +02:00
John MacFarlane
9e71dc3f48 Support \dots as well as \ldots in LaTeX reader. 2011-06-22 20:06:29 -07:00
John MacFarlane
6e59053d32 Forbid ()s in citation item keys.
Resolves Issue #304: problems with

(@item1; @item2)

because the final paren was being parsed as part of
the item key.
2011-05-22 20:24:18 -07:00
John MacFarlane
b42c48e919 Disallow notes within notes in reST and markdown.
These previously caused infinite looping and stack overflows.
For example:

[^1]

[^1]: See [^1]

Note references are allowed in reST notes, so this isn't a full
implementation of reST. That can come later. For now we need to
prevent the stack overflows.

Partially resolves Issue #297.
2011-04-20 11:42:27 -07:00
John MacFarlane
4b90ffe1bd Allow '|' followed by newline in RST line block. 2011-04-11 14:45:42 -07:00
John MacFarlane
6beba76f61 Changed uri parser so it doesn't include trailing punctuation.
So, in RST, 'http://google.com.' should be parsed as a link
to 'http://google.com' followed by a period.

The parser is smart enough to recognize balanced parentheses,
as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'.

Also added ()s to RST specialChars, so '(http://google.com)'
will be parsed as a link in parens.

Added test cases.

Resolves Issue #291.
2011-03-18 11:30:20 -07:00
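The trailing-punctuation behaviour from 6beba76f61 can be sketched in Python as a post-processing step on a candidate URI. This is not how pandoc's uri parser is written. The name `split_trailing_punctuation` and the exact punctuation set are assumptions.

```python
# Characters assumed to be sentence punctuation when they end a URI.
TRAILING = ".,;:!?'\""

def split_trailing_punctuation(candidate):
    """Split a candidate URI into (uri, trailing_text).

    A closing parenthesis is stripped only when it is unbalanced, so
    wikipedia-style links such as '.../baz_(bam)' stay intact.
    """
    end = len(candidate)
    while end > 0:
        ch = candidate[end - 1]
        head = candidate[:end]
        if ch in TRAILING:
            end -= 1
        elif ch == ")" and head.count("(") < head.count(")"):
            end -= 1
        else:
            break
    return candidate[:end], candidate[end:]
```

With this sketch, `http://google.com.` splits into the link and a period, while `http://foo.bar/baz_(bam)` is returned whole.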
John MacFarlane
403bb521cd Fixed bug in RST field list parser.
The bug affected field lists with multi-line items at the
end of the list.
2011-03-12 17:08:23 -08:00
John MacFarlane
eebd77829c Markdown+lhs reader: Require space after inverse bird tracks.
The point of the change is to allow html tags to be used freely
at the left margin of a markdown+lhs document.

Thanks to Conal Elliott for the suggestion.
2011-03-02 12:47:17 -08:00