Commit graph

355 commits

Author SHA1 Message Date
John MacFarlane
17d48cf4af Markdown reader: Allow linebreaks in URLs (treat as spaces).
Also, a string of consecutive spaces or tabs is now parsed
as a single space. If you have multiple spaces in your URL,
use %20%20.
2010-12-10 12:14:51 -08:00
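The whitespace rule above can be illustrated with a small pure function. This is a hypothetical sketch on plain Strings. `normalizeUrlSpaces` is an invented name, and the reader applies the rule while parsing, not as a separate pass like this.

```haskell
import Data.Char (isSpace)

-- Hypothetical sketch (not pandoc's code): collapse every run of
-- spaces, tabs, and newlines inside a URL into a single space.
-- Literal "%20%20" is untouched, so it still encodes two spaces.
normalizeUrlSpaces :: String -> String
normalizeUrlSpaces = go
  where
    go [] = []
    go (c : cs)
      | isSpace c = ' ' : go (dropWhile isSpace cs)
      | otherwise = c : go cs
```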
John MacFarlane
ee0a0953de Markdown reader: Rewrote para parser for better efficiency.
This change avoids repeated parsing of inline lists for 'plain'
blocks.
2010-12-10 10:47:46 -08:00
paul.rivier
bb609a85e3 textile redcloth definition lists 2010-12-09 09:25:46 -08:00
John MacFarlane
88a40685b8 Textile reader: better treatment of acronyms.
We now parse PBS(Public Broadcasting System) as if it were
"PBS (Public Broadcasting System)".
2010-12-09 08:52:09 -08:00
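The acronym rewrite described above can be sketched as a token-level function. This is a hypothetical illustration: `spaceAcronym` is an invented name, and the requirement of at least two capital letters before the parenthesis is an assumption, not something the commit states.

```haskell
import Data.Char (isUpper)

-- Hypothetical sketch: turn "ACRO(explanation)" into
-- "ACRO (explanation)". Assumes an acronym is two or more
-- upper-case letters; anything else is returned unchanged.
spaceAcronym :: String -> String
spaceAcronym s =
  case span isUpper s of
    (acro@(_ : _ : _), '(' : rest)
      | not (null rest) && last rest == ')' -> acro ++ " (" ++ rest
    _ -> s
```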
John MacFarlane
9ead748cc9 RST reader: Added footnote support.
Resolves issue #258.

Note that there are some differences in how docutils and
pandoc treat footnotes.  Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
2010-12-08 08:39:50 -08:00
John MacFarlane
91978d2201 Markdown reader: minor footnote changes.
Don't skipNonindentSpaces in noteMarker, since it's also
used in the inline note parser.
2010-12-08 08:17:16 -08:00
John MacFarlane
f02080b62d Textile reader: Implemented footnotes. 2010-12-08 00:44:46 -08:00
John MacFarlane
200ea33641 Made --smart work with RST reader. 2010-12-07 21:49:10 -08:00
John MacFarlane
5e35eb309f Make --smart work in HTML reader. 2010-12-07 21:24:35 -08:00
John MacFarlane
33ba35da9f Smart punctuation: recognize entities.
Now “Hi” gets parsed as a Quoted DoubleQuote inline.
2010-12-07 20:44:43 -08:00
John MacFarlane
e20052a1ba Markdown reader: Moved smartPunctuation parser, for slight speed bump. 2010-12-07 20:09:40 -08:00
John MacFarlane
50ca61ef49 Moved smartPunctuation from Markdown to Parsing.
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
2010-12-07 19:03:08 -08:00
John MacFarlane
f917b46500 Textile reader: implemented acronyms, (tm), (r), (c). 2010-12-07 18:28:36 -08:00
John MacFarlane
c66921f2ac Markdown reader: better handling of intraword _.
The 'str' parser now reads internal _'s as part of the string.
This prevents pandoc from starting to look for an emphasized
block, which can cause exponential slowdowns in some cases.

Resolves Issue #182.
2010-12-06 22:12:18 -08:00
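The intraword-underscore rule can be sketched as a function that takes one word from the front of the input. This is a hypothetical, parser-free illustration. `takeWord` is an invented name, and the real 'str' parser works inside a parser combinator library rather than on a plain String.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical sketch: read a word from the start of the input,
-- treating '_' as part of the word only when a word character follows
-- it. Returns (word, rest). Because "snake_case" is consumed whole, an
-- emphasis parser never gets a chance to start at the inner '_'.
takeWord :: String -> (String, String)
takeWord s = case s of
  (c : _) | isAlphaNum c -> go s
  _ -> ("", s)
  where
    go (c : cs)
      | isAlphaNum c = consume c cs
    go ('_' : cs@(c : _))
      | isAlphaNum c = consume '_' cs
    go rest = ("", rest)
    consume x xs = let (w, r) = go xs in (x : w, r)
```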
John MacFarlane
7864f30717 Markdown reader: handle curly quotes better.
Previously, curly quotes were just parsed literally, leading
to problems in some output formats.  Now they are parsed as
Quoted inlines, if --smart is specified.

Resolves Issue #270.
2010-12-06 20:36:58 -08:00
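Parsing curly quotes as Quoted inlines can be sketched on a cut-down inline type. This is a hypothetical illustration: the `Inline` and `QuoteType` types here are simplified stand-ins, only non-nested double quotes are handled, and the --smart check is not modeled.

```haskell
-- Simplified stand-ins for pandoc's types (assumption: only what this
-- sketch needs).
data QuoteType = SingleQuote | DoubleQuote
  deriving (Show, Eq)

data Inline = Str String | Quoted QuoteType [Inline]
  deriving (Show, Eq)

-- Hypothetical sketch: text between U+201C and U+201D becomes a
-- Quoted DoubleQuote inline; an unclosed quote is kept literally.
curlyQuotes :: String -> [Inline]
curlyQuotes s = case break (== '\x201C') s of
  (before, '\x201C' : rest)
    | (inner, '\x201D' : after) <- break (== '\x201D') rest ->
        str before ++ [Quoted DoubleQuote (str inner)] ++ curlyQuotes after
  _ -> str s
  where
    str "" = []
    str t = [Str t]
```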
John MacFarlane
5a4609584c Fix regression: markdown references should be case-insensitive.
This broke when we added the Key type.  We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.

* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
  only way the API provides to construct a Key.

Resolves Issue #272.
2010-12-05 19:27:00 -08:00
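The design in this commit (derived instances plus a normalizing smart constructor) can be sketched as follows. This is a hypothetical illustration: `Inline` is a two-constructor stand-in for pandoc's real type, and `lookupRef` is an invented helper. In a real module, `Key`'s constructor would be omitted from the export list so that `toKey` is the only way to build a key.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Simplified stand-in for pandoc's Inline type (assumption).
data Inline = Str String | Space
  deriving (Show, Eq, Ord)

-- Eq and Ord are derived rather than hand-written.
newtype Key = Key [Inline]
  deriving (Show, Eq, Ord)

-- Lower-case every Str, so keys built here compare case-insensitively.
toKey :: [Inline] -> Key
toKey = Key . map lowerInline
  where
    lowerInline (Str s) = Str (map toLower s)
    lowerInline Space = Space

fromKey :: Key -> [Inline]
fromKey (Key xs) = xs

-- Hypothetical helper: case-insensitive reference lookup.
lookupRef :: [Inline] -> M.Map Key String -> Maybe String
lookupRef label = M.lookup (toKey label)
```

Because normalization happens once at construction, Data.Map only ever compares already-lowercased keys with the ordinary derived ordering.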
John MacFarlane
357b965b44 Merge branch 'citeproc' into master.
Conflicts:
	src/Text/Pandoc/Definition.hs
2010-12-03 23:43:47 -08:00
John MacFarlane
bea62bcab8 Textile reader: temporarily removed smartPunctuation.
The smartPunctuation parser from the markdown parser
was being used, but this creates two problems:

* smart punctuation rules are slightly different in textile;
  for example, a single dash with spaces around it becomes an
  en dash.
* the following gets parsed as a double quoted string followed
  by a colon, rather than as a link:

  "emphasized text":http://my.url.com

This needs rethinking.
2010-12-03 23:10:52 -08:00
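The Textile dash rule mentioned in the first bullet above can be sketched as a simple string rewrite. This is a hypothetical illustration on plain Strings (`enDashes` is an invented name), not the reader's implementation.

```haskell
-- Hypothetical sketch of the Textile rule: " - " (a single hyphen
-- with a space on each side) becomes " \x2013 " (U+2013 EN DASH).
-- Hyphens without surrounding spaces are left alone.
enDashes :: String -> String
enDashes (' ' : '-' : ' ' : rest) = " \x2013 " ++ enDashes rest
enDashes (c : rest) = c : enDashes rest
enDashes [] = []
```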
John MacFarlane
d4e512776d Textile reader: added hrule parser. 2010-12-03 23:10:52 -08:00
John MacFarlane
4bf9d362d2 Textile reader: Turn on smart punctuation by default. 2010-12-03 23:10:52 -08:00
John MacFarlane
0356ad4de6 Textile reader: drop leading, trailing newline in pre block.
This is consistent with how the other readers work.
2010-12-03 23:10:52 -08:00
John MacFarlane
36d4aa4a09 Textile reader: modified str to handle acronyms, hyphens.
* A single hyphen between two word characters is no longer a
  potential strikeout-starter.
* Acronym explanations are dropped.
2010-12-03 23:10:52 -08:00
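The hyphen rule in the first bullet above can be sketched as a scan that reports which hyphens may still open a strikeout. This is a hypothetical illustration: `strikeoutCandidates` is an invented name, and "word character" is approximated with `isAlphaNum`.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical sketch: return the indices of '-' characters that could
-- still start a strikeout. A hyphen with a word character on both sides
-- (as in "well-known") is excluded.
strikeoutCandidates :: String -> [Int]
strikeoutCandidates s =
  [ i
  | (i, (prev, c, next)) <- zip [0 ..] (neighbours s)
  , c == '-'
  , not (wordChar prev && wordChar next)
  ]
  where
    -- Pair each character with its (optional) left and right neighbours.
    neighbours xs = zip3 (Nothing : map Just xs) xs (map Just (drop 1 xs) ++ [Nothing])
    wordChar = maybe False isAlphaNum
```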
John MacFarlane
f415e9e119 Textile reader: parse raw by default.
It's part of the textile spec to allow raw HTML,
just as with markdown.
-R is no longer needed in test suite.
2010-12-03 23:10:52 -08:00
paul.rivier
c3866f3c66 punctuation handling, and more html-specific handling 2010-12-03 23:10:52 -08:00
Paul Rivier
d724c6b568 html inlines and html blocks handling in textile reader 2010-12-03 23:10:51 -08:00
Paul Rivier
fa0866886b textile reader now ignores html/css attributes 2010-12-03 23:10:51 -08:00
Paul Rivier
e6dde36622 removed support for textile Inserted construct 2010-12-03 23:10:51 -08:00
Paul Rivier
593b4f6c94 fix autolink by promoting it in the parser list, fix table parabreak 2010-12-03 23:10:51 -08:00
Paul Rivier
a7da0672dc more support for Textile reader (explicit links, images), tests and cabal entries 2010-12-03 23:10:51 -08:00
paul.rivier
cfc70863a3 simpler table cell handling 2010-12-03 23:10:51 -08:00
paul.rivier
d917db5e42 preliminary material toward table support 2010-12-03 23:10:51 -08:00
paul.rivier
75fa22c300 textile reader now imports Text.Pandoc.Parsing 2010-12-03 23:10:50 -08:00
paul.rivier
d532c72c5b Basic Textile Reader 2010-12-03 23:10:50 -08:00
John MacFarlane
4c21c5566d Merge branch 'master' into citeproc 2010-11-28 20:21:07 -08:00
John MacFarlane
3ffd724617 Markdown parser performance improvement.
Do a quick lookahead to make sure what follows looks like a setext
header before parsing any Inlines.  This gives a 15% performance
boost in one benchmark.  Many thanks to knieriem for finding
the problem (in peg-markdown):

https://github.com/jgm/peg-markdown/issues/issue/3
2010-11-28 20:19:32 -08:00
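The quick lookahead described above can be sketched as a cheap predicate over the upcoming lines, checked before any Inlines are parsed. This is a hypothetical illustration (`looksLikeSetext` is an invented name). It ignores details such as a "---" line that could also be a horizontal rule.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical sketch: a line can only be a setext header if the next
-- line is a run of '=' or '-' (trailing spaces allowed). Only when this
-- cheap test succeeds is the expensive inline parse of the title tried.
looksLikeSetext :: [String] -> Bool
looksLikeSetext (title : underline : _) =
  not (all isSpace title) && isUnderline (dropWhileEnd (== ' ') underline)
  where
    isUnderline u@(c : _) = c `elem` "=-" && all (== c) u
    isUnderline [] = False
looksLikeSetext _ = False
```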
John MacFarlane
0ca84f0d38 Markdown suffix parser fix.
If suffix doesn't begin with punctuation, include opening
comma and space in result.

Previously,

@item [only a suffix]

would result in something like

Doe (2002only a suffix)

because there was no opening delimiter.
2010-11-26 22:34:53 -08:00
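The suffix fix can be sketched on plain Strings. This is a hypothetical illustration: the real suffix is a list of Inlines, `normalizeSuffix` is an invented name, and the input is assumed to be already trimmed of leading spaces.

```haskell
import Data.Char (isPunctuation)

-- Hypothetical sketch: a suffix that starts with punctuation is kept
-- as-is; otherwise ", " is prepended so the rendered citation reads
-- "Doe (2002, only a suffix)" rather than "Doe (2002only a suffix)".
normalizeSuffix :: String -> String
normalizeSuffix "" = ""
normalizeSuffix s@(c : _)
  | isPunctuation c = s
  | otherwise = ", " ++ s
```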
John MacFarlane
0871a512d7 Split locator and suffix in Biblio rather than Markdown parser.
Patch from Nathan Gass.
2010-11-26 12:06:56 -08:00
John MacFarlane
b48fa0ea59 Check biblio for all citations, not just textual. 2010-11-22 23:09:30 -08:00
John MacFarlane
7ef7d85b3f HTML reader: Export htmlTag. 2010-11-20 22:10:16 -08:00
John MacFarlane
6390103509 Markdown citation parser: small refactoring for clarity. 2010-11-18 14:16:18 -08:00
John MacFarlane
f3bb3c1ff1 Markdown citation parser improvements and test updates.
Now we handle a suffix after a bare locator, e.g.
@item1 [p. 30, suffix]
The suffix now includes any punctuation that introduces it.
A few tests fail because of problems with citeproc (extra space
before the suffix, missing space after comma separating multiple
page ranges in the locator).
2010-11-18 13:22:20 -08:00
John MacFarlane
aaf7de0dda Markdown reader: Revised parser for new citation syntax.
Suffixes and prefixes are now [Inline].  The locator is separated
from the citation key by a blank space.  The locator consists of
one introductory word and any number of words containing at
least one digit.  The suffix, if any, is separated from the locator
by a comma, and continues until the end of the citation.
2010-11-18 12:38:45 -08:00
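The locator rule above (one introductory word, then words containing a digit, then a comma-separated suffix) can be sketched as a word-level split. This is a hypothetical illustration on plain Strings. `splitLocator` is an invented name, and the input is assumed to be the text after the citation key. A suffix whose first word contains a digit would be absorbed into the locator by this simple version.

```haskell
import Data.Char (isDigit)

-- Hypothetical sketch: the first word introduces the locator, every
-- following word that contains a digit belongs to it, and the remaining
-- words are the suffix. A trailing comma on the locator is treated as
-- the separator and dropped. Returns (locator, suffix).
splitLocator :: String -> (String, String)
splitLocator s = case words s of
  (intro : rest) ->
    let (nums, suffixWords) = span (any isDigit) rest
    in (stripComma (unwords (intro : nums)), unwords suffixWords)
  [] -> ("", "")
  where
    stripComma str
      | not (null str) && last str == ',' = init str
      | otherwise = str
```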
John MacFarlane
47c64d4fc4 Don't pass a [Str ""] as citationPrefix. 2010-11-17 15:35:53 -08:00
John MacFarlane
ce9fc2a37d Updated for changes in Citation type.
citationPrefix now [Inline] rather than String;
citationSuffix added.

This change presupposes no changes in citeproc-hs.
It passes a string for these values to citeproc-hs.
Eventually, citeproc-hs should use an [Inline] for
these as well.
2010-11-16 20:31:22 -08:00
John MacFarlane
1fa2973da6 Repairs to citation parser + citation test suite. 2010-11-12 19:30:59 -08:00
John MacFarlane
79bab2d210 Revised citation parsers for markdown reader.
Added a form for in-text citations:

@doe99 [30; see also @smith99].
2010-11-12 00:37:44 -08:00
John MacFarlane
1bfd8110af Merge branch 'master' into citeproc 2010-11-11 21:31:15 -08:00
John MacFarlane
36d4e649a6 Added support for textual citations (but not yet markdown syntax).
Patch from Andrea Rossato.
2010-11-11 21:30:34 -08:00
John MacFarlane
ca51bbbf16 HTML reader: don't parse raw HTML inside <code> tag.
Previously '<code><a>x</a></code>' would be parsed as
Code "<a>x</a>", which is not what you want.
2010-11-11 20:02:37 -08:00
John MacFarlane
83e6c01e4d Merge branch 'master' into citeproc 2010-11-09 22:52:36 -08:00