74 commits

Author SHA1 Message Date
fiddlosopher
0e4eb83749 Markdown reader: cleaner handling of spaces in URLs.
Consecutive spaces are now collapsed into one %20, and
final spaces are removed.  Also, a test case has been added.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1477 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-11-01 21:05:33 +00:00
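The space handling described in this commit can be sketched in a few lines. This is a minimal illustration, not the reader's actual code: the name escapeSpaces is hypothetical, and because it relies on `words` it also drops leading whitespace and treats tabs and newlines like spaces, which the commit message does not state.

```haskell
import Data.List (intercalate)

-- | Hypothetical helper: collapse each run of whitespace inside a URL
-- into a single "%20" and drop trailing whitespace.
-- Assumption: leading whitespace is dropped as well (a side effect of 'words').
escapeSpaces :: String -> String
escapeSpaces = intercalate "%20" . words
```

For example, applying escapeSpaces to "my  file .txt  " gives "my%20file%20.txt".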
fiddlosopher
ddf2dc6896 Markdown reader: allow URLs containing spaces.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1475 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-10-31 22:34:38 +00:00
fiddlosopher
a2422504ff HTML reader: Don't interpret contents of <pre> blocks as markdown.
Added rawVerbatimBlock parser.  Resolves Issue #94.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1468 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-10-18 23:42:23 +00:00
fiddlosopher
f08ebf5a9b Added colons to protocols in unsanitaryURI in HTML reader.
Closes Issue #88.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1462 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-10-16 01:00:51 +00:00
fiddlosopher
7ad17fe5cf Markdown reader: Ignore blank line after ~~~~~~~~ in delimited code blocks.
Rationale:  these are useful for literate haskell, but lhs requires
a blank line before the haskell code, and we don't want spurious
blank lines in the output.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1454 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-16 01:39:05 +00:00
fiddlosopher
943c2f353d Changed list parser so that only the starting list marker matters:
1. one
  -  two
  (b) three

produces an ordered list with 1., 2., 3.  This is the behavior of
Markdown.pl.

Modified README to document the new behavior.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1438 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-12 00:05:32 +00:00
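The rule that only the starting marker decides the list type can be illustrated with a toy model. The sketch below works on (marker, text) pairs rather than on pandoc's document types; normalizeList and the set of bullet markers are assumptions made for the example.

```haskell
-- | Hypothetical toy model: each item is a (raw marker, text) pair.
-- The first marker alone decides whether the whole list is rendered
-- as a bullet list or as an ordered list numbered from 1.
normalizeList :: [(String, String)] -> [String]
normalizeList [] = []
normalizeList items@((firstMarker, _) : _)
  -- Assumption: "-", "*" and "+" are the bullet markers.
  | firstMarker `elem` ["-", "*", "+"] = [ "- " ++ txt | (_, txt) <- items ]
  -- Anything else is treated as an ordered-list marker.
  | otherwise =
      [ show n ++ ". " ++ txt | (n, (_, txt)) <- zip [1 :: Int ..] items ]
```

With the items from the commit message, [("1.", "one"), ("-", "two"), ("(b)", "three")], the result is ["1. one", "2. two", "3. three"].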
fiddlosopher
000b89c718 Use Data.List's 'intercalate' instead of custom 'joinWithSep'.
+ Removed joinWithSep definition from Text.Pandoc.Shared.
+ Replaced joinWithSep with intercalate
+ Depend on base >= 3, since in base < 3 intercalate is not included.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1428 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-08 06:36:28 +00:00
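To show why the custom helper became redundant, here is a sketch comparing it with Data.List's intercalate. The definition of joinWithSep below is an assumed reconstruction for illustration; the removed code in Text.Pandoc.Shared may have looked different.

```haskell
import Data.List (intercalate)

-- | Assumed shape of the removed helper: join a list of lists,
-- placing the separator between consecutive elements.
joinWithSep :: [a] -> [[a]] -> [a]
joinWithSep _ [] = []
joinWithSep sep xs = foldr1 (\a b -> a ++ sep ++ b) xs

-- | The same operation using the library function.
joined :: String
joined = intercalate ", " ["a", "b", "c"]
```

Both joinWithSep ", " ["a", "b", "c"] and joined evaluate to "a, b, c".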
fiddlosopher
5c02959483 LaTeX reader: Refactored math parsers, limited support for eqnarray.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1426 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 21:24:33 +00:00
fiddlosopher
87aa458446 LaTeX reader: Removed specialEnvironment parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1425 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 21:24:24 +00:00
fiddlosopher
b422711451 LaTeX reader: minor improvements.
+ parse '{}', if present, after \textless, \textgreater,
  \textbar, \textbackslash, \ldots.
+ Parse unescaped special characters verbatim rather than
  changing them to spaces.  This way arguments of unknown
  commands will appear in braces.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1424 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 20:45:53 +00:00
fiddlosopher
ae30b5ae37 LaTeX reader: Fixed regression in list parsing
(introduced by recent changes to unknownCommand).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1423 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 20:45:42 +00:00
fiddlosopher
fc24c79db6 LaTeX reader: improvements in raw LaTeX parsing.
+ "loose punctuation" (like {}) parsed as Space
+ Para elements must contain more than Str "" and Space elements
+ Added parser for "\ignore" command used in literate haskell.
+ Reworked unknownCommand and rawLaTeXInline: when not in "parse raw"
  mode, these parsers simply strip off the command part and allow
  the arguments to be parsed normally.  So, for example,
  \blorg{\emph{hi}} will be parsed as Emph "hi" rather than
  Str "{\\emph{hi}}".


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1420 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 18:05:18 +00:00
fiddlosopher
2127b7d513 Changed Float to Double in definition of Table element.
(Double is more efficient in GHC.)
Truncate width in opendocument output to 2 decimal places.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1418 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-09-06 02:51:44 +00:00
fiddlosopher
4f14802831 LaTeX reader: Parse "code" environments as verbatim (lhs).
Refactored parsers for verbatim environments.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1414 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-26 20:36:06 +00:00
fiddlosopher
f53fb554fe Support for display math; changed ASCIIMathML -> LaTeXMathML:
Resolves Issue #47.

+ Added a DisplayMath/InlineMath selector to Math inlines.
+ Markdown parser yields DisplayMath for $$...$$.
+ LaTeX parser yields DisplayMath when appropriate.  Removed
  mathBlock parsers, since the same effect is achieved by the math
  inline parsers, now that they handle display math.
+ Writers handle DisplayMath as appropriate for the format.
+ Changed -m option to use LaTeXMathML rather than ASCIIMathML.
  LaTeXMathML is closer to LaTeX in its display of math, and
  supports many non-math LaTeX environments.
+ Modified HTML writer to print raw TeX when LaTeXMathML is
  being used instead of suppressing it.
+ Removed ASCIIMathML files from data/ and added LaTeXMathML.
+ Replaced ASCIIMathML with LaTeXMathML in source files.
+ Modified README and pandoc man page source.
+ Modified web page.
+ Added --latexmathml option (kept --asciimathml as a synonym
  for backwards compatibility)
+ Modified tests accordingly; added new tests for display math.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1409 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-13 03:02:42 +00:00
fiddlosopher
c8a56a2864 Parse raw ConTeXt environments as TeX in markdown reader.
Resolves Issue #73.

Also made some structural changes to parsing of raw LaTeX environments.
Previously there was a special block parser for LaTeX environments.
It returned a Para element containing the raw TeX inline. This has
been removed, and the raw LaTeX environment parser is now used in the
rawLaTeXInline parser. The effect is exactly the same, except that we
can now handle consecutive LaTeX and ConTeXt environments not separated
by spaces.  This new flexibility is required by the example in
Issue #73:

    \placeformula \startformula
         L_{1} = L_{2}
    \stopformula

API change: The LaTeX reader now exports rawLaTeXEnvironment' (which
returns a string) rather than rawLaTeXEnvironment (which returns a block
element). This is more likely to be useful in other applications.

Added test cases for raw ConTeXt environments to markdown-reader-more.txt.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1405 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-11 07:04:36 +00:00
fiddlosopher
dd2b77d590 Allow newline before URL in markdown link references. Resolves Issue #81.
Added tests for this issue in new "markdown-reader-more" tests.
Changed RunTests.hs to run these tests.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1401 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-10 23:26:32 +00:00
fiddlosopher
05b366a0b2 Small improvements to citation parsing in markdown reader.
(Don't allow blank lines inside citations.)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1382 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-06 03:34:06 +00:00
fiddlosopher
abf2dc78ac Allow parsing of multiline citations.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1381 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-05 23:18:52 +00:00
fiddlosopher
1bfe1b84a8 Added support for Cite to Markdown reader, and conditional support for citeproc module.
+ The citeproc cabal configuration option sets the _CITEPROC macro, which conditionally
  includes code for handling citations.
+ Added Text.Pandoc.Biblio module.
+ Made highlighting option default to False.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1376 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-08-04 03:15:34 +00:00
fiddlosopher
06983c9ba5 Markdown reader: Parse setext headers before atx headers.
Test case:
   # hi
   ====
parsed by Markdown.pl as an H1 header with contents "# hi".


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1334 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-23 23:10:05 +00:00
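The effect of trying setext headers before atx headers can be sketched with two small line-based recognizers. All names and the Header type here are hypothetical, and the rules are simplified (no trailing '#' handling, no inline parsing).

```haskell
import Control.Applicative ((<|>))
import Data.Char (isSpace)

-- | Hypothetical header representation: level and raw text.
data Header = Header Int String deriving (Eq, Show)

-- | Setext header: a text line followed by a line of '=' or '-'.
setext :: [String] -> Maybe (Header, [String])
setext (txt : under : rest)
  | not (null txt), not (null under), all (== '=') under = Just (Header 1 txt, rest)
  | not (null txt), not (null under), all (== '-') under = Just (Header 2 txt, rest)
setext _ = Nothing

-- | Atx header: one to six leading '#' characters.
atx :: [String] -> Maybe (Header, [String])
atx (l : rest)
  | level >= 1 && level <= 6 = Just (Header level (dropWhile isSpace txt), rest)
  where
    (hashes, txt) = span (== '#') l
    level = length hashes
atx _ = Nothing

-- | Try setext first, so "# hi" over "====" keeps its '#'.
header :: [String] -> Maybe (Header, [String])
header ls = setext ls <|> atx ls
```

With this ordering, header ["# hi", "===="] yields a level-1 header whose text is "# hi", matching the Markdown.pl behavior described above.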
fiddlosopher
7c35c0bc25 Fixed bug in Markdown parser: regular $s triggering math mode.
For example:  "shoes ($20) and socks ($5)."

The fix consists in two new restrictions:
+ the $ that ends a math span may not be directly followed by a digit.
+ no blank lines may be included within a math span.

Thanks to Joseph Reagle for noticing the bug.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1326 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-15 20:41:27 +00:00
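The two restrictions can be sketched as a small function over plain strings. This is an illustration only: mathSpan is a hypothetical name, the real parser is written with parser combinators, and this sketch detects blank lines only as two consecutive newlines.

```haskell
import Data.Char (isDigit)
import Data.List (isInfixOf)

-- | Hypothetical sketch: try to read an inline math span at the start of
-- the input. Returns the math content and the remaining input.
mathSpan :: String -> Maybe (String, String)
mathSpan ('$' : rest) =
  case break (== '$') rest of
    (body, '$' : after)
      | null body               -> Nothing  -- empty span
      | "\n\n" `isInfixOf` body -> Nothing  -- no blank lines inside math
      | startsWithDigit after   -> Nothing  -- closing '$' may not precede a digit
      | otherwise               -> Just (body, after)
    _ -> Nothing                            -- no closing '$' at all
  where
    startsWithDigit (c : _) = isDigit c
    startsWithDigit []      = False
mathSpan _ = Nothing
```

On the example from the commit message, the span that would start at "$20) and socks ($5)." is rejected because its closing '$' is followed by the digit 5, so the dollar signs stay ordinary text.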
fiddlosopher
235e41f246 Commented out some unneeded code in HTML reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1325 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-15 00:56:53 +00:00
fiddlosopher
7701a87a1a Code cleanup - RST reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1324 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-15 00:54:43 +00:00
fiddlosopher
5be53bbd3f LaTeX reader - Code cleanup.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1322 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-15 00:14:58 +00:00
fiddlosopher
3e2afa7a49 Code cleanup in LaTeX reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1320 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-14 16:19:42 +00:00
fiddlosopher
d5c73ac42a Code cleanup in TexMath reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1318 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-13 23:58:35 +00:00
fiddlosopher
752adcd45a Added type signatures and fixed other -Wall warnings in Markdown reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1301 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-11 16:33:21 +00:00
fiddlosopher
45044ff536 Added a few more recognized abbreviations to 'abbrev' parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1300 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-11 03:00:35 +00:00
fiddlosopher
824bb2d22e In smart mode, use nonbreaking spaces after abbreviations in markdown parser.
Thus, for example, "Mr. Brown" comes out as "Mr.~Brown" in LaTeX, and does
not produce a sentence-separating space.  Resolves Issue #75.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1298 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-11 02:14:57 +00:00
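A rough sketch of the idea, rendered directly as LaTeX text: the word after a known abbreviation is attached with "~" instead of a normal space. The abbreviation list and the function name are assumptions; the actual change works inside the markdown parser and leaves the rendering to the writers.

```haskell
-- | Assumed sample of abbreviations; the real 'abbrev' parser has its own list.
abbreviations :: [String]
abbreviations = ["Mr.", "Mrs.", "Dr.", "Prof."]

-- | Hypothetical helper: join words with spaces, but use a LaTeX
-- nonbreaking space ("~") after a known abbreviation.
latexSpaces :: String -> String
latexSpaces = go . words
  where
    go (w : rest@(_ : _))
      | w `elem` abbreviations = w ++ "~" ++ go rest
      | otherwise              = w ++ " " ++ go rest
    go [w] = w
    go []  = ""
```

For example, latexSpaces "Mr. Brown met Dr. Who" gives "Mr.~Brown met Dr.~Who".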
fiddlosopher
8ed710bc9d Treat '\ ' in (extended) markdown as nonbreaking space.
Print nonbreaking space appropriately in each writer (e.g. ~ in LaTeX).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1297 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-07-11 01:24:15 +00:00
fiddlosopher
6b73389328 Added type signatures, etc., to eliminate -Wall warnings.
(except for two warnings about unneeded functions, which might
come in handy some day...)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1291 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-06-17 22:15:39 +00:00
fiddlosopher
cd38d4ae79 Markdown smart typography: Em dashes no longer eat surrounding whitespace.
Resolves Issue #69.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1279 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-06-08 03:20:15 +00:00
fiddlosopher
6a46ffc0ad Count anything that isn't a known block (HTML) tag as an inline tag
(rather than the other way around).  Added "html", "head", and
"body" to list of block tags.  Resolves Issue #66, allowing
<lj> to count as an inline tag.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1276 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-04-20 03:12:42 +00:00
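The inverted classification can be sketched as follows. Only "html", "head", and "body" in the list come from the commit message; the other entries and the function names are assumptions for illustration.

```haskell
import Data.Char (toLower)

-- | Known block tags. Only "html", "head" and "body" are taken from the
-- commit message; the rest are assumed examples.
blockTags :: [String]
blockTags =
  ["html", "head", "body", "p", "div", "blockquote", "pre", "ul", "ol", "table"]

-- | A tag is a block tag only if it is on the list (case-insensitive).
isBlockTag :: String -> Bool
isBlockTag name = map toLower name `elem` blockTags

-- | Everything else, including unknown tags such as "lj", is inline.
isInlineTag :: String -> Bool
isInlineTag = not . isBlockTag
```

Under this scheme an unrecognized tag like lj needs no special case: it is inline simply because it is not a known block tag.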
fiddlosopher
8624ed9bd3 The '--sanitize-html' option now examines URIs in markdown links
and images, and in HTML href and src attributes.  If the URI scheme
is not on a whitelist of safe schemes, it is rejected.  The main point
is to prevent cross-site scripting attacks using 'javascript:' URIs.
See http://www.mail-archive.com/markdown-discuss@six.pairlist.net/msg01186.html
and http://ha.ckers.org/xss.html.  Resolves Issue #62.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1262 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-03-22 20:41:56 +00:00
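A scheme whitelist check of this kind might look like the sketch below. The list of safe schemes, the function names, and the decision to accept scheme-less (relative) URIs are all assumptions; the sketch also ignores obfuscation tricks such as embedded whitespace, so it is not a complete sanitizer.

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii, toLower)

-- | Assumed whitelist; the real list used by the option may differ.
safeSchemes :: [String]
safeSchemes = ["http", "https", "ftp", "mailto"]

-- | Extract a lowercase URI scheme, if the text before the first ':'
-- looks like one (a letter followed by letters, digits, '+', '-' or '.').
uriScheme :: String -> Maybe String
uriScheme uri =
  case break (== ':') uri of
    (s@(c : _), ':' : _)
      | isAscii c && isAlpha c && all schemeChar s -> Just (map toLower s)
    _ -> Nothing
  where
    schemeChar ch = isAscii ch && (isAlphaNum ch || ch `elem` "+-.")

-- | Accept URIs whose scheme is whitelisted.
-- Assumption: URIs without a scheme (relative links) are accepted.
isSafeURI :: String -> Bool
isSafeURI uri = maybe True (`elem` safeSchemes) (uriScheme uri)
```

Here isSafeURI "JavaScript:alert(1)" is False because the lowercased scheme "javascript" is not on the list, while "http://example.com" and "page.html" are accepted.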
fiddlosopher
06b544360e Factored codeBlock into separate codeBlockIndented and codeBlockDelimited.
Do not use codeBlockDelimited in strict mode.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1211 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:20:02 +00:00
fiddlosopher
614547b38e Use generic attributes type, not a string, for CodeBlocks.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1209 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:19:43 +00:00
fiddlosopher
2e683e8b53 Fixed delimited code blocks: eat blank lines afterwards, and allow end line
to contain more tildes than beginning line.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1206 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:19:17 +00:00
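The closing rule can be sketched on a list of lines: the end line must have at least as many tildes as the beginning line, and blank lines after the block are consumed. The names are hypothetical, the minimum of three tildes is an assumption, and attributes on the opening line are not handled in this sketch.

```haskell
-- | Length of a fence line made only of tildes (plus trailing spaces).
-- Assumption: a fence needs at least three tildes.
fenceLength :: String -> Maybe Int
fenceLength line
  | n >= 3 && all (== ' ') rest = Just n
  | otherwise                   = Nothing
  where
    (tildes, rest) = span (== '~') line
    n = length tildes

-- | Hypothetical sketch: read a delimited code block from the front of the
-- input lines. Returns the code lines and the remaining lines.
delimitedBlock :: [String] -> Maybe ([String], [String])
delimitedBlock [] = Nothing
delimitedBlock (open : ls) = do
  n <- fenceLength open
  -- The end line may contain more tildes than the beginning line.
  let isClose l = maybe False (>= n) (fenceLength l)
  case break isClose ls of
    -- Eat blank lines that follow the closing fence.
    (body, _ : after) -> Just (body, dropWhile (all (== ' ')) after)
    _                 -> Nothing
```

For a block opened with four tildes, a line of three tildes inside the block is kept as code, while a later line of five tildes closes it.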
fiddlosopher
24f22ee7ac Added a needed try to {} attribute parser.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1205 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:19:10 +00:00
fiddlosopher
046c6b0d0d Added support for multiple classes in delimited code block.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1204 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:19:01 +00:00
fiddlosopher
b06ddad4bc Initial support for delimited code blocks in markdown reader.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1203 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:18:54 +00:00
fiddlosopher
9f7a14c210 Modified readers for new parameter in CodeBlock.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1199 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-02-09 03:18:03 +00:00
fiddlosopher
42359e63c9 Fixed bug in RST reader, which would choke on: "p. one\ntwo\n".
Added some try's in ordered list parsers.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1191 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-17 02:14:20 +00:00
fiddlosopher
d474852f56 Removed unnecessary imports.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1189 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-16 02:18:23 +00:00
fiddlosopher
8fca649d05 Changed copyright dates where appropriate to include 2008.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1181 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-08 17:26:16 +00:00
fiddlosopher
2df432dc60 Changed comment used to replace unsafe HTML if sanitize-html option
selected.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1178 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-08 04:53:01 +00:00
fiddlosopher
b9e30ca8b7 RST reader: Fixed bug in parsing explicit links (resolves Issue #44).
The problem was that we were looking for inlines until a '<' character
signaled the start of the URL.  So if you hit a reference-style link,
it would keep looking until the end of the document.  Fix:  change
inline => (notFollowedBy (char '`') >> inline).  Note that this won't
allow code inlines in links, but these aren't allowed in reST anyway.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1175 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-06 19:46:55 +00:00
fiddlosopher
85657add6a RST reader: cleaned up parsing of reference names in key blocks and links.
Allow nonquoted reference links to contain isolated '.', '-', '_', so
that strings like 'a_b_' count as links.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1174 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-06 19:46:43 +00:00
fiddlosopher
e4837c140c RST reader: Removed unnecessary check for following link in str.
This is unnecessary now that link is above str in the definition of
'inline'.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1173 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-06 19:46:38 +00:00
fiddlosopher
d271473044 Fixed markdown reader to handle "*hi **there***" as a strong nested in an emph.
(A '*' is only recognized as the end of the emphasis if it's not the beginning
of a strong emphasis.)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1172 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-06 19:46:31 +00:00