Commit graph

1163 commits

Author SHA1 Message Date
John MacFarlane
a6d7d88b0f RST reader: Big speed improvement (300->260ms).
Moved whitespace parser to top of inline parsers.
2011-01-22 16:01:42 -08:00
John MacFarlane
8dcc67a993 ConTeXt writer: Don't add cr at end of inline footnote. 2011-01-22 12:17:39 -08:00
John MacFarlane
5c35be1362 Make sure native output ends in newline with --standalone. 2011-01-21 09:58:23 -08:00
John MacFarlane
bfbc289871 Haddock comment improvements. 2011-01-21 09:00:05 -08:00
John MacFarlane
9bb5b54102 Added --normalize option. 2011-01-20 22:48:20 -08:00
John MacFarlane
8894b1a030 Markdown writer: Avoid printing excess spaces at end if no notes/refs. 2011-01-20 22:36:08 -08:00
John MacFarlane
8011d079c8 Native writer: eliminated empty spaces in brackets. 2011-01-20 20:48:06 -08:00
John MacFarlane
6b50778b2a Export readNative in Text.Pandoc.Shared. 2011-01-20 08:52:59 -08:00
John MacFarlane
810e3336dc Improved native writer using Pretty.
2-3X speed improvement and more consistent layout.
2011-01-20 08:43:13 -08:00
John MacFarlane
978d949526 Made writeNative sensitive to writerStandalone.
The Pandoc (Meta ...) is not written unless standalone is set.
2011-01-19 18:57:25 -08:00
John MacFarlane
e1f3c6058e Added Text.Pandoc.Readers.Native (readNative).
readNative can now read full pandoc documents, block lists, blocks,
inline lists, or inlines.  It will interpret

Str "hi"

as if it were

Pandoc (Meta [] [] []) [Plain [Str "hi"]]

This should make testing easier.
2011-01-19 18:36:27 -08:00
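The promotion described in e1f3c6058e can be sketched roughly as follows. This is only an illustration under stated assumptions: the AST types are toy stand-ins, and `Fragment` and `promote` are hypothetical names, not part of Text.Pandoc.Readers.Native.

```haskell
-- Toy stand-ins for pandoc's AST types; heavily simplified, not the real definitions.
data Inline = Str String | Space deriving (Show, Eq)
data Block = Plain [Inline] | Para [Inline] deriving (Show, Eq)
data Meta = Meta [Inline] [[Inline]] [Inline] deriving (Show, Eq)
data Pandoc = Pandoc Meta [Block] deriving (Show, Eq)

-- Hypothetical sum type for the five input shapes the commit message lists.
data Fragment
  = FullDoc Pandoc
  | BlockList [Block]
  | OneBlock Block
  | InlineList [Inline]
  | OneInline Inline
  deriving (Show, Eq)

-- Promote any fragment to a full document, wrapping one level at a time.
promote :: Fragment -> Pandoc
promote (FullDoc d)      = d
promote (BlockList bs)   = Pandoc (Meta [] [] []) bs
promote (OneBlock b)     = promote (BlockList [b])
promote (InlineList ils) = promote (OneBlock (Plain ils))
promote (OneInline i)    = promote (InlineList [i])
```

Under these assumptions, `promote (OneInline (Str "hi"))` yields the same value as the full document shown in the message.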
John MacFarlane
e647f761ed Use spaceChar instead of oneOf " \t" in rst reader. 2011-01-19 15:17:51 -08:00
John MacFarlane
1b8a9711b8 Replaced more noneOf/oneOf parsers. 2011-01-19 15:14:23 -08:00
John MacFarlane
a400cfe10f Replaced uses of oneOf with more efficient parsers.
This speeds up the markdown reader.
2011-01-19 15:06:56 -08:00
John MacFarlane
c09518eefd More small parser rewrites for small performance gains. 2011-01-19 14:59:59 -08:00
John MacFarlane
61f3db612c Parsing: Rewrote spaceChar for significant speedup in readers. 2011-01-19 14:45:15 -08:00
John MacFarlane
adaae082fc Fixed problem with inline code in ConTeXt writer.
Previously `}` would be rendered '\type{}}'.
Now we check the string for '}' and '{'. If it contains neither,
use \type{}; otherwise use \mono{} with an escaped version of the
string.

Note:  There are some issues using the \type!str! form, including
differences btw mkii and mkiv. For now this is a conservative fix.
Perhaps in the future we can use \type!str!.  See the discussion on
pandoc-discuss s.v. "Bug in context writer".
2011-01-19 11:53:00 -08:00
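A minimal sketch of the rule described in adaae082fc, assuming plain strings rather than pandoc's real writer types. `inlineCode` and `escapeChar` are hypothetical names, and the escaping shown is a simplification, not the writer's actual escape table.

```haskell
-- Simplified escaping for illustration only: braces and backslash.
escapeChar :: Char -> String
escapeChar '{'  = "\\{"
escapeChar '}'  = "\\}"
escapeChar '\\' = "\\letterbackslash{}"
escapeChar c    = [c]

-- Use \type{} when the string contains no braces;
-- otherwise fall back to \mono{} with an escaped string.
inlineCode :: String -> String
inlineCode s
  | any (`elem` "{}") s = "\\mono{" ++ concatMap escapeChar s ++ "}"
  | otherwise           = "\\type{" ++ s ++ "}"
```

With this sketch a lone `}` is wrapped in `\mono` with the brace escaped, so it no longer closes the `\type` group early.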
John MacFarlane
8f7c119c0f Removed '--no-citeproc' as alias for '--natbib'.
This was confusing, I think, as no-citeproc could be either
natbib or biblatex.
2011-01-16 11:08:56 -08:00
John MacFarlane
281b36470f Minor code formatting. 2011-01-16 11:08:20 -08:00
John MacFarlane
b6d1f4bc9e Moved --chapters to before --number-sections in option list. 2011-01-16 09:34:26 -08:00
John MacFarlane
ab20da4be5 Support --chapters for ConTeXt output as well. 2011-01-16 09:08:19 -08:00
John MacFarlane
ece098b9e0 Use <chapter> for top docbook header if template has <book>.
Resolves Issue #265.
2011-01-16 08:59:53 -08:00
John MacFarlane
9721b87c26 Added --chapters option affecting docbook and latex.
* Added writerChapters to WriterOptions.
* Added --chapters command-line option.
* --chapters causes top-level headers to be "chapter" instead of
  "section" in LaTeX and DocBook.
* Resolves Issue #225.
2011-01-16 08:58:29 -08:00
John MacFarlane
53eb2c4828 HTML writer: Add ids to <section> tags. 2011-01-15 22:35:25 -08:00
John MacFarlane
a0e19ba8aa Merge branch 'tests' 2011-01-15 09:25:01 -08:00
John MacFarlane
fd79417825 Fixed the parser for rst+lhs - set stateLiterateHaskell. 2011-01-14 22:38:02 -08:00
John MacFarlane
a5cbcdfe3a HTML reader: parse simple tables.
Resolves Issue #106.  Thanks to Rodja Trappe for the idea
and some sample code.
2011-01-14 20:48:10 -08:00
John MacFarlane
c31d3cc306 HTML reader: parse location tags in pSatisfy.
This avoids the need for manual parsing all over the place.
2011-01-14 20:47:32 -08:00
John MacFarlane
9305114b9f LaTeX writer: Escape strings in \href{..}.
Previously strings weren't escaped, so %5D would be interpreted
as a LaTeX comment!
2011-01-14 18:59:50 -08:00
John MacFarlane
5131589be0 Simplified Text.Pandoc.CharacterReferences by using TagSoup entity lookup 2011-01-14 18:28:54 -08:00
John MacFarlane
09e9a86db9 Merge branch 'master' of github.com:jgm/pandoc into tests 2011-01-14 14:46:48 -08:00
John MacFarlane
81403b8d80 LaTeX writer: In nonsimple tables, put cells in \parbox.
Otherwise we can get problems with linebreaks, and cell spacing
isn't right.

Thanks to Jef Allbright for pointing out the problem.
2011-01-14 14:45:04 -08:00
John MacFarlane
ba1d0d3070 Parsing: Fixed bug in grid table parser.
Spaces at end of line were not being stripped properly,
resulting in unintended LineBreaks.
2011-01-14 14:16:27 -08:00
John MacFarlane
5da2d1e66c Merge branch 'master' into tests 2011-01-12 08:13:11 -08:00
John MacFarlane
91510a109f Improvements to --html5 support:
+ <nav> for TOC, <figure> for figures, type attribute in <ol>.
+ Don't add math javascript in html5.
+ Use style attributes instead of deprecated width, align.
+ html template: move <title> after <meta>.
  Note: charset needs to be declared before title.
+ slidy and s5 templates: move <title> after <meta>.
+ html template: Added link to html5 shim for IE.
+ Make --html5 have an effect only for 'html' writer (not s5, slidy, epub).
2011-01-11 23:15:30 -08:00
John MacFarlane
e8ad4ba43c Preliminary support for HTML5.
+ Added writerHtml5 writer option.
+ Added --html5 option.
+ Added support for lang in html tag (so you can do
  'pandoc -s -V lang=en', for example).
+ Updated html template with conditionals for HTML5.
+ When HTML5 selected, use <header> tag around title in document,
  and use <section> tags instead of <div>s if --section-divs
  specified.
2011-01-11 21:18:46 -08:00
John MacFarlane
33ff2fed21 Text.Pandoc: Improved readers, writers lists for lhs variants.
Now the lhs variants set the needed literate Haskell flag in
parser state and writer options.
2011-01-11 20:23:43 -08:00
Nathan Gass
e8fa72c6a7 Moved test-pandoc.hs to tests directory. 2011-01-11 21:49:49 +01:00
Nathan Gass
f3ee73607f Removed run prefix from all test functions. 2011-01-11 21:30:19 +01:00
Nathan Gass
a2153acfff Include lhs tests in existing testGroup structure. 2011-01-11 21:10:36 +01:00
Nathan Gass
e06899ef1f Add reader groups for markdown and rst reader tests. 2011-01-11 20:41:34 +01:00
Nathan Gass
c0700987ba Changed test-pandoc to use test-framework and HUnit. 2011-01-10 00:37:46 +01:00
John MacFarlane
3317e9dea8 pandoc: Test standalone' rather than standalone for final newline. 2011-01-07 18:12:20 -08:00
John MacFarlane
d891b2c29d LaTeX reader: Support simple tables. 2011-01-07 10:15:48 -08:00
John MacFarlane
93c3e27731 pandoc: Add newline to output unless standalone.
This avoids output that does not end with a newline, which
is inconvenient when working with many tools.

Updated tests accordingly.
2011-01-06 21:05:28 -08:00
John MacFarlane
c4c336460b RST writer: blank line after literate Haskell code block. 2011-01-06 21:03:08 -08:00
John MacFarlane
438f32cdfa test-pandoc: Wrap to 78 columns in lhs writer tests. 2011-01-06 16:54:15 -08:00
John MacFarlane
aea93977f5 Markdown writer: blank line after delimited code block. 2011-01-06 16:53:21 -08:00
John MacFarlane
303ce8a9e5 LaTeX reader: allow spaces btw \\begin or \\end and {. 2011-01-06 09:34:24 -08:00
John MacFarlane
81ea1a59b4 LaTeX reader: Removed unnecessary 'spaces'. 2011-01-06 09:24:56 -08:00
John MacFarlane
1be2ca6c78 HTML reader: Fixed bug in htmlTag for comments. 2011-01-06 00:21:19 -08:00
John MacFarlane
b63a7f7c48 LaTeX reader: Apply macros to non-math; handle ensuremath. 2011-01-05 16:55:26 -08:00
John MacFarlane
18e7a7a495 LaTeX reader: Don't handle \label and \ref specially.
Put labels in {} instead of ().
2011-01-05 15:24:20 -08:00
John MacFarlane
1415b6831e LaTeX reader: Support \L \l accents. 2011-01-05 14:57:06 -08:00
John MacFarlane
23aae79b01 Updated for texmath 0.5. 2011-01-05 14:44:26 -08:00
John MacFarlane
eb83f0e5e4 Fixed macro parsing. 2011-01-05 14:42:47 -08:00
John MacFarlane
e126ab9efc LaTeX reader: Parse inside arguments when ignoring commands. 2011-01-05 12:25:47 -08:00
John MacFarlane
c3071ff6e9 LaTeX reader: Don't handle \index separately.
Instead, just put it in list of commands to ignore.
2011-01-05 12:05:04 -08:00
John MacFarlane
b26247a4a8 LaTeX reader: Added "index" to ignorable commands. 2011-01-05 11:56:37 -08:00
John MacFarlane
cf6cd15c27 LaTeX reader: skip space before option or argument. 2011-01-05 11:54:40 -08:00
John MacFarlane
d033fc9d3e LaTeX reader: Skip \index commands. 2011-01-05 10:11:24 -08:00
John MacFarlane
c949530815 LaTeX reader: Removed \group (we want to parse inside {}). 2011-01-05 10:06:51 -08:00
John MacFarlane
3dab6c574c LaTeX reader: Better handling of preamble, inc. parsing macros. 2011-01-05 09:04:03 -08:00
John MacFarlane
85bfd26b78 LaTeX reader: Parse bracketed {parts} as raw TeX. 2011-01-04 22:20:35 -08:00
John MacFarlane
22b2c02aeb Markdown reader: Removed unneeded definitions.
specialChars, strChar, specialCharsMinusLt.
2011-01-04 22:11:56 -08:00
John MacFarlane
dac2e9156f LaTeX reader: parse macros and apply to math. 2011-01-04 19:18:20 -08:00
John MacFarlane
fcbe1e95eb Moved 'macro' and 'applyMacros'' from markdown reader to Parsing. 2011-01-04 19:12:33 -08:00
John MacFarlane
3e61333af0 Fixed regression in markdown reader.
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix:  the 'str' parser now only parses alphanumerics and
embedded underscores.  All other symbols are handled by the
'symbol' parser.  This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] instead of [Str "hi:"].  But there
should not be a visible effect in any of the writers.

Thanks to gwern for pointing out the regression.
2011-01-01 22:46:30 -08:00
John MacFarlane
0411f51433 Updated copyright notices. 2011-01-01 10:26:10 -08:00
John MacFarlane
b05e739c6d LaTeX reader: Allow ignored comments after \end{document}. 2010-12-30 22:05:19 -08:00
John MacFarlane
d6f28af9cb HTML reader: Fixed some parsing bugs. 2010-12-30 19:33:37 -08:00
Puneeth Chaganti
e4dedad1c0 Added support for listings package code blocks and inline code. 2010-12-30 14:37:51 -08:00
John MacFarlane
f49e60a8b8 Textile reader: Slight speed improvement. 2010-12-30 14:33:11 -08:00
John MacFarlane
904050fa36 New HTML reader using tagsoup as a lexer.
* The new reader is faster and more accurate.

* API changes for Text.Pandoc.Readers.HTML:
   - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
     anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
     htmlBlockElement, htmlComment
   - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag

* tagsoup is a new dependency.

* Text.Pandoc.Parsing: Generalized type on readWith.

* Benchmark.hs: Added length calculation to force full evaluation.

* Updated HTML reader tests.

* Updated markdown and textile readers to use the functions from
  the HTML reader.

* Note: The markdown reader now correctly handles some cases it did not
  before. For example:

    <hr/>

  is reproduced without adding a space.

    <script>
      a = '<b>';
    </script>

  is parsed correctly.
2010-12-30 13:55:40 -08:00
John MacFarlane
220fe5fab8 normalize: Don't reduce [Space] to []. 2010-12-26 12:01:33 -08:00
John MacFarlane
c912288eda Improved 'normalize'.
Now normalizeInlines is split into consolidateInlines
and removeEmptyInlines.  We need to remove empties before
consolidating.
2010-12-26 10:24:15 -08:00
John MacFarlane
249aa9e044 Markdown writer: Fixed bug in Image.
URI was getting unescaped twice!
2010-12-26 10:23:20 -08:00
John MacFarlane
82903cfaf3 Improved normalize. 2010-12-25 14:03:43 -08:00
John MacFarlane
10d85f8b0b Use functions from Text.Pandoc.Generic instead of processWith(M). 2010-12-24 13:39:27 -08:00
John MacFarlane
c08ca6fa6d HTML reader: Simplified parsing of <script> sections.
I had previously assumed that we needed to ignore
</script> occurring in a string literal or javascript
comment.  It turns out, though, that browsers aren't
that smart.
2010-12-22 19:20:27 -08:00
John MacFarlane
4bfe140ed1 Made --smart work with HTML reader.
It did not work before, because - and quotes were gobbled
up by the str parser.
2010-12-22 17:05:17 -08:00
John MacFarlane
63bf227e04 RST reader: Added unicode quote characters to specialChars.
(So they can trigger Quoted environments.)
2010-12-22 17:04:56 -08:00
John MacFarlane
bbad129066 RST reader: recouped speed loss due to addition of --smart.
This was achieved by rearranging the parsers in inline.

Benchmarks went from 500ms to 307ms -- not quite back to the
279ms we had in 1.6, before supporting smart punctuation and
footnotes, but close.
2010-12-22 15:10:21 -08:00
John MacFarlane
4ba3afbb4d ODT writer: Don't wrap text in opendocument. 2010-12-22 14:55:59 -08:00
John MacFarlane
dc597a8a68 Removed all dependencies on 'pretty' package. 2010-12-22 11:48:08 -08:00
John MacFarlane
8e9c490b0a Texinfo writer: Updated to use Pretty. 2010-12-22 11:43:43 -08:00
John MacFarlane
f15d479fc2 Shared: Removed unneeded prettyprinting functions:
wrapped, wrapIfNeeded, wrappedTeX, wrapTeXIfNeeded, hang'.
2010-12-22 00:34:36 -08:00
John MacFarlane
21d2d918ac Shared: Removed BlockWrapper, wrappedBlocksToDoc.
These are no longer needed with the new Pretty module.
2010-12-22 00:28:20 -08:00
John MacFarlane
369502bbb4 Pretty: Added quote, doubleQuote. 2010-12-22 00:22:28 -08:00
John MacFarlane
fd07db16e9 Man writer: updated to use Pretty. 2010-12-22 00:22:13 -08:00
John MacFarlane
c904024944 OpenDocument writer: Updated to use Pretty. 2010-12-21 16:59:17 -08:00
John MacFarlane
e2548a1317 XML: don't use breaking spaces in attribute lists. 2010-12-21 16:46:21 -08:00
John MacFarlane
ebdbb06f94 Docbook writer: Updated to use Pretty. 2010-12-21 16:45:43 -08:00
John MacFarlane
ce533ffd90 Pretty: don't print a breaking space before a newline. 2010-12-21 16:45:13 -08:00
John MacFarlane
fe1152985c Shared: Made splitBy take a test instead of an element. 2010-12-21 08:41:24 -08:00
John MacFarlane
4e446358d1 XML: Replaced escapeStringAsXML with a faster version.
Benchmarked with criterion, it's about 8x faster than
the old version.  This speeds up docbook, opendocument,
and html writers.
2010-12-21 08:23:48 -08:00
John MacFarlane
78cea94f45 Markdown writer: use \ for newline instead of two spaces at eol.
(Unless --strict.)
2010-12-20 19:36:40 -08:00
John MacFarlane
8889ae8b5b Markdown writer: Use delimited code block if there are attributes.
(Unless in strict mode.)
2010-12-20 19:36:40 -08:00
John MacFarlane
0086329c36 Plain writer: set stateStrictMarkdown automatically. 2010-12-20 19:36:40 -08:00
John MacFarlane
2587543457 ConTeXt writer: Updated to use Text.Pandoc.Pretty. 2010-12-20 19:36:35 -08:00
John MacFarlane
112717de4e Renamed 'enclosed' to 'inside'.
This avoids conflict with 'enclosed' in Text.Pandoc.Parsing.
2010-12-20 19:09:01 -08:00
John MacFarlane
2fe271d163 Pretty: Fixed parens. 2010-12-19 17:20:18 -08:00
John MacFarlane
9120514998 Pretty: Added enclosed, parens. 2010-12-19 12:39:49 -08:00
John MacFarlane
59cc27c10b LaTeX writer: A bit of code polish. 2010-12-19 10:21:16 -08:00
John MacFarlane
99a58e51f5 LaTeX writer: Modified to use Pretty.
Improved footnote formatting, removed spurious blank lines.
2010-12-19 10:14:12 -08:00
John MacFarlane
09aec9f3e3 Shared: Use stringify to simplify inlineListToIdentifier. 2010-12-19 10:13:36 -08:00
John MacFarlane
6aa5010617 Pretty: Added braces and brackets. 2010-12-19 10:13:11 -08:00
John MacFarlane
89bf312765 LaTeX writer: Use \paragraph, \subparagraph for level 4,5 headers. 2010-12-18 15:05:21 -08:00
John MacFarlane
543aa28c38 Added new prettyprinting module.
* Added Text.Pandoc.Pretty.
  This is better suited for pandoc than the 'pretty' package.
  One advantage is that we now get proper wrapping; Emph [Inline]
  is no longer treated as a big unwrappable unit. Previously
  we only got breaks for spaces at the "outer level." We can also
  more easily avoid doubled blank lines.  Performance is
  significantly better as well.

* Removed Text.Pandoc.Blocks.
  Text.Pandoc.Pretty allows you to define blocks and concatenate
  them.

* Modified markdown, RST, org writers to use Text.Pandoc.Pretty
  instead of Text.PrettyPrint.HughesPJ.

* Text.Pandoc.Shared:  Added writerColumns to WriterOptions.

* Markdown, RST, Org writers now break text at writerColumns.

* Added --columns command-line option, which sets stColumns
  and writerColumns.

* Table parsing:  If the size of the header > stColumns,
  use the header size as 100% for purposes of calculating
  relative widths of columns.
2010-12-17 13:39:17 -08:00
John MacFarlane
2a075e9d7a test-pandoc: removed need to depend on MissingH. 2010-12-15 18:07:36 -08:00
John MacFarlane
605648cbbf Added 'tests' Cabal flag.
+ This ensures that test-pandoc gets built.
+ 'cabal test' now runs this.
+ The old tests/RunTests.hs has been removed, and
  src/test-pandoc.hs added.
2010-12-15 17:54:51 -08:00
John MacFarlane
63cf37a9ca HTML reader: allow : in tags.
Resolves Issue #274.
2010-12-15 14:15:53 -08:00
Nathan Gass
a312d2a8ae Use top-level header at end as bibliography title for natbib and biblatex output. 2010-12-15 10:21:56 -08:00
Nathan Gass
8f60176511 Remove punctuation at start of suffix for natbib and biblatex output.
This is necessary as the latex citation commands include their own
punctuation, which resulted in doubled commas for markdown documents
where citeproc output works correctly.
2010-12-15 10:21:53 -08:00
Nathan Gass
43fee5e7f7 Support multiple bibliography files with natbib and biblatex output. 2010-12-15 10:21:47 -08:00
John MacFarlane
63d5e0c5f9 Added 'normalize' to Text.Pandoc.Shared. 2010-12-14 20:04:37 -08:00
John MacFarlane
3ac6f72f98 Fixed preamble parsing in LaTeX reader. 2010-12-14 19:34:28 -08:00
John MacFarlane
128cf46089 Fixed regression in parsing _emph_
There was a bug in parsing '_emph_, ...':  when followed by
a comma, underscore emphasis did not register.  (Thanks to
gwern for pointing this out.)

This bug was introduced by the change in
c66921f2ac
2010-12-14 18:23:26 -08:00
Nathan Gass
2e728df756 Moved special handling of punctuation in suffix out of markdown reader.
This allows different writers to handle punctuation in the suffix
differently.
2010-12-13 20:50:29 -08:00
Nathan Gass
c2d3796439 Added support for latex cite commands in latex reader. 2010-12-13 20:48:19 -08:00
Nathan Gass
c81495a07a Added option to write citation markup in markdown writer. 2010-12-13 20:42:58 -08:00
Nathan Gass
48600fd547 Added support to write natbib or biblatex citations in latex output. 2010-12-13 20:41:37 -08:00
John MacFarlane
1a4a0d0283 Markdown reader: Further fix to abbrevs. 2010-12-13 20:05:50 -08:00
John MacFarlane
7b4d3c77ec Markdown reader: Fixed abbrev handler to allow abbrev at end of line.
E.g., Mr.
Frank.
2010-12-13 20:04:11 -08:00
John MacFarlane
3822d6c440 Markdown reader: Fixed referenceKey parser to allow space after newline. 2010-12-13 20:03:59 -08:00
John MacFarlane
dfbb4d3994 Fixed inlineListToIdentifier to treat '\160' as ' '. 2010-12-13 20:03:52 -08:00
John MacFarlane
71e0557e61 Markdown reader: Fixed regression in reference key parser.
* The recent change allowing spaces and newlines in the URL
  caused problems when reference keys are stacked up without
  blank lines between. This is now fixed.
* Added test.
2010-12-13 20:03:12 -08:00
John MacFarlane
3748dfeb91 Markdown reader: fix superscripts with links.
Moved inlineNote parser after superscript parser,
so ^[link](/foo)^ gets recognized as a superscripted
link, not an inline note followed by garbage.

Thanks to Conal Elliott for pointing out the problem.
2010-12-12 20:30:55 -08:00
John MacFarlane
250aa20250 Recognize .json extension as json reader/writer. 2010-12-12 20:30:26 -08:00
John MacFarlane
c6b79d794e Removed deprecated -C/--custom-header option.
Use --template instead.
2010-12-11 00:22:34 -08:00
John MacFarlane
f5c2082304 Added JSON reader and writer.
The JSON reader is about 20x faster than the native reader.
So this can be a good way to serialize a pandoc document.
2010-12-11 00:06:03 -08:00
John MacFarlane
2dfb45950e LaTeX reader: Improved parsing of preamble.
Previously you'd get unexpected behavior on a document that
contained '\begin{document}' in, say, a verbatim block.
2010-12-10 23:21:24 -08:00
John MacFarlane
9602f73f2a Moved 'readers' and 'writers' to Text.Pandoc.
This allows library users to avoid repetitive case statements...
2010-12-10 17:30:32 -08:00
John MacFarlane
de6452c0d1 Markdown reader: small cosmetic code improvements. 2010-12-10 16:26:35 -08:00
John MacFarlane
5770ceca36 Removed HTML sanitization.
This is better done on the resulting HTML; use the xss-sanitize library
for this.  xss-sanitize is based on pandoc's sanitization, but improves
it.

- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
2010-12-10 12:26:03 -08:00
John MacFarlane
17d48cf4af Markdown reader: Allow linebreaks in URLs (treat as spaces).
Also, a string of consecutive spaces or tabs is now parsed
as a single space. If you have multiple spaces in your URL,
use %20%20.
2010-12-10 12:14:51 -08:00
John MacFarlane
ee0a0953de Markdown reader: Rewrote para parser for better efficiency.
This change avoids repeated parsing of inline lists for 'plain'
blocks.
2010-12-10 10:47:46 -08:00
John MacFarlane
167eeef6cb Added json format for reading and writing.
This is faster to parse than native.
2010-12-09 10:40:31 -08:00
paul.rivier
bb609a85e3 textile redcloth definition lists 2010-12-09 09:25:46 -08:00
John MacFarlane
88a40685b8 Textile reader: better treatment of acronyms.
We now parse PBS(Public Broadcasting System) as if it were
"PBS (Public Broadcasting System)".
2010-12-09 08:52:09 -08:00
John MacFarlane
9ead748cc9 RST reader: Added footnote support.
Resolves issue #258.

Note that there are some differences in how docutils and
pandoc treat footnotes.  Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
2010-12-08 08:39:50 -08:00
John MacFarlane
91978d2201 Markdown reader: minor footnote changes.
Don't skipNonindentSpaces in noteMarker, since it's also
used in the inline note parser.
2010-12-08 08:17:16 -08:00
John MacFarlane
f02080b62d Textile reader: Implemented footnotes. 2010-12-08 00:44:46 -08:00
John MacFarlane
200ea33641 Made --smart work with RST reader. 2010-12-07 21:49:10 -08:00
John MacFarlane
5e35eb309f Make --smart work in HTML reader. 2010-12-07 21:24:35 -08:00
John MacFarlane
33ba35da9f Smart punctuation: recognize entities.
Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
2010-12-07 20:44:43 -08:00
John MacFarlane
3a5fceeef9 Rewrote normalizeSpaces (mostly aesthetic reasons). 2010-12-07 20:10:21 -08:00
John MacFarlane
e20052a1ba Markdown reader: Moved smartPunctuation parser, for slight speed bump. 2010-12-07 20:09:40 -08:00
John MacFarlane
ace3b80f1e Smart punctuation: don't allow ellipses containing spaces.
Previously we allowed '. . .', ' . . . ', etc.  This caused
too many complications, and removed author's flexibility in
combining ellipses with spaces and periods.
2010-12-07 20:08:14 -08:00
John MacFarlane
50ca61ef49 Moved smartPunctuation from Markdown to Parsing.
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
2010-12-07 19:03:08 -08:00
John MacFarlane
f917b46500 Textile reader: implemented acronyms, (tm), (r), (c). 2010-12-07 18:28:36 -08:00
John MacFarlane
c66921f2ac Markdown reader: better handling of intraword _.
The 'str' parser now reads internal _'s as part of the string.
This prevents pandoc from getting started looking for an emphasized
block, which can cause exponential slowdowns in some cases.

Resolves Issue #182.
2010-12-06 22:12:18 -08:00
John MacFarlane
7864f30717 Markdown reader: handle curly quotes better.
Previously, curly quotes were just parsed literally, leading
to problems in some output formats.  Now they are parsed as
Quoted inlines, if --smart is specified.

Resolves Issue #270.
2010-12-06 20:36:58 -08:00
John MacFarlane
5a4609584c Fix regression: markdown references should be case-insensitive.
This broke when we added the Key type.  We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.

* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
  only way the API provides to construct a Key.

Resolves Issue #272.
2010-12-05 19:27:00 -08:00
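The smart-constructor approach in 5a4609584c can be sketched as below. This is a simplification under stated assumptions: here `Key` wraps a `String`, whereas the commit describes converting between Keys and Inline lists, and in a real module the `Key` constructor would be left out of the export list.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Eq and Ord are derived; case-insensitivity comes from toKey, not from the instances.
newtype Key = Key String deriving (Eq, Ord, Show)

-- The only intended way to build a Key: normalize case on the way in.
toKey :: String -> Key
toKey = Key . map toLower

-- Recover the normalized contents.
fromKey :: Key -> String
fromKey (Key s) = s

-- A reference table keyed by normalized labels (the entry is illustrative).
refs :: M.Map Key String
refs = M.fromList [(toKey "Foo", "/url")]
```

Because every key passes through `toKey`, a lookup such as `M.lookup (toKey "FOO") refs` finds the entry stored under "Foo" while the derived instances stay lawful.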
John MacFarlane
d52a01a926 Org writer: Minor changes to documentation header. 2010-12-05 09:48:54 -08:00
Puneeth Chaganti
4d48abcb12 Added tests.
+ Added tables.org and writer.org to tests.
    + Added org.template to templates.
    + Changed RunTests.hs as required.
    + Minor changes to Org writer.
2010-12-04 23:49:53 +05:30
Puneeth Chaganti
921e2b6e67 Added Org-mode writer
+ Added Text/Pandoc/Writers/Org.hs
    + Added to pandoc.cabal
    + Added to pandoc.hs and Text/Pandoc.hs exports.
2010-12-04 15:57:39 +05:30
John MacFarlane
357b965b44 Merge branch 'citeproc' into master.
Conflicts:
	src/Text/Pandoc/Definition.hs
2010-12-03 23:43:47 -08:00
John MacFarlane
bea62bcab8 Textile reader: temporarily removed smartPunctuation.
The smartPunctuation parser from the markdown parser
was being used, but this creates two problems:

* smart punctuation rules are slightly different in textile,
  for example, a single dash with space around becomes an
  En dash.
* the following gets parsed as a double quoted string followed
  by a colon, rather than as a link:

  "emphasized text":http://my.url.com

This needs rethinking.
2010-12-03 23:10:52 -08:00
John MacFarlane
d4e512776d Textile reader: added hrule parser. 2010-12-03 23:10:52 -08:00
John MacFarlane
4bf9d362d2 Textile reader: Turn on smart punctuation by default. 2010-12-03 23:10:52 -08:00
John MacFarlane
0356ad4de6 Textile reader: drop leading, trailing newline in pre block.
This is consistent with how the other readers work.
2010-12-03 23:10:52 -08:00
John MacFarlane
36d4aa4a09 Textile reader: modified str to handle acronyms, hyphens.
* A single hyphen between two word characters is no longer a
  potential strikeout-starter.
* Acronym explanations are dropped.
2010-12-03 23:10:52 -08:00
John MacFarlane
55e43c4991 Use textile reader by default for .textile extension. 2010-12-03 23:10:52 -08:00
John MacFarlane
f415e9e119 Textile reader: parse raw by default.
It's part of the textile spec to allow raw HTML,
just as with markdown.
-R is no longer needed in test suite.
2010-12-03 23:10:52 -08:00
paul.rivier
c3866f3c66 punctuation handling, and more html-specific handling 2010-12-03 23:10:52 -08:00
Paul Rivier
d724c6b568 html inlines and html blocks handling in textile reader 2010-12-03 23:10:51 -08:00
Paul Rivier
fa0866886b textile reader now ignores html/css attributes 2010-12-03 23:10:51 -08:00
Paul Rivier
e6dde36622 removed support for textile Inserted construct 2010-12-03 23:10:51 -08:00
Paul Rivier
593b4f6c94 fix autolink by promoting it in the parser list, fix table parabreak 2010-12-03 23:10:51 -08:00
Paul Rivier
a7da0672dc more support for Textile reader (explicit links, images), tests and cabal entries 2010-12-03 23:10:51 -08:00
paul.rivier
cfc70863a3 simpler table cell handling 2010-12-03 23:10:51 -08:00
paul.rivier
d917db5e42 preliminary material toward table support 2010-12-03 23:10:51 -08:00
paul.rivier
75fa22c300 textile reader now imports Text.Pandoc.Parsing 2010-12-03 23:10:50 -08:00
paul.rivier
d532c72c5b Basic Textile Reader 2010-12-03 23:10:50 -08:00
John MacFarlane
e578b7f3d3 Added --data-dir to valid options for markdown2pdf. 2010-12-02 22:42:13 -08:00
John MacFarlane
fe39a06e24 Tweaked command-line options allowed by markdown2pdf. 2010-12-02 22:38:26 -08:00
John MacFarlane
4c21c5566d Merge branch 'master' into citeproc 2010-11-28 20:21:07 -08:00
John MacFarlane
3ffd724617 Markdown parser performance improvement.
Do a quick lookahead to make sure what follows looks like a setext
header before parsing any Inlines.  This gives a 15% performance
boost in one benchmark.  Many thanks to knieriem for finding
the problem (in peg-markdown):

https://github.com/jgm/peg-markdown/issues/issue/3
2010-11-28 20:19:32 -08:00
John MacFarlane
b10e82c9fa Fixed spacing bug for reference-style citations. 2010-11-28 07:55:33 -08:00
John MacFarlane
f64983f879 Merge branch 'master' into citeproc 2010-11-27 14:58:23 -08:00
John MacFarlane
e9cfbd5adc OpenDocument writer: don't print raw TeX. 2010-11-27 14:57:48 -08:00
John MacFarlane
f15965e205 Merge branch 'master' into citeproc 2010-11-27 11:54:26 -08:00
John MacFarlane
970f63c18a LaTeX writer: Escape curly quotes. 2010-11-27 11:53:30 -08:00
John MacFarlane
eac4abe36f Biblio: If locator ends with ",", add it to the suffix. 2010-11-27 11:28:45 -08:00
John MacFarlane
219853b05e Added procOpts parameter to citeproc call. 2010-11-27 11:28:11 -08:00
John MacFarlane
54397a9e99 Merge branch 'master' into citeproc 2010-11-27 10:58:05 -08:00
John MacFarlane
c989bf028f Merge branch 'textile'
Conflicts:
	README
	man/man1/pandoc.1.md
	pandoc.cabal
2010-11-27 10:52:44 -08:00
John MacFarlane
71c9316a59 Use [] for superscripts and subscripts in textile writer. 2010-11-27 10:44:58 -08:00
John MacFarlane
cae3f8edba Fixed spacing problems in textile nested lists. 2010-11-27 10:44:35 -08:00
John MacFarlane
283f1e60cc Use parsec parsers to split locator.
This is easier to read and maintain.
Also, formatting is now stripped from the locator prefix,
so you can write e.g. '*p.* 33'.
2010-11-27 07:08:32 -08:00
John MacFarlane
044a9a6157 Added 'stringify' to Text.Pandoc.Shared. 2010-11-27 07:08:06 -08:00
John MacFarlane
0ca84f0d38 Markdown suffix parser fix.
If suffix doesn't begin with punctuation, include opening
comma and space in result.

Previously,

@item [only a suffix]

would result in something like

Doe (2002only a suffix)

because there was no opening delimiter.
2010-11-26 22:34:53 -08:00
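The suffix rule from 0ca84f0d38 might look roughly like this on plain strings. `fixSuffix` is a hypothetical name, and the real parser works on pandoc's citation types rather than `String`.

```haskell
import Data.Char (isPunctuation)

-- If the suffix does not already start with punctuation,
-- prepend a comma and a space so it is delimited from what precedes it.
fixSuffix :: String -> String
fixSuffix "" = ""
fixSuffix s@(c:_)
  | isPunctuation c = s
  | otherwise       = ", " ++ s
```

Under this sketch the suffix "only a suffix" becomes ", only a suffix", which avoids the run-together rendering quoted in the message.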
John MacFarlane
0871a512d7 Split locator and suffix in Biblio rather than Markdown parser.
Patch from Nathan Gass.
2010-11-26 12:06:56 -08:00
John MacFarlane
0955a0e329 More flexible handling of --csl.
Look for csl files in ~/.csl if not found locally.
Add .csl extension if it is not provided.
2010-11-23 21:40:05 -08:00
John MacFarlane
1b1287e888 Removed citeproc flag and CPP conditionals. 2010-11-23 21:14:31 -08:00
John MacFarlane
b48fa0ea59 Check biblio for all citations, not just textual. 2010-11-22 23:09:30 -08:00
John MacFarlane
7ef7d85b3f HTML reader: Export htmlTag. 2010-11-20 22:10:16 -08:00
John MacFarlane
05f5766abe Biblio: Check for == rather than /=.
This is more perspicuous.
2010-11-20 22:00:17 -08:00
John MacFarlane
3eef887dfa Citation related changes.
* Don't look for bibliography in ~/.pandoc.  Reason:  doing
  this requires a read + parse of the bibliography even when
  the document doesn't use citations.  This is a big performance
  drag on regular pandoc invocations.
* Only look for default.csl if the document contains references.
  Reason:  avoids the need to read and parse csl file when the
  document contains no references anyway.
* Removed findFirstFile from Shared.
2010-11-20 08:11:30 -08:00
John MacFarlane
46121aa2e1 Use default biblio.{xml,json,bib} in pandoc data dir if none specified. 2010-11-19 22:14:02 -08:00
John MacFarlane
9cb0581de6 Shared: Added findFirstFile, findDataFile, refactored readDataFile. 2010-11-19 22:13:30 -08:00
John MacFarlane
6390103509 Markdown citation parser: small refactoring for clarity. 2010-11-18 14:16:18 -08:00
John MacFarlane
bbb60a2586 If --csl not specified, read from data files or default.
Thus --csl behaves like --reference-odt, --template, etc.
2010-11-18 14:15:26 -08:00
John MacFarlane
f3bb3c1ff1 Markdown citation parser improvements and test updates.
Now we handle a suffix after a bare locator, e.g.
@item1 [p. 30, suffix]
The suffix now includes any punctuation that introduces it.
A few tests fail because of problems with citeproc (extra space
before the suffix, missing space after comma separating multiple
page ranges in the locator).
2010-11-18 13:22:20 -08:00
John MacFarlane
aaf7de0dda Markdown reader: Revised parser for new citation syntax.
Suffixes and prefixes are now [Inline].  The locator is separated
from the citation key by a blank space.  The locator consists of
one introductory word and any number of words containing at
least one digit.  The suffix, if any, is separated from the locator
by a comma, and continues until the end of the citation.
2010-11-18 12:38:45 -08:00
John MacFarlane
dbe0cefc9a Biblio: Removed stringify; pass inline list to citeproc. 2010-11-17 15:36:17 -08:00
John MacFarlane
47c64d4fc4 Don't pass a [Str ""] as citationPrefix. 2010-11-17 15:35:53 -08:00
John MacFarlane
ce9fc2a37d Updated for changes in Citation type.
citationPrefix now [Inline] rather than String;
citationSuffix added.

This change presupposes no changes in citeproc-hs.
It passes a string for these values to citeproc-hs.
Eventually, citeproc-hs should use an [Inline] for
these as well.
2010-11-16 20:31:22 -08:00
John MacFarlane
55e991614d Removed unneeded format argument in call to readBiblioFile. 2010-11-16 07:16:38 -08:00
John MacFarlane
d73a531d89 Biblio: don't add footnote if empty. 2010-11-16 07:15:30 -08:00
John MacFarlane
b158ae21a2 Improve handling of bibliography not found error. 2010-11-13 08:50:04 -08:00
John MacFarlane
7aecddd0f7 Replaced --biblio-file with --bibliography, removed --biblio-format.
Bibliography format is guessed from the file extension of the
bibliography.

Also, the bibliography entries are now read during option parsing.
2010-11-13 08:42:09 -08:00
John MacFarlane
1fa2973da6 Repairs to citation parser + citation test suite. 2010-11-12 19:30:59 -08:00
John MacFarlane
e88daeba11 Merge branch 'master' into citeproc 2010-11-12 18:57:37 -08:00
John MacFarlane
c2636e61d7 Treat argument as URI only if it has http(s) scheme.
Previously pandoc would treat the c: in some Windows filespecs
as a URI scheme and try to download... Thanks to Peter Wang for
pointing this out.
2010-11-12 18:30:50 -08:00
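The check described in the commit above can be sketched as a simple prefix test. This is an illustration only: `isHttpUri` is a hypothetical name, and matching the scheme case-insensitively is an assumption rather than something the commit message states.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf)

-- Hypothetical helper (not part of pandoc): treat an argument as a URI only
-- if it starts with an http or https scheme, so that a Windows path such as
-- c:\docs\a.txt is read as a local file rather than downloaded.
isHttpUri :: String -> Bool
isHttpUri arg = any (`isPrefixOf` map toLower arg) ["http://", "https://"]
```

Under this test a drive-letter prefix like `c:` never matches, because only the two named schemes are accepted.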
John MacFarlane
79bab2d210 Revised citation parsers for markdown reader.
Added a form for in-text citations:

@doe99 [30; see also @smith99].
2010-11-12 00:37:44 -08:00
John MacFarlane
5c6dc5767d Biblio: Use a Map for the lookup table. 2010-11-11 22:35:04 -08:00
John MacFarlane
1bfd8110af Merge branch 'master' into citeproc 2010-11-11 21:31:15 -08:00
John MacFarlane
36d4e649a6 Added support for textual citations (but not yet markdown syntax).
Patch from Andrea Rossato.
2010-11-11 21:30:34 -08:00
John MacFarlane
ca51bbbf16 HTML reader: don't parse raw HTML inside <code> tag.
Previously '<code><a>x</a></code>' would be parsed as
Code "<a>x</a>", which is not what you want.
2010-11-11 20:02:37 -08:00
John MacFarlane
83e6c01e4d Merge branch 'master' into citeproc 2010-11-09 22:52:36 -08:00
John MacFarlane
21556e37f4 Allow HTML comments as inline elements in markdown.
So,
aaa <!-- comment --> bbb
can be a single paragraph.
2010-11-09 22:51:02 -08:00
John MacFarlane
23c6f56bc5 Removed CITEPROC CPP conditionals from library code.
By Cabal policy, the API should not change depending on flags.
2010-11-06 14:58:54 -07:00
John MacFarlane
f7f6b2427d Changes to use citeproc-hs 0.3. 2010-11-06 14:43:23 -07:00
John MacFarlane
db03741847 Removed Text.Pandoc.Definition, bump version to 1.7.
We now get Text.Pandoc.Definition from the new pandoc-types package.
This will make it possible for other programs to supply output
in Pandoc format, without depending on the whole pandoc package.
2010-11-05 17:06:47 -07:00
John MacFarlane
5871c4d51f Biblio: small fix to detection of punctuation (A. Rossato). 2010-11-04 09:11:15 -07:00
John MacFarlane
5e1dc6adda Biblio: Improve footnote generation.
Patch from Andrea Rossato.
2010-11-03 12:58:29 -07:00
John MacFarlane
075840231b Improve footnote generation of in-text citations w/ note styles.
Patch from Andrea Rossato.
2010-11-02 21:10:33 -07:00
John MacFarlane
bd24e83c81 --mathjax: Use mathjax with raw latex rather than mathml.
It seems to work better, and the default config can be used.
2010-10-31 18:55:35 -07:00
John MacFarlane
ac06ca2b00 Changes to use citeproc 0.3.
Patch from Andrea Rossato.
Note: the markdown syntax is preliminary and will probably change.
2010-10-27 18:25:59 -07:00
John MacFarlane
9cf27c92c1 Added support for MathJax for displaying math in HTML.
Added --mathjax option.
Added MathJax to HTMLMathMethod.
Supported MathJax in HTML writer.

Resolves Issue #259.
2010-10-26 21:07:51 -07:00
John MacFarlane
f870777c36 Parse blanklines after macro definitions. 2010-10-26 19:52:12 -07:00
John MacFarlane
6b722d1b45 Process LaTeX macros in markdown, and apply to TeX math.
Example:
\newcommand{\plus}[2]{#1 + #2}

$\plus{3}{4}$

yields:

3+4
2010-10-26 09:03:03 -07:00
John MacFarlane
7e9e959548 LaTeX & ConTeXt writers: escape [ and ] as {[} and {]}.
This avoids unwanted interpretation as optional arguments
in some contexts, which caused the brackets to silently
disappear!
2010-10-24 19:38:16 -07:00
John MacFarlane
220a20bf92 Changed --help message for --variable to KEY:VALUE.
Was previously FILENAME.
2010-10-24 19:17:03 -07:00
John MacFarlane
4d08bc38a9 TeXMath: handle variables modified with \acute, \bar, etc.
Complete list: \acute, \grave, \breve, \check, \dot,
\mathring, \vec, \overrightarrow, \overleftarrow, \hat,
\tilde, \bar.
2010-10-19 15:03:30 -07:00
John MacFarlane
11672c4987 TeXMath reader: handle \textit, \textbf, etc. 2010-10-19 13:22:50 -07:00
John MacFarlane
ca5217881d Encode filenames as UTF8.
Resolves Issue #252 (pandoc doesn't properly handle unicode filenames).
2010-09-10 19:53:45 -07:00
John MacFarlane
6ccdde5571 gladTeX HTML - specify ENV for display or inline.
Thanks to Jonathan Daugherty for the patch.

The gladTeX program gives finer control over the LaTeX environment
used to render its input.  The latest version (1.1) uses the
"displaymath" environment by default, which is nice for large,
block-level equations, but it isn't so nice for inline math (where
"math" is more appropriate).  This patch causes the HTML writer to
differentiate between the two by explicitly setting the LaTeX
environment on the generated EQ tag.
2010-08-01 08:30:04 -07:00
John MacFarlane
8fe468463e --offline implies --standalone. 2010-07-24 00:49:10 -07:00
John MacFarlane
01a191709e Moved Text.Pandoc.Writers.S5 -> Text.Pandoc.S5.
Now it doesn't export a writer, just some CSS and JS.
2010-07-22 23:37:06 -07:00
John MacFarlane
851d39f8f8 Improved cutUp function, removed extra </div> 2010-07-22 23:23:47 -07:00
John MacFarlane
a11b530935 Moved s5 writing from S5 module to HTML.
Now s5 is handled in more or less the same way as slidy,
as a variant of HTML.
2010-07-22 22:58:48 -07:00
John MacFarlane
da52412455 Extended --offline to s5.
S5 default is now to include links, rather than a full copy
of scripts and stylesheets.
2010-07-22 22:23:43 -07:00
John MacFarlane
c5ed016616 Added new --offline option for slidy.
Added slidy/slidy.min.{css,js}.
2010-07-22 21:50:53 -07:00
John MacFarlane
5fd1389263 Slidy writer: Avoid spurious blank page. 2010-07-22 17:28:15 -07:00
John MacFarlane
a3051b8acb Export HTMLSlideVariant in Text.Pandoc. 2010-07-22 17:12:57 -07:00
John MacFarlane
2253c8ef65 Require texmath >= 0.3, adjusted for new elements. 2010-07-22 15:06:46 -07:00
John MacFarlane
4c88ecaeca Changed to using strict bytestrings in UTF8 module.
This avoids a problem on Windows reading from stdin.
Previously we'd get an error from hGetBufNonBlocking.
2010-07-21 15:14:20 -07:00