pandoc

Author	SHA1	Message	Date
John MacFarlane	1b8a9711b8	Replaced more noneOf/oneOf parsers.	2011-01-19 15:14:23 -08:00
John MacFarlane	a400cfe10f	Replaced uses of oneOf with more efficient parsers. This speeds up the markdown reader.	2011-01-19 15:06:56 -08:00
John MacFarlane	22b2c02aeb	Markdown reader: Removed unneeded definitions: specialChars, strChar, specialCharsMinusLt.	2011-01-04 22:11:56 -08:00
John MacFarlane	fcbe1e95eb	Moved 'macro' and 'applyMacros'' from markdown reader to Parsing.	2011-01-04 19:12:33 -08:00
John MacFarlane	3e61333af0	Fixed regression in markdown reader. '(_hi_)' was being parsed with literal underscores (no emphasis). The fix: the 'str' parser now only parses alphanumerics and embedded underscores. All other symbols are handled by the 'symbol' parser. This has a slight effect on the AST, since you'll get [Str "hi",Str ":"] instead of [Str "hi:"]. But there should not be a visible effect in any of the writers. Thanks to gwern for pointing out the regression.	2011-01-01 22:46:30 -08:00
John MacFarlane	904050fa36	New HTML reader using tagsoup as a lexer. * The new reader is faster and more accurate. * API changes for Text.Pandoc.Readers.HTML: - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag, anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType, htmlBlockElement, htmlComment - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag * tagsoup is a new dependency. * Text.Pandoc.Parsing: Generalized type on readWith. * Benchmark.hs: Added length calculation to force full evaluation. * Updated HTML reader tests. * Updated markdown and textile readers to use the functions from the HTML reader. * Note: The markdown reader now correctly handles some cases it did not before. For example: <hr/> is reproduced without adding a space. <script> a = '<b>'; </script> is parsed correctly.	2010-12-30 13:55:40 -08:00
John MacFarlane	10d85f8b0b	Use functions from Text.Pandoc.Generic instead of processWith(M).	2010-12-24 13:39:27 -08:00
John MacFarlane	128cf46089	Fixed regression in parsing _emph_. There was a bug in parsing '_emph_, ...': when followed by a comma, underscore emphasis did not register. (Thanks to gwern for pointing this out.) This bug was introduced by the change in `c66921f2ac`.	2010-12-14 18:23:26 -08:00
Nathan Gass	2e728df756	Moved special handling of punctuation in suffix out of markdown reader. This allows different writers to handle punctuation in the suffix differently.	2010-12-13 20:50:29 -08:00
John MacFarlane	1a4a0d0283	Markdown reader: Further fix to abbrevs.	2010-12-13 20:05:50 -08:00
John MacFarlane	7b4d3c77ec	Markdown reader: Fixed abbrev handler to allow abbrev at end of line. E.g., Mr. Frank.	2010-12-13 20:04:11 -08:00
John MacFarlane	3822d6c440	Markdown reader: Fixed referenceKey parser to allow space after newline.	2010-12-13 20:03:59 -08:00
John MacFarlane	71e0557e61	Markdown reader: Fixed regression in reference key parser. * The recent change allowing spaces and newlines in the URL caused problems when reference keys are stacked up without blank lines between. This is now fixed. * Added test.	2010-12-13 20:03:12 -08:00
John MacFarlane	3748dfeb91	Markdown reader: fix superscripts with links. Moved inlineNote parser after superscript parser, so ^[link](/foo)^ gets recognized as a superscripted link, not an inline note followed by garbage. Thanks to Conal Elliott for pointing out the problem.	2010-12-12 20:30:55 -08:00
John MacFarlane	de6452c0d1	Markdown reader: small cosmetic code improvements.	2010-12-10 16:26:35 -08:00
John MacFarlane	5770ceca36	Removed HTML sanitization. This is better done on the resulting HTML; use the xss-sanitize library for this. xss-sanitize is based on pandoc's sanitization, but improves it. - Removed stateSanitize from ParserState. - Removed --sanitize-html option.	2010-12-10 12:26:03 -08:00
John MacFarlane	17d48cf4af	Markdown reader: Allow linebreaks in URLs (treat as spaces). Also, a string of consecutive spaces or tabs is now parsed as a single space. If you have multiple spaces in your URL, use %20%20.	2010-12-10 12:14:51 -08:00
John MacFarlane	ee0a0953de	Markdown reader: Rewrote para parser for better efficiency. This change avoids repeated parsing of inline lists for 'plain' blocks.	2010-12-10 10:47:46 -08:00
John MacFarlane	91978d2201	Markdown reader: minor footnote changes. Don't skipNonindentSpaces in noteMarker, since it's also used in the inline note parser.	2010-12-08 08:17:16 -08:00
John MacFarlane	33ba35da9f	Smart punctuation: recognize entities. Now “Hi”, with the curly quotes written as HTML entities, gets parsed as a Quoted DoubleQuote inline.	2010-12-07 20:44:43 -08:00
John MacFarlane	e20052a1ba	Markdown reader: Moved smartPunctuation parser, for slight speed bump.	2010-12-07 20:09:40 -08:00
John MacFarlane	50ca61ef49	Moved smartPunctuation from Markdown to Parsing. + Parameterized smartPunctuation on an inline parser. + Handle smartPunctuation in Textile reader.	2010-12-07 19:03:08 -08:00
John MacFarlane	c66921f2ac	Markdown reader: better handling of intraword _. The 'str' parser now reads internal _'s as part of the string. This prevents pandoc from getting started looking for an emphasized block, which can cause exponential slowdowns in some cases. Resolves Issue #182.	2010-12-06 22:12:18 -08:00
John MacFarlane	7864f30717	Markdown reader: handle curly quotes better. Previously, curly quotes were just parsed literally, leading to problems in some output formats. Now they are parsed as Quoted inlines, if --smart is specified. Resolves Issue #270.	2010-12-06 20:36:58 -08:00
John MacFarlane	5a4609584c	Fix regression: markdown references should be case-insensitive. This broke when we added the Key type. We had assumed that the custom case-insensitive Ord instance would ensure case-insensitive matching, but that is not how Data.Map works. * Added a test case for case-insensitivity in markdown-reader-more * Removed old refsMatch from Text.Pandoc.Parsing module; * hid the 'Key' constructor; * dropped the custom Ord and Eq instances, deriving instead; * added fromKey and toKey to convert between Keys and Inline lists; * toKey ensures that keys are case-insensitive, since this is the only way the API provides to construct a Key. Resolves Issue #272.	2010-12-05 19:27:00 -08:00
John MacFarlane	357b965b44	Merge branch 'citeproc' into master. Conflicts: src/Text/Pandoc/Definition.hs	2010-12-03 23:43:47 -08:00
paul.rivier	c3866f3c66	punctuation handling, and more html-specific handling	2010-12-03 23:10:52 -08:00
John MacFarlane	4c21c5566d	Merge branch 'master' into citeproc	2010-11-28 20:21:07 -08:00
John MacFarlane	3ffd724617	Markdown parser performance improvement. Do a quick lookahead to make sure what follows looks like a setext header before parsing any Inlines. This gives a 15% performance boost in one benchmark. Many thanks to knieriem for finding the problem (in peg-markdown): https://github.com/jgm/peg-markdown/issues/issue/3	2010-11-28 20:19:32 -08:00
John MacFarlane	0ca84f0d38	Markdown suffix parser fix. If suffix doesn't begin with punctuation, include opening comma and space in result. Previously, @item [only a suffix] would result in something like Doe (2002only a suffix) because there was no opening delimiter.	2010-11-26 22:34:53 -08:00
John MacFarlane	0871a512d7	Split locator and suffix in Biblio rather than Markdown parser. Patch from Nathan Gass.	2010-11-26 12:06:56 -08:00
John MacFarlane	b48fa0ea59	Check biblio for all citations, not just textual.	2010-11-22 23:09:30 -08:00
John MacFarlane	6390103509	Markdown citation parser: small refactoring for clarity.	2010-11-18 14:16:18 -08:00
John MacFarlane	f3bb3c1ff1	Markdown citation parser improvements and test updates. Now we handle a suffix after a bare locator, e.g. @item1 [p. 30, suffix] The suffix now includes any punctuation that introduces it. A few tests fail because of problems with citeproc (extra space before the suffix, missing space after comma separating multiple page ranges in the locator).	2010-11-18 13:22:20 -08:00
John MacFarlane	aaf7de0dda	Markdown reader: Revised parser for new citation syntax. Suffixes and prefixes are now [Inline]. The locator is separated from the citation key by a blank space. The locator consists of one introductory word and any number of words containing at least one digit. The suffix, if any, is separated from the locator by a comma, and continues until the end of the citation.	2010-11-18 12:38:45 -08:00
John MacFarlane	47c64d4fc4	Don't pass a [Str ""] as citationPrefix.	2010-11-17 15:35:53 -08:00
John MacFarlane	ce9fc2a37d	Updated for changes in Citation type. citationPrefix now [Inline] rather than String; citationSuffix added. This change presupposes no changes in citeproc-hs. It passes a string for these values to citeproc-hs. Eventually, citeproc-hs should use an [Inline] for these as well.	2010-11-16 20:31:22 -08:00
John MacFarlane	1fa2973da6	Repairs to citation parser + citation test suite.	2010-11-12 19:30:59 -08:00
John MacFarlane	79bab2d210	Revised citation parsers for markdown reader. Added a form for in-text citations: @doe99 [30; see also @smith99].	2010-11-12 00:37:44 -08:00
John MacFarlane	36d4e649a6	Added support for textual citations (but not yet markdown syntax). Patch from Andrea Rossato.	2010-11-11 21:30:34 -08:00
John MacFarlane	83e6c01e4d	Merge branch 'master' into citeproc	2010-11-09 22:52:36 -08:00
John MacFarlane	21556e37f4	Allow HTML comments as inline elements in markdown. So, aaa <!-- comment --> bbb can be a single paragraph.	2010-11-09 22:51:02 -08:00
John MacFarlane	23c6f56bc5	Removed CITEPROC CPP conditionals from library code. By Cabal policy, the API should not change depending on flags.	2010-11-06 14:58:54 -07:00
John MacFarlane	f7f6b2427d	Changes to use citeproc-hs 0.3.	2010-11-06 14:43:23 -07:00
John MacFarlane	ac06ca2b00	Changes to use citeproc 0.3. Patch from Andrea Rossato. Note: the markdown syntax is preliminary and will probably change.	2010-10-27 18:25:59 -07:00
John MacFarlane	f870777c36	Parse blanklines after macro definitions.	2010-10-26 19:52:12 -07:00
John MacFarlane	6b722d1b45	Process LaTeX macros in markdown, and apply to TeX math. Example: \newcommand{\plus}[2]{#1 + #2} $\plus{3}{4}$ yields: 3+4	2010-10-26 09:03:03 -07:00
John MacFarlane	afe18e53f1	Modified example refs so they can occur before or after target. The refs are now replaced by numbers at the final stage, using processWith.	2010-07-12 23:05:46 -07:00
John MacFarlane	0181e66250	Merge branch 'atlists'. Added auto-numbered example lists.	2010-07-11 22:47:52 -07:00
John MacFarlane	73b4cc0897	Minor comment change.	2010-07-06 21:23:25 -07:00
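Commit 3e61333af0 narrows the `str` parser to alphanumerics plus embedded underscores and leaves every other character to `symbol`. The Haskell sketch below applies that splitting rule to a plain `String`, with no Parsec. It is not pandoc's parser. `Inline` is a hypothetical cut-down stand-in that has only a `Str` constructor, and `tokenize`/`word` are illustrative names. A run of underscores counts as embedded only when another alphanumeric follows it. In `(_hi_)` the underscores therefore stay separate tokens, where an emphasis parser can still find them.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical, cut-down stand-in for pandoc's Inline type.
data Inline = Str String
  deriving (Show, Eq)

-- Split input into word-like Str tokens (the "str" case) and
-- one-character Str tokens for everything else (the "symbol" case).
tokenize :: String -> [Inline]
tokenize [] = []
tokenize s@(c:cs)
  | isAlphaNum c = let (w, rest) = word s in Str w : tokenize rest
  | otherwise    = Str [c] : tokenize cs

-- Consume alphanumerics, keeping a run of underscores only when another
-- alphanumeric follows it, i.e. when the underscores are embedded.
word :: String -> (String, String)
word s =
  let (an, rest) = span isAlphaNum s
  in case span (== '_') rest of
       (us@(_:_), r@(c:_)) | isAlphaNum c ->
         let (w, rest') = word r in (an ++ us ++ w, rest')
       _ -> (an, rest)

main :: IO ()
main = do
  print (tokenize "hi:")         -- [Str "hi",Str ":"]
  print (tokenize "snake_case")  -- [Str "snake_case"]
  print (tokenize "(_hi_)")      -- [Str "(",Str "_",Str "hi",Str "_",Str ")"]
```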
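Commit 5a4609584c drops the custom `Ord` and `Eq` instances on `Key` in favor of derived ones. It hides the constructor, so that `toKey` is the only way to build a key and case normalization happens there. The following is a minimal sketch of that smart-constructor pattern. It assumes a simplified `Key` that wraps a `String` instead of an `[Inline]` list and normalizes with `toLower`. The `lookupRef` helper and the sample table are hypothetical. Every `Key` is stored already lowercased, so derived comparisons, and therefore `Data.Map` lookups, match any capitalization of a label.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Simplified Key: wraps a String instead of an [Inline] list.
-- A real module would export toKey/fromKey but hide the constructor,
-- so every Key is built through toKey.
newtype Key = Key String
  deriving (Show, Eq, Ord)  -- derived, not custom, instances

-- Normalize case once, at construction, so derived Eq/Ord (and hence
-- Data.Map lookups) treat labels case-insensitively.
toKey :: String -> Key
toKey = Key . map toLower

-- Returns the normalized (lowercased) label.
fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table: link label -> URL.
sampleRefs :: M.Map Key String
sampleRefs = M.fromList [(toKey "Pandoc", "/pandoc")]

lookupRef :: String -> M.Map Key String -> Maybe String
lookupRef label = M.lookup (toKey label)

main :: IO ()
main = do
  print (lookupRef "PANDOC" sampleRefs)    -- Just "/pandoc"
  print (lookupRef "pandoc" sampleRefs)    -- Just "/pandoc"
  print (map fromKey (M.keys sampleRefs))  -- ["pandoc"]
```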
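Commit 6b722d1b45 gives the example `\newcommand{\plus}[2]{#1 + #2}` with `$\plus{3}{4}$` yielding 3 + 4. The sketch below illustrates only the `#n` argument substitution behind that example. The commit does not show how pandoc actually expands macros, so this is not its implementation. The `Macro` record and the `substArgs`, `braceArgs`, and `applyMacro` helpers are hypothetical. The sketch does not handle nested braces inside arguments, and it does not check that the control word ends after the macro name.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical representation of \newcommand{\name}[arity]{body}.
data Macro = Macro
  { macroName  :: String  -- name without the backslash, e.g. "plus"
  , macroArity :: Int     -- number of arguments, e.g. 2
  , macroBody  :: String  -- replacement text containing #1 .. #9
  }

-- Replace each #n in the body with the n-th argument (1-based).
substArgs :: [String] -> String -> String
substArgs args ('#':d:rest)
  | isDigit d, let i = digitToInt d, i >= 1, i <= length args
  = args !! (i - 1) ++ substArgs args rest
substArgs args (c:rest) = c : substArgs args rest
substArgs _ [] = []

-- Read n brace-delimited arguments: "{3}{4}xyz" -> (["3","4"], "xyz").
-- Nested braces inside an argument are not supported.
braceArgs :: Int -> String -> Maybe ([String], String)
braceArgs 0 s = Just ([], s)
braceArgs n ('{':s) =
  case break (== '}') s of
    (arg, '}':rest) -> do
      (args, rest') <- braceArgs (n - 1) rest
      Just (arg : args, rest')
    _ -> Nothing
braceArgs _ _ = Nothing

-- Expand every call of the macro in a TeX math string.
applyMacro :: Macro -> String -> String
applyMacro m = go
  where
    call = '\\' : macroName m
    go [] = []
    go s@(c:cs)
      | take (length call) s == call
      , Just (args, rest) <- braceArgs (macroArity m) (drop (length call) s)
      = substArgs args (macroBody m) ++ go rest
      | otherwise = c : go cs

plusMacro :: Macro
plusMacro = Macro "plus" 2 "#1 + #2"

main :: IO ()
main = putStrLn (applyMacro plusMacro "\\plus{3}{4}")  -- prints 3 + 4
```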


209 commits