Now these are stored as a '"' character, not as '&quot;'.
The function escapeLinkTitle in the Markdown writer is
unnecessary and was removed. Tests modified accordingly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
above 128 in HTML and Docbook output, we now just use unicode. After all,
we're declaring UTF-8 content in the header. This makes the HTML and
docbook files produced by pandoc much more readable and editable.
Changes to Entities.hs:
+ Removed specialCharToEntity
+ Added escapeSGMLChar (which just escapes the basic four, <>&")
+ Modified encodeEntities and stringToSGML to use escapeSGMLChar
+ Removed encodeEntitiesNumerical
+ Rewrote encodeEntities for better performance
+ Rewrote stringToSGML for better performance
git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
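As a rough illustration of the "basic four" escaping described in the entry above, here is a minimal sketch. It is not the actual Entities.hs source: the name escapeSGMLChar comes from the log, but its type and the helper escapeSGMLString are assumptions made for this example, and existing entity references in the input are not treated specially.

```haskell
-- Illustrative sketch only; not taken from pandoc's Entities.hs.

-- | Escape one of the four basic SGML characters (<, >, &, ").
-- Every other character, including those above 128, passes through
-- unchanged, since the output is declared as UTF-8.
escapeSGMLChar :: Char -> String
escapeSGMLChar '<' = "&lt;"
escapeSGMLChar '>' = "&gt;"
escapeSGMLChar '&' = "&amp;"
escapeSGMLChar '"' = "&quot;"
escapeSGMLChar c   = [c]

-- | Hypothetical helper: escape a whole string character by character.
escapeSGMLString :: String -> String
escapeSGMLString = concatMap escapeSGMLChar
```

With this sketch, a character such as 'é' is emitted as is rather than as a named or numerical entity.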
+ Entities are parsed (and unicode characters returned) in both
Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
parser from Entities.hs.
+ Added new 'entity' parser to Markdown reader, and added '&' as a
special character. Adjusted test suite accordingly since now we
get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
+ stringToSGML moved to Entities.hs. escapeSGML removed as redundant,
given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
boolean parameter that causes only numerical entities to be used.
This is used in the docbook writer. The HTML writer uses named
entities where possible, but not all docbook-consumers know about
the named entities without special instructions, so it seems safer
to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
detecting "mailto" links.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
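The idea of decoding entities without Text.Regex might be sketched as follows. This is an assumption-laden illustration, not the code from this commit: the log says the real parsers (characterEntity, namedEntity, decimalEntity, hexEntity) are written with Parsec, whereas this sketch uses only base functions, a tiny hypothetical entity table, and the hypothetical helpers safeChr and decodeEntityName.

```haskell
import Data.Char (chr, isDigit)
import Numeric (readHex)

-- Tiny illustrative table; a real one would list every named entity.
namedEntities :: [(String, Char)]
namedEntities =
  [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("copy", '\169')]

-- | Convert a code point to a character if it is in the Unicode range.
safeChr :: Integer -> Maybe Char
safeChr n
  | n >= 0 && n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise               = Nothing

-- | Decode the text between '&' and ';' (hex, decimal, or named form).
decodeEntityName :: String -> Maybe Char
decodeEntityName ('#' : x : hex)
  | x `elem` "xX", [(n, "")] <- readHex hex = safeChr n
decodeEntityName ('#' : dec)
  | not (null dec), all isDigit dec = safeChr (read dec)
decodeEntityName name = lookup name namedEntities

-- | Replace each well-formed entity with its character; leave any
-- other text, including a bare '&', untouched.
decodeEntities :: String -> String
decodeEntities [] = []
decodeEntities ('&' : rest) =
  case break (== ';') rest of
    (name, ';' : after)
      | Just c <- decodeEntityName name -> c : decodeEntities after
    _ -> '&' : decodeEntities rest
decodeEntities (c : rest) = c : decodeEntities rest
```

Unknown or malformed entities are kept verbatim here; whether the real readers do the same is not stated in the log.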
which does not support unicode:
- escapePreservingRegex removed
- stringToSGML rewritten using Parsec parser
- new parsers for SGML character entities
- escapeSGML rewritten using specialCharToEntity
- new function specialCharToEntity
git-svn-id: https://pandoc.googlecode.com/svn/trunk@514 788f1e2b-df1e-0410-8736-df70ead52e1b
Replaced email regex test with a custom email autolink parser
(autoLinkEmail). Also replaced 'selfClosingTag' with a
custom function 'isSelfClosingTag'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@511 788f1e2b-df1e-0410-8736-df70ead52e1b
required before an attribute. Previously, <a.b>
would be parsed as an HTML tag with an attribute!
git-svn-id: https://pandoc.googlecode.com/svn/trunk@509 788f1e2b-df1e-0410-8736-df70ead52e1b
So, instead of [site.com](site.com) we get <site.com>.
Changed test suite accordingly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@508 788f1e2b-df1e-0410-8736-df70ead52e1b
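A minimal sketch of the autolink output described in the entry above. It assumes that the autolink form is chosen when the link text is identical to the URL; the function renderLink and that condition are hypothetical and are not taken from the Markdown writer itself.

```haskell
-- | Hypothetical helper: render a Markdown link, using the <url>
-- autolink form when the label and the URL are the same string.
renderLink :: String -> String -> String
renderLink label url
  | label == url = "<" ++ url ++ ">"
  | otherwise    = "[" ++ label ++ "](" ++ url ++ ")"
```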
+ LaTeX writer now handles consecutive quotes properly:
for example, ``\,`hello'\,''
+ LaTeX reader now parses '\,' as empty Str
+ normalizeSpaces function in Shared now removes empty Str elements
+ Modified tests accordingly
git-svn-id: https://pandoc.googlecode.com/svn/trunk@506 788f1e2b-df1e-0410-8736-df70ead52e1b
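The removal of empty Str elements mentioned above could look roughly like this. The Inline type below is a cut-down, hypothetical stand-in for pandoc's own type, and removeEmptyStrs shows only this one step, not the whole of normalizeSpaces.

```haskell
-- Hypothetical, reduced stand-in for pandoc's Inline type.
data Inline = Str String | Space
  deriving (Eq, Show)

-- | Drop Str elements that carry no text, such as the empty Str the
-- LaTeX reader produces for '\,'.
removeEmptyStrs :: [Inline] -> [Inline]
removeEmptyStrs = filter (/= Str "")
```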
list function that can be used to substitute one substring
for another in a string, like 'gsub' except without regular
expressions.
+ Use 'substitute' instead of 'gsub' in the LaTeX writer. This
avoids what appears to be a bug in Text.Regex, whereby "\\^"
matches "\350". There seems to be a slight speed improvement
as well. (Note: If this works, it would be good to replace
other uses of gsub that don't employ regexes with 'substitute'.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@500 788f1e2b-df1e-0410-8736-df70ead52e1b
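A regex-free substring replacement of the kind described above can be sketched as follows. The name 'substitute' comes from the log; the argument order, the type signature, and the handling of an empty target are assumptions made for this example.

```haskell
import Data.List (isPrefixOf)

-- | Replace every occurrence of 'target' in a list with 'replacement'.
-- Works on any list of comparable elements, so it covers strings too.
-- An empty target is left alone so that the function always terminates.
substitute :: Eq a => [a] -> [a] -> [a] -> [a]
substitute _ _ [] = []
substitute [] _ xs = xs
substitute target replacement xs@(y : ys)
  | target `isPrefixOf` xs =
      replacement ++ substitute target replacement (drop (length target) xs)
  | otherwise = y : substitute target replacement ys
```

Because matching is plain list comparison, characters that are special in regular expressions, such as '^' or '\', need no extra escaping in the target.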
start if followed by 's' and then a non-alphanumeric. (Yes,
this is English-centric, I'm afraid. But it does help, and I
can't think of a language in which 's' by itself is a word.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@498 788f1e2b-df1e-0410-8736-df70ead52e1b
DocBook, and HTML writers. The syntax is documented in
README. Tests have been added to the test suite.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@493 788f1e2b-df1e-0410-8736-df70ead52e1b
one gets an error creating the output file in the /tmp directory.
I haven't tracked this one down, but this should serve as a
workaround.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@481 788f1e2b-df1e-0410-8736-df70ead52e1b
work. This only affects the test target on systems without
GNU diff (rare), so I'm not too worried about it.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@480 788f1e2b-df1e-0410-8736-df70ead52e1b
an error condition when it gives warnings, so instead we grep
for warnings or error messages to see if we need to print the
log.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@476 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Exit if pandoc fails (second time through) -- no need to store the log for this.
+ Run pdflatex up to three times, if needed to resolve references. Also
run bibtex as needed.
+ Minor reformatting.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@469 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Descriptions on examples.
+ New "features" page highlighting Pandoc's features.
+ Small other improvements.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@466 788f1e2b-df1e-0410-8736-df70ead52e1b
characters Markdown escapes are escaped in strict mode.
When not in strict mode, Pandoc allows all non-alphanumeric
characters to be escaped.
+ Added documentation of backslash escapes to README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@461 788f1e2b-df1e-0410-8736-df70ead52e1b
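The two escaping rules above might be expressed as a single predicate. This is a sketch under assumptions: the strict-mode set below is the list of backslash-escapable characters from the original Markdown syntax description, and the names strictEscapable and isEscapable are hypothetical rather than taken from the Markdown reader.

```haskell
import Data.Char (isAlphaNum)

-- Characters the original Markdown syntax description lists as
-- backslash-escapable (assumed here to be the strict-mode set).
strictEscapable :: String
strictEscapable = "\\`*_{}[]()#+-.!"

-- | Can this character be backslash-escaped? The first argument says
-- whether strict mode is on. Outside strict mode, any non-alphanumeric
-- character qualifies, following the wording of the log entry.
isEscapable :: Bool -> Char -> Bool
isEscapable strict c
  | strict    = c `elem` strictEscapable
  | otherwise = not (isAlphaNum c)
```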