treat ' followed by ll, re, ve, then a
non-letter, as a contraction. (e.g.
I've, you're, he'll)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@559 788f1e2b-df1e-0410-8736-df70ead52e1b
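A minimal sketch of this check. It assumes the parser has just seen an apostrophe and can inspect the rest of the input as a plain String. The name isContraction is hypothetical, not pandoc's. Treating end of input as a "non-letter" is also an assumption.

```haskell
import Data.Char (isLetter)
import Data.List (isPrefixOf)

-- Hypothetical helper (not pandoc's code). The argument is the input
-- that follows an apostrophe; the result says whether the apostrophe
-- belongs to a contraction ending in 'll, 're, or 've.
isContraction :: String -> Bool
isContraction rest = any endsContraction ["ll", "re", "ve"]
  where
    endsContraction suffix =
      suffix `isPrefixOf` rest
        && case drop (length suffix) rest of
             []      -> True              -- assumption: end of input counts as a non-letter
             (c : _) -> not (isLetter c)  -- "'ve been" matches, "'really" does not
```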
in cleaner, faster code, and it makes it easier to use Pandoc in
other projects, like wikis, that use Text.XHtml. Two functions
are now provided, writeHtml and writeHtmlString: the former outputs
an Html structure, the latter a rendered string. The S5 writer is
also changed, in parallel ways (writeS5, writeS5String). The Html
header is now written programmatically, so it has been removed from
the 'headers' directory. The S5 header is still needed, but the
doctype and some of the meta declarations have been removed, since
they are written programmatically. The INSTALL file and cabalize
have been updated to reflect the new dependency on the xhtml package.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@549 788f1e2b-df1e-0410-8736-df70ead52e1b
xml, html, and tex demo pages (in website /examples/).
Add links so that html files can be viewed as web pages
(without syntax highlighting).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@543 788f1e2b-df1e-0410-8736-df70ead52e1b
'.text' -- the only reason for this is that I use '.txt' for
page source files, and generally exclude them from being uploaded
to the website.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@542 788f1e2b-df1e-0410-8736-df70ead52e1b
(160) as "&nbsp;", since otherwise it is hard to distinguish
from a regular space. (Addresses Issue #3.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@541 788f1e2b-df1e-0410-8736-df70ead52e1b
printing a unicode non-breaking space, which is
hard to distinguish visually from a regular space.
(Resolves issue #3.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@540 788f1e2b-df1e-0410-8736-df70ead52e1b
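The idea behind these changes can be sketched as follows. Code point 160 is emitted as a named entity rather than as a raw character, so it stays visible in the generated source. The helper renderNbsp is hypothetical. Escaping of the other special characters is assumed to happen elsewhere.

```haskell
-- Hypothetical helper (not pandoc's writer code): write the
-- non-breaking space (code point 160) as the named entity &nbsp;
-- and pass every other character through unchanged.
renderNbsp :: String -> String
renderNbsp = concatMap render
  where
    render '\160' = "&nbsp;"
    render c      = [c]
```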
reference keys in Markdown parser, instead of parsing
normally, then using setInput to reset input. Slight
performance improvement.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@536 788f1e2b-df1e-0410-8736-df70ead52e1b
of entity by character, in Entities.hs. This yields a
small performance improvement.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@534 788f1e2b-df1e-0410-8736-df70ead52e1b
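The first line of this message is cut off, so the exact data structure is not stated. The sketch below assumes a Data.Map keyed by character, built once and then queried in O(log n) time instead of scanning a list. The table excerpt and the name lookupEntity are hypothetical.

```haskell
import qualified Data.Map as M

-- Illustrative excerpt of an entity table: (entity text, character).
entityTable :: [(String, Char)]
entityTable =
  [ ("&amp;", '&'), ("&lt;", '<'), ("&gt;", '>')
  , ("&quot;", '"'), ("&nbsp;", '\160'), ("&copy;", '\169') ]

-- Reverse index built once. If several names shared a character,
-- M.fromList would keep the last one listed.
entityByChar :: M.Map Char String
entityByChar = M.fromList [ (c, name) | (name, c) <- entityTable ]

-- Hypothetical lookup function: entity text for a character, if any.
lookupEntity :: Char -> Maybe String
lookupEntity c = M.lookup c entityByChar
```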
RST reader. The problem was that ``#`` was seen by
'inline' as a potential link or image. Fix: insert
'notFollowedBy (char '`')' in link parsers.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@529 788f1e2b-df1e-0410-8736-df70ead52e1b
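A simplified Parsec sketch of how the notFollowedBy guard keeps a double backtick away from the link parser. The grammar is pared down, with reference links as `text`_ and literals as ``text``, and the parser names are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified RST inline literal: ``text``
inlineLiteral :: Parser String
inlineLiteral = do
  _ <- try (string "``")
  manyTill anyChar (try (string "``"))

-- Simplified RST reference link: `text`_
-- The guard rejects a second backtick immediately, so anything that
-- opens with `` is left for inlineLiteral.
referenceLink :: Parser String
referenceLink = try $ do
  _ <- char '`'
  notFollowedBy (char '`')
  manyTill anyChar (try (string "`_"))

inline :: Parser String
inline = referenceLink <|> inlineLiteral
```

With the guard, parse inline "" "``#``" gives Right "#". referenceLink fails at the second backtick without consuming input, and inlineLiteral takes over.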
especially in "option" contexts.
+ Removed the "try" from the "end" parser in "enclosed"
(Text.Pandoc.Shared). Now "enclosed" behaves like
"option", "manyTill", etc.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@527 788f1e2b-df1e-0410-8736-df70ead52e1b
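A sketch of enclosed after this change, assuming a many1Till helper (Parsec itself provides only manyTill). Because end is no longer wrapped in try, a multi-character delimiter has to be passed in already wrapped, just as with manyTill. The exact shape of the original function is an assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more p, terminated by end (local helper, not part of Parsec).
many1Till :: Parser a -> Parser e -> Parser [a]
many1Till p end = do
  x  <- p
  xs <- manyTill p end
  return (x : xs)

-- Sketch of 'enclosed': the opening delimiter may not be followed by
-- whitespace, and the end parser is used as given, without a try.
enclosed :: Parser open -> Parser close -> Parser a -> Parser [a]
enclosed start end p = try $ do
  _ <- start
  notFollowedBy space
  many1Till p end
```

For example, enclosed (try (string "**")) (try (string "**")) anyChar parses "**a*b**" as "a*b". The lone * inside fails the wrapped end parser without consuming input, so it is taken as content.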
used a different parser in the RST reader. This fixes
a bug where ````` would not be correctly parsed as
a verbatim `.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@526 788f1e2b-df1e-0410-8736-df70ead52e1b
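The message does not name the replacement parser. One plausible reading, sketched here as an assumption, is a "one or more" variant of manyTill. Requiring at least one content character lets ````` parse as a literal containing a single backtick.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Inline literal that requires at least one character of content.
-- With plain manyTill, the closing `` would match at once on "`````",
-- giving empty content and leaving a stray backtick in the input.
inlineLiteral :: Parser String
inlineLiteral = do
  _ <- try (string "``")
  (:) <$> anyChar <*> manyTill anyChar (try (string "``"))
```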
It is no longer needed now that all entities are processed in the markdown
and HTML readers. All calls to stringToSGML have been replaced by calls
to encodeEntities.
+ Since inTag's attribute handling already encodes entities,
calls to encodeEntities are no longer needed for attribute values, so
they've been removed.
+ The HTML and Markdown readers now call decodeEntities on all raw
strings (e.g. authors, dates, link titles), to ensure that no unprocessed
entities are included in the native representation of the document.
(In the HTML reader, most of this work is done by a change in
extractAttributeName.)
+ The result is a small speed improvement (around 5% on my benchmark)
and cleaner code.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@519 788f1e2b-df1e-0410-8736-df70ead52e1b
Str inline in Docbook and HTML writers, since now these
strings should not contain literal entity references.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@518 788f1e2b-df1e-0410-8736-df70ead52e1b
Now these are stored as a '"' character, not as '&quot;'.
The function escapeLinkTitle in the Markdown writer is
unnecessary and was removed. Tests modified accordingly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
above 128 in HTML and Docbook output, we now just use unicode. After all,
we're declaring UTF-8 content in the header. This makes the HTML and
docbook files produced by pandoc much more readable and editable.
Changes to Entities.hs:
+ Removed specialCharToEntity
+ Added escapeSGMLChar (which just escapes the basic four, <>&")
+ Modified encodeEntities and stringToSGML to use escapeSGMLChar
+ Removed encodeEntitiesNumerical
+ Rewrote encodeEntities for better performance
+ Rewrote stringToSGML for better performance
git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
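A sketch of the escaping side after this change. Only the four listed characters become entities, and everything else, including non-ASCII text, passes through as Unicode. The function names follow the message, but the signatures and bodies are assumptions.

```haskell
-- Escape only the four SGML-significant characters.
escapeSGMLChar :: Char -> String
escapeSGMLChar c = case c of
  '<' -> "&lt;"
  '>' -> "&gt;"
  '&' -> "&amp;"
  '"' -> "&quot;"
  _   -> [c]   -- non-ASCII is emitted as-is (output is declared UTF-8)

encodeEntities :: String -> String
encodeEntities = concatMap escapeSGMLChar
```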
+ Entities are parsed (and unicode characters returned) in both
Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
parser from Entities.hs.
+ Added new 'entity' parser to Markdown reader, and added '&' as a
special character. Adjusted test suite accordingly since now we
get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
+ stringToSGML moved to Entities.hs. escapeSGML removed as redundant,
given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
boolean parameter that causes only numerical entities to be used.
This is used in the docbook writer. The HTML writer uses named
entities where possible, but not all docbook-consumers know about
the named entities without special instructions, so it seems safer
to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
detecting "mailto" links.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
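A simplified Parsec sketch of entity parsers in the spirit described above, with a regex-free decodeEntities built on them. The entity table is a tiny illustrative excerpt. The signatures and the helper codePoint are assumptions, not pandoc's code.

```haskell
import Data.Char (chr, digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Illustrative excerpt only; a full table covers every HTML entity name.
namedEntities :: [(String, Char)]
namedEntities =
  [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("nbsp", '\160')]

-- Read digits in the given base and reject values beyond U+10FFFF.
codePoint :: Integer -> String -> Parser Char
codePoint base ds
  | n <= 0x10FFFF = return (chr (fromInteger n))
  | otherwise     = fail "code point out of range"
  where n = foldl (\acc d -> acc * base + toInteger (digitToInt d)) 0 ds

namedEntity :: Parser Char            -- &name;
namedEntity = try $ do
  _ <- char '&'
  name <- many1 alphaNum
  _ <- char ';'
  maybe (fail ("unknown entity: " ++ name)) return (lookup name namedEntities)

decimalEntity :: Parser Char          -- &#60;
decimalEntity = try $ do
  _ <- string "&#"
  ds <- many1 digit
  _ <- char ';'
  codePoint 10 ds

hexEntity :: Parser Char              -- &#x3E;
hexEntity = try $ do
  _ <- string "&#"
  _ <- oneOf "xX"
  ds <- many1 hexDigit
  _ <- char ';'
  codePoint 16 ds

characterEntity :: Parser Char
characterEntity = namedEntity <|> hexEntity <|> decimalEntity

-- Replace each recognizable entity with its character; anything that
-- does not parse as one (e.g. "&bogus;") is kept literally.
decodeEntities :: String -> String
decodeEntities s = either (const s) id (parse (many piece) "" s)
  where piece = characterEntity <|> anyChar
```

Here decodeEntities "AT&amp;T &#60;x&#x3E; &bogus;" returns "AT&T <x> &bogus;". The unknown entity survives as text because every entity parser backtracks on failure.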
which does not support unicode:
- escapePreservingRegex removed
- stringToSGML rewritten using Parsec parser
- new parsers for SGML character entities
- escapeSGML rewritten using specialCharToEntity
- new function specialCharToEntity
git-svn-id: https://pandoc.googlecode.com/svn/trunk@514 788f1e2b-df1e-0410-8736-df70ead52e1b
Replaced email regex test with a custom email autolink parser
(autoLinkEmail). Also replaced 'selfClosingTag' with a
custom function 'isSelfClosingTag'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@511 788f1e2b-df1e-0410-8736-df70ead52e1b
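A minimal sketch of a regex-free email autolink parser of this kind, written with Parsec. The accepted character sets are a simplifying assumption, since real address syntax (RFC 5322) is far more permissive. This is not pandoc's autoLinkEmail.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sketch of an autolink like <user@example.com>; returns the address.
autoLinkEmail :: Parser String
autoLinkEmail = try $ do
  _ <- char '<'
  user <- many1 (alphaNum <|> oneOf "._%+-")
  _ <- char '@'
  domain <- sepBy1 (many1 (alphaNum <|> char '-')) (char '.')
  _ <- char '>'
  return (user ++ "@" ++ intercalate "." domain)
```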
required before an attribute. Previously, <a.b>
would be parsed as an HTML tag with an attribute!
git-svn-id: https://pandoc.googlecode.com/svn/trunk@509 788f1e2b-df1e-0410-8736-df70ead52e1b
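A simplified sketch of the rule, assuming a Parsec tag parser in which each attribute must be preceded by at least one whitespace character. The permissive attribute-name character set is an assumption. It was chosen so that, without the whitespace rule, "<a.b>" really would read as tag a with an attribute named .b.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified attribute: name="value" or a bare name.
attribute :: Parser (String, String)
attribute = do
  name  <- many1 (noneOf " \t\r\n\"'=<>/")
  value <- option "" (char '=' *> quoted)
  return (name, value)
  where
    quoted = between (char '"') (char '"') (many (noneOf "\""))

-- Simplified opening tag: whitespace is mandatory before each attribute.
htmlTag :: Parser (String, [(String, String)])
htmlTag = try $ do
  _     <- char '<'
  name  <- many1 alphaNum
  attrs <- many (try (skipMany1 space *> attribute))
  spaces
  _     <- char '>'
  return (name, attrs)
```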
So, instead of [site.com](site.com) we get <site.com>.
Changed test suite accordingly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@508 788f1e2b-df1e-0410-8736-df70ead52e1b
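A sketch of the writer-side rule. The helper is hypothetical and takes the link text as a plain string, ignoring escaping and link titles.

```haskell
-- Hypothetical Markdown-writer helper: use the compact autolink form
-- when the link text is identical to the URL.
markdownLink :: String -> String -> String
markdownLink txt url
  | txt == url = "<" ++ url ++ ">"
  | otherwise  = "[" ++ txt ++ "](" ++ url ++ ")"
```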
+ LaTeX writer now handles consecutive quotes properly:
for example, ``\,`hello'\,''
+ LaTeX reader now parses '\,' as empty Str
+ normalizeSpaces function in Shared now removes empty Str elements
+ Modified tests accordingly
git-svn-id: https://pandoc.googlecode.com/svn/trunk@506 788f1e2b-df1e-0410-8736-df70ead52e1b
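A sketch of the normalizeSpaces change. It uses a two-constructor stand-in for pandoc's much larger Inline type. The sketch assumes the function's existing job is trimming leading and trailing spaces and collapsing repeated ones; dropping empty Str elements is the new step. Doing that step first lets the spaces on either side of an empty Str collapse into one.

```haskell
-- Stand-in for pandoc's Inline type (assumption: only two constructors).
data Inline = Str String | Space deriving (Eq, Show)

normalizeSpaces :: [Inline] -> [Inline]
normalizeSpaces =
  dropTrailing . dropWhile (== Space) . collapse . filter (/= Str "")
  where
    collapse (Space : Space : rest) = collapse (Space : rest)
    collapse (x : rest)             = x : collapse rest
    collapse []                     = []
    dropTrailing = reverse . dropWhile (== Space) . reverse
```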
list function that can be used to substitute one substring
for another in a string, like 'gsub' except without regular
expressions.
+ Use 'substitute' instead of 'gsub' in the LaTeX writer. This
avoids what appears to be a bug in Text.Regex, whereby "\\^"
matches "\350". There seems to be a slight speed improvement
as well. (Note: If this works, it would be good to replace
other uses of gsub that don't employ regexes with 'substitute'.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@500 788f1e2b-df1e-0410-8736-df70ead52e1b
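A minimal sketch of a substitute function with the behavior described. It assumes the argument order target, replacement, input.

```haskell
import Data.List (isPrefixOf)

-- Replace every non-overlapping occurrence of target with replacement,
-- scanning left to right; no regular expressions are involved.
substitute :: Eq a => [a] -> [a] -> [a] -> [a]
substitute [] _ xs = xs   -- an empty target would never advance, so leave input unchanged
substitute target replacement xs = go xs
  where
    go [] = []
    go s@(c : cs)
      | target `isPrefixOf` s = replacement ++ go (drop (length target) s)
      | otherwise             = c : go cs
```

For example, substitute "^" "\\^{}" "a^b" evaluates to the string a\^{}b.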
start if followed by 's' and then a non-alphanumeric. (Yes,
this is English-centric, I'm afraid. But it does help, and I
can't think of a language in which 's' by itself is a word.)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@498 788f1e2b-df1e-0410-8736-df70ead52e1b
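A sketch of this heuristic. It assumes the check runs on the input right after a candidate opening apostrophe. The name isApostropheS is hypothetical, and treating end of input as non-alphanumeric is an assumption.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical helper (not pandoc's code). True means the text after
-- the apostrophe is "s" followed by a non-alphanumeric character, so
-- the apostrophe should not open a single-quoted span.
isApostropheS :: String -> Bool
isApostropheS ('s' : c : _) = not (isAlphaNum c)
isApostropheS "s"           = True   -- assumption: end of input counts as non-alphanumeric
isApostropheS _             = False
```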