Commit graph

8 commits

Author SHA1 Message Date
fiddlosopher
6c3d96a776 Changed 'encodeEntities' to 'escapeSGMLString'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@520 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-28 16:40:44 +00:00
fiddlosopher
dc6925542c + Simplified entity handling by removing stringToSGML from Entities.hs.
It is no longer needed now that all entities are processed in the markdown 
  and HTML readers.  All calls to stringToSGML have been replaced by calls
  to encodeEntities.
+ Since inTag's attribute handling already encodes entities, 
  calls to encodeEntities are no longer needed for attribute values, so
  they've been removed.
+ The HTML and Markdown readers now call decodeEntities on all raw
  strings (e.g. authors, dates, link titles), to ensure that no unprocessed
  entities are included in the native representation of the document. 
  (In the HTML reader, most of this work is done by a change in
  extractAttributeName.)
+ The result is a small speed improvement (around 5% on my benchmark)
  and cleaner code.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@519 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-28 00:04:43 +00:00
fiddlosopher
141affdb51 More changes in entity handling: Instead of using entities for characters
above 128 in HTML and Docbook output, we now just use unicode.  After all,
we're declaring UTF-8 content in the header.  This makes the HTML and
docbook files produced by pandoc much more readable and editable.

Changes to Entities.hs:
+ Removed specialCharToEntity
+ Added escapeSGMLChar (which just escapes the basic four, <>&")
+ Modified encodeEntities and stringToSGML to use escapeSGMLChar
+ Removed encodeEntitiesNumerical
+ Rewrote encodeEntities for better performance
+ Rewrote stringToSGML for better performance 



git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 22:13:11 +00:00
fiddlosopher
d06417125d Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
  Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
  to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
  parser from Entities.hs.  
+ Added new 'entity' parser to Markdown reader, and added '&' as a 
  special character.  Adjusted test suite accordingly since now we 
  get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"..
+ stringToSGML moved to Entities.hs.  escapeSGML removed as redundant,
  given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
  boolean parameter that causes only numerical entities to be used.
  This is used in the docbook writer.  The HTML writer uses named
  entities where possible, but not all docbook-consumers know about
  the named entities without special instructions, so it seems safer
  to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
  the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
  Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
  sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
  detecting "mailto" links.



git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
fiddlosopher
58dcef0625 Added support for hexadecimal entities: e.g. &#xA0AB;
git-svn-id: https://pandoc.googlecode.com/svn/trunk@441 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06 19:47:05 +00:00
fiddlosopher
bb8478e4e2 Merged changes from 'quotes' branch since r431. Smart typography
is now handled in the Markdown and LaTeX readers, rather than in
the writers.  The HTML writer has been rewritten to use the
prettyprinting library.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@436 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06 09:54:58 +00:00
fiddlosopher
24f3710e09 Fixed bug in encodeEntities (characters less than 128, not 127,
should be encoded).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@416 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-04 17:51:46 +00:00
fiddlosopher
d4454536f0 Change 'HtmlEntities' module to 'Entities'. Adjusted calling
code accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@395 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-02 00:40:12 +00:00
Renamed from src/Text/Pandoc/HtmlEntities.hs (Browse further)