14 commits

Author SHA1 Message Date
fiddlosopher
d06417125d Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
  Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
  to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
  parser from Entities.hs.  
+ Added new 'entity' parser to Markdown reader, and added '&' as a
  special character.  Adjusted test suite accordingly since now we
  get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
+ stringToSGML moved to Entities.hs.  escapeSGML removed as redundant,
  given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
  boolean parameter that causes only numerical entities to be used.
  This is used in the docbook writer.  The HTML writer uses named
  entities where possible, but not all docbook-consumers know about
  the named entities without special instructions, so it seems safer
  to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
  the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
  Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
  sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
  detecting "mailto" links.



git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
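
The entity handling described in commit d06417125d can be illustrated with a small sketch. This is Python, not the Haskell code in Entities.hs, and it is only an approximation: the snake_case names mirror the commit's characterEntity, decodeEntities, charToEntity, and encodeEntities but are hypothetical, the entity table comes from Python's html.entities module, and escaping only non-ASCII characters is an assumption.

```python
from html.entities import codepoint2name, name2codepoint

DECIMAL_DIGITS = "0123456789"
HEX_DIGITS = "0123456789abcdefABCDEF"


def character_entity(text, pos=0):
    """Try to parse one entity starting at text[pos].

    Returns (character, next_position) on success, or None if no
    well-formed named, decimal, or hex entity starts at pos.
    """
    if not text.startswith("&", pos):
        return None
    end = text.find(";", pos)
    if end == -1:
        return None
    body = text[pos + 1:end]
    try:
        if body[:2] in ("#x", "#X"):            # hex entity, e.g. &#xA9;
            digits = body[2:]
            if not digits or any(c not in HEX_DIGITS for c in digits):
                return None
            code = int(digits, 16)
        elif body.startswith("#"):              # decimal entity, e.g. &#169;
            digits = body[1:]
            if not digits or any(c not in DECIMAL_DIGITS for c in digits):
                return None
            code = int(digits, 10)
        else:                                   # named entity, e.g. &copy;
            code = name2codepoint[body]
        return chr(code), end + 1
    except (KeyError, ValueError, OverflowError):
        return None


def decode_entities(text):
    """Replace every parseable entity with its character, without regexes."""
    out, pos = [], 0
    while pos < len(text):
        parsed = character_entity(text, pos)
        if parsed is None:
            out.append(text[pos])               # not an entity: keep the char
            pos += 1
        else:
            char, pos = parsed
            out.append(char)
    return "".join(out)


def char_to_entity(char, numeric_only=False):
    """Named entity where one exists, unless numeric_only is set."""
    code = ord(char)
    if not numeric_only and code in codepoint2name:
        return "&" + codepoint2name[code] + ";"
    return "&#" + str(code) + ";"


def encode_entities(text, numeric_only=False):
    """Escape non-ASCII characters (an assumption of this sketch)."""
    return "".join(
        c if ord(c) < 128 else char_to_entity(c, numeric_only) for c in text
    )
```

The numeric_only flag plays the role of the boolean parameter mentioned in the commit message: a DocBook-style writer would pass True to get output that does not depend on the consumer knowing the named entities, while an HTML-style writer would leave it False. A bare '&' that does not start an entity, as in "AT&T", is left unchanged by decode_entities.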
fiddlosopher
c94dacec35 Fixed bug in 'extractTagType' in HTML reader: previous
version was not skipping / in close tags.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@512 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 20:26:06 +00:00
fiddlosopher
8f0cfe9bd0 Fixed a bug in extractTagType in HTML reader: the previous
version extracted the attributes, too, which is not wanted.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@510 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 19:40:32 +00:00
fiddlosopher
c61f2b6984 Fixed bug in HTML attribute parser: now a space is
required before an attribute.  Previously, <a.b>
would be parsed as an HTML tag with an attribute!


git-svn-id: https://pandoc.googlecode.com/svn/trunk@509 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 19:07:35 +00:00
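
The rule from commit c61f2b6984 (an attribute must be preceded by whitespace) can be shown with a deliberately simplified Python sketch. It is not the HTML reader's parser: parse_open_tag is a hypothetical name, attributes are split on whitespace only, and quoted values containing spaces are not handled.

```python
def parse_open_tag(text):
    """Return (name, attributes) for a simple opening tag, else None.

    Simplified: attributes are whitespace-separated chunks, so quoted
    values that contain spaces are not supported.
    """
    if not (text.startswith("<") and text.endswith(">")):
        return None
    inner = text[1:-1]

    # The tag name is the leading run of alphanumeric characters.
    i = 0
    while i < len(inner) and inner[i].isalnum():
        i += 1
    name, rest = inner[:i], inner[i:]
    if not name:
        return None

    attributes = []
    while rest.strip():
        # Each attribute must be preceded by at least one whitespace
        # character; otherwise the input is not treated as a tag.
        if not rest[0].isspace():
            return None
        rest = rest.lstrip()
        j = 0
        while j < len(rest) and not rest[j].isspace():
            j += 1
        attributes.append(rest[:j])
        rest = rest[j:]
    return name, attributes
```

With this check, '<a.b>' is rejected because '.b' follows the tag name without a space, while '<a href="x">' is accepted as tag 'a' with one attribute.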
fiddlosopher
0646eef976 Rewrote 'extractTagType' in HTML reader so that it doesn't use
regexes.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@507 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 17:43:39 +00:00
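
Taken together, the three extractTagType commits above describe a function that avoids regexes, skips the '/' in close tags, and returns the tag name without attributes. A minimal Python sketch of that behavior follows; the name extract_tag_type is hypothetical, and lowercasing the result and returning an empty string for non-tags are assumptions, not something the commit messages state.

```python
def extract_tag_type(tag):
    """Return the tag name of a raw HTML tag, e.g. 'div' for '</DIV id="x">'.

    Returns '' if the input does not start with '<'.
    """
    if not tag.startswith("<"):
        return ""
    i = 1
    # Skip the slash of a close tag, plus any whitespace before the name.
    while i < len(tag) and (tag[i] == "/" or tag[i].isspace()):
        i += 1
    start = i
    # The name ends at the first non-alphanumeric character, so
    # attributes are never included.
    while i < len(tag) and tag[i].isalnum():
        i += 1
    return tag[start:i].lower()   # lowercasing is an assumption
```

Because the scan stops at the first non-alphanumeric character, an opening tag with attributes and its matching close tag yield the same tag type.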
fiddlosopher
e4880319e6 Modified HTML reader to skip a newline following a <br> tag.
Otherwise the newline will be treated as a space at the beginning
of the next line.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@410 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-03 20:52:12 +00:00
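
The effect of commit e4880319e6 can be sketched in Python as a tiny tokenizer. This is an illustration only, not the HTML reader: line_break_tokens and the "LineBreak" marker string are hypothetical, and the regex is used here just to locate <br> tags.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def line_break_tokens(html):
    """Split html into text chunks and 'LineBreak' markers.

    A single newline directly after a <br> tag is swallowed, so it does
    not show up as leading whitespace on the following text.
    """
    tokens, pos = [], 0
    for match in BR_TAG.finditer(html):
        if match.start() > pos:
            tokens.append(html[pos:match.start()])
        tokens.append("LineBreak")
        pos = match.end()
        if html.startswith("\n", pos):   # skip the newline after <br>
            pos += 1
    if pos < len(html):
        tokens.append(html[pos:])
    return tokens
```

Without the newline check, the text after the break would begin with "\n", which a reader that treats newlines as spaces would turn into a leading space.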
fiddlosopher
d4454536f0 Change 'HtmlEntities' module to 'Entities'. Adjusted calling
code accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@395 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-02 00:40:12 +00:00
fiddlosopher
4ea1b2bdc0 Merged 'strict' branch from r324. This adds a '--strict'
option to pandoc, which forces it to stay as close as possible
to official Markdown syntax.  


git-svn-id: https://pandoc.googlecode.com/svn/trunk@347 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-30 22:51:49 +00:00
fiddlosopher
11cd6e94e0 Added license text to top of source files.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@258 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 20:54:23 +00:00
fiddlosopher
70d291026d Changed 'stability' from 'provisional' to 'alpha'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@257 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 20:20:10 +00:00
fiddlosopher
1fded403c5 Changed 'status' in comment headers from 'unstable' to 'provisional'
(which seems to be the term that is used in this context).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@255 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 19:48:46 +00:00
fiddlosopher
dc9c6450f3 + Added module data for haddock.
+ Reformatted code consistently.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@252 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 06:50:14 +00:00
fiddlosopher
986c1f9dee Pandoc bug fixes:
+ LaTeX reader did not parse metadata correctly.  Now the title,
  author, and date are parsed correctly, and everything else in
  the preamble is skipped.
+ Simplified parsing of LaTeX command arguments and options.  
  The function commandArgs now returns a list of arguments OR
  options (in whatever order they appear).  The brackets are
  included, and a new stripFirstAndLast function is provided
  to strip them off when needed.  This fixes a problem in dealing
  with \newcommand, etc.
+ Added a "try" before "parser" in definition of notFollowedBy'
  combinator.  Adjusted the code using this combinator accordingly.
+ Changed handling of code blocks.  Previously, some readers allowed
  trailing newlines, while others stripped them.  Now, all readers
  strip trailing newlines in code blocks; writers insert a newline
  at the end of code blocks as needed.
+ Changed test suite to reflect these changes. 


git-svn-id: https://pandoc.googlecode.com/svn/trunk@137 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-11-26 07:01:37 +00:00
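
The commandArgs and stripFirstAndLast behavior described in commit 986c1f9dee can be sketched in Python. This is a rough approximation under stated assumptions, not the LaTeX reader's code: the snake_case names are hypothetical, brackets are matched by simple depth counting, and backslash-escaped brackets are not treated specially.

```python
def strip_first_and_last(s):
    """Drop the first and last characters, i.e. the enclosing brackets."""
    return s[1:-1]


def command_args(text):
    """Collect the leading {...} arguments and [...] options of a command.

    Items are returned in the order they appear, brackets included.
    Scanning stops at the first character that does not open an
    argument or option, or at an unbalanced bracket.
    """
    closers = {"{": "}", "[": "]"}
    args, pos = [], 0
    while pos < len(text) and text[pos] in closers:
        opener, closer = text[pos], closers[text[pos]]
        depth, end = 0, pos
        while end < len(text):
            if text[end] == opener:
                depth += 1
            elif text[end] == closer:
                depth -= 1
                if depth == 0:
                    break
            end += 1
        if depth != 0:        # ran off the end: unbalanced input
            break
        args.append(text[pos:end + 1])
        pos = end + 1
    return args
```

For the text following \newcommand{\foo}, such as '[2]{\textbf{#1}}', this returns the option '[2]' and the argument '{\textbf{#1}}' in source order; strip_first_and_last then recovers '2' from '[2]' when the bare contents are needed.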
fiddlosopher
df7b682251 initial import
git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-10-17 14:22:29 +00:00