16 commits

Author SHA1 Message Date
fiddlosopher
31c030e3a5 Added --inline-links option to force links in HTML to be parsed
as inline links, rather than reference links.  (Addresses Issue
#4.)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@554 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-03 18:19:31 +00:00
fiddlosopher
dc6925542c + Simplified entity handling by removing stringToSGML from Entities.hs.
  It is no longer needed now that all entities are processed in the markdown
  and HTML readers.  All calls to stringToSGML have been replaced by calls
  to encodeEntities.
+ Since inTag's attribute handling already encodes entities, 
  calls to encodeEntities are no longer needed for attribute values, so
  they've been removed.
+ The HTML and Markdown readers now call decodeEntities on all raw
  strings (e.g. authors, dates, link titles), to ensure that no unprocessed
  entities are included in the native representation of the document. 
  (In the HTML reader, most of this work is done by a change in
  extractAttributeName.)
+ The result is a small speed improvement (around 5% on my benchmark)
  and cleaner code.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@519 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-28 00:04:43 +00:00
fiddlosopher
d06417125d Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
  Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
  to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
  parser from Entities.hs.  
+ Added new 'entity' parser to Markdown reader, and added '&' as a
  special character.  Adjusted test suite accordingly since now we
  get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
+ stringToSGML moved to Entities.hs.  escapeSGML removed as redundant,
  given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
  boolean parameter that causes only numerical entities to be used.
  This is used in the docbook writer.  The HTML writer uses named
  entities where possible, but not all docbook-consumers know about
  the named entities without special instructions, so it seems safer
  to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
  the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
  Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
  sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
  detecting "mailto" links.



git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
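The entity handling described in the commit above can be sketched roughly as follows. This is an illustrative Python approximation, not pandoc's Haskell code: the function names echo those in the commit message, but their signatures and bodies are assumptions, and Python's `html.entities` tables stand in for whatever table Entities.hs uses.

```python
from html.entities import codepoint2name, name2codepoint

HEX_DIGITS = set("0123456789abcdefABCDEF")
SPECIAL_CHARS = set('&<>"')


def character_entity(s, i):
    """Try to parse one entity at s[i] == '&' without regular expressions.

    Returns (character, next_index) on success, or None if no valid
    named, decimal, or hex entity starts at position i.
    """
    end = s.find(";", i)
    if end == -1:
        return None
    body = s[i + 1:end]
    try:
        if body[:2] in ("#x", "#X"):            # hex entity, e.g. &#x26;
            digits = body[2:]
            if not digits or not set(digits) <= HEX_DIGITS:
                return None
            return chr(int(digits, 16)), end + 1
        if body[:1] == "#":                     # decimal entity, e.g. &#38;
            digits = body[1:]
            if not (digits.isascii() and digits.isdigit()):
                return None
            return chr(int(digits)), end + 1
        return chr(name2codepoint[body]), end + 1   # named entity, e.g. &amp;
    except (KeyError, ValueError, OverflowError):
        return None


def decode_entities(s):
    """Replace every valid entity in s with its Unicode character."""
    out, i = [], 0
    while i < len(s):
        parsed = character_entity(s, i) if s[i] == "&" else None
        if parsed:
            ch, i = parsed
            out.append(ch)
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def char_to_entity(c, numeric_only):
    """Named entity where one exists, unless numeric_only is set."""
    name = None if numeric_only else codepoint2name.get(ord(c))
    return f"&{name};" if name else f"&#{ord(c)};"


def encode_entities(s, numeric_only=False):
    """Encode special and non-ASCII characters as entities."""
    return "".join(
        char_to_entity(c, numeric_only)
        if c in SPECIAL_CHARS or ord(c) > 127 else c
        for c in s
    )
```

Under these assumptions, `encode_entities("a<b")` yields a named entity, while `encode_entities("a<b", numeric_only=True)` yields a numerical one, which is the behavior the commit describes for the docbook writer.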
fiddlosopher
c94dacec35 Fixed bug in 'extractTagType' in HTML reader: previous
version was not skipping / in close tags.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@512 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 20:26:06 +00:00
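Taken together, the three extractTagType commits from 2007-01-24 (the regex-free rewrite, the attribute fix, and the close-tag fix above) describe a function that returns only the element name. A minimal Python sketch of that behavior follows; the real function is Haskell and its signature is not shown in this log, so the name `extract_tag_type`, the lowercasing, and the empty-string fallback are all assumptions.

```python
def extract_tag_type(tag):
    """Return the lowercased element name of an HTML tag string.

    Returns "" if the input does not start with '<'. No regular
    expressions are used.
    """
    if not tag.startswith("<"):
        return ""
    i = 1
    # Skip whitespace and the '/' that marks a close tag.
    while i < len(tag) and (tag[i] == "/" or tag[i].isspace()):
        i += 1
    start = i
    # The name ends at the first non-alphanumeric character, so
    # attributes and the closing '>' are left out of the result.
    while i < len(tag) and tag[i].isalnum():
        i += 1
    return tag[start:i].lower()
```

With this sketch, an open tag with attributes and its matching close tag both map to the same name, which is what lets a caller pair them up.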
fiddlosopher
8f0cfe9bd0 Fixed a bug in extractTagType in HTML Reader: the previous
version extracted the attributes, too, which is not wanted.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@510 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 19:40:32 +00:00
fiddlosopher
c61f2b6984 Fixed bug in HTML attribute parser: now a space is
required before an attribute.  Previously, <a.b>
would be parsed as an HTML tag with an attribute!


git-svn-id: https://pandoc.googlecode.com/svn/trunk@509 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 19:07:35 +00:00
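The whitespace rule from the commit above can be illustrated with a small hand-written tag parser. This is a simplified Python sketch under stated assumptions, not the HTML reader's actual parser: `parse_open_tag` is a hypothetical name, attributes are returned as raw strings, and a self-closing slash glued to an attribute value is not treated specially.

```python
def parse_open_tag(s):
    """Parse '<name attr ...>' into (name, [raw attributes]), or None.

    Every attribute must be preceded by whitespace, so '<a.b>' is
    rejected rather than read as tag 'a' with attribute '.b'.
    """
    if not (s.startswith("<") and s.endswith(">")):
        return None
    body = s[1:-1]
    i = 0
    while i < len(body) and body[i].isalnum():
        i += 1
    name, rest = body[:i], body[i:]
    if not name:
        return None
    attrs = []
    while rest:
        stripped = rest.lstrip()
        if stripped in ("", "/"):
            break                  # trailing space or self-closing slash
        if stripped == rest:
            return None            # no whitespace before this attribute
        # Read one raw attribute, letting quoted values contain spaces.
        j, quote = 0, None
        while j < len(stripped):
            c = stripped[j]
            if quote:
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif c.isspace():
                break
            j += 1
        attrs.append(stripped[:j])
        rest = stripped[j:]
    return name, attrs
```

Here `parse_open_tag("<a.b>")` returns `None`, so a caller would fall back to treating the text as ordinary content.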
fiddlosopher
0646eef976 Rewrote 'extractTagType' in HTML reader so that it doesn't use
regexes.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@507 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 17:43:39 +00:00
fiddlosopher
e4880319e6 Modified HTML reader to skip a newline following a <br> tag.
Otherwise the newline will be treated as a space at the beginning
of the next line.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@410 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-03 20:52:12 +00:00
fiddlosopher
d4454536f0 Change 'HtmlEntities' module to 'Entities'. Adjusted calling
code accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@395 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-02 00:40:12 +00:00
fiddlosopher
4ea1b2bdc0 Merged 'strict' branch from r324. This adds a '--strict'
option to pandoc, which forces it to stay as close as possible
to official Markdown syntax.  


git-svn-id: https://pandoc.googlecode.com/svn/trunk@347 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-30 22:51:49 +00:00
fiddlosopher
11cd6e94e0 Added license text to top of source files.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@258 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 20:54:23 +00:00
fiddlosopher
70d291026d Changed 'stability' from 'provisional' to 'alpha'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@257 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 20:20:10 +00:00
fiddlosopher
1fded403c5 Changed 'status' in comment headers from 'unstable' to 'provisional'
(which seems to be the term that is used in this context).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@255 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 19:48:46 +00:00
fiddlosopher
dc9c6450f3 + Added module data for haddock.
+ Reformatted code consistently.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@252 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 06:50:14 +00:00
fiddlosopher
986c1f9dee Pandoc bug fixes:
+ LaTeX reader did not parse metadata correctly.  Now the title,
  author, and date are parsed correctly, and everything else in
  the preamble is skipped.
+ Simplified parsing of LaTeX command arguments and options.  
  The function commandArgs now returns a list of arguments OR
  options (in whatever order they appear).  The brackets are
  included, and a new stripFirstAndLast function is provided
  to strip them off when needed.  This fixes a problem in dealing
  with \newcommand, etc.
+ Added a "try" before "parser" in definition of notFollowedBy'
  combinator.  Adjusted the code using this combinator accordingly.
+ Changed handling of code blocks.  Previously, some readers allowed
  trailing newlines, while others stripped them.  Now, all readers
  strip trailing newlines in code blocks; writers insert a newline
  at the end of code blocks as needed.
+ Changed test suite to reflect these changes. 


git-svn-id: https://pandoc.googlecode.com/svn/trunk@137 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-11-26 07:01:37 +00:00
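The commandArgs and stripFirstAndLast behavior described in the commit above can be sketched as follows. This is a rough Python approximation rather than the LaTeX reader's Haskell code: the names follow the commit message, but the signatures, the nesting logic, and the handling of unbalanced input are assumptions, and escaped brackets such as `\{` are not handled.

```python
CLOSERS = {"{": "}", "[": "]"}


def command_args(s):
    """Split a leading run of '{arg}' and '[opt]' groups into a list.

    Arguments and options are returned in the order they appear, with
    their brackets still attached. Parsing stops at the first character
    that does not open a group, or at an unbalanced group.
    """
    args, i = [], 0
    while i < len(s) and s[i] in CLOSERS:
        opener, closer = s[i], CLOSERS[s[i]]
        depth, j = 0, i
        while j < len(s):
            if s[j] == opener:
                depth += 1
            elif s[j] == closer:
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if j == len(s):
            break              # unbalanced group: leave it unconsumed
        args.append(s[i:j + 1])
        i = j + 1
    return args


def strip_first_and_last(s):
    """Drop the first and last characters, e.g. the enclosing brackets."""
    return s[1:-1]
```

Keeping the brackets on each item is what lets a caller tell an option from an argument; `strip_first_and_last` is then applied only where the bare contents are needed.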
fiddlosopher
df7b682251 initial import
git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-10-17 14:22:29 +00:00