Commit graph

45 commits

Author SHA1 Message Date
fiddlosopher
8e1f484353 Added table tests for all writers.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@639 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-07 22:50:12 +00:00
fiddlosopher
aedc2095f5 Added RTF table writer tests.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@600 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-05-11 17:13:42 +00:00
fiddlosopher
5660e6ba11 Updated test suite with new tests for definition lists.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@597 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-05-10 22:04:36 +00:00
fiddlosopher
71cf0a11b3 + Use new alignment parameter in title/author/date,
instead of hardcoded \qc.
+ Adjusted test suite to account for changes in RTF writer.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@594 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-05-09 00:44:56 +00:00
fiddlosopher
24ee5f1f49 Updated test suite to reflect new prettyprinted native
Table format.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@583 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-13 22:18:47 +00:00
fiddlosopher
23df0ed176 Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required.  This rewrite yields a small speed
boost and considerably cleaner code. 

* Text/Pandoc/Definition.hs:
  + blocks:  removed Key and Note
  + inlines:  removed NoteRef, added Note
  + modified Target:  there is no longer a 'Ref' target; all targets
    are explicit URL, title pairs

* Text/Pandoc/Shared.hs:

  + Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
    used in some of the readers.
  + Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
    'replaceRefLinksBlockList', along with some auxiliary functions
    used only by them.  These are no longer needed, since
    reference links are resolved in the Markdown and RST readers.
  + Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
    to the Docbook writer, since that is now the only module that uses
    them.
  + Changed name of 'escapeSGMLString' to 'escapeStringForXML'
  + Added KeyTable and NoteTable types
  + Removed fields from ParserState:  'stateKeyBlocks', 'stateKeysUsed',
    'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'. 
    Added 'stateKeys' and 'stateNotes'.
  + Added clause for Note to 'prettyBlock'.
  + Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.

* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
  'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'

* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
  line block up to and including following blank lines.

* Main.hs:  Replaced --inline-links with --reference-links.

* README: 
  + Documented --reference-links and removed description of --inline-links.
  + Added note that footnotes may occur anywhere in the document, but must
    be at the outer level, not embedded in block elements.
  
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
  option, added --reference-links option

* Markdown and RST readers:
  + Rewrote to fit new Pandoc definition.  Since there are no longer
    Note or Key blocks, all note and key blocks are parsed on a first pass
    through the document.  Once tables of notes and keys have been constructed,
    the remaining parts of the document are reassembled and parsed.
  + Refactored link parsers.

* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
  there are no longer Note or Key blocks, notes and references can be
  parsed in a single pass through the document.

* RST, Markdown, and HTML writers: Rewrote using state monad and new
  Pandoc definition. State is used to hold lists of references and
  footnotes to be printed at the end of the document.

* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
  of the different treatment of footnotes, the "notes" parameter is no
  longer needed in the block and inline conversion functions.)

* Docbook writer:
  + Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
    'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
    they are now used only by the Docbook writer.
  + Rewrote using new Pandoc definition.  (Because of the different
    treatment of footnotes, the "notes" parameter is no longer needed
    in the block and inline conversion functions.)

* Updated test suite

* Throughout:  old haskell98 module names replaced by hierarchical module
  names, e.g. List by Data.List.

* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
  in "Build-Depends."

* cabalize: 
  + Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
    module names are being used throughout)
  + Added mtl to BASE_DEPENDS (needed for state monad)
  + Removed html from GHC66_DEPENDS (not needed since xhtml is now used)



git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
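Note on the commit above: the two-pass approach described for the Markdown and RST readers (collect key definitions first, then resolve references against the resulting table) can be sketched roughly as follows. This is a minimal illustration only. The types and the names parseKey, collectKeys, and resolveRef are hypothetical and are not taken from pandoc's source; link titles are ignored for brevity.

```haskell
import Data.Char (toLower)
import Data.Maybe (isNothing, mapMaybe)

-- Hypothetical, simplified types; pandoc's own definitions differ.
type Target = (String, String)      -- (URL, title)
type Keys   = [(String, Target)]    -- normalized label -> target

-- Recognize a line of the form "[label]: url" (titles are ignored here).
parseKey :: String -> Maybe (String, Target)
parseKey ('[' : rest) =
  case break (== ']') rest of
    (label@(_ : _), ']' : ':' : url) ->
      Just (map toLower label, (dropWhile (== ' ') url, ""))
    _ -> Nothing
parseKey _ = Nothing

-- First pass: pull key lines out, keeping the remaining lines in order.
collectKeys :: [String] -> (Keys, [String])
collectKeys ls = (mapMaybe parseKey ls, filter (isNothing . parseKey) ls)

-- Second pass: resolve a reference label against the table.
resolveRef :: Keys -> String -> Maybe Target
resolveRef keys label = lookup (map toLower label) keys
```

With this sketch, the remaining lines returned by collectKeys would be handed to the ordinary block parser, which can then build Link inlines with explicit URL and title pairs by calling resolveRef.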
fiddlosopher
571c3b4173 Removed Blank block element as unnecessary.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@578 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-17 20:23:47 +00:00
fiddlosopher
402016df28 Fixed bug in noscript part of email obfuscation:
&amp; instead of &


git-svn-id: https://pandoc.googlecode.com/svn/trunk@557 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-04 07:40:22 +00:00
fiddlosopher
463a0e5c3e Changes to test suite for new XHTML output.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@550 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-02-27 07:05:11 +00:00
fiddlosopher
8e0ad5a006 Cleaned up handling of embedded quotes in link titles.
Now these are stored as a '"' character, not as '&quot;'.
The function escapeLinkTitle in the Markdown writer is
unnecessary and was removed.  Tests modified accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@517 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 22:45:14 +00:00
fiddlosopher
141affdb51 More changes in entity handling: Instead of using entities for characters
above 128 in HTML and Docbook output, we now just use unicode.  After all,
we're declaring UTF-8 content in the header.  This makes the HTML and
docbook files produced by pandoc much more readable and editable.

Changes to Entities.hs:
+ Removed specialCharToEntity
+ Added escapeSGMLChar (which just escapes the basic four, <>&")
+ Modified encodeEntities and stringToSGML to use escapeSGMLChar
+ Removed encodeEntitiesNumerical
+ Rewrote encodeEntities for better performance
+ Rewrote stringToSGML for better performance 



git-svn-id: https://pandoc.googlecode.com/svn/trunk@516 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 22:13:11 +00:00
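Note on the commit above: escaping only "the basic four" characters and passing everything else through as unicode might look like the sketch below. The names escapeChar and escapeString are stand-ins chosen for this illustration; the actual escapeSGMLChar in Entities.hs may be written differently.

```haskell
-- Escape only <, >, & and "; every other character, including
-- characters above 127, is passed through unchanged.
escapeChar :: Char -> String
escapeChar '<' = "&lt;"
escapeChar '>' = "&gt;"
escapeChar '&' = "&amp;"
escapeChar '"' = "&quot;"
escapeChar c   = [c]

-- Apply the character-level escape across a whole string.
escapeString :: String -> String
escapeString = concatMap escapeChar
```

Because the output is declared as UTF-8, a character such as an accented letter needs no entity and stays readable in the generated file.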
fiddlosopher
d06417125d Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
  Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
  to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
  parser from Entities.hs.  
+ Added new 'entity' parser to Markdown reader, and added '&' as a 
  special character.  Adjusted test suite accordingly since now we 
  get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"'.
+ stringToSGML moved to Entities.hs.  escapeSGML removed as redundant,
  given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
  boolean parameter that causes only numerical entities to be used.
  This is used in the docbook writer.  The HTML writer uses named
  entities where possible, but not all docbook-consumers know about
  the named entities without special instructions, so it seems safer
  to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
  the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
  Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
  sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
  detecting "mailto" links.



git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
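Note on the commit above: the idea of decoding named, decimal, and hexadecimal entities into unicode characters without Text.Regex can be illustrated with plain pattern matching. This is only a sketch under stated assumptions: decodeEntity, numeric, and the five-entry namedEntities table are hypothetical, and the real parsers in Entities.hs are Parsec parsers rather than functions on whole strings.

```haskell
import Data.Char (chr, digitToInt, isDigit, isHexDigit)

-- A tiny illustrative table; a real one would cover all named entities.
namedEntities :: [(String, Char)]
namedEntities =
  [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("copy", '\169')]

-- Decode a run of digits in the given base, followed by ';'.
-- The length limit keeps the accumulated value well inside Int range.
numeric :: (Char -> Bool) -> Int -> String -> Maybe Char
numeric isDigitOf base body =
  case span isDigitOf body of
    (digits@(_ : _), ";")
      | length digits <= 7, n <= 0x10FFFF -> Just (chr n)
      where
        n = foldl (\acc d -> acc * base + digitToInt d) 0 digits
    _ -> Nothing

-- Decode one complete entity such as "&amp;", "&#38;" or "&#x26;".
decodeEntity :: String -> Maybe Char
decodeEntity ('&' : '#' : x : rest)
  | x `elem` "xX" = numeric isHexDigit 16 rest
decodeEntity ('&' : '#' : rest) = numeric isDigit 10 rest
decodeEntity ('&' : rest) =
  case break (== ';') rest of
    (name, ";") -> lookup name namedEntities
    _ -> Nothing
decodeEntity _ = Nothing
```

All three spellings of the ampersand decode to the same character, which is why a reader that parses entities yields a separate 'Str "&"' element.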
fiddlosopher
ca6cb23f23 Modified Markdown writer to use autolinks when possible.
So, instead of [site.com](site.com) we get <site.com>.
Changed test suite accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@508 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 19:06:30 +00:00
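Note on the commit above: the autolink rule can be pictured as a simple comparison of link text and URL. The function renderLink below is a hypothetical simplification; the condition the Markdown writer actually uses ("when possible") may be stricter than plain equality.

```haskell
-- Render a link as Markdown, using the autolink form when the
-- visible text is identical to the URL.
renderLink :: String -> String -> String
renderLink label url
  | label == url = "<" ++ url ++ ">"
  | otherwise    = "[" ++ label ++ "](" ++ url ++ ")"
```

Under this sketch, renderLink "site.com" "site.com" produces the autolink form, while a link whose text differs from its URL keeps the bracketed form.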
fiddlosopher
96919a6ac5 More smart quote bug fixes:
+ LaTeX writer now handles consecutive quotes properly:
  for example, ``\,`hello'\,''
+ LaTeX reader now parses '\,' as empty Str
+ normalizeSpaces function in Shared now removes empty Str elements
+ Modified tests accordingly


git-svn-id: https://pandoc.googlecode.com/svn/trunk@506 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-24 08:14:43 +00:00
fiddlosopher
60989d0637 Added support for tables in markdown reader and in LaTeX,
DocBook, and HTML writers.  The syntax is documented in
README.  Tests have been added to the test suite.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@493 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-15 19:52:42 +00:00
fiddlosopher
46580147a5 Reverted r471. My alternative to --strip-trailing-cr didn't
work.  This only affects the test target on systems without
GNU diff (rare), so I'm not too worried about it.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@480 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09 07:40:00 +00:00
fiddlosopher
458bb40989 Fixed docbook writer test -- removed named entities.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@474 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09 05:45:06 +00:00
fiddlosopher
acb4dab5eb Added comment relevant to last revision.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@472 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09 04:28:29 +00:00
fiddlosopher
35a06b9028 Replaced diff --strip-trailing-cr with something more portable
in runtests.pl.  (This is a GNU option.)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@471 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-09 04:27:07 +00:00
fiddlosopher
d47ce5b1f4 Added [breaklinks=true] to latex writer test case.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@453 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07 05:28:36 +00:00
fiddlosopher
f2c2494b66 Modified HTML output for Image elements, to conform to
Markdown.pl:
+ title attribute comes after alt attribute
+ title is included even if null


git-svn-id: https://pandoc.googlecode.com/svn/trunk@445 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-07 01:06:34 +00:00
fiddlosopher
233148f963 Fixed bug in Markdown reader's handling of underscores and other
inline formatting markers inside reference labels:  for example,
in '[A_B]: /url/a_b', the material between underscores was being
parsed as emphasized inlines.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@442 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06 20:47:00 +00:00
fiddlosopher
bb8478e4e2 Merged changes from 'quotes' branch since r431. Smart typography
is now handled in the Markdown and LaTeX readers, rather than in
the writers.  The HTML writer has been rewritten to use the
prettyprinting library.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@436 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-06 09:54:58 +00:00
roktas
c9f72f4c39 Setup executable permissions on some files.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@423 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-05 07:39:32 +00:00
fiddlosopher
39eb8cbad8 Changed Markdown writer so that it does not use the single-bracket
style of implicit reference link.  It now uses [this style][],
not [this style].  Reason:  only newer, beta versions of Markdown
allow the single-bracket style.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@419 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-05 00:55:38 +00:00
fiddlosopher
030d94e1c3 Refactored SGML escaping functions and "in tag" functions to
Text/Pandoc/Shared.  (escapeSGML, stringToSGML, inTag,
inTagSimple, inTagIndented, selfClosingTag)  These can be
used by both the HTML and Docbook writers.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@417 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-04 22:52:16 +00:00
fiddlosopher
99959b68e9 + Improved text wrapping algorithm in markdown, docbook, and RST writers.
LineBreaks no longer cause ugly wrapping in Markdown output.
+ Replaced splitBySpace with the more general, polymorphic function
  splitBy (in Text/Pandoc/Shared).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@411 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-04 01:04:56 +00:00
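Note on the commit above: a polymorphic splitting function in the spirit of the splitBy mentioned there could be written as below. The name splitByElem and this particular definition are assumptions for illustration; the signature and edge-case behavior of pandoc's own splitBy are not shown in the log.

```haskell
-- Split a list at every occurrence of a separator element.
-- Works for any element type with equality, not just Char.
splitByElem :: Eq a => a -> [a] -> [[a]]
splitByElem _ [] = []
splitByElem sep xs =
  let (first, rest) = break (== sep) xs
  in first : splitByElem sep (drop 1 rest)
```

The same function splits a String on spaces or a list of numbers on a sentinel value, which is the kind of generality that a space-only splitBySpace lacks.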
fiddlosopher
4e5745134a Use entities for all characters above 127 in docbook output.
Though XML tools should support unicode, some people will be
using SGML tools, and these do not.  Using entities makes the
docbook files more portable.

Also refactored encodeEntities and charToHtmlEntity in
HtmlEntities.hs


git-svn-id: https://pandoc.googlecode.com/svn/trunk@394 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-02 00:29:22 +00:00
fiddlosopher
2716943855 Changed representation of code blocks to use <screen> and
escaped characters rather than <programlisting> and CDATA.
Reason:  XML source more easily editable and readable.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@393 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-01 22:07:19 +00:00
fiddlosopher
a9e32505de Merged changes from docbook branch since r363.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@386 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-01 21:08:12 +00:00
fiddlosopher
4ea1b2bdc0 Merged 'strict' branch from r324. This adds a '--strict'
option to pandoc, which forces it to stay as close as possible
to official Markdown syntax.  


git-svn-id: https://pandoc.googlecode.com/svn/trunk@347 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-30 22:51:49 +00:00
fiddlosopher
eea359203a Reversed changes from r246:
+ Removed invisible anchors in front of header tags in HTML output.
  Reason:  no way to prevent duplicate ID attributes (which is invalid
  HTML), since there might be duplicate header titles.  See 
  http://six.pairlist.net/pipermail/markdown-discuss/2005-January/000975.html.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@306 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-29 08:04:39 +00:00
fiddlosopher
ff93d50142 + Added --strip-trailing-cr option to diff in runtests.pl, so that
the test suite will work in Windows.
+ Converted some CR's to LF's in print.css and adjusted test suite
  accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@290 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-24 22:58:29 +00:00
fiddlosopher
618d2ff006 Changed default ASCIIMathML text color to black.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@289 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-24 16:45:17 +00:00
fiddlosopher
d2105f6693 + Added regression tests with footnotes in quote blocks and lists.
+ This uncovered an existing bug in the RTF writer, which got indentation
  wrong on footnotes occurring in indented blocks like lists.  Fixed
  this bug.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@263 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-21 19:33:57 +00:00
fiddlosopher
b98edf2c74 Made javascript obfuscation of emails even more obfuscatory,
by combining it with entity obfuscation.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@254 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 18:16:07 +00:00
fiddlosopher
5cf769b1cd Modified the HTML writer to add invisible anchors to each section
heading.  The anchors are derived from the text of the section
heading as described in README.  This makes it easy to insert
links that jump from one part of a document to another:
for example, '[back to the Introduction](#Introduction)'.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@246 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-20 00:25:54 +00:00
fiddlosopher
c1ebe94e40 + Replaced 'comparing' combinator in markdown reader with 'compare'.
'comparing' is from Data.Ord, which is not available in GHC 6.4.
+ Added line break after </li> in HTML footnote output, for easier
  inspection of the source.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@245 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-19 23:46:07 +00:00
fiddlosopher
661c7e7b1d Merged changes to footnotes branch r219-r240.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@241 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-19 23:13:03 +00:00
fiddlosopher
3a6296acae Changed footnote syntax to conform to the de facto standard
for markdown footnotes.  References are now like this[^1]
rather than like this^(1).  There are corresponding changes
in the footnotes themselves.  See the updated README for
more details.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@230 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-19 07:30:36 +00:00
fiddlosopher
fe66a90a2a Changed 'putStrLn' to 'putStr' in Main.hs, and modified some
of the readers to make spacing at end of output more consistent.
Modified tests accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@201 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-12-16 05:05:02 +00:00
fiddlosopher
986c1f9dee Pandoc bug fixes:
+ LaTeX reader did not parse metadata correctly.  Now the title,
  author, and date are parsed correctly, and everything else in
  the preamble is skipped.
+ Simplified parsing of LaTeX command arguments and options.  
  The function commandArgs now returns a list of arguments OR
  options (in whatever order they appear).  The brackets are
  included, and a new stripFirstAndLast function is provided
  to strip them off when needed.  This fixes a problem in dealing
  with \newcommand, etc.
+ Added a "try" before "parser" in definition of notFollowedBy'
  combinator.  Adjusted the code using this combinator accordingly.
+ Changed handling of code blocks.  Previously, some readers allowed
  trailing newlines, while others stripped them.  Now, all readers
  strip trailing newlines in code blocks; writers insert a newline
  at the end of code blocks as needed.
+ Changed test suite to reflect these changes. 


git-svn-id: https://pandoc.googlecode.com/svn/trunk@137 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-11-26 07:01:37 +00:00
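Note on the commit above: two of the helpers it describes are small enough to sketch. stripEnds stands in for the stripFirstAndLast function named in the message, and stripTrailingNewlines illustrates the new code block convention; both names and bodies are assumptions for illustration, not the committed code.

```haskell
-- Drop the first and last elements, e.g. the brackets or braces
-- that surround a LaTeX option or argument.
stripEnds :: [a] -> [a]
stripEnds = drop 1 . reverse . drop 1 . reverse

-- Readers strip trailing newlines from code blocks; a writer can
-- then add back exactly one newline where its format needs it.
stripTrailingNewlines :: String -> String
stripTrailingNewlines = reverse . dropWhile (== '\n') . reverse
```

For instance, stripEnds applied to "[opt]" leaves "opt", and a writer could emit stripTrailingNewlines code ++ "\n" to end every code block consistently.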
fiddlosopher
3dbd266d21 Improved LaTeX writer's handling of dashes:
+ Recognize a double hyphen as an Em-dash, even when it occurs next
  to punctuation (e.g. a quotation mark).
+ Collapse space around Em-dashes.
+ Process quotes before dashes.  This way (foo -- 'bar') will turn into
  (foo---`bar') instead of (foo---'bar').


git-svn-id: https://pandoc.googlecode.com/svn/trunk@49 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-10-30 23:15:28 +00:00
fiddlosopher
09473903dc Changes to RTF writer:
+ use Helvetica instead of Times New Roman as default font
+ specify \f0 in every \pard; otherwise font sizes are not registered properly
+ modify test of RTF writer accordingly


git-svn-id: https://pandoc.googlecode.com/svn/trunk@32 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-10-29 08:56:26 +00:00
fiddlosopher
df7b682251 initial import
git-svn-id: https://pandoc.googlecode.com/svn/trunk@2 788f1e2b-df1e-0410-8736-df70ead52e1b
2006-10-17 14:22:29 +00:00