85 commits

Author SHA1 Message Date
fiddlosopher
5df912b162 Added optional HTML sanitization using a whitelist.
When this option is specified (--sanitize-html on the command line),
unsafe HTML tags will be replaced by HTML comments, and unsafe HTML
attributes will be removed.  This option should be especially useful
for those who want to use pandoc libraries in web applications, where
users will provide the input.

+ Main.hs:  Added --sanitize-html option.
+ Text.Pandoc.Shared:  Added stateSanitizeHTML to ParserState.
+ Text.Pandoc.Readers.HTML:
  - Added whitelists of sanitaryTags and sanitaryAttributes.
  - Added parsers to check these lists (and state) to see if a given
    tag or attribute should be counted unsafe.
  - Modified anyHtmlTag and anyHtmlEndTag to replace unsafe tags
    with comments.
  - Modified htmlAttribute to remove unsafe attributes.
  - Modified htmlScript and htmlStyle to remove these elements if
    unsafe.
  - Modified rawHtmlBlock to use anyHtmlBlockTag instead of anyHtmlTag
    and anyHtmlEndTag.  This fixes a bug in markdown parsing, where
    inline tags would be included in raw HTML blocks.
  - Modified anyHtmlBlockTag to test for (not inline) rather than
    directly for block.  This allows us to handle e.g. docbook in
    the markdown reader.
  - Minor tweaks in nonTitleNonHead and parseTitle.
+ Text.Pandoc.Readers.Markdown:
  - In non-strict mode use rawHtmlBlocks instead of htmlBlock.
    Simplified htmlBlock, since we know it's only called in strict
    mode.
+ Modified README and man pages to document new option.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1166 788f1e2b-df1e-0410-8736-df70ead52e1b
2008-01-03 21:32:32 +00:00
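
The whitelist approach described in 5df912b162 can be sketched roughly as follows. This is a Python illustration built on the standard-library html.parser module, not pandoc's Haskell parsers. SAFE_TAGS, SAFE_ATTRIBUTES, the Sanitizer class, and the text of the replacement comment are hypothetical stand-ins for sanitaryTags, sanitaryAttributes, and whatever comment pandoc actually emits. The sketch does not inspect URL schemes, so it should not be treated as a complete sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists; the real sanitaryTags and sanitaryAttributes differ.
SAFE_TAGS = {"a", "b", "em", "p", "strong"}
SAFE_ATTRIBUTES = {"href", "title"}
COMMENT = "<!-- unsafe HTML removed -->"  # assumed wording


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag not in SAFE_TAGS:
            # Unsafe tags are replaced by a comment; script and style
            # bodies are dropped as well.
            self.skipping = tag in ("script", "style")
            self.out.append(COMMENT)
            return
        # Safe tag: keep it, but drop every attribute not on the whitelist.
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in SAFE_ATTRIBUTES
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        self.out.append(f"</{tag}>" if tag in SAFE_TAGS else COMMENT)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For instance, sanitize('<p onclick="x()">Hi</p>') keeps the paragraph but drops the onclick attribute.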
fiddlosopher
48f2cc5600 Modified rules for HTML header identifiers to ensure legal identifiers.
+ Modified htmlListToIdentifier and uniqueIdentifier in HTML writer
  to ensure that identifiers begin with an alphabetic character.
+ The new rules are described in README.
+ Resolves Issue #33.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1150 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-21 19:25:54 +00:00
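
A minimal sketch of the rule in 48f2cc5600, written in Python rather than taken from the HTML writer. Only the requirement that an identifier begin with an alphabetic character comes from the commit message; the lowercasing, the hyphen substitution, the ASCII-only handling, and the "section" fallback are assumptions made for illustration, and to_identifier is a hypothetical name.

```python
import re


def to_identifier(header_text):
    # Assumed normalization: lowercase, and turn each run of characters
    # other than ASCII letters and digits into a single hyphen.
    ident = re.sub(r"[^a-z0-9]+", "-", header_text.lower()).strip("-")
    # The rule from the commit: drop everything before the first letter,
    # so the identifier starts with an alphabetic character.
    ident = re.sub(r"^[^a-z]+", "", ident)
    # Assumed fallback when nothing is left.
    return ident or "section"
```

Under these assumptions a header such as "3. Applications" yields "applications" instead of an identifier that starts with a digit.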
fiddlosopher
aea6f6802b Removed support for "box-style" block quotes in markdown.
This adds unneeded complexity and makes pandoc diverge further
than necessary from other markdown extensions.
Brought documentation, tests, and debian/changelog up to date.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1141 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-08 19:32:18 +00:00
fiddlosopher
804756dd1f Removed note about public mimetex server from README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1134 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-02 16:29:07 +00:00
fiddlosopher
d411b10438 Put math in HTML inside <span class="math">.
This way it can be distinguished from the surrounding text, e.g. put
in a different font.  Updated README accordingly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1130 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-02 02:50:41 +00:00
fiddlosopher
d1832da9e1 Added Text.Pandoc.Readers.TeXMath and changed default handling of math.
+ Text.Pandoc.Readers.TeXMath exports readTeXMath, which reads raw TeX
  math and outputs a string of pandoc inlines that tries to render it
  as far as possible, lapsing into literal TeX when needed.
+ Added Text.Pandoc.Readers.TeXMath to pandoc.cabal + ghc66 version.
+ Modified writers so that readTeXMath is used by default for math
  output in HTML, S5, RTF, Docbook.
+ Updated README with information about how math is rendered in all formats.
+ Updated test suite.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1129 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-02 00:36:32 +00:00
fiddlosopher
6e079a67e8 Documented new --gladtex and --mimetex options, and new treatment of TeX math.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1124 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-01 03:11:47 +00:00
fiddlosopher
b6f1ccc90b Small change to wording in README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1121 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-12-01 03:11:35 +00:00
fiddlosopher
7deee9c874 Reverted changes in r1086 (implicit section header references).
This caused too much of a performance hit.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1093 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-23 03:51:21 +00:00
fiddlosopher
f7b705b44c Implemented implicit reference-style links to section headers in markdown.
For example, if you have a header '# Supported architectures', you can
link to it with '[Supported architectures]'.  If there are multiple
headers with this label, the link will point to the first of them.
Implicit references are always overridden by explicitly specified references.
Addresses Issue #20.

+ Moved isPunctuation, uniqueIdentifiers, and inlineListToIdentifier from
  Text.Pandoc.Writers.HTML to Text.Pandoc.Shared.

+ Added stHeaders to ParserState.   This holds a list of header texts
  used in the document, and is used to construct implicit header references.

+ In Text.Pandoc.Readers.Markdown, added call to headerReference
  parser in initial parsing pass, constructing a list of section header
  labels. This is then passed to uniqueIdentifiers to produce
  identifiers, and a list of implicit references is constructed. This is
  added to the end of the explicitly specified references, so it will be
  overridden by explicitly specified references. All of this processing
  is skipped if --strict was specified.

+ Modified documentation in README.



git-svn-id: https://pandoc.googlecode.com/svn/trunk@1086 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-22 17:14:21 +00:00
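
The precedence described in f7b705b44c (first header wins among duplicates, explicit references override implicit ones) might look like this in outline. This is a Python sketch, not the Markdown reader's code: slug and unique_identifiers are simplified, hypothetical stand-ins for inlineListToIdentifier and uniqueIdentifiers, and a dictionary replaces the ordered reference list the commit describes.

```python
import re


def slug(label):
    # Simplified, assumed identifier rule.
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def unique_identifiers(labels):
    # Repeated slugs get a numeric suffix: install, install-1, install-2, ...
    seen = {}
    result = []
    for label in labels:
        base = slug(label)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}-{count}")
    return result


def reference_table(headers, explicit):
    table = {}
    for label, ident in zip(headers, unique_identifiers(headers)):
        # setdefault keeps the first header when a label occurs twice.
        table.setdefault(label.lower(), "#" + ident)
    # Explicitly specified references always win over implicit ones.
    for label, target in explicit.items():
        table[label.lower()] = target
    return table
```

With headers "Install" and "Install" and no explicit references, the label resolves to "#install", the first of the two.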
fiddlosopher
506bf38bcb Updated documentation to reflect the fact that LaTeX and ConTeXt writers
now wrap text by default.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1074 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-11-15 03:23:02 +00:00
fiddlosopher
447b99e35d '--no-wrap' option now prevents the addition of structural whitespace
in HTML output, minimizing the file size.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@1053 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-10-18 15:36:51 +00:00
fiddlosopher
7a32ad72e3 Documented '--no-wrap' option in README and man pages.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1035 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-09-27 01:28:28 +00:00
fiddlosopher
d98dcfbb94 Minor formatting change in README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@895 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-25 18:04:17 +00:00
fiddlosopher
f11360f50e Added new rule for enhanced markdown ordered lists: if the list marker
is a capital letter followed by a period (including a single-letter
capital roman numeral), then it must be followed by at least two spaces.
The point of this is to avoid accidentally treating people's initials as
list markers: a paragraph may begin:

    B. Russell was an English philosopher.

and this shouldn't be treated as a list.

Modified Markdown reader and README documentation.
Added a test case.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@880 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-23 04:25:09 +00:00
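
The two-space rule from f11360f50e can be illustrated with a short Python check. It is a sketch under simplifying assumptions, not the Markdown reader's parser: it handles only "label, period, spaces" markers with a number or a single letter as the label, and the name starts_ordered_item is hypothetical.

```python
import re

# A number or a single letter, a period, one or more spaces, then text.
MARKER = re.compile(r"(?P<label>[0-9]+|[A-Za-z])\.(?P<gap> +)\S")


def starts_ordered_item(line):
    match = MARKER.match(line)
    if match is None:
        return False
    # A capital letter needs at least two spaces after the period,
    # so that initials such as "B. Russell" are not read as markers.
    needed = 2 if match.group("label").isupper() else 1
    return len(match.group("gap")) >= needed
```

The paragraph from the commit message, "B. Russell was an English philosopher.", is rejected, while "B.  Second item" with two spaces is accepted.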
fiddlosopher
e775273011 Changed date on README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@856 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15 23:49:25 +00:00
fiddlosopher
3d83624e22 Documented fix for paragraphs starting with (C)
in README.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@848 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15 17:34:12 +00:00
fiddlosopher
8dc4e67400 Changed (C) to a unicode copyright symbol.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@843 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-15 17:27:46 +00:00
fiddlosopher
e814a3f6d2 Major change in the way ordered lists are handled:
+ The changes are documented in README, under Lists.
+ The OrderedList block element now stores information
  about list number style, list number delimiter, and
  starting number.
+ The readers parse this information, when possible.
+ The writers use this information to style ordered
  lists.
+ Test suites have been changed accordingly.

Motivation:  It's often useful to start lists with
numbers other than 1, and to have control over the
style of the list.

Added to Text.Pandoc.Shared:
+ camelCaseToHyphenated
+ toRomanNumeral
+ anyOrderedListMarker
+ orderedListMarker
+ orderedListMarkers

Added to Text.Pandoc.ParserCombinators:
+ charsInBalanced'
+ withHorizDisplacement
+ romanNumeral

RST writer:
+ Force blank line before lists, so that sublists will be handled
  correctly.

LaTeX reader:
+ Fixed bug in parsing of footnotes containing multiple paragraphs,
  introduced by use of charsInBalanced.  Fix: use charsInBalanced'
  instead.

LaTeX header:
+ use mathletters option in ucs package, so that basic unicode Greek
  letters will work properly.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@834 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-08-08 02:43:15 +00:00
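
To make the three stored properties from e814a3f6d2 concrete (starting number, number style, delimiter), here is a small Python sketch. The ListAttributes class, the style and delimiter names, and the markers helper are hypothetical; only the idea of a toRomanNumeral-style conversion and the three kinds of list information come from the commit message.

```python
from dataclasses import dataclass

ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

# Assumed delimiter names and their marker templates.
DELIMITERS = {"period": "{}.", "one-paren": "{})", "two-parens": "({})"}


def to_roman(number):
    if not 0 < number < 4000:
        raise ValueError("expected a number from 1 to 3999")
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


@dataclass
class ListAttributes:
    start: int       # starting number
    style: str       # "decimal", "lower-roman", or "upper-roman"
    delimiter: str   # a key of DELIMITERS


def markers(attrs, count):
    result = []
    for number in range(attrs.start, attrs.start + count):
        if attrs.style == "upper-roman":
            label = to_roman(number)
        elif attrs.style == "lower-roman":
            label = to_roman(number).lower()
        else:
            label = str(number)
        result.append(DELIMITERS[attrs.delimiter].format(label))
    return result
```

A writer holding these attributes can reproduce a list that starts at 3 or uses roman numerals instead of always emitting "1.", "2.", and so on.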
fiddlosopher
1e4f05d2bd Removed references to examplep package in documentation, and
removed the suggestion of latex-texlive-extras in debian/control,
since we're not using examplep.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@830 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-28 19:16:58 +00:00
fiddlosopher
0ae4a1081b Changed [URL] to [url] in description of --asciimathml option.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@822 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-28 17:13:25 +00:00
fiddlosopher
d488dd0f66 Reinstated dependence on fancyvrb. It is compatible with examplep.
fancyvrb is needed for verbatim environments in footnotes.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@808 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-28 01:40:48 +00:00
fiddlosopher
b29f221cba Changed LaTeX writer to use the examplep package instead
of fancyvrb. examplep allows verbatim text in places where
fancyvrb does not, e.g. definition list terms, and provides
for line-breaking of verbatim text.
+ examplep code put in LaTeX header instead of being dynamically
  included, since it is frequently used, and people may want to
  customize the options.
+ documented dependency on examplep
+ added texlive-latex-extra as a "Suggested" package in debian/control
+ examplep's \Q{} is now used instead of \verb.  Note that
  \Q requires backslash-escaping symbols in its scope.
+ modified README so that the verbatim sections will look good at
  shorter line lengths.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@807 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-28 01:10:04 +00:00
fiddlosopher
622606bae9 Updated documentation on ASCIIMathML.js.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@800 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-26 02:40:18 +00:00
fiddlosopher
dccc63fda4 Copyright date change - README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@797 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-24 01:07:39 +00:00
fiddlosopher
76001db2c6 README: Use definition list for command-line options.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@796 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-24 01:04:19 +00:00
fiddlosopher
3b60ce318b README: Added missing ~ after '~a\ cat' in subscript example.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@794 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-24 00:16:53 +00:00
fiddlosopher
aaee6816b4 Added quotes around attribute in ASCIIMathML link example
(in README).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@788 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-23 22:31:05 +00:00
fiddlosopher
9a410e1635 README: Removed the statement that the RST reader doesn't parse
definition lists.
HTML reader:  Added failIfStrict to the definitionList parser, so
definition lists will be passed through as raw HTML if --strict
specified.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@783 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-23 01:41:37 +00:00
fiddlosopher
fbf7bba8af Clarified role of --strict option when input is HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@773 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-22 20:15:19 +00:00
fiddlosopher
86453926b6 Documented fact that --strict option has a role even when
input format is not markdown (in README).


git-svn-id: https://pandoc.googlecode.com/svn/trunk@749 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-21 21:42:03 +00:00
fiddlosopher
1a90879f8b Use capital letters for title in sample man page title block.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@746 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-21 20:33:26 +00:00
fiddlosopher
2f7a38e1ab Changed system for indicating man page title, section,
header and footer.  Documented in README.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@745 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-21 20:30:40 +00:00
fiddlosopher
2df03311c3 README changes:
+ Documented superscript, subscript, and strikeout syntax
+ Modified description of LaTeX packages needed for markdown2pdf


git-svn-id: https://pandoc.googlecode.com/svn/trunk@743 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-21 19:10:28 +00:00
fiddlosopher
676b1ab149 Documented ConTeXt writer in README. Removed statement
that table output is limited to HTML and LaTeX writers,
since it is now supported in all writers.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@724 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-15 03:24:48 +00:00
fiddlosopher
532dd43139 Changed title in README to "Pandoc User's Guide."
git-svn-id: https://pandoc.googlecode.com/svn/trunk@698 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-14 06:27:34 +00:00
fiddlosopher
7f5a554989 Added note in README about how you might want to link to
an external ASCIIMathML.js script instead of including it in
the generated HTML file using -m.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@694 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-13 07:12:23 +00:00
fiddlosopher
34d875aefd Slightly larger table of header identifiers, so stuff doesn't wrap on LaTeX
output.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@681 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-12 05:07:08 +00:00
fiddlosopher
62227fe259 README: Documented scheme for header identifiers in HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@680 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-12 04:33:15 +00:00
fiddlosopher
389c762afc Documented --toc/--table-of-contents option in pandoc man
page and README.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@679 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-12 03:45:00 +00:00
fiddlosopher
b106e51120 README: Documented man page writer, special title-line
conventions for man pages.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@670 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-10 06:19:49 +00:00
fiddlosopher
f8ce411ea1 README: Documented the fact that if pandoc is called
as 'hsmarkdown', it runs in strict markdown compatibility
mode.  This can be achieved using a symbolic link.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@627 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-06 16:30:32 +00:00
fiddlosopher
242ee99e8d Minor wording changes in README.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@626 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-07-06 16:21:38 +00:00
fiddlosopher
cf081435ff Changed definition list syntax in markdown reader and simplified
the parsing code. A colon is now required before every block in a
definition. This fixes a problem with the old syntax, in which the last
block in the following was ambiguous between a regular paragraph in the
definition and a code block following the definition list:

term
:   definition

    is this code or more definition?



git-svn-id: https://pandoc.googlecode.com/svn/trunk@589 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-05-03 14:42:40 +00:00
fiddlosopher
23df0ed176 Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required.  This rewrite yields a small speed
boost and considerably cleaner code. 

* Text/Pandoc/Definition.hs:
  + blocks:  removed Key and Note
  + inlines:  removed NoteRef, added Note
  + modified Target:  there is no longer a 'Ref' target; all targets
    are explicit URL, title pairs

* Text/Pandoc/Shared.hs:

  + Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
    used in some of the readers.
  + Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
    'replaceRefLinksBlockList', along with some auxiliary functions
    used only by them.  These are no longer needed, since
    reference links are resolved in the Markdown and RST readers.
  + Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
    to the Docbook writer, since that is now the only module that uses
    them.
  + Changed name of 'escapeSGMLString' to 'escapeStringForXML'
  + Added KeyTable and NoteTable types
  + Removed fields from ParserState:  'stateKeyBlocks', 'stateKeysUsed',
    'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
    Added 'stateKeys' and 'stateNotes'.
  + Added clause for Note to 'prettyBlock'.
  + Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.

* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
  'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'

* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
  line block up to and including following blank lines.

* Main.hs:  Replaced --inline-links with --reference-links.

* README: 
  + Documented --reference-links and removed description of --inline-links.
  + Added note that footnotes may occur anywhere in the document, but must
    be at the outer level, not embedded in block elements.
  
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
  option, added --reference-links option

* Markdown and RST readers:
  + Rewrote to fit new Pandoc definition.  Since there are no longer
    Note or Key blocks, all note and key blocks are parsed on a first pass
    through the document.  Once tables of notes and keys have been constructed,
    the remaining parts of the document are reassembled and parsed.
  + Refactored link parsers.

* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
  there are no longer Note or Key blocks, notes and references can be
  parsed in a single pass through the document.

* RST, Markdown, and HTML writers: Rewrote using state monad and new Pandoc
  definition. State is used to hold lists of references and footnotes to
  be printed at the end of the document.

* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
  of the different treatment of footnotes, the "notes" parameter is no
  longer needed in the block and inline conversion functions.)

* Docbook writer:
  + Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
    'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
    they are now used only by the Docbook writer.
  + Rewrote using new Pandoc definition.  (Because of the different
    treatment of footnotes, the "notes" parameter is no longer needed
    in the block and inline conversion functions.)

* Updated test suite

* Throughout:  old haskell98 module names replaced by hierarchical module
  names, e.g. List by Data.List.

* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
  in "Build-Depends."

* cabalize: 
  + Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
    module names are being used throughout)
  + Added mtl to BASE_DEPENDS (needed for state monad)
  + Removed html from GHC66_DEPENDS (not needed since xhtml is now used)



git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
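
The two-pass strategy that 23df0ed176 describes for the Markdown and RST readers (collect key blocks first, then parse the rest with the finished table) can be sketched as below. This is a line-based Python illustration with regular expressions, not the readers' parser code: collect_keys, resolve_links, and both patterns are hypothetical, titles and note blocks are ignored, and the example.com URL in the usage is a placeholder.

```python
import re
from html import escape

# "[label]: url" on a line of its own; "[^..." footnote labels are skipped.
KEY_LINE = re.compile(r"^\[(?!\^)([^\]]+)\]:\s*(\S+)\s*$")
# "[text][label]", where an empty label means "use the text as the label".
REF_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def collect_keys(lines):
    # First pass: pull key definitions out of the document.
    keys, rest = {}, []
    for line in lines:
        match = KEY_LINE.match(line)
        if match:
            keys.setdefault(match.group(1).lower(), match.group(2))
        else:
            rest.append(line)
    return keys, rest


def resolve_links(line, keys):
    # Second pass: the URL is stored directly in the link.
    def replace(match):
        text = match.group(1)
        label = match.group(2) or text
        url = keys.get(label.lower())
        if url is None:
            return match.group(0)  # unknown label: leave the text alone
        return f'<a href="{escape(url, quote=True)}">{escape(text, quote=False)}</a>'

    return REF_LINK.sub(replace, line)
```

Because every key is known before the second pass starts, a link can refer to a key defined later in the document:

```python
keys, rest = collect_keys(["See [pandoc][home].", "", "[home]: http://example.com/"])
print(resolve_links(rest[0], keys))
```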
fiddlosopher
44cec96e61 New syntax documentation for definition lists. Now we
require a ':' at the beginning of the definition; otherwise,
too many false positives for definition lists. 


git-svn-id: https://pandoc.googlecode.com/svn/trunk@570 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-11 07:53:21 +00:00
fiddlosopher
d277baebe4 Added documentation for definition lists.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@567 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-10 20:43:59 +00:00
fiddlosopher
d5f9b1dbb4 Change in ordered lists in Markdown reader:
+ Lists may begin with lowercase letters only, and only 'a' through
  'n'. Otherwise first initials and page references (e.g., p. 400)
  are too easily parsed as lists.
+ Numbers beginning list items must end with '.' (not ')', which is
  now allowed only after letters).
NOTE:  This change may cause documents to be parsed differently.
Users should take care in upgrading.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@561 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-09 02:37:49 +00:00
fiddlosopher
31c030e3a5 Added --inline-links option to force links in HTML to be parsed
as inline links, rather than reference links.  (Addresses Issue
#4.)


git-svn-id: https://pandoc.googlecode.com/svn/trunk@554 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-03-03 18:19:31 +00:00
fiddlosopher
60989d0637 Added support for tables in markdown reader and in LaTeX,
DocBook, and HTML writers.  The syntax is documented in
README.  Tests have been added to the test suite.


git-svn-id: https://pandoc.googlecode.com/svn/trunk@493 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-15 19:52:42 +00:00