2006-12-20 20:54:23 +00:00
|
|
|
{-
|
2007-07-07 22:51:55 +00:00
|
|
|
Copyright (C) 2006-7 John MacFarlane <jgm@berkeley.edu>
|
2006-12-20 20:54:23 +00:00
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
|
|
|
the Free Software Foundation; either version 2 of the License, or
|
|
|
|
(at your option) any later version.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program; if not, write to the Free Software
|
|
|
|
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
|
|
|
-}
|
|
|
|
|
2006-12-20 06:50:14 +00:00
|
|
|
{- |
|
|
|
|
Module : Text.Pandoc.Readers.Markdown
|
2007-07-07 22:51:55 +00:00
|
|
|
Copyright : Copyright (C) 2006-7 John MacFarlane
|
2006-12-20 06:50:14 +00:00
|
|
|
License : GNU GPL, version 2 or above
|
|
|
|
|
2007-07-07 22:51:55 +00:00
|
|
|
Maintainer : John MacFarlane <jgm@berkeley.edu>
|
2006-12-20 20:20:10 +00:00
|
|
|
Stability : alpha
|
2006-12-20 06:50:14 +00:00
|
|
|
Portability : portable
|
|
|
|
|
|
|
|
Conversion of markdown-formatted plain text to 'Pandoc' document.
|
|
|
|
-}
|
2006-10-17 14:22:29 +00:00
|
|
|
module Text.Pandoc.Readers.Markdown (
|
|
|
|
readMarkdown
|
|
|
|
) where
|
|
|
|
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
import Data.List ( findIndex, sortBy, transpose, isSuffixOf, intersect, lookup )
|
2007-03-07 20:53:37 +00:00
|
|
|
import Data.Char ( isAlphaNum )
|
2007-07-09 03:39:25 +00:00
|
|
|
import Text.Pandoc.ParserCombinators
|
2006-10-17 14:22:29 +00:00
|
|
|
import Text.Pandoc.Definition
|
|
|
|
import Text.Pandoc.Readers.LaTeX ( rawLaTeXInline, rawLaTeXEnvironment )
|
|
|
|
import Text.Pandoc.Shared
|
2006-12-30 22:51:49 +00:00
|
|
|
import Text.Pandoc.Readers.HTML ( rawHtmlBlock,
|
|
|
|
anyHtmlBlockTag, anyHtmlInlineTag,
|
|
|
|
anyHtmlTag, anyHtmlEndTag,
|
|
|
|
htmlEndTag, extractTagType,
|
|
|
|
htmlBlockElement )
|
2007-01-28 00:04:43 +00:00
|
|
|
import Text.Pandoc.Entities ( characterEntity, decodeEntities )
|
2006-10-17 14:22:29 +00:00
|
|
|
import Text.ParserCombinators.Parsec
|
|
|
|
|
|
|
|
-- | Read markdown from an input string and return a Pandoc document.
|
|
|
|
readMarkdown :: ParserState -> String -> Pandoc
|
2007-04-22 04:38:05 +00:00
|
|
|
readMarkdown state str = (readWith parseMarkdown) state (str ++ "\n\n")
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- | Parse markdown string with default options and print result (for testing).
|
|
|
|
testString :: String -> IO ()
|
|
|
|
testString = testStringWith parseMarkdown
|
|
|
|
|
|
|
|
--
|
|
|
|
-- Constants and data structure definitions
|
|
|
|
--
|
|
|
|
|
|
|
|
spaceChars = " \t"
|
|
|
|
bulletListMarkers = "*+-"
|
|
|
|
hruleChars = "*-_"
|
|
|
|
titleOpeners = "\"'("
|
2007-07-21 22:52:07 +00:00
|
|
|
setextHChars = "=-"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- treat these as potentially non-text when parsing inline:
|
2007-07-21 22:52:07 +00:00
|
|
|
specialChars = "\\[]*_~`<>$!^-.&'\""
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- auxiliary functions
|
|
|
|
--
|
|
|
|
|
|
|
|
-- | Skip a single endline if there is one.
|
|
|
|
skipEndline = option Space endline
|
|
|
|
|
|
|
|
indentSpaces = do
|
|
|
|
state <- getState
|
|
|
|
let tabStop = stateTabStop state
|
2007-03-11 07:56:29 +00:00
|
|
|
try (count tabStop (char ' ')) <|>
|
|
|
|
(do{many (char ' '); string "\t"}) <?> "indentation"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-01-15 19:52:42 +00:00
|
|
|
nonindentSpaces = do
|
2006-10-17 14:22:29 +00:00
|
|
|
state <- getState
|
|
|
|
let tabStop = stateTabStop state
|
|
|
|
choice (map (\n -> (try (count n (char ' ')))) (reverse [0..(tabStop - 1)]))
|
|
|
|
|
2006-12-30 22:51:49 +00:00
|
|
|
-- | Fail unless we're at beginning of a line.
|
|
|
|
failUnlessBeginningOfLine = do
|
|
|
|
pos <- getPosition
|
|
|
|
if sourceColumn pos == 1 then return () else fail "not beginning of line"
|
|
|
|
|
2007-01-06 09:54:58 +00:00
|
|
|
-- | Fail unless we're in "smart typography" mode.
|
|
|
|
failUnlessSmart = do
|
|
|
|
state <- getState
|
|
|
|
if stateSmart state then return () else fail "Smart typography feature"
|
|
|
|
|
2007-07-15 23:53:22 +00:00
|
|
|
-- | Parse a sequence of inline elements between a string
|
|
|
|
-- @opener@ and a string @closer@, including inlines
|
|
|
|
-- between balanced pairs of @opener@ and a @closer@.
|
|
|
|
inlinesInBalanced :: String -> String -> GenParser Char ParserState [Inline]
|
|
|
|
inlinesInBalanced opener closer = try $ do
|
2007-07-16 07:22:17 +00:00
|
|
|
let openerSymbol = try $ do
|
|
|
|
res <- inline
|
2007-07-15 23:53:22 +00:00
|
|
|
if res == Str opener
|
2007-07-16 07:22:17 +00:00
|
|
|
then return res
|
|
|
|
else pzero
|
2007-07-15 23:53:22 +00:00
|
|
|
try (string opener)
|
2007-07-16 07:22:17 +00:00
|
|
|
result <- manyTill ( (do lookAhead openerSymbol
|
2007-07-15 23:53:22 +00:00
|
|
|
bal <- inlinesInBalanced opener closer
|
|
|
|
return $ [Str opener] ++ bal ++ [Str closer])
|
|
|
|
<|> (count 1 inline))
|
|
|
|
(try (string closer))
|
|
|
|
return $ concat result
|
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
--
|
|
|
|
-- document structure
|
|
|
|
--
|
|
|
|
|
|
|
|
titleLine = try (do
|
|
|
|
char '%'
|
|
|
|
skipSpaces
|
|
|
|
line <- manyTill inline newline
|
|
|
|
return line)
|
|
|
|
|
|
|
|
authorsLine = try (do
|
|
|
|
char '%'
|
|
|
|
skipSpaces
|
|
|
|
authors <- sepEndBy (many1 (noneOf ",;\n")) (oneOf ",;")
|
|
|
|
newline
|
2007-01-28 00:04:43 +00:00
|
|
|
return (map (decodeEntities . removeLeadingTrailingSpace) authors))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
dateLine = try (do
|
|
|
|
char '%'
|
|
|
|
skipSpaces
|
|
|
|
date <- many (noneOf "\n")
|
|
|
|
newline
|
2007-01-28 00:04:43 +00:00
|
|
|
return (decodeEntities $ removeTrailingSpace date))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
titleBlock = try (do
|
2006-12-30 22:51:49 +00:00
|
|
|
failIfStrict
|
2006-10-17 14:22:29 +00:00
|
|
|
title <- option [] titleLine
|
|
|
|
author <- option [] authorsLine
|
|
|
|
date <- option "" dateLine
|
|
|
|
option "" blanklines
|
|
|
|
return (title, author, date))
|
|
|
|
|
|
|
|
parseMarkdown = do
|
2007-07-06 06:46:31 +00:00
|
|
|
updateState (\state -> state { stateParseRaw = True }) -- markdown allows raw HTML
|
2006-10-17 14:22:29 +00:00
|
|
|
(title, author, date) <- option ([],[],"") titleBlock
|
2006-12-31 17:34:06 +00:00
|
|
|
-- go through once just to get list of reference keys
|
2007-07-06 06:46:31 +00:00
|
|
|
refs <- manyTill (referenceKey <|> (do l <- lineClump
|
|
|
|
return (LineClump l))) eof
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
let keys = map (\(KeyBlock label target) -> (label, target)) $
|
|
|
|
filter isKeyBlock refs
|
2007-07-06 06:46:31 +00:00
|
|
|
let rawlines = map (\(LineClump ln) -> ln) $ filter isLineClump refs
|
|
|
|
setInput $ concat rawlines -- with keys stripped out
|
|
|
|
updateState (\state -> state { stateKeys = keys })
|
|
|
|
-- now go through for notes
|
|
|
|
refs <- manyTill (noteBlock <|> (do l <- lineClump
|
|
|
|
return (LineClump l))) eof
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
let notes = map (\(NoteBlock label blocks) -> (label, blocks)) $
|
|
|
|
filter isNoteBlock refs
|
|
|
|
let rawlines = map (\(LineClump ln) -> ln) $ filter isLineClump refs
|
|
|
|
setInput $ concat rawlines -- with note blocks and keys stripped out
|
2007-07-06 06:46:31 +00:00
|
|
|
updateState (\state -> state { stateNotes = notes })
|
2006-12-30 22:51:49 +00:00
|
|
|
blocks <- parseBlocks -- go through again, for real
|
2006-12-19 23:13:03 +00:00
|
|
|
let blocks' = filter (/= Null) blocks
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
return (Pandoc (Meta title author date) blocks')
|
|
|
|
|
|
|
|
--
|
|
|
|
-- initial pass for references
|
|
|
|
--
|
|
|
|
|
|
|
|
referenceKey = try $ do
|
|
|
|
nonindentSpaces
|
|
|
|
label <- reference
|
2007-07-21 22:52:07 +00:00
|
|
|
char ':'
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
skipSpaces
|
2007-07-21 22:52:07 +00:00
|
|
|
option ' ' (char '<')
|
|
|
|
src <- many (noneOf "> \n\t")
|
|
|
|
option ' ' (char '>')
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
tit <- option "" title
|
|
|
|
blanklines
|
|
|
|
return $ KeyBlock label (removeTrailingSpace src, tit)
|
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
noteMarker = try $ do
|
|
|
|
char '['
|
|
|
|
char '^'
|
|
|
|
manyTill (noneOf " \t\n") (char ']')
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
rawLine = try $ do
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
notFollowedBy' blankline
|
|
|
|
notFollowedBy' noteMarker
|
|
|
|
contents <- many1 nonEndline
|
|
|
|
end <- option "" (do
|
|
|
|
newline
|
|
|
|
option "" (try indentSpaces)
|
|
|
|
return "\n")
|
2007-07-21 22:52:07 +00:00
|
|
|
return (contents ++ end)
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
|
|
|
|
rawLines = do
|
|
|
|
lines <- many1 rawLine
|
|
|
|
return (concat lines)
|
|
|
|
|
|
|
|
noteBlock = try $ do
|
|
|
|
failIfStrict
|
|
|
|
ref <- noteMarker
|
|
|
|
char ':'
|
|
|
|
option ' ' (try blankline)
|
|
|
|
option "" (try indentSpaces)
|
|
|
|
raw <- sepBy rawLines (try (do {blankline; indentSpaces}))
|
|
|
|
option "" blanklines
|
|
|
|
-- parse the extracted text, which may contain various block elements:
|
|
|
|
rest <- getInput
|
|
|
|
setInput $ (joinWithSep "\n" raw) ++ "\n\n"
|
|
|
|
contents <- parseBlocks
|
|
|
|
setInput rest
|
|
|
|
return (NoteBlock ref contents)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- parsing blocks
|
|
|
|
--
|
|
|
|
|
2006-12-31 16:46:48 +00:00
|
|
|
parseBlocks = manyTill block eof
|
2006-10-17 14:22:29 +00:00
|
|
|
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
block = choice [ header
|
|
|
|
, table
|
|
|
|
, codeBlock
|
|
|
|
, hrule
|
|
|
|
, list
|
|
|
|
, blockQuote
|
|
|
|
, htmlBlock
|
|
|
|
, rawLaTeXEnvironment'
|
|
|
|
, para
|
|
|
|
, plain
|
|
|
|
, nullBlock ] <?> "block"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- header blocks
|
|
|
|
--
|
|
|
|
|
|
|
|
header = choice [ setextHeader, atxHeader ] <?> "header"
|
|
|
|
|
|
|
|
atxHeader = try (do
|
2007-07-21 22:52:07 +00:00
|
|
|
lead <- many1 (char '#')
|
2006-10-17 14:22:29 +00:00
|
|
|
skipSpaces
|
2006-11-26 07:01:37 +00:00
|
|
|
txt <- manyTill inline atxClosing
|
2006-10-17 14:22:29 +00:00
|
|
|
return (Header (length lead) (normalizeSpaces txt)))
|
|
|
|
|
|
|
|
atxClosing = try (do
|
2007-07-21 22:52:07 +00:00
|
|
|
skipMany (char '#')
|
2006-10-17 14:22:29 +00:00
|
|
|
skipSpaces
|
|
|
|
newline
|
|
|
|
option "" blanklines)
|
|
|
|
|
2006-12-20 06:50:14 +00:00
|
|
|
setextHeader = choice $
|
|
|
|
map (\x -> setextH x) (enumFromTo 1 (length setextHChars))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
setextH n = try (do
|
2006-12-30 22:51:49 +00:00
|
|
|
txt <- many1Till inline newline
|
2006-12-20 06:50:14 +00:00
|
|
|
many1 (char (setextHChars !! (n-1)))
|
|
|
|
skipSpaces
|
|
|
|
newline
|
|
|
|
option "" blanklines
|
|
|
|
return (Header n (normalizeSpaces txt)))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- hrule block
|
|
|
|
--
|
|
|
|
|
2006-12-20 06:50:14 +00:00
|
|
|
hruleWith chr = try (do
|
|
|
|
skipSpaces
|
|
|
|
char chr
|
|
|
|
skipSpaces
|
|
|
|
char chr
|
|
|
|
skipSpaces
|
|
|
|
char chr
|
|
|
|
skipMany (oneOf (chr:spaceChars))
|
|
|
|
newline
|
|
|
|
option "" blanklines
|
|
|
|
return HorizontalRule)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
hrule = choice (map hruleWith hruleChars) <?> "hrule"
|
|
|
|
|
|
|
|
--
|
|
|
|
-- code blocks
|
|
|
|
--
|
|
|
|
|
|
|
|
indentedLine = try (do
|
2006-12-20 06:50:14 +00:00
|
|
|
indentSpaces
|
|
|
|
result <- manyTill anyChar newline
|
|
|
|
return (result ++ "\n"))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- two or more indented lines, possibly separated by blank lines
|
|
|
|
indentedBlock = try (do
|
|
|
|
res1 <- indentedLine
|
|
|
|
blanks <- many blankline
|
|
|
|
res2 <- choice [indentedBlock, indentedLine]
|
|
|
|
return (res1 ++ blanks ++ res2))
|
|
|
|
|
|
|
|
codeBlock = do
|
2006-12-20 06:50:14 +00:00
|
|
|
result <- choice [indentedBlock, indentedLine]
|
|
|
|
option "" blanklines
|
|
|
|
return (CodeBlock (stripTrailingNewlines result))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- block quotes
|
|
|
|
--
|
|
|
|
|
|
|
|
emacsBoxQuote = try (do
|
2006-12-30 22:51:49 +00:00
|
|
|
failIfStrict
|
2006-12-20 06:50:14 +00:00
|
|
|
string ",----"
|
|
|
|
manyTill anyChar newline
|
|
|
|
raw <- manyTill (try (do
|
|
|
|
char '|'
|
|
|
|
option ' ' (char ' ')
|
|
|
|
result <- manyTill anyChar newline
|
|
|
|
return result))
|
|
|
|
(string "`----")
|
|
|
|
manyTill anyChar newline
|
|
|
|
option "" blanklines
|
|
|
|
return raw)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
emailBlockQuoteStart = try (do
|
2007-01-15 19:52:42 +00:00
|
|
|
nonindentSpaces
|
2007-07-21 22:52:07 +00:00
|
|
|
char '>'
|
2006-10-17 14:22:29 +00:00
|
|
|
option ' ' (char ' ')
|
|
|
|
return "> ")
|
|
|
|
|
|
|
|
emailBlockQuote = try (do
|
2006-12-20 06:50:14 +00:00
|
|
|
emailBlockQuoteStart
|
|
|
|
raw <- sepBy (many (choice [nonEndline,
|
|
|
|
(try (do
|
|
|
|
endline
|
|
|
|
notFollowedBy' emailBlockQuoteStart
|
|
|
|
return '\n'))]))
|
|
|
|
(try (do {newline; emailBlockQuoteStart}))
|
|
|
|
newline <|> (do{ eof; return '\n' })
|
|
|
|
option "" blanklines
|
|
|
|
return raw)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
blockQuote = do
|
2006-12-20 06:50:14 +00:00
|
|
|
raw <- choice [ emailBlockQuote, emacsBoxQuote ]
|
|
|
|
-- parse the extracted block, which may contain various block elements:
|
2006-12-21 09:02:06 +00:00
|
|
|
rest <- getInput
|
|
|
|
setInput $ (joinWithSep "\n" raw) ++ "\n\n"
|
|
|
|
contents <- parseBlocks
|
|
|
|
setInput rest
|
|
|
|
return (BlockQuote contents)
|
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
--
|
|
|
|
-- list blocks
|
|
|
|
--
|
|
|
|
|
2007-03-11 07:56:29 +00:00
|
|
|
list = choice [ bulletList, orderedList, definitionList ] <?> "list"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2006-12-20 06:50:14 +00:00
|
|
|
bulletListStart = try (do
|
|
|
|
option ' ' newline -- if preceded by a Plain block in a list context
|
2007-01-15 19:52:42 +00:00
|
|
|
nonindentSpaces
|
2006-12-20 06:50:14 +00:00
|
|
|
notFollowedBy' hrule -- because hrules start out just like lists
|
|
|
|
oneOf bulletListMarkers
|
|
|
|
spaceChar
|
|
|
|
skipSpaces)
|
|
|
|
|
2007-03-16 06:05:43 +00:00
|
|
|
standardOrderedListStart = try (do
|
2007-03-09 02:37:49 +00:00
|
|
|
many1 digit
|
2007-03-16 06:05:43 +00:00
|
|
|
char '.')
|
2007-03-09 02:37:49 +00:00
|
|
|
|
2007-03-16 06:05:43 +00:00
|
|
|
extendedOrderedListStart = try (do
|
2007-03-09 02:37:49 +00:00
|
|
|
failIfStrict
|
|
|
|
oneOf ['a'..'n']
|
2007-03-16 06:05:43 +00:00
|
|
|
oneOf ".)")
|
2007-03-09 02:37:49 +00:00
|
|
|
|
|
|
|
orderedListStart = try $ do
|
2006-12-20 06:50:14 +00:00
|
|
|
option ' ' newline -- if preceded by a Plain block in a list context
|
2007-01-15 19:52:42 +00:00
|
|
|
nonindentSpaces
|
2007-03-09 02:37:49 +00:00
|
|
|
standardOrderedListStart <|> extendedOrderedListStart
|
2006-12-20 06:50:14 +00:00
|
|
|
oneOf spaceChars
|
2007-03-09 02:37:49 +00:00
|
|
|
skipSpaces
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- parse a line of a list item (start = parser for beginning of list item)
|
|
|
|
listLine start = try (do
|
|
|
|
notFollowedBy' start
|
|
|
|
notFollowedBy blankline
|
2006-12-20 06:50:14 +00:00
|
|
|
notFollowedBy' (do
|
|
|
|
indentSpaces
|
|
|
|
many (spaceChar)
|
|
|
|
choice [bulletListStart, orderedListStart])
|
2006-10-17 14:22:29 +00:00
|
|
|
line <- manyTill anyChar newline
|
|
|
|
return (line ++ "\n"))
|
|
|
|
|
|
|
|
-- parse raw text for one list item, excluding start marker and continuations
|
2006-12-20 06:50:14 +00:00
|
|
|
rawListItem start = try (do
|
|
|
|
start
|
|
|
|
result <- many1 (listLine start)
|
|
|
|
blanks <- many blankline
|
|
|
|
return ((concat result) ++ blanks))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- continuation of a list item - indented and separated by blankline
|
|
|
|
-- or (in compact lists) endline.
|
|
|
|
-- note: nested lists are parsed as continuations
|
2006-12-20 06:50:14 +00:00
|
|
|
listContinuation start = try (do
|
2007-02-15 01:10:15 +00:00
|
|
|
lookAhead indentSpaces
|
2006-12-20 06:50:14 +00:00
|
|
|
result <- many1 (listContinuationLine start)
|
|
|
|
blanks <- many blankline
|
|
|
|
return ((concat result) ++ blanks))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
listContinuationLine start = try (do
|
2006-12-20 06:50:14 +00:00
|
|
|
notFollowedBy' blankline
|
|
|
|
notFollowedBy' start
|
2007-03-10 17:48:16 +00:00
|
|
|
option "" (try indentSpaces)
|
2006-12-20 06:50:14 +00:00
|
|
|
result <- manyTill anyChar newline
|
|
|
|
return (result ++ "\n"))
|
|
|
|
|
|
|
|
listItem start = try (do
|
|
|
|
first <- rawListItem start
|
2006-12-21 09:02:06 +00:00
|
|
|
continuations <- many (listContinuation start)
|
2006-12-20 06:50:14 +00:00
|
|
|
-- parsing with ListItemState forces markers at beginning of lines to
|
|
|
|
-- count as list item markers, even if not separated by blank space.
|
|
|
|
-- see definition of "endline"
|
|
|
|
state <- getState
|
2006-12-21 09:02:06 +00:00
|
|
|
let oldContext = stateParserContext state
|
|
|
|
setState $ state {stateParserContext = ListItemState}
|
|
|
|
-- parse the extracted block, which may contain various block elements:
|
|
|
|
rest <- getInput
|
|
|
|
let raw = concat (first:continuations)
|
2007-03-10 17:48:16 +00:00
|
|
|
setInput raw
|
2006-12-21 09:02:06 +00:00
|
|
|
contents <- parseBlocks
|
|
|
|
setInput rest
|
|
|
|
updateState (\st -> st {stateParserContext = oldContext})
|
|
|
|
return contents)
|
2006-12-20 06:50:14 +00:00
|
|
|
|
|
|
|
orderedList = try (do
|
|
|
|
items <- many1 (listItem orderedListStart)
|
|
|
|
let items' = compactify items
|
|
|
|
return (OrderedList items'))
|
|
|
|
|
|
|
|
bulletList = try (do
|
|
|
|
items <- many1 (listItem bulletListStart)
|
|
|
|
let items' = compactify items
|
|
|
|
return (BulletList items'))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-03-10 17:48:16 +00:00
|
|
|
-- definition lists
|
|
|
|
|
|
|
|
definitionListItem = try $ do
|
|
|
|
notFollowedBy blankline
|
|
|
|
notFollowedBy' indentSpaces
|
|
|
|
term <- manyTill inline newline
|
2007-05-03 14:42:40 +00:00
|
|
|
raw <- many1 defRawBlock
|
2007-03-10 17:48:16 +00:00
|
|
|
state <- getState
|
|
|
|
let oldContext = stateParserContext state
|
|
|
|
-- parse the extracted block, which may contain various block elements:
|
|
|
|
rest <- getInput
|
2007-05-03 14:42:40 +00:00
|
|
|
setInput (concat raw)
|
2007-03-10 17:48:16 +00:00
|
|
|
contents <- parseBlocks
|
|
|
|
setInput rest
|
|
|
|
updateState (\st -> st {stateParserContext = oldContext})
|
|
|
|
return ((normalizeSpaces term), contents)
|
|
|
|
|
|
|
|
defRawBlock = try $ do
|
2007-05-03 14:42:40 +00:00
|
|
|
char ':'
|
|
|
|
state <- getState
|
|
|
|
let tabStop = stateTabStop state
|
|
|
|
try (count (tabStop - 1) (char ' ')) <|> (do{many (char ' '); string "\t"})
|
|
|
|
firstline <- anyLine
|
|
|
|
rawlines <- many (do {notFollowedBy' blankline; indentSpaces; anyLine})
|
2007-03-10 20:45:19 +00:00
|
|
|
trailing <- option "" blanklines
|
2007-05-03 14:42:40 +00:00
|
|
|
return $ firstline ++ "\n" ++ unlines rawlines ++ trailing
|
2007-03-10 17:48:16 +00:00
|
|
|
|
|
|
|
definitionList = do
|
2007-03-11 07:56:29 +00:00
|
|
|
failIfStrict
|
2007-03-10 17:48:16 +00:00
|
|
|
items <- many1 definitionListItem
|
2007-03-10 20:45:19 +00:00
|
|
|
let (terms, defs) = unzip items
|
|
|
|
let defs' = compactify defs
|
|
|
|
let items' = zip terms defs'
|
|
|
|
return $ DefinitionList items'
|
2007-03-10 17:48:16 +00:00
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
--
|
|
|
|
-- paragraph block
|
|
|
|
--
|
|
|
|
|
|
|
|
para = try (do
|
|
|
|
result <- many1 inline
|
|
|
|
newline
|
2006-12-30 22:51:49 +00:00
|
|
|
st <- getState
|
|
|
|
if stateStrict st
|
2007-02-15 01:10:15 +00:00
|
|
|
then choice [lookAhead blockQuote, lookAhead header,
|
|
|
|
(do{blanklines; return Null})]
|
|
|
|
else choice [(do{lookAhead emacsBoxQuote; return Null}),
|
|
|
|
(do{blanklines; return Null})]
|
2006-10-17 14:22:29 +00:00
|
|
|
let result' = normalizeSpaces result
|
|
|
|
return (Para result'))
|
|
|
|
|
|
|
|
plain = do
|
|
|
|
result <- many1 inline
|
|
|
|
let result' = normalizeSpaces result
|
|
|
|
return (Plain result')
|
|
|
|
|
|
|
|
--
|
|
|
|
-- raw html
|
|
|
|
--
|
|
|
|
|
2006-12-30 22:51:49 +00:00
|
|
|
htmlElement = choice [strictHtmlBlock,
|
|
|
|
htmlBlockElement] <?> "html element"
|
|
|
|
|
|
|
|
htmlBlock = do
|
|
|
|
st <- getState
|
|
|
|
if stateStrict st
|
|
|
|
then do
|
|
|
|
failUnlessBeginningOfLine
|
|
|
|
first <- htmlElement
|
|
|
|
finalSpace <- many (oneOf spaceChars)
|
|
|
|
finalNewlines <- many newline
|
|
|
|
return (RawHtml (first ++ finalSpace ++ finalNewlines))
|
|
|
|
else rawHtmlBlocks
|
|
|
|
|
|
|
|
-- True if tag is self-closing
|
2007-01-24 19:44:43 +00:00
|
|
|
isSelfClosing tag =
|
|
|
|
isSuffixOf "/>" $ filter (\c -> (not (c `elem` " \n\t"))) tag
|
2006-12-30 22:51:49 +00:00
|
|
|
|
|
|
|
strictHtmlBlock = try (do
|
|
|
|
tag <- anyHtmlBlockTag
|
|
|
|
let tag' = extractTagType tag
|
2007-01-24 19:44:43 +00:00
|
|
|
if isSelfClosing tag || tag' == "hr"
|
2006-12-30 22:51:49 +00:00
|
|
|
then return tag
|
|
|
|
else do
|
|
|
|
contents <- many (do{notFollowedBy' (htmlEndTag tag');
|
|
|
|
htmlElement <|> (count 1 anyChar)})
|
|
|
|
end <- htmlEndTag tag'
|
|
|
|
return $ tag ++ (concat contents) ++ end)
|
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
rawHtmlBlocks = try (do
|
2006-12-20 06:50:14 +00:00
|
|
|
htmlBlocks <- many1 rawHtmlBlock
|
|
|
|
let combined = concatMap (\(RawHtml str) -> str) htmlBlocks
|
|
|
|
let combined' = if (last combined == '\n')
|
|
|
|
then init combined -- strip extra newline
|
|
|
|
else combined
|
|
|
|
return (RawHtml combined'))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2006-12-30 22:51:49 +00:00
|
|
|
--
|
|
|
|
-- LaTeX
|
|
|
|
--
|
|
|
|
|
|
|
|
rawLaTeXEnvironment' = do
|
|
|
|
failIfStrict
|
|
|
|
rawLaTeXEnvironment
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-01-15 19:52:42 +00:00
|
|
|
--
|
|
|
|
-- Tables
|
|
|
|
--
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a dashed line with optional trailing spaces; return its length
|
|
|
|
-- and the length including trailing space.
|
2007-01-15 19:52:42 +00:00
|
|
|
dashedLine ch = do
|
|
|
|
dashes <- many1 (char ch)
|
|
|
|
sp <- many spaceChar
|
|
|
|
return $ (length dashes, length $ dashes ++ sp)
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a table header with dashed lines of '-' preceded by
|
|
|
|
-- one line of text.
|
2007-01-15 19:52:42 +00:00
|
|
|
simpleTableHeader = do
|
|
|
|
rawContent <- anyLine
|
|
|
|
initSp <- nonindentSpaces
|
|
|
|
dashes <- many1 (dashedLine '-')
|
|
|
|
newline
|
|
|
|
let (lengths, lines) = unzip dashes
|
|
|
|
let indices = scanl (+) (length initSp) lines
|
|
|
|
let rawHeads = tail $ splitByIndices (init indices) rawContent
|
|
|
|
let aligns = zipWith alignType (map (\a -> [a]) rawHeads) lengths
|
|
|
|
return $ (rawHeads, aligns, indices)
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a table footer - dashed lines followed by blank line.
|
2007-01-15 19:52:42 +00:00
|
|
|
tableFooter = try $ do
|
|
|
|
nonindentSpaces
|
|
|
|
many1 (dashedLine '-')
|
|
|
|
blanklines
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a table separator - dashed line.
|
2007-01-15 19:52:42 +00:00
|
|
|
tableSep = try $ do
|
|
|
|
nonindentSpaces
|
|
|
|
many1 (dashedLine '-')
|
|
|
|
string "\n"
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a raw line and split it into chunks by indices.
|
2007-01-15 19:52:42 +00:00
|
|
|
rawTableLine indices = do
|
|
|
|
notFollowedBy' (blanklines <|> tableFooter)
|
|
|
|
line <- many1Till anyChar newline
|
|
|
|
return $ map removeLeadingTrailingSpace $ tail $
|
|
|
|
splitByIndices (init indices) line
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a table line and return a list of lists of blocks (columns).
|
2007-01-15 19:52:42 +00:00
|
|
|
tableLine indices = try $ do
|
|
|
|
rawline <- rawTableLine indices
|
2007-07-23 00:19:00 +00:00
|
|
|
mapM (parseFromString (many plain)) rawline
|
2007-01-15 19:52:42 +00:00
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a multiline table row and return a list of blocks (columns).
|
2007-01-15 19:52:42 +00:00
|
|
|
multilineRow indices = try $ do
|
|
|
|
colLines <- many1 (rawTableLine indices)
|
|
|
|
option "" blanklines
|
|
|
|
let cols = map unlines $ transpose colLines
|
2007-07-23 00:19:00 +00:00
|
|
|
mapM (parseFromString (many plain)) cols
|
2007-01-15 19:52:42 +00:00
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Calculate relative widths of table columns, based on indices
|
|
|
|
widthsFromIndices :: Int -- Number of columns on terminal
|
|
|
|
-> [Int] -- Indices
|
|
|
|
-> [Float] -- Fractional relative sizes of columns
|
2007-01-15 19:52:42 +00:00
|
|
|
widthsFromIndices _ [] = []
|
|
|
|
widthsFromIndices numColumns indices =
|
|
|
|
let lengths = zipWith (-) indices (0:indices)
|
|
|
|
totLength = sum lengths
|
|
|
|
quotient = if totLength > numColumns
|
|
|
|
then fromIntegral totLength
|
|
|
|
else fromIntegral numColumns
|
|
|
|
fracs = map (\l -> (fromIntegral l) / quotient) lengths in
|
|
|
|
tail fracs
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parses a table caption: inlines beginning with 'Table:'
|
|
|
|
-- and followed by blank lines.
|
2007-01-15 19:52:42 +00:00
|
|
|
tableCaption = try $ do
|
|
|
|
nonindentSpaces
|
|
|
|
string "Table:"
|
|
|
|
result <- many1 inline
|
|
|
|
blanklines
|
|
|
|
return $ normalizeSpaces result
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a table using 'headerParser', 'lineParser', and 'footerParser'.
|
2007-01-15 19:52:42 +00:00
|
|
|
tableWith headerParser lineParser footerParser = try $ do
|
|
|
|
(rawHeads, aligns, indices) <- headerParser
|
|
|
|
lines <- many1Till (lineParser indices) footerParser
|
|
|
|
caption <- option [] tableCaption
|
2007-07-23 00:19:00 +00:00
|
|
|
heads <- mapM (parseFromString (many plain)) rawHeads
|
2007-01-15 19:52:42 +00:00
|
|
|
state <- getState
|
|
|
|
let numColumns = stateColumns state
|
|
|
|
let widths = widthsFromIndices numColumns indices
|
|
|
|
return $ Table caption aligns widths heads lines
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a simple table with '---' header and one line per row.
|
2007-01-15 19:52:42 +00:00
|
|
|
simpleTable = tableWith simpleTableHeader tableLine blanklines
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Parse a multiline table: starts with row of '-' on top, then header
|
|
|
|
-- (which may be multiline), then the rows,
|
|
|
|
-- which may be multiline, separated by blank lines, and
|
|
|
|
-- ending with a footer (dashed line followed by blank line).
|
2007-01-15 19:52:42 +00:00
|
|
|
multilineTable = tableWith multilineTableHeader multilineRow tableFooter
|
|
|
|
|
|
|
|
multilineTableHeader = try $ do
|
|
|
|
tableSep
|
|
|
|
rawContent <- many1 (do{notFollowedBy' tableSep;
|
|
|
|
many1Till anyChar newline})
|
|
|
|
initSp <- nonindentSpaces
|
|
|
|
dashes <- many1 (dashedLine '-')
|
|
|
|
newline
|
|
|
|
let (lengths, lines) = unzip dashes
|
|
|
|
let indices = scanl (+) (length initSp) lines
|
|
|
|
let rawHeadsList = transpose $ map
|
|
|
|
(\ln -> tail $ splitByIndices (init indices) ln)
|
|
|
|
rawContent
|
|
|
|
let rawHeads = map (joinWithSep " ") rawHeadsList
|
|
|
|
let aligns = zipWith alignType rawHeadsList lengths
|
|
|
|
return $ ((map removeLeadingTrailingSpace rawHeads),
|
|
|
|
aligns, indices)
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Returns the longest of a list of strings.
|
2007-01-15 19:52:42 +00:00
|
|
|
longest :: [String] -> String
|
|
|
|
longest [] = ""
|
|
|
|
longest [x] = x
|
|
|
|
longest (x:xs) =
|
|
|
|
if (length x) >= (maximum $ map length xs)
|
|
|
|
then x
|
|
|
|
else longest xs
|
|
|
|
|
2007-01-16 00:07:42 +00:00
|
|
|
-- Returns an alignment type for a table, based on a list of strings
|
|
|
|
-- (the rows of the column header) and a number (the length of the
|
|
|
|
-- dashed line under the rows.
|
2007-01-15 19:52:42 +00:00
|
|
|
alignType :: [String] -> Int -> Alignment
|
|
|
|
alignType [] len = AlignDefault
|
|
|
|
alignType strLst len =
|
|
|
|
let str = longest $ map removeTrailingSpace strLst
|
|
|
|
leftSpace = if null str then False else ((str !! 0) `elem` " \t")
|
|
|
|
rightSpace = (length str < len || (str !! (len - 1)) `elem` " \t") in
|
|
|
|
case (leftSpace, rightSpace) of
|
|
|
|
(True, False) -> AlignRight
|
|
|
|
(False, True) -> AlignLeft
|
|
|
|
(True, True) -> AlignCenter
|
|
|
|
(False, False) -> AlignDefault
|
|
|
|
|
|
|
|
table = do
|
|
|
|
failIfStrict
|
|
|
|
result <- simpleTable <|> multilineTable <?> "table"
|
|
|
|
return result
|
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
--
|
|
|
|
-- inline
|
|
|
|
--
|
|
|
|
|
2007-03-17 17:25:28 +00:00
|
|
|
inline = choice [ rawLaTeXInline'
|
|
|
|
, escapedChar
|
|
|
|
, entity
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
, note
|
2007-03-17 17:25:28 +00:00
|
|
|
, inlineNote
|
|
|
|
, link
|
|
|
|
, referenceLink
|
|
|
|
, rawHtmlInline'
|
|
|
|
, autoLink
|
|
|
|
, image
|
|
|
|
, math
|
|
|
|
, strong
|
|
|
|
, emph
|
2007-07-21 22:52:07 +00:00
|
|
|
, strikeout
|
|
|
|
, superscript
|
|
|
|
, subscript
|
2007-03-17 17:25:28 +00:00
|
|
|
, smartPunctuation
|
|
|
|
, code
|
|
|
|
, ltSign
|
|
|
|
, symbol
|
|
|
|
, str
|
|
|
|
, linebreak
|
|
|
|
, tabchar
|
|
|
|
, whitespace
|
|
|
|
, endline ] <?> "inline"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-01-08 23:54:15 +00:00
|
|
|
escapedChar = try $ do
|
|
|
|
char '\\'
|
|
|
|
state <- getState
|
|
|
|
result <- if stateStrict state
|
2007-07-21 22:52:07 +00:00
|
|
|
then oneOf "\\`*_{}[]()>#+-.!~"
|
2007-01-08 23:54:15 +00:00
|
|
|
else satisfy (not . isAlphaNum)
|
|
|
|
return (Str [result])
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2006-11-26 07:01:37 +00:00
|
|
|
ltSign = try (do
|
2007-01-06 09:54:58 +00:00
|
|
|
notFollowedBy (noneOf "<") -- continue only if it's a <
|
2006-10-17 14:22:29 +00:00
|
|
|
notFollowedBy' rawHtmlBlocks -- don't return < if it starts html
|
|
|
|
char '<'
|
2006-11-26 07:01:37 +00:00
|
|
|
return (Str ['<']))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
specialCharsMinusLt = filter (/= '<') specialChars
|
|
|
|
|
|
|
|
symbol = do
|
|
|
|
result <- oneOf specialCharsMinusLt
|
|
|
|
return (Str [result])
|
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
-- parses inline code, between n `s and n `s
|
2006-12-31 19:22:02 +00:00
|
|
|
code = try (do
|
2007-07-21 22:52:07 +00:00
|
|
|
starts <- many1 (char '`')
|
2006-12-31 19:22:02 +00:00
|
|
|
let num = length starts
|
2007-07-21 22:52:07 +00:00
|
|
|
result <- many1Till anyChar (try (count num (char '`')))
|
2006-12-20 06:50:14 +00:00
|
|
|
-- get rid of any internal newlines
|
|
|
|
let result' = removeLeadingTrailingSpace $ joinWithSep " " $ lines result
|
|
|
|
return (Code result'))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
mathWord = many1 (choice [ (noneOf " \t\n\\$"),
|
2006-12-20 06:50:14 +00:00
|
|
|
(try (do
|
|
|
|
c <- char '\\'
|
2007-07-21 22:52:07 +00:00
|
|
|
notFollowedBy (char '$')
|
2006-12-20 06:50:14 +00:00
|
|
|
return c))])
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
math = try (do
|
2006-12-30 22:51:49 +00:00
|
|
|
failIfStrict
|
2007-07-21 22:52:07 +00:00
|
|
|
char '$'
|
2006-10-17 14:22:29 +00:00
|
|
|
notFollowedBy space
|
|
|
|
words <- sepBy1 mathWord (many1 space)
|
2007-07-21 22:52:07 +00:00
|
|
|
char '$'
|
2006-10-17 14:22:29 +00:00
|
|
|
return (TeX ("$" ++ (joinWithSep " " words) ++ "$")))
|
|
|
|
|
|
|
|
emph = do
|
2007-07-21 22:52:07 +00:00
|
|
|
result <- choice [ (enclosed (char '*') (char '*') inline),
|
|
|
|
(enclosed (char '_') (char '_') inline) ]
|
|
|
|
return $ Emph (normalizeSpaces result)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
strong = do
|
2007-07-21 22:52:07 +00:00
|
|
|
result <- (enclosed (string "**") (string "**") inline) <|>
|
|
|
|
(enclosed (string "__") (string "__") inline)
|
|
|
|
return $ Strong (normalizeSpaces result)
|
|
|
|
|
|
|
|
strikeout = do
|
|
|
|
failIfStrict
|
|
|
|
result <- enclosed (string "~~") (string "~~") inline
|
|
|
|
return $ Strikeout (normalizeSpaces result)
|
|
|
|
|
|
|
|
superscript = do
|
|
|
|
failIfStrict
|
|
|
|
result <- enclosed (char '^') (char '^')
|
|
|
|
(notFollowedBy' whitespace >> inline) -- may not contain Space
|
|
|
|
return $ Superscript result
|
|
|
|
|
|
|
|
subscript = do
|
|
|
|
failIfStrict
|
|
|
|
result <- enclosed (char '~') (char '~')
|
|
|
|
(notFollowedBy' whitespace >> inline) -- may not contain Space
|
|
|
|
return $ Subscript result
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2007-01-06 09:54:58 +00:00
|
|
|
smartPunctuation = do
|
|
|
|
failUnlessSmart
|
|
|
|
choice [ quoted, apostrophe, dash, ellipses ]
|
|
|
|
|
|
|
|
apostrophe = do
|
|
|
|
char '\'' <|> char '\8217'
|
|
|
|
return Apostrophe
|
|
|
|
|
|
|
|
quoted = do
|
|
|
|
doubleQuoted <|> singleQuoted
|
|
|
|
|
2007-01-06 18:41:01 +00:00
|
|
|
withQuoteContext context parser = do
|
|
|
|
oldState <- getState
|
|
|
|
let oldQuoteContext = stateQuoteContext oldState
|
|
|
|
setState oldState { stateQuoteContext = context }
|
|
|
|
result <- parser
|
|
|
|
newState <- getState
|
|
|
|
setState newState { stateQuoteContext = oldQuoteContext }
|
|
|
|
return result
|
|
|
|
|
|
|
|
singleQuoted = try $ do
|
|
|
|
singleQuoteStart
|
|
|
|
withQuoteContext InSingleQuote $ do
|
|
|
|
result <- many1Till inline singleQuoteEnd
|
|
|
|
return $ Quoted SingleQuote $ normalizeSpaces result
|
|
|
|
|
|
|
|
doubleQuoted = try $ do
|
|
|
|
doubleQuoteStart
|
|
|
|
withQuoteContext InDoubleQuote $ do
|
|
|
|
result <- many1Till inline doubleQuoteEnd
|
|
|
|
return $ Quoted DoubleQuote $ normalizeSpaces result
|
|
|
|
|
|
|
|
failIfInQuoteContext context = do
|
|
|
|
st <- getState
|
|
|
|
if (stateQuoteContext st == context)
|
|
|
|
then fail "already inside quotes"
|
|
|
|
else return ()
|
2007-01-06 09:54:58 +00:00
|
|
|
|
2007-01-16 23:46:33 +00:00
|
|
|
singleQuoteStart = try $ do
|
2007-01-06 18:41:01 +00:00
|
|
|
failIfInQuoteContext InSingleQuote
|
2007-01-24 04:49:36 +00:00
|
|
|
char '\8216' <|> do
|
|
|
|
char '\''
|
|
|
|
notFollowedBy (oneOf ")!],.;:-? \t\n")
|
|
|
|
notFollowedBy (try (do -- possessive or contraction
|
2007-03-07 20:53:37 +00:00
|
|
|
oneOfStrings ["s","t","m","ve","ll","re"]
|
2007-01-24 04:49:36 +00:00
|
|
|
satisfy (not . isAlphaNum)))
|
|
|
|
return '\''
|
2007-01-06 09:54:58 +00:00
|
|
|
|
2007-01-16 23:46:33 +00:00
|
|
|
singleQuoteEnd = try $ do
|
2007-01-06 18:41:01 +00:00
|
|
|
char '\'' <|> char '\8217'
|
2007-01-16 23:46:33 +00:00
|
|
|
notFollowedBy alphaNum
|
2007-01-06 09:54:58 +00:00
|
|
|
|
2007-01-16 23:51:57 +00:00
|
|
|
doubleQuoteStart = try $ do
|
2007-01-06 18:41:01 +00:00
|
|
|
failIfInQuoteContext InDoubleQuote
|
|
|
|
char '"' <|> char '\8220'
|
2007-01-16 23:51:57 +00:00
|
|
|
notFollowedBy (oneOf " \t\n")
|
2007-01-06 09:54:58 +00:00
|
|
|
|
|
|
|
doubleQuoteEnd = char '"' <|> char '\8221'
|
|
|
|
|
|
|
|
ellipses = try (do
|
|
|
|
oneOfStrings ["...", " . . . ", ". . .", " . . ."]
|
|
|
|
return Ellipses)
|
|
|
|
|
|
|
|
dash = enDash <|> emDash
|
|
|
|
|
|
|
|
enDash = try (do
|
|
|
|
char '-'
|
2007-01-06 18:41:01 +00:00
|
|
|
notFollowedBy (noneOf "0123456789")
|
2007-01-06 09:54:58 +00:00
|
|
|
return EnDash)
|
|
|
|
|
|
|
|
emDash = try (do
|
|
|
|
skipSpaces
|
|
|
|
oneOfStrings ["---", "--"]
|
|
|
|
skipSpaces
|
|
|
|
return EmDash)
|
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
whitespace = do
|
|
|
|
many1 (oneOf spaceChars) <?> "whitespace"
|
|
|
|
return Space
|
|
|
|
|
|
|
|
tabchar = do
|
|
|
|
tab
|
|
|
|
return (Str "\t")
|
|
|
|
|
|
|
|
-- hard line break
|
|
|
|
linebreak = try (do
|
|
|
|
oneOf spaceChars
|
|
|
|
many1 (oneOf spaceChars)
|
|
|
|
endline
|
|
|
|
return LineBreak )
|
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
nonEndline = satisfy (/='\n')
|
2006-10-17 14:22:29 +00:00
|
|
|
|
Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
parser from Entities.hs.
+ Added new 'entity' parser to Markdown reader, and added '&' as a
special character. Adjusted test suite accordingly since now we
get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"..
+ stringToSGML moved to Entities.hs. escapeSGML removed as redundant,
given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
boolean parameter that causes only numerical entities to be used.
This is used in the docbook writer. The HTML writer uses named
entities where possible, but not all docbook-consumers know about
the named entities without special instructions, so it seems safer
to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
detecting "mailto" links.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
|
|
|
entity = do
|
|
|
|
ent <- characterEntity
|
|
|
|
return $ Str [ent]
|
|
|
|
|
2007-07-21 22:52:07 +00:00
|
|
|
strChar = noneOf (specialChars ++ spaceChars ++ "\n")
|
2007-02-15 02:27:36 +00:00
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
str = do
|
2007-02-15 02:27:36 +00:00
|
|
|
result <- many1 strChar
|
Changes in entity handling:
+ Entities are parsed (and unicode characters returned) in both
Markdown and HTML readers.
+ Parsers characterEntity, namedEntity, decimalEntity, hexEntity added
to Entities.hs; these parse a string and return a unicode character.
+ Changed 'entity' parser in HTML reader to use the 'characterEntity'
parser from Entities.hs.
+ Added new 'entity' parser to Markdown reader, and added '&' as a
special character. Adjusted test suite accordingly since now we
get 'Str "AT",Str "&",Str "T"' instead of 'Str "AT&T"..
+ stringToSGML moved to Entities.hs. escapeSGML removed as redundant,
given encodeEntities.
+ stringToSGML, encodeEntities, and specialCharToEntity are given a
boolean parameter that causes only numerical entities to be used.
This is used in the docbook writer. The HTML writer uses named
entities where possible, but not all docbook-consumers know about
the named entities without special instructions, so it seems safer
to use numerical entities there.
+ decodeEntities is rewritten in a way that avoids Text.Regex, using
the new parsers.
+ charToEntity and charToNumericalEntity added to Entities.hs.
+ Moved specialCharToEntity from Shared.hs to Entities.hs.
+ Removed unneeded 'decodeEntities' from 'str' parser in HTML and
Markdown readers.
+ Removed sgmlHexEntity, sgmlDecimalEntity, sgmlNamedEntity, and
sgmlCharacterEntity from Shared.hs.
+ Modified Docbook writer so that it doesn't rely on Text.Regex for
detecting "mailto" links.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@515 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-27 03:04:40 +00:00
|
|
|
return (Str result)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
-- an endline character that can be treated as a space, not a structural break
|
2006-12-20 06:50:14 +00:00
|
|
|
endline = try (do
|
|
|
|
newline
|
|
|
|
notFollowedBy blankline
|
|
|
|
st <- getState
|
2006-12-30 22:51:49 +00:00
|
|
|
if stateStrict st
|
|
|
|
then do
|
|
|
|
notFollowedBy' emailBlockQuoteStart
|
2007-07-21 22:52:07 +00:00
|
|
|
notFollowedBy (char '#') -- atx header
|
2007-01-07 00:48:46 +00:00
|
|
|
notFollowedBy (try (do{manyTill anyChar newline;
|
|
|
|
oneOf setextHChars})) -- setext header
|
2006-12-30 22:51:49 +00:00
|
|
|
else return ()
|
|
|
|
-- parse potential list-starts differently if in a list:
|
2006-12-20 06:50:14 +00:00
|
|
|
if (stateParserContext st) == ListItemState
|
2006-12-30 22:51:49 +00:00
|
|
|
then notFollowedBy' (orderedListStart <|> bulletListStart)
|
|
|
|
else return ()
|
2006-12-20 06:50:14 +00:00
|
|
|
return Space)
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
--
|
|
|
|
-- links
|
|
|
|
--
|
|
|
|
|
2007-01-06 20:47:00 +00:00
|
|
|
-- a reference label for a link
|
2007-07-21 22:52:07 +00:00
|
|
|
reference = inlinesInBalanced "[" "]" >>= (return . normalizeSpaces)
|
2007-07-15 23:53:22 +00:00
|
|
|
|
2006-10-17 14:22:29 +00:00
|
|
|
-- source for a link, with optional title
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
source = try $ do
|
2007-07-21 22:52:07 +00:00
|
|
|
char '('
|
|
|
|
option ' ' (char '<')
|
|
|
|
src <- many (noneOf ")> \t\n")
|
|
|
|
option ' ' (char '>')
|
2006-12-20 06:50:14 +00:00
|
|
|
tit <- option "" title
|
|
|
|
skipSpaces
|
2007-07-21 22:52:07 +00:00
|
|
|
char ')'
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
return (removeTrailingSpace src, tit)
|
2006-12-20 06:50:14 +00:00
|
|
|
|
|
|
|
titleWith startChar endChar = try (do
|
2007-05-10 22:06:13 +00:00
|
|
|
leadingSpace <- many1 (oneOf " \t\n")
|
|
|
|
if length (filter (=='\n') leadingSpace) > 1
|
|
|
|
then fail "title must be separated by space and on same or next line"
|
|
|
|
else return ()
|
2006-12-20 06:50:14 +00:00
|
|
|
char startChar
|
2007-07-15 23:53:22 +00:00
|
|
|
tit <- manyTill anyChar (try (do char endChar
|
|
|
|
skipSpaces
|
|
|
|
notFollowedBy (noneOf ")\n")))
|
2007-01-28 00:04:43 +00:00
|
|
|
return $ decodeEntities tit)
|
2006-12-20 06:50:14 +00:00
|
|
|
|
|
|
|
title = choice [ titleWith '(' ')',
|
|
|
|
titleWith '"' '"',
|
|
|
|
titleWith '\'' '\''] <?> "title"
|
2006-10-17 14:22:29 +00:00
|
|
|
|
|
|
|
link = choice [explicitLink, referenceLink] <?> "link"
|
|
|
|
|
2006-12-20 06:50:14 +00:00
|
|
|
explicitLink = try (do
|
|
|
|
label <- reference
|
|
|
|
src <- source
|
|
|
|
return (Link label src))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
-- a link like [this][ref] or [this][] or [this]
|
|
|
|
referenceLink = try $ do
|
2006-12-20 06:50:14 +00:00
|
|
|
label <- reference
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
ref <- option [] (try (do skipSpaces
|
|
|
|
option ' ' newline
|
|
|
|
skipSpaces
|
|
|
|
reference))
|
2006-12-30 22:51:49 +00:00
|
|
|
let ref' = if null ref then label else ref
|
|
|
|
state <- getState
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
case lookupKeySrc (stateKeys state) ref' of
|
|
|
|
Nothing -> fail "no corresponding key"
|
|
|
|
Just target -> return (Link label target)
|
2006-12-20 06:50:14 +00:00
|
|
|
|
2007-01-24 20:55:27 +00:00
|
|
|
autoLink = autoLinkEmail <|> autoLinkRegular
|
2007-01-24 19:44:43 +00:00
|
|
|
|
|
|
|
-- a link <like@this.com>
|
|
|
|
autoLinkEmail = try $ do
|
2007-07-21 22:52:07 +00:00
|
|
|
char '<'
|
2007-01-24 19:44:43 +00:00
|
|
|
name <- many1Till (noneOf "/:<> \t\n") (char '@')
|
|
|
|
domain <- sepBy1 (many1 (noneOf "/:.@<> \t\n")) (char '.')
|
|
|
|
let src = name ++ "@" ++ (joinWithSep "." domain)
|
2007-07-21 22:52:07 +00:00
|
|
|
char '>'
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
return $ Link [Str src] (("mailto:" ++ src), "")
|
2007-01-24 19:44:43 +00:00
|
|
|
|
2007-01-24 20:55:27 +00:00
|
|
|
-- a link <http://like.this.com>
|
2007-01-24 19:44:43 +00:00
|
|
|
autoLinkRegular = try $ do
|
2007-07-21 22:52:07 +00:00
|
|
|
char '<'
|
2007-01-24 20:55:27 +00:00
|
|
|
prot <- oneOfStrings ["http:", "ftp:", "mailto:"]
|
2007-07-21 22:52:07 +00:00
|
|
|
rest <- many1Till (noneOf " \t\n<>") (char '>')
|
2007-01-24 20:55:27 +00:00
|
|
|
let src = prot ++ rest
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
return $ Link [Str src] (src, "")
|
2006-12-20 06:50:14 +00:00
|
|
|
|
|
|
|
image = try (do
|
2007-07-21 22:52:07 +00:00
|
|
|
char '!'
|
2006-12-20 06:50:14 +00:00
|
|
|
(Link label src) <- link
|
|
|
|
return (Image label src))
|
2006-10-17 14:22:29 +00:00
|
|
|
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
note = try $ do
|
2006-12-30 22:51:49 +00:00
|
|
|
failIfStrict
|
2006-12-20 06:50:14 +00:00
|
|
|
ref <- noteMarker
|
|
|
|
state <- getState
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
let notes = stateNotes state
|
|
|
|
case lookup ref notes of
|
|
|
|
Nothing -> fail "note not found"
|
|
|
|
Just contents -> return (Note contents)
|
2006-12-19 23:13:03 +00:00
|
|
|
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
inlineNote = try $ do
|
2006-12-30 22:51:49 +00:00
|
|
|
failIfStrict
|
2007-07-21 22:52:07 +00:00
|
|
|
char '^'
|
2007-07-15 23:53:22 +00:00
|
|
|
contents <- inlinesInBalanced "[" "]"
|
Extensive changes stemming from a rethinking of the Pandoc data
structure. Key and Note blocks have been removed. Link and image URLs
are now stored directly in Link and Image inlines, and note blocks
are stored in Note inlines. This requires changes in both parsers
and writers. Markdown and RST parsers need to extract data from key
and note blocks and insert them into the relevant inline elements.
Other parsers can be simplified, since there is no longer any need to
construct separate key and note blocks. Markdown, RST, and HTML writers
need to construct lists of notes; Markdown and RST writers need to
construct lists of link references (when the --reference-links option
is specified); and the RST writer needs to construct a list of image
substitution references. All writers have been rewritten to use the
State monad when state is required. This rewrite yields a small speed
boost and considerably cleaner code.
* Text/Pandoc/Definition.hs:
+ blocks: removed Key and Note
+ inlines: removed NoteRef, added Note
+ modified Target: there is no longer a 'Ref' target; all targets
are explicit URL, title pairs
* Text/Pandoc/Shared.hs:
+ Added 'Reference', 'isNoteBlock', 'isKeyBlock', 'isLineClump',
used in some of the readers.
+ Removed 'generateReference', 'keyTable', 'replaceReferenceLinks',
'replaceRefLinksBlockList', along with some auxiliary functions
used only by them. These are no longer needed, since
reference links are resolved in the Markdown and RST readers.
+ Moved 'inTags', 'selfClosingTag', 'inTagsSimple', and 'inTagsIndented'
to the Docbook writer, since that is now the only module that uses
them.
+ Changed name of 'escapeSGMLString' to 'escapeStringForXML'
+ Added KeyTable and NoteTable types
+ Removed fields from ParserState; 'stateKeyBlocks', 'stateKeysUsed',
'stateNoteBlocks', 'stateNoteIdentifiers', 'stateInlineLinks'.
Added 'stateKeys' and 'stateNotes'.
+ Added clause for Note to 'prettyBlock'.
+ Added 'writerNotes', 'writerReferenceLinks' fields to WriterOptions.
* Text/Pandoc/Entities.hs: Renamed 'escapeSGMLChar' and
'escapeSGMLString' to 'escapeCharForXML' and 'escapeStringForXML'
* Text/ParserCombinators/Pandoc.hs: Added lineClump parser: parses a raw
line block up to and including following blank lines.
* Main.hs: Replaced --inline-links with --reference-links.
* README:
+ Documented --reference-links and removed description of --inline-links.
+ Added note that footnotes may occur anywhere in the document, but must
be at the outer level, not embedded in block elements.
* man/man1/pandoc.1, man/man1/html2markdown.1: Removed --inline-links
option, added --reference-links option
* Markdown and RST readers:
+ Rewrote to fit new Pandoc definition. Since there are no longer
Note or Key blocks, all note and key blocks are parsed on a first pass
through the document. Once tables of notes and keys have been constructed,
the remaining parts of the document are reassembled and parsed.
+ Refactored link parsers.
* LaTeX and HTML readers: Rewrote to fit new Pandoc definition. Since
there are no longer Note or Key blocks, notes and references can be
parsed in a single pass through the document.
* RST, Markdown, and HTML writers: Rewrote using state monad new Pandoc
and definition. State is used to hold lists of references footnotes to
and be printed at the end of the document.
* RTF and LaTeX writers: Rewrote using new Pandoc definition. (Because
of the different treatment of footnotes, the "notes" parameter is no
longer needed in the block and inline conversion functions.)
* Docbook writer:
+ Moved the functions 'attributeList', 'inTags', 'selfClosingTag',
'inTagsSimple', 'inTagsIndented' from Text/Pandoc/Shared, since
they are now used only by the Docbook writer.
+ Rewrote using new Pandoc definition. (Because of the different
treatment of footnotes, the "notes" parameter is no longer needed
in the block and inline conversion functions.)
* Updated test suite
* Throughout: old haskell98 module names replaced by hierarchical module
names, e.g. List by Data.List.
* debian/control: Include libghc6-xhtml-dev instead of libghc6-html-dev
in "Build-Depends."
* cabalize:
+ Remove haskell98 from BASE_DEPENDS (since now the new hierarchical
module names are being used throughout)
+ Added mtl to BASE_DEPENDS (needed for state monad)
+ Removed html from GHC66_DEPENDS (not needed since xhtml is now used)
git-svn-id: https://pandoc.googlecode.com/svn/trunk@580 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-04-10 01:56:50 +00:00
|
|
|
return (Note [Para contents])
|
2006-10-17 14:22:29 +00:00
|
|
|
|
2006-12-30 22:51:49 +00:00
|
|
|
rawLaTeXInline' = do
|
|
|
|
failIfStrict
|
|
|
|
rawLaTeXInline
|
|
|
|
|
|
|
|
rawHtmlInline' = do
|
|
|
|
st <- getState
|
|
|
|
result <- if stateStrict st
|
|
|
|
then choice [htmlBlockElement, anyHtmlTag, anyHtmlEndTag]
|
|
|
|
else choice [htmlBlockElement, anyHtmlInlineTag]
|
|
|
|
return (HtmlInline result)
|
|
|
|
|