Added plain writer.

Text.Pandoc.Writers.Markdown now exports a writePlain,
which writes plain text without links, pictures, or
special formatting (not even markdown conventions).

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1907 788f1e2b-df1e-0410-8736-df70ead52e1b
This commit is contained in:
fiddlosopher 2010-03-18 06:45:43 +00:00
parent 081ae411b3
commit 695961155a
12 changed files with 904 additions and 55 deletions

8
README
View file

@ -5,8 +5,8 @@
Pandoc is a [Haskell] library for converting from one markup format to
another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX]; and
it can write [markdown], [reStructuredText], [HTML], [LaTeX], [ConTeXt],
[RTF], [DocBook XML], [OpenDocument XML], [ODT], [GNU Texinfo],
it can write plain text, [markdown], [reStructuredText], [HTML], [LaTeX],
[ConTeXt], [RTF], [DocBook XML], [OpenDocument XML], [ODT], [GNU Texinfo],
[MediaWiki markup], [groff man] pages, and [S5] HTML slide shows.
Pandoc's enhanced version of markdown includes syntax for footnotes,
tables, flexible ordered lists, definition lists, delimited code blocks,
@ -192,8 +192,8 @@ For further documentation, see the `pandoc(1)` man page.
: specifies the output format -- the format Pandoc will
be converting *to*. *format* can be `native`, `html`, `s5`,
`docbook`, `opendocument`, `latex`, `context`, `markdown`, `man`,
`rst`, and `rtf`. (`+lhs` can be appended to indicate that the
output should be treated as literate Haskell source. See
`plain`, `rst`, and `rtf`. (`+lhs` can be appended to indicate that
the output should be treated as literate Haskell source. See
[Literate Haskell support](#literate-haskell-support), below.)
`-s` or `--standalone`

View file

@ -14,9 +14,9 @@ pandoc [*options*] [*input-file*]...
Pandoc converts files from one markup format to another. It can
read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and
it can write markdown, reStructuredText, HTML, LaTeX, ConTeXt, Texinfo,
groff man, MediaWiki markup, RTF, OpenDocument XML, ODT, DocBook XML,
and S5 HTML slide shows.
it can write plain text, markdown, reStructuredText, HTML, LaTeX,
ConTeXt, Texinfo, groff man, MediaWiki markup, RTF, OpenDocument XML,
ODT, DocBook XML, and S5 HTML slide shows.
If no *input-file* is specified, input is read from *stdin*.
Otherwise, the *input-files* are concatenated (with a blank
@ -71,7 +71,7 @@ should pipe input and output through `iconv`:
-t *FORMAT*, -w *FORMAT*, \--to=*FORMAT*, \--write=*FORMAT*
: Specify output format. *FORMAT* can be `native` (native Haskell),
`markdown` (markdown or plain text), `rst` (reStructuredText),
`plain` (plain text), `markdown` (markdown), `rst` (reStructuredText),
`html` (HTML), `latex` (LaTeX), `context` (ConTeXt), `man` (groff man),
`mediawiki` (MediaWiki markup), `texinfo` (GNU Texinfo),
`docbook` (DocBook XML), `opendocument` (OpenDocument XML),

View file

@ -40,7 +40,7 @@ Data-Files:
templates/opendocument.template, templates/latex.template,
templates/context.template, templates/texinfo.template,
templates/man.template, templates/markdown.template,
templates/rst.template,
templates/rst.template, templates/plain.template,
templates/mediawiki.template, templates/rtf.template,
-- data for ODT writer
reference.odt,

View file

@ -72,6 +72,7 @@ module Text.Pandoc
, HeaderType (..)
-- * Writers: converting /from/ Pandoc format
, writeMarkdown
, writePlain
, writeRST
, writeLaTeX
, writeConTeXt

View file

@ -29,7 +29,7 @@ Conversion of 'Pandoc' documents to markdown-formatted plain text.
Markdown: <http://daringfireball.net/projects/markdown/>
-}
module Text.Pandoc.Writers.Markdown ( writeMarkdown) where
module Text.Pandoc.Writers.Markdown (writeMarkdown, writePlain) where
import Text.Pandoc.Definition
import Text.Pandoc.Templates (renderTemplate)
import Text.Pandoc.Shared
@ -41,12 +41,43 @@ import Control.Monad.State
type Notes = [[Block]]
type Refs = KeyTable
type WriterState = (Notes, Refs)
data WriterState = WriterState { stNotes :: Notes
, stRefs :: Refs
, stPlain :: Bool }
-- | Convert Pandoc to Markdown.
writeMarkdown :: WriterOptions -> Pandoc -> String
writeMarkdown opts document =
evalState (pandocToMarkdown opts document) ([],[])
evalState (pandocToMarkdown opts document) WriterState{ stNotes = []
, stRefs = []
, stPlain = False }
-- | Convert Pandoc to plain text (like markdown, but without links,
-- pictures, or inline formatting).
writePlain :: WriterOptions -> Pandoc -> String
writePlain opts document =
evalState (pandocToMarkdown opts document') WriterState{ stNotes = []
, stRefs = []
, stPlain = True }
where document' = plainify document
plainify :: Pandoc -> Pandoc
plainify = processWith go
where go :: [Inline] -> [Inline]
go (Emph xs : ys) = go xs ++ go ys
go (Strong xs : ys) = go xs ++ go ys
go (Strikeout xs : ys) = go xs ++ go ys
go (Superscript xs : ys) = go xs ++ go ys
go (Subscript xs : ys) = go xs ++ go ys
go (SmallCaps xs : ys) = go xs ++ go ys
go (Code s : ys) = Str s : go ys
go (Math _ s : ys) = Str s : go ys
go (TeX _ : ys) = Str "" : go ys
go (HtmlInline _ : ys) = Str "" : go ys
go (Link xs _ : ys) = go xs ++ go ys
go (Image _ _ : ys) = go ys
go (x : ys) = x : go ys
go [] = []
-- | Return markdown representation of document.
pandocToMarkdown :: WriterOptions -> Pandoc -> State WriterState String
@ -60,10 +91,10 @@ pandocToMarkdown opts (Pandoc (Meta title authors date) blocks) = do
then tableOfContents opts headerBlocks
else empty
body <- blockListToMarkdown opts blocks
(notes, _) <- get
notes' <- notesToMarkdown opts (reverse notes)
(_, refs) <- get -- note that the notes may contain refs
refs' <- keyTableToMarkdown opts (reverse refs)
st <- get
notes' <- notesToMarkdown opts (reverse $ stNotes st)
st' <- get -- note that the notes may contain refs
refs' <- keyTableToMarkdown opts (reverse $ stRefs st')
let main = render $ body $+$ text "" $+$ notes' $+$ text "" $+$ refs'
let context = writerVariables opts ++
[ ("toc", render toc)
@ -114,7 +145,9 @@ tableOfContents :: WriterOptions -> [Block] -> Doc
tableOfContents opts headers =
let opts' = opts { writerIgnoreNotes = True }
contents = BulletList $ map elementToListItem $ hierarchicalize headers
in evalState (blockToMarkdown opts' contents) ([],[])
in evalState (blockToMarkdown opts' contents) WriterState{ stNotes = []
, stRefs = []
, stPlain = False }
-- | Converts an Element to a list item for a table of contents,
elementToListItem :: Element -> [Block]
@ -164,13 +197,18 @@ blockToMarkdown opts (Para inlines) = do
then char '\\'
else empty
return $ esc <> contents <> text "\n"
blockToMarkdown _ (RawHtml str) = return $ text str
blockToMarkdown _ (RawHtml str) = do
st <- get
if stPlain st
then return empty
else return $ text str
blockToMarkdown _ HorizontalRule = return $ text "\n* * * * *\n"
blockToMarkdown opts (Header level inlines) = do
contents <- inlineListToMarkdown opts inlines
st <- get
-- use setext style headers if in literate haskell mode.
-- ghc interprets '#' characters in column 1 as line number specifiers.
if writerLiterateHaskell opts
if writerLiterateHaskell opts || stPlain st
then let len = length $ render contents
in return $ contents <> text "\n" <>
case level of
@ -185,11 +223,14 @@ blockToMarkdown opts (CodeBlock (_,classes,_) str) | "haskell" `elem` classes &&
blockToMarkdown opts (CodeBlock _ str) = return $
(nest (writerTabStop opts) $ vcat $ map text (lines str)) <> text "\n"
blockToMarkdown opts (BlockQuote blocks) = do
st <- get
-- if we're writing literate haskell, put a space before the bird tracks
-- so they won't be interpreted as lhs...
let leader = if writerLiterateHaskell opts
then text . (" > " ++)
else text . ("> " ++)
else if stPlain st
then text . (" " ++)
else text . ("> " ++)
contents <- blockListToMarkdown opts blocks
return $ (vcat $ map leader $ lines $ render contents) <>
text "\n"
@ -273,7 +314,8 @@ definitionListItemToMarkdown :: WriterOptions
definitionListItemToMarkdown opts (label, defs) = do
labelText <- inlineListToMarkdown opts label
let tabStop = writerTabStop opts
let leader = char ':'
st <- get
let leader = if stPlain st then empty else text " ~"
contents <- liftM vcat $
mapM (liftM ((leader $$) . nest tabStop . vcat) . mapM (blockToMarkdown opts)) defs
return $ labelText $+$ contents
@ -289,18 +331,18 @@ blockListToMarkdown opts blocks =
-- Prefer label if possible; otherwise, generate a unique key.
getReference :: [Inline] -> Target -> State WriterState [Inline]
getReference label (src, tit) = do
(_,refs) <- get
case find ((== (src, tit)) . snd) refs of
st <- get
case find ((== (src, tit)) . snd) (stRefs st) of
Just (ref, _) -> return ref
Nothing -> do
let label' = case find ((== label) . fst) refs of
let label' = case find ((== label) . fst) (stRefs st) of
Just _ -> -- label is used; generate numerical label
case find (\n -> not (any (== [Str (show n)])
(map fst refs))) [1..(10000 :: Integer)] of
(map fst (stRefs st)))) [1..(10000 :: Integer)] of
Just x -> [Str (show x)]
Nothing -> error "no unique label"
Nothing -> label
modify (\(notes, refs') -> (notes, (label', (src,tit)):refs'))
modify (\s -> s{ stRefs = (label', (src,tit)) : stRefs st })
return label'
-- | Convert list of Pandoc inline elements to markdown.
@ -346,7 +388,11 @@ inlineToMarkdown _ (Code str) =
marker = replicate (longest + 1) '`'
spacer = if (longest == 0) then "" else " " in
return $ text (marker ++ spacer ++ str ++ spacer ++ marker)
inlineToMarkdown _ (Str str) = return $ text $ escapeString str
inlineToMarkdown _ (Str str) = do
st <- get
if stPlain st
then return $ text str
else return $ text $ escapeString str
inlineToMarkdown _ (Math InlineMath str) = return $ char '$' <> text str <> char '$'
inlineToMarkdown _ (Math DisplayMath str) = return $ text "$$" <> text str <> text "$$"
inlineToMarkdown _ (TeX str) = return $ text str
@ -380,7 +426,7 @@ inlineToMarkdown opts (Image alternate (source, tit)) = do
linkPart <- inlineToMarkdown opts (Link txt (source, tit))
return $ char '!' <> linkPart
inlineToMarkdown _ (Note contents) = do
modify (\(notes, refs) -> (contents:notes, refs)) -- add to notes in state
(notes, _) <- get
let ref = show $ (length notes)
modify (\st -> st{ stNotes = contents : stNotes st })
st <- get
let ref = show $ (length $ stNotes st)
return $ text "[^" <> text ref <> char ']'

View file

@ -123,6 +123,7 @@ writers = [("native" , writeDoc)
,("man" , writeMan)
,("markdown" , writeMarkdown)
,("markdown+lhs" , writeMarkdown)
,("plain" , writePlain)
,("rst" , writeRST)
,("rst+lhs" , writeRST)
,("mediawiki" , writeMediaWiki)

23
templates/plain.template Normal file
View file

@ -0,0 +1,23 @@
$if(titleblock)$
$title$
$for(author)$$author$$sep$; $endfor$
$date$
$endif$
$for(header-includes)$
$header-includes$
$endfor$
$for(include-before)$
$include-before$
$endfor$
$if(toc)$
$toc$
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$

View file

@ -52,6 +52,7 @@ writerFormats = [ "native"
, "context"
, "texinfo"
, "man"
, "plain"
, "markdown"
, "rst"
, "mediawiki"

79
tests/tables.plain Normal file
View file

@ -0,0 +1,79 @@
Simple table with caption:
Right Left Center Default
------- ------ -------- ---------
12 12 12 12
123 123 123 123
1 1 1 1
Table: Demonstration of simple table syntax.
Simple table without caption:
Right Left Center Default
------- ------ -------- ---------
12 12 12 12
123 123 123 123
1 1 1 1
Simple table indented two spaces:
Right Left Center Default
------- ------ -------- ---------
12 12 12 12
123 123 123 123
1 1 1 1
Table: Demonstration of simple table syntax.
Multiline table with caption:
--------------------------------------------------------------
Centered Left Right Default aligned
Header Aligned Aligned
----------- ---------- ------------ --------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
--------------------------------------------------------------
Table: Here's the caption. It may span multiple lines.
Multiline table without caption:
--------------------------------------------------------------
Centered Left Right Default aligned
Header Aligned Aligned
----------- ---------- ------------ --------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
--------------------------------------------------------------
Table without column headers:
----- ----- ----- -----
12 12 12 12
123 123 123 123
1 1 1 1
----- ----- ----- -----
Multiline table without column headers:
----------- ---------- ------------ --------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
----------- ---------- ------------ --------------------------

View file

@ -281,42 +281,42 @@ B. Williams
Tight using spaces:
apple
: red fruit
~ red fruit
orange
: orange fruit
~ orange fruit
banana
: yellow fruit
~ yellow fruit
Tight using tabs:
apple
: red fruit
~ red fruit
orange
: orange fruit
~ orange fruit
banana
: yellow fruit
~ yellow fruit
Loose:
apple
: red fruit
~ red fruit
orange
: orange fruit
~ orange fruit
banana
: yellow fruit
~ yellow fruit
Multiple blocks with italics:
*apple*
: red fruit
~ red fruit
contains seeds, crisp, pleasant to taste
*orange*
: orange fruit
~ orange fruit
{ orange code block }
@ -326,34 +326,34 @@ Multiple blocks with italics:
Multiple definitions, tight:
apple
: red fruit
: computer
~ red fruit
~ computer
orange
: orange fruit
: bank
~ orange fruit
~ bank
Multiple definitions, loose:
apple
: red fruit
~ red fruit
: computer
~ computer
orange
: orange fruit
~ orange fruit
: bank
~ bank
Blank line after term, indented marker, alternate markers:
apple
: red fruit
~ red fruit
: computer
~ computer
orange
: orange fruit
~ orange fruit
1. sublist
2. sublist

698
tests/writer.plain Normal file
View file

@ -0,0 +1,698 @@
Pandoc Test Suite
John MacFarlane; Anonymous
July 17, 2006
This is a set of tests for pandoc. Most of them are adapted from
John Gruber's markdown test suite.
* * * * *
Headers
=======
Level 2 with an embedded link
-----------------------------
Level 3 with emphasis
Level 4
Level 5
Level 1
=======
Level 2 with emphasis
---------------------
Level 3
with no blank line
Level 2
-------
with no blank line
* * * * *
Paragraphs
==========
Here's a regular paragraph.
In Markdown 1.0.0 and earlier. Version 8. This line turns into a
list item. Because a hard-wrapped line in the middle of a paragraph
looked like a list item.
Here's one with a bullet. * criminey.
There should be a hard line break
here.
* * * * *
Block Quotes
============
E-mail style:
This is a block quote. It is pretty short.
Code in a block quote:
sub status {
print "working";
}
A list:
1. item one
2. item two
Nested block quotes:
nested
nested
This should not be a block quote: 2 > 1.
And a following paragraph.
* * * * *
Code Blocks
===========
Code:
---- (should be four hyphens)
sub status {
print "working";
}
this code block is indented by one tab
And:
this code block is indented by two tabs
These should not be escaped: \$ \\ \> \[ \{
* * * * *
Lists
=====
Unordered
---------
Asterisks tight:
- asterisk 1
- asterisk 2
- asterisk 3
Asterisks loose:
- asterisk 1
- asterisk 2
- asterisk 3
Pluses tight:
- Plus 1
- Plus 2
- Plus 3
Pluses loose:
- Plus 1
- Plus 2
- Plus 3
Minuses tight:
- Minus 1
- Minus 2
- Minus 3
Minuses loose:
- Minus 1
- Minus 2
- Minus 3
Ordered
-------
Tight:
1. First
2. Second
3. Third
and:
1. One
2. Two
3. Three
Loose using tabs:
1. First
2. Second
3. Third
and using spaces:
1. One
2. Two
3. Three
Multiple paragraphs:
1. Item 1, graf one.
Item 1. graf two. The quick brown fox jumped over the lazy dog's
back.
2. Item 2.
3. Item 3.
Nested
------
- Tab
- Tab
- Tab
Here's another:
1. First
2. Second:
- Fee
- Fie
- Foe
3. Third
Same thing but with paragraphs:
1. First
2. Second:
- Fee
- Fie
- Foe
3. Third
Tabs and spaces
---------------
- this is a list item indented with tabs
- this is a list item indented with spaces
- this is an example list item indented with tabs
- this is an example list item indented with spaces
Fancy list markers
------------------
(2) begins with 2
(3) and now 3
with a continuation
iv. sublist with roman numerals, starting with 4
v. more items
(A) a subsublist
(B) a subsublist
Nesting:
A. Upper Alpha
I. Upper Roman.
(6) Decimal start with 6
c) Lower alpha with paren
Autonumbering:
1. Autonumber.
2. More.
1. Nested.
Should not be a list item:
M.A. 2007
B. Williams
* * * * *
Definition Lists
================
Tight using spaces:
apple
red fruit
orange
orange fruit
banana
yellow fruit
Tight using tabs:
apple
red fruit
orange
orange fruit
banana
yellow fruit
Loose:
apple
red fruit
orange
orange fruit
banana
yellow fruit
Multiple blocks with italics:
apple
red fruit
contains seeds, crisp, pleasant to taste
orange
orange fruit
{ orange code block }
orange block quote
Multiple definitions, tight:
apple
red fruit
computer
orange
orange fruit
bank
Multiple definitions, loose:
apple
red fruit
computer
orange
orange fruit
bank
Blank line after term, indented marker, alternate markers:
apple
red fruit
computer
orange
orange fruit
1. sublist
2. sublist
HTML Blocks
===========
Simple block on one line:
foo
And nested without indentation:
foo
bar
Interpreted markdown in a table:
This is emphasized
And this is strong
Here's a simple block:
foo
This should be a code block, though:
<div>
foo
</div>
As should this:
<div>foo</div>
Now, nested:
foo
This should just be an HTML comment:
Multiline:
Code block:
<!-- Comment -->
Just plain comment, with trailing spaces on the line:
Code:
<hr />
Hr's:
* * * * *
Inline Markup
=============
This is emphasized, and so is this.
This is strong, and so is this.
An emphasized link.
This is strong and em.
So is this word.
This is strong and em.
So is this word.
This is code: >, $, \, \$, <html>.
This is strikeout.
Superscripts: abcd ahello ahello there.
Subscripts: H2O, H23O, Hmany of themO.
These should not be superscripts or subscripts, because of the
unescaped spaces: a^b c^d, a~b c~d.
* * * * *
Smart quotes, ellipses, dashes
==============================
"Hello," said the spider. "'Shelob' is my name."
'A', 'B', and 'C' are letters.
'Oak,' 'elm,' and 'beech' are names of trees. So is 'pine.'
'He said, "I want to go."' Were you alive in the 70's?
Here is some quoted 'code' and a "quoted link".
Some dashes: one--two -- three--four -- five.
Dashes between numbers: 5-7, 255-66, 1987-1999.
Ellipses...and...and....
* * * * *
LaTeX
=====
-
- 2+2=4
- x \in y
- \alpha \wedge \omega
- 223
- p-Tree
- Here's some display math:
\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}
- Here's one that has a line break in it:
\alpha + \omega \times x^2.
These shouldn't be math:
- To get the famous equation, write $e = mc^2$.
- $22,000 is a lot of money. So is $34,000. (It worked if "lot"
is emphasized.)
- Shoes ($20) and socks ($5).
- Escaped $: $73 this should be emphasized 23$.
Here's a LaTeX table:
* * * * *
Special Characters
==================
Here is some unicode:
- I hat: Î
- o umlaut: ö
- section: §
- set membership: ∈
- copyright: ©
AT&T has an ampersand in their name.
AT&T is another way to write it.
This & that.
4 < 5.
6 > 5.
Backslash: \
Backtick: `
Asterisk: *
Underscore: _
Left brace: {
Right brace: }
Left bracket: [
Right bracket: ]
Left paren: (
Right paren: )
Greater-than: >
Hash: #
Period: .
Bang: !
Plus: +
Minus: -
* * * * *
Links
=====
Explicit
--------
Just a URL.
URL and title.
URL and title.
URL and title.
URL and title
URL and title
with_underscore
Email link
Empty.
Reference
---------
Foo bar.
Foo bar.
Foo bar.
With embedded [brackets].
b by itself should be a link.
Indented once.
Indented twice.
Indented thrice.
This should [not][] be a link.
[not]: /url
Foo bar.
Foo biz.
With ampersands
---------------
Here's a link with an ampersand in the URL.
Here's a link with an amersand in the link text: AT&T.
Here's an inline link.
Here's an inline link in pointy braces.
Autolinks
---------
With an ampersand: http://example.com/?foo=1&bar=2
- In a list?
- http://example.com/
- It should.
An e-mail address: nobody@nowhere.net
Blockquoted: http://example.com/
Auto-links should not occur here: <http://example.com/>
or here: <http://example.com/>
* * * * *
Images
======
From "Voyage dans la Lune" by Georges Melies (1902):
Here is a movie icon.
* * * * *
Footnotes
=========
Here is a footnote reference,[^1] and another.[^2] This should not
be a footnote reference, because it contains a space.[^my note]
Here is an inline note.[^3]
Notes can go in quotes.[^4]
1. And in list items.[^5]
This paragraph should not be part of the note, as it is not
indented.
[^1]:
Here is the footnote. It can go anywhere after the footnote
reference. It need not be placed at the end of the document.
[^2]:
Here's the long note. This one contains multiple blocks.
Subsequent blocks are indented to show that they belong to the
footnote (as with list items).
{ <code> }
If you want, you can indent every line, but you can also be lazy
and just indent the first line of each block.
[^3]:
This is easier to type. Inline notes may contain links and ]
verbatim characters, as well as [bracketed text].
[^4]:
In quote.
[^5]:
In list.

View file

@ -5,9 +5,9 @@
Pandoc is a [Haskell] library for converting from one markup format
to another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [ConTeXt],
[PDF], [RTF], [DocBook XML], [OpenDocument XML], [ODT], [GNU Texinfo],
[MediaWiki markup], [groff man] pages, and [S5] HTML slide shows.
and it can write plain text, [markdown], [reStructuredText], [HTML], [LaTeX],
[ConTeXt], [PDF], [RTF], [DocBook XML], [OpenDocument XML], [ODT],
[GNU Texinfo], [MediaWiki markup], [groff man] pages, and [S5] HTML slide shows.
Pandoc features