Merged changes from docbook branch since r363.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@386 788f1e2b-df1e-0410-8736-df70ead52e1b
This commit is contained in:
fiddlosopher 2007-01-01 21:08:12 +00:00
parent 0c6c5d528b
commit a9e32505de
14 changed files with 1285 additions and 22 deletions

30
README
View file

@ -6,8 +6,8 @@ Pandoc is a [Haskell] library for converting from one markup format
to another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
and [S5] HTML slide shows. Pandoc's version of markdown contains some
enhancements, like footnotes and embedded LaTeX.
[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown
contains some enhancements, like footnotes and embedded LaTeX.
In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, Pandoc has a modular design: it consists of a
@ -22,6 +22,7 @@ or output format requires only adding a reader or writer.
[HTML]: http://www.w3.org/TR/html40/
[LaTeX]: http://www.latex-project.org/
[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
[DocBook XML]: http://www.docbook.org/
[Haskell]: http://www.haskell.org/
(c) 2006 John MacFarlane (jgm at berkeley dot edu). Released under the
@ -107,17 +108,17 @@ To convert `hello.html` from html to markdown:
pandoc -f html -t markdown hello.html
Supported output formats include `markdown`, `latex`, `html`, `rtf`
(rich text format), `rst` (reStructuredText), and `s5` (which produces
an HTML file that acts like powerpoint). Supported input formats
include `markdown`, `html`, `latex`, and `rst`. Note that the `rst`
reader only parses a subset of reStructuredText syntax. For example,
it doesn't handle tables, definition lists, option lists, or footnotes.
It handles only the constructs expressible in unextended markdown.
But for simple documents it should be adequate. The `latex` and `html`
readers are also limited in what they can do. Because the `html`
reader is picky about the HTML it parses, it is recommended that you
pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
`html2markdown` script described below.
(rich text format), `rst` (reStructuredText), `docbook` (DocBook
XML), and `s5` (which produces an HTML file that acts like powerpoint).
Supported input formats include `markdown`, `html`, `latex`, and `rst`.
Note that the `rst` reader only parses a subset of reStructuredText
syntax. For example, it doesn't handle tables, definition lists, option
lists, or footnotes. It handles only the constructs expressible in
unextended markdown. But for simple documents it should be adequate.
The `latex` and `html` readers are also limited in what they can do.
Because the `html` reader is picky about the HTML it parses, it is
recommended that you pipe HTML through [HTML Tidy] before sending it to
`pandoc`, or use the `html2markdown` script described below.
If you don't specify a reader or writer explicitly, `pandoc` will
try to determine the input and output format from the extensions of
@ -200,7 +201,8 @@ formats are `native`, `markdown`, `rst`, `html`, and `latex`.
`-t`, `--to`, `-w`, or `--write` can be used to specify the output
format -- the format Pandoc will be converting *to*. Available formats
are `native`, `html`, `s5`, `latex`, `markdown`, `rst`, and `rtf`.
are `native`, `html`, `s5`, `docbook`, `latex`, `markdown`, `rst`, and
`rtf`.
`-s` or `--standalone` indicates that a standalone document is to be
produced (with appropriate headers and footers), rather than a fragment.

2
debian/changelog vendored
View file

@ -46,6 +46,8 @@ pandoc (0.3) unstable; urgency=low
+ Removed extra blanks after '-h' and '-D' output.
+ Added copyright message to '-v' output, modeled after FSF messages.
* Added docbook writer.
* Added implicit setting of default input and output format based
on input and output filename extensions. These defaults are
overridden if explicit input and output formats are specified using

View file

@ -6,8 +6,8 @@ pandoc \- general markup converter
.SH DESCRIPTION
\fBPandoc\fR converts files from one markup format to another. It can
read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and
it can write markdown, reStructuredText, HTML, LaTeX, RTF, and S5 HTML
slide shows.
it can write markdown, reStructuredText, HTML, LaTeX, RTF, DocBook
XML, and S5 HTML slide shows.
.PP
If no \fIinput\-file\fR is specified, input is read from STDIN.
Otherwise, the \fIinput\-files\fR are concatenated (with a blank
@ -80,6 +80,8 @@ can be
(HTML),
.B latex
(LaTeX),
.B docbook
(DocBook XML),
.B s5
(S5 HTML and javascript slide show),
or
@ -141,7 +143,7 @@ default header, which can be printed by using the \fB\-D\fR option).
Implies \fB-s\fR.
.TP
.B \-D \fIFORMAT\fB, \-\-print-default-header=\fIFORMAT\fB
Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex,
Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, docbook,
markdown, rst, rtf\fR).
.TP
.B \-T \fISTRING\fB, \-\-title-prefix=\fISTRING\fB

View file

@ -37,12 +37,14 @@ import Text.Pandoc.Writers.RST ( writeRST )
import Text.Pandoc.Readers.RST ( readRST )
import Text.Pandoc.ASCIIMathML ( asciiMathMLScript )
import Text.Pandoc.Writers.HTML ( writeHtml )
import Text.Pandoc.Writers.Docbook ( writeDocbook )
import Text.Pandoc.Writers.LaTeX ( writeLaTeX )
import Text.Pandoc.Readers.LaTeX ( readLaTeX )
import Text.Pandoc.Writers.RTF ( writeRTF )
import Text.Pandoc.Writers.Markdown ( writeMarkdown )
import Text.Pandoc.Writers.DefaultHeaders ( defaultHtmlHeader,
defaultRTFHeader, defaultS5Header, defaultLaTeXHeader )
defaultRTFHeader, defaultS5Header, defaultLaTeXHeader,
defaultDocbookHeader )
import Text.Pandoc.Definition
import Text.Pandoc.Shared
import Text.Regex ( mkRegex, matchRegex )
@ -79,6 +81,7 @@ writers :: [ ( String, ( WriterOptions -> Pandoc -> String, String ) ) ]
writers = [("native" , (writeDoc, ""))
,("html" , (writeHtml, defaultHtmlHeader))
,("s5" , (writeS5, defaultS5Header))
,("docbook" , (writeDocbook, defaultDocbookHeader))
,("latex" , (writeLaTeX, defaultLaTeXHeader))
,("markdown" , (writeMarkdown, ""))
,("rst" , (writeRST, ""))
@ -331,6 +334,8 @@ defaultWriterName x =
Just ["text"] -> "markdown"
Just ["md"] -> "markdown"
Just ["markdown"] -> "markdown"
Just ["db"] -> "docbook"
Just ["xml"] -> "docbook"
Just _ -> "html"
main = do
@ -423,6 +428,7 @@ main = do
writerSmart = smart &&
(not strict),
writerTabStop = tabStop,
writerNotes = [],
writerS5 = (writerName=="s5"),
writerIncremental = incremental,
writerNumberSections = numberSections,

View file

@ -322,7 +322,7 @@ containsPara (x:rest) = containsPara rest
-- | Options for writers
data WriterOptions = WriterOptions
{ writerStandalone :: Bool -- ^ If @True@, writer header and footer
{ writerStandalone :: Bool -- ^ Include header and footer
, writerTitlePrefix :: String -- ^ Prefix for HTML titles
, writerHeader :: String -- ^ Header for the document
, writerIncludeBefore :: String -- ^ String to include before the body
@ -334,6 +334,7 @@ data WriterOptions = WriterOptions
, writerStrictMarkdown :: Bool -- ^ Use strict markdown syntax
, writerTabStop :: Int -- ^ Tabstop for conversion between
-- spaces and tabs
, writerNotes :: [Block] -- ^ List of note blocks
} deriving Show
--

View file

@ -0,0 +1,236 @@
{-
Copyright (C) 2006 John MacFarlane <jgm at berkeley dot edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-}
{- |
Module : Text.Pandoc.Writers.Docbook
Copyright : Copyright (C) 2006 John MacFarlane
License : GNU GPL, version 2 or above
Maintainer : John MacFarlane <jgm at berkeley dot edu>
Stability : alpha
Portability : portable
Conversion of 'Pandoc' documents to Docbook XML.
-}
module Text.Pandoc.Writers.Docbook (
writeDocbook
) where
import Text.Pandoc.Definition
import Text.Pandoc.Shared
import Text.Pandoc.Writers.HTML ( stringToSmartHtml, stringToHtml )
import Text.Html ( stringToHtmlString )
import Text.Regex ( mkRegex, matchRegex )
import Data.Char ( toLower )
import Data.List ( isPrefixOf, partition )
import Text.PrettyPrint.HughesPJ hiding ( Str )
-- | Data structure for defining hierarchical Pandoc documents
data Element = Blk Block
| Sec [Inline] [Element] deriving (Eq, Read, Show)
-- | Returns true on Header block with level at least 'level'
headerAtLeast :: Int -> Block -> Bool
headerAtLeast level (Header x _) = x <= level
headerAtLeast level _ = False
-- | Convert list of Pandoc blocks into list of Elements (hierarchical)
hierarchicalize :: [Block] -> [Element]
hierarchicalize [] = []
hierarchicalize (block:rest) =
case block of
(Header level title) -> let (thisSection, rest') = break (headerAtLeast level) rest in
(Sec title (hierarchicalize thisSection)):(hierarchicalize rest')
x -> (Blk x):(hierarchicalize rest)
-- | Convert list of authors to a docbook <author> section
authorToDocbook :: WriterOptions -> [Char] -> Doc
authorToDocbook options name = indentedInTags options "author" $
if ',' `elem` name
then -- last name first
let (lastname, rest) = break (==',') name
firstname = removeLeadingSpace rest in
inTags "firstname" (text $ stringToXML options firstname) <>
inTags "surname" (text $ stringToXML options lastname)
else -- last name last
let namewords = words name
lengthname = length namewords
(firstname, lastname) = case lengthname of
0 -> ("","")
1 -> ("", name)
n -> (joinWithSep " " (take (n-1) namewords), last namewords) in
inTags "firstname" (text $ stringToXML options firstname) $$
inTags "surname" (text $ stringToXML options lastname)
-- | Convert Pandoc document to string in Docbook format.
writeDocbook :: WriterOptions -> Pandoc -> String
writeDocbook options (Pandoc (Meta title authors date) blocks) =
let head = if (writerStandalone options)
then text (writerHeader options)
else empty
meta = if (writerStandalone options)
then indentedInTags options "articleinfo" $
(inTags "title" (inlinesToDocbook options title)) $$
(vcat (map (authorToDocbook options) authors)) $$
(inTags "date" (text date))
else empty
blocks' = replaceReferenceLinks blocks
(noteBlocks, blocks'') = partition isNoteBlock blocks'
options' = options {writerNotes = noteBlocks}
elements = hierarchicalize blocks''
body = text (writerIncludeBefore options') <>
vcat (map (elementToDocbook options') elements) $$
text (writerIncludeAfter options')
body' = if writerStandalone options'
then indentedInTags options' "article" (meta $$ body)
else body in
render $ head $$ body' <> text "\n"
-- | Put the supplied contents between start and end tags of tagType,
-- with specified attributes.
inTagsWithAttrib :: String -> [(String, String)] -> Doc -> Doc
inTagsWithAttrib tagType attribs contents = text ("<" ++ tagType ++
(concatMap (\(a, b) -> " " ++ attributeStringToXML a ++
"=\"" ++ attributeStringToXML b ++ "\"") attribs)) <>
if isEmpty contents
then text " />" -- self-closing tag
else text ">" <> contents <> text ("</" ++ tagType ++ ">")
-- | Put the supplied contents between start and end tags of tagType.
inTags :: String -> Doc -> Doc
inTags tagType contents = inTagsWithAttrib tagType [] contents
-- | Put the supplied contents in indented block btw start and end tags.
indentedInTags :: WriterOptions -> [Char] -> Doc -> Doc
indentedInTags options tagType contents = text ("<" ++ tagType ++ ">") $$
nest 2 contents $$ text ("</" ++ tagType ++ ">")
-- | Convert an Element to Docbook.
elementToDocbook :: WriterOptions -> Element -> Doc
elementToDocbook options (Blk block) = blockToDocbook options block
elementToDocbook options (Sec title elements) =
-- Docbook doesn't allow sections with no content, so insert some if needed
let elements' = if null elements
then [Blk (Para [])]
else elements in
indentedInTags options "section" $
inTags "title" (wrap options title) $$
vcat (map (elementToDocbook options) elements')
-- | Convert a list of Pandoc blocks to Docbook.
blocksToDocbook :: WriterOptions -> [Block] -> Doc
blocksToDocbook options = vcat . map (blockToDocbook options)
-- | Convert a list of lists of blocks to a list of Docbook list items.
listItemsToDocbook :: WriterOptions -> [[Block]] -> Doc
listItemsToDocbook options items =
vcat $ map (listItemToDocbook options) items
-- | Convert a list of blocks into a Docbook list item.
listItemToDocbook :: WriterOptions -> [Block] -> Doc
listItemToDocbook options item =
let plainToPara (Plain x) = Para x
plainToPara y = y in
let item' = map plainToPara item in
indentedInTags options "listitem" (blocksToDocbook options item')
-- | Convert a Pandoc block element to Docbook.
blockToDocbook :: WriterOptions -> Block -> Doc
blockToDocbook options Blank = text ""
blockToDocbook options Null = empty
blockToDocbook options (Plain lst) = wrap options lst
blockToDocbook options (Para lst) =
indentedInTags options "para" (wrap options lst)
blockToDocbook options (BlockQuote blocks) =
indentedInTags options "blockquote" (blocksToDocbook options blocks)
blockToDocbook options (CodeBlock str) =
text "<programlisting>" <> (cdata str) <> text "</programlisting>"
blockToDocbook options (BulletList lst) =
indentedInTags options "itemizedlist" $ listItemsToDocbook options lst
blockToDocbook options (OrderedList lst) =
indentedInTags options "orderedlist" $ listItemsToDocbook options lst
blockToDocbook options (RawHtml str) = text str -- raw XML block
blockToDocbook options HorizontalRule = empty -- not semantic
blockToDocbook options (Note _ _) = empty -- shouldn't occur
blockToDocbook options (Key _ _) = empty -- shouldn't occur
blockToDocbook options _ = indentedInTags options "para" (text "Unknown block type")
-- | Put string in CDATA section
cdata :: String -> Doc
cdata str = text $ "<![CDATA[" ++ str ++ "]]>"
-- | Take list of inline elements and return wrapped doc.
wrap :: WriterOptions -> [Inline] -> Doc
wrap options lst = fsep $ map (hcat . (map (inlineToDocbook options))) (splitBySpace lst)
-- | Escape a string for XML (with "smart" option if specified).
stringToXML :: WriterOptions -> String -> String
stringToXML options = if writerSmart options
then stringToSmartHtml
else stringToHtml
-- | Escape string to XML appropriate for attributes
attributeStringToXML :: String -> String
attributeStringToXML = gsub "\"" "&quot;" . codeStringToXML
-- | Escape a literal string for XML.
codeStringToXML :: String -> String
codeStringToXML = gsub "<" "&lt;" . gsub "&" "&amp;"
-- | Convert a list of inline elements to Docbook.
inlinesToDocbook :: WriterOptions -> [Inline] -> Doc
inlinesToDocbook options lst = hcat (map (inlineToDocbook options) lst)
-- | Convert an inline element to Docbook.
inlineToDocbook :: WriterOptions -> Inline -> Doc
inlineToDocbook options (Str str) = text $ stringToXML options str
inlineToDocbook options (Emph lst) =
inTags "emphasis" (inlinesToDocbook options lst)
inlineToDocbook options (Strong lst) =
inTagsWithAttrib "emphasis" [("role", "strong")]
(inlinesToDocbook options lst)
inlineToDocbook options (Code str) =
inTags "literal" $ text (codeStringToXML str)
inlineToDocbook options (TeX str) = inlineToDocbook options (Code str)
inlineToDocbook options (HtmlInline str) = empty
inlineToDocbook options LineBreak =
text $ "<literallayout></literallayout>"
inlineToDocbook options Space = char ' '
inlineToDocbook options (Link txt (Src src tit)) =
case (matchRegex (mkRegex "mailto:(.*)") src) of
Just [addr] -> inTags "email" $ text (codeStringToXML addr)
Nothing -> inTagsWithAttrib "ulink" [("url", src)] $
inlinesToDocbook options txt
inlineToDocbook options (Link text (Ref ref)) = empty -- shouldn't occur
inlineToDocbook options (Image alt (Src src tit)) =
let titleDoc = if null tit
then empty
else indentedInTags options "objectinfo" $
indentedInTags options "title"
(text $ stringToXML options tit) in
indentedInTags options "inlinemediaobject" $
indentedInTags options "imageobject" $
titleDoc $$ inTagsWithAttrib "imagedata" [("fileref", src)] empty
inlineToDocbook options (Image alternate (Ref ref)) = empty --shouldn't occur
inlineToDocbook options (NoteRef ref) =
let notes = writerNotes options
hits = filter (\(Note r _) -> r == ref) notes in
if null hits
then empty
else let (Note _ contents) = head hits in
indentedInTags options "footnote" $ blocksToDocbook options contents
inlineToDocbook options _ = empty

View file

@ -28,7 +28,9 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Conversion of 'Pandoc' documents to HTML.
-}
module Text.Pandoc.Writers.HTML (
writeHtml
writeHtml,
stringToSmartHtml,
stringToHtml
) where
import Text.Pandoc.Definition
import Text.Pandoc.Shared

View file

@ -0,0 +1,3 @@
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

View file

@ -1,6 +1,7 @@
-- | Default headers for Pandoc writers.
module Text.Pandoc.Writers.DefaultHeaders (
defaultLaTeXHeader,
defaultDocbookHeader,
defaultHtmlHeader,
defaultS5Header,
defaultRTFHeader
@ -10,6 +11,9 @@ import Text.Pandoc.Writers.S5
defaultLaTeXHeader :: String
defaultLaTeXHeader = "@LaTeXHeader@"
defaultDocbookHeader :: String
defaultDocbookHeader = "@DocbookHeader@"
defaultHtmlHeader :: String
defaultHtmlHeader = "@HtmlHeader@"

View file

@ -7,4 +7,5 @@
../pandoc -r native -s -w html -S testsuite.native > writer.smart.html
../pandoc -r native -s -w latex testsuite.native > writer.latex
../pandoc -r native -s -w rtf testsuite.native > writer.rtf
sed -e '/^, Header 1 \[Str "HTML",Space,Str "Blocks"\]/,/^, HorizontalRule/d' testsuite.native | ../pandoc -r native -w docbook -s > writer.docbook

View file

@ -52,6 +52,11 @@ foreach my $format (@writeformats)
test_results("$format writer", "tmp.$extension", "writer.$format");
}
print "Testing docbook writer...";
# remove HTML block tests, as this produces invalid docbook...
`sed -e '/^, Header 1 \\[Str "HTML",Space,Str "Blocks"\\]/,/^, HorizontalRule/d' testsuite.native | $script -r native -w docbook -s > tmp.docbook`;
test_results("docbook writer", "tmp.docbook", "writer.docbook");
print "Testing s5 writer (basic)...";
`$script -r native -w s5 -s s5.native > tmp.html`;
test_results("s5 writer (basic)", "tmp.html", "s5.basic.html");

997
tests/writer.docbook Normal file
View file

@ -0,0 +1,997 @@
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<article>
<articleinfo>
<title>Pandoc Test Suite</title>
<author>
<firstname>John</firstname>
<surname>MacFarlane</surname>
</author>
<author>
<firstname></firstname>
<surname>Anonymous</surname>
</author>
<date>July 17, 2006</date>
</articleinfo>
<para>
This is a set of tests for pandoc. Most of them are adapted from
John Gruber's markdown test suite.
</para>
<section>
<title>Headers</title>
<section>
<title>Level 2 with an
<ulink url="/url">embedded link</ulink></title>
<section>
<title>Level 3 with <emphasis>emphasis</emphasis></title>
<section>
<title>Level 4</title>
<section>
<title>Level 5</title>
<para>
</para>
</section>
</section>
</section>
</section>
</section>
<section>
<title>Level 1</title>
<section>
<title>Level 2 with <emphasis>emphasis</emphasis></title>
<section>
<title>Level 3</title>
<para>
with no blank line
</para>
</section>
</section>
<section>
<title>Level 2</title>
<para>
with no blank line
</para>
</section>
</section>
<section>
<title>Paragraphs</title>
<para>
Here's a regular paragraph.
</para>
<para>
In Markdown 1.0.0 and earlier. Version 8. This line turns into a
list item. Because a hard-wrapped line in the middle of a paragraph
looked like a list item.
</para>
<para>
Here's one with a bullet. * criminey.
</para>
<para>
There should be a hard line
break<literallayout></literallayout>here.
</para>
</section>
<section>
<title>Block Quotes</title>
<para>
E-mail style:
</para>
<blockquote>
<para>
This is a block quote. It is pretty short.
</para>
</blockquote>
<blockquote>
<para>
Code in a block quote:
</para>
<programlisting><![CDATA[sub status {
print "working";
}]]></programlisting>
<para>
A list:
</para>
<orderedlist>
<listitem>
<para>
item one
</para>
</listitem>
<listitem>
<para>
item two
</para>
</listitem>
</orderedlist>
<para>
Nested block quotes:
</para>
<blockquote>
<para>
nested
</para>
</blockquote>
<blockquote>
<para>
nested
</para>
</blockquote>
</blockquote>
<para>
This should not be a block quote: 2 &gt; 1.
</para>
<para>
Box-style:
</para>
<blockquote>
<para>
Example:
</para>
<programlisting><![CDATA[sub status {
print "working";
}]]></programlisting>
</blockquote>
<blockquote>
<orderedlist>
<listitem>
<para>
do laundry
</para>
</listitem>
<listitem>
<para>
take out the trash
</para>
</listitem>
</orderedlist>
</blockquote>
<para>
Here's a nested one:
</para>
<blockquote>
<para>
Joe said:
</para>
<blockquote>
<para>
Don't quote me.
</para>
</blockquote>
</blockquote>
<para>
And a following paragraph.
</para>
</section>
<section>
<title>Code Blocks</title>
<para>
Code:
</para>
<programlisting><![CDATA[---- (should be four hyphens)
sub status {
print "working";
}
this code block is indented by one tab]]></programlisting>
<para>
And:
</para>
<programlisting><![CDATA[ this code block is indented by two tabs
These should not be escaped: \$ \\ \> \[ \{]]></programlisting>
</section>
<section>
<title>Lists</title>
<section>
<title>Unordered</title>
<para>
Asterisks tight:
</para>
<itemizedlist>
<listitem>
<para>
asterisk 1
</para>
</listitem>
<listitem>
<para>
asterisk 2
</para>
</listitem>
<listitem>
<para>
asterisk 3
</para>
</listitem>
</itemizedlist>
<para>
Asterisks loose:
</para>
<itemizedlist>
<listitem>
<para>
asterisk 1
</para>
</listitem>
<listitem>
<para>
asterisk 2
</para>
</listitem>
<listitem>
<para>
asterisk 3
</para>
</listitem>
</itemizedlist>
<para>
Pluses tight:
</para>
<itemizedlist>
<listitem>
<para>
Plus 1
</para>
</listitem>
<listitem>
<para>
Plus 2
</para>
</listitem>
<listitem>
<para>
Plus 3
</para>
</listitem>
</itemizedlist>
<para>
Pluses loose:
</para>
<itemizedlist>
<listitem>
<para>
Plus 1
</para>
</listitem>
<listitem>
<para>
Plus 2
</para>
</listitem>
<listitem>
<para>
Plus 3
</para>
</listitem>
</itemizedlist>
<para>
Minuses tight:
</para>
<itemizedlist>
<listitem>
<para>
Minus 1
</para>
</listitem>
<listitem>
<para>
Minus 2
</para>
</listitem>
<listitem>
<para>
Minus 3
</para>
</listitem>
</itemizedlist>
<para>
Minuses loose:
</para>
<itemizedlist>
<listitem>
<para>
Minus 1
</para>
</listitem>
<listitem>
<para>
Minus 2
</para>
</listitem>
<listitem>
<para>
Minus 3
</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Ordered</title>
<para>
Tight:
</para>
<orderedlist>
<listitem>
<para>
First
</para>
</listitem>
<listitem>
<para>
Second
</para>
</listitem>
<listitem>
<para>
Third
</para>
</listitem>
</orderedlist>
<para>
and:
</para>
<orderedlist>
<listitem>
<para>
One
</para>
</listitem>
<listitem>
<para>
Two
</para>
</listitem>
<listitem>
<para>
Three
</para>
</listitem>
</orderedlist>
<para>
Loose using tabs:
</para>
<orderedlist>
<listitem>
<para>
First
</para>
</listitem>
<listitem>
<para>
Second
</para>
</listitem>
<listitem>
<para>
Third
</para>
</listitem>
</orderedlist>
<para>
and using spaces:
</para>
<orderedlist>
<listitem>
<para>
One
</para>
</listitem>
<listitem>
<para>
Two
</para>
</listitem>
<listitem>
<para>
Three
</para>
</listitem>
</orderedlist>
<para>
Multiple paragraphs:
</para>
<orderedlist>
<listitem>
<para>
Item 1, graf one.
</para>
<para>
Item 1. graf two. The quick brown fox jumped over the lazy dog's
back.
</para>
</listitem>
<listitem>
<para>
Item 2.
</para>
</listitem>
<listitem>
<para>
Item 3.
</para>
</listitem>
</orderedlist>
</section>
<section>
<title>Nested</title>
<itemizedlist>
<listitem>
<para>
Tab
</para>
<itemizedlist>
<listitem>
<para>
Tab
</para>
<itemizedlist>
<listitem>
<para>
Tab
</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
<para>
Here's another:
</para>
<orderedlist>
<listitem>
<para>
First
</para>
</listitem>
<listitem>
<para>
Second:
</para>
<itemizedlist>
<listitem>
<para>
Fee
</para>
</listitem>
<listitem>
<para>
Fie
</para>
</listitem>
<listitem>
<para>
Foe
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Third
</para>
</listitem>
</orderedlist>
<para>
Same thing but with paragraphs:
</para>
<orderedlist>
<listitem>
<para>
First
</para>
</listitem>
<listitem>
<para>
Second:
</para>
<itemizedlist>
<listitem>
<para>
Fee
</para>
</listitem>
<listitem>
<para>
Fie
</para>
</listitem>
<listitem>
<para>
Foe
</para>
</listitem>
</itemizedlist>
</listitem>
<listitem>
<para>
Third
</para>
</listitem>
</orderedlist>
</section>
<section>
<title>Tabs and spaces</title>
<itemizedlist>
<listitem>
<para>
this is a list item indented with tabs
</para>
</listitem>
<listitem>
<para>
this is a list item indented with spaces
</para>
<itemizedlist>
<listitem>
<para>
this is an example list item indented with tabs
</para>
</listitem>
<listitem>
<para>
this is an example list item indented with spaces
</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</section>
</section>
<section>
<title>Inline Markup</title>
<para>
This is <emphasis>emphasized</emphasis>, and so
<emphasis>is this</emphasis>.
</para>
<para>
This is <emphasis role="strong">strong</emphasis>, and so
<emphasis role="strong">is this</emphasis>.
</para>
<para>
An <emphasis><ulink url="/url">emphasized link</ulink></emphasis>.
</para>
<para>
<emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis>
</para>
<para>
So is <emphasis role="strong"><emphasis>this</emphasis></emphasis>
word.
</para>
<para>
<emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis>
</para>
<para>
So is <emphasis role="strong"><emphasis>this</emphasis></emphasis>
word.
</para>
<para>
This is code: <literal>></literal>, <literal>$</literal>,
<literal>\</literal>, <literal>\$</literal>,
<literal>&lt;html></literal>.
</para>
</section>
<section>
<title>Smart quotes, ellipses, dashes</title>
<para>
"Hello," said the spider. "'Shelob' is my name."
</para>
<para>
'A', 'B', and 'C' are letters.
</para>
<para>
'Oak,' 'elm,' and 'beech' are names of trees. So is 'pine.'
</para>
<para>
'He said, "I want to go."' Were you alive in the 70's?
</para>
<para>
Here is some quoted '<literal>code</literal>' and a
"<ulink url="http://example.com/?foo=1&amp;bar=2">quoted link</ulink>".
</para>
<para>
Some dashes: one---two --- three--four -- five.
</para>
<para>
Dashes between numbers: 5-7, 255-66, 1987-1999.
</para>
<para>
Ellipses...and. . .and . . . .
</para>
</section>
<section>
<title>LaTeX</title>
<itemizedlist>
<listitem>
<para>
<literal>\cite[22-23]{smith.1899}</literal>
</para>
</listitem>
<listitem>
<para>
<literal>\doublespacing</literal>
</para>
</listitem>
<listitem>
<para>
<literal>$2+2=4$</literal>
</para>
</listitem>
<listitem>
<para>
<literal>$x \in y$</literal>
</para>
</listitem>
<listitem>
<para>
<literal>$\alpha \wedge \omega$</literal>
</para>
</listitem>
<listitem>
<para>
<literal>$223$</literal>
</para>
</listitem>
<listitem>
<para>
<literal>$p$</literal>-Tree
</para>
</listitem>
<listitem>
<para>
<literal>$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$</literal>
</para>
</listitem>
<listitem>
<para>
Here's one that has a line break in it:
<literal>$\alpha + \omega \times x^2$</literal>.
</para>
</listitem>
</itemizedlist>
<para>
These shouldn't be math:
</para>
<itemizedlist>
<listitem>
<para>
To get the famous equation, write <literal>$e = mc^2$</literal>.
</para>
</listitem>
<listitem>
<para>
$22,000 is a <emphasis>lot</emphasis> of money. So is $34,000. (It
worked if "lot" is emphasized.)
</para>
</listitem>
<listitem>
<para>
Escaped <literal>$</literal>: $73
<emphasis>this should be emphasized</emphasis> 23$.
</para>
</listitem>
</itemizedlist>
<para>
Here's a LaTeX table:
</para>
<para>
<literal>\begin{tabular}{|l|l|}\hline
Animal &amp; Number \\ \hline
Dog &amp; 2 \\
Cat &amp; 1 \\ \hline
\end{tabular}</literal>
</para>
</section>
<section>
<title>Special Characters</title>
<para>
Here is some unicode:
</para>
<itemizedlist>
<listitem>
<para>
I hat: Î
</para>
</listitem>
<listitem>
<para>
o umlaut: ö
</para>
</listitem>
<listitem>
<para>
section: §
</para>
</listitem>
<listitem>
<para>
set membership: ∈
</para>
</listitem>
<listitem>
<para>
copyright: ©
</para>
</listitem>
</itemizedlist>
<para>
AT&amp;T has an ampersand in their name.
</para>
<para>
AT&amp;T is another way to write it.
</para>
<para>
This &amp; that.
</para>
<para>
4 &lt; 5.
</para>
<para>
6 &gt; 5.
</para>
<para>
Backslash: \
</para>
<para>
Backtick: `
</para>
<para>
Asterisk: *
</para>
<para>
Underscore: _
</para>
<para>
Left brace: {
</para>
<para>
Right brace: }
</para>
<para>
Left bracket: [
</para>
<para>
Right bracket: ]
</para>
<para>
Left paren: (
</para>
<para>
Right paren: )
</para>
<para>
Greater-than: &gt;
</para>
<para>
Hash: #
</para>
<para>
Period: .
</para>
<para>
Bang: !
</para>
<para>
Plus: +
</para>
<para>
Minus: -
</para>
</section>
<section>
<title>Links</title>
<section>
<title>Explicit</title>
<para>
Just a <ulink url="/url/">URL</ulink>.
</para>
<para>
<ulink url="/url/">URL and title</ulink>.
</para>
<para>
<ulink url="/url/">URL and title</ulink>.
</para>
<para>
<ulink url="/url/">URL and title</ulink>.
</para>
<para>
<ulink url="/url/">URL and title</ulink>
</para>
<para>
<ulink url="/url/">URL and title</ulink>
</para>
<para>
<email>nobody@nowhere.net</email>
</para>
<para>
<ulink url="">Empty</ulink>.
</para>
</section>
<section>
<title>Reference</title>
<para>
Foo <ulink url="/url/">bar</ulink>.
</para>
<para>
Foo <ulink url="/url/">bar</ulink>.
</para>
<para>
Foo <ulink url="/url/">bar</ulink>.
</para>
<para>
With <ulink url="/url/">embedded [brackets]</ulink>.
</para>
<para>
<ulink url="/url/">b</ulink> by itself should be a link.
</para>
<para>
Indented <ulink url="/url">once</ulink>.
</para>
<para>
Indented <ulink url="/url">twice</ulink>.
</para>
<para>
Indented <ulink url="/url">thrice</ulink>.
</para>
<para>
This should [not][] be a link.
</para>
<programlisting><![CDATA[[not]: /url]]></programlisting>
<para>
Foo <ulink url="/url/">bar</ulink>.
</para>
<para>
Foo <ulink url="/url/">biz</ulink>.
</para>
</section>
<section>
<title>With ampersands</title>
<para>
Here's a
<ulink url="http://example.com/?foo=1&amp;bar=2">link with an ampersand in the URL</ulink>.
</para>
<para>
Here's a link with an amersand in the link text:
<ulink url="http://att.com/">AT&amp;T</ulink>.
</para>
<para>
Here's an <ulink url="/script?foo=1&amp;bar=2">inline link</ulink>.
</para>
<para>
Here's an
<ulink url="/script?foo=1&amp;bar=2">inline link in pointy braces</ulink>.
</para>
</section>
<section>
<title>Autolinks</title>
<para>
With an ampersand:
<ulink url="http://example.com/?foo=1&amp;bar=2">http://example.com/?foo=1&amp;bar=2</ulink>
</para>
<itemizedlist>
<listitem>
<para>
In a list?
</para>
</listitem>
<listitem>
<para>
<ulink url="http://example.com/">http://example.com/</ulink>
</para>
</listitem>
<listitem>
<para>
It should.
</para>
</listitem>
</itemizedlist>
<para>
An e-mail address: <email>nobody@nowhere.net</email>
</para>
<blockquote>
<para>
Blockquoted:
<ulink url="http://example.com/">http://example.com/</ulink>
</para>
</blockquote>
<para>
Auto-links should not occur here:
<literal>&lt;http://example.com/></literal>
</para>
<programlisting><![CDATA[or here: <http://example.com/>]]></programlisting>
</section>
</section>
<section>
<title>Images</title>
<para>
From "Voyage dans la Lune" by Georges Melies (1902):
</para>
<para>
<inlinemediaobject>
<imageobject>
<objectinfo>
<title>
Voyage dans la Lune
</title>
</objectinfo>
<imagedata fileref="lalune.jpg" />
</imageobject>
</inlinemediaobject>
</para>
<para>
Here is a movie
<inlinemediaobject>
<imageobject>
<imagedata fileref="movie.jpg" />
</imageobject>
</inlinemediaobject>
icon.
</para>
</section>
<section>
<title>Footnotes</title>
<para>
Here is a footnote
reference,<footnote>
<para>
Here is the footnote. It can go anywhere after the footnote
reference. It need not be placed at the end of the document.
</para>
</footnote>
and
another.<footnote>
<para>
Here's the long note. This one contains multiple blocks.
</para>
<para>
Subsequent blocks are indented to show that they belong to the
footnote (as with list items).
</para>
<programlisting><![CDATA[ { <code> }]]></programlisting>
<para>
If you want, you can indent every line, but you can also be lazy
and just indent the first line of each block.
</para>
</footnote>
This should <emphasis>not</emphasis> be a footnote reference,
because it contains a space.[^my note] Here is an inline
note.<footnote>
<para>
This is <emphasis>easier</emphasis> to type. Inline notes may
contain <ulink url="http://google.com">links</ulink> and
<literal>]</literal> verbatim characters.
</para>
</footnote>
</para>
<blockquote>
<para>
Notes can go in
quotes.<footnote>
<para>
In quote.
</para>
</footnote>
</para>
</blockquote>
<orderedlist>
<listitem>
<para>
And in list
items.<footnote>
<para>
In list.
</para>
</footnote>
</para>
</listitem>
</orderedlist>
<para>
This paragraph should not be part of the note, as it is not
indented.
</para>
</section>
</article>

View file

@ -4,8 +4,8 @@ Pandoc is a [Haskell] library for converting from one markup format
to another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
and [S5] HTML slide shows. Pandoc's version of markdown contains some
enhancements, like footnotes and embedded LaTeX.
[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown
contains some enhancements, like footnotes and embedded LaTeX.
In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, Pandoc has a modular design: it consists of a
@ -73,6 +73,7 @@ kind.
[HTML]: http://www.w3.org/TR/html40/
[LaTeX]: http://www.latex-project.org/
[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
[DocBook XML]: http://www.docbook.org/
[Haskell]: http://www.haskell.org/
[GHC]: http://www.haskell.org/ghc/
[GPL]: http://www.gnu.org/copyleft/gpl.html

View file

@ -16,6 +16,7 @@ pandoc -s README.tex -o example0.txt
pandoc -s -w rst README -o example0.txt
pandoc -s README -o example0.rtf
pandoc -s -m -i -w s5 S5DEMO -o example0.html
pandoc -s -w docbook README -o example0.db
html2markdown http://www.gnu.org/software/make/ -o example0.txt
markdown2pdf README -o example0.pdf
markdown2pdf -C myheader.tex README -o example0.pdf'