Merged changes from docbook branch since r363.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@386 788f1e2b-df1e-0410-8736-df70ead52e1b
2007-01-01 21:08:12 +00:00 · 2007-01-01 21:08:12 +00:00 · a9e32505de
commit a9e32505de
parent 0c6c5d528b
14 changed files with 1285 additions and 22 deletions
--- a/30
+++ b/30
@ -6,8 +6,8 @@ Pandoc is a [Haskell] library for converting from one markup format
 to another, and a command-line tool that uses this library. It can read
 [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
 and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
-and [S5] HTML slide shows. Pandoc's version of markdown contains some
-enhancements, like footnotes and embedded LaTeX.
+[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown
+contains some enhancements, like footnotes and embedded LaTeX.

 In contrast to existing tools for converting markdown to HTML, which
 use regex substitutions, Pandoc has a modular design: it consists of a
@ -22,6 +22,7 @@ or output format requires only adding a reader or writer.
 [HTML]:  http://www.w3.org/TR/html40/
 [LaTeX]: http://www.latex-project.org/
 [RTF]:  http://en.wikipedia.org/wiki/Rich_Text_Format
+[DocBook XML]:  http://www.docbook.org/
 [Haskell]:  http://www.haskell.org/

 (c) 2006 John MacFarlane (jgm at berkeley dot edu). Released under the
@ -107,17 +108,17 @@ To convert `hello.html` from html to markdown:
 	pandoc -f html -t markdown hello.html

 Supported output formats include `markdown`, `latex`, `html`, `rtf`
-(rich text format), `rst` (reStructuredText), and `s5` (which produces
-an HTML file that acts like powerpoint).  Supported input formats
-include `markdown`, `html`, `latex`, and `rst`.  Note that the `rst`
-reader only parses a subset of reStructuredText syntax.  For example,
-it doesn't handle tables, definition lists, option lists, or footnotes.
-It handles only the constructs expressible in unextended markdown.
-But for simple documents it should be adequate.  The `latex` and `html`
-readers are also limited in what they can do.  Because the `html`
-reader is picky about the HTML it parses, it is recommended that you
-pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
-`html2markdown` script described below.
+(rich text format), `rst` (reStructuredText), `docbook` (DocBook
+XML), and `s5` (which produces an HTML file that acts like powerpoint).
+Supported input formats include `markdown`, `html`, `latex`, and `rst`.
+Note that the `rst` reader only parses a subset of reStructuredText
+syntax.  For example, it doesn't handle tables, definition lists, option
+lists, or footnotes.  It handles only the constructs expressible in
+unextended markdown.  But for simple documents it should be adequate.
+The `latex` and `html` readers are also limited in what they can do.
+Because the `html` reader is picky about the HTML it parses, it is
+recommended that you pipe HTML through [HTML Tidy] before sending it to
+`pandoc`, or use the `html2markdown` script described below.

 If you don't specify a reader or writer explicitly, `pandoc` will
 try to determine the input and output format from the extensions of
@ -200,7 +201,8 @@ formats are `native`, `markdown`, `rst`, `html`, and `latex`.

 `-t`, `--to`, `-w`, or `--write` can be used to specify the output
 format -- the format Pandoc will be converting *to*.  Available formats
-are `native`, `html`, `s5`, `latex`, `markdown`, `rst`, and `rtf`.
+are `native`, `html`, `s5`, `docbook`, `latex`, `markdown`, `rst`, and
+`rtf`.

 `-s` or `--standalone` indicates that a standalone document is to be
 produced (with appropriate headers and footers), rather than a fragment.
--- a/debian/changelog
+++ b/debian/changelog
@ -46,6 +46,8 @@ pandoc (0.3) unstable; urgency=low
    + Removed extra blanks after '-h' and '-D' output.
    + Added copyright message to '-v' output, modeled after FSF messages.

+  * Added docbook writer.
+
  * Added implicit setting of default input and output format based
    on input and output filename extensions.  These defaults are
    overridden if explicit input and output formats are specified using
--- a/man/man1/pandoc.1
+++ b/man/man1/pandoc.1
@ -6,8 +6,8 @@ pandoc \- general markup converter
 .SH DESCRIPTION
 \fBPandoc\fR converts files from one markup format to another. It can
 read markdown and (subsets of) reStructuredText, HTML, and LaTeX, and
-it can write markdown, reStructuredText, HTML, LaTeX, RTF, and S5 HTML
-slide shows.
+it can write markdown, reStructuredText, HTML, LaTeX, RTF, DocBook
+XML, and S5 HTML slide shows.
 .PP
 If no \fIinput\-file\fR is specified, input is read from STDIN.
 Otherwise, the \fIinput\-files\fR are concatenated (with a blank
@ -80,6 +80,8 @@ can be
 (HTML),
 .B latex
 (LaTeX),
+.B docbook
+(DocBook XML),
 .B s5
 (S5 HTML and javascript slide show),
 or
@ -141,7 +143,7 @@ default header, which can be printed by using the \fB\-D\fR option).
 Implies \fB-s\fR.
 .TP
 .B \-D \fIFORMAT\fB, \-\-print-default-header=\fIFORMAT\fB
-Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex,
+Print the default header for \fIFORMAT\fR (\fIhtml, s5, latex, docbook,
 markdown, rst, rtf\fR).
 .TP
 .B \-T \fISTRING\fB, \-\-title-prefix=\fISTRING\fB
--- a/src/Main.hs
+++ b/src/Main.hs
@ -37,12 +37,14 @@ import Text.Pandoc.Writers.RST ( writeRST )
 import Text.Pandoc.Readers.RST ( readRST )
 import Text.Pandoc.ASCIIMathML ( asciiMathMLScript )
 import Text.Pandoc.Writers.HTML ( writeHtml )
+import Text.Pandoc.Writers.Docbook ( writeDocbook )
 import Text.Pandoc.Writers.LaTeX ( writeLaTeX )
 import Text.Pandoc.Readers.LaTeX ( readLaTeX )
 import Text.Pandoc.Writers.RTF ( writeRTF )
 import Text.Pandoc.Writers.Markdown ( writeMarkdown )
 import Text.Pandoc.Writers.DefaultHeaders ( defaultHtmlHeader, 
-       defaultRTFHeader, defaultS5Header, defaultLaTeXHeader )
+       defaultRTFHeader, defaultS5Header, defaultLaTeXHeader,
+       defaultDocbookHeader )
 import Text.Pandoc.Definition
 import Text.Pandoc.Shared
 import Text.Regex ( mkRegex, matchRegex )
@ -79,6 +81,7 @@ writers :: [ ( String, ( WriterOptions -> Pandoc -> String, String ) ) ]
 writers = [("native"   , (writeDoc, ""))
          ,("html"     , (writeHtml, defaultHtmlHeader))
          ,("s5"       , (writeS5, defaultS5Header))
+          ,("docbook"  , (writeDocbook, defaultDocbookHeader))
          ,("latex"    , (writeLaTeX, defaultLaTeXHeader))
          ,("markdown" , (writeMarkdown, ""))
          ,("rst"      , (writeRST, ""))
@ -331,6 +334,8 @@ defaultWriterName x =
    Just ["text"]     -> "markdown"
    Just ["md"]       -> "markdown"
    Just ["markdown"] -> "markdown"
+    Just ["db"]       -> "docbook"
+    Just ["xml"]      -> "docbook"
    Just _            -> "html"

 main = do
@ -423,6 +428,7 @@ main = do
                                      writerSmart          = smart && 
                                                             (not strict), 
                                      writerTabStop        = tabStop, 
+                                      writerNotes          = [],
                                      writerS5             = (writerName=="s5"),
                                      writerIncremental    = incremental, 
                                      writerNumberSections = numberSections,
--- a/src/Text/Pandoc/Shared.hs
+++ b/src/Text/Pandoc/Shared.hs
@ -322,7 +322,7 @@ containsPara (x:rest) = containsPara rest

 -- | Options for writers
 data WriterOptions = WriterOptions
-    { writerStandalone      :: Bool   -- ^ If @True@, writer header and footer
+    { writerStandalone      :: Bool   -- ^ Include header and footer
    , writerTitlePrefix     :: String -- ^ Prefix for HTML titles
    , writerHeader          :: String -- ^ Header for the document
    , writerIncludeBefore   :: String -- ^ String to include before the  body
@ -334,6 +334,7 @@ data WriterOptions = WriterOptions
    , writerStrictMarkdown  :: Bool   -- ^ Use strict markdown syntax
    , writerTabStop         :: Int    -- ^ Tabstop for conversion between 
                                      -- spaces and tabs
+    , writerNotes           :: [Block] -- ^ List of note blocks
    } deriving Show

 --
--- a/src/Text/Pandoc/Writers/Docbook.hs
+++ b/src/Text/Pandoc/Writers/Docbook.hs
@ -0,0 +1,236 @@
+{-
+Copyright (C) 2006 John MacFarlane <jgm at berkeley dot edu>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+-}
+
+{- |
+   Module      : Text.Pandoc.Writers.Docbook
+   Copyright   : Copyright (C) 2006 John MacFarlane
+   License     : GNU GPL, version 2 or above 
+
+   Maintainer  : John MacFarlane <jgm at berkeley dot edu>
+   Stability   : alpha
+   Portability : portable
+
+Conversion of 'Pandoc' documents to Docbook XML.
+-}
+module Text.Pandoc.Writers.Docbook ( 
+                                     writeDocbook
+                                   ) where
+import Text.Pandoc.Definition
+import Text.Pandoc.Shared
+import Text.Pandoc.Writers.HTML ( stringToSmartHtml, stringToHtml )
+import Text.Html ( stringToHtmlString )
+import Text.Regex ( mkRegex, matchRegex )
+import Data.Char ( toLower )
+import Data.List ( isPrefixOf, partition )
+import Text.PrettyPrint.HughesPJ hiding ( Str )
+
+-- | Data structure for defining hierarchical Pandoc documents
+data Element = Blk Block 
+             | Sec [Inline] [Element] deriving (Eq, Read, Show)
+
+-- | Returns true on Header block with level at least 'level'
+headerAtLeast :: Int -> Block -> Bool
+headerAtLeast level (Header x _) = x <= level
+headerAtLeast level _ = False
+
+-- | Convert list of Pandoc blocks into list of Elements (hierarchical) 
+hierarchicalize :: [Block] -> [Element]
+hierarchicalize [] = []
+hierarchicalize (block:rest) = 
+  case block of
+    (Header level title) -> let (thisSection, rest') = break (headerAtLeast level) rest in
+                            (Sec title (hierarchicalize thisSection)):(hierarchicalize rest') 
+    x                    -> (Blk x):(hierarchicalize rest)
+
+-- | Convert list of authors to a docbook <author> section
+authorToDocbook :: WriterOptions -> [Char] -> Doc
+authorToDocbook options name = indentedInTags options "author" $ 
+  if ',' `elem` name
+    then -- last name first
+      let (lastname, rest) = break (==',') name 
+          firstname = removeLeadingSpace rest in
+      inTags "firstname" (text $ stringToXML options firstname) <> 
+      inTags "surname" (text $ stringToXML options lastname) 
+    else -- last name last
+      let namewords = words name
+          lengthname = length namewords 
+          (firstname, lastname) = case lengthname of
+            0  -> ("","") 
+            1  -> ("", name)
+            n  -> (joinWithSep " " (take (n-1) namewords), last namewords) in
+       inTags "firstname" (text $ stringToXML options firstname) $$ 
+       inTags "surname" (text $ stringToXML options lastname) 
+
+-- | Convert Pandoc document to string in Docbook format.
+writeDocbook :: WriterOptions -> Pandoc -> String
+writeDocbook options (Pandoc (Meta title authors date) blocks) = 
+  let head = if (writerStandalone options)
+                then text (writerHeader options)
+                else empty
+      meta = if (writerStandalone options)
+                then indentedInTags options "articleinfo" $
+                     (inTags "title" (inlinesToDocbook options title)) $$ 
+                     (vcat (map (authorToDocbook options) authors)) $$ 
+                     (inTags "date" (text date)) 
+                else empty
+      blocks' = replaceReferenceLinks blocks
+      (noteBlocks, blocks'') = partition isNoteBlock blocks' 
+      options' = options {writerNotes = noteBlocks}
+      elements = hierarchicalize blocks''
+      body = text (writerIncludeBefore options') <>
+             vcat (map (elementToDocbook options') elements) $$
+             text (writerIncludeAfter options')
+      body' = if writerStandalone options'
+                then indentedInTags options' "article" (meta $$ body)
+                else body in  
+  render $ head $$ body' <> text "\n"
+
+-- | Put the supplied contents between start and end tags of tagType,
+--   with specified attributes.
+inTagsWithAttrib :: String -> [(String, String)] -> Doc -> Doc
+inTagsWithAttrib tagType attribs contents = text ("<" ++ tagType ++ 
+  (concatMap (\(a, b) -> " " ++ attributeStringToXML a ++ 
+  "=\"" ++ attributeStringToXML b ++ "\"") attribs)) <> 
+  if isEmpty contents
+    then text " />" -- self-closing tag
+    else text ">" <> contents <> text ("</" ++ tagType ++ ">") 
+
+-- | Put the supplied contents between start and end tags of tagType.
+inTags :: String -> Doc -> Doc
+inTags tagType contents = inTagsWithAttrib tagType [] contents
+
+-- | Put the supplied contents in indented block btw start and end tags.
+indentedInTags :: WriterOptions -> [Char] -> Doc -> Doc
+indentedInTags options tagType contents = text ("<" ++ tagType ++ ">") $$
+  nest 2 contents $$ text ("</" ++ tagType ++ ">") 
+
+-- | Convert an Element to Docbook.
+elementToDocbook :: WriterOptions -> Element -> Doc
+elementToDocbook options (Blk block) = blockToDocbook options block 
+elementToDocbook options (Sec title elements) =
+  -- Docbook doesn't allow sections with no content, so insert some if needed
+  let elements' = if null elements
+                    then [Blk (Para [])]
+                    else elements in 
+  indentedInTags options "section" $
+  inTags "title" (wrap options title) $$
+  vcat (map (elementToDocbook options) elements') 
+
+-- | Convert a list of Pandoc blocks to Docbook.
+blocksToDocbook :: WriterOptions -> [Block] -> Doc
+blocksToDocbook options = vcat . map (blockToDocbook options)
+
+-- | Convert a list of lists of blocks to a list of Docbook list items.
+listItemsToDocbook :: WriterOptions -> [[Block]] -> Doc
+listItemsToDocbook options items = 
+  vcat $ map (listItemToDocbook options) items
+
+-- | Convert a list of blocks into a Docbook list item.
+listItemToDocbook :: WriterOptions -> [Block] -> Doc
+listItemToDocbook options item =
+  let plainToPara (Plain x) = Para x
+      plainToPara y = y in
+  let item' = map plainToPara item in
+  indentedInTags options "listitem" (blocksToDocbook options item')
+
+-- | Convert a Pandoc block element to Docbook.
+blockToDocbook :: WriterOptions -> Block -> Doc
+blockToDocbook options Blank = text ""
+blockToDocbook options Null = empty
+blockToDocbook options (Plain lst) = wrap options lst
+blockToDocbook options (Para lst) = 
+  indentedInTags options "para" (wrap options lst)
+blockToDocbook options (BlockQuote blocks) =
+  indentedInTags options "blockquote" (blocksToDocbook options blocks)
+blockToDocbook options (CodeBlock str) = 
+  text "<programlisting>" <> (cdata str) <> text "</programlisting>"
+blockToDocbook options (BulletList lst) = 
+  indentedInTags options "itemizedlist" $ listItemsToDocbook options lst 
+blockToDocbook options (OrderedList lst) = 
+  indentedInTags options "orderedlist" $ listItemsToDocbook options lst 
+blockToDocbook options (RawHtml str) = text str -- raw XML block 
+blockToDocbook options HorizontalRule = empty -- not semantic
+blockToDocbook options (Note _ _) = empty -- shouldn't occur
+blockToDocbook options (Key _ _) = empty  -- shouldn't occur
+blockToDocbook options _ = indentedInTags options "para" (text "Unknown block type")
+
+-- | Put string in CDATA section
+cdata :: String -> Doc
+cdata str = text $ "<![CDATA[" ++ str ++ "]]>"
+
+-- | Take list of inline elements and return wrapped doc.
+wrap :: WriterOptions -> [Inline] -> Doc
+wrap options lst = fsep $ map (hcat . (map (inlineToDocbook options))) (splitBySpace lst)
+
+-- | Escape a string for XML (with "smart" option if specified).
+stringToXML :: WriterOptions -> String -> String
+stringToXML options = if writerSmart options
+                        then stringToSmartHtml
+                        else stringToHtml
+
+-- | Escape string to XML appropriate for attributes
+attributeStringToXML :: String -> String
+attributeStringToXML = gsub "\"" "&quot;" . codeStringToXML
+
+-- | Escape a literal string for XML.
+codeStringToXML :: String -> String
+codeStringToXML = gsub "<" "&lt;" . gsub "&" "&amp;" 
+
+-- | Convert a list of inline elements to Docbook.
+inlinesToDocbook :: WriterOptions -> [Inline] -> Doc
+inlinesToDocbook options lst = hcat (map (inlineToDocbook options) lst)
+
+-- | Convert an inline element to Docbook.
+inlineToDocbook :: WriterOptions -> Inline -> Doc
+inlineToDocbook options (Str str) = text $ stringToXML options str 
+inlineToDocbook options (Emph lst) = 
+  inTags "emphasis" (inlinesToDocbook options lst)
+inlineToDocbook options (Strong lst) = 
+  inTagsWithAttrib "emphasis" [("role", "strong")] 
+  (inlinesToDocbook options lst)
+inlineToDocbook options (Code str) = 
+  inTags "literal" $ text (codeStringToXML str)
+inlineToDocbook options (TeX str) = inlineToDocbook options (Code str)
+inlineToDocbook options (HtmlInline str) = empty
+inlineToDocbook options LineBreak = 
+  text $ "<literallayout></literallayout>" 
+inlineToDocbook options Space = char ' '
+inlineToDocbook options (Link txt (Src src tit)) =
+  case (matchRegex (mkRegex "mailto:(.*)") src) of
+    Just [addr] -> inTags "email" $ text (codeStringToXML addr)
+    Nothing     -> inTagsWithAttrib "ulink" [("url", src)] $
+                   inlinesToDocbook options txt
+inlineToDocbook options (Link text (Ref ref)) = empty -- shouldn't occur
+inlineToDocbook options (Image alt (Src src tit)) = 
+  let titleDoc = if null tit
+                   then empty
+                   else indentedInTags options "objectinfo" $
+                        indentedInTags options "title" 
+                        (text $ stringToXML options tit) in
+  indentedInTags options "inlinemediaobject" $ 
+  indentedInTags options "imageobject" $
+  titleDoc $$ inTagsWithAttrib "imagedata" [("fileref", src)] empty 
+inlineToDocbook options (Image alternate (Ref ref)) = empty --shouldn't occur
+inlineToDocbook options (NoteRef ref) = 
+  let notes = writerNotes options
+      hits = filter (\(Note r _) -> r == ref) notes in
+  if null hits
+    then empty
+    else let (Note _ contents) = head hits in
+         indentedInTags options "footnote" $ blocksToDocbook options contents
+inlineToDocbook options _ = empty
--- a/src/Text/Pandoc/Writers/HTML.hs
+++ b/src/Text/Pandoc/Writers/HTML.hs
@ -28,7 +28,9 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Conversion of 'Pandoc' documents to HTML.
 -}
 module Text.Pandoc.Writers.HTML ( 
-                                 writeHtml
+                                 writeHtml,
+                                 stringToSmartHtml,
+                                 stringToHtml
                                ) where
 import Text.Pandoc.Definition
 import Text.Pandoc.Shared
--- a/src/headers/DocbookHeader
+++ b/src/headers/DocbookHeader
@ -0,0 +1,3 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+                  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
--- a/src/templates/DefaultHeaders.hs
+++ b/src/templates/DefaultHeaders.hs
@ -1,6 +1,7 @@
 -- | Default headers for Pandoc writers.
 module Text.Pandoc.Writers.DefaultHeaders  (
                                      defaultLaTeXHeader,
+                                      defaultDocbookHeader,
                                      defaultHtmlHeader,
                                      defaultS5Header,
                                      defaultRTFHeader
@ -10,6 +11,9 @@ import Text.Pandoc.Writers.S5
 defaultLaTeXHeader :: String
 defaultLaTeXHeader = "@LaTeXHeader@"

+defaultDocbookHeader :: String
+defaultDocbookHeader = "@DocbookHeader@"
+
 defaultHtmlHeader :: String
 defaultHtmlHeader = "@HtmlHeader@"

--- a/tests/generate.sh
+++ b/tests/generate.sh
@ -7,4 +7,5 @@
 ../pandoc -r native -s -w html -S testsuite.native > writer.smart.html
 ../pandoc -r native -s -w latex testsuite.native > writer.latex
 ../pandoc -r native -s -w rtf testsuite.native > writer.rtf
+sed -e '/^, Header 1 \[Str "HTML",Space,Str "Blocks"\]/,/^, HorizontalRule/d' testsuite.native | ../pandoc -r native -w docbook -s > writer.docbook

--- a/tests/runtests.pl
+++ b/tests/runtests.pl
@ -52,6 +52,11 @@ foreach my $format (@writeformats)
    test_results("$format writer", "tmp.$extension", "writer.$format");
 }

+print "Testing docbook writer...";
+# remove HTML block tests, as this produces invalid docbook...
+`sed -e '/^, Header 1 \\[Str "HTML",Space,Str "Blocks"\\]/,/^, HorizontalRule/d' testsuite.native | $script -r native -w docbook -s > tmp.docbook`;
+test_results("docbook writer", "tmp.docbook", "writer.docbook");
+
 print "Testing s5 writer (basic)...";
 `$script -r native -w s5 -s s5.native > tmp.html`;
 test_results("s5 writer (basic)", "tmp.html", "s5.basic.html");
--- a/tests/writer.docbook
+++ b/tests/writer.docbook
@ -0,0 +1,997 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+                  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
+
+<article>
+  <articleinfo>
+    <title>Pandoc Test Suite</title>
+    <author>
+      <firstname>John</firstname>
+      <surname>MacFarlane</surname>
+    </author>
+    <author>
+      <firstname></firstname>
+      <surname>Anonymous</surname>
+    </author>
+    <date>July 17, 2006</date>
+  </articleinfo>
+  <para>
+    This is a set of tests for pandoc. Most of them are adapted from
+    John Gruber's markdown test suite.
+  </para>
+  <section>
+    <title>Headers</title>
+    <section>
+      <title>Level 2 with an
+             <ulink url="/url">embedded link</ulink></title>
+      <section>
+        <title>Level 3 with <emphasis>emphasis</emphasis></title>
+        <section>
+          <title>Level 4</title>
+          <section>
+            <title>Level 5</title>
+            <para>
+            </para>
+          </section>
+        </section>
+      </section>
+    </section>
+  </section>
+  <section>
+    <title>Level 1</title>
+    <section>
+      <title>Level 2 with <emphasis>emphasis</emphasis></title>
+      <section>
+        <title>Level 3</title>
+        <para>
+          with no blank line
+        </para>
+      </section>
+    </section>
+    <section>
+      <title>Level 2</title>
+      <para>
+        with no blank line
+      </para>
+    </section>
+  </section>
+  <section>
+    <title>Paragraphs</title>
+    <para>
+      Here's a regular paragraph.
+    </para>
+    <para>
+      In Markdown 1.0.0 and earlier. Version 8. This line turns into a
+      list item. Because a hard-wrapped line in the middle of a paragraph
+      looked like a list item.
+    </para>
+    <para>
+      Here's one with a bullet. * criminey.
+    </para>
+    <para>
+      There should be a hard line
+      break<literallayout></literallayout>here.
+    </para>
+  </section>
+  <section>
+    <title>Block Quotes</title>
+    <para>
+      E-mail style:
+    </para>
+    <blockquote>
+      <para>
+        This is a block quote. It is pretty short.
+      </para>
+    </blockquote>
+    <blockquote>
+      <para>
+        Code in a block quote:
+      </para>
+      <programlisting><![CDATA[sub status {
+    print "working";
+}]]></programlisting>
+      <para>
+        A list:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            item one
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            item two
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        Nested block quotes:
+      </para>
+      <blockquote>
+        <para>
+          nested
+        </para>
+      </blockquote>
+      <blockquote>
+        <para>
+          nested
+        </para>
+      </blockquote>
+    </blockquote>
+    <para>
+      This should not be a block quote: 2 &gt; 1.
+    </para>
+    <para>
+      Box-style:
+    </para>
+    <blockquote>
+      <para>
+        Example:
+      </para>
+      <programlisting><![CDATA[sub status {
+    print "working";
+}]]></programlisting>
+    </blockquote>
+    <blockquote>
+      <orderedlist>
+        <listitem>
+          <para>
+            do laundry
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            take out the trash
+          </para>
+        </listitem>
+      </orderedlist>
+    </blockquote>
+    <para>
+      Here's a nested one:
+    </para>
+    <blockquote>
+      <para>
+        Joe said:
+      </para>
+      <blockquote>
+        <para>
+          Don't quote me.
+        </para>
+      </blockquote>
+    </blockquote>
+    <para>
+      And a following paragraph.
+    </para>
+  </section>
+  <section>
+    <title>Code Blocks</title>
+    <para>
+      Code:
+    </para>
+    <programlisting><![CDATA[---- (should be four hyphens)
+
+sub status {
+    print "working";
+}
+
+this code block is indented by one tab]]></programlisting>
+    <para>
+      And:
+    </para>
+    <programlisting><![CDATA[    this code block is indented by two tabs
+
+These should not be escaped:  \$ \\ \> \[ \{]]></programlisting>
+  </section>
+  <section>
+    <title>Lists</title>
+    <section>
+      <title>Unordered</title>
+      <para>
+        Asterisks tight:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            asterisk 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            asterisk 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            asterisk 3
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Asterisks loose:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            asterisk 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            asterisk 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            asterisk 3
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Pluses tight:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            Plus 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Plus 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Plus 3
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Pluses loose:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            Plus 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Plus 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Plus 3
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Minuses tight:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            Minus 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Minus 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Minus 3
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Minuses loose:
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            Minus 1
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Minus 2
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Minus 3
+          </para>
+        </listitem>
+      </itemizedlist>
+    </section>
+    <section>
+      <title>Ordered</title>
+      <para>
+        Tight:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            First
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Second
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Third
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        and:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            One
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Two
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Three
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        Loose using tabs:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            First
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Second
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Third
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        and using spaces:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            One
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Two
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Three
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        Multiple paragraphs:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            Item 1, graf one.
+          </para>
+          <para>
+            Item 1. graf two. The quick brown fox jumped over the lazy dog's
+            back.
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Item 2.
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Item 3.
+          </para>
+        </listitem>
+      </orderedlist>
+    </section>
+    <section>
+      <title>Nested</title>
+      <itemizedlist>
+        <listitem>
+          <para>
+            Tab
+          </para>
+          <itemizedlist>
+            <listitem>
+              <para>
+                Tab
+              </para>
+              <itemizedlist>
+                <listitem>
+                  <para>
+                    Tab
+                  </para>
+                </listitem>
+              </itemizedlist>
+            </listitem>
+          </itemizedlist>
+        </listitem>
+      </itemizedlist>
+      <para>
+        Here's another:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            First
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Second:
+          </para>
+          <itemizedlist>
+            <listitem>
+              <para>
+                Fee
+              </para>
+            </listitem>
+            <listitem>
+              <para>
+                Fie
+              </para>
+            </listitem>
+            <listitem>
+              <para>
+                Foe
+              </para>
+            </listitem>
+          </itemizedlist>
+        </listitem>
+        <listitem>
+          <para>
+            Third
+          </para>
+        </listitem>
+      </orderedlist>
+      <para>
+        Same thing but with paragraphs:
+      </para>
+      <orderedlist>
+        <listitem>
+          <para>
+            First
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            Second:
+          </para>
+          <itemizedlist>
+            <listitem>
+              <para>
+                Fee
+              </para>
+            </listitem>
+            <listitem>
+              <para>
+                Fie
+              </para>
+            </listitem>
+            <listitem>
+              <para>
+                Foe
+              </para>
+            </listitem>
+          </itemizedlist>
+        </listitem>
+        <listitem>
+          <para>
+            Third
+          </para>
+        </listitem>
+      </orderedlist>
+    </section>
+    <section>
+      <title>Tabs and spaces</title>
+      <itemizedlist>
+        <listitem>
+          <para>
+            this is a list item indented with tabs
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            this is a list item indented with spaces
+          </para>
+          <itemizedlist>
+            <listitem>
+              <para>
+                this is an example list item indented with tabs
+              </para>
+            </listitem>
+            <listitem>
+              <para>
+                this is an example list item indented with spaces
+              </para>
+            </listitem>
+          </itemizedlist>
+        </listitem>
+      </itemizedlist>
+    </section>
+  </section>
+  <section>
+    <title>Inline Markup</title>
+    <para>
+      This is <emphasis>emphasized</emphasis>, and so
+      <emphasis>is this</emphasis>.
+    </para>
+    <para>
+      This is <emphasis role="strong">strong</emphasis>, and so
+      <emphasis role="strong">is this</emphasis>.
+    </para>
+    <para>
+      An <emphasis><ulink url="/url">emphasized link</ulink></emphasis>.
+    </para>
+    <para>
+      <emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis>
+    </para>
+    <para>
+      So is <emphasis role="strong"><emphasis>this</emphasis></emphasis>
+      word.
+    </para>
+    <para>
+      <emphasis role="strong"><emphasis>This is strong and em.</emphasis></emphasis>
+    </para>
+    <para>
+      So is <emphasis role="strong"><emphasis>this</emphasis></emphasis>
+      word.
+    </para>
+    <para>
+      This is code: <literal>></literal>, <literal>$</literal>,
+      <literal>\</literal>, <literal>\$</literal>,
+      <literal>&lt;html></literal>.
+    </para>
+  </section>
+  <section>
+    <title>Smart quotes, ellipses, dashes</title>
+    <para>
+      "Hello," said the spider. "'Shelob' is my name."
+    </para>
+    <para>
+      'A', 'B', and 'C' are letters.
+    </para>
+    <para>
+      'Oak,' 'elm,' and 'beech' are names of trees. So is 'pine.'
+    </para>
+    <para>
+      'He said, "I want to go."' Were you alive in the 70's?
+    </para>
+    <para>
+      Here is some quoted '<literal>code</literal>' and a
+      "<ulink url="http://example.com/?foo=1&amp;bar=2">quoted link</ulink>".
+    </para>
+    <para>
+      Some dashes: one---two --- three--four -- five.
+    </para>
+    <para>
+      Dashes between numbers: 5-7, 255-66, 1987-1999.
+    </para>
+    <para>
+      Ellipses...and. . .and . . . .
+    </para>
+  </section>
+  <section>
+    <title>LaTeX</title>
+    <itemizedlist>
+      <listitem>
+        <para>
+          <literal>\cite[22-23]{smith.1899}</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>\doublespacing</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$2+2=4$</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$x \in y$</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$\alpha \wedge \omega$</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$223$</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$p$</literal>-Tree
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          <literal>$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$</literal>
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          Here's one that has a line break in it:
+          <literal>$\alpha + \omega \times x^2$</literal>.
+        </para>
+      </listitem>
+    </itemizedlist>
+    <para>
+      These shouldn't be math:
+    </para>
+    <itemizedlist>
+      <listitem>
+        <para>
+          To get the famous equation, write <literal>$e = mc^2$</literal>.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          $22,000 is a <emphasis>lot</emphasis> of money. So is $34,000. (It
+          worked if "lot" is emphasized.)
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          Escaped <literal>$</literal>: $73
+          <emphasis>this should be emphasized</emphasis> 23$.
+        </para>
+      </listitem>
+    </itemizedlist>
+    <para>
+      Here's a LaTeX table:
+    </para>
+    <para>
+      <literal>\begin{tabular}{|l|l|}\hline
+Animal &amp; Number \\ \hline
+Dog    &amp; 2      \\
+Cat    &amp; 1      \\ \hline
+\end{tabular}</literal>
+    </para>
+  </section>
+  <section>
+    <title>Special Characters</title>
+    <para>
+      Here is some unicode:
+    </para>
+    <itemizedlist>
+      <listitem>
+        <para>
+          I hat: Î
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          o umlaut: ö
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          section: §
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          set membership: ∈
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          copyright: ©
+        </para>
+      </listitem>
+    </itemizedlist>
+    <para>
+      AT&amp;T has an ampersand in their name.
+    </para>
+    <para>
+      AT&amp;T is another way to write it.
+    </para>
+    <para>
+      This &amp; that.
+    </para>
+    <para>
+      4 &lt; 5.
+    </para>
+    <para>
+      6 &gt; 5.
+    </para>
+    <para>
+      Backslash: \
+    </para>
+    <para>
+      Backtick: `
+    </para>
+    <para>
+      Asterisk: *
+    </para>
+    <para>
+      Underscore: _
+    </para>
+    <para>
+      Left brace: {
+    </para>
+    <para>
+      Right brace: }
+    </para>
+    <para>
+      Left bracket: [
+    </para>
+    <para>
+      Right bracket: ]
+    </para>
+    <para>
+      Left paren: (
+    </para>
+    <para>
+      Right paren: )
+    </para>
+    <para>
+      Greater-than: &gt;
+    </para>
+    <para>
+      Hash: #
+    </para>
+    <para>
+      Period: .
+    </para>
+    <para>
+      Bang: !
+    </para>
+    <para>
+      Plus: +
+    </para>
+    <para>
+      Minus: -
+    </para>
+  </section>
+  <section>
+    <title>Links</title>
+    <section>
+      <title>Explicit</title>
+      <para>
+        Just a <ulink url="/url/">URL</ulink>.
+      </para>
+      <para>
+        <ulink url="/url/">URL and title</ulink>.
+      </para>
+      <para>
+        <ulink url="/url/">URL and title</ulink>.
+      </para>
+      <para>
+        <ulink url="/url/">URL and title</ulink>.
+      </para>
+      <para>
+        <ulink url="/url/">URL and title</ulink>
+      </para>
+      <para>
+        <ulink url="/url/">URL and title</ulink>
+      </para>
+      <para>
+        <email>nobody@nowhere.net</email>
+      </para>
+      <para>
+        <ulink url="">Empty</ulink>.
+      </para>
+    </section>
+    <section>
+      <title>Reference</title>
+      <para>
+        Foo <ulink url="/url/">bar</ulink>.
+      </para>
+      <para>
+        Foo <ulink url="/url/">bar</ulink>.
+      </para>
+      <para>
+        Foo <ulink url="/url/">bar</ulink>.
+      </para>
+      <para>
+        With <ulink url="/url/">embedded [brackets]</ulink>.
+      </para>
+      <para>
+        <ulink url="/url/">b</ulink> by itself should be a link.
+      </para>
+      <para>
+        Indented <ulink url="/url">once</ulink>.
+      </para>
+      <para>
+        Indented <ulink url="/url">twice</ulink>.
+      </para>
+      <para>
+        Indented <ulink url="/url">thrice</ulink>.
+      </para>
+      <para>
+        This should [not][] be a link.
+      </para>
+      <programlisting><![CDATA[[not]: /url]]></programlisting>
+      <para>
+        Foo <ulink url="/url/">bar</ulink>.
+      </para>
+      <para>
+        Foo <ulink url="/url/">biz</ulink>.
+      </para>
+    </section>
+    <section>
+      <title>With ampersands</title>
+      <para>
+        Here's a
+        <ulink url="http://example.com/?foo=1&amp;bar=2">link with an ampersand in the URL</ulink>.
+      </para>
+      <para>
+        Here's a link with an amersand in the link text:
+        <ulink url="http://att.com/">AT&amp;T</ulink>.
+      </para>
+      <para>
+        Here's an <ulink url="/script?foo=1&amp;bar=2">inline link</ulink>.
+      </para>
+      <para>
+        Here's an
+        <ulink url="/script?foo=1&amp;bar=2">inline link in pointy braces</ulink>.
+      </para>
+    </section>
+    <section>
+      <title>Autolinks</title>
+      <para>
+        With an ampersand:
+        <ulink url="http://example.com/?foo=1&amp;bar=2">http://example.com/?foo=1&amp;bar=2</ulink>
+      </para>
+      <itemizedlist>
+        <listitem>
+          <para>
+            In a list?
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            <ulink url="http://example.com/">http://example.com/</ulink>
+          </para>
+        </listitem>
+        <listitem>
+          <para>
+            It should.
+          </para>
+        </listitem>
+      </itemizedlist>
+      <para>
+        An e-mail address: <email>nobody@nowhere.net</email>
+      </para>
+      <blockquote>
+        <para>
+          Blockquoted:
+          <ulink url="http://example.com/">http://example.com/</ulink>
+        </para>
+      </blockquote>
+      <para>
+        Auto-links should not occur here:
+        <literal>&lt;http://example.com/></literal>
+      </para>
+      <programlisting><![CDATA[or here: <http://example.com/>]]></programlisting>
+    </section>
+  </section>
+  <section>
+    <title>Images</title>
+    <para>
+      From "Voyage dans la Lune" by Georges Melies (1902):
+    </para>
+    <para>
+      <inlinemediaobject>
+        <imageobject>
+          <objectinfo>
+            <title>
+              Voyage dans la Lune
+            </title>
+          </objectinfo>
+          <imagedata fileref="lalune.jpg" />
+        </imageobject>
+      </inlinemediaobject>
+    </para>
+    <para>
+      Here is a movie
+      <inlinemediaobject>
+        <imageobject>
+          <imagedata fileref="movie.jpg" />
+        </imageobject>
+      </inlinemediaobject>
+      icon.
+    </para>
+  </section>
+  <section>
+    <title>Footnotes</title>
+    <para>
+      Here is a footnote
+      reference,<footnote>
+                  <para>
+                    Here is the footnote. It can go anywhere after the footnote
+                    reference. It need not be placed at the end of the document.
+                  </para>
+                </footnote>
+      and
+      another.<footnote>
+                <para>
+                  Here's the long note. This one contains multiple blocks.
+                </para>
+                <para>
+                  Subsequent blocks are indented to show that they belong to the
+                  footnote (as with list items).
+                </para>
+                <programlisting><![CDATA[  { <code> }]]></programlisting>
+                <para>
+                  If you want, you can indent every line, but you can also be lazy
+                  and just indent the first line of each block.
+                </para>
+              </footnote>
+      This should <emphasis>not</emphasis> be a footnote reference,
+      because it contains a space.[^my note] Here is an inline
+      note.<footnote>
+             <para>
+               This is <emphasis>easier</emphasis> to type. Inline notes may
+               contain <ulink url="http://google.com">links</ulink> and
+               <literal>]</literal> verbatim characters.
+             </para>
+           </footnote>
+    </para>
+    <blockquote>
+      <para>
+        Notes can go in
+        quotes.<footnote>
+                 <para>
+                   In quote.
+                 </para>
+               </footnote>
+      </para>
+    </blockquote>
+    <orderedlist>
+      <listitem>
+        <para>
+          And in list
+          items.<footnote>
+                  <para>
+                    In list.
+                  </para>
+                </footnote>
+        </para>
+      </listitem>
+    </orderedlist>
+    <para>
+      This paragraph should not be part of the note, as it is not
+      indented.
+    </para>
+  </section>
+  
+</article>
--- a/web/index.txt
+++ b/web/index.txt
@ -4,8 +4,8 @@ Pandoc is a [Haskell] library for converting from one markup format
 to another, and a command-line tool that uses this library. It can read
 [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
 and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
-and [S5] HTML slide shows. Pandoc's version of markdown contains some
-enhancements, like footnotes and embedded LaTeX.
+[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown
+contains some enhancements, like footnotes and embedded LaTeX.

 In contrast to existing tools for converting markdown to HTML, which
 use regex substitutions, Pandoc has a modular design: it consists of a
@ -73,6 +73,7 @@ kind.
 [HTML]:  http://www.w3.org/TR/html40/
 [LaTeX]:  http://www.latex-project.org/
 [RTF]:  http://en.wikipedia.org/wiki/Rich_Text_Format
+[DocBook XML]: http://www.docbook.org/
 [Haskell]:  http://www.haskell.org/
 [GHC]:  http://www.haskell.org/ghc/
 [GPL]:  http://www.gnu.org/copyleft/gpl.html
--- a/web/mkdemos.sh
+++ b/web/mkdemos.sh
@ -16,6 +16,7 @@ pandoc -s README.tex -o example0.txt
 pandoc -s -w rst README -o example0.txt
 pandoc -s README -o example0.rtf 
 pandoc -s -m -i -w s5 S5DEMO -o example0.html
+pandoc -s -w docbook README -o example0.db
 html2markdown http://www.gnu.org/software/make/ -o example0.txt
 markdown2pdf README -o example0.pdf
 markdown2pdf -C myheader.tex README -o example0.pdf'