FB2 writer: represent HorizontalRule as empty line

HorizontalRule corresponds to the <hr> element in the default output
format, HTML. The current HTML standard defines <hr> as a
"paragraph-level thematic break". In typography, such a break is often
represented by extra space or a centered asterism ("⁂"), but since
FB2 does not support text centering, an empty line (similar to extra
space) is the only option.
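
The mapping described above can be sketched in isolation. This is a minimal illustration, assuming a toy `Block` type and a string-based renderer; `renderBlock`, `renderBlocks` and `escapeXml` are hypothetical names and are not part of pandoc, whose writer builds XML content values rather than strings.

```haskell
-- Toy model (assumption): a tiny stand-in for pandoc's Block type,
-- used only to illustrate how a thematic break is rendered.
data Block
  = Para String
  | HorizontalRule
  deriving (Show, Eq)

-- Escape the characters that are special in XML text nodes.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Render one block. A horizontal rule becomes a bare <empty-line />,
-- with no surrounding paragraph of dashes.
renderBlock :: Block -> String
renderBlock (Para t)       = "<p>" ++ escapeXml t ++ "</p>"
renderBlock HorizontalRule = "<empty-line />"

-- Render a sequence of blocks by concatenating their markup.
renderBlocks :: [Block] -> String
renderBlocks = concatMap renderBlock
```

In this sketch the empty line is always a sibling of the paragraphs around it, never a child of one.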

Line breaks, on the other hand, no longer generate <empty-line />.
Previously, a line break generated an <empty-line /> element inside a
paragraph, which is not allowed. This commit therefore addresses
issue #2424 ("FB2 produced by pandoc doesn't validate").
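
To make the constraint concrete, here is a small sketch of a check for the invalid nesting. It assumes a toy `Node` tree rather than a real XML library, and `emptyLineInsidePara` is a hypothetical helper, not something the commit adds. The two sample values mirror the "hard line break" paragraph from the golden test output before and after this change.

```haskell
-- Toy XML tree (assumption): element name plus children, or text.
data Node
  = Elem String [Node]
  | Text String

-- True if an <empty-line> element occurs anywhere inside a <p>.
emptyLineInsidePara :: Node -> Bool
emptyLineInsidePara = go False
  where
    go :: Bool -> Node -> Bool
    go _ (Text _) = False
    go insideP (Elem name children)
      | insideP && name == "empty-line" = True
      | otherwise = any (go (insideP || name == "p")) children

-- Shape of the old output: <empty-line /> nested in a paragraph.
oldOutput :: Node
oldOutput =
  Elem "p"
    [ Text "There should be a hard line break"
    , Elem "empty-line" []
    , Text "here."
    ]

-- Shape of the new output: LF inside the text, <empty-line /> as a sibling.
newOutput :: Node
newOutput =
  Elem "section"
    [ Elem "p" [Text "There should be a hard line break\nhere."]
    , Elem "empty-line" []
    ]
```

With these definitions, `emptyLineInsidePara oldOutput` is `True` and `emptyLineInsidePara newOutput` is `False`.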

FB2 has no way to represent line breaks inside paragraphs. They are
replaced with an LF character, which FB2 readers do not render, but
which at least preserves some information.
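
A minimal sketch of the inline side of this change, assuming a toy `Inline` type and string output (`renderInline` and `renderPara` are hypothetical names, not pandoc functions): both soft and hard breaks become an LF character in the paragraph text.

```haskell
-- Toy model (assumption): a small subset of inline elements.
data Inline
  = Str String
  | Space
  | SoftBreak
  | LineBreak

-- Render one inline as plain text. Neither kind of break produces
-- an element; both turn into a single LF character.
renderInline :: Inline -> String
renderInline (Str s)   = s
renderInline Space     = " "
renderInline SoftBreak = "\n"
renderInline LineBreak = "\n"

-- Wrap the rendered inlines in a <p> element (no escaping, for brevity).
renderPara :: [Inline] -> String
renderPara inlines = "<p>" ++ concatMap renderInline inlines ++ "</p>"
```

For example, `renderPara [Str "break", LineBreak, Str "here."]` yields a paragraph whose text contains a newline and no child elements.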
Alexander Krotov 2018-04-04 02:41:56 +03:00
parent 87dda2109d
commit f224567d52
5 changed files with 82 additions and 62 deletions


@ -365,10 +365,7 @@ blockToXml h@Header{} = do
   -- should not occur after hierarchicalize, except inside lists/blockquotes
   report $ BlockNotRendered h
   return []
-blockToXml HorizontalRule = return
-                            [ el "empty-line" ()
-                            , el "p" (txt (replicate 10 '—'))
-                            , el "empty-line" () ]
+blockToXml HorizontalRule = return [ el "empty-line" () ]
 blockToXml (Table caption aligns _ headers rows) = do
   hd <- mkrow "th" headers aligns
   bd <- mapM (\r -> mkrow "td" r aligns) rows
@ -398,7 +395,7 @@ plainToPara [] = []
 plainToPara (Plain inlines : rest) =
   Para inlines : plainToPara rest
 plainToPara (Para inlines : rest) =
-  Para inlines : Plain [LineBreak] : plainToPara rest
+  Para inlines : HorizontalRule : plainToPara rest -- HorizontalRule will be converted to <empty-line />
 plainToPara (p:rest) = p : plainToPara rest
 -- Simulate increased indentation level. Will not really work
@ -449,8 +446,8 @@ toXml (Quoted DoubleQuote ss) = do
 toXml (Cite _ ss) = cMapM toXml ss -- FIXME: support citation styles
 toXml (Code _ s) = return [el "code" s]
 toXml Space = return [txt " "]
-toXml SoftBreak = return [txt " "]
-toXml LineBreak = return [el "empty-line" ()]
+toXml SoftBreak = return [txt "\n"]
+toXml LineBreak = return [txt "\n"]
 toXml (Math _ formula) = insertMath InlineImage formula
 toXml il@(RawInline _ _) = do
   report $ InlineNotRendered il


@ -25,8 +25,8 @@ tests = [ testGroup "block elements"
   ]
 , testGroup "inlines"
   [
-    "Emphasis" =: emph "emphasized"
-               =?> fb2 "<emphasis>emphasized</emphasis>"
+    "Emphasis" =: para (emph "emphasized")
+               =?> fb2 "<p><emphasis>emphasized</emphasis></p>"
   ]
 , "bullet list" =: bulletList [ plain $ text "first"
                               , plain $ text "second"


@ -1,3 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink"><description><title-info><genre>unrecognised</genre></title-info><document-info><program-used>pandoc</program-used></document-info></description><body><title><p /></title><section><title><p>Top-level title</p></title><section><title><p>Section</p></title><section><title><p>Subsection</p></title><p>This <emphasis>emphasized</emphasis> <strong>strong</strong> <code>verbatim</code> markdown. See this link<a l:href="#l1" type="note"><sup>[1]</sup></a>.</p><p>Ordered list:</p><p>1. one</p><p>2. two</p><p>3. three</p><cite><p>Blockquote is for citatons.</p></cite><empty-line /><p><code>Code</code></p><p><code>block</code></p><p><code>is</code></p><p><code>for</code></p><p><code>code.</code></p><empty-line /><p><strikethrough>Strikeout</strikethrough> is Pandocs extension. Superscript and subscripts too: H<sub>2</sub>O is a liquid<a l:href="#n2" type="note"><sup>[2]</sup></a>. 2<sup>10</sup> is 1024.</p><p>Math is another Pandoc extension: <code>E = m c^2</code>.</p></section></section></section></body><body name="notes"><section id="l1"><title><p>1</p></title><p><code>http://example.com/</code></p></section><section id="n2"><title><p>2</p></title><p>Sometimes.</p></section></body></FictionBook>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink"><description><title-info><genre>unrecognised</genre></title-info><document-info><program-used>pandoc</program-used></document-info></description><body><title><p /></title><section><title><p>Top-level title</p></title><section><title><p>Section</p></title><section><title><p>Subsection</p></title><p>This <emphasis>emphasized</emphasis> <strong>strong</strong> <code>verbatim</code> markdown.
See this link<a l:href="#l1" type="note"><sup>[1]</sup></a>.</p><p>Ordered list:</p><p>1. one</p><p>2. two</p><p>3. three</p><cite><p>Blockquote
is
for
citatons.</p></cite><empty-line /><p><code>Code</code></p><p><code>block</code></p><p><code>is</code></p><p><code>for</code></p><p><code>code.</code></p><empty-line /><p><strikethrough>Strikeout</strikethrough> is Pandocs extension.
Superscript and subscripts too: H<sub>2</sub>O is a liquid<a l:href="#n2" type="note"><sup>[2]</sup></a>.
2<sup>10</sup> is 1024.</p><p>Math is another Pandoc extension: <code>E = m c^2</code>.</p></section></section></section></body><body name="notes"><section id="l1"><title><p>1</p></title><p><code>http://example.com/</code></p></section><section id="n2"><title><p>2</p></title><p>Sometimes.</p></section></body></FictionBook>


@ -1,3 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink"><description><title-info><genre>unrecognised</genre></title-info><document-info><program-used>pandoc</program-used></document-info></description><body><title><p /></title><section><p>Simple table with caption:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis>Demonstration of simple table syntax.</emphasis></p><p>Simple table without caption:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis /></p><p>Simple table indented two spaces:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis>Demonstration of simple table syntax.</emphasis></p><p>Multiline table with caption:</p><table><tr><th align="center">Centered 
Header</th><th align="left">Left Aligned</th><th align="right">Right Aligned</th><th align="left">Default aligned</th></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. Note the blank line between rows.</td></tr></table><p><emphasis>Heres the caption. It may span multiple lines.</emphasis></p><p>Multiline table without caption:</p><table><tr><th align="center">Centered Header</th><th align="left">Left Aligned</th><th align="right">Right Aligned</th><th align="left">Default aligned</th></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. Note the blank line between rows.</td></tr></table><p><emphasis /></p><p>Table without column headers:</p><table><tr><th align="right" /><th align="left" /><th align="center" /><th align="right" /></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="right">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="right">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="right">1</td></tr></table><p><emphasis /></p><p>Multiline table without column headers:</p><table><tr><th align="center" /><th align="left" /><th align="right" /><th align="left" /></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. 
Note the blank line between rows.</td></tr></table><p><emphasis /></p></section></body></FictionBook>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0" xmlns:l="http://www.w3.org/1999/xlink"><description><title-info><genre>unrecognised</genre></title-info><document-info><program-used>pandoc</program-used></document-info></description><body><title><p /></title><section><p>Simple table with caption:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis>Demonstration of simple table syntax.</emphasis></p><p>Simple table without caption:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis /></p><p>Simple table indented two spaces:</p><table><tr><th align="right">Right</th><th align="left">Left</th><th align="center">Center</th><th align="left">Default</th></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="left">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="left">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="left">1</td></tr></table><p><emphasis>Demonstration of simple table syntax.</emphasis></p><p>Multiline table with caption:</p><table><tr><th align="center">Centered
Header</th><th align="left">Left
Aligned</th><th align="right">Right
Aligned</th><th align="left">Default aligned</th></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans
multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. Note
the blank line between rows.</td></tr></table><p><emphasis>Heres the caption.
It may span multiple lines.</emphasis></p><p>Multiline table without caption:</p><table><tr><th align="center">Centered
Header</th><th align="left">Left
Aligned</th><th align="right">Right
Aligned</th><th align="left">Default aligned</th></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans
multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. Note
the blank line between rows.</td></tr></table><p><emphasis /></p><p>Table without column headers:</p><table><tr><th align="right" /><th align="left" /><th align="center" /><th align="right" /></tr><tr><td align="right">12</td><td align="left">12</td><td align="center">12</td><td align="right">12</td></tr><tr><td align="right">123</td><td align="left">123</td><td align="center">123</td><td align="right">123</td></tr><tr><td align="right">1</td><td align="left">1</td><td align="center">1</td><td align="right">1</td></tr></table><p><emphasis /></p><p>Multiline table without column headers:</p><table><tr><th align="center" /><th align="left" /><th align="right" /><th align="left" /></tr><tr><td align="center">First</td><td align="left">row</td><td align="right">12.0</td><td align="left">Example of a row that spans
multiple lines.</td></tr><tr><td align="center">Second</td><td align="left">row</td><td align="right">5.0</td><td align="left">Heres another one. Note
the blank line between rows.</td></tr></table><p><emphasis /></p></section></body></FictionBook>


@ -22,9 +22,8 @@
<p>Pandoc Test Suite</p>
</title>
<section>
<p>This is a set of tests for pandoc. Most of them are adapted from John Grubers markdown test suite.</p>
<empty-line />
<p>——————————</p>
<p>This is a set of tests for pandoc. Most of them are adapted from
John Grubers markdown test suite.</p>
<empty-line />
</section>
<section>
@ -78,8 +77,6 @@
</title>
<p>with no blank line</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
</section>
<section>
@ -87,11 +84,15 @@
<p>Paragraphs</p>
</title>
<p>Heres a regular paragraph.</p>
<p>In Markdown 1.0.0 and earlier. Version 8. This line turns into a list item. Because a hard-wrapped line in the middle of a paragraph looked like a list item.</p>
<p>Heres one with a bullet. * criminey.</p>
<p>There should be a hard line break<empty-line />here.</p>
<empty-line />
<p>——————————</p>
<p>In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.</p>
<p>Heres one with a bullet.
* criminey.</p>
<p>There should be a hard line break
here.</p>
<empty-line />
</section>
<section>
@ -100,7 +101,8 @@
</title>
<p>E-mail style:</p>
<cite>
<p>This is a block quote. It is pretty short.</p>
<p>This is a block quote.
It is pretty short.</p>
</cite>
<cite>
<p>Code in a block quote:</p>
@ -126,11 +128,10 @@
<p>nested</p>
</cite>
</cite>
<p>This should not be a block quote: 2 &gt; 1.</p>
<p>This should not be a block quote: 2
&gt; 1.</p>
<p>And a following paragraph.</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -176,8 +177,6 @@
</p>
<empty-line />
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -250,7 +249,8 @@
<p>Multiple paragraphs:</p>
<p>1. Item 1, graf one.</p>
<empty-line />
<p>   Item 1. graf two. The quick brown fox jumped over the lazy dogs back.</p>
<p>   Item 1. graf two. The quick brown fox jumped over the lazy dogs
back.</p>
<empty-line />
<p>2. Item 2.</p>
<empty-line />
@ -286,13 +286,17 @@
<title>
<p>Tabs and spaces</p>
</title>
<p>• this is a list item indented with tabs</p>
<p>• this is a list item
indented with tabs</p>
<empty-line />
<p>• this is a list item indented with spaces</p>
<p>• this is a list item
indented with spaces</p>
<empty-line />
<p>•  this is an example list item indented with tabs</p>
<p>•  this is an example list item
indented with tabs</p>
<empty-line />
<p>•  this is an example list item indented with spaces</p>
<p>•  this is an example list item
indented with spaces</p>
<empty-line />
</section>
<section>
@ -304,7 +308,8 @@
<empty-line />
<p>    with a continuation</p>
<empty-line />
<p>(3) iv. sublist with roman numerals, starting with 4</p>
<p>(3) iv. sublist with roman numerals,
starting with 4</p>
<p>(3) v. more items</p>
<p>(3) v. (A) a subsublist</p>
<p>(3) v. (B) a subsublist</p>
@ -321,8 +326,6 @@
<p>M.A. 2007</p>
<p>B. Williams</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
</section>
<section>
@ -379,7 +382,8 @@
</p>
<p>    red fruit</p>
<empty-line />
<p>    contains seeds, crisp, pleasant to taste</p>
<p>    contains seeds,
crisp, pleasant to taste</p>
<empty-line />
<p>
<strong>
@ -481,8 +485,6 @@
<empty-line />
<p>Hrs:</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -518,9 +520,8 @@
<emphasis>hello</emphasis>
</sup> a<sup>hello there</sup>.</p>
<p>Subscripts: H<sub>2</sub>O, H<sub>23</sub>O, H<sub>many of them</sub>O.</p>
<p>These should not be superscripts or subscripts, because of the unescaped spaces: a^b c^d, a~b c~d.</p>
<empty-line />
<p>——————————</p>
<p>These should not be superscripts or subscripts,
because of the unescaped spaces: a^b c^d, a~b c~d.</p>
<empty-line />
</section>
<section>
@ -529,8 +530,10 @@
</title>
<p>“Hello,” said the spider. “Shelob is my name.”</p>
<p>A, B, and C are letters.</p>
<p>Oak, elm, and beech are names of trees. So is pine.</p>
<p>He said, “I want to go.”’ Were you alive in the 70s?</p>
<p>Oak, elm, and beech are names of trees.
So is pine.</p>
<p>He said, “I want to go.”’ Were you alive in the
70s?</p>
<p>Here is some quoted <code>code</code> and a “quoted link<a l:href="#l3" type="note">
<sup>[3]</sup>
</a>”.</p>
@ -538,8 +541,6 @@
<p>Dashes between numbers: 57, 25566, 19871999.</p>
<p>Ellipses…and…and….</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -555,18 +556,18 @@
<p>• <code>223</code>
</p>
<p>• <code>p</code>-Tree</p>
<p>• Heres some display math: <code>\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}</code>
<p>• Heres some display math:
<code>\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}</code>
</p>
<p>• Heres one that has a line break in it: <code>\alpha + \omega \times x^2</code>.</p>
<p>These shouldnt be math:</p>
<p>• To get the famous equation, write <code>$e = mc^2$</code>.</p>
<p>• $22,000 is a <emphasis>lot</emphasis> of money. So is $34,000. (It worked if “lot” is emphasized.)</p>
<p>• $22,000 is a <emphasis>lot</emphasis> of money. So is $34,000.
(It worked if “lot” is emphasized.)</p>
<p>• Shoes ($20) and socks ($5).</p>
<p>• Escaped <code>$</code>: $73 <emphasis>this should be emphasized</emphasis> 23$.</p>
<p>Heres a LaTeX table:</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -600,8 +601,6 @@
<p>Plus: +</p>
<p>Minus: -</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -727,8 +726,6 @@
</p>
<empty-line />
<empty-line />
<p>——————————</p>
<empty-line />
</section>
</section>
<section>
@ -739,8 +736,6 @@
<image l:href="#image1" l:type="imageType" alt="lalune" title="Voyage dans la Lune" />
<p>Here is a movie <image l:href="#image2" l:type="inlineImageType" alt="movie" /> icon.</p>
<empty-line />
<p>——————————</p>
<empty-line />
</section>
<section>
<title>
@ -750,7 +745,9 @@
<sup>[29]</sup>
</a> and another.<a l:href="#n30" type="note">
<sup>[30]</sup>
</a> This should <emphasis>not</emphasis> be a footnote reference, because it contains a space.[^my note] Here is an inline note.<a l:href="#n31" type="note">
</a>
This should <emphasis>not</emphasis> be a footnote reference, because it
contains a space.[^my note] Here is an inline note.<a l:href="#n31" type="note">
<sup>[31]</sup>
</a>
</p>
@ -989,28 +986,35 @@
<title>
<p>29</p>
</title>
<p>Here is the footnote. It can go anywhere after the footnote reference. It need not be placed at the end of the document.</p>
<p>Here is the footnote. It can go anywhere after the footnote
reference. It need not be placed at the end of the document.</p>
</section>
<section id="n30">
<title>
<p>30</p>
</title>
<p>Heres the long note. This one contains multiple blocks.</p>
<p>Subsequent blocks are indented to show that they belong to the footnote (as with list items).</p>
<p>Heres the long note. This one contains multiple
blocks.</p>
<p>Subsequent blocks are indented to show that they belong to the
footnote (as with list items).</p>
<empty-line />
<p>
<code> { &lt;code&gt; }</code>
</p>
<empty-line />
<p>If you want, you can indent every line, but you can also be lazy and just indent the first line of each block.</p>
<p>If you want, you can indent every line, but you can also be
lazy and just indent the first line of each block.</p>
</section>
<section id="n31">
<title>
<p>31</p>
</title>
<p>This is <emphasis>easier</emphasis> to type. Inline notes may contain links<a l:href="#l31" type="note">
<p>This
is <emphasis>easier</emphasis> to type. Inline notes may contain
links<a l:href="#l31" type="note">
<sup>[31]</sup>
</a> and <code>]</code> verbatim characters, as well as [bracketed text].</p>
</a> and <code>]</code> verbatim characters,
as well as [bracketed text].</p>
</section>
<section id="n32">
<title>