Now we do as before, including blank lines after list items in
loose lists (even though RST doesn't care -- this is just a matter
of visual appeal). But we chomp any excess whitespace after the
last list item, which solves #1777.
While empty links are not allowed in Emacs org-mode, Pandoc org-mode
should support them: gitit relies on empty links as they are used to
create wiki links.
Fixesjgm/gitit#471
The org reader was to restrictive when parsing links, some relative
links and links to files given as absolute paths were not recognized
correctly. The org reader's link parsing function was amended to handle
such cases properly.
This fixes#1741
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported. Those trees are dropped
silently.
This closes#1678.
Things like `/hello,/` or `/hi'/` were falsy recognized as emphasised
strings. This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text. This patch
enables the reader to matches the reference implementation in that it
reads the above strings as plain text.
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue #1650.
Currently, pandoc has hard-coded the following in order to make tight lists in
LaTeX:
```hs
text "\\itemsep1pt\\parskip0pt\\parsep0pt"
```
Which is fine, but does not allow customizations. For example, the `memoir`
class already has a `\tightlist` declaration for this purpose:
```tex
\newcommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
```
I'm proposing to use a similar solution:
```diff
@@ In Writers/LaTeX.hs:
-then text "\\itemsep1pt\\parskip0pt\\parsep0pt"
+then text "\\tightlist"
@@ In templates/default.latex:
+\newcommand{\tightlist}{%
+ \setlength{\itemsep}{1pt}\setlength{\parskip}{0pt}\setlength{\parsep}{0pt}}
```
This allows us to customize the tightness to our needs.
Backward Compatibility
If a person is using a custom LaTeX template (not based upon the `memoir`
class), the `\tightlist` declaration must be added.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.
Test case:
<div class="first">
This is a paragraph.
This is another paragraph.
</div>
Closes#1591.
We can now handle all different alignment types, for simple
tables only (no captions, no relative widths, cell contents just
plain inlines). Other tables are still handled using raw HTML.
Addresses #1585 as far as it can be addresssed, I believe.
Currently, pandoc has hard-coded the following in order to make horizontal
rules in LaTeX:
```hs
"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
```
Which is fine, but does not allow customizations. It also does not take into
consideration the current line width.
I'm proposing this change:
```diff
@@ In Writers/LaTeX.hs:
-"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
+"\\begin{center}\\rule{0.5\\linewidth}{\\linethickness}\\end{center}"
```
Renamed some tests, introducing subsidiary directories
for fb2, docx, epub.
Cleaned up tests in cabal file.
Combined dokuwiki-writer and dokuwiki_inline_formatting tests.
This was just too fragile and dependent on a changing Cabal API
(see #1526).
Instead of passing the bulid directory to the test program, we
now let the test program find itself (using executable-path)
and then find the pandoc executable relative to itself.
Indented code at the beginning of a list item must be indented eight
spaces from the margin (or from the edge of the container), or four
spaces past the list marker, whichever is farther.
Some examples in `tests/markdown-reader-more.txt`.
Closes#1513.
Lists can now start without an intervening blank line.
Also, html block-level tags that don't start a line are parsed
as RawInline and don't interrupt paragraphs, as in RedCloth.
Rewrote features test to remove all unimplemented features.
There are now all three examples of where an image can be included in
the test.
1. Cover image
2. As a spine elemnt
3. In the document
Tests have also been added to make sure that the mediabag contains all
these images after processing.
pandoc -t markdown-raw_html should not emit any raw HTML, even
span and div tags that go with pandoc Span and Div elements.
Cleaned up a bit of the logic with extensions and plain.
Renamed epub test files so they're identified more clearly as
epub: features.{epub,native} -> epub.features.{epub,native},
and similarly with formatting.{epub,native}.
Added epub test files to cabal file, so they'll be included in
the tarball.
Using `map toUpper` to capitalise text is wrong, as e.g.
“Straße” should be converted to “STRASSE”, which is 1 character
longer. This commit adds a `capitalize` function and replaces
2 identical implementations in different modules (`toCaps` and
`capitalize`) with it.
This will allow us to test the whole mediabag (making sure, for example,
that images are added with the correct keys) instead of just individual
extracted images. We compare each entry in the media bag to an image
extracted on the fly from the docx. As a result, we only need one file
to test with.
The image in the current tests was also replaced with a smaller one.
Moved `MediaBag` definition and functions from Shared:
`lookupMedia`, `mediaDirectory`, `insertMedia`, `extractMediaBag`.
Removed `emptyMediaBag`; use `mempty` instead, since `MediaBag`
is a Monoid.
This will make paragraphs styled with `Author`, `Title`, `Subtitle`,
`Date`, and `Abstract` into pandoc metavalues, rather than text. The
implementation only takes those elements from the beginning of the
document (ignoring empty paragraphs).
Multiple paragraphs in the `Author` style will be made into a metaList,
one paragraph per item. Hard linebreaks (shift-return) in the paragraph
will be maintained, and can be used for institution, email, etc.
Math now appears in unicode if possible, without the distracting
italics around identifiers.
Blank lines around headers are more consistent.
Footnotes appear in regular [n] style.
* This change brings pandoc's definition list syntax into alignment
with that used in PHP markdown extra and multimarkdown (with the
exception that pandoc is more flexible about the definition markers,
allowing tildes as well as colons).
* Lazily wrapped definitions are now allowed; blank space is required
between list items; and the space before definition is used to
determine whether it is a paragraph or a "plain" element.
* For backwards compatibility, a new extension,
`compact_definition_lists`, has been added that restores the behavior
of pandoc 1.12.x, allowing tight definition lists with no blank space
between items, and disallowing lazy wrapping.
This gives better results for tight lists. Closes#1437.
An alternative solution would be to use Para everywhere, and
never Plain. I am not sufficiently familiar with org to know
which is best. Thoughts, @tarleb?
Adds support to the org reader for conditionally exporting either the code block,
results block immediately following, both, or neither, depending on the value
of the `:exports` header argument. If no such argument is supplied, the default
org behavior (for most languages) of exporting code is used.
as in the mediawiki writer. The dokuwiki markup isn't able
to handle multiple block-level items within a list item, except
in a few special cases (e.g. code blocks, and these must be started
on the same line as the preceding paragraph). So we fall back to
raw HTML for these.
Perhaps there is a better solution. We can "fake" multiple
paragraphs within list items using hard line breaks (`\\`), but
we must keep everything on one line.
(#1398)
- We no longer include trailing spaces and newlines in the
raw blocks.
- We look for closing tags for elements (but without backtracking).
- Each block-level tag is its own RawBlock; we no longer try to
consolidate them (though `--normalize` will do so).
Closes#1330.
This doesn't change the testsuite behaviour, but it does mean that
all the testsuite output files are exactly identical to the
output obtained by running the current pandoc.
* Added normalizeInlines, normalizeBlocks.
* Type signature is now more narrow, `Pandoc -> Pandoc` instead of
`Data a :: a -> a`. Some users may need to change their uses of
`normalize` to the newly exported `normalizeInlines` or
`normalizeBlocks`.
We want to treat it as a plain paragraph if the hanging amount is
greater to or equal to the left indent---i.e., if the first line has
zero indentation. But we still want it to be a block quote if it starts
to the right of the margin. Someone might format verse with wrapping
lines with a hanging indent, for example.
The new code was got from inspecting changes in MediaWiki.hs
This slightly changes the output of Div blocks, but I'm not
convinced the original behaviour was really correct anyway.
The code for handling Span does nothing for now, until I can
work out the desired behaviour, and add tests for it.
This is what seems like the sensible default: read in insertions, and
ignore deletions. In the future, it would be good if options were
available for either taking in deletions or keeping both in some
scriptable format.
Add torture-test for new normalization functions.
One problem that this test demonstrates is that word has a tendency to
turn off formatting at a space, and then turn it back on after. I'm not
sure yet whether this is something we should fix.
This is just a wrapper around Pandoc that doesn't normalize with
`toString`. We want to make sure that our own normalization process
works. If, in the future, we are able to hook into the builder's
normalization, this will be removed.
This brings pandoc's rendering of haddock markup in line
with the new haddock.
Note that we preserve line breaks in `@` code blocks, unlike
the earlier version.
Modified tests pass. More tests would be good.
Closes#1345. Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.
I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc. The previous behavior resulted in the
text not appearing in any output format.
Inline LaTeX is now accepted and parsed by the org-mode reader. Both,
math symbols (like \tau) and LaTeX commands (like \cite{Coffee}), can be
used without any further escaping.
Citations are defined via the "normal citation" syntax used in markdown,
with the sole difference that newlines are not allowed between "[...]".
This is for consistency, as org-mode generally disallows newlines
between square brackets.
The extension is turned on by default and can be turned off via the
default syntax-extension mechanism, i.e. by specifying "org-citation" as
the input format.
Move `citeKey` from Readers.Markdown into Parsing
The function can be used by other readers, so it is made accessible for
all parsers.
The reader produced wrong results for block containing non-letter chars
in their parameter arguments. This patch relaxes constraints in that it
allows block header arguments to contain any non-space character (except
for ']' for inline blocks).
Thanks to Xiao Hanyu for noticing this.
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader. This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.
This closes#1286.
Org's inline code blocks take forms like `src_haskell(print "hi")` and
are frequently used to include results from computations called from
within the document. The blocks are read as inline code and marked with
the special class `rundoc-block`. Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
work in progress.
This closes#1278.
Org allows users to define their own custom link types. E.g., in a
document with a lot of links to Wikipedia articles, one can define a
custom wikipedia link-type via
#+LINK: wp https://en.wikipedia.org/wiki/
This allows to write [[wp:Org_mode][Org-mode]] instead of the
equivallent [[https://en.wikipedia.org/wiki/Org_mode][Org-mode]].
Internal links in Org are possible by using an anchor-name as the target
of a link:
[[some-anchor][This]] is an internal link.
It links <<some-anchor>> here.
Footnotes can consist of multiple blocks and end only at a header or at
the beginning of another footnote. This fixes the previous behavior,
which restricted notes to a single paragraph.
Support for standard org-blocks is improved. The parser now handles
"HTML", "LATEX", "ASCII", "EXAMPLE", "QUOTE" and "VERSE" blocks in a
sensible fashion.
These are primarily aimed at testing the new treatment of line breaks,
but hopefully other tests can be added more easily now as features
and changes are implemented in the writer.
Adapted from Tests.Writers.HTML.tests.
* Use a <literallayout> for the entire paragraph, not just for the
newline character
* Don't let LineBreaks inside footnotes influence the enclosing
paragraph
This fixes the org-reader's handling of sub- and superscript
expressions. Simple expressions (like `2^+10`), expressions in
parentheses (`a_(n+1)`) and nested sexp (like `a_(nested()parens)`) are
now read correctly.
Support all of the following variants as valid ways to define inline or
display math inlines:
- `\[..\]` (display)
- `$$..$$` (display)
- `\(..\)` (inline)
- `$..$` (inline)
This closes#1223. Again.
Instead of being ignored, attributes are now parsed and
included in Span inlines.
The output will be a bit different from stock textile:
e.g. for `*(foo)hi*`, we'll get `<em><span class="foo">hi</span></em>`
instead of `<em class="foo">hi</em>`. But at least the data is
not lost.
Text such as /*this*/ was not correctly parsed as a strong, emphasised
word. This was due to the end-of-word recognition being to strict as it
did not accept markup chars as part of a word. The fix involves an
additional parser state field, listing the markup chars which might be
parsed as part of a word.
Previously normalisation was handled by the `normalizeSpaces` function. The behavoir of the builder monoid is slightly different and melds together more items such as consecutive strings and spaces adjacent to line breaks. The tests have been changed to reflect this.
All relevant tests passed when the string melding line of the builder monoid was commented out.
rST parser now supports:
- All built-in rST roles
- New role definition
- Role inheritance
Issues/TODO:
- Silently ignores illegal fields on roles
- Silently drops class annotations for roles
- Only supports :format: fields with a single format for :raw: roles,
requires a change to Text.Pandoc.Definition.Format to support multiple
formats.
- Allows direct use of :raw: role, rST only allows indirect (i.e.,
inherited use of :raw:).
A consequence of this change is that the backtick form will be
preferred in general if both are enabled. I think that is good,
as it is much more widespread than the tilde form.
Closes#1084.