The previous verse parsing code made the faulty assumption that empty
strings are valid (and empty) inlines. This isn't the case, so lines
are changed to contain at least a newline.
It would generally be nicer and faster to keep the newlines while
splitting the string. However, this would require more code, which
seems unjustified for a simple (and fairly rare) block as *verse*.
This fixes#2402.
Previously the parser failed on this kind of case
.. role:: indirect(code)
.. role:: py(indirect)
:language: python
:py:`hi`
Now it currectly recognizes `:py:` as a code role.
The previous test for this didn't work, because the
name of the indirect role was the same as the language
defined its parent, os it didn't really test for this
behavior. Updated test.
Previously the left-hand column could not start with 4 or
more spaces indent. This was inconvenient for right-aligned
left columns.
Note that the first (header column) must still have 3 or fewer
spaces indentation, or the table will be treated as an indented
code block.
Technically this isn't allowed in an HTML comment, but
we've always allowed it, and so do most other implementations.
It is handy if e.g. you want to put command line arguments
in HTML comments.
Replaced styles.xml in headers.docx with pandoc's current styles.xml, which
contains styles for Heading 1 through 6. Added Heading 4 through 7 to the test
document. Note that Heading 7 is not parsed as a Heading because there is no
Heading 7 style.
Previously we disallowed `-` at the end of an autolink,
and disallowed the combination `=-`.
This commit liberalizes the rules for allowing punctuation in
a bare URI.
Added test cases.
One potential drawback is that you can no longer put a bare
URI in em dashes like this
this uri---http://example.com---is an example.
But in this respect we now match github's treatment of bare URIs.
Closes#2299.
This fixes a potential security issue. Because single quotes weren't
being escaped in the link portion, a specially crafted email address
could allow javascript code injection.
[Jim'+alert('hi')+'OBrien](mailto:me@example.com)
Closes#2280.
Colons are valid characters in URLs, and used e.g. by the Internet Archive's Wayback Machine - a popular resource amongst researchers. When InDesign encounters a HyperlinkURLDestination with more than one colon character in it, it crashes when placing the ICML. (This was tested against CS6.) The IDML specification hints at this requirement in section 6.4.1: "The colon apppears in the Name attribute of the style, but is encoded as %3a when it appears in the Self attribute". Follow this example for all colon characters in URLs.
This avoids an error "Please load package hyperref before bidi package,
and then try to run xelatex on your document again". See
jgm/pandoc-templates #96.
Org mode allows headers to be tagged:
``` org-mode
* Headline :TAG1:TAG2:
```
Instead of being interpreted as part of the headline, the tags are now
put into the attributes of empty spans. Spans without textual content
won't be visible by default, but they are detectable by filters. They
can also be styled using CSS when written as HTML.
This fixes#2160.
Added `stateHeaderKeys` to `ParserState`; this is a `KeyTable`
like `stateKeys`, but it only gets consulted if we don't find
a match in `stateKeys`, and if `Ext_implicit_header_references`
is enabled.
Closes#1606.
We only support the href attribute, as there's no place for
"target" in the Pandoc document model for links.
Added HTML reader test module, with tests for this feature.
Closes#1751.
Instead, use a forward-slash to join paths, regardless of the
platform. This matches the way MediaBag now works.
See
56e4ecab20 (commitcomment-10858449)
`<` should not be escaped as `\<`, for compatibility with
original Markdown. We now escape `<` and `>` with entities.
Also, we now backslash-escape square brackets.
Closes#2086.
Previously the body of the definition (after the `:` or `~` marker)
needed to be in column 4. This commit relaxes that requirement,
to better match the behavior of PHP Markdown Extra. So, now
this is a valid definition list:
foo
: bar
This patch also helps resolve a potentially ambiguity with table
captions:
foo
: bar
-----
table
-----
Is "bar" a definition, or the caption for the table? We'll count
it as a caption for the table.
Closes#2087.
If the tag parses as a comment, we check to see if the
input starts with `<!--`. If not, it's bogus comment mode
and we fail htmlTag.
Includes test case. Closes#1820.
Issue #1977
Most markdown processors support the [shortcut format] for reference links.
Pandoc's markdown reader parsed this shortcuts unoptionally.
Pandoc's markdown writer (with --reference-links option) never shortcutted links.
This commit adds an extension `shortcut_reference_links`. The extension is
enabled by default for those markdown flavors that support reading shortcut
reference links, namely:
- pandoc
- strict pandoc
- github flavoured
- PHPmarkdown
If extension is enabled, reader parses the shortcuts in the same way as
it preveously did. Otherwise it would parse them as normal text.
If extension is enabled, writer outputs shortcut reference links unless
doing so would cause problems (see test cases in `tests/Tests/Writers/Markdown.hs`).
The `tabular` environment allows non-empty column separators
with the "@{...}" syntax. Previously, pandoc would fail to
parse tables if a non-empty colsep was present. With this
commit, these separators are still ignored, but the table gets
parsed. A test case is included.
The `tabular` environment takes an optional parameter for
vertical alignment. Previously, pandoc would fail to parse
tables if this parameter was present. With this commit,
the parameter is still ignored, but the table gets
parsed. A test case is included.
GFM and PHP Markdown Extra pipe tables require headers.
Previously pandoc allowed pipe tables not to include headers,
and produced headerless pipe tables in Markdown output, but this
was based on a misconception about pipe table syntax. This
commit fixes this.
Note: If you have been using headerless pipe tables, this may
cause existing tables to break.
Closes#1996.
Previously these were always escaped and printed verbatim.
Now they are ignored unless the format is "icml", in which
case they are passed through unescaped.
Closes#1951.
The preferred syntax for Images and other media is [[File:Foo.jpg]] in MediaWiki since v1.14 (2008). [[Image:Foo.jpg]] is deprecated but still works as an alias to the File namespace.
This change improves output formatting of content with a large amount of force line breaks, such as line-blocks. The following writers are affected:
* Dokuwiki
* HTML
* EPUB (via HTML)
* LaTeX
* MediaWiki
* OpenDocument
* Texinfo
This commit resolves#1924
Org links like `[[file:target][title]]` were not handled correctly,
parsing the link target verbatim. The org reader is changed such that
the leading `file:` is dropped from the link target.
This is related to issues #756 and #1812.
Move recursive role lookup from renderRole to addNewRole. The Attr value
will be the same for every occurance of this role, so there's no reason
to compute it every time. This allows simplifying the
stateRstCustomRoles map considerably.
We could go even further, and remove the fmt and attr arguments to
renderRole, which are null except for custom roles.
The class directive accepts one or more class names, and creates a Div
value with those classes. If the directive has an indented body, the
body is parsed as the children of the Div. If not, the first block
folowing the directive is made a child of the Div.
This differs from the behavior of rst2xml, which does not create a Div
element. Instead, the specified classes are applied to each child of
the directive. However, most Pandoc Block constructors to not take an
Attr argument, so we can't duplicate this behavior.
closes#65
RST quoted literal blocks are the same as indented literal blocks (which
pandoc already supports) except that the quote character is preserved in
each line.
This includes test cases for the quoted literal block, as well as
additional tests for line blocks and indented literal blocks, to verify
that these are unaffected by the changes.
Now we do as before, including blank lines after list items in
loose lists (even though RST doesn't care -- this is just a matter
of visual appeal). But we chomp any excess whitespace after the
last list item, which solves #1777.
While empty links are not allowed in Emacs org-mode, Pandoc org-mode
should support them: gitit relies on empty links as they are used to
create wiki links.
Fixesjgm/gitit#471
The org reader was to restrictive when parsing links, some relative
links and links to files given as absolute paths were not recognized
correctly. The org reader's link parsing function was amended to handle
such cases properly.
This fixes#1741
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported. Those trees are dropped
silently.
This closes#1678.
Things like `/hello,/` or `/hi'/` were falsy recognized as emphasised
strings. This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text. This patch
enables the reader to matches the reference implementation in that it
reads the above strings as plain text.
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue #1650.
Currently, pandoc has hard-coded the following in order to make tight lists in
LaTeX:
```hs
text "\\itemsep1pt\\parskip0pt\\parsep0pt"
```
Which is fine, but does not allow customizations. For example, the `memoir`
class already has a `\tightlist` declaration for this purpose:
```tex
\newcommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
```
I'm proposing to use a similar solution:
```diff
@@ In Writers/LaTeX.hs:
-then text "\\itemsep1pt\\parskip0pt\\parsep0pt"
+then text "\\tightlist"
@@ In templates/default.latex:
+\newcommand{\tightlist}{%
+ \setlength{\itemsep}{1pt}\setlength{\parskip}{0pt}\setlength{\parsep}{0pt}}
```
This allows us to customize the tightness to our needs.
Backward Compatibility
If a person is using a custom LaTeX template (not based upon the `memoir`
class), the `\tightlist` declaration must be added.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.
Test case:
<div class="first">
This is a paragraph.
This is another paragraph.
</div>
Closes#1591.
We can now handle all different alignment types, for simple
tables only (no captions, no relative widths, cell contents just
plain inlines). Other tables are still handled using raw HTML.
Addresses #1585 as far as it can be addresssed, I believe.
Currently, pandoc has hard-coded the following in order to make horizontal
rules in LaTeX:
```hs
"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
```
Which is fine, but does not allow customizations. It also does not take into
consideration the current line width.
I'm proposing this change:
```diff
@@ In Writers/LaTeX.hs:
-"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
+"\\begin{center}\\rule{0.5\\linewidth}{\\linethickness}\\end{center}"
```