Require space after key-value delimiter colon in mmd title block.
Issue #2026
Amend: parsec's `spaces` includes newlines, which we don't want here, so
we had to write a custom `spaceNoNewline` parser.
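The distinction can be sketched outside of parsec. The Python below is only an illustration, not the Haskell code this commit touches: the names `skip_spaces_no_newline` and `parse_key_value` are hypothetical, and treating space and tab as the only skippable characters is an assumption.

```python
def skip_spaces_no_newline(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after pos that is
    neither a space nor a tab. Newlines are never skipped."""
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos


def parse_key_value(line: str):
    """Split 'key: value' only when the colon is followed by a space or tab.

    Returns (key, value), or None when the line is not a key-value pair.
    """
    key, sep, rest = line.partition(":")
    if not sep or not rest or rest[0] not in " \t":
        return None  # no colon, or colon not followed by horizontal space
    return key, rest[skip_spaces_no_newline(rest):]
```

With this rule, `Title: Foo` is a key-value pair, while `Title:Foo` and a colon followed directly by a newline are not.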
This commit derives a few types from Data and Typeable used by
libpandoc.
Requires corresponding pull-request for Highlighting-Kate:
https://github.com/jgm/highlighting-kate/pull/64
Signed-off-by: Shahbaz Youssefi <ShabbyX@gmail.com>
- Added commonmark as an input format.
- Added `Text.Pandoc.Readers.CommonMark.readCommonMark`.
- For now, we use the markdown writer to generate benchmark
text for the CommonMark reader. We can change this when we
get a writer.
This closes #1394, which actually wasn't fixed by the earlier commit.
This ensures that lists in speaker notes don't add "fragment" classes,
which can cause additional keypresses to be needed to advance a slide.
Issue #1977
Most markdown processors support the [shortcut format] for reference links.
Pandoc's markdown reader parsed these shortcuts unconditionally.
Pandoc's markdown writer (with the --reference-links option) never emitted shortcut links.
This commit adds an extension `shortcut_reference_links`. The extension is
enabled by default for those markdown flavors that support reading shortcut
reference links, namely:
- pandoc
- strict pandoc
- github flavoured
- PHPmarkdown
If the extension is enabled, the reader parses the shortcuts in the same way as
it previously did. Otherwise it parses them as normal text.
If the extension is enabled, the writer outputs shortcut reference links unless
doing so would cause problems (see test cases in `tests/Tests/Writers/Markdown.hs`).
The `tabular` environment allows non-empty column separators
with the "@{...}" syntax. Previously, pandoc would fail to
parse tables if a non-empty colsep was present. With this
commit, these separators are still ignored, but the table gets
parsed. A test case is included.
The `tabular` environment takes an optional parameter for
vertical alignment. Previously, pandoc would fail to parse
tables if this parameter was present. With this commit,
the parameter is still ignored, but the table gets
parsed. A test case is included.
GFM and PHP Markdown Extra pipe tables require headers.
Previously pandoc allowed pipe tables not to include headers,
and produced headerless pipe tables in Markdown output, but this
was based on a misconception about pipe table syntax. This
commit fixes this.
Note: If you have been using headerless pipe tables, this may
cause existing tables to break.
Closes #1996.
This sets `--chapters` implicitly if the documentclass in metadata
is a book documentclass. Previously this was done only if a book
documentclass was set in a variable.
Closes #1971.
This patch attempts to build a style name -> style id mapping based on styles.xml from the reference doc, and changes pStyle and rStyle to accept a style name as a parameter instead of a styleId. There is a fallback mechanism that removes spaces from the style name and returns the result as the style id, but it likely won't help much.
Style names are matched lower-case, since headings and `footnote text` have lowercase names.
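A minimal sketch of the mapping described above, in Python rather than the writer's Haskell. The function names `build_style_map` and `style_id_for` are hypothetical; the sketch assumes styles.xml uses the standard WordprocessingML namespace and that each `w:style` carries a `w:styleId` attribute and a `w:name` child.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used by styles.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_style_map(styles_xml: str) -> dict:
    """Map lower-cased style names to style ids."""
    root = ET.fromstring(styles_xml)
    mapping = {}
    for style in root.iter(f"{{{W}}}style"):
        style_id = style.get(f"{{{W}}}styleId")
        name_el = style.find(f"{{{W}}}name")
        if style_id is None or name_el is None:
            continue  # skip styles without an id or a name
        name = name_el.get(f"{{{W}}}val")
        if name:
            mapping[name.lower()] = style_id
    return mapping


def style_id_for(mapping: dict, style_name: str) -> str:
    """Look the name up case-insensitively; fall back to the name with
    spaces removed when the reference doc does not define it."""
    return mapping.get(style_name.lower(), style_name.replace(" ", ""))
```

Lower-casing on both sides is what lets a request for `Heading 1` find a style whose stored name is `heading 1`.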
This allows inherited styles with numbering (lists). It works like this:
1. check to see if the style has numbering info.
2. if the paragraph has explicit numbering info in the doc, that takes
precedence.
3. if not we use the numbering info in the style, if it's there.
4. otherwise normal paragraph.
We no longer assume it's not a numbering element if it doesn't have an
explicit level---we just set that level to 1. (In the style files, the
examples I've seen don't have that explicit level.)
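The precedence rules above can be sketched as follows. This is an illustration in Python, not the reader's actual code: `NumInfo` and `resolve_numbering` are hypothetical names, and representing the id and level as strings is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class NumInfo(NamedTuple):
    num_id: str
    level: Optional[str] = None  # style files often omit the level


def resolve_numbering(
    par_num: Optional[NumInfo], style_num: Optional[NumInfo]
) -> Optional[Tuple[str, str]]:
    """Return (num_id, level) for a list paragraph, or None for a normal one."""
    # Explicit numbering on the paragraph wins over the inherited style.
    chosen = par_num if par_num is not None else style_num
    if chosen is None:
        return None  # no numbering anywhere: normal paragraph
    # A missing level no longer disqualifies the element; default it to 1.
    level = chosen.level if chosen.level is not None else "1"
    return chosen.num_id, level
```

For example, a paragraph with no numbering of its own whose style carries `NumInfo("3")` resolves to `("3", "1")`.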
Previously these were always escaped and printed verbatim.
Now they are ignored unless the format is "icml", in which
case they are passed through unescaped.
Closes #1951.
Word uses, by default, footnotes with id -1 and 0 for separators. If a
user modifies reference.docx, they will end up with a settings.xml file
that references these footnotes, but no such footnotes in the
document. This will produce a corruption error. Here we add these to the
document and settings.xml file, so future modifications won't break the file.
We apply a "BodyText" style to all unstyled paragraphs. This is,
essentially, the same as "Normal" up until now -- except that since not
everything inherits from "BodyText" (the metadata won't, for example, or
the headers or footnote numbers) we can change the text in the body
without having to make exceptions for everything.
This will still inherit from Normal, so if we want to
change *everything*, we can do it through "Normal".
Before we had used `FirstParagraph` style after Headings, BlockQuotes,
and other blocks a user might not want an indentation after. We hadn't
actually used it for the first paragraph -- i.e. the opening of the
body. This makes sure the first body paragraph gets that style.
Following the odt writer, we make the first text paragraph following an
image, blockquote, table, or heading into a "FirstParagraph" style. This
allows it to be styled differently, if the user wishes. The default is
for it to be the same as "Normal".
The preferred syntax for Images and other media is [[File:Foo.jpg]] in MediaWiki since v1.14 (2008). [[Image:Foo.jpg]] is deprecated but still works as an alias to the File namespace. I don't think this would break any existing wikis since talk of switching the syntax/namespace for images started back in 2002 (https://phabricator.wikimedia.org/T2044). NS_FILE became the new namespace for Files in v 1.14 in late 2008. (https://www.mediawiki.org/wiki/Release_notes/1.14) There is still a namespace alias so '[[Image:]]' still works today. It's just that MediaWiki supports other media as well, and so the name and syntax used in documentation (see https://www.mediawiki.org/wiki/Help:Images) has long been '[[File:foo.jpg]]'
This change improves output formatting of content with a large number of forced line breaks, such as line blocks. The following writers are affected:
* Dokuwiki
* HTML
* EPUB (via HTML)
* LaTeX
* MediaWiki
* OpenDocument
* Texinfo
This commit resolves #1924.
Previously `\input` and `\include` would only work if the
included files had the extension `.tex`. This change relaxes
that restriction, though if the extension is not `.tex`, it
must be given explicitly in the `\input` or `\include`.
Closes #1882.
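The extension rule for `\input` and `\include` can be illustrated with a short Python sketch. The function name `include_path` is hypothetical, and the sketch only shows the naming rule, not how the reader locates or loads the file.

```python
import os.path


def include_path(name: str) -> str:
    """Return the file name to read for an \\input or \\include argument.

    A name with an explicit extension is used as given; a name without
    one gets the default `.tex` extension appended.
    """
    _, ext = os.path.splitext(name)
    return name if ext else name + ".tex"
```

So `chapter1` resolves to `chapter1.tex`, while `notes.txt` is left untouched.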
Some older versions of Word use vml (vector markup language) and put
their images in a "v:imagedata" tag inside a "w:pict". We read those as
we read the more modern "blip" inside a "w:drawing".
Note that this does not mean the reader knows anything about vml. It
just looks for a `v:imagedata`. It's possible that, with more complicated
uses of images in vml, it won't do the right thing.
This change allows pandoc not to choke on the table-width parameter
of `tabular*`. Note that the table width is not actually parsed
or taken into account, but this should give tolerable results in
many cases.
Closes #1850.
Org links like `[[file:target][title]]` were not handled correctly,
parsing the link target verbatim. The org reader is changed such that
the leading `file:` is dropped from the link target.
This is related to issues #756 and #1812.
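A small Python sketch of the link handling described in this commit. It is not the org reader's parser: `parse_org_link` is a hypothetical name, and the regular expression only covers the simple `[[target][title]]` form without nested brackets.

```python
import re

# Matches [[target][title]] where neither part contains a closing bracket.
LINK_RE = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")


def parse_org_link(text: str):
    """Return (target, title) for an org link, dropping a leading `file:`."""
    m = LINK_RE.fullmatch(text)
    if m is None:
        return None
    target, title = m.groups()
    if target.startswith("file:"):
        target = target[len("file:"):]  # keep only the path itself
    return target, title
```

Targets with other schemes, such as `http://`, pass through unchanged.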
Move recursive role lookup from renderRole to addNewRole. The Attr value
will be the same for every occurrence of this role, so there's no reason
to compute it every time. This allows simplifying the
stateRstCustomRoles map considerably.
We could go even further, and remove the fmt and attr arguments to
renderRole, which are null except for custom roles.
- Add "sourceCode" to classes for :code: role, and anything inheriting
from it.
- Add the name of the custom role to classes if the Inline constructor
supports Attr.
- If the custom role directive does not specify a parent role, inherit
from the :span: role.
This differs somewhat from the rst2xml.py behavior. If a custom role
inherits from another custom role, Pandoc will attach both roles' names
as classes. rst2xml.py will only use the class of the directly invoked
role (though in the case of inheriting from a :code: role with a
:language: defined, it will also provide the inherited language as a
class).
code role should have "code" class.
http://docutils.sourceforge.net/docs/ref/rst/roles.html says that
:literal:`text` is the same as ``text``. docutils outputs a <literal>
element in both cases, whereas for the code role, it outputs a <literal>
element with the "code" class.
This commit moves some code which was only used for the Markdown Reader
into a generic form which can be used for any Reader. Otherwise, it
takes naming and interface cues from the preexisting Markdown code.
Word doesn't really treat table captions as something special. A caption is just a paragraph with a special style, nothing more, so simply reversing the output order in the writer works fine.
The class directive accepts one or more class names, and creates a Div
value with those classes. If the directive has an indented body, the
body is parsed as the children of the Div. If not, the first block
following the directive is made a child of the Div.
This differs from the behavior of rst2xml, which does not create a Div
element. Instead, the specified classes are applied to each child of
the directive. However, most Pandoc Block constructors do not take an
Attr argument, so we can't duplicate this behavior.
Closes #65
RST quoted literal blocks are the same as indented literal blocks (which
pandoc already supports) except that the quote character is preserved in
each line.
This includes test cases for the quoted literal block, as well as
additional tests for line blocks and indented literal blocks, to verify
that these are unaffected by the changes.
Now we do as before, including blank lines after list items in
loose lists (even though RST doesn't care -- this is just a matter
of visual appeal). But we chomp any excess whitespace after the
last list item, which solves #1777.
While empty links are not allowed in Emacs org-mode, Pandoc org-mode
should support them: gitit relies on empty links as they are used to
create wiki links.
Fixes jgm/gitit#471
The org reader was too restrictive when parsing links; some relative
links and links to files given as absolute paths were not recognized
correctly. The org reader's link parsing function was amended to handle
such cases properly.
This fixes #1741
This patch builds a paragraph style tree, then checks whether a paragraph has a
style.styleId or style/name.val matching predetermined patterns.
Works with "Heading#" (name.val="heading #") for headings and
"Quote"|"BlockQuote"|"BlockQuotation" (name.val="Quote"|"Block Text")
for block quotes.
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported. Those trees are dropped
silently.
This closes #1678.
Things like `/hello,/` or `/hi'/` were falsely recognized as emphasized
strings. This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text. This patch
makes the reader match the reference implementation in that it
reads the above strings as plain text.
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue #1650.
Now we outsource most of the work to `fetchItem'`.
Also, do not include queries in file extensions.
Improves fix to #1671.
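The point about queries can be shown with a short Python sketch. It is not the `fetchItem'` code itself: `extension_of` is a hypothetical name, and using `urlsplit` to separate the path from the query is an assumption about one reasonable way to do it.

```python
import os.path
from urllib.parse import urlsplit


def extension_of(source: str) -> str:
    """Return the file extension of a URL or path, ignoring any query
    string or fragment that follows the path."""
    path = urlsplit(source).path
    return os.path.splitext(path)[1]
```

Without the split, a source such as `pic.png?size=large` would yield the extension `.png?size=large`.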
It is possible that this will have some unexpected effects, so
further testing would be good.
Closes #1669.
If there are further issues, please open a new, targeted issue on the
tracker. Some notes on the further issues you gestured at:
Data URIs are indeed dereferenced, but why is this a problem?
(The function being used to fetch from URLs is used for many different
formats. Preserving data URIs would make sense in EPUBs, but not
for e.g. PDF output. And by dereferencing we can get a smaller,
more efficient EPUB, with the data stored as bytes in a file rather
than encoded in textual representation.)
"absolute uris are not recognized" -- I assume that is the problem
just fixed. If not, please open a new issue.
"relative uris are resolved (wrongly) like file paths" -- can you
give an example?
`<base>` tag is ignored. Yes. I didn't know about the base tag. Could
you open a new issue just for this?
This function can be used to sanitize reference labels so that
they do not contain any of the illegal characters \#[]",{}%()|= .
Currently only Links have their labels sanitized, because they
are the only Elements that use passed labels.
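A rough Python sketch of such a sanitizer. The name `sanitize_label` is hypothetical, and two things are assumptions: that illegal characters are simply dropped (the message does not say whether they are removed or replaced), and that the space is not part of the illegal set.

```python
# Characters listed as illegal in the commit message (space not included
# here; whether it belongs to the set is an assumption).
ILLEGAL = set('\\#[]",{}%()|=')


def sanitize_label(label: str) -> str:
    """Drop every illegal character from a reference label."""
    return "".join(ch for ch in label if ch not in ILLEGAL)
```

A label such as `sec:intro` contains none of the listed characters and is returned unchanged.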
We previously took the old relationship names of the headers and footers in
sectPr. That led to collisions. We now make a map of available names in the
relationships file, and then rename in sectPr.
Graphics in `\section`/`\subsection` etc titles need to be `\protect`ed.
This adds a state value and manually turns it on before every invocation
of `sectionHeader` and manually turns it off after. Using a writer value
and applying `local` would probably be cleaner, but this fits with the
current style.
When we encounter one of the polyglot header styles, we want to remove
that from the par styles after we convert to a header. To do that, we
have to keep track of the style name, and remove it appropriately.
We're just keeping a list of header formats that different languages use
as their default styles. At the moment, we have English, German, Danish,
and French. We can continue to add to this.
This is simpler than parsing the styles file, and perhaps less
error-prone, since there seems to be some variations, even within a
language, of how a style file will define headers.
When users number their headers, Word understands that as a single-item
enumerated list. We make the assumption that such a list is, in fact, a header.