The reader produced wrong results for blocks containing non-letter
characters in their header arguments. This patch relaxes the constraints:
block header arguments may now contain any non-space character (except
for ']' in inline blocks).
Thanks to Xiao Hanyu for noticing this.
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader. This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.
This closes #1286.
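As a rough illustration of the header form described above, here is a small Python sketch (not the reader's actual Haskell code) that splits a `#+BEGIN_SRC` line into language, switches, and header arguments. The name `parse_src_header`, the returned dictionary layout, the placement of the `rundoc-block` class, and the simplification that every switch is a single token starting with `-` or `+` are all assumptions made for the example.

```python
def parse_src_header(line):
    """Split a '#+BEGIN_SRC <language> <switches> <header arguments>' line.

    Hypothetical helper for illustration only; switch arguments such as
    the format string of '-l' are not handled.
    """
    tokens = line.split()
    if not tokens or tokens[0].upper() != "#+BEGIN_SRC":
        raise ValueError("not a source block header")
    language = tokens[1] if len(tokens) > 1 else None
    rest = tokens[2:]

    # Switches come first: single tokens such as -n, +n or -r.
    switches = []
    while rest and rest[0][0] in "-+":
        switches.append(rest.pop(0))

    # Header arguments are ':key value' pairs; a value runs until the next key.
    key_values = {}
    key = None
    for tok in rest:
        if tok.startswith(":"):
            key = tok[1:]
            key_values[key] = ""
        elif key is not None:
            key_values[key] = (key_values[key] + " " + tok).strip()

    classes = [language] if language else []
    if key_values:
        # Blocks with header arguments are marked as rundoc blocks.
        classes.append("rundoc-block")
    return {"classes": classes, "switches": switches, "key_values": key_values}
```

For a header such as `#+BEGIN_SRC haskell -n :exports both`, the sketch yields the classes `haskell` and `rundoc-block`, the switch `-n`, and the key-value pair `exports`/`both`.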
(It seems clearer to put the whitespace parsing in the grouped
parser. This also uses stateLastStrPos to determine when the
border is adjacent to an alphanumeric.)
Org's inline code blocks take forms like `src_haskell{print "hi"}` and
are frequently used to include results from computations called from
within the document. The blocks are read as inline code and marked with
the special class `rundoc-block`. Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
a work in progress.
This closes #1278.
* Undid changes to parseXml in last commit.
* Instead of a string fallback, we have parseXml fall back
on the reference.docx that comes with pandoc if the user's
reference.docx does not contain a needed file.
* Closes #1185.
Closes #1274.
Rewrote handleIncludes.
We now report the actual source file and position where the error
occurs, even if it is included. We do this by inserting special
commands, `\PandocStartInclude` and `\PandocEndInclude`, that encode
this information in the preprocessing phase.
Also generalized the types of a couple functions from
`Text.Pandoc.Parsing`.
Org allows users to define their own custom link types. E.g., in a
document with a lot of links to Wikipedia articles, one can define a
custom wikipedia link-type via
#+LINK: wp https://en.wikipedia.org/wiki/
This allows writing [[wp:Org_mode][Org-mode]] instead of the
equivalent [[https://en.wikipedia.org/wiki/Org_mode][Org-mode]].
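The expansion of such abbreviated links can be sketched in a few lines of Python. This is only an illustration of the idea, not the reader's code: the function names `parse_link_definitions` and `expand_link` are made up for the example, and Org's `%s` placeholder form of link templates is not handled.

```python
def parse_link_definitions(lines):
    """Collect '#+LINK: <abbrev> <url-prefix>' definitions from Org lines."""
    definitions = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0].upper() == "#+LINK:":
            definitions[parts[1]] = parts[2]
    return definitions


def expand_link(target, definitions):
    """Expand 'abbrev:tag' to 'prefix + tag' if the abbreviation is known."""
    abbrev, sep, tag = target.partition(":")
    if sep and abbrev in definitions:
        return definitions[abbrev] + tag
    # Unknown abbreviations (including real schemes like https) pass through.
    return target
```

With the `wp` definition from above, `expand_link("wp:Org_mode", defs)` gives the full Wikipedia URL, while ordinary URLs are returned unchanged.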
* We now correctly handle field lists that are indented more than
3 spaces.
* We treat an "aafig" directive as a code block with attributes,
so it can be processed in a filter. (Closes #1212.)
We now check the writerName for a lua script in pandoc.hs, so that
lowercasing and format parsing aren't done. Note this behavior
change: getWriter in Text.Pandoc no longer returns a custom writer on
input "foo.lua".
This adds nocite citations to a metadata field, `nocite`.
These will appear in the bibliography but not in the text
(unless you use a `$nocite$` variable in your template, of
course).
Internal links in Org are possible by using an anchor-name as the target
of a link:
[[some-anchor][This]] is an internal link.
It links <<some-anchor>> here.
The function `compactify'DL`, used to change the final definition item of a
definition list into a `Plain` iff all other items are `Plain`s as well, is
useful in many parsers and has therefore been moved into Text.Pandoc.Shared.
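The rule can be illustrated with a simplified Python model. This is a sketch of the idea only, not a port of the Haskell function: it assumes each definition list item is a `(term, block)` pair with exactly one block, represented as a `("Para" | "Plain", text)` tuple, and the name `compactify_dl` is hypothetical.

```python
def compactify_dl(items):
    """Turn a trailing Para into a Plain if every other item is a Plain.

    items: list of (term, block) pairs, block being ("Para"|"Plain", text).
    """
    if not items:
        return items
    *init, (term, (kind, text)) = items
    others_plain = all(block[0] == "Plain" for _, block in init)
    if kind == "Para" and others_plain:
        return init + [(term, ("Plain", text))]
    # At least one other item is a Para: the list is loose, leave it alone.
    return items
```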
Footnotes can consist of multiple blocks and end only at a header or at
the beginning of another footnote. This fixes the previous behavior,
which restricted notes to a single paragraph.
Support for standard org-blocks is improved. The parser now handles
"HTML", "LATEX", "ASCII", "EXAMPLE", "QUOTE" and "VERSE" blocks in a
sensible fashion.
* Use a <literallayout> for the entire paragraph, not just for the
newline character
* Don't let LineBreaks inside footnotes influence the enclosing
paragraph
This fixes the org-reader's handling of sub- and superscript
expressions. Simple expressions (like `2^+10`), expressions in
parentheses (`a_(n+1)`) and nested sexp (like `a_(nested()parens)`) are
now read correctly.
Support all of the following variants as valid ways to write inline or
display math:
- `\[..\]` (display)
- `$$..$$` (display)
- `\(..\)` (inline)
- `$..$` (inline)
This closes #1223. Again.
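The four delimiter pairs listed above can be summarised in a short Python sketch. It only classifies a string that is already known to be one complete math span; the name `classify_math` and the table `MATH_DELIMITERS` are assumptions for the example, and the real reader works as a parser rather than on whole strings.

```python
MATH_DELIMITERS = [
    ("\\[", "\\]", "display"),
    ("$$", "$$", "display"),  # must be tried before the single-dollar form
    ("\\(", "\\)", "inline"),
    ("$", "$", "inline"),
]


def classify_math(text):
    """Return (kind, body) for a delimited math string, or None."""
    for opener, closer, kind in MATH_DELIMITERS:
        if (text.startswith(opener) and text.endswith(closer)
                and len(text) > len(opener) + len(closer)):
            return kind, text[len(opener):len(text) - len(closer)]
    return None
```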
In particular we now pick up on attributes. Since pandoc links
can't have attributes, we enclose the whole link in a span
if there are attributes.
Closes #1008.
These previously produced invalid LaTeX: `\paragraph` or
`\subparagraph` in a `quote` environment. This adds a
`\mbox{}` in these contexts to work around the problem.
See http://tex.stackexchange.com/a/169833/22451.
Closes #1221.
Instead of being ignored, attributes are now parsed and
included in Span inlines.
The output will be a bit different from stock textile:
e.g. for `*(foo)hi*`, we'll get `<em><span class="foo">hi</span></em>`
instead of `<em class="foo">hi</em>`. But at least the data is
not lost.
Org-mode and Pandoc use different language identifiers, marking source
code as being written in a certain programming language. This adds more
translations from identifiers as used in Org to identifiers used in
Pandoc.
The full list of identifiers used in Org and Pandoc is available through
http://orgmode.org/manual/Languages.html and `pandoc -v`, respectively.
Text such as /*this*/ was not correctly parsed as a strong, emphasised
word. This was due to the end-of-word recognition being too strict, as it
did not accept markup chars as part of a word. The fix involves an
additional parser state field, listing the markup chars which might be
parsed as part of a word.
The default pandoc ParserState is replaced with `OrgParserState`. This
is done to simplify the introduction of new state fields required for
efficient Org parsing.
The reader did not correctly parse inline markup. The behavior is now as follows.
(a) The markup must start at the start of a line, be inside previous
inline markup or be preceded by whitespace.
(b) The markup cannot span across paragraphs (delimited by \n\n).
(c) The markup cannot be followed by an alphanumeric character.
(d) Square brackets can be placed around the markup to avoid having
to have white space before it.
In order to make these changes, it was necessary either to convert the
parser to return a list of inlines or to convert the whole reader to use
the builder. The latter approach, whilst more work, makes a bit more sense,
as it becomes easy to arbitrarily append and prepend elements without
changing the type.
Tests are accordingly updated in a later commit to reflect the different
normalisation behavior specified by the builder monoid.
If the content contains a backtick fence and there are
attributes, make sure longer fences are used to delimit the code.
Note: This works well in pandoc, but github markdown is more
limited, and will interpret the first string of three or more
backticks as ending the code block.
Closes #1206.
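The fence-length rule can be sketched as follows. This Python snippet is an illustration under assumptions, not the writer's implementation: the name `fenced_code` is hypothetical, and it only looks at backtick runs of three or more at the start of a line.

```python
import re


def fenced_code(content, attributes=""):
    """Wrap content in a backtick fence longer than any fence inside it."""
    # Find runs of three or more backticks at the start of any line.
    runs = re.findall(r"^`{3,}", content, flags=re.MULTILINE)
    longest = max((len(run) for run in runs), default=0)
    # Use at least three backticks, and one more than the longest inner run.
    fence = "`" * max(3, longest + 1)
    header = fence + (" " + attributes if attributes else "")
    return header + "\n" + content.rstrip("\n") + "\n" + fence
```

Content that itself contains a three-backtick fence is thus delimited by four backticks, while ordinary content keeps the usual three.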
Previously these were typeclasses of monads. They've been changed
to be typeclasses of states. This simplifies the instance definitions
and provides more flexibility.
This is an API change! However, it should be backwards compatible
unless you're defining instances of HasReaderOptions, HasHeaderMap,
or HasIdentifierList. The old getOption function should work as
before (albeit with a more general type).
The function askReaderOption has been removed.
extractReaderOptions has been added.
getOption has been given a default definition.
In HasHeaderMap, extractHeaderMap and updateHeaderMap have been added.
Default definitions have been given for getHeaderMap, putHeaderMap,
and modifyHeaderMap.
In HasIdentifierList, extractIdentifierList and updateIdentifierList
have been added. Default definitions have been given for
getIdentifierList, putIdentifierList, and modifyIdentifierList.
The ultimate goal here is to allow different parsers to use their
own, tailored parser states (instead of ParserState) while still
using shared functions.
src and poster will both be incorporated into content.opf
and the epub container.
This partially addresses #1170.
Still need to do something similar for <audio>.
Closes #1197.
Note that there are still problems with the formatting of
the tables inside tables with output produced from the input
file in the original bug report. But this fixes the stack
overflow problem.
Fixes compile error on Windows for 5040f3e
Reverted to canonical file separators </> in all places except for
arguments to the LaTeX builder and in TEXINPUTS.
See #1151.
Note: Temporary directories still fail to be removed on Windows because
the call to ByteString.Lazy.readFile leaves the process holding the
compiled pdf file.
This is needed for texlive.
Note that the / is used only in the body of withTempDir,
so when the directory is deleted, the original separators will
be used.
See #1151.
Closes #1133.
Note: If address is a YAML object and you just have $address$
in your template, the word "true" will appear, which may be
unexpected. (Previously nothing would appear.)
This is to debug backtracking-related parsing bugs.
So far it is only implemented for markdown, but it would
be good to extend it to latex and html readers.
rST parser now supports:
- All built-in rST roles
- New role definition
- Role inheritance
Issues/TODO:
- Silently ignores illegal fields on roles
- Silently drops class annotations for roles
- Only supports :format: fields with a single format for :raw: roles;
supporting multiple formats requires a change to
Text.Pandoc.Definition.Format.
- Allows direct use of the :raw: role, whereas rST only allows indirect
(i.e., inherited) use of :raw:.
Keys may now start with an underscore as well as a letter.
Underscores do not count as internal punctuation, but are
treated like alphanumerics, so "key:_2008" will work, as
it did not before. (This change was necessary to use keys
generated by zotero.)
Closes #1111, closes #1011.
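The key rule described above can be approximated with a regular expression. The Python sketch below is an assumption-laden illustration, not the parser itself: the name `is_citation_key` is hypothetical, and the exact set of characters accepted as internal punctuation is a guess made for the example.

```python
import re

# Assumed rule: a key starts with a letter or underscore; underscores count
# as alphanumerics; other punctuation must be followed by an alphanumeric.
# The punctuation set in the character class is an assumption.
CITATION_KEY = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*(?:[:.#$%&\-+?<>~/][A-Za-z0-9_]+)*"
)


def is_citation_key(key):
    """Check whether the whole string is a valid key under the assumed rule."""
    return CITATION_KEY.fullmatch(key) is not None
```

Under this rule `key:_2008` is accepted because the underscore after the colon is treated like an alphanumeric, whereas a key ending in a bare colon is rejected.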
Note: For now we just assign them all 72 dpi. It wasn't
clear to me how to extract the resolution information.
At least the aspect ratio will be right, and 72 dpi is
the most common setting.
Closes #976.
This arose because the headings are copied into the metadata
"title" field, and the note gets rendered twice. We strip the
note now before putting the heading in "title".
Now the contents of `writerEpubStylesheet` (set by `--epub-stylesheet`)
should again work, and take precedence over a stylesheet specified
in the metadata.
A consequence of this change is that the backtick form will be
preferred in general if both are enabled. I think that is good,
as it is much more widespread than the tilde form.
Closes #1084.