Commit graph

8003 commits

Author SHA1 Message Date
John MacFarlane
d40d94ebd9 EndNote reader: add nocite as the other bib format readers do. 2022-02-04 23:51:12 -08:00
John MacFarlane
b7f1c97b6a Docx zotero/mendeley/endnote: add comma before locator in suffix. 2022-02-04 23:28:46 -08:00
John MacFarlane
f48890eff0 Support Prefix, Suffix, Pages in endnote ADDINs. 2022-02-04 22:20:51 -08:00
John MacFarlane
19cfe6a907 Got endnote citations working in docx...
Still to do:  prefix, suffix, locator.
2022-02-04 21:54:50 -08:00
John MacFarlane
28349447cb Docx reader: skeleton for endnote citation ADDINs. 2022-02-04 17:02:43 -08:00
John MacFarlane
15316a0058 EndNote: export readEndNoteXMLCitation...
instead of readEndNoteXMLReferences.  This is the function
we'll need in the docx reader.

We still need to implement locator, prefix, and suffix.
2022-02-04 14:02:58 -08:00
John MacFarlane
d164e5bb1d Docx reader: parse EN.CITE and EN.REFLIST fields. 2022-02-04 10:04:16 -08:00
John MacFarlane
34897031f4 Add endnote XML reader.
New input format: endnotexml

New reader module: Text.Pandoc.Readers.EndNote, exporting
`readEndNoteXML` and `readEndNoteXMLReferences`. [API change]

This reader is still a bit rudimentary, but it should
be good enough to be helpful.
2022-02-04 10:03:52 -08:00
John MacFarlane
e07c0e74ce Support embedded Mendeley citations in docx.
These are supported in the same way as Zotero citations,
using the same code.  As with Zotero, enable the `citations`
extension on `docx` to parse these as native citations.

Closes #7840.
2022-02-04 10:00:23 -08:00
John MacFarlane
d40236805d MediaBag: improve detection of absolute paths.
Previously we used System.FilePath's isRelative to
determine when paths are relative (since absolute
paths need to get a new name based on the sha1 hash).
But this has an OS-specific behavior and actually
returns True on Windows for paths like `/media/file.png`.
This ought to fix #7881.
2022-02-04 09:47:07 -08:00
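The OS-specific pitfall described in d40236805d can be illustrated with a small sketch. This is Python rather than pandoc's Haskell, the name `looks_absolute` is hypothetical, and it is not the code from the commit; it only shows one way to classify a path the same way on every platform.

```python
import ntpath
import posixpath


def looks_absolute(path: str) -> bool:
    """Return True if the path is absolute under POSIX or Windows rules.

    Assumption: a drive prefix alone (e.g. "C:file") is treated as
    absolute here, which is stricter than Windows itself.
    """
    # ntpath.splitdrive works on any host OS and recognizes "C:" and UNC prefixes.
    drive, rest = ntpath.splitdrive(path)
    return (
        bool(drive)                   # "C:\\media\\file.png", "\\\\host\\share\\x"
        or posixpath.isabs(path)      # "/media/file.png"
        or rest.startswith("\\")      # "\\media\\file.png"
    )
```

With a check of this kind, `/media/file.png` is classified as absolute regardless of the OS on which the conversion runs.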
John MacFarlane
40b174c770 Revert "T.P.Class.IO.adjustImagePath: avoid double slash."
This reverts commit 3dcb526b9b.
2022-02-04 09:29:49 -08:00
John MacFarlane
3dcb526b9b T.P.Class.IO.adjustImagePath: avoid double slash.
Previously, if the directory argument ended in a slash,
we'd get a doubled slash in the path.  This may help
with #7881.
2022-02-04 07:54:52 -08:00
John MacFarlane
60caa0a1e1 Docx reader: add bibliographic entries for zotero ADDIN.
Bibliographic data embedded in citation items is added
to the `references` metadata field.

Closes #7840.
2022-02-03 22:08:46 -08:00
John MacFarlane
4086873281 Improve locators for docx Zotero citations. 2022-02-03 20:23:11 -08:00
John MacFarlane
1c2f0fe1d2 Enable citations extension for docx reader.
When enabled, Zotero citations are parsed as native pandoc
citations.  (When disabled, the Zotero-generated citation
text is passed through as regular text.)  In addition, the
Zotero-generated bibliography is suppressed.

Locators still need some work.
2022-02-03 19:34:05 -08:00
John MacFarlane
9ef8650612 Docx reader: Parse CSL JSON in Zotero addin.
This gives us what we need for #7840, except adding
to the references in metadata.
2022-02-03 16:04:15 -08:00
John MacFarlane
b9ac243986 Trim whitespace from math in --webtex.
This fixes problems with --webtex and markdown output,
when display math starts or ends with a newline.

Closes #7892.
2022-02-03 13:21:27 -08:00
John MacFarlane
9618b66fe8 Whitespace fixes. 2022-02-03 13:13:03 -08:00
John MacFarlane
0011c9520d Docx reader: add more framework for Zotero citations.
- Add docxReferences to state, so we can accumulate
  references for metadata.
- Add a clause for ZoteroItem to parPartToInlines'.
  So far it doesn't do anything except add a surrounding Cite element.

See #7840.
2022-02-03 07:39:07 -08:00
John MacFarlane
54279149ab Use unreleased citeproc. 2022-02-03 07:37:13 -08:00
John MacFarlane
6ed8999f75 LaTeX reader: handle subequations as inline math environment.
Closes #7883.
2022-02-02 10:41:46 -08:00
Albert Krewinkel
2fa8308afa Restore wkhtmltopdf as default pdf engine for HTML 2022-02-01 14:43:38 -08:00
John MacFarlane
4c8c7f6dff Revert "T.P.App.Opt: fix logic bug in fullDefaultsPath."
This reverts commit 545c0911aa.

Fixes regression in 2.17.1.

The original commit was completely misguided, and caused
problems finding defaults files in the default user data
directory.
2022-01-31 09:36:49 -08:00
John MacFarlane
a246107347 LaTeX reader: ensure that \raggedright doesn't gobble an argument.
See #7757.
2022-01-29 22:15:04 -08:00
John MacFarlane
c348c4d4fe Use [x] not [X] for asciidoctor checklists.
See #7798.
2022-01-29 17:47:35 -08:00
Albert Krewinkel
fbb9fbf9bb
Custom writer: preserve order of element attributes
Attribute key-value pairs are marshaled as AttributeList, i.e., as a
userdata type that behaves both like a list and a map. This makes it
possible to preserve the order of key-value pairs.

Closes: #7489
2022-01-29 22:36:22 +01:00
Albert Krewinkel
412596c30b Switch to hslua-2.1
This allows for some code simplification and improves stability.
2022-01-29 08:43:14 -08:00
Albert Krewinkel
a6fa3df114
HTML writer: avoid duplicate "style" attributes on table cells
Fixes: #7871
2022-01-28 18:20:14 +01:00
Even Brenden
d36a16a4df Don't read files outside of user data directory
If a file path does not exist relative to the working directory, but
it does exist relative to the user data directory, and it exists outside
of the user data directory, do not read it. This applies to readDataFile
and readMetadataFile in PandocMonad and, by extension, any module that
uses these by passing them relative paths.
2022-01-28 08:51:27 -08:00
Even Brenden
e1f8c4b396 Handle consecutive ".."s in makeCanonical
As an example, prior to this commit, "../../file" would evaluate to
"file", when it should be unchanged.
2022-01-28 08:51:27 -08:00
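The behavior described in e1f8c4b396 can be sketched as follows. This is an illustrative Python version under stated assumptions (relative, `/`-separated paths only; the name `make_canonical` mirrors the Haskell function but the implementation is hypothetical), not pandoc's own code.

```python
def make_canonical(path: str) -> str:
    """Collapse "." and "dir/.." segments in a relative, "/"-separated path.

    Leading ".." segments are kept, because there is nothing to cancel
    them against.
    """
    out = []
    for part in path.split("/"):
        if part in ("", "."):
            continue                      # skip empty and "." segments
        if part == ".." and out and out[-1] != "..":
            out.pop()                     # "dir/.." cancels out
        else:
            out.append(part)              # ordinary name, or an uncancellable ".."
    return "/".join(out)
```

The key point is the `out[-1] != ".."` test: without it, a second `..` would wrongly cancel the first one, and `../../file` would collapse to `file`.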
John MacFarlane
7fbce82f2f LaTeX writer: allow arbitrary frameoptions to be passed...
to a beamer frame, using the frameoptions attribute.
Updated manual.

See #7869.
2022-01-27 14:07:51 -08:00
John MacFarlane
4fa042f847 LaTeX writer: add s and squeeze to recognized beamer frameoptions.
Closes #7869.
2022-01-27 14:07:51 -08:00
John MacFarlane
183fb3e327 LaTeX reader: improve descItem.
For some reason we were skipping arbitrary blocks before `\item`.
This is now changed to "skip whitespace and comments."
2022-01-25 08:43:12 -08:00
John MacFarlane
a9f901cf6b CommonMark reader: fix source position after YAML metadata.
Closes #7863.
2022-01-23 22:13:58 -08:00
John MacFarlane
67f2b25c05 LaTeX reader: improve handling of newif.
Adding a pair of braces around the second argument of `\def`
prevents LaTeX from making an emergency stop on the input below.  Closes #6096.

```
pandoc -f markdown -o test.pdf
\newif\ifepub

\epubtrue

\ifepub

hi

\fi
^D
```
2022-01-22 21:48:14 -08:00
Even Brenden
7df29e495f
Search for metadata files in $DATADIR/metadata (#7851)
If files specified with `--metadata-file` are not found in the working
directory, look in `$DATADIR/metadata`.

Expose new `readMetadataFile` function from Text.Pandoc.Class
[API change].

Expose new `PandocCouldNotFindMetadataFileError` constructor for
`PandocError` from Text.Pandoc.Error [API change].

Closes #5876.
2022-01-21 12:00:45 -08:00
John MacFarlane
672b6dc7e6 Remove retokenizing in rawLaTeXParser.
This was causing serious problems with `newif` commands.
See #6096.  And it didn't seem to make any difference for
the tests; I assume that, unless there's some untested
behavior, this is something that has now become unnecessary.
2022-01-21 10:17:58 -08:00
John MacFarlane
52b78b10c8 Avoid putting a frame around speaker notes in beamer.
If speaker notes (a Div with class 'notes') occur right
after a section heading, but above slide level, the
resulting `\note{..}` command should not be wrapped in
a frame, as that will cause a spurious blank slide.

Closes #7857.
2022-01-20 19:09:44 -08:00
John MacFarlane
ef8135b4a7 HTML writer: don't break lines inside code elements.
With the new (default) line wrapping of HTML, in
conjunction with the default CSS which includes
`code { white-space: pre-wrap; }`, spurious line
breaks could be introduced into inline code.

Closes #7858.
2022-01-20 09:17:34 -08:00
John MacFarlane
d9ec95e7ab Modify stringify so it ignores [Citation] inside Cite.
Otherwise we'll sometimes get two copies of things, one
from the `citationPrefix` or `citationSuffix` and another
from the embedded fallback text.

When there is no fallback text, we'll get no content.
However, simply relying on the result of running `query`
on the embedded `Citation`s isn't really an alternative;
this will result in a jumble of text rather than anything
structured.

Closes #7855.
2022-01-19 22:06:06 -08:00
John MacFarlane
d5818413ff Docx reader: parse both zotero citation and bibliography...
as FieldInfo.
2022-01-19 10:31:00 -08:00
John MacFarlane
73fe7c129e Docx reader: add skeleton for parsing zotero ADDINs.
So far this just adds a constructor for FieldInfo;
we'll need to adjust the rest of the reader code to
parse the JSON and do something with it.

See #7840.
2022-01-19 10:20:15 -08:00
John MacFarlane
6723891c72 Markdown writer: handle explicit column widths with pipe tables.
If a table has explicit column width information *and* the
content extends beyond the `--columns` width, we need to
adjust the widths of the pipe separators to encode this width
information.

Closes #7847.
2022-01-19 09:36:48 -08:00
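In pipe tables, relative column widths are encoded by the number of dashes in the separator line, which is what 6723891c72 relies on. A rough sketch of that encoding, in Python and with a hypothetical helper name and rounding rule (pandoc's actual formula may differ):

```python
def separator_line(widths, columns=72):
    """Build a pipe-table separator whose dash counts encode relative widths.

    `widths` are fractions of the text width (e.g. 0.25); `columns`
    plays the role of the `--columns` setting. Both the function name
    and the rounding are assumptions made for illustration.
    """
    cells = ["-" * max(1, round(w * columns)) for w in widths]
    return "|" + "|".join(cells) + "|"
```

For example, widths of 0.25 and 0.75 with `columns=40` yield cells of 10 and 30 dashes, so a reader can recover the 1:3 ratio from the separator alone.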
Michael Hoffmann
e146b1ff3b
Docx writer: Separate tables even with RawBlocks between (#7844)
Adjacent docx tables need to be separated by an empty paragraph. If
there's a RawBlock between tables which renders to nothing, be sure to
still insert the empty paragraph so that they will not collapse
together.

Fixes #7724
2022-01-18 14:28:28 -08:00
John MacFarlane
c1717378b0 Fix some haddock errors. 2022-01-17 21:03:25 -08:00
John MacFarlane
545c0911aa T.P.App.Opt: fix logic bug in fullDefaultsPath.
Previously we would (also) search the default user data directory
for a defaults file, even if a different user data directory
was specified using `--data-dir`.  This was a mistake; if
`--data-dir` is used, the default user data directory should
not be searched.
2022-01-17 21:03:25 -08:00
John MacFarlane
a92ae0a58a T.P.Shared.defaultUserDataDir: behavior change.
If the XDG data directory is not defined (e.g. because
it's not supported in the OS or HOME isn't defined), we
return the empty string instead of raising an exception.

Closes #7842.
2022-01-17 21:03:25 -08:00
Albert Krewinkel
7f50324ff9
PDF: support pagedjs-cli as pdf engine (#7838)
PagedJS is a polyfill and supports the Paged Media standards by the W3C.
<https://www.pagedjs.org/>
2022-01-17 09:19:03 -08:00
Nikolai Korobeinikov
b683b8d48a
Support checklists in asciidoctor writer (#7832)
The checklist syntax (similar to `task_list` in markdown) seems to be
an asciidoctor-only addition.

Co-authored-by: ricnorr <ricnorr@yandex-tream.ru>
2022-01-16 11:05:19 -08:00
John MacFarlane
c40727bfbb Man writer: use custom font V for inline code.
The V font is defined conditionally, so that it renders
like CB in output formats that support that, and like B
in those that don't (e.g. the terminal).

We could just redefine C, but this would affect code
blocks, too, and putting them all in boldface looks ugly,
I think.

Possible drawback: fragments created by pandoc's man
writer will presuppose a nonstandard V font.

Closes #7506.
Supersedes 253467a549.
2022-01-15 12:39:19 -08:00
John MacFarlane
253467a549 Man writer: Use boldface for inline code.
Closes #7506.

This also allows us to get rid of some special casing
on definition lists that ensured that options in code
spans would be boldface.  (If this change is ever reverted,
we'll need that again.)
2022-01-15 12:07:18 -08:00
John MacFarlane
4214218256 T.P.Readers.LaTeX.Parsing: don't export totoks.
Make the first param of `tokenize` a SourcePos instead of
SourceName, and use it instead of `totoks`.
2022-01-14 21:27:33 -08:00
John MacFarlane
0d1ba3dce3 When reading defaults file, stop at a line `...`.
This line signals the end of a YAML document.
This restores the behavior we got with HsYaml.
yaml complains about content past this line.
See https://github.com/jgm/pandoc/issues/4627#issuecomment-1012438765
2022-01-13 11:58:33 -08:00
John MacFarlane
4fdbb30a97 Citeproc: allow notes-after-punctuation to work...
with numerical styles that use superscripts (e.g.
american-medical-association.csl), as well as with
note styles. The default setting of `notes-after-punctuation`
is true for note styles and false otherwise.

This restores a behavior of pandoc-citeproc that wasn't properly
carried over to Citeproc.

Closes #7826.
See also jgm/pandoc-citeproc#384.
2022-01-12 21:08:28 -08:00
Michael Hoffmann
5001fd3f4d
Docx writer: Handle bullets correctly in lists by not reusing numIds (#7822)
Make sure that we only create one bullet per list item in docx.  In
particular, when a div is a list item, its contained paragraphs will
now no longer wrongly get individual bullets.

This is accomplished by making sure that for each list, we only use
the associated numId once.  Any repeated use would add incorrect
bullets to the document.

Closes #7689
2022-01-11 15:48:41 -08:00
John MacFarlane
a25e79b5be DocBook reader: Collapse internal spaces in literal...
and other similar tags.  This seems to accord with what
the docbook toolchain does.

Closes #7821.
2022-01-10 11:47:55 -08:00
John MacFarlane
7bf1191686 HTML writer: don't break attributes values when wrapping. 2022-01-10 10:40:49 -08:00
John MacFarlane
6f739cdb4d Fix regression: allow blank lines in HTML attributes.
The commit 7a9832166e
had the effect that blank lines would be collapsed
in HTML attributes.

We also roll back a change that collapsed multiple
spaces into one.
2022-01-10 10:29:25 -08:00
John MacFarlane
2e50c8d137 Improve abstract in HTML template.
* Add localized title "abstract", unless `abstract-title` variable
  is set.
* Add `abstract-title` div to abstract CSS.
* Move abstract CSS out of CSL conditional.
* Ensure that abstract is aligned left but indented on all sides.
* Use smaller font for abstract.

Improves #7588.
2022-01-09 10:56:28 -08:00
Lucas Viana
fb91a91615 Org reader: support alphabetical (fancy) lists
This adds support for alphabetical lists in org by enabling the
extension Ext_fancy_lists, mimicking the behaviour of Org Mode when
org-list-allow-alphabetical is enabled.

Enabling Ext_fancy_lists will also make Pandoc differentiate between the
delimiters of ordered lists (periods or closing parentheses). Org does
this differentiation by default when exporting to some formats (e.g.
plain text) but does not in others (e.g. html and latex), so I decided
to copy Pandoc's markdown reader behaviour.
2022-01-09 09:39:27 -08:00
John MacFarlane
66636c89b0 Org writer: fix list items starting with a code block...
or other non-paragraph content.

Closes #7810.
2022-01-08 23:21:15 -08:00
John MacFarlane
61968047e4 Avoid blank lines after tight sublists in org, haddock.
T.P.Writers.Shared `endsWithPlain` now returns True if
the list ends with a list which ends with a Plain.

See #7810.
2022-01-08 23:11:08 -08:00
John MacFarlane
1b7bdb1016 RST writer: avoid extra blank line after empty list item.
See #7810 (2).
2022-01-08 21:24:41 -08:00
John MacFarlane
8736fe11ee Org writer: fix extra blank line inserted after empty list item.
Addresses issue 2 from #7810.
2022-01-08 21:22:21 -08:00
John MacFarlane
252211bd27 Org writer: don't add blank line before lists.
The code to do this was apparently copied over from the RST
writer, but these blank lines aren't necessary or desirable
in org.  See #7810 comment 3.
2022-01-08 19:39:26 -08:00
John MacFarlane
a6741bd555 writeMedia: unescape percent-encoding in creating file path.
Closes #7819 (problem with spaces in image filenames when creating
PDFs).
2022-01-08 19:10:46 -08:00
John MacFarlane
2b51f54e19 toLocatorMap: store keys as lowercase.
We want to do a case-insensitive comparison when parsing
locators, so that e.g. both `Chap.` and `chap.` work.

Previously we lowercased terms when doing the lookup,
but they weren't lowercased in the map itself, which
led to locator-detection breaking for German (where the
terms have uppercase letters).

See
https://groups.google.com/d/msgid/pandoc-discuss/1dd44886-7b79-4e5f-97ec-57b91113df36n%40googlegroups.com
2022-01-08 16:57:59 -08:00
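The fix in 2b51f54e19 amounts to normalizing case on both sides of the lookup. A minimal Python sketch of the idea, with hypothetical function names and made-up German sample terms (not taken from any real locale file):

```python
def to_locator_map(terms):
    """Store every locator term under its lowercase form."""
    return {term.lower(): label for term, label in terms.items()}


def lookup_locator(locator_map, word):
    """Look a word up case-insensitively by lowercasing it first."""
    return locator_map.get(word.lower())
```

If only the lookup side is lowercased, a term stored as `Kap.` can never be found; lowercasing the keys when the map is built removes that mismatch.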
John MacFarlane
2986a06aaa T.P.Readers.LaTeX.SIunitx: explicit imports. 2022-01-07 18:00:57 -08:00
John MacFarlane
a965111680 Fix parsing of footnotes in --metadata-file.
Closes #7813.
2022-01-07 15:58:26 -08:00
Lucas Viana
45e2e0d018 Org writer: support starting number cookies
This complements #7806 by supporting writing Org ordered lists that
start at a specific number.
2022-01-07 10:48:28 -08:00
John MacFarlane
d562de5039 Add LaTeX babel mappings for Gujarati (gu) and Oriya (or).
Closes #7815.
2022-01-07 10:25:34 -08:00
John MacFarlane
90e74c2b76 Fix typo panjabi -> punjabi.
This affects the mapping to Babel language names in the
LaTeX reader and writer.  Closes #7814.
2022-01-07 10:08:41 -08:00
Jesse Hathaway
4dcb2b6fb6 MediaWiki writer: Remove redundant display text for wiki links
Prior to this commit the MediaWiki writer always added the display
text for a wiki link:

    * [[Help|Help]]
    * [[Bubbles|Everyone loves bubbles]]

However the display text in the first example is redundant since
MediaWiki uses the target as the default display text. The result being:

    * [[Help]]
    * [[Bubbles|Everyone loves bubbles]]
2022-01-06 15:05:39 -08:00
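The rule from 4dcb2b6fb6 is small enough to sketch directly. This is an illustrative Python function with a hypothetical name, not the MediaWiki writer's Haskell code:

```python
def wiki_link(target: str, text: str) -> str:
    """Render a MediaWiki link, omitting display text equal to the target."""
    if text == target:
        return f"[[{target}]]"            # MediaWiki shows the target by default
    return f"[[{target}|{text}]]"
```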
John MacFarlane
0d99a131b1 reveal.js: Make sure images with r-stretch are not in p tags.
They must be direct children of the section.
There was previously code to make this work with the older
class name `stretch`.
See https://github.com/jgm/pandoc/issues/5965#issuecomment-1006623836
2022-01-06 10:56:33 -08:00
John MacFarlane
517d7a9cd3 reveal.js: don't add r-fit-text class to section.
It must go on header only.  See
https://github.com/jgm/pandoc/issues/5965#issuecomment-1006623836
2022-01-06 10:45:15 -08:00
Lucas Viana
4be41e3bb5 Org reader: support counter cookies in lists
This adds support for counter cookies in org lists. Such cookies are
used to override the item counter in ordered lists. In org it is
possible to set the counter at any list item, but since Pandoc AST does
not support this, we restrict the usage to setting an offset for the
entire ordered list, by using the cookie in the first list item.

Note that even though unordered lists do not have counters, Org Mode
still parses such cookies in unordered lists and suppresses them in the
output, so we do the same.

Also, even though org-list-allow-alphabetical is disabled in Emacs by
default, for some reason alphabetical cookies are always parsed and used
in Org Mode regardless of whether this option is enabled or the list
style is decimal, so we do the same.

E.g.
 2. test
 3. test
Is parsed as an ordered list starting at 1, as before. This also
conforms to Org Mode behaviour.

 1. [@2] test
 2. test
Is now parsed as an ordered list starting at 2, so that it conforms to
Org Mode behaviour.

Note that when parsing
 1. [@2] test
 2. [@9] test
the second cookie is silenced and the entire list starts at 2. This is
because the current Pandoc AST does not support expressing a change in
the counter at a specific item.
2022-01-06 19:33:13 +01:00
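The cookie handling described in 4be41e3bb5 can be sketched as below. This is a simplified Python illustration: it covers numeric cookies only (alphabetical cookies are left out), the function name is hypothetical, and it operates on already-split item texts rather than on real Org syntax.

```python
import re

# A counter cookie such as "[@2]" at the start of a list item.
COOKIE = re.compile(r"^\[@(\d+)\]\s*")


def parse_ordered_list(items):
    """Return (start number, item texts) for a list of item strings.

    Only a cookie on the first item sets the start number; cookies on
    later items are stripped and otherwise ignored.
    """
    start = 1
    texts = []
    for index, item in enumerate(items):
        match = COOKIE.match(item)
        if match:
            if index == 0:
                start = int(match.group(1))
            item = item[match.end():]     # cookies never appear in the output
        texts.append(item)
    return start, texts
```

Applied to the items `[@2] test` and `[@9] test`, this gives a list starting at 2 with both cookies removed, matching the last example in the commit message.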
John MacFarlane
ea74582288 AsciiDoc writer: improve detection of intraword emphasis.
Closes #7803.
2022-01-05 14:48:19 -08:00
Albert Krewinkel
1f8638fb54 Lua: add pandoc.template module
The module provides a `compile` function to use strings as templates.
2022-01-04 11:55:59 -08:00
Albert Krewinkel
974a9d353a Lua: marshal templates as opaque userdata values 2022-01-04 11:55:59 -08:00
Albert Krewinkel
6a5ac90bf1 Lua: add pandoc.WriterOptions constructor 2022-01-04 11:55:59 -08:00
Albert Krewinkel
0d1d52f0a0 Lua: add function pandoc.write 2022-01-04 11:55:59 -08:00
Albert Krewinkel
5c53837259 Stop exporting writeCustom from module T.P.Writers [API change]
This ensures that all writers exported in T.P.Writers are parameterized
and work with any `PandocMonad` type. This is consistent with
T.P.Readers, as `readCustom` is not exported from that module either.
2022-01-04 11:55:59 -08:00
John MacFarlane
322364ff66 Markdown writer: fix indentation issue in footnotes.
Closes #7801.
2022-01-03 18:50:44 -08:00
John MacFarlane
53699f2ab3 DocBook reader: be sensitive to spacing="compact" in lists.
When spacing="compact" is set, Para elements are turned
into Plain, so we get a "tight" list.

Closes #7799.
2022-01-03 14:19:53 -08:00
John MacFarlane
ca7a3ed5ed Issue error with --list-extensions for invalid formats.
Closes #7797.
2022-01-03 11:08:14 -08:00
John MacFarlane
cdfdfae4dd parseFormatSpec: cleaner error message for invalid extensions. 2022-01-03 10:32:57 -08:00
John MacFarlane
0abfe4fdab Minor code improvement. 2022-01-03 10:26:18 -08:00
John MacFarlane
75c5218d4f Don't read sources until in/out format are verified.
Partially addresses #7797.
2022-01-03 10:16:07 -08:00
Tuong Nguyen Manh
32297d5677 Odt: Add list-header
The list-header is a type of list-item.
Therefore, it will be treated exactly like one.
2022-01-02 15:05:09 -08:00
Albert Krewinkel
b7a44f9d19 Copyright notices: update for 2022 2022-01-02 11:59:22 -08:00
Albert Krewinkel
efdba79ad1 Lua writer: allow variables to be set via second return value of Doc
New template variables can be added by giving variable-value pairs as a
second return value of the global function `Doc`.

Example:

    function Doc (body, meta, vars)
      vars.date = vars.date or os.date '%B %e, %Y'
      return body, vars
    end

Closes: #6731
2022-01-02 11:55:02 -08:00
Albert Krewinkel
85334eb6c4
Lua writer: provide global PANDOC_WRITER_OPTIONS
Closes: #6731
2022-01-02 13:57:01 +01:00
John MacFarlane
6121de369c Use latest version of KaTeX. 2022-01-01 23:29:46 -08:00
Albert Krewinkel
1e60181ee3 Lua: provide global PANDOC_WRITER_OPTIONS [API change]
API changes:

- The function T.P.Filter.applyFilters now takes a filter
  environment of type `Environment`, instead of a ReaderOptions value.
  The `Environment` type is exported from `T.P.Filter` and makes it
  possible to combine ReaderOptions and WriterOptions in a single value.

- Global, exported from T.P.Lua, has a new type constructor
  `PANDOC_WRITER_OPTIONS`.

Closes: #5221
2022-01-01 14:31:42 -08:00
Albert Krewinkel
b5da58e8b4
Apply some HLint suggestions 2022-01-01 20:22:24 +01:00
Albert Krewinkel
eae9be3a48
Org reader: allow trailing spaces after key/value pairs in directives
Ensures that spaces at the end of attribute directives like
`#+ATTR_HTML: :width 100%` (note the trailing spaces) are accepted.
2022-01-01 13:44:14 +01:00
Albert Krewinkel
e58a5ceed8
Lua: marshal ReaderOptions field extensions, track_changes via JSON
Extensions are now available as a list of strings; the track-changes
settings are given as the kebab-case representation used in JSON.
2022-01-01 13:44:13 +01:00
Albert Krewinkel
03054a33e8 Lua: use global state when parsing documents in pandoc.read
The function `pandoc.read` is updated to use the same state that was
used while parsing the main input files. This ensures that log messages
are preserved and that images embedded in the input are added to the
mediabag.
2021-12-31 17:35:52 -08:00
Albert Krewinkel
d6e66b1f1d
Lua: cleanup stack in peekReadOptionsTable
A ReaderOptions element was left on top of the stack when the
`peekReadOptionsTable` function was invoked.
2021-12-31 11:02:16 +01:00
John MacFarlane
7ff1b798c4 Docx reader: handle multiple pic elements inside a drawing.
Closes #7786.
2021-12-30 21:26:30 -08:00
John MacFarlane
cc30d646ca Docx reader: change elemToParPart to return [ParPart]
...instead of ParPart.

Also remove NullParPart constructor, as it is no longer
needed.

This will allow us to handle elements that contain multiple
ParParts, e.g. w:drawing elements with multiple pic:pic.

See #7786.
2021-12-30 21:26:30 -08:00
John MacFarlane
4ff997bf68 Fix ghc 9.2.1 warnings. 2021-12-30 21:26:30 -08:00
Albert Krewinkel
2dd1cde715 Lua: allow binary (byte string) readers to be used with pandoc.read 2021-12-30 22:41:15 +01:00
John MacFarlane
d960282b10 Use splitDirectories instead of splitPath.
We were using `splitPath` in two places in the code
where `splitDirectories` should have been used.

This led to a test for `..` in paths in `extractMedia`
failing, so that images with `..` in the path name
could be extracted outside the directory specified
by `extractMedia`.

It also led a test for `media` in resource paths to fail
in the docx reader.
2021-12-28 16:31:54 -08:00
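Why the choice of splitting function matters in d960282b10 can be shown with a small Python imitation of the two Haskell functions. The helpers below are simplified stand-ins written for illustration (for instance, this `split_directories` drops a leading root component), not ports of System.FilePath.

```python
import re


def split_path(path: str):
    """Imitate splitPath: components keep their trailing separators."""
    return re.findall(r"[^/]*/+|[^/]+", path)


def split_directories(path: str):
    """Imitate splitDirectories: bare component names, separators removed."""
    return [part for part in path.split("/") if part]


def escapes_directory(path: str, splitter) -> bool:
    """Report whether ".." occurs as a component, given a splitting function."""
    return ".." in splitter(path)
```

With `split_path`, the component is `"../"` rather than `".."`, so a membership test for `".."` silently fails; with `split_directories` it succeeds.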
John MacFarlane
7d56650e01 OpenDocument writer: fix vertical align bug with display math.
Previously some displayed formulas would be floated above
a preceding text line.  This is fixed by setting vertical-rel
to 'text' rather than 'paragraph-content'.

Closes #7777.
2021-12-28 16:06:25 -08:00
Albert Krewinkel
fbd2c8e376
Lua: improve handling of empty caption, body by from_simple_table
Create truly empty table caption and body when these are empty in the
simple table.

Fixes: #7776
2021-12-25 21:20:18 +01:00
John MacFarlane
811601aa8b RTF writer: properly handle images in data URIs.
See #7771.
2021-12-22 11:59:07 -08:00
John MacFarlane
c4f6e6cb57 HTML writer: make line breaks more consistent.
- With `--wrap=none`, we now output line breaks between
  block-level elements. Previously they were omitted
  entirely, so the whole document was on one line, unless
  there were literal line breaks in pre sections.  This makes
  the HTML writer's behavior more consistent with that of
  other writers.

- Put newline after `<dd>`.

- Put newlines after block-level elements in footnote section.
2021-12-22 09:45:02 -08:00
John MacFarlane
7a9832166e Add text wrapping to HTML output.
Previously the HTML writer was exceptional in not being
sensitive to the `--wrap` option.  With this change `--wrap`
now works for HTML. The default (as with other formats) is
automatic wrapping to 72 columns.

A new internal module, T.P.Writers.Blaze, exports `layoutMarkup`.
This converts a blaze Html structure into a doclayout Doc Text.

In addition, we now add a line break between an `img` tag
and the associated `figcaption`.

Note: Output is never wrapped in `writeHtmlStringForEPUB`.
This accords with previous behavior since previously the HTML
writer was insensitive to `--wrap` settings.  There's no real
need to wrap HTML inside a zipped container.

Note that the contents of script, textarea, and pre tags are
always laid out with the `flush` combinator, so that unwanted
spaces won't be introduced if these occur in an indented context
in a template.

Closes #7764.
2021-12-22 09:45:02 -08:00
Albert Krewinkel
0bdf373157
Lua: simplify code of pandoc.utils.stringify
Minor behavior change: plain strings nested in tables are now included
in the result string.
2021-12-21 21:50:13 +01:00
Albert Krewinkel
17a32a99a5
Lua: simplify and deprecate function pandoc.utils.equals
The function is no longer required for element comparisons; it is now an
alias for the `==` operator.
2021-12-21 19:01:11 +01:00
Albert Krewinkel
d7cab51982 Lua: add new library function pandoc.utils.type.
The function behaves like the default `type` function from Lua's
standard library, but is aware of pandoc userdata types. A typical
use-case would be to determine the type of a metadata value.
2021-12-21 09:24:21 -08:00
Albert Krewinkel
c90802d7d8
Lua: fix return types of blocks_to_inlines, make_sections
Ensures the returned lists have the correct type (`Inlines` and
`Blocks`, respectively).
2021-12-21 09:53:44 +01:00
Albert Krewinkel
cd2bffee1e
Lua: use more natural representation for Reference values
Omit `false` boolean values, push integers as numbers.
2021-12-20 09:41:03 +01:00
Albert Krewinkel
993222d2c9
Custom writer: assign default Pandoc object to global PANDOC_DOCUMENT
The default Pandoc object is now non-strict, i.e., only the parts of the
document that are accessed will be marshaled to Lua. A special type is
no longer necessary.

This change also makes it possible to use the global variable with
library functions such as `pandoc.utils.references`, or to inspect the
document contents with `walk()`.
2021-12-19 23:17:27 +01:00
binaarinen
0610f16f7f
Add a writer for Markua 0.10 (#7729)
Markua is a markdown variant used by Leanpub.
More information about Markua can be found at https://leanpub.com/markua/read.

Adds a new exported function `writeMarkua` from T.P.Writers.Markdown.
[API change]

Closes #1871.

Co-authored by Tim Wisotzki and Samuel Lemmenmeier.
2021-12-19 12:10:41 -08:00
Albert Krewinkel
f8f03c2ffc JATS writer: keep quotes in element-citations
The JATS writer was losing quotes in element-citations, as it uses the
`T.P.Citeproc.getReferences` function to get references. That function
replaces `Quoted` elements with spans. That transformation is required
in `T.P.Citeproc.processCitations`, so it has been moved there.
2021-12-19 12:03:01 -08:00
Albert Krewinkel
dc3dcc2ccd
Lua: fixup, should have been part of previous commit 2021-12-19 14:31:52 +01:00
John MacFarlane
4b220d592c Citeproc: avoid adding comma before an author-in-text citation...
...in a note if it begins with a title (no author).

Closes #7761.
2021-12-18 12:13:06 -08:00
Albert Krewinkel
7a70b87fac Lua: add function pandoc.utils.references
List with all cited references of a document.

Closes: #7752
2021-12-17 14:45:27 -08:00
John MacFarlane
61ffa55835 T.P.Citeproc: do not export getStyle, getCiteprocLang.
This commit undoes the API changes noted in
ea77f2e6f6

They are no longer needed, and we should avoid unnecessary
API changes.
2021-12-15 16:05:15 -08:00
John MacFarlane
a527a2f345 Org writer: use the citation locator list from the org source code...
which is not localized, instead of getting locators from the
localized CSL stylesheet as we did before.
2021-12-14 20:30:55 -08:00
John MacFarlane
394fa9d072 Org reader: parse official org-cite citations.
We also support the older org-ref style as a fallback.
We no longer support the "markdown-style" citations.

See #7329.
2021-12-14 11:34:32 -08:00
John MacFarlane
be0e3f9794 Markdown writer: avoid extra space before citation suffix...
if it already starts with a space.
2021-12-14 11:34:32 -08:00
John MacFarlane
d393f2f158 Markdown writer: ensure semicolon btw locator and next citation...
when an author-in-text citation has a locator and following
citations.
2021-12-14 11:34:32 -08:00
John MacFarlane
5817e86491 Org reader: remove support for "Berkeley style" citations.
See #7329.
2021-12-14 09:20:26 -08:00
John MacFarlane
9f089aa286 Org writer: add tests for org-cite citations, and improve support. 2021-12-13 12:11:58 -08:00
John MacFarlane
2015c9070d Markdown reader: fix parsing of "bare locators"...
...after author-in-text citations.

Previously `@item [p. 12; @item2]` was incorrectly parsed as
three citations rather than two.  This is now fixed by ensuring
that `prefix` doesn't gobble any semicolons.
2021-12-13 12:11:58 -08:00
John MacFarlane
ea77f2e6f6 Citeproc changes:
T.P.Citeproc exports `getCiteprocLang` and `getStyle` [API change].

T.P.Citeproc.Locator now exports `toLocatorMap`, `LocatorInfo`,
and `LocatorMap`.  The type of `parseLocator` has changed, so
it now takes a `LocatorMap` rather than a `Locale` as parameter,
and returns a `LocatorInfo` instead of a tuple.
2021-12-13 12:11:58 -08:00
John MacFarlane
0679620f92 Org writer: preliminary support for new org-cite syntax.
See #7329.

This could use some tests.
2021-12-12 23:42:13 -08:00
Kolen Cheung
a9a9a2c62a fix(IpynbOutput)!: rank always favors output format
Previously, both the `fmt == f` case and Image had a rank of 1.
As a result, e.g. in an ipynb to html conversion,
if both html and image exist, the image was actually preferred.
This commit changes this, so that fmt == f is always highest rank,
and rank never collides.
This is achieved by keeping fmt == f case having rank 1,
and every other rank increased by 1.
2021-12-11 09:42:30 -08:00
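Note on a9a9a2c62a: a minimal sketch of the ranking idea described above, written in Python rather than pandoc's Haskell. The labels (`"image"`, `"plain"`) and the rank values 2 and 3 are assumptions for illustration; only "the requested format gets rank 1 and nothing else shares it" comes from the commit message.

```python
def rank(kind, target):
    """Return a rank for an output representation; lower is preferred."""
    if kind == target:      # the `fmt == f` case: always the best rank
        return 1
    if kind == "image":     # assumed rank; it no longer ties with rank 1
        return 2
    return 3                # assumed rank for everything else


def pick_output(kinds, target):
    """Choose the preferred representation among those a cell offers."""
    return min(kinds, key=lambda kind: rank(kind, target))
```

With this ordering, `pick_output(["image", "html"], "html")` selects the html output rather than the image.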
Albert Krewinkel
e88224621d Custom reader: ensure old Readers continue to work
Retry conversion by passing a string instead of sources when the
`Reader` fails with a message that hints at an outdated function. A
deprecation notice is reported in that case.
2021-12-11 08:59:11 -08:00
Albert Krewinkel
83b5b79c0e Custom reader: pass list of sources instead of concatenated text
The first argument passed to Lua `Reader` functions is no longer a plain
string but a richer data structure. The structure can easily be
converted to a string by applying `tostring`, but it is also a list whose
elements each hold the *text* and *name* of one input source in
properties of those names.

A small example is added to the custom reader documentation, showcasing
its use in a reader that creates a syntax-highlighted code block for
each source code file passed as input.

Existing readers must be updated.
2021-12-11 08:59:11 -08:00
Albert Krewinkel
3e7b46af64
Switch to released pandoc-lua-marshal-0.1.2
Cell values are now marshaled as userdata objects; a constructor
function for table cells is provided as `pandoc.Cell`.
2021-12-10 17:24:50 +01:00
Kolen Cheung
20eb8ac7fd
ipynb writer: handle cell output with raw block of markdown (#7563)
Write RawBlock of markdown in code-cell output.

#7561 makes the ipynb reader read code-cell output with mime
"text/markdown" as a RawBlock of markdown.

This commit makes the ipynb writer write this RawBlock of markdown
back into a code-cell output with the same mime, preserving this
information in a round trip.

Add tests of ipynb reader (#7561) and ipynb writer (#7563)'s ability to
handle a "text/markdown" mime type in a code-cell output
2021-12-09 20:36:56 -08:00
Albert Krewinkel
fa643ba6d7 Lua: update to latest pandoc-lua-marshal (0.1.1)
- `walk` methods are added to `Block` and `Inline` values; the methods
  are similar to `pandoc.utils.walk_block` and
  `pandoc.utils.walk_inline`, but also apply the filter to the element
  itself, and therefore return a list of elements instead of a single
  element.

- Functions of name `Doc` are no longer accepted as alternatives for
  `Pandoc` filter functions. This functionality was undocumented.
2021-12-09 09:22:29 -08:00
John MacFarlane
9cbea695c4 Ipynb writer: ensure deterministic order of keys. 2021-12-08 23:21:39 -08:00
John MacFarlane
45e51ecd65 Revert "Markdown reader: Improve inlinesInBalancedBrackets."
This reverts commit fa83246d7d.
2021-12-07 23:52:29 -08:00
John MacFarlane
51142c6803 Ipynb reader & writer: properly handle cell "id".
This is passed through if it exists (in Nb4); otherwise
the writer will add a random one so that cells all have
an "id".

Closes #7728.
2021-12-06 23:40:51 -08:00
John MacFarlane
23b2617bf7 Ms writer: properly encode strings for PDF contents.
Closes #7731.
2021-12-06 12:00:08 -08:00
John MacFarlane
36807db531 Commonmark writer: allow ')' delimiters on ordered lists. 2021-12-05 11:26:01 -08:00
John MacFarlane
51f6f0e3a1 Improve Markdown writer escaping.
This fixes escaping for '#' in particular.
Closes #7726.
2021-12-03 17:52:47 -08:00
John MacFarlane
619dfa2a2a Markdown reader: don't allow ^ at beginning of link or image label.
This is reserved for footnotes.
Fixes a regression introduced by 0a93acf.

Closes #7723.
2021-11-30 12:53:54 -08:00
Albert Krewinkel
fa838deefc
Lua: remove pandoc.utils.text (#7720)
The new `pandoc.Inlines` function behaves identically on string input, but
allows other Inlines-like arguments as well.

The `pandoc.utils.text` function could be written as

    function pandoc.utils.text (x)
      assert(type(x) == 'string')
      return pandoc.Inlines(x)
    end
2021-11-29 09:12:30 -08:00
Albert Krewinkel
b9222e5cb1
Lua: add constructors pandoc.Blocks and pandoc.Inlines
The functions convert their argument into a list of Block and Inline
values, respectively.
2021-11-28 16:02:42 +01:00
Albert Krewinkel
3692a1d1e8
Lua: use package pandoc-lua-marshal (#7719)
The marshaling functions for pandoc's AST are extracted into a separate
package. The package comes with a number of changes:

  - Pandoc's List module was rewritten in C, thereby improving error
    messages.

  - Lists of `Block` and `Inline` elements are marshaled using the new
    list types `Blocks` and `Inlines`, respectively. These types
    currently behave identically to the generic List type, but give better
    error messages. This also opens up the possibility of adding
    element-specific methods to these lists in the future.

  - Elements of type `MetaValue` are no longer pushed as values which
    have `.t` and `.tag` properties. This was already true for
    `MetaString` and `MetaBool` values, which are still marshaled as Lua
    strings and booleans, respectively. Affected values:

      + `MetaBlocks` values are marshaled as a `Blocks` list;

      + `MetaInlines` values are marshaled as an `Inlines` list;

      + `MetaList` values are marshaled as generic pandoc `List`s;

      + `MetaMap` values are marshaled as plain tables and no longer
        given any metatable.

  - The test suite for marshaled objects and their constructors has
    been extended and improved.

  - A bug in Citation objects, where setting a citation's suffix
    modified its prefix, has been fixed.
2021-11-27 17:08:01 -08:00
John MacFarlane
0d25232bbf LaTeX reader: Fix semantics of \ref.
We were including the ams environment type in addition
to the number. This is proper behavior for `\cref` but
not for `\ref`.  To support `\cref` we need to store
the environment label separately.
2021-11-24 19:48:56 -08:00
John MacFarlane
2ca3993c67 LaTeX reader: improve references.
- Resolve references to theorem environments.
- Remove Span caused by "label" in figure, table, and theorem
  environments; this had an id that duplicated the environments' id.

See #813.
2021-11-24 18:41:20 -08:00
John MacFarlane
7726b69cd3 LaTeX reader: omit visible content for \label{...}.
Previously we included the text of the label in square brackets,
but this is undesirable in many cases.

See discussion in
<https://github.com/jgm/pandoc/issues/813#issuecomment-978232426>.
2021-11-24 14:47:00 -08:00
John MacFarlane
6072bdcec9 HTML reader: parse attributes on links and images.
Closes #6970.
2021-11-24 11:01:55 -08:00
Albert Krewinkel
a8638894ab
Lua: allow single elements as singleton MetaBlocks/MetaInlines
Single elements should always be treated as singleton lists in the Lua
subsystem.
2021-11-24 16:54:12 +01:00
John MacFarlane
79e6f8db13 Improve detection of pipe table line widths.
Fixed calculation of maximum column widths in pipe tables.
It is now based on the length of the markdown line, rather
than a "stringified" version of the parsed line.  This should
be more predictable for users. In addition, we take into account
double-wide characters such as emojis.

Closes #7713.
2021-11-23 13:29:25 -08:00
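Note on 79e6f8db13: a rough sketch of measuring a raw pipe-table line while counting double-wide characters as two columns. This is Python, not pandoc's implementation; `display_width` and `max_line_width` are hypothetical names, and the East Asian Width property is used here as an approximation of "double-wide".

```python
import unicodedata


def display_width(line):
    """Approximate terminal width: wide and fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in line
    )


def max_line_width(table_lines):
    """Width of the longest raw markdown line of a pipe table."""
    return max(display_width(line) for line in table_lines)
```

The measurement is taken on the source line itself, so markup characters such as `|` and backticks count toward the width, which is the point of the change described above.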
Albert Krewinkel
bffd74323c
Lua: add function pandoc.utils.text (#7710)
The function converts a string to `Inlines`, treating interword spaces
as `Space`s or `SoftBreak`s. If you want a `Str` with literal spaces,
use `pandoc.Str`.

Closes: #7709
2021-11-23 09:32:53 -08:00
Albert Krewinkel
0c0945b93c
Lua: split strings into words when treating them as Inline list (#7712)
Using a Lua string where a list of inlines is expected will cause the
string to be split into words, replacing spaces and tabs with
`pandoc.Space()` elements and newlines into `pandoc.SoftBreak()`.

The previous behavior was to treat the string `s` as `{pandoc.Str(s)}`.
The old behavior can be recovered by wrapping the string into a table
`{s}`.
2021-11-23 09:30:48 -08:00
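Note on 0c0945b93c: the splitting rule can be modelled outside Lua as follows. This is a Python sketch, not the marshaling code; inlines are represented as plain tuples, and the treatment of mixed whitespace runs (any run containing a newline becomes one `SoftBreak`) is an assumption.

```python
import re


def to_inlines(s):
    """Split a string into Str / Space / SoftBreak tokens."""
    inlines = []
    for token in re.findall(r"[ \t\n]+|[^ \t\n]+", s):
        if token[0] not in " \t\n":
            inlines.append(("Str", token))      # a word
        elif "\n" in token:
            inlines.append(("SoftBreak",))      # run containing a newline
        else:
            inlines.append(("Space",))          # run of spaces and tabs
    return inlines
```

Under the old behavior the whole string would have become a single `Str`, which in this model corresponds to `[("Str", s)]`.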
Jörn Krenzer
17495bf8eb
Add .yml to Citeproc formatFromExtension (#7706)
Make Citeproc recognize files with .yml extension (in addition to .yaml)
as YAML bibliographies.

Closes #7707.
2021-11-22 08:42:09 -08:00
John MacFarlane
3f595659a3 yamlBsToRefs: allow multiple YAML documents.
Some people use `---` as the end delimiter in YAML
bibliography files, which causes the `yaml` library
to emit an error unless we explicitly allow multiple
YAML documents (and just consider the first).

In T.P.Readers.Metadata
2021-11-21 10:22:10 -08:00
Albert Krewinkel
96a4bbe264
Capture alt-text in JATS figures (#7703)
Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-11-20 09:48:01 -08:00
Albert Krewinkel
c1a82896c6
Lua: fix global module loading (#7701) 2021-11-19 20:59:23 +01:00
John MacFarlane
2b23861948 Remove unused line. 2021-11-19 10:21:23 -08:00
John MacFarlane
4f2eac88aa MediaWiki writer: fix code for generating spans for header IDs.
We need to generate a span when the header's ID doesn't match
the one MediaWiki would generate automatically.  But MediaWiki's
generation scheme is different from ours (it uses uppercase letters,
and `_` instead of `-`, for example).

This means that in going from markdown -> mediawiki, we'll now get
spans before almost every heading, unless explicit identifiers are
used that correspond to the ones MediaWiki auto-generates.
This is uglier output but it's necessary for internal links to
work properly.

See #7697.
2021-11-19 09:05:19 -08:00
John MacFarlane
df5ae1c186 HTML writer: Don't create invalid data- attribute...
for empty attribute key. (It would be better to make these
unrepresentable in the type system, but for now this is
an improvement.)

Closes #7546.
2021-11-19 08:50:18 -08:00
John MacFarlane
25bba0cc62 MediaWiki writer: use HTML spans for anchors when header has id.
Closes #7697.
2021-11-18 21:15:51 -08:00
willj-dev
005dc7ce56
RST reader: handle class attribute for custom roles (#7700)
Previously the class attribute was ignored, and the name of the role used as the class.
Closes #7699.
2021-11-18 17:33:57 -08:00
John MacFarlane
1312594526 Babel mappings: use ancientgreek for grc. 2021-11-17 16:32:36 -08:00
Albert Krewinkel
cd91f72843
Lua: set lpeg, re as globals; allow shared lib access via require
The `lpeg` and `re` modules are loaded into globals of the respective
name, but they are not necessarily registered as loaded packages. This
ensures that

- the built-in library versions are preferred when setting the globals,
- a shared library is used if pandoc has been compiled without `lpeg`,
  and
- the `require` mechanism can be used to load the shared library if
  available, falling back to the internal version if possible and
  necessary.
2021-11-17 10:03:04 +01:00
Albert Krewinkel
305a4f406d
Lua: make loading of global LPeg modules more robust
Ignore errors if the normal package mechanism failed; this not only
covers the case of modules being unavailable on the system, but also
works if the modules are present, but fail to load for some reason.

This makes the built-in package version a true fallback.
2021-11-16 12:06:22 +01:00
John MacFarlane
c19f063420 Markdown writer: don't create autolinks when this loses information.
Previously we sometimes lost attributes when rendering links as autolinks.

Closes #7692.
2021-11-15 11:06:50 -08:00
Albert Krewinkel
ea268fd8a7
LaTeX reader: add rudimentary support for \autoref (#7693) 2021-11-15 09:40:50 -08:00
Albert Krewinkel
96a01451ef
JATS writer: ensure figures are wrapped with <p> in list items.
This prevents the generation of invalid output.
2021-11-12 13:29:08 +01:00
Albert Krewinkel
da96e1ff40
JATS writer: add URL to element citation entries
The URL of a reference, if present, is added in tag `<uri>` to
element-citation entries.
2021-11-12 11:56:58 +01:00
Christian Despres
abdfefebdf
Writers.Shared: Improve toLegacyTable.
Closes #7683.
(PR #7684)
2021-11-11 20:55:37 -08:00
Albert Krewinkel
ebf7f782d3 Lua: load re module available into global of the same name 2021-11-11 10:32:37 -08:00
John MacFarlane
6fb2973a58 Fix parsing of % in bibtex fields.
Closes #7678 (a bug introduced by 0a45f26).
2021-11-10 08:52:04 -08:00
John MacFarlane
03f9a0c61e Require ghc >= 8.6, base >= 4.12.
This allows us to get rid of the old custom prelude and
some crufty cpp.  But the primary reason for this is that
conduit has bumped its base lower bound to 4.12, making it
impossible for us to support lower base versions.
2021-11-09 23:43:12 -08:00
John MacFarlane
5fb3b82bdf Accept empty --metadata-file.
Closes #7675. This is a regression from 2.15 behavior.
2021-11-09 12:59:26 -08:00
Albert Krewinkel
d4c73d5e65
Lua: fix argument order in constructor pandoc.Cite.
This restores the old behavior; argument order had been switched
accidentally in pandoc 2.15.
2021-11-09 14:45:36 +01:00
John MacFarlane
9c153e3d6e With -t latex-smart, don't generate \ldots from ellipsis.
Instead just use unicode ellipsis.
Closes #7674.
2021-11-08 11:59:13 -08:00
John MacFarlane
0a45f2600a Properly handle commented lines in BibTeX/BibLaTeX.
Closes #7668.
2021-11-08 10:15:53 -08:00
Rowan Rodrik van der Molen
40aa74badc Add <titleabbr> support to DocBook reader 2021-11-08 07:30:20 -08:00
Albert Krewinkel
ab0fe676a8
Lua: ensure that 're' module is always available.
The module is shipped with LPeg.
2021-11-08 12:22:33 +01:00
John MacFarlane
cc46667953 LaTeX reader: add 'uri' class when parsing \url.
Closes #7672.
2021-11-07 21:08:20 -08:00
John MacFarlane
213913f025 Pass ReaderOptions to custom readers as second parameter. 2021-11-06 16:47:13 -07:00
Albert Krewinkel
4a3b3b1ac6
Lua: add Pushable instance for ReaderOptions 2021-11-06 17:58:44 +01:00
Albert Krewinkel
6b462e5933 Lua: allow to pass custom reader options to pandoc.read
Reader options can now be passed as an optional third argument to
`pandoc.read`. The object can either be a table or a ReaderOptions value
like `PANDOC_READER_OPTIONS`. Creating new ReaderOptions objects is
possible through the new constructor `pandoc.ReaderOptions`.

Closes: #7656
2021-11-06 09:04:29 -07:00
John MacFarlane
ee2f0021f9
Add interface for custom readers written in Lua. (#7671)
New module Text.Pandoc.Readers.Custom, exporting
readCustom [API change].

Users can now do `-f myreader.lua` and pandoc will treat the
script myreader.lua as a custom reader, which parses an input
string to a pandoc AST, using the pandoc module defined for
Lua filters.

A sample custom reader can be found in data/reader.lua.

Closes #7669.
2021-11-05 22:10:29 -07:00
Rowan Rodrik van der Molen
7a70a46c03
Support for <indexterm>s when reading DocBook (#7607)
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests

Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-05 10:22:38 -07:00
Albert Krewinkel
f32fe9cbd5
T.P.Error: sort errors in handleError by exit code 2021-11-05 13:21:21 +01:00
Albert Krewinkel
ebdb39b3b4
Lua: display Pandoc values using their native Haskell representation 2021-11-05 13:11:02 +01:00
Albert Krewinkel
d089d799e7
Lua: always load lpeg as global module 2021-11-05 09:21:50 +01:00
Albert Krewinkel
a1b6bf69f2
Lua: include lpeg module (#7649)
Compiles the 'lpeg' library (Parsing Expression Grammars For Lua) into
the program.

Package maintainers may choose to rely on package dependencies to make
lpeg available, in which case they can compile with the constraint
`lpeg +rely-on-shared-lpeg-library`.
2021-11-04 19:25:29 -07:00
John MacFarlane
0b254ea4af Allow plain to be used in raw attribute syntax. 2021-11-04 10:11:57 -07:00
Albert Krewinkel
c256ef34b3
Lua: add missing space in "package not found" message
Closes: #7658
2021-11-03 13:36:15 +01:00
John MacFarlane
fa83246d7d Markdown reader: Improve inlinesInBalancedBrackets.
This is just a small improvement in terms of performance,
but it's simpler and more direct code.

Also, we avoid parsing interparagraph spaces in balanced
brackets, as the original did.
2021-11-02 23:25:27 -07:00
John MacFarlane
938d557844 Docx reader: don't let first line indents trigger block quotes.
This fixes a regression introduced in pandoc 2.15 by PR #7606.
Closes #7655.
2021-11-02 14:04:38 -07:00
Albert Krewinkel
45bcd7d3f1
Lua: fix typo in SoftBreak constructor 2021-11-02 21:53:08 +01:00
Albert Krewinkel
421fd736d4
Lua: re-add content property to Strikeout elements
Fixes a regression introduced in 2.15.
2021-11-02 21:22:59 +01:00
Albert Krewinkel
cce49c5d4b
Lua: be more forgiving when retrieving the Image caption property
Fixes a regression introduced in 2.15.
2021-11-02 17:40:07 +01:00
John MacFarlane
70eeeca9c7 Docx writer: use getTimestamp for modification times in reference.docx.
This ensures that when `SOURCE_DATE_EPOCH` is set, the
modification times of files taken from the reference.docx will
be set deterministically, allowing for reproducible builds.

Closes #7654.
2021-11-02 09:29:34 -07:00
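Note on 70eeeca9c7: a sketch of the general `SOURCE_DATE_EPOCH` convention, in Python. The function name mirrors `getTimestamp` from the commit title, but the body is an assumption about how such a helper typically behaves, not pandoc's code; falling back to the current time on a malformed value is likewise an assumption.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=os.environ):
    """Return a fixed UTC time if SOURCE_DATE_EPOCH is set, else the current time."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # The variable holds seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw), tz=timezone.utc)
        except ValueError:
            pass  # malformed value: fall through to the current time
    return datetime.now(timezone.utc)
```

Using one such helper for every file written into the archive is what makes two builds with the same `SOURCE_DATE_EPOCH` byte-identical with respect to modification times.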
Albert Krewinkel
b26f950cca
Lua: display Attr values using their native Haskell representation 2021-11-02 17:25:47 +01:00
Albert Krewinkel
c467f0fed1
Lua: allow omitting the 2nd parameter in pandoc.Code constructor
Fixes a regression introduced in 2.15 which required users to always
specify an Attr value when constructing a Code element.
2021-11-02 17:23:24 +01:00
Albert Krewinkel
210e4c98b0
Lua: allow to compare, show Citation values
Comparisons of Citation values are performed in Haskell; values are
equal if they represent the same Haskell value. Converting a Citation
value to a string now yields its native Haskell string representation.
2021-11-02 16:49:50 +01:00
Albert Krewinkel
759aa50951
Lua: restore content property on Header elements 2021-11-01 15:43:51 +01:00
Albert Krewinkel
3de8f4fdc5
Lua: re-add content property to Link elements
This was a regression introduced in version 2.15.

Fixes: #7647
2021-10-31 11:15:50 +01:00
Joseph C. Sible
b3d09ebe01 Fix build on GHC 9.2 2021-10-30 11:39:20 -07:00
Tristan Stenner
6509ff6204 Docx writer: move ": " out of the caption bookmark.
This is needed so that native references to the figure are included as
"As seen in Figure X, it is..." instead of
"As seen in [Figure: , it is..."
2021-10-29 08:40:20 -07:00
Albert Krewinkel
f4d9b443d8
Lua: use hslua module abstraction where possible
This will make it easier to generate module documentation in the future.
2021-10-29 17:08:30 +02:00
Albert Krewinkel
af97598954
Lua: increase strictness when getting attribute keys 2021-10-28 10:32:00 +02:00
Albert Krewinkel
7fcf1d6184
Lua: re-add t and tag property to Attr values
Removal of these properties from Attr values was a regression.
2021-10-27 22:32:19 +02:00
John MacFarlane
25a86fc06f Markdown writer: Be sure to quote special values in YAML metadata.
E.g. "Y", "yes", which are now (with yaml library) considered
boolean values, as well as "null".

This fixes a bug with roundtripping markdown -> markdown:

```
---
foo: "true"
...
```
2021-10-27 12:50:51 -07:00
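Note on 25a86fc06f: a minimal sketch of the quoting rule, in Python. The word list combines the values named in this commit and in d226a35c0a ("Y", "N", "Yes", "No", "On", "Off", "null") with "true" from the example and its counterpart "false"; the exact list the writer checks is an assumption, as are the helper names.

```python
# Plain scalars that a YAML 1.1 style parser reads as booleans or null.
SPECIAL_WORDS = {"y", "n", "yes", "no", "on", "off", "true", "false", "null"}


def needs_quotes(value):
    """True if an unquoted scalar would not round-trip as a string."""
    return value.lower() in SPECIAL_WORDS


def yaml_scalar(value):
    """Render a string metadata value, quoting it only when required."""
    return '"' + value + '"' if needs_quotes(value) else value
```

For example, `yaml_scalar("true")` yields `"true"` with the quotes kept, so reading the output back gives the string rather than a boolean.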
John MacFarlane
26a8de684e Change JSON encodings of some types.
- For LineEnding use lowercase constructors, e.g. `crlf`, `native`.
  This was the original intent, but there was a bug in the
  implementation.
- For HTMLSlideVariant use lowercase constructors.
- For ReaderOptions use e.g. `default-image-extension`
  instead of `readerDefaultImageExtension` for field names.
- For Extension, use e.g. `tex_math_dollars` instead of
  `Ext_tex_math_dollars` as constructor.
- For Extensions, use an array of Extensions, instead of
  an object wrapping the tag `Extensions` and an integer.
  (The representation is not supposed to be part of the
  public API.)
- For Opt, use field names like `tab-stop` instead of `optTabStop`.
2021-10-27 12:50:51 -07:00
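Note on 26a8de684e: the field renaming follows a "drop the record prefix, then hyphenate" pattern. The Python sketch below reproduces the two examples from the message; `field_name` is a hypothetical helper, and names containing acronyms would need extra handling that is not shown.

```python
import re


def field_name(haskell_name, prefix):
    """Turn e.g. optTabStop (prefix "opt") into tab-stop."""
    if haskell_name.startswith(prefix):
        stem = haskell_name[len(prefix):]
    else:
        stem = haskell_name
    # Insert a hyphen before every capital letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", stem).lower()
```

Here `field_name("optTabStop", "opt")` gives `tab-stop` and `field_name("readerDefaultImageExtension", "reader")` gives `default-image-extension`.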
John MacFarlane
d226a35c0a Switch back from HsYAML to yaml.
Reasons:

- Performance: HsYAML is around 20 times slower in parsing
  large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
  any attention.  HsYAML seems borderline unmaintained; it hasn't
  had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
  from C dependencies (#4535).  But I don't see a better alternative
  until a better pure Haskell parser is available.

Closes #6084.

Notes:

- We've removed the FromYAML instances for all types that had
  them, since this is a HsYAML-specific typeclass [API change].
  (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
  parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
  Users may need to quote these when they are meant to be
  interpreted as strings.  Similarly, 'null' is parsed as
  a YAML null value (and will be treated as an empty string
  by pandoc rather than the string 'null').  Quoting it will
  force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
  escaping errors: instead of just falling back on treating
  the section as a table, it raises a YAML parsing error.
2021-10-27 12:50:51 -07:00
Albert Krewinkel
b990ca3c4c
Lua: fix pandoc.utils.stringify regression
The `pandoc.utils.stringify` function returned empty strings when called
with a string argument.
2021-10-27 20:56:30 +02:00
John MacFarlane
2910f9f3f8 Fix a copy/paste bug in Lua marshalling code.
This caused changes to link properties in Lua filters to
turn the links into images!

Closes #7639.
2021-10-26 21:48:56 -07:00
Albert Krewinkel
b95e864ecf
Lua: marshal SimpleTable values as userdata objects 2021-10-26 21:45:16 +02:00
Albert Krewinkel
80ed81822e
Lua: generate constants in module pandoc programmatically 2021-10-26 14:40:11 +02:00
Albert Krewinkel
f56d870631
Lua: marshal ListAttributes values as userdata objects 2021-10-26 14:40:11 +02:00
Albert Krewinkel
a493c7029c
Lua: marshal Block values as userdata objects
Properties of Block values are marshalled lazily, which generally
improves performance considerably. Script users may also notice the
following differences:

  - Block element properties can no longer be accessed by numerical
    indexing of the `.c` field. The `.c` property now serves as an alias
    for `.content`, so some filter that used this undocumented method
    for property access may continue to work, while others will need to
    be updated and use proper property names.

  - The marshalled Block elements now have a `show` method, and a
    `__tostring` metamethod. Both return the Haskell string
    representation of the element.

  - Block values now have the Lua type `userdata` instead of `table`.
2021-10-26 14:40:10 +02:00
Albert Krewinkel
230b133db5
Lua: marshal Citation values as userdata objects 2021-10-25 09:08:58 +02:00
Albert Krewinkel
2d3813e0dd Lua: convert IOErrors to PandocErrors in pandoc.pipe function
Fixes: #7523
2021-10-23 11:12:39 -07:00
John MacFarlane
c712d13b67 Org reader: allow an initial :PROPERTIES: drawer to add to metadata.
Closes #7520.
2021-10-22 22:10:25 -07:00
Aner Lucero
921af30854 Use simpleFigure in Readers. 2021-10-22 15:14:23 -07:00
Albert Krewinkel
c07005a095 Lua: marshal Version values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
6a03aca906 Lua: marshal Inline elements as userdata
This includes the following user-facing changes:

- Deprecated inline constructors are removed. These are `DoubleQuoted`,
  `SingleQuoted`, `DisplayMath`, and `InlineMath`.

- Attr values are no longer normalized when assigned to an Inline
  element property.

- It's no longer possible to access parts of Inline elements via
  numerical indexes. E.g., `pandoc.Span('test')[2]` used to give
  `pandoc.Str 'test'`, but yields `nil` now. This was undocumented
  behavior not intended to be used in user scripts. Use named properties
  instead.

- Accessing `.c` to get a JSON-like tuple of all components no longer
  works. This was undocumented behavior.

- Only known properties can be set on an element value. Trying to set a
  different property will now raise an error.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
8523bb01b2 Lua: marshal Attr values as userdata
- Adds a new `pandoc.AttributeList()` constructor, which creates the
  associative attribute list that is used as the third component of
  `Attr` values. Values of this type can often be passed to constructors
  instead of `Attr` values.

- `AttributeList` values can no longer be indexed numerically.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
e4287e6c95 Lua: marshal Pandoc values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
9e74826ba9 Switch to hslua-2.0
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.

Furthermore, new abstractions make it possible to document the code and
exposed functions.
2021-10-22 11:16:51 -07:00
John MacFarlane
ee34252219 Move splitStrWhen to T.P.Citeproc.Util.
Previously there were two copies, in BibTeX and Locator.
2021-10-21 22:27:07 -07:00
John MacFarlane
ef7b769fa0 SelfContained: fix bug that caused everything to be made a data uri.
All the code we needed to put most styles and scripts into
inline style and script tags was there, but because of the
order of pattern matching, it was never being called.
Putting the catch-all clause at the end fixes the bug.

Closes #7635, closes #7367.  See also #3423.
2021-10-21 21:51:53 -07:00
John MacFarlane
0a93acf91a Markdown reader: don't parse links or bracketed spans as citations.
Previously pandoc would parse

    [link to (@a)](url)

as a citation; similarly

    [(@a)]{#ident}

This is undesirable.  One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.

Closes #7632.
2021-10-20 10:34:47 -07:00
John MacFarlane
7754b7f2dd FormatHeuristics: remove .tei.xml extension for TEI.
As noted in #7630, this never worked, because `takeExtension`
only returns `.xml`.  So it won't be missed if we remove it.

Closes #7630.
2021-10-19 08:05:30 -07:00
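Note on 7754b7f2dd: the same pitfall can be shown with Python's `os.path.splitext`, which, like `takeExtension`, returns only the last extension. The lookup table below is illustrative, not pandoc's.

```python
import os.path

# Hypothetical extension table; the ".tei.xml" key can never be hit.
FORMATS = {".md": "markdown", ".tei.xml": "tei"}


def format_from_path(path):
    """Look up a format by the final extension of the file name."""
    extension = os.path.splitext(path)[1]  # "paper.tei.xml" -> ".xml"
    return FORMATS.get(extension)
```

Since the lookup key for `paper.tei.xml` is `.xml`, the `.tei.xml` entry is dead code, which is why removing it changes nothing.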
Milan Bracke
465c28d28e Docx reader: fix handling of empty fields
Some fields only have an instrText and no content. Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
2021-10-18 19:15:40 -07:00
Milan Bracke
6acc82c5d2 Docx parser: implement PAGEREF fields
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 19:15:40 -07:00
Milan Bracke
193f6bfeba Docx reader: fix handling of nested fields
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.

To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
2021-10-18 19:15:40 -07:00
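Note on 193f6bfeba: nesting of this kind is usually handled with a stack, so that each end marker closes the innermost open field rather than the outermost one. The Python sketch below is a simplified model, not the docx parser: events are plain strings, `"begin"` and `"end"` stand in for fldChar elements, and the separate marker between instruction and result is left out.

```python
def parse_fields(events):
    """Build a nested list structure from a flat stream of field events."""
    root = []
    stack = [root]                     # innermost open field is stack[-1]
    for event in events:
        if event == "begin":
            field = []
            stack[-1].append(field)    # the new field is a child of the open one
            stack.append(field)
        elif event == "end":
            if len(stack) == 1:
                raise ValueError("end marker without an open field")
            stack.pop()                # closes only the innermost field
        else:
            stack[-1].append(event)    # ordinary content
    if len(stack) != 1:
        raise ValueError("unclosed field")
    return root
```

In this model a field's contents can themselves be fields, which parallels the move from Runs to ParParts described above.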
Emily Bourke
8de261ba4e pptx: Line up continuation paragraphs
This commit changes the `marL` and `indent` values used for plain
paragraphs and numbered lists, and changes the spacing defined in the
reference doc master for bulleted lists.

For paragraphs, there is now a left-indent taken from the `otherStyle`
in the master. For numbered lists, the number is positioned where the
text would be if this were a plain paragraph, and the text is indented
to the next level. This means that continuation paragraphs line up
nicely with numbered lists.

It also /mostly/ matches the observed PowerPoint behaviour when
inserting paragraphs and numbered lists: the only difference is that
PowerPoint was using a different margin value for the first level
numbered lists – I’ve changed this to match the other levels, as I don’t
think it makes the spacing unappealing and it allows continuation
paragraphs at any level to line up.

With bulleted lists, I’m keeping the observed PowerPoint behaviour of
specifying only a level, letting `marL` and `indent` be automatically
taken from `bodyStyle`. To that end, this commit changes the `bodyStyle`
spacing in the master of the default reference doc, to:

- line up the text of the first paragraph in each bullet with any
  continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the
  first paragraph, matching lists and plain paragraphs

This does mean the continuation paragraphs still won’t line up for
anyone using their own reference doc where they haven’t matched the
`otherStyle` and `bodyStyle` indent levels, but I think people in that
situation will be able to troubleshoot.
2021-10-17 17:24:30 -07:00
Emily Bourke
8981872bca pptx: Remove outdated comment
I recently removed the field this comment refers to, but missed the
comment.
2021-10-17 17:24:30 -07:00
Emily Bourke
8af15ab345 pptx: Fix list level numbering
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.

At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.

This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.

- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
2021-10-17 17:24:30 -07:00
Samuel Tardieu
a41c1fe0bb asciidoc writer: translate numberLines attribute to linesnum switch
AsciiDoctor allows requesting line numbering on code blocks by
using a switch on the `source` block, as in:

```
[source%linesnum,haskell]
----
some Haskell code here
----
```
2021-10-14 13:41:12 -07:00
Samuel Tardieu
628cde48cf DocBook reader: honor linenumbering attribute
The DocBook linenumbering="numbered" attribute on code blocks
maps to "numberLines" internally.
2021-10-14 09:04:56 -07:00
Samuel Tardieu
ed8877bd68 Remove redundant $
Found by hlint 3.3.1
2021-10-14 08:29:58 -07:00
John MacFarlane
49c4e1d014 Fix markdown parsing bug for math in bracketed spans and links.
This affects math with unbalanced brackets (e.g. `$(0,1]$`)
inside links, images, bracketed spans.

Closes #7623.
2021-10-13 08:59:37 -07:00
John MacFarlane
c636b5dd16 Revert "Depend on pandoc-types 1.23, remove Null constructor on Block."
This reverts commit fb0d6c7cb6.
2021-10-12 21:00:15 -07:00
John MacFarlane
906e6016bb T.P.Writers.Shared: remove 'breakable'...
which was introduced in the cherry-pick'd commit that
added splitSentences, but isn't needed here.
(It is for the nospace branch.)
2021-10-11 15:08:33 -07:00
John MacFarlane
5d17020a20 T.P.Writers.Shared: Export splitSentences as a Doc Text transform.
[API change]

Use this in man/ms.
2021-10-11 09:45:22 -07:00
John MacFarlane
2befeaa29f Remove splitSentences from T.P.Shared [API change].
We used to attempt automatic sentence splitting in man and ms
output, since sentence-ending periods need to be followed by
two spaces or a newline in these formats.

But it's difficult to do this reliably at the level of
`[Inline]`.
2021-10-11 09:35:50 -07:00
John MacFarlane
63ea754b49 Fix warning 2021-10-11 09:24:31 -07:00
John MacFarlane
84d68b92a2 LaTeX reader: Implement siunitx v3 commands.
We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist`
as synonyms of `\si`, `\SI`, `\SIrange`, and `\SIlist`.

Closes #7614.
2021-10-11 08:54:45 -07:00
Milan Bracke
0f98cbff4b Avoid blockquote when parent style has more indent
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
2021-10-10 16:27:32 -07:00
John MacFarlane
c72277e986 LaTeX reader: Properly handle \^ followed by group closing.
Closes #7615.
2021-10-10 11:24:28 -07:00
John MacFarlane
d80aaee42b Translations: don't depend on the fact that Aeson Object is...
implemented internally as a HashMap.  This is no longer
public as of aeson 2.0.0.0.
2021-10-10 09:36:33 -07:00
John MacFarlane
5a1bd52677 Don't prepend file:// to --syntax-definition on Windows.
This was a fix for a problem in skylighting, but this
problem doesn't exist now that we've moved from HXT to
xml-conduit.

Cf. #6374.
2021-10-06 12:33:22 -07:00