Commit graph

7597 commits

Author SHA1 Message Date
Albert Krewinkel
af97598954
Lua: increase strictness when getting attribute keys 2021-10-28 10:32:00 +02:00
Albert Krewinkel
7fcf1d6184
Lua: re-add t and tag property to Attr values
Removal of these properties from Attr values was a regression.
2021-10-27 22:32:19 +02:00
John MacFarlane
25a86fc06f Markdown writer: Be sure to quote special values in YAML metadata.
E.g. "Y", "yes", which are now (with yaml library) considered
boolean values, as well as "null".

This fixes a bug with roundtripping markdown -> markdown:

```
---
foo: "true"
...
```
2021-10-27 12:50:51 -07:00
John MacFarlane
26a8de684e Change JSON encodings of some types.
- For LineEnding use lowercase constructors, e.g. `crlf`, `native`.
  This was the original intent, but there was a bug in the
  implementation.
- For HTMLSlideVariant use lowercase constructors.
- For ReaderOptions use e.g. `default-image-extension`
  instead of `readerDefaultImageExtension` for field names.
- For Extension, use e.g. `tex_math_dollars` instead of
  `Ext_tex_math_dollars` as constructor.
- For Extensions, use an array of Extensions, instead of
  an object wrapping the tag `Extensions` and an integer.
  (The representation is not supposed to be part of the
  public API.)
- For Opt, use field names like `tab-stop` instead of `optTabStop`.
2021-10-27 12:50:51 -07:00
John MacFarlane
d226a35c0a Switch back from HsYAML to yaml.
Reasons:

- Performance: HsYAML is around 20 times slower in parsing
  large YAML bibliographies ().
- An issue was submitted to HsYAML, but it hasn't gotten
  any attention.  HsYAML seems borderline unmaintained; it hasn't
  had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
  from C dependencies ().  But I don't see a better alternative
  until a better pure Haskell parser is available.

Closes .

Notes:

- We've removed the FromYAML instances for all types that had
  them, since this is a HsYAML-specific typeclass [API change].
  (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
  parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
  Users may need to quote these when they are meant to be
  interpreted as strings.  Similarly, 'null' is parsed as
  a YAML null value (and will be treated as an empty string
  by pandoc rather than the string 'null').  Quoting it will
  force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
  escaping errors: instead of just falling back on treating
  the section as a table, it raises a YAML parsing error.
2021-10-27 12:50:51 -07:00
Albert Krewinkel
b990ca3c4c
Lua: fix pandoc.utils.stringify regression
The `pandoc.utils.stringify` function returned empty strings when called
with a string argument.
2021-10-27 20:56:30 +02:00
John MacFarlane
2910f9f3f8 Fix a copy/paste bug in Lua marshalling code.
This led changes in link properties in Lua filters to
change the links into images!

Closes .
2021-10-26 21:48:56 -07:00
Albert Krewinkel
b95e864ecf
Lua: marshal SimpleTable values as userdata objects 2021-10-26 21:45:16 +02:00
Albert Krewinkel
80ed81822e
Lua: generate constants in module pandoc programmatically 2021-10-26 14:40:11 +02:00
Albert Krewinkel
f56d870631
Lua: marshal ListAttributes values as userdata objects 2021-10-26 14:40:11 +02:00
Albert Krewinkel
a493c7029c
Lua: marshal Block values as userdata objects
Properties of Block values are marshalled lazily, which generally
improves performance considerably. Script users may also notice the
following differences:

  - Block element properties can no longer be accessed by numerical
    indexing of the `.c` field. The `.c` property now serves as an alias
    for `.content`, so some filter that used this undocumented method
    for property access may continue to work, while others will need to
    be updated and use proper property names.

  - The marshalled Block elements now have a `show` method, and a
    `__tostring` metamethod. Both return the Haskell string
    representation of the element.

  - Block values now have the Lua type `userdata` instead of `table`.
2021-10-26 14:40:10 +02:00
Albert Krewinkel
230b133db5
Lua: marshal Citation values as userdata objects 2021-10-25 09:08:58 +02:00
Albert Krewinkel
2d3813e0dd Lua: convert IOErrors to PandocErrors in pandoc.pipe function
Fixes: 
2021-10-23 11:12:39 -07:00
John MacFarlane
c712d13b67 Org reader: allow an initial :PROPERTIES: drawer to add to metadata.
Closes .
2021-10-22 22:10:25 -07:00
Aner Lucero
921af30854 Use simpleFigure in Readers. 2021-10-22 15:14:23 -07:00
Albert Krewinkel
c07005a095 Lua: marshal Version values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
6a03aca906 Lua: marshal Inline elements as userdata
This includes the following user-facing changes:

- Deprecated inline constructors are removed. These are `DoubleQuoted`,
  `SingleQuoted`, `DisplayMath`, and `InlineMath`.

- Attr values are no longer normalized when assigned to an Inline
  element property.

- It's no longer possible to access parts of Inline elements via
  numerical indexes. E.g., `pandoc.Span('test')[2]` used to give
  `pandoc.Str 'test'`, but yields `nil` now. This was undocumented
  behavior not intended to be used in user scripts. Use named properties
  instead.

- Accessing `.c` to get a JSON-like tuple of all components no longer
  works. This was undocumented behavior.

- Only known properties can be set on an element value. Trying to set a
  different property will now raise an error.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
8523bb01b2 Lua: marshal Attr values as userdata
- Adds a new `pandoc.AttributeList()` constructor, which creates the
  associative attribute list that is used as the third component of
  `Attr` values. Values of this type can often be passed to constructors
  instead of `Attr` values.

- `AttributeList` values can no longer be indexed numerically.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
e4287e6c95 Lua: marshal Pandoc values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
9e74826ba9 Switch to hslua-2.0
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.

Furthermore, new abstractions allow to document the code and exposed
functions.
2021-10-22 11:16:51 -07:00
John MacFarlane
ee34252219 Move splitStrWhen to T.P.Citeproc.Util.
Previously there were two copies, in BibTeX and Locator.
2021-10-21 22:27:07 -07:00
John MacFarlane
ef7b769fa0 SelfContained: fix bug that caused everything to be made a data uri.
All the code we needed to put most styles and scripts into
inline style and script tags was there, but because of the
order of pattern matching, it was never being called.
Putting the catch-all clause at the end fixes the bug.

Closes , closes .  See also .
2021-10-21 21:51:53 -07:00
John MacFarlane
0a93acf91a Markdown reader: don't parse links or bracketed spans as citations.
Previously pandoc would parse

    [link to (@a)](url)

as a citation; similarly

    [(@a)]{#ident}

This is undesirable.  One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.

Closes .
2021-10-20 10:34:47 -07:00
John MacFarlane
7754b7f2dd FormatHeuristics: remove .tei.xml extension for TEI.
As noted in , this never worked, because `takeExtension`
only returns `.xml`.  So it won't be missed if we remove it.

Closes .
2021-10-19 08:05:30 -07:00
Milan Bracke
465c28d28e Docx reader: fix handling of empty fields
Some fields only have an instrText and no content, Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
2021-10-18 19:15:40 -07:00
Milan Bracke
6acc82c5d2 Docx parser: implement PAGEREF fields
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 19:15:40 -07:00
Milan Bracke
193f6bfeba Docx reader: fix handling of nested fields
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.

To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
2021-10-18 19:15:40 -07:00
Emily Bourke
8de261ba4e pptx: Line up continuation paragraphs
This commit changes the `marL` and `indent` values used for plain
paragraphs and numbered lists, and changes the spacing defined in the
reference doc master for bulleted lists.

For paragraphs, there is now a left-indent taken from the `otherStyle`
in the master. For numbered lists, the number is positioned where the
text would be if this were a plain paragraph, and the text is indented
to the next level. This means that continuation paragraphs line up
nicely with numbered lists.

It also /mostly/ matches the observed PowerPoint behaviour when
inserting paragraphs and numbered lists: the only difference is that
PowerPoint was using a different margin value for the first level
numbered lists – I’ve changed this to match the other levels, as I don’t
think it makes the spacing unappealing and it allows continuation
paragraphs at any level to line up.

With bulleted lists, I’m keeping the observed PowerPoint behaviour of
specifying only a level, letting `marL` and `indent` be automatically
taken from `bodyStyle`. To that end, this commit changes the `bodyStyle`
spacing in the master of the default reference doc, to:

- line up the text of the first paragraph in each bullet with any
  continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the
  first paragraph, matching lists and plain paragraphs

This does mean the continuation paragraphs still won’t line up for
anyone using their own reference doc where they haven’t matched the
`otherStyle` and `bodyStyle` indent levels, but I think people in that
situation will be able to troubleshoot.
2021-10-17 17:24:30 -07:00
Emily Bourke
8981872bca pptx: Remove outdated comment
I removed the field this comment refers to recently, missed the
comment.
2021-10-17 17:24:30 -07:00
Emily Bourke
8af15ab345 pptx: Fix list level numbering
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.

At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.

This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.

- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
2021-10-17 17:24:30 -07:00
Samuel Tardieu
a41c1fe0bb asciidoc writer: translate numberLines attribute to linesnum switch
AsciiDoctor allows to request line numbering on code blocks by
using a switch on the `source` block, such as in:

```
[source%linesnum,haskell]
----
some Haskell code here
----
```
2021-10-14 13:41:12 -07:00
Samuel Tardieu
628cde48cf DocBook reader: honor linenumbering attribute
The attribute DocBook linenumbering="numbered" attribute on code blocks
maps to "numberLines" internally.
2021-10-14 09:04:56 -07:00
Samuel Tardieu
ed8877bd68 Remove redundant $
Found by hlint 3.3.1
2021-10-14 08:29:58 -07:00
John MacFarlane
49c4e1d014 Fix markdown parsing bug for math in bracketed spans and links.
This affects math with unbalanced brackets (e.g. `$(0,1]$`)
inside links, images, bracketed spans.

Closes .
2021-10-13 08:59:37 -07:00
John MacFarlane
c636b5dd16 Revert "Depend on pandoc-types 1.23, remove Null constructor on Block."
This reverts commit fb0d6c7cb6.
2021-10-12 21:00:15 -07:00
John MacFarlane
906e6016bb T.P.Writers.Shared: remove 'breakable'...
which was introduced in the cherry-pick'd commit that
added splitSentences, but isn't needed here.
(It is for the nospace branch.)
2021-10-11 15:08:33 -07:00
John MacFarlane
5d17020a20 T.P.Writers.Shared: Export splitSentences as a Doc Text transform.
[API change]

Use this in man/ms.
2021-10-11 09:45:22 -07:00
John MacFarlane
2befeaa29f Remove splitSentences from T.P.Shared [API change].
We used to attempt automatic sentence splitting in man and ms
output, since sentence-ending periods need to be followed by
two spaces or a newline in these formats.

But it's difficult to do this reliably at the level of
`[Inline]`.
2021-10-11 09:35:50 -07:00
John MacFarlane
63ea754b49 Fix warning 2021-10-11 09:24:31 -07:00
John MacFarlane
84d68b92a2 LaTeX reader: Implement siunitx v3 commands.
We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist`
as synonynms of `\si`, `\SI`, `\SIrange`, and `\SIlist`.

Closes .
2021-10-11 08:54:45 -07:00
Milan Bracke
0f98cbff4b Avoid blockquote when parent style has more indent
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
2021-10-10 16:27:32 -07:00
John MacFarlane
c72277e986 LaTeX reader: Properly handle \^ followed by group closing.
Closes .
2021-10-10 11:24:28 -07:00
John MacFarlane
d80aaee42b Translations: don't depend on the fact that Aeson Object is...
implemented internally as a HashMap.  This is no longer
public as of aeson 2.0.0.0.
2021-10-10 09:36:33 -07:00
John MacFarlane
5a1bd52677 Don't prepend file:// to --syntax-definition on Windows.
This was a fix for a problem in skylighting, but this
problem doesn't exist now that we've moved from HXT to
xml-conduit.

Cf. .
2021-10-06 12:33:22 -07:00
John MacFarlane
6c507a66cf Avoid bad wraps in markdown writer at the Doc Text level.
Previously we tried to do this at the Inline list level,
but it makes more sense to intervene on breaking spaces
at the Doc Text level.
2021-10-05 08:44:51 -07:00
John MacFarlane
b8d460eeab Powerpoint writer: consolidate text runs when possible.
This slims down the output files by avoiding unnecessary
text run elements.

Updated golden tests.
2021-10-04 12:24:12 -07:00
John MacFarlane
82d587493d Revert "Powerpoint writer: consolidate text run nodes."
This reverts commit 62f83aa486.

This was already being done, it seems.
I misidentified the problem; it is really with `Str ""` nodes.
2021-10-04 11:50:32 -07:00
John MacFarlane
62f83aa486 Powerpoint writer: consolidate text run nodes.
This should reduce the size of the generated files.
2021-10-04 11:45:01 -07:00
John MacFarlane
fb0d6c7cb6 Depend on pandoc-types 1.23, remove Null constructor on Block. 2021-10-01 15:42:00 -07:00
nuew
021cdb543b epub: Add EPUB3 subject metadata (authority/term)
This adds the ability to specify EPUB 3 `authority` and `term` specific
refinements to the `subject` tag. Specifying a plain `subject` tag in
metadata will function as before.
2021-09-30 20:53:07 -07:00