Update changelog.

2021-02-21 22:05:50 -08:00 · 2021-02-21 22:05:50 -08:00 · 607c014e9d
commit 607c014e9d
parent aae1e617a6
1 changed files with 355 additions and 0 deletions
--- a/changelog.md
+++ b/changelog.md
@ -1,5 +1,360 @@
 # Revision history for pandoc
 ## pandoc 2.12 (UNRELEASED -- PROVISIONAL)
  * Add new unexported module Text.Pandoc.XML.Light, as well
    as Text.Pandoc.XML.Light.Types, Text.Pantoc.XML.Light.Proc,
    Text.Pandoc.XML.Light.Output.  (Closes #6001, #6565, #7091).
    This module exports definitions of `Element` and `Content`
    that are isomorphic to xml-light's, but with Text
    instead of String.  This allows us to keep most of the code in existing
    readers that use xml-light, but avoid lots of unnecessary allocation.
    We also add versions of the functions from xml-light's
    Text.XML.Light.Output and Text.XML.Light.Proc that operate on our
    modified XML types, and functions that convert xml-light types to our
    types (since some of our dependencies, like texmath, use xml-light).
    We export functions that use xml-conduit's parser to produce an
    `Element` or `[Content]`.  This allows existing pandoc code to use
    a better parser without much modification.
    The new parser is used in all places where xml-light's parser was
    previously used.  Benchmarks show a significant performance improvement
    in parsing XML-based formats (with docbook, opml, jats, and docx
    almost twice as fast, odt and fb2 more than twice as fast).
    In addition, the new parser gives us better error reporting than
    xml-light.  We report XML errors, when possible, using the new
    `PandocXMLError` constructor in `PandocError`.
    These changes revealed the need for some changes in the tests.  The
    docbook-reader.docbook test lacked definitions for the entities it used;
    these have been added. And the docx golden tests have been updated,
    because the new parser does not preserve the order of attributes.
  * Text.Pandoc.App
    + Add `parseOptionsFromArgs` [API change, new exported function].
  * Text.Pandoc.Citeproc.BibTeX
    + `Text.Pandoc.Citeproc.writeBibTeXString` now returns
      `Doc Text` instead of `Text` (#7068).
    + Correctly handle `pages` (= `page` in CSL) (#7067).
    + Correctly handle BibLaTeX `langid` (= `language` in CSL, #7067).
    + In BibTeX output, protect foreign titles since there's no language
      field (#7067).
    + Clean up BibTeX parsing (#7049).  Previously there was a messy code
      path that gave strange results in some cases, not passing through raw
      tex but trying to extract a string content.  This was an artefact of
      trying to handle some special bibtex-specific commands in the BibTeX
      reader. Now we just handle these in the LaTeX reader and simplify
      parsing in the BibTeX reader. This does mean that more raw tex will
      be passed through (and currently this is not sensitive to the
      `raw_tex` extension; this should be fixed).
  * Text.Pandoc.Citeproc.MetaValue
    + Correctly parse "raw" date value in markdown references metadata.
      (See jgm/citeproc#53.)
  * Text.Pandoc.Class
    + Add `getTimestamp` [API change].  This attempts to read the
      `SOURCE_DATE_EPOCH` environment variable and parse a UTC time
      from it (treating it as a unix date stamp, see
      https://reproducible-builds.org/specs/source-date-epoch/). If the
      variable is not set or can't be parsed as a unix date stamp, then the
      function returns the current date.
  * Text.Pandoc.Error
    + Remove unused variables (Albert Krewinkel)
    + Export `renderError` [API change].
    + Refactor `handleError` to use `renderError`. This allows us render
      error messages without exiting.
  * Text.Pandoc.Extensions
    + `Ext_task_lists` is now supported by org (and turned
      on by default) (Albert Krewinkel, #6336).
    + Remove `Ext_fenced_code_attributes` from allowed commonmark attributes
      (#7097).  This attribute was listed as allowed, but it didn't actually
      do anything. Use `attributes` for code attributes and more.
  * Lua subsystem:
    + Always load built-in Lua scripts from default data-dir (Albert
      Krewinkel).  The Lua modules `pandoc` and `pandoc.List` are now always
      loaded from the system's default data directory. Loading from a
      different directory by overriding the default path, e.g. via
      `--data-dir`, is no longer supported to avoid unexpected behavior
      and to address security concerns.
    + Add module "pandoc.path" (Albert Krewinkel, #6001, #6565).
      The module allows to work with file paths in a convenient and
      platform-independent manner.
  * Text.Pandoc.PDF
    + Disable `smart` extension when building PDF via LaTeX.
      This is to prevent accidental creation of ligatures like
      `` ?` `` and `` !` `` (especially in languages with quotations like
      German), and similar ligature issues.  (See jgm/citeproc#54.)
  * DocBook reader:
    + Avoid expensive tree normalization step, as it is not necessary
      with the new XML parser.
    + Support `informalfigure` (#7079) (Nils Carlson).
  * Docx reader:
    + Use Map instead of list for Namespaces.  This gives a speedup of
      about 5-10%. With this and the XML parsing changes, the docx reader
      is now about twice as fast as in the previous release.
  * HTML reader:
    + Small performance tweaks.
    + Also, remove exported class `NamedTag(..)` [API change]. This was just
      intended to smooth over the transition from String to Text and is no
      longer needed.
    + As a result, the functions `isInlineTag` and `isBlockTag`
      are no longer polymorphic; they apply to a `Tag Text` [API change].
    + Do a lookahead to find the right parser to use.  This takes
      benchmarks from 34ms to 23ms, with less allocation.
    + Fix bad handling of empty `src` attribute in `iframe` (#7099).
      If `src` is empty, we simply skip the `iframe`.
      If `src` is invalid or cannot be fetched, we issue a warning
      nd skip instead of failing with an error.
  * JATS reader:
    + Avoid tree normalization, which is no longer necessary given the
      new XML parser.
  * LaTeX reader:
    + Code cleanup, removing some unnecessary things.
    + Rewrite `withRaw` so it doesn't rely on fragile assumptions
      about token positions (which break when macros are expanded)
      (#7092).  This requires the addition of `sEnableWithRaw` and
      `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw`
      to disable collecting of raw tokens in certain contexts.
      Add `parseFromToks` to Text.Pandoc.Readers.LaTeX.Parsing.
      Fix parsing of single character tokens so it doesn't mess
      up the new raw token collecting.  These changes slightly increase
      allocations and have a small performance impact.
    + Handle some bibtex/biblatex-specific commands that used to be
      dealt with in pandoc-citeproc (#7049).
    + Optimize `satisfyTok`, avoiding unnecessary macro expansion steps.
      Benchmarks after this change show 2/3 of the run time and 2/3 of the
      allocation of the Feb. 10 benchmarks.
    + Removed `sExpanded` in state.  This isn't actually needed and checking
      it doesn't change anything.
    + Improve `braced'`.  Remove the parameter, have it parse the
      opening brace, and make it more efficient.
  * Markdown reader:
    + Improved handling of mmd link attributes in references (#7080).
      Previously they only worked for links that had titles.
  * OPML reader:
    + Avoid tree normalization, which is no longer necessary with the
      new XML parser.
  * ODT reader:
    + Finer-grained errors on parse failure (#7091).
    + Give more information if the zip container can't be unpacked.
  * Org reader:
    + Support `task_lists` extension (Albert Krewinkel, #6336).
    + Fix bug in org-ref citation parsing (Albert Krewinkel, #7101).
      The org-ref syntax allows to list multiple citations separated by
      comma.  Previously commas were accepted as part of the citation id,
      so all citation lists were parsed as one single citation.
  * RST reader:
    + Use `getTimestamp` instead of `getCurrentTime` to fetch timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds.
    + RST reader: fix handling of header in CSV tables (#7064).
      The interpretation of this line is not affected by the delim option.
  * Jira reader:
    + Modified the Doc parser to skip leading blank lines. This fixes
      parsing of documents which start with multiple blank lines (#7095).
    + Prevent URLs within link aliases to be treated as autolinks
      (#6944).
  * Text.Pandoc.Shared
    + Remove formerly exported functions that are no longer used in the
      code base: `splitByIndices`, `splitStringByIndicies`, `substitute`,
      and `underlineSpan` (which had been deprecated in April 2020)
      [API change].
    + Export `handleTaskListItem` (Albert Krewinkel) [API change].
  * BibTeX writer:
    + BibTeX writer: use doclayout and doctemplate.  This change allows
      bibtex/biblatex output to wrap as other formats do,
      depending on the settings of `--wrap` and `--columns` (#7068).
  * CSL JSON writer:
    + Output `[]` if no references in input, instead of raising a
      PandocAppError as before.
  * Docx writer:
    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds.
  * Text.Pandoc.Writers.EPUB
    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds (#7093).
      This does not suffice to fully enable reproducible in EPUB, since
      a unique id is still being generated for each build.
    + Support `belongs-to-collection` metadata (#7063) (Nick Berendsen).
  * JATS writer:
    + Escape special chars in reference elements (Albert Krewinkel).
      Prevents the generation of invalid markup if a citation element
      contains an ampersand or another character with a special meaning
      in XML.
  * LaTeX writer:
    + Adjust hypertargets to beginnings of paragraphs (#7078).
      Use `\vadjust pre` so that the hypertarget takes you to the beginning
      of the paragraph rather than one line down.
      This makes a particular difference for links to citations using
      `--citeproc` and `link-citations: true`.
    + Change BCP47 lang tag from `jp` to `ja` (Mauro Bieg, #7047).
  * Markdown writer:
    + Handle math right before digit.  We insert an HTML comment to
      avoid a `$` right before a digit, which pandoc will not recognize
      as a math delimiter.
  * ODT writer:
    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds.
    + Update default ODT style (Lorenzo).  Previously, the "First paragraph"
      style inherited from "Standard" but not from "Text body." Now
      it is adjusted to inherit from "Text body", to avoid some ugly
      spacing issues. It may be necessary to update a custom `reference.odt`
      in light of this change.
  * Org writer:
    + Support `task_lists` extension (Albert Krewinkel, #6336).
  * Pptx writer:
    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds.
  * JATS templates: tag `author.name` as `string-name` (Albert Krewinkel).
    The partitioning the components of a name into surname, given names,
    etc. is not always possible or not available. Using `author.name`
    allows to give the full name as a fallback to be used when
    `author.surname` is not available.
  * Add default templates for bibtex and biblatex, so that
    the variables `header-include`, `include-before`, `include-after`
    (or alternatively the command line options
    `--include-in-header`, `--include-before-body`, `--include-after-body`)
    may be used.
  * LaTeX template: Update to iftex package (#7073) (Andrew Dunning)
  * revealjs template: Add 'center' option for vertical slide centering.
    (maurerle, #7104).
  * Text.Pandoc.XML: Improve efficiency of `fromEntities`.
  * Test suite: a more robust way of testing the executable.
    Many of our tests require running the pandoc executable. This is
    problematic for a few different reasons. First, cabal-install will
    sometimes run the test suite after building the library but before
    building the executable, which means the executable isn't in place for
    the tests. One can work around that by first building, then building and
    running the tests, but that's fragile.  Second, we have to find the
    executable. So far, we've done that using a function `findPandoc` that
    attempts to locate it relative to the test executable (which can be
    located using findExecutablePath).  But the logic here is delicate and
    work with every combination of options.  To solve both problems, we add
    an `--emulate` option to the `test-pandoc` executable.  When `--emulate`
    occurs as the first argument passed to `test-pandoc`, the program simply
    emulates the regular pandoc executable, using the rest of the arguments
    (after `--emulate`). Thus, `test-pandoc --emulate -f markdown -t latex`
    is just like `pandoc -f markdown -t latex`.
    Since all the work is done by library functions, implementing this
    emulation just takes a couple lines of code and should be entirely
    reliable.  With this change, we can test the pandoc executable by running
    the test program itself (locatable using `findExecutablePath`) with the
    `--emulate` option. This removes the need for the fragile `findPandoc`
    step, and it means we can run our integration tests even when we're just
    building the library, not the executable.  [Note: part of this change
    involved simplifying some complex handling to set environment variables
    for dynamic library paths.  I have tested a build with
    `--enable-dynamic-executable`, and it works, but further testing may be
    needed.]
  * MANUAL.txt
    + MANUAL: block-level formatting is not allowed in line blocks (#7107).
    + Clarify `tex_math_dollars` extension.  Note that no blank lines
      are allowed between the delimiters in display math.
    + Add MANUAL section on reproducible builds.
    + Document no template fallback for absolute path (#7077, Nixon
      Enraght-Moony.)
    + Improve docs for cite-method.
    + Update README and man page.
  * Makefile: in `make bench`, create CSV files for comparison and compare
    against previous benchmark run.  Add timestamp to CSV filenames.
  * doc/lua-filters.md: improve documentation for
    `pandoc.mediabag.insert`, `pandoc.mediabag.fetch`,
    `directory`, `normalize` (Albert Krewinkel).
  * Allow base64-bytestring-1.2.* (Dmitrii Kovanikov)
  * Require jira-wiki-markup 1.3.3 (Albert Krewinkel)
  * Require citeproc 0.3.0.7, which correctly titlecases when titles
    contain non-ASCII characters.
  * Avoid unnecessary use of NoImplicitPrelude pragma (#7089) (Albert
    Krewinkel)
  * Benchmarks
    + Use the lighter-weight tasty-bench instead of criterion.
    + Run writer benchmarks for binary formats too.
    + Alphabetize benchmarks.
    + Don't run benchmarks for bibliography formats
      (yet; we need a special input for them).
    + Show allocation data
    + Clean up benchmark code.
    + Allow specifying patterns using `-p blah'.
 ## pandoc 2.11.4 (2021-01-22)
  * Add `biblatex`, `bibtex` as output formats (closes #7040).