Update changelog.

2021-03-02 14:23:02 -08:00 · 2021-03-02 14:23:02 -08:00 · 95b1327f6b
commit 95b1327f6b
parent af9f6c00aa
1 changed files with 182 additions and 98 deletions
--- a/changelog.md
+++ b/changelog.md
@ -2,6 +2,11 @@

 ## pandoc 2.12 (UNRELEASED -- PROVISIONAL)

+  * When downloading content from URL arguments, be sensitive to
+    the character encoding (#5600).  We can properly handle UTF-8 and latin1
+    (ISO-8859-1); for others we raise an error.  Fall back to latin1 if
+    no charset is given in the mime type and UTF-8 decoding fails.
+
  * Add new unexported module Text.Pandoc.XML.Light, as well
    as Text.Pandoc.XML.Light.Types, Text.Pantoc.XML.Light.Proc,
    Text.Pandoc.XML.Light.Output.  (Closes #6001, #6565, #7091).
@ -34,75 +39,6 @@
    these have been added. And the docx golden tests have been updated,
    because the new parser does not preserve the order of attributes.

-  * Text.Pandoc.App
-
-    + Add `parseOptionsFromArgs` [API change, new exported function].
-
-  * Text.Pandoc.Citeproc.BibTeX
-
-    + `Text.Pandoc.Citeproc.writeBibTeXString` now returns
-      `Doc Text` instead of `Text` (#7068).
-    + Correctly handle `pages` (= `page` in CSL) (#7067).
-    + Correctly handle BibLaTeX `langid` (= `language` in CSL, #7067).
-    + In BibTeX output, protect foreign titles since there's no language
-      field (#7067).
-    + Clean up BibTeX parsing (#7049).  Previously there was a messy code
-      path that gave strange results in some cases, not passing through raw
-      tex but trying to extract a string content.  This was an artefact of
-      trying to handle some special bibtex-specific commands in the BibTeX
-      reader. Now we just handle these in the LaTeX reader and simplify
-      parsing in the BibTeX reader. This does mean that more raw tex will
-      be passed through (and currently this is not sensitive to the
-      `raw_tex` extension; this should be fixed).
-
-  * Text.Pandoc.Citeproc.MetaValue
-
-    + Correctly parse "raw" date value in markdown references metadata.
-      (See jgm/citeproc#53.)
-
-  * Text.Pandoc.Class
-
-    + Add `getTimestamp` [API change].  This attempts to read the
-      `SOURCE_DATE_EPOCH` environment variable and parse a UTC time
-      from it (treating it as a unix date stamp, see
-      https://reproducible-builds.org/specs/source-date-epoch/). If the
-      variable is not set or can't be parsed as a unix date stamp, then the
-      function returns the current date.
-
-  * Text.Pandoc.Error
-
-    + Remove unused variables (Albert Krewinkel)
-    + Export `renderError` [API change].
-    + Refactor `handleError` to use `renderError`. This allows us render
-      error messages without exiting.
-
-  * Text.Pandoc.Extensions
-
-    + `Ext_task_lists` is now supported by org (and turned
-      on by default) (Albert Krewinkel, #6336).
-    + Remove `Ext_fenced_code_attributes` from allowed commonmark attributes
-      (#7097).  This attribute was listed as allowed, but it didn't actually
-      do anything. Use `attributes` for code attributes and more.
-
-  * Lua subsystem:
-
-    + Always load built-in Lua scripts from default data-dir (Albert
-      Krewinkel).  The Lua modules `pandoc` and `pandoc.List` are now always
-      loaded from the system's default data directory. Loading from a
-      different directory by overriding the default path, e.g. via
-      `--data-dir`, is no longer supported to avoid unexpected behavior
-      and to address security concerns.
-    + Add module "pandoc.path" (Albert Krewinkel, #6001, #6565).
-      The module allows to work with file paths in a convenient and
-      platform-independent manner.
-
-  * Text.Pandoc.PDF
-
-    + Disable `smart` extension when building PDF via LaTeX.
-      This is to prevent accidental creation of ligatures like
-      `` ?` `` and `` !` `` (especially in languages with quotations like
-      German), and similar ligature issues.  (See jgm/citeproc#54.)
-
  * DocBook reader:

    + Avoid expensive tree normalization step, as it is not necessary
@ -137,6 +73,11 @@

  * LaTeX reader:

+    + Don't export `tokenize`, `untokenize` [API change].  These are internal
+      implementation details, which were only exported for testing.
+      They don't belong in the public API.
+    + Improved efficiency of the parser.  With these changes the reader
+      is almost twice as fast as in the last release in our benchmarks.
    + Code cleanup, removing some unnecessary things.
    + Rewrite `withRaw` so it doesn't rely on fragile assumptions
      about token positions (which break when macros are expanded)
@ -156,11 +97,19 @@
      it doesn't change anything.
    + Improve `braced'`.  Remove the parameter, have it parse the
      opening brace, and make it more efficient.
+    + Factor out pieces of the LaTeX reader to make the module smaller.
+      This reduces memory demands when compiling.  Created
+      Text.Pandoc.Readers.{LaTeX,Math,Citation,Table,Macro,Accent}.
+      Changed Text.Pandoc.Readers.LaTeX.SIunitx to export a command map
+      instead of individual commands.
+
+  * Make Text.Pandoc.Readers.LaTeX.Types an unexported module [API change].

  * Markdown reader:

    + Improved handling of mmd link attributes in references (#7080).
      Previously they only worked for links that had titles.
+    + Improved efficiency of the parser (benchmarks show a 15% speedup).

  * OPML reader:

@ -218,7 +167,7 @@
    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds.

-  * Text.Pandoc.Writers.EPUB
+  * EPUB writer:

    + Use `getTimestamp` instead of `getCurrentTime` for timestamp.
      Setting `SOURCE_DATE_EPOCH` will allow reproducible builds (#7093).
@ -233,6 +182,10 @@
      contains an ampersand or another character with a special meaning
      in XML.

+  * Jira writer:
+
+    + Use Span identifiers as anchors (Albert Krewinkel,
+
  * LaTeX writer:

    + Adjust hypertargets to beginnings of paragraphs (#7078).
@ -241,6 +194,8 @@
      This makes a particular difference for links to citations using
      `--citeproc` and `link-citations: true`.
    + Change BCP47 lang tag from `jp` to `ja` (Mauro Bieg, #7047).
+    + Use function instead of map for accent lookup (should be
+      more efficient).

  * Markdown writer:

@ -279,40 +234,151 @@
    `--include-in-header`, `--include-before-body`, `--include-after-body`)
    may be used.

-  * LaTeX template: Update to iftex package (#7073) (Andrew Dunning)
+  * LaTeX template:
+
+    + Update to iftex package (#7073) (Andrew Dunning)
+    + Wrap url colours in braces (#7121) (Loïc Grobol).

  * revealjs template: Add 'center' option for vertical slide centering.
    (maurerle, #7104).

  * Text.Pandoc.XML: Improve efficiency of `fromEntities`.

-  * Test suite: a more robust way of testing the executable.
-    Many of our tests require running the pandoc executable. This is
-    problematic for a few different reasons. First, cabal-install will
-    sometimes run the test suite after building the library but before
-    building the executable, which means the executable isn't in place for
-    the tests. One can work around that by first building, then building and
-    running the tests, but that's fragile.  Second, we have to find the
-    executable. So far, we've done that using a function `findPandoc` that
-    attempts to locate it relative to the test executable (which can be
-    located using findExecutablePath).  But the logic here is delicate and
-    work with every combination of options.  To solve both problems, we add
-    an `--emulate` option to the `test-pandoc` executable.  When `--emulate`
-    occurs as the first argument passed to `test-pandoc`, the program simply
-    emulates the regular pandoc executable, using the rest of the arguments
-    (after `--emulate`). Thus, `test-pandoc --emulate -f markdown -t latex`
-    is just like `pandoc -f markdown -t latex`.
-    Since all the work is done by library functions, implementing this
-    emulation just takes a couple lines of code and should be entirely
-    reliable.  With this change, we can test the pandoc executable by running
-    the test program itself (locatable using `findExecutablePath`) with the
-    `--emulate` option. This removes the need for the fragile `findPandoc`
-    step, and it means we can run our integration tests even when we're just
-    building the library, not the executable.  [Note: part of this change
-    involved simplifying some complex handling to set environment variables
-    for dynamic library paths.  I have tested a build with
-    `--enable-dynamic-executable`, and it works, but further testing may be
-    needed.]
+  * Text.Pandoc.MIME
+
+    + Add exported function `getCharset` [API change].
+
+  * Text.Pandoc.UTF8: change IO functions to return Text, not String
+    [API change].  This affects `readFile`, `getContents`, `writeFileWith`,
+    `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`.
+    `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.
+    This avoids the need to uselessly create a linked list of characters
+    when emiting output.
+
+  * Text.Pandoc.App
+
+    + Add `parseOptionsFromArgs` [API change, new exported function].
+
+  * Text.Pandoc.Citeproc.BibTeX
+
+    + `Text.Pandoc.Citeproc.writeBibTeXString` now returns
+      `Doc Text` instead of `Text` (#7068).
+    + Correctly handle `pages` (= `page` in CSL) (#7067).
+    + Correctly handle BibLaTeX `langid` (= `language` in CSL, #7067).
+    + In BibTeX output, protect foreign titles since there's no language
+      field (#7067).
+    + Clean up BibTeX parsing (#7049).  Previously there was a messy code
+      path that gave strange results in some cases, not passing through raw
+      tex but trying to extract a string content.  This was an artefact of
+      trying to handle some special bibtex-specific commands in the BibTeX
+      reader. Now we just handle these in the LaTeX reader and simplify
+      parsing in the BibTeX reader. This does mean that more raw tex will
+      be passed through (and currently this is not sensitive to the
+      `raw_tex` extension; this should be fixed).
+
+  * Text.Pandoc.Citeproc.MetaValue
+
+    + Correctly parse "raw" date value in markdown references metadata.
+      (See jgm/citeproc#53.)
+
+  * Text.Pandoc.Citeproc
+
+    + Use https URLs for links (Salim B, #7122).
+
+  * Text.Pandoc.Class
+
+    + Add `getTimestamp` [API change].  This attempts to read the
+      `SOURCE_DATE_EPOCH` environment variable and parse a UTC time
+      from it (treating it as a unix date stamp, see
+      https://reproducible-builds.org/specs/source-date-epoch/). If the
+      variable is not set or can't be parsed as a unix date stamp, then the
+      function returns the current date.
+
+  * Text.Pandoc.Error
+
+    + Add `PandocUnsupportedCharsetError` constructor for
+      `PandocError` [API change].
+    + Export `renderError` [API change].
+    + Refactor `handleError` to use `renderError`. This allows us render
+      error messages without exiting.
+    + Remove unused variables (Albert Krewinkel)
+
+  * Text.Pandoc.Extensions
+
+    + `Ext_task_lists` is now supported by org (and turned
+      on by default) (Albert Krewinkel, #6336).
+    + Remove `Ext_fenced_code_attributes` from allowed commonmark attributes
+      (#7097).  This attribute was listed as allowed, but it didn't actually
+      do anything. Use `attributes` for code attributes and more.
+
+  * Lua subsystem:
+
+    + Always load built-in Lua scripts from default data-dir (Albert
+      Krewinkel).  The Lua modules `pandoc` and `pandoc.List` are now always
+      loaded from the system's default data directory. Loading from a
+      different directory by overriding the default path, e.g. via
+      `--data-dir`, is no longer supported to avoid unexpected behavior
+      and to address security concerns.
+    + Add module "pandoc.path" (Albert Krewinkel, #6001, #6565).
+      The module allows to work with file paths in a convenient and
+      platform-independent manner.
+    + Use strict evaluation when retrieving AST value from the stack
+      (Albert Krewinkel, #6674).
+
+  * Text.Pandoc.PDF
+
+    + Disable `smart` extension when building PDF via LaTeX.
+      This is to prevent accidental creation of ligatures like
+      `` ?` `` and `` !` `` (especially in languages with quotations like
+      German), and similar ligature issues.  (See jgm/citeproc#54.)
+
+  * Text.Pandoc.CSV:
+
+    + Fix parsing of unquoted values (#7112).  Previously we didn't allow
+      unescaped quotes in unquoted values, but they are allowed
+      in CSV.
+
+  * Test suite:
+
+    + Use a more robust method for testing the executable.  Many
+      of our tests require running the pandoc executable. This
+      is problematic for a few different reasons. First,
+      cabal-install will sometimes run the test suite after
+      building the library but before building the executable,
+      which means the executable isn't in place for the tests.
+      One can work around that by first building, then building
+      and running the tests, but that's fragile.  Second, we
+      have to find the executable. So far, we've done that using
+      a function `findPandoc` that attempts to locate it
+      relative to the test executable (which can be located
+      using findExecutablePath).  But the logic here is delicate
+      and work with every combination of options.  To solve both
+      problems, we add an `--emulate` option to the
+      `test-pandoc` executable.  When `--emulate` occurs as the
+      first argument passed to `test-pandoc`, the program simply
+      emulates the regular pandoc executable, using the rest of
+      the arguments (after `--emulate`). Thus, `test-pandoc
+      --emulate -f markdown -t latex` is just like `pandoc -f
+      markdown -t latex`.  Since all the work is done by library
+      functions, implementing this emulation just takes a couple
+      lines of code and should be entirely reliable.  With this
+      change, we can test the pandoc executable by running the
+      test program itself (locatable using `findExecutablePath`)
+      with the `--emulate` option. This removes the need for the
+      fragile `findPandoc` step, and it means we can run our
+      integration tests even when we're just building the
+      library, not the executable.  [Note: part of this change
+      involved simplifying some complex handling to set
+      environment variables for dynamic library paths.  I have
+      tested a build with `--enable-dynamic-executable`, and it
+      works, but further testing may be needed.]
+    + Print accurate location if a test fails (Albert
+      Krewinkel).  Ensures that tasty-hunit reports the location
+      of the failing test instead of the location of the helper
+      `test` function.
+
+  * Documentation: Update URLs and use `https` where possible (#7122,
+    Salim B).

  * MANUAL.txt

@ -328,6 +394,9 @@
  * Makefile: in `make bench`, create CSV files for comparison and compare
    against previous benchmark run.  Add timestamp to CSV filenames.

+  * cabal.project: don't explicitly set -trypandoc.
+    If we do, this can't be overridden on the cabal command line.
+
  * doc/lua-filters.md: improve documentation for
    `pandoc.mediabag.insert`, `pandoc.mediabag.fetch`,
    `directory`, `normalize` (Albert Krewinkel).
@ -339,6 +408,14 @@
  * Require citeproc 0.3.0.7, which correctly titlecases when titles
    contain non-ASCII characters.

+  * Use skylighting 0.10.4.  This version of skylighting uses xml-conduit
+    rather than hxt. This speeds up parsing of XML syntax definitions
+    fourfold, and removes four packages from pandoc's dependency graph:
+    hxt-charproperties, hxt-unicode, hxt-regex-xmlschema, hxt.
+
+  * Add script `tools/parseTimings.pl` to help pin down which
+    modules take the most time and memory to compile.
+
  * Avoid unnecessary use of NoImplicitPrelude pragma (#7089) (Albert
    Krewinkel)

@ -353,6 +430,13 @@
    + Clean up benchmark code.
    + Allow specifying patterns using `-p blah'.

+  * trypandoc: add 2 second timeout.
+
+  * Use `-split-sections` in creating linux release binary.
+    This reduces executable size significantly (by about 30%).
+
+  * Remove `weigh-pandoc`.  It's not really useful any more, now that our
+    regular benchmarks include data on allocation.


 ## pandoc 2.11.4 (2021-01-22)