Commit graph

2016 commits

Author SHA1 Message Date
John MacFarlane
a31241a08b Markdown reader: use CommonMark rules for list item nesting.
Closes #3511.

Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces.  Now the indentation required is determined by the
first line of the list item:  to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker.  See the CommonMark
spec for more details and examples.

Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules.  Here are some examples
of texts that will be parsed differently:

    - a
      - b

will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.

    - a

          code

Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin.  With the
four-space rule, this would be a regular paragraph rather than a code
block.

    - a

            code

Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`.  With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).

This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
 #1990, #1137, #744, #172, #137, #128) and by the growing prevalance
of CommonMark (now used by GitHub, for example).

Users who want to use the old rules can select the `four_space_rule`
extension.

* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
  of `gobbleSpaces` has been changed so that a `ReaderOptions`
  parameter is not needed.
2017-08-19 15:45:01 -07:00
John MacFarlane
5ab1162def Markdown reader: fixed parsing of fenced code after list...
...when there is no intervening blank line.

Closes #3733.
2017-08-18 21:46:55 -07:00
John MacFarlane
7cac58f126 Markdown reader: parse -@roe as suppress-author citation.
Previously only `[-@roe]` (with brackets) was recognized as
suppress-author, and `-@roe` was treated the same as `@roe`.

Closes jgm/pandoc-citeproc#237.
2017-08-18 11:40:57 -07:00
John MacFarlane
bfbdfa646a LaTeX reader: implement \newtoggle, \iftoggle, \toggletrue|false
from etoolbox.

Closes #3853.
2017-08-18 10:13:41 -07:00
John MacFarlane
d1444b4ecd RST reader/writer: support unknown interpreted text roles...
...by parsing them as Span with "role" attributes.
This way they can be manipulated in the AST.

Closes #3407.
2017-08-17 16:01:44 -07:00
John MacFarlane
b1f6fb4af5 HTML reader: support column alignments.
These can be set either with a `width` attribute or
with `text-width` in a `style` attribute.

Closes #1881.
2017-08-17 12:08:32 -07:00
John MacFarlane
b9b35059f6 LaTeX reader: support \lq, \rq. 2017-08-17 12:08:32 -07:00
John MacFarlane
c175317d03 LaTeX reader: support \textquoteleft|right, \textquotedblleft|right.
Closes #3849.
2017-08-17 10:09:35 -07:00
John MacFarlane
ae61d5f57d LaTeX reader: rudimentary support for \hyperlink. 2017-08-16 10:56:16 -07:00
John MacFarlane
db715ca847 LaTeX reader: use Link instead of Span for \ref.
This makes more sense semantically and avoids unnecessary
Span [Link] nestings when references are resolved.
2017-08-16 10:56:12 -07:00
schrieveslaach
cf4b40162d LaTeX reader: add Support for glossaries and acronym package (#3589)
Acronyms are not resolved by the reader, but acronym and glossary information is put into attributes on Spans so that they can be processed in filters.
2017-08-16 10:24:46 -07:00
John MacFarlane
6aef1bd228 Better handle complex \def macros as raw latex. 2017-08-13 12:45:04 -07:00
John MacFarlane
425b731050 LaTeX reader: Allow @ as a letter in control sequences.
@ is commonly used in macros using `\makeatletter`.
Ideally we'd make the tokenizer sensitive to `\makeatletter`
and `\makeatother`, but until then this seems a good change.
2017-08-13 12:24:06 -07:00
John MacFarlane
bf9ec6dfd8 LaTeX reader: fix \let\a=0 case, with single character token. 2017-08-13 12:16:51 -07:00
John MacFarlane
f9656ece4e Resolve references to section numbers in LaTeX reader. 2017-08-13 11:48:44 -07:00
John MacFarlane
253a7c6201 LaTeX reader: track header numbers and correlate with labels. 2017-08-13 11:30:17 -07:00
schrieveslaach
2845ab5976 Put content of \ref, \label commands into span… (#3639)
* Put content of `\ref` and `\label` commands into Span elements so they can be used in filters.
* Add support for `\eqref`
2017-08-13 10:58:45 -07:00
John MacFarlane
0ab8670a0e LaTeX reader: Fixed space after \figurename etc. 2017-08-12 13:40:28 -07:00
John MacFarlane
3897df868a LaTeX reader: support \chaptername, \partname, \abstractname, etc.
See #3559.
Obsoletes #3560.
2017-08-12 13:28:18 -07:00
John MacFarlane
f035f0ffe3 LaTeX reader: have \setmainlanguage set lang in metadata. 2017-08-12 12:34:36 -07:00
John MacFarlane
74212eb1b0 Added support for translations (localization) (see #3559).
* readDataFile, readDefaultDataFile, getReferenceDocx,
  getReferenceODT have been removed from Shared and
  moved into Class.  They are now defined in terms of
  PandocMonad primitives, rather than being primitve
  methods of the class.

* toLang has been moved from BCP47 to Class.

* NoTranslation and CouldNotLoudTranslations have
  been added to LogMessage.

* New module, Text.Pandoc.Translations, exporting
  Term, Translations, readTranslations.

* New functions in Class: translateTerm, setTranslations.
  Note that nothing is loaded from data files until
  translateTerm is used; setTranslation just sets the
  language to be used.

* Added two translation data files in data/translations.

* LaTeX reader: Support `\setmainlanguage` or `\setdefaultlanguage`
  (polyglossia) and `\figurename`.
2017-08-11 22:22:31 -07:00
John MacFarlane
dee4cbc854 RST reader: implement csv-table directive.
Most attributes are supported, including `:file:` and `:url:`.
A (probably insufficient) test case has been added.

Closes #3533.
2017-08-10 15:01:14 -07:00
John MacFarlane
a5790dd308 RST reader: Basic support for csv-table directive.
* Added Text.Pandoc.CSV, simple CSV parser.
* Options still not supported, and we need tests.

See #3533.
2017-08-10 11:12:41 -07:00
John MacFarlane
f4bff5d359 RST reader: reorganize block parsers for ~20% faster parsing. 2017-08-09 21:16:17 -07:00
John MacFarlane
1dcecffef4 Removed spurious comments. 2017-08-09 20:53:42 -07:00
John MacFarlane
ac18ff90b2 Org reader: use org-language attribute rather than data-org-language. 2017-08-09 09:45:17 -07:00
John MacFarlane
96933c6043 Org reader: use tag-name attribute instead of data-tag-name. 2017-08-09 09:26:57 -07:00
John MacFarlane
09b7df472d LaTeX reader: Use label instead of data-label for label in caption.
See d441e656db, #3639.
2017-08-09 09:15:50 -07:00
bucklereed
db55f7c1b2 HTML reader: parse <main> like <div role=main>. (#3791)
* HTML reader: parse <main> like <div role=main>.

* <main> closes <p> and behaves like a block element generally
2017-08-09 09:10:12 -07:00
Alexander
cfa597fc2a Muse reader: simplify tableCell implementation (#3846) 2017-08-09 09:09:05 -07:00
John MacFarlane
606a8e2af4 RST reader: support :widths: attribute for table directive. 2017-08-08 20:48:30 -07:00
John MacFarlane
3752298d91 Thread options through CommonMark reader.
This is more efficient than doing AST traversals for
emojis and hard breaks.

Also make behavior sensitive to `raw_html` extension.
2017-08-08 13:55:19 -07:00
John MacFarlane
54658b923a Support hard_line_breaks in CommonMark reader. 2017-08-08 13:30:53 -07:00
John MacFarlane
714d8a6377 CommonMark reader: support emoji extension. 2017-08-08 12:05:20 -07:00
John MacFarlane
73caf92871 CommonMark reader: support gfm_auto_identifiers.
Added `Ext_gfm_auto_identifiers`: new constructor for `Extension`
in `Text.Pandoc.Extensions` [API change].

Use this in githubExtensions.

Closes #2821.
2017-08-08 11:43:35 -07:00
John MacFarlane
d752f85582 CommonMark reader: make exts depend on extensions. 2017-08-07 23:20:29 -07:00
John MacFarlane
91c989d622 Remove GFM modules; use CMarkGFM for both gfm and commonmark.
We no longer have a separate readGFM and writeGFM;
instead, we'll use readCommonMark and writeCommonMark
with githubExtensions.

It remains to implement these extensions conditionally.

Closes #3841.
2017-08-07 23:11:14 -07:00
John MacFarlane
2c0e989f9d Markdown reader: fixed spurious parsing as citation as reference def.
We now disallow reference keys starting with `@` if the
`citations` extension is enabled.  Closes #3840.
2017-08-07 21:00:57 -07:00
John MacFarlane
2c81c4c218 Added gfm (GitHub-flavored CommonMark) as an input and output format.
This uses bindings to GitHub's fork of cmark, so it should parse
gfm exactly as GitHub does (excepting certain postprocessing
steps, involving notifications, emojis, etc.).

* Added Text.Pandoc.Readers.GFM (exporting readGFM)
* Added Text.Pandoc.Writers.GFM (exporting writeGFM)
* Added `gfm` as input and output forma

Note that tables are currently always rendered as HTML
in the writer; this can be improved when CMarkGFM supports
tables in output.
2017-08-07 16:59:31 -07:00
John MacFarlane
190f36d2fd Small tweak to previous commit. 2017-08-07 16:11:13 -07:00
John MacFarlane
c806ef1b15 LaTeX reader: Support simple \def macros.
Note that we still don't support macros with fancy parameter
delimiters, like

    \def\foo#1..#2{...}
2017-08-07 16:06:19 -07:00
John MacFarlane
9e6b9cdc5f LaTeX reader: Support \let.
Also, fix regular macros so they're expanded at the
point of use, and NOT also the point of definition.
`\let` macros, by contrast, are expanded at the
point of definition.  Added an `ExpansionPoint`
field to `Macro` to track this difference.
2017-08-07 13:38:15 -07:00
Alexander
1b5bfced55 Muse reader: debug indented paragraph support (#3839)
Take only first line indentation into account
and do not start new paragraph on indentation change.
2017-08-06 21:43:59 -07:00
Jesse Rosenthal
a36a56b8ac Docx reader: Avoid 0-level headers.
We used to parse paragraphs styled with "HeadingN" as "nth-level
header." But if a document has a custom style named "Heading0", this
will produce a 0-level header, which shouldn't exist. We only parse
this style if N>0. Otherwise we treat it as a normal style name, and
follow its dependencies, if any.

Closes #3830.
2017-08-06 19:35:03 -07:00
Alexander
8164a005c0 Muse reader: debug list and list item separation rules (#3837) 2017-08-06 13:19:59 -07:00
bucklereed
685788cd4b LaTeX reader: plainbreak, fancybreak et al from the memoir class (#3833) 2017-08-05 10:03:31 -07:00
Alexander Krotov
7a3a8790de Muse reader: do not allow headers in blockquotes (#3831) 2017-08-03 15:41:45 -07:00
Alexander Krotov
38b6adaac0 Muse reader: do not parse blocks inside comments (#3828) 2017-08-03 09:11:00 -07:00
John MacFarlane
2b039acb4e Merge branch 'textcolor-support' of https://github.com/schrieveslaach/pandoc into schrieveslaach-textcolor-support 2017-07-25 11:42:10 +02:00
John MacFarlane
329b61ff5c LaTeX reader: support etoolbox's ifstrequal. 2017-07-24 11:20:59 +02:00