Commit graph

1932 commits

Author SHA1 Message Date
John MacFarlane
2710fc4261 Class: Renamed 'warn' to 'addWarning' and consolidated RTF writer.
* Renaming Text.Pandoc.Class.warn to addWarning avoids conflict
  with Text.Pandoc.Shared.warn.
* Removed writeRTFWithEmbeddedImages from Text.Pandoc.Writers.RTF.
  This is no longer needed; we automatically handle embedded images
  using the PandocM functions.  [API change]
2017-01-25 17:07:40 +01:00
John MacFarlane
830be4d632 Refactored math conversion in writers.
* Remove exported module `Text.Pandoc.Readers.TeXMath`
* Add exported module `Text.Pandoc.Writers.Math`
* The function `texMathToInlines` now lives in `Text.Pandoc.Writers.Math`
* Export helper function `convertMath` from `Text.Pandoc.Writers.Math`
* Use these functions in all writers that do math conversion.

This ensures that warnings will always be issued for failed
math conversions.
2017-01-25 17:07:40 +01:00
Jesse Rosenthal
650fa20788 Readers: pass errors straight up to PandocMonad.
Since we've unified error types, we can just throw the same error at
the toplevel.
2017-01-25 17:07:40 +01:00
Jesse Rosenthal
d5051ae101 Remove redundant imports from OPML reader. 2017-01-25 17:07:40 +01:00
Jesse Rosenthal
3574b98f81 Unify Errors. 2017-01-25 17:07:40 +01:00
Jesse Rosenthal
3f7b3f5fd0 Add Text2Tags to Text.Pandoc 2017-01-25 17:07:40 +01:00
Jesse Rosenthal
b53ebcdf8e Working on readers. 2017-01-25 17:07:40 +01:00
John MacFarlane
18e85f8dfb Changed readNative to use PandocMonad. 2017-01-25 17:07:40 +01:00
John MacFarlane
300d94ac24 Deleted whitespace at end of source lines. 2017-01-25 17:07:39 +01:00
Hubert Plociniczak
30b3412857 Added page breaks into Pandoc.
This requires an updated version of pandoc-types that
introduces PageBreak definition.
Note that this initial commit only introduces ODT page breaks
and distinguishes page breaks before, after, or both before and
after the paragraph, as read from the style definition.
2017-01-25 17:07:39 +01:00
Albert Krewinkel
5729f1f2ea
Org reader: allow short hand for single-line raw blocks
Single-line raw blocks can be given via `#+FORMAT: raw line`, where
`FORMAT` must be one of `latex`, `beamer`, `html`, or `texinfo`.

Closes: #3366
2017-01-19 20:33:05 +01:00
John MacFarlane
06bdb8dbab MediaWiki reader: improved handling of display math.
Sometimes display math is indented with more than one colon.
Previously we handled these cases badly, generating definition
lists and missing the math.

Closes #3362.
2017-01-19 11:24:19 +01:00
John MacFarlane
4781819a6b Fixed -f markdown_github-hard_line_breaks+escaped_line_breaks.
Previously this did not properly enable escaped line breaks.
Closes #3341.
2017-01-08 10:01:19 +01:00
Albert Krewinkel
4da41bdb8e
Remove pipe char irking the haddock coverage tool
Haddock documentation strings must be associated with functions. Remove
pipe char from a comment that was moved into a `do` block in
`Readers/Org/Inlines.hs`.
2017-01-06 18:59:07 +01:00
Albert Krewinkel
4ca420e937
Org reader: accept org-ref citations followed by commas
Bugfix for an issue which, whenever the citation was immediately followed by a
comma, prevented correct parsing of org-ref citations.
2017-01-06 18:22:19 +01:00
Albert Krewinkel
21e6ca1976 Org reader: ensure emphasis markup can be nested
Nested emphasis markup (e.g. `/*strong and emphasized*/`) was
interpreted incorrectly in that the inner markup was not recognized.
2017-01-05 23:30:46 +01:00
tgkokk
f2e3e756f8 MediaWiki reader: Fix quotation mark parsing (#3336)
Change MediaWiki reader's behavior when the smart option is parsed to
match other readers' behavior.

Fix #2012.
2017-01-05 21:24:33 +01:00
Mauro Bieg
0159956f7f markdown reader: disallow space between inline code and attributes (#3326)
closes #3323
2016-12-24 07:34:07 -07:00
Jesse Rosenthal
60004cd518 Docx reader: Empty header should be list of lists.
In the past, the docx reader wrote an empty header as an empty list. It
should have the same width as a row (and be filled with empty cells).

(Note that I've reordered the code here slightly to get rid of a call to
`head`. It wasn't unsafe because it tested for null, but it was a bit of
a smell.)
2016-12-13 07:04:40 -05:00
Jesse Rosenthal
8ced8cbc6e Docx reader: Ensure one-row tables don't have header.
Tables in MS Word are set by default to have special first-row
formatting, which pandoc uses to determine whether or not they have a
header. This means that one-row tables will, by default, have only a
header -- which we imagine is not what people want. This change
ensures that a one-row table is not understood to be a header only.

Note that this means that it is impossible to produce a header-only
table from docx, even though it is legal pandoc. But we believe that
in nearly all cases, it will be an accidental (and unwelcome) result.

Closes #3285.
2016-12-08 07:01:01 -05:00
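The one-row rule described in 8ced8cbc6e can be illustrated with a minimal sketch. This is not the Docx reader's actual code: the function `splitHeader` and its `Bool` flag (standing in for Word's "special first-row formatting" setting) are hypothetical names chosen for illustration.

```haskell
module Main where

-- Hypothetical sketch, not pandoc's implementation.
-- Split a table's rows into an optional header row and the body rows.
-- The first row is treated as a header only if the table has special
-- first-row formatting AND at least one further row exists.
splitHeader :: Bool -> [row] -> (Maybe row, [row])
splitHeader firstRowIsSpecial rows =
  case rows of
    (r : rest@(_ : _)) | firstRowIsSpecial -> (Just r, rest)
    _                                      -> (Nothing, rows)

main :: IO ()
main = do
  -- A one-row table keeps its single row in the body.
  print (splitHeader True ["only row"])
  -- A multi-row table with first-row formatting gets a header.
  print (splitHeader True ["head", "body"])
```

Under this rule a header-only table can never be produced, which matches the trade-off the commit message acknowledges.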
John MacFarlane
6708c6a7fc Removed debug trace from HTML reader. 2016-12-08 11:06:50 +01:00
John MacFarlane
7ce622475c HTML reader: Understand style=width: as well as width in col.
Closes #3286.
2016-12-07 15:21:01 +01:00
John MacFarlane
0e9c96d28a RST reader: print warnings when keys, substitution, notes not found.
Previously the parsers failed and we got raw text.
Now we get a link with an empty URL, or empty inlines in
the case of a note or substitution.
2016-12-07 13:03:56 +01:00
John MacFarlane
7fbfcb03d8 RST reader: fix hyperlink aliases.
`link <google_>`_

    .. _google: https://google.com

is really a reference link.

Closes #3283.
2016-12-07 12:54:25 +01:00
John MacFarlane
97274c9991 Fixed some bad regressions in HTML table parser.
These regressions led to the introduction of empty rows
in some circumstances.

Closes #3280.
2016-12-06 23:20:28 +01:00
John MacFarlane
ac83d4b806 Use new module from texmath to lookup MS font codepoints.
+ Removed Text.Pandoc.Readers.Docx.Fonts
+ Moved its code to texmath; we now use (from texmath 0.9)
  Text.TeXMath.Unicode.Fonts
+ Use texmath 0.9 (currently from git).
+ Updated epub tests because texmath now handles more mathml.
2016-11-30 00:43:55 +01:00
John MacFarlane
5222572033 HTML reader: improved table parsing.
We now check explicitly for non-1 rowspan or colspan
attributes, and fail when we encounter them. Previously
we checked that each row had the same number of cells,
but that could be true even with rowspans/colspans.
And there are cases where it isn't true in tables that
we can handle fine -- e.g. when a tr element is empty.
So now we just pad rows with empty cells when needed.

Closes #3027.
2016-11-26 22:28:28 +01:00
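The row-padding step described in 5222572033 might look roughly like the sketch below. It is an illustration under assumptions, not the HTML reader's code: `padRows` is a hypothetical helper, and cells are modelled as plain strings with `""` as the empty cell.

```haskell
module Main where

-- Hypothetical sketch, not pandoc's implementation.
-- Pad every row with copies of an empty cell so that all rows
-- become as wide as the widest row in the table.
padRows :: a -> [[a]] -> [[a]]
padRows emptyCell rows = map pad rows
  where
    -- The leading 0 keeps 'maximum' total for a table with no rows.
    width = maximum (0 : map length rows)
    pad row = row ++ replicate (width - length row) emptyCell

main :: IO ()
main =
  -- The empty row (e.g. from an empty tr element) is padded to width 3.
  mapM_ print (padRows "" [["a", "b", "c"], [], ["d"]])
```

Padding instead of rejecting uneven rows is what lets a table containing an empty `tr` element parse, while tables with rowspans or colspans are still rejected by the separate explicit check.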
hubertp-lshift
015dead0bb [odt] Infer table's caption from the paragraph (#3224)
The ODT reader always produced empty captions for the parsed
tables. This commit
1) checks paragraphs that follow the table definition
2) treats specially a paragraph with a style named 'Table'
3) does some postprocessing of the paragraphs that combines
 tables followed immediately by captions

The ODT writer used the 'TableCaption' style name for the caption
paragraph. This commit follows the OpenOffice approach, which
allows for appending captions to tables but uses a built-in style
named 'Table' instead of 'TableCaption'. Any users of the ODT format
(both writer and reader) are therefore required to change the
style's name to 'Table', if necessary.
2016-11-26 21:45:56 +01:00
John MacFarlane
2873cd8288 LaTeX reader: don't treat \vspace and \hspace as block commands.
Fixed an error which came up, for example, with `\vspace`
inside a caption.  (Captions expect inlines.)

Closes #3256.
2016-11-26 21:27:56 +01:00
Albert Krewinkel
f4a8f12387
Org reader: respect column width settings
Table column properties can optionally specify a column's width with
which it is displayed in the buffer. Some exporters, notably the ODT
exporter in org-mode v9.0, use these values to calculate relative column
widths. The org reader now implements the same behavior.

Note that the org-mode LaTeX and HTML exporters in Emacs don't support
this feature yet, which should be kept in mind by users who use the
column widths parameters.

Closes: #3246
2016-11-24 20:07:39 +01:00
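As an illustration of what "relative column widths" in f4a8f12387 could mean, the sketch below turns per-column width settings into fractions of the total. It assumes every column specifies a width; how the Org reader treats columns without one is not stated in the commit message. `relativeWidths` is a hypothetical name.

```haskell
module Main where

-- Hypothetical sketch, not pandoc's implementation.
-- Convert column width settings (as in Org's <10>, <30> cookies)
-- into fractions of the total width. If the total is not positive,
-- fall back to 0 for every column, meaning "no width information".
relativeWidths :: [Int] -> [Double]
relativeWidths ws
  | total <= 0 = map (const 0) ws
  | otherwise  = [fromIntegral w / fromIntegral total | w <- ws]
  where
    total = sum ws

main :: IO ()
main =
  -- Two columns with widths 10 and 30 share the table 1:3.
  print (relativeWidths [10, 30])
```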
John MacFarlane
8d7ecc27a1 Allow beamer-style <...> options in raw LaTeX (also in Markdown).
This allows use of things like `\only<2,3>{my content}` in
Markdown that is going to be converted to beamer.

Closes #3184.
2016-11-20 21:17:41 +01:00
John MacFarlane
d905551b12 LaTeX reader: improved table handling.
We can now parse all of the tables emitted by pandoc in
our tests.

The only thing we don't get yet are alignments and
column widths in more complex tables.

See #2669.
2016-11-19 22:46:32 +01:00
John MacFarlane
f255625ad7 LaTeX reader: limited support for minipage. 2016-11-19 22:46:32 +01:00
Albert Krewinkel
64413b1ce2
Un-break Travis build
Remove whitespace before function documentation. The extra spaces cause
problems with documentation tools, and Travis tests are failing because
of this.
2016-11-19 22:30:02 +01:00
John MacFarlane
5a1796e650 LaTeX reader: improved parsing of tables.
Reader can now parse simple LaTeX tables such as those
generated by pandoc itself.

We still can't handle pandoc multiline tables which involve
minipages and column widths.

Partially addresses #2669.
2016-11-19 21:36:16 +01:00
John MacFarlane
e4798a6726 Fixed xref lookup in DocBook reader. Closes #3243.
It previously only worked when the qnames lacked the docbook
namespace URI.
2016-11-19 10:30:20 +01:00
Albert Krewinkel
1a8af5fc44
Org reader: Ensure images in paragraphs are not parsed as figures
This fixes a regression introduced in
7e5220b57c.
2016-11-19 01:17:04 +01:00
ickc
e8ce21d614 Small caps in Bracketed Spans (#3191)
* Markdown reader: modify bracketedSpan to check small caps

* MANUAL.txt: add description on the use of `bracketed_spans` in small cap

* Improve markdown readers: bracketedSpan function EXACTLY as spanHtml
2016-11-16 11:53:51 +01:00
John MacFarlane
298e6f38f9 Allow alignments to be specified in Markdown grid tables. 2016-11-15 16:41:54 +01:00
John MacFarlane
50f0cfcc1a HTML reader: only treat "a" element as link if it has href.
Otherwise treat as span.

Closes #3226.
2016-11-13 22:41:11 +01:00
Jesse Rosenthal
eea4d14f60 Docx reader: add a placeholder value for CHART.
We wrap `[CHART]` in a `<span class="chart">`. Note that it maps to
inlines because, in docx, anything in a drawing tag can be part of a
larger paragraph.
2016-11-10 13:19:27 -05:00
Jesse Rosenthal
7539de0287 Docx reader: Be more specific in parsing images
We don't want to match on just "w:drawing", because that could also include
charts. Now we specify "w:drawing"//"pic:pic". This shouldn't change
behavior at all, but it's a first step toward allowing other sorts of
drawing data as well.
2016-11-10 13:19:27 -05:00
Albert Krewinkel
7e5220b57c
Org reader: allow HTML attribs on non-figure images
Images which are the only element in a paragraph can still be given HTML
attributes, even if the image does not have a caption and is hence not a figure.
The following will set the `width` attribute of the image to `50%`:

    #+ATTR_HTML: :width 50%
    [[file:image.jpg]]

Closes: #3222
2016-11-09 22:49:20 +01:00
Hubert Plociniczak
13bc573e7f Inline code when text has a special style
When a piece of text has the style 'Source_Text', we
assume that it represents code that should be
rendered inline.
Adapted the ODT writer to reflect that change as well;
previously it just wrote 'preformatted' text using
a non-distinguishable font style.

Code blocks are still not recognized by the ODT reader.
That's a separate issue.
2016-11-08 09:29:46 -05:00
John MacFarlane
eced02d70e Markdown reader: Allow reference link labels starting with @...
...if citations extension disabled.  Example:  in

    [link text][@a]

    [@a]: url

`link text` isn't hyperlinked because `[@a]` is parsed as a citation.
Previously this happened whether or not the `citations` extension was
enabled. Now it happens only if the `citations` extension is enabled.

Closes #3209.
2016-11-05 21:14:20 +01:00
Jesse Rosenthal
4a99e142ec Docx Reader: abstract out function to avoid code repetition. 2016-11-02 12:28:56 -04:00
Jesse Rosenthal
effc348965 Docx reader: Handle Alt text and titles in images.
We use the "description" field as alt text and the "title" field as
title. These can be accessed through the "Format Picture" dialog in
Word.
2016-11-02 12:10:45 -04:00
Jesse Rosenthal
1138ae6656 Docx reader utils: handle empty namespace in elemName
Previously, if given an empty namespace:

    (elemName ns "" "foo")

`elemName` would output a QName with a `Just ""` namespace. This is
never what we want. Now we output a `Nothing`. If someone *does* want a
`Just ""` in the namespace, they can enter the QName value explicitly.
2016-11-02 12:10:45 -04:00
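The behaviour change in 1138ae6656 can be sketched with a simplified stand-in. The `Name` record and `mkName` below are hypothetical, modelled only loosely on the `QName` type from the `xml` package, and the sketch assumes that the second argument of `elemName` is a namespace prefix looked up in a prefix-to-URI table.

```haskell
module Main where

-- Hypothetical stand-in for an XML qualified name; not pandoc's types.
data Name = Name
  { localName  :: String
  , nameURI    :: Maybe String
  , namePrefix :: Maybe String
  } deriving (Eq, Show)

-- Assumed shape: a table mapping prefixes to namespace URIs.
type NameSpaces = [(String, String)]

-- Build a qualified name; an empty prefix yields Nothing, never Just "".
mkName :: NameSpaces -> String -> String -> Name
mkName ns prefix name =
  Name name (lookup prefix ns) (if null prefix then Nothing else Just prefix)

main :: IO ()
main = do
  let ns = [("w", "http://example.org/w")]  -- placeholder URI
  print (mkName ns "w" "p")
  print (mkName ns "" "foo")
```

Mapping the empty string to `Nothing` means two names built without a prefix compare equal regardless of how the caller spelled "no prefix".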
John MacFarlane
bdda4b185b HTML reader: treat <math> as MathML by default...
unless something else is explicitly specified in xmlns.
Provided it parses as MathML, of course.

Also fixed default which should be to inline math if no
display attribute is used.
2016-11-02 16:43:36 +01:00
John MacFarlane
705df61198 LaTeX reader: Handle BVerbatim from fancyvrb. Fixes #3203. 2016-11-02 12:05:56 +01:00