Commit graph

1379 commits

Author SHA1 Message Date
hubertp-lshift
a840e41fad Test case for issue #3223 (#3308) 2016-12-13 17:20:10 +01:00
hubertp-lshift
46fa347471 Test case for bug 2752 (#3306) 2016-12-13 14:20:21 +01:00
Jesse Rosenthal
60004cd518 Docx reader: Empty header should be list of lists.
In the past, the docx reader wrote an empty header as an empty list. It
should have the same width as a row (and be filled with empty cells).

(Note that I've reordered the code here slightly to get rid of a call to
`head`. It wasn't unsafe because it tested for null, but it was a bit of
a smell.)
2016-12-13 07:04:40 -05:00
Jesse Rosenthal
8ced8cbc6e Docx reader: Ensure one-row tables don't have header.
Tables in MS Word are set by default to have special first-row
formatting, which pandoc uses to determine whether or not they have a
header. This means that one-row tables will, by default, have only a
header -- which we imagine is not what people want. This change
ensures that a one-row table is not understood to be a header only.

Note that this means that it is impossible to produce a header-only
table from docx, even though it is legal pandoc. But we believe that
in nearly all cases, it will be an accidental (and unwelcome) result

Closes #3285.
2016-12-08 07:01:01 -05:00
John MacFarlane
65c0e527f8 Fixed tests with dynamic linking.
Closes #2709.
2016-12-07 15:05:30 +01:00
John MacFarlane
7fbfcb03d8 RST reader: fix hyperlink aliases.
`link <google_>`_

    .. _google: https://google.com

is really a reference link.

Closes #3283.
2016-12-07 12:54:25 +01:00
Albert Krewinkel
bfa734c402
LaTeX writer: Fix unnumbered headers when used with --top-level
Fix interaction of top-level divisions `part` or `chapter` with
unnumbered headers when emitting LaTeX.  Headers are ensured to be
written using stared commands (like `\subsection*{}`).

Fixes: #3272
2016-12-04 21:15:52 +01:00
John MacFarlane
7ace7dd66b Markdown writer: Fixed incorrect word wrapping.
Previously pandoc would sometimes wrap lines too early
due to this bug.

Closes #3277.
2016-12-04 17:13:06 +01:00
John MacFarlane
fb8a2540bd Options: Removed writerStandalone, made writerTemplate a Maybe.
Previously setting writerStandalone = True did nothing unless
a template was provided in writerTemplate.  Now a fragment
will be generated if writerTemplate is Nothing; otherwise,
the specified template will be used and standalone output
generated.  [API change]
2016-11-30 15:34:58 +01:00
John MacFarlane
ac83d4b806 Use new module from texmath to lookup MS font codepoints.
+ Removed Text.Pandoc.Readers.Docx.Fonts
+ Moved its code to texmath; we now use (from texmath 0.9)
  Text.TeXMath.Unicode.Fonts
+ Use texmath 0.9 (currently from git).
+ Updated epub tests because texmath now handles more mathml.
2016-11-30 00:43:55 +01:00
Albert Krewinkel
1fc07ff4da Refactor top-level division selection (#3261)
The "default" option is no longer represented as `Nothing` but via a new
type constructor, making the `Maybe` wrapper superfluous.

The default behavior of using heuristics can now be enabled explicitly
by setting `--top-level-division=default`.

API change (`Text.Pandoc.Options`): The `Division` type was renamed to
`TopLevelDivision`. The `Section`, `Chapter`, and `Part` constructors
were renamed to `TopLevelSection`, `TopLevelChapter`, and
`TopLevelPart`, respectively. An additional `TopLevelDefault`
constructor was added, which is now also the new default value of the
`writerTopLevelDivision` field in `WriterOptions`.
2016-11-27 20:31:04 +01:00
hubertp-lshift
015dead0bb [odt] Infer table's caption from the paragraph (#3224)
ODT's reader always put empty captions for the parsed
tables. This commit
1) checks paragraphs that follow the table definition
2) treats specially a paragraph with a style named 'Table'
3) does some postprocessing of the paragraphs that combines
 tables followed immediately by captions

The ODT writer used 'TableCaption' style name for the caption
paragraph. This commit follows the open office approach which
allows for appending captions to table but uses a built-in style
named 'Table' instead of 'TableCaption'. Any users of odt format
(both writer and reader) are therefore required to change the
style's name to 'Table', if necessary.
2016-11-26 21:45:56 +01:00
Albert Krewinkel
baa25362a4 Allow to overwrite top-level division type heuristics (#3258)
Pandoc uses heuristics to determine the most resonable top-level
division type when emitting LaTeX or Docbook markup.  It is now possible
to overwrite this implicitly set top-level division via the
`top-level-division` command line parameter.

API change (`Text.Pandoc.Options`): the type of the
`writerTopLevelDivision` field in of the `WriterOptions` data type is
altered from `Division` to `Maybe Division`. The field's default value
is changed from `Section` to `Nothing`.

Closes: #3197
2016-11-26 21:43:46 +01:00
John MacFarlane
e4798a6726 Fixed xref lookup in DocBook reader. Closes #3243.
It previously only worked when the qnames lacked the docbook
namespace URI.
2016-11-19 10:30:20 +01:00
Albert Krewinkel
1a8af5fc44
Org reader: Ensure images in paragraphs are not parsed as figures
This fixes a regression introduced in
7e5220b57c.
2016-11-19 01:17:04 +01:00
John MacFarlane
298e6f38f9 Allow alignments to be specified in Markdown grid tables. 2016-11-15 16:41:54 +01:00
John MacFarlane
064e3f8c55 Markdown writer: fixed inconsistent spacing issue.
Previously a tight bullet sublist got rendered with
a blank line after, while a tight ordered sublist did
not.  Now we don't get the blank line in either case.

Closes #3232.
2016-11-15 10:32:16 +01:00
John MacFarlane
50f0cfcc1a HTML reader: only treat "a" element as link if it has href.
Otherwise treat as span.

Closes #3226.
2016-11-13 22:41:11 +01:00
Albert Krewinkel
7e5220b57c
Org reader: allow HTML attribs on non-figure images
Images which are the only element in a paragraph can still be given HTML
attributes, even if the image does not have a caption and is hence not a figure.
The following will add set the `width` attribute of the image to `50%`:

    #+ATTR_HTML: :width 50%
    [[file:image.jpg]]

Closes: #3222
2016-11-09 22:49:20 +01:00
Hubert Plociniczak
13bc573e7f Inline code when text has a special style
When a piece of text has a text 'Source_Text' then
we assume that this is a piece of the document
that represents a code that needs to be inlined.
Addapted an odt writer to also reflect that change;
previously it was just writing a 'preformatted' text using
a non-distinguishable font style.

Code blocks are still not recognized by the ODT reader.
That's a separate issue.
2016-11-08 09:29:46 -05:00
Jesse Rosenthal
5684577da5 Docx reader/writer: Update tests for img title and alt
Closes #3204
2016-11-02 12:12:36 -04:00
hubertp-lshift
01a21dd43f [odt] Infer tables' header props from rows (#3199)
ODT reader simply provided an empty header list
which meant that the contents of the whole table,
even if not empty, was simply ignored.
While we still do not infer headers we at least have
to provide default properties of columns.
2016-11-01 10:07:39 +01:00
John MacFarlane
e08ffa562a Added a test case with a complex raw latex environment in Markdown. 2016-10-31 22:08:47 +01:00
Albert Krewinkel
4f06e6c445
Org reader: support ATTR_HTML for special blocks
Special blocks (i.e. blocks with unrecognized names) can be prefixed
with an `ATTR_HTML` block attribute.  The attributes defined in that
meta-directive are added to the `Div` which is used to represent the
special block.

Closes: #3182
2016-10-30 20:23:53 +01:00
Albert Krewinkel
63bdc5d08f
Org reader: support the todo export option
The `todo` export option allows to toggle the inclusion of TODO keywords
in the output.  Setting this to `nil` causes TODO keywords to be dropped
from headlines.  The default is to include the keywords.
2016-10-30 13:20:25 +01:00
Albert Krewinkel
d5182778c4
Org reader: add support for todo-markers
Headlines can have optional todo-markers which can be controlled via the
`#+TODO`, `#+SEQ_TODO`, or `#+TYP_TODO` meta directive.  Multiple such
directives can be given, each adding a new set of recognized todo-markers.
If no custom todo-markers are defined, the default `TODO` and `DONE`
markers are used.

Todo-markers are conceptually separate from headline text and are hence
excluded when autogenerating headline IDs.

The markers are rendered as spans and labelled with two classes: One
class is the markers name, the other signals the todo-state of the
marker (either `todo` or `done`).
2016-10-30 10:27:47 +01:00
Daniele D'Orazio
78e4fbda51 Markdown Reader: add attributes for autolink (#3183) 2016-10-26 12:18:58 +02:00
John MacFarlane
d51e475bad Export Text.Pandoc.Error in Text.Pandoc.
[API change]
2016-10-24 22:31:36 +02:00
John MacFarlane
bf72a482eb Tighten up parsing of raw email addresses.
Technically `**@user` is a valid email address, but if we
allow things like this, we get bad results in markdown flavors
that autolink raw email addresses. (See #2940.)
So we exclude a few valid email addresses in order to
avoid these more common bad cases.

Closes #2940.
2016-10-23 23:12:36 +02:00
John MacFarlane
1da40d63b1 Merge pull request #3108 from tarleb/part
Add command line option allowing to set type of top-level divisions
2016-10-19 14:07:44 +02:00
Albert Krewinkel
595a171407
Add option for top-level division type
The `--chapters` option is replaced with `--top-level-division` which allows
users to specify the type as which top-level headers should be output. Possible
values are `section` (the default), `chapter`, or `part`.

The formats LaTeX, ConTeXt, and Docbook allow `part` as top-level division, TEI
only allows to set the `type` attribute on `div` containers.  The writers are
altered to respect this option in a sensible way.
2016-10-19 13:12:57 +02:00
Hubert Plociniczak
dd799df0ae Image with a caption needs special formatting
Latex Writer only handles captions if the image's title
is prefixed with 'fig:'.
2016-10-19 11:40:44 +02:00
John MacFarlane
4a2a7a21e5 Merge pull request #3166 from hubertp-lshift/bug/3134
Issue 3143: Don't duplicate text for anchors
2016-10-18 22:03:45 +02:00
John MacFarlane
0cd11b3e54 Merge pull request #3165 from hubertp-lshift/feature/odt-image
[odt] images parser
2016-10-18 22:00:58 +02:00
Hubert Plociniczak
c74c5fdd97 Issue 3143: Don't duplicate text for anchors
When creating an anchor element we were adding its representation
as well as the original content, leading to text duplication.
2016-10-18 10:50:37 +02:00
Albert Krewinkel
1266b210ac
Org writer: drop space before footnote markers
The writer no longer adds an extra space before footnote markers.

Fixes: #3162
2016-10-17 22:11:03 +02:00
Hubert Plociniczak
a02f276ff1 Infer caption from the text following the img
Frame can contain other frames with the text boxes.
This is something that has not been considered before
and meant that the whole construction of images was
broken in those cases. Also the captions were fixed/ignored.
2016-10-17 16:35:13 +02:00
Jesse Rosenthal
f407b66405 RST reader: Add test for space-before-note. 2016-10-17 09:55:18 -04:00
Albert Krewinkel
462c140eb6
Org reader: allow figure with empty caption
A `#+CAPTION` attribute before an image is enough to turn an image into a
figure. This wasn't the case because the `parseFromString` function, which
processes the caption value, would fail on empty values. Adding a newline
character to the caption value fixes this.

Fixes: #3161
2016-10-14 23:16:51 +02:00
Jesse Rosenthal
49b0b67b11 Remove Tests.Arbitrary
Use exported Arbitrary instances from pandoc-types instead.
2016-10-14 09:22:29 -04:00
John MacFarlane
9bd1da122a Merge pull request #3146 from hubertp-lshift/feature/odt-list-start-value
[ODT Parser] Include list's starting value
2016-10-14 15:15:50 +02:00
Hubert Plociniczak
9282fadc6b Added tests and a corner case for starting number
Review revealed that we didn't handle the case
when the starting point is an empty string. While
this is not a valid .odt file, we simply added
a special case to deal with it.

Also added tests for the new feature.
2016-10-14 13:56:24 +02:00
Albert Krewinkel
c9460e7013
Parse line-oriented markup as LineBlock
Markup-features focusing on lines as distinctive part of the markup are read
into `LineBlock` elements. This currently means line blocks in reStructuredText
and Markdown (the latter only if the `line_block` extension is enabled), the
`linegroup`/`line` combination from the Docbook 5.1 working draft, and Org-mode
`VERSE` blocks.
2016-10-13 08:46:44 +02:00
Jesse Rosenthal
4b0dbdc118 Markdown writer: add test for note placement. 2016-10-11 22:26:03 -04:00
John MacFarlane
46d8b42da5 AsciiDoc writer: avoid unnecessary use of "unconstrained" emphasis.
In AsciiDoc, you must use a special form of emphasis (double `__`)
for intraword emphasis.  Pandoc was previously using this more
than necessary.

Closes #3068.
2016-10-02 21:51:34 +02:00
John MacFarlane
e95047ed85 Markdown reader: added bracket syntax for native spans.
See #168.

Text.Pandoc.Options.Extension has a new constructor `Ext_brackted_spans`,
which is enabled by default in pandoc's Markdown.
2016-09-28 12:33:05 +02:00
John MacFarlane
03167bb447 Updated test suite. 2016-09-28 12:01:42 +02:00
John MacFarlane
c6d3eca44e Merge pull request #3093 from wilx/master-figure-placement
LaTeX: Do not set [htbp] figure placement options.
2016-09-28 11:17:48 +02:00
Jesse Rosenthal
581fc0130b LaTeX writer: change braced backtick to \textasciigrave{}
Backticks in verbatim environments are converted to
open-single-quotes. This change makes them appear as backticks. This
corresponds to how we treat `'' in verbatim environments (with
\textquotesingle{}).
2016-09-20 09:44:35 -04:00
Jesse Rosenthal
9c1467e848 Add test for backtick in verbatim. 2016-09-19 16:34:28 -04:00