Commit graph

4752 commits

Author SHA1 Message Date
Albert Krewinkel
f955af58e6 Odt reader: remove dead code
The ODT reader contained a lot of general code useful for working with
arrows. However, many of these utils weren't used and are hence removed.
2017-05-31 19:59:34 +02:00
John MacFarlane
774075c3e2 Added eastAsianLineBreakFilter to Shared.
This used to live in the Markdown reader.
2017-05-30 10:22:48 +02:00
John MacFarlane
5ec384eb60 LaTeX reader: handle escaped & inside table cell.
Closes #3708.
2017-05-29 22:47:04 +02:00
John MacFarlane
230a1b89e8 LaTeX reader: don't crash on empty enumerate environment.
Closes #3707.
2017-05-29 15:09:24 +02:00
John MacFarlane
d461b29d9d Merge pull request #3704 from labdsf/anylinenewline
Markdown reader: use anyLineNewline
2017-05-29 09:25:53 +02:00
Alexander Krotov
efc069de5d Markdown reader: use anyLineNewline
2017-05-28 22:52:35 +03:00
Herwig Stuetz
bfd5c6b172 Org reader: Fix cite parsing behaviour
Until now, org-ref cite keys included special characters also at the
end. This caused problems when citations occur right before colons or
at the end of a sentence.

With this change, all non alphanumeric characters at the end of a cite
key are ignored.

This also adds `,` to the list of special characters that are legal
in cite keys to better mirror the behaviour of org-export.
2017-05-28 18:08:11 +02:00
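The trailing-character rule from the Org cite commit can be illustrated with a minimal sketch. `trimCiteKey` is a hypothetical helper that works on an already extracted string; the actual reader applies the rule inside its parser, and the sketch makes no claim about how that is implemented.

```haskell
import Data.Char (isAlphaNum)
import Data.List (dropWhileEnd)

-- Hypothetical helper (not part of pandoc): drop every non-alphanumeric
-- character from the END of a raw cite key. Special characters inside
-- the key are left untouched.
trimCiteKey :: String -> String
trimCiteKey = dropWhileEnd (not . isAlphaNum)
```

With this sketch, `trimCiteKey "doe2017:"` and `trimCiteKey "doe2017."` both give `"doe2017"`, while an inner special character as in `"doe:2017"` is kept.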
Herwig Stuetz
5a71632d11 Parsing: many1Till: Check for the end condition before parsing
By not checking for the end condition before the first parse, the
parser was applied too often, consuming too much of the input.

This fixes the behaviour of

  `testStringWith (many1Till (oneOf "ab") (string "aa")) "aaa"`

which before incorrectly returned `Right "a"`. With this change, it
instead correctly fails with `Left (PandocParsecError ...)` because it
is not able to parse at least one occurrence of `oneOf "ab"` that is
not `"aa"`.

Note that this only affects `many1Till p end` where `p` matches on a
prefix of `end`.
2017-05-28 18:08:11 +02:00
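The difference described in the `many1Till` commit can be reproduced with plain Parsec. The following is a sketch under assumptions: the names `many1TillOld` and `many1TillNew` are made up for the comparison, and the combinators use `Text.Parsec` directly rather than pandoc's own parser types and helpers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Old behaviour as described in the commit message: one item is parsed
-- before the end condition is ever checked.
many1TillOld :: Parser a -> Parser end -> Parser [a]
many1TillOld p end = do
  first <- p
  rest <- manyTill p end
  return (first : rest)

-- New behaviour: fail immediately if the end condition already matches,
-- then require at least one item.
many1TillNew :: Show end => Parser a -> Parser end -> Parser [a]
many1TillNew p end = do
  notFollowedBy end
  first <- p
  rest <- manyTill p end
  return (first : rest)
```

Running `parse (many1TillOld (oneOf "ab") (try (string "aa"))) "" "aaa"` yields `Right "a"`, whereas the same call with `many1TillNew` yields a `Left` parse error, because the input already starts with the end marker.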
Alexander Krotov
c38d5966ed RST reader: use anyLineNewline in rawListItem (#3702)
2017-05-28 09:29:37 +02:00
John MacFarlane
8614902234 Markdown writer: changes to --reference-links.
With `--reference-location` of `section` or `block`, pandoc
will now repeat references that have been used in earlier
sections.

The Markdown reader has also been modified, so that *exactly*
repeated references do not generate a warning, only
references with the same label but different targets.

The idea is that, with references after every block,
one might want to repeat references sometimes.

Closes #3701.
2017-05-27 23:18:45 +02:00
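The warning rule for repeated references (exact repeats are silent, only a changed target warns) can be sketched as a fold over the definitions. `collectRefs`, `Label` and `Target` are hypothetical names, and keeping the first definition on a conflict is an assumption of this sketch, not something the commit message states.

```haskell
import qualified Data.Map as Map

type Label = String
type Target = String

-- Hypothetical sketch: build a key table from reference definitions and
-- collect the labels that were redefined with a DIFFERENT target.
-- Assumption: on a conflict the first definition is kept.
collectRefs :: [(Label, Target)] -> (Map.Map Label Target, [Label])
collectRefs = foldl step (Map.empty, [])
  where
    step (table, warns) (label, target) =
      case Map.lookup label table of
        Just old
          | old /= target -> (table, warns ++ [label]) -- conflicting repeat: warn
          | otherwise -> (table, warns)                -- exact repeat: silent
        Nothing -> (Map.insert label target table, warns)
```

For the input `[("a","u1"), ("a","u1"), ("a","u2")]` the sketch reports a single warning for label `a`, caused by the third definition only.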
John MacFarlane
4dabcc27f6 Pretty: Eq instance for Doc.
2017-05-27 23:18:45 +02:00
Albert Krewinkel
bf93c07267 Org reader: subject full doc tree to headline transformations
Emacs parses org documents into a tree structure, which is then
post-processed during exporting. The reader is changed to do the same,
turning the document into a single tree of headlines starting at
level 0.

Fixes: #3695
2017-05-27 15:38:08 +02:00
John MacFarlane
8ec03cfc87 HTML writer: Removed unused parameter in dimensionsToAttributeList.
2017-05-26 10:21:55 +02:00
John MacFarlane
cb7b0a6985 Allow em for image height/width in HTML, LaTeX.
- Export `inEm` from ImageSize [API change].
- Change `showFl` and `show` instance for `Dimension` so
  extra decimal places are omitted.
- Added `Em` as a constructor of `Dimension` [API change].
- Allow `em`, `cm`, `in` to pass through without conversion
  in HTML, LaTeX.

Closes #3450.
2017-05-25 22:48:27 +02:00
John MacFarlane
708973a33a Added spaced_reference_links extension.
This is not enabled by default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link:  e.g.

    [a] [b]

    [b]: url

This is now forbidden by default.

Closes #2602.
2017-05-25 12:57:31 +02:00
John MacFarlane
650e1ac1fd Docx writer: Use Table rather than "Table Normal" for table style.
"Table Normal" is the default table style and can't be modified.

Closes #3275, further testing welcome.
2017-05-25 12:11:46 +02:00
John MacFarlane
8f2c803f97 Markdown reader: warn for notes defined but not used.
Closes #1718.

Parsing.ParserState: Make stateNotes' a Map, add stateNoteRefs.
2017-05-25 11:34:51 +02:00
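Keeping the notes in a map and the references in a set makes the "defined but not used" check a set difference. The sketch below assumes simplified types (note contents as plain strings) and a hypothetical function name `unusedNotes`; it does not reproduce the actual `ParserState` fields.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical sketch: 'notes' maps a note label to its (simplified)
-- contents, 'refs' holds every label that was referenced in the text.
-- The result lists the labels that were defined but never referenced.
unusedNotes :: Map.Map String String -> Set.Set String -> [String]
unusedNotes notes refs =
  Set.toAscList (Map.keysSet notes `Set.difference` refs)
```

A caller would emit one warning per label in the result.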
John MacFarlane
41db9e826e MediaWiki reader: don't do curly quotes inside <tt> contexts.
Even if `+smart`.

See #3585.
2017-05-25 09:35:25 +02:00
John MacFarlane
e6f4636a2c MediaWiki reader: Make smart double quotes depend on smart extension.
Closes #3585.
2017-05-25 09:19:34 +02:00
John MacFarlane
b9a30ef959 Markdown reader: fixed smart quotes after emphasis.
E.g. in

    *foo*'s 'foo'

Closes #2228.
2017-05-24 23:23:08 +02:00
John MacFarlane
8f718b0883 LaTeX reader: Fixed failures on \ref{}, \label{} with +raw_tex.
Now these commands are parsed as raw if `+raw_tex`;
otherwise, their argument is parsed as a bracketed string.
2017-05-24 23:04:49 +02:00
John MacFarlane
bc6aac7b47 Parsing: Provide parseFromString'.
This is a version of parseFromString specialized to
ParserState, which resets stateLastStrPos at the end.
This is almost always what we want.

This fixes a bug where `_hi_` wasn't treated as emphasis in
the following, because pandoc got confused about the
position of the last word:

    - [o] _hi_

Closes #3690.
2017-05-24 22:41:47 +02:00
John MacFarlane
1288a50380 LaTeX reader: parse tikzpicture as raw verbatim environment...
if `raw_tex` extension is selected.
Otherwise skip with a warning.

This is better than trying to parse it as text!

Closes #3692.
2017-05-24 21:46:53 +02:00
John MacFarlane
19d3a2bbe5 Logging: Made SkippedContent WARNING not INFO.
2017-05-24 21:46:43 +02:00
John MacFarlane
7174776c19 HTML reader: Add details tag to list of block tags.
Closes #3694.
2017-05-24 12:11:12 +02:00
Marc Schreiber
29a4bdc681 Add suggestions of @jgm: parse bracketed stuff as inlines
2017-05-23 17:31:42 -03:00
John MacFarlane
5844af67b4 RST reader: reformatting (code line length).
2017-05-23 21:00:51 +02:00
keiichiro shikano
c0c54b7906 RST Reader: parse list table directive (#3688)
Closes #3432.
2017-05-23 20:53:04 +02:00
John MacFarlane
8edeaa9349 Fixed handling of soft hyphen (0173) in docx writer.
Closes #3691.
2017-05-23 16:58:24 +02:00
John MacFarlane
66fa38ed1c Shared.isURI: allow uppercase versions of known schemes.
2017-05-23 09:49:56 +02:00
Albert Krewinkel
5debb0da0f Shared: Provide custom isURI that rejects unknown schemes [isURI]
We also export the set of known `schemes`.

The new function replaces the function of the same name
from `Network.URI`, as the latter did not check whether a scheme is
well-known.  E.g. MediaWiki wikis frequently feature pages with names
like `User:John`. These links were interpreted as URIs, thus turning
internal links into global links. This is prevented by also checking
whether the scheme of a URI is frequently used (i.e. is IANA registered
or an otherwise well-known scheme).

Fixes: #2713

Update set of well-known URIs from IANA list
All official IANA schemes (as of 2017-05-22) are included in the set of
known schemes.  The four non-official schemes doi, isbn, javascript, and
pmid are kept.
2017-05-23 09:48:11 +02:00
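The idea behind the scheme check can be shown with a simplified sketch. It is not the function added to `Shared`: `looksLikeKnownURI` and `schemeOf` are hypothetical names, the scheme set is a small illustrative subset, and the sketch only inspects the text before the first colon instead of validating the whole URI. The case-insensitive comparison mirrors the follow-up commit that allows uppercase schemes.

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii, toLower)
import qualified Data.Set as Set

-- Illustrative subset only; the real set covers the IANA-registered
-- schemes plus doi, isbn, javascript and pmid.
knownSchemes :: Set.Set String
knownSchemes = Set.fromList
  ["http", "https", "ftp", "mailto", "file", "doi", "isbn", "javascript", "pmid"]

-- Extract a lower-cased scheme: a letter followed by letters, digits,
-- '+', '-' or '.', terminated by a colon.
schemeOf :: String -> Maybe String
schemeOf s = case break (== ':') s of
  (sch@(c : cs), ':' : _)
    | isAscii c && isAlpha c && all isSchemeChar cs -> Just (map toLower sch)
  _ -> Nothing
  where
    isSchemeChar ch = isAscii ch && (isAlphaNum ch || ch `elem` "+-.")

-- Hypothetical check: accept only strings whose scheme is well known.
looksLikeKnownURI :: String -> Bool
looksLikeKnownURI s = maybe False (`Set.member` knownSchemes) (schemeOf s)
```

Under these assumptions `looksLikeKnownURI "User:John"` is `False`, so a MediaWiki page name stays an internal link, while `looksLikeKnownURI "HTTP://example.com"` is `True`.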
John MacFarlane
4d1e9b8e41 Let --eol take native as an argument.
Add `Native` to the `LineEnding` type.
Make `optEol` a `LineEnding` rather than `Maybe LineEnding`.
2017-05-22 10:15:03 +02:00
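A rough sketch of how a three-way line-ending option can be mapped onto GHC's `System.IO` newline handling is given below. The constructor names follow the commit message; `parseEol`, `toNewline` and `writeFileWithEol` are hypothetical helpers and not pandoc's actual functions.

```haskell
import Data.Char (toLower)
import qualified System.IO as IO

-- Constructor names taken from the commit message.
data LineEnding = LF | CRLF | Native
  deriving (Show, Eq)

-- Hypothetical: interpret the argument of an --eol style option.
parseEol :: String -> Maybe LineEnding
parseEol s = case map toLower s of
  "lf" -> Just LF
  "crlf" -> Just CRLF
  "native" -> Just Native
  _ -> Nothing

-- Map the option onto System.IO's Newline; Native defers to the platform.
toNewline :: LineEnding -> IO.Newline
toNewline LF = IO.LF
toNewline CRLF = IO.CRLF
toNewline Native = IO.nativeNewline

-- Hypothetical writer: '\n' in the contents is translated on output.
writeFileWithEol :: LineEnding -> FilePath -> String -> IO ()
writeFileWithEol eol path contents =
  IO.withFile path IO.WriteMode $ \h -> do
    IO.hSetNewlineMode h
      (IO.NewlineMode { IO.inputNL = IO.LF, IO.outputNL = toNewline eol })
    IO.hPutStr h contents
```

Making the option a plain `LineEnding` with a `Native` case, instead of wrapping it in `Maybe`, means every consumer handles exactly three explicit cases.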
Alexander Krotov
30a3deadcc Move indentWith to Text.Pandoc.Parsing (#3687)
2017-05-22 10:10:15 +02:00
John MacFarlane
aa1e39858d Text.Pandoc.App: ToJSON and FromJSON instances for Opts.
This can be used e.g. to pass options via web interface,
such as trypandoc.
2017-05-21 11:42:50 +02:00
John MacFarlane
8c1b81bbef Finished implementation of --resource-path.
* Default is just working directory.
* Working directory must be explicitly specified if
  `--resource-path` option is used.
2017-05-21 09:02:01 +02:00
John MacFarlane
6a7f980247 PDF: Got --resource-path working with pdf output.
See #852.
2017-05-20 23:46:51 +02:00
John MacFarlane
d109c8be8f PDF: better error message for non-converted svg images.
2017-05-20 23:24:20 +02:00
Alexander Krotov
753d5811e2 RST reader: make use of anyLineNewline (#3686)
2017-05-20 23:14:08 +02:00
Marc Schreiber
03cb05f4c6 Improve SVG image size code.
The old code made some unwise assumptions about
how the svg file would look.

See #3580.
2017-05-20 23:09:08 +02:00
John MacFarlane
5c44fd554f PDF: Refactoring, makePDF is now in PandocIO [API change].
2017-05-20 22:42:50 +02:00
John MacFarlane
fd6e65b00f Added --resource-path=SEARCHPATH command line option.
SEARCHPATH is separated by the usual character,
depending on OS (: on unix, ; on windows).

Note: This does not yet work for PDF output, because the
routine that creates PDFs runs outside PandocMonad.
(This has to do with its use of inTemporaryDirectory and
its interaction with our exceptions.)

The best solution would be to figure out how to move the
PDF creation routines into PandocMonad.  Second-best,
just pass an extra parameter in?

See #852.
2017-05-20 21:47:10 +02:00
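The OS-dependent splitting of SEARCHPATH matches what `splitSearchPath` from the `filepath` package does. The sketch below pins the platform explicitly so that both separators can be compared on any machine; `findInResourcePath` is a hypothetical lookup helper, and whether pandoc resolves resources in exactly this way is an assumption.

```haskell
import System.Directory (doesFileExist)
import System.FilePath ((</>))
import qualified System.FilePath.Posix as Posix
import qualified System.FilePath.Windows as Windows

-- Split an option value using explicit POSIX (':') or Windows (';') rules.
splitPosix, splitWindows :: String -> [FilePath]
splitPosix = Posix.splitSearchPath
splitWindows = Windows.splitSearchPath

-- Hypothetical helper: return the first directory entry under which the
-- named resource exists, trying the directories in order.
findInResourcePath :: [FilePath] -> FilePath -> IO (Maybe FilePath)
findInResourcePath [] _ = return Nothing
findInResourcePath (dir : dirs) name = do
  let candidate = dir </> name
  exists <- doesFileExist candidate
  if exists
    then return (Just candidate)
    else findInResourcePath dirs name
```

Here `splitPosix ".:images"` and `splitWindows ".;images"` both give `[".", "images"]`. Portable code would normally call `System.FilePath.splitSearchPath`, which picks the separator for the current OS.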
John MacFarlane
93eaf33e6e SelfContained: handle @import with quoted string.
2017-05-20 17:32:46 +02:00
John MacFarlane
8d4fbe6a2a SelfContained: fixed problem with embedded fonts.
Closes #3629.

However, there is still room for improvement.

`@import` with following media declaration is not
handled.

Also `@import` with a simple filename (rather than
`url(...)`) is not handled.
2017-05-20 17:09:47 +02:00
John MacFarlane
ca77f0a95e RST writer: add empty comments when needed...
to avoid including a blockquote in the indented content
of a preceding block.

Closes #3675.
2017-05-19 21:05:15 +02:00
Albert Krewinkel
7a09b7b21d Org reader: fix smart parsing behavior
Parsing of smart quotes and special characters can either be enabled via
the `smart` language extension or the `'` and `-` export options. Smart
parsing is active if either the extension or export option is enabled.
Only smart parsing of special characters (like ellipses and en and em
dashes) is enabled by default, while smart quotes are disabled.

This means that all smart parsing features will be enabled by adding the
`smart` language extension. Fine-grained control is possible by leaving
the language extension disabled. In that case, smart parsing is
controlled via the aforementioned export OPTIONS only.

Previously, all smart parsing was disabled unless the language extension
was enabled.
2017-05-18 23:25:11 +02:00
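The rule from the Org smart-parsing commit is a logical "or" of the extension and the per-feature export option. The sketch below encodes that rule with the defaults given in the commit message; the record and function names are assumptions chosen for illustration and may not match the reader's internals.

```haskell
-- Assumed names; the fields model the `'` and `-` export options.
data ExportSettings = ExportSettings
  { exportSmartQuotes :: Bool    -- the `'` export option
  , exportSpecialStrings :: Bool -- the `-` export option
  }

-- Defaults as described: special characters on, smart quotes off.
defaultExportSettings :: ExportSettings
defaultExportSettings = ExportSettings
  { exportSmartQuotes = False
  , exportSpecialStrings = True
  }

-- A feature is active if the `smart` extension OR its export option is on.
-- The first argument says whether the `smart` extension is enabled.
smartQuotesActive :: Bool -> ExportSettings -> Bool
smartQuotesActive smartExt settings = smartExt || exportSmartQuotes settings

specialStringsActive :: Bool -> ExportSettings -> Bool
specialStringsActive smartExt settings = smartExt || exportSpecialStrings settings
```

With the extension disabled and default settings, only special characters are handled; enabling the extension switches both features on regardless of the export options.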
John MacFarlane
f870a2d8ea Don't render LaTeX images with data: URIs.
LaTeX can't handle these.

Note that --extract-media can be used when the input contains
data: URIs.  Closes #3636.
2017-05-18 22:50:07 +02:00
Ian
b9185b0216 Docx writer: Change FigureWithCaption to CaptionedFigure (#3658)
Edit styles.xml as part of the fix for #3656
2017-05-18 22:34:13 +02:00
John MacFarlane
0f6458c0c1 Don't double extract images from docx.
This fixes a regression that was introduced when `--extract-media`
was generalized to work with any input format.  We were getting
two versions of each image extracted from a docx, one with a hash,
one with the original filename, though only the hash one was used.
This patch restores the original behavior (using the original
filename).

Pointed out in comments on #3674. Thanks to @laperouse.
2017-05-18 13:38:19 +02:00
John MacFarlane
818d5c2f35 Markdown: allow attributes in reference links to start on next line.
This addresses a subsidiary issue in #3674.
2017-05-18 13:20:32 +02:00
Stefan Dresselhaus
6b8240fc2f Add --eol flag and writer option to control line endings.
* Add `--eol=crlf|lf` CLI option.
* Add `optEol` to `WriterOptions` [API change]
* In `Text.Pandoc.UTF8`, add new functions parameterized on `Newline`:
  `writeFileWith`, `putStrWith`, `putStrLnWith`, `hPutStrWith`,
  `hPutStrLnWith`. [API change]
* Document option in MANUAL.txt.

Closes #3663.
Closes #2097.
2017-05-18 11:55:45 +02:00