Commit graph

2975 commits

Author SHA1 Message Date
John MacFarlane
a4d28cdd6d Fixed absolute URI detection in EPUB writer. Closes #1672. 2014-10-08 14:54:03 -07:00
Freiric Barral
24231623f3 fix inDirectory to reset to the original directory in case an exception occurs 2014-10-08 23:25:01 +02:00
John MacFarlane
d60707eed0 EPUB writer: Don't add sourceURL to absolute URIs!
Closes #1669.

If there are further issues, please open a new, targeted issue on the
tracker.  Some notes on the further issues you gestured at:

Data URIs are indeed dereferenced, but why is this a problem?
(The function being used to fetch from URLs is used for many different
formats.  Preserving data URIs would make sense in EPUBs, but not
for e.g. PDF output.  And by dereferencing we can get a smaller,
more efficient EPUB, with the data stored as bytes in a file rather
than encoded in textual representation.)

"absolute uris are not recognized" -- I assume that is the problem
just fixed.  If not, please open a new issue.

"relative uris are resolved (wrongly) like file paths" -- can you
give an example?

`<base>` tag is ignored.  Yes. I didn't know about the base tag.  Could
you open a new issue just for this?
2014-10-08 11:52:47 -07:00
Grégory Bataille
8a1a5948be Getting the page width from the reference file
Uses it to scale images that are too large.
When there is no reference file, default to a US letter portrait size
to scale the images
2014-10-05 14:53:06 +02:00
Jason Ronallo
3dc58090d2 add mime type for WebVTT 2014-10-04 22:40:02 -04:00
John MacFarlane
bf00556c72 Added track to list of tags treated by --self-contained.
Closes #1664.
2014-10-04 11:39:08 -07:00
Wikiwide
678aa31561 cref, sep
Adding inlineCommands
2014-10-03 11:33:02 +10:00
John MacFarlane
08ac33815b RST writer: Wrap line blocks with spaces before continuations.
Improves on fix to #1656.
2014-09-30 09:25:54 -07:00
John MacFarlane
29e1c9529f Don't wrap lines in rST line blocks.
Closes #1656.

Fixing pandoc to wrap the lines but insert spaces would be much
more complicated.  This at least makes the output semantically
correct.
2014-09-29 21:48:59 -07:00
John MacFarlane
fe6d43b3e0 Merge pull request #1601 from jkr/windowsfix
Fix path-slashes inside archive for windows
2014-09-27 16:21:17 -07:00
John MacFarlane
9c4e33f085 Merge pull request #1589 from mszep/master
Add function to sanitize ConTeXt labels
2014-09-27 16:20:56 -07:00
Matthew Pickering
5cb475c374 Org Reader: Parse multi-inline terms correctly in definition list
Closes #1649
2014-09-27 22:40:25 +01:00
Artyom
bc115ffc2d Fix 'Ext_lists_without_preceding_blankline' bug.
* Fixes #1636.
* Adds a test.
2014-09-26 13:32:08 +04:00
mpickering
6740a9592a HTML Reader: Recognise <br> tags inside <pre> blocks
Closes #1620
2014-09-25 19:20:12 +01:00
mpickering
1f0ba8ec11 HTML Writer: Don't double render when email-obfuscation=none
Closes #1625
2014-09-25 18:46:36 +01:00
mpickering
515a120d04 Add support for KaTeX HTML math
Closes #1626
2014-09-25 18:32:42 +01:00
mpickering
575c76e36b HTML Writer: MathML now outputted with tex annotation.
Closes #1635
2014-09-25 15:28:50 +01:00
mpickering
cc07d0c6bf Shared: Make collapseFilePath OS-agnostic 2014-09-25 12:42:53 +01:00
mpickering
56e4ecab20 MediaBag: Fixes Windows specific path problems
Changes the internal representation to fix the problem.

I haven't tested this on windows.

Closes #1597
2014-09-25 12:19:52 +01:00
Mark Szepieniec
84b75a1c2a ConTeXt writer: add function toLabel
This function can be used to sanitize reference labels so that
they do not contain any of the illegal characters \#[]",{}%()|= .

Currently only Links have their labels sanitized, because they
are the only Elements that use passed labels.
2014-09-18 23:27:14 +02:00
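A minimal sketch of the kind of sanitizing described in the commit above. The commit only says which characters are illegal; the replacement scheme used here (a `ux` prefix followed by the character's hex code) and the name `sanitizeLabel` are assumptions for illustration, not necessarily what `toLabel` does.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- | Hypothetical stand-in for toLabel: replace each illegal character
-- with "ux" plus its code point in hex (the replacement scheme is an
-- assumption), and leave every other character untouched.
sanitizeLabel :: String -> String
sanitizeLabel = concatMap go
  where
    illegal = "\\#[]\",{}%()|="
    go c
      | c `elem` illegal = "ux" ++ showHex (ord c) ""
      | otherwise        = [c]
```

With this sketch, `sanitizeLabel "a#b"` yields `"aux23b"`, while a label without illegal characters is returned unchanged.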
Jesse Rosenthal
020a527c15 Docx writer: Renumber header and footer relationships to avoid collisions.
We previously took the old relationship names of the headers and footer in
secptr. That led to collisions. We now make a map of available names in the
relationships file, and then rename in secptr.
2014-09-11 15:11:44 -04:00
Jesse Rosenthal
1326a41780 LaTeX writer: Protect graphics in headers.
Graphics in `\section`/`\subsection` etc titles need to be `\protect`ed.

This adds a state value and manually turns it on before every invocation
of `sectionHeader` and manually turns it off after. Using a writer value
and applying `local` would probably be cleaner, but this fits with the
current style.
2014-09-09 10:56:37 -04:00
Jesse Rosenthal
132814aeb6 Docx Reader: Remove header class properly in other langs
When we encounter one of the polyglot header styles, we want to remove
that from the par styles after we convert to a header. To do that, we
have to keep track of the style name, and remove it appropriately.
2014-09-06 07:53:29 -04:00
Jesse Rosenthal
71452946d9 Docx reader: Use polyglot header list.
We're just keeping a list of header formats that different languages use
as their default styles. At the moment, we have English, German, Danish,
and French. We can continue to add to this.

This is simpler than parsing the styles file, and perhaps less
error-prone, since there seems to be some variations, even within a
language, of how a style file will define headers.
2014-09-05 21:59:58 -04:00
Jesse Rosenthal
13fefd7959 Docx Reader: Start list of polyglot section headers. 2014-09-05 17:31:24 -04:00
Jesse Rosenthal
73b887e2df Org reader: Added state changing blanklines.
This allows us to emphasize at the beginning of a new paragraph (or, in
general, after blank lines).
2014-09-04 19:55:53 -04:00
Jesse Rosenthal
ac8ed1fa93 Docx reader: Rewrite rewriteLink to work with new headers.
There could be new top-level headers after making lists, so we have to
rewrite links after that.
2014-09-04 16:44:21 -04:00
Jesse Rosenthal
7fe54505df Docx reader: Single-item headers in ordered lists are headers.
When users number their headers, Word understands that as a single item
enumerated list. We make the assumption that such a list is, in fact, a header.
2014-09-04 16:35:57 -04:00
Jesse Rosenthal
4ef850ded5 Docx reader: Fix Windows path for image lookup.
Don't use os-sensitive "combine", since we always want the paths in our
zip-archive to use forward-slashes.
2014-09-02 13:45:01 -04:00
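One way to get separator behavior that does not depend on the host OS is to use the POSIX variant of the `filepath` functions explicitly. This is a sketch under that assumption; the helper name `archivePath` is hypothetical and the commit itself may simply build the string by hand.

```haskell
import qualified System.FilePath.Posix as Posix

-- | Join a directory and a relative target inside a zip archive.
-- Posix.combine always uses '/', whatever OS the program runs on.
archivePath :: FilePath -> FilePath -> FilePath
archivePath dir target = Posix.combine dir target
```

For example, `archivePath "word" "media/image1.png"` gives `"word/media/image1.png"` on Windows as well as on Unix.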
John MacFarlane
db90667a79 EPUB writer: Don't include nav node in spine unless --toc was requested.
Previously we included it in the spine with `linear="no"`, leading
to odd results in some readers.

Closes #1593.
2014-09-01 16:31:32 -07:00
John MacFarlane
cb1a8da01c LaTeX writer: Avoid using reserved characters as \lstinline delimiters.
Closes #1595.
2014-09-01 10:11:09 -07:00
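The idea behind the commit above can be sketched as picking a delimiter that does not occur in the code being typeset. The candidate list and the names `pickDelimiter` and `lstinline` are assumptions chosen for illustration; the writer's actual list of safe characters may differ.

```haskell
import Data.Maybe (listToMaybe)

-- | Candidate delimiters. This particular list is an assumption; it only
-- avoids characters that are special in LaTeX, such as # $ % & ~ _ ^ \ { }.
candidates :: String
candidates = "!|@+/:"

-- | First candidate that does not appear in the code, if there is one.
pickDelimiter :: String -> Maybe Char
pickDelimiter code = listToMaybe [c | c <- candidates, c `notElem` code]

-- | Wrap code in \lstinline using a delimiter absent from the code.
lstinline :: String -> Maybe String
lstinline code = do
  d <- pickDelimiter code
  pure ("\\lstinline" ++ [d] ++ code ++ [d])
```

Returning `Maybe` makes the caller decide what to do when every candidate already occurs in the code, for instance falling back to a different command.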
John MacFarlane
43ebb0229f EPUB writer: Fixed typo. 2014-09-01 08:04:21 -07:00
John MacFarlane
3533218d6d Merge pull request #1594 from jkr/itemFix
Item fix
2014-08-31 19:31:38 -07:00
John MacFarlane
01b7957812 EPUB writer: Extract title even from structured title.
Added docTitle'.
2014-08-31 14:47:07 -07:00
Jesse Rosenthal
d3053807a8 LaTeX writer: Put ~ before header in item text.
Because of the built-in line skip, LaTeX can't handle a section header
as the first element in a list item. (To be precise, it can't handle it
if the list immediately follows a section header, but the instance is
rare enough that we can afford to be a bit more general). This puts a
non-breaking space before the header to solve this problem. We won't see
this space, since the header skips a line before printing anyway.

The output is ugly in LaTeX and this structure seems like it should
probably be avoided. But it is valid HTML and native pandoc, so we
should have some sort of typesettable representation in LaTeX.
2014-08-31 16:05:09 -04:00
John MacFarlane
598d3ee23b Markdown reader: better handling of paragraph in div.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.

Test case:

    <div class="first">
    This is a paragraph.

    This is another paragraph.
    </div>

Closes #1591.
2014-08-31 12:55:47 -07:00
John MacFarlane
54df49335a EPUB writer: Don't use opf:title-type for epub2.
It is not supported and epubcheck complains.
2014-08-31 12:08:17 -07:00
John MacFarlane
611bc27862 Shared: Moved import of toChunks outside of conditional.
Closes #1590.
2014-08-31 11:17:53 -07:00
John MacFarlane
374bb3c147 DokuWiki writer: Make tables prettier by aligning columns.
Also cleaned up crufty code and added tests.
2014-08-30 21:24:33 -07:00
John MacFarlane
d97aed3903 DokuWiki writer: Handle table cell alignments.
Closes #1566.
2014-08-30 20:54:33 -07:00
John MacFarlane
a8273009ba Textile reader: Improved table support.
We can now handle all different alignment types, for simple
tables only (no captions, no relative widths, cell contents just
plain inlines).  Other tables are still handled using raw HTML.

Addresses #1585 as far as it can be addressed, I believe.
2014-08-30 20:34:42 -07:00
John MacFarlane
b50927527c PDF: Catch errors in conversion of images and display message.
See #1582.
2014-08-30 18:45:58 -07:00
John MacFarlane
f70e3c3297 Merge branch 'mime' of https://github.com/Aelve/John into Aelve-mime
Conflicts:
	src/Text/Pandoc/Writers/Docx.hs
2014-08-30 11:49:50 -07:00
John MacFarlane
8b09d954f9 Merge pull request #1580 from jkr/stringCellDokuWiki
DokuWiki writer: Backslash newlines in table cells
2014-08-30 09:22:38 -07:00
Jesse Rosenthal
ccda2a902c DokuWiki writer: Use backslash newlines in table cells.
Write out strings in table cells with backslash linebreaks in place of
newlines. We also want to remove the first two spaces of an indent in lists.
2014-08-30 07:17:55 -04:00
John MacFarlane
218633548f Merge pull request #1574 from jlduran/latex-horizontal-rule
LaTeX writer: Make Horizontal Rules more flexible
2014-08-29 21:40:55 -07:00
John MacFarlane
017d44af1d Merge branch 'ugly-tables' of https://github.com/jlduran/pandoc into jlduran-ugly-tables 2014-08-29 21:24:36 -07:00
Jose Luis Duran
4c684561ee LaTeX writer: Add \strut to fix multiline tables
See: http://tex.stackexchange.com/questions/34971
2014-08-29 13:54:08 +00:00
Jesse Rosenthal
c931be24e1 Docx Reader: Read single para in table cell as plain
This makes the docx reader's native output fit with the way the markdown
reader understands its markdown output. Ie, as far as table cells go:

docx -> native == docx -> native -> markdown -> native

(This identity isn't true for other things outside of table cells, of
course).
2014-08-28 14:35:33 -04:00
Jose Luis Duran
1fc665c07d LaTeX writer: Make Horizontal Rules more flexible
Currently, pandoc has hard-coded the following in order to make horizontal
rules in LaTeX:

```hs
"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
```

Which is fine, but does not allow customizations.  It also does not take into
consideration the current line width.

I'm proposing this change:

```diff
@@ In Writers/LaTeX.hs:
-"\\begin{center}\\rule{3in}{0.4pt}\\end{center}"
+"\\begin{center}\\rule{0.5\\linewidth}{\\linethickness}\\end{center}"
```
2014-08-28 03:12:37 +00:00
Jose Luis Duran
f1d330b7b5 LaTeX writer: Fix tables
- [x] Fix a bug introduced in 66378062b6, which
  causes the table caption to repeat across all pages
- [x] Address the issues discussed
  [here](https://groups.google.com/forum/#!msg/pandoc-discuss/qMu6_5lYy0o/ZAU7lzAIKw0J)
  regarding the extra vertical space.
  - [ ] NOTE: This will cause multiline table cells to appear unpadded. See
    http://tex.stackexchange.com/questions/34971
  - [x] Use [`\tabularnewline`](http://tex.stackexchange.com/questions/78796)
    instead of `\\`.
2014-08-28 02:02:20 +00:00
Matthew Pickering
404a58f456 DokuWiki Writer: Refactor to use Reader monad 2014-08-27 14:29:09 +01:00
Matthew Pickering
495f55b03e DokuWiki Writer: Hlint cleanup 2014-08-27 13:48:19 +01:00
Matthew Pickering
3412018287 DokuWiki Writer: Qualified all imports 2014-08-27 13:29:09 +01:00
John MacFarlane
8f2aa45d69 Merge pull request #1564 from jkr/trackChangesWriter
Docx writer: write track changes.
2014-08-26 22:22:16 -07:00
Calvin Beck
f813755c55 Fixed exampleLine parser to accept example lines which have indentation at the start of the line. 2014-08-26 21:56:40 -06:00
Jesse Rosenthal
b613d85af9 Docx writer: Accommodate GHC 7.4 (no lookupEnv) 2014-08-26 06:43:14 -04:00
Jesse Rosenthal
21253b59e8 Docx writer: Default to user login and time of change if not given. 2014-08-25 14:03:35 -04:00
Jesse Rosenthal
e1bb28a388 Docx writer: Implement track changes.
These have default authors and dates of "unknown" and timestamp-zero,
respectively.
2014-08-25 12:37:56 -04:00
John MacFarlane
9f8051d95d Hlint changes to Docx writer. 2014-08-24 11:37:23 -07:00
John MacFarlane
0ef1f787c7 Docx writer: Bibliography entries get Bibliography style.
Closes #1559.
2014-08-23 20:52:09 -07:00
John MacFarlane
2956ef251c Fixed --self-contained with Windows paths.
Previously C:\foo.js was being wrongly interpreted as a URI.
Closes #1558.
2014-08-22 23:21:57 -07:00
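A path such as `C:\foo.js` parses as a URI with the one-letter scheme `C`, which is why it was misread. The check below is one possible guard, not necessarily the fix that was committed; the name `looksLikeDrivePath` is hypothetical.

```haskell
import Data.Char (isAlpha, isAscii)

-- | True for strings that start like a Windows drive path ("C:\" or "C:/").
-- Such strings would otherwise look like URIs with a single-letter scheme.
looksLikeDrivePath :: String -> Bool
looksLikeDrivePath (drive : ':' : sep : _) =
  isAscii drive && isAlpha drive && sep `elem` "\\/"
looksLikeDrivePath _ = False
```

A caller could test `looksLikeDrivePath` first and only then try to parse the string as a URI.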
mpickering
aa808055f0 Txt2Tags Reader: Fixed crash when reading from stdin 2014-08-21 17:11:21 +01:00
mpickering
3b6d7afa71 Txt2Tags Reader: Corrected formatting of %%mtime macro 2014-08-21 17:11:16 +01:00
mpickering
2a7319541d Txt2Tags Reader: Parse Meta information
The header is now parsed as meta information. The first line is the
`title`, the second is the `author` and third line is the `date`.
2014-08-21 17:09:40 +01:00
mpickering
2cd049a1bf Txt2Tags reader: Header is now parsed only if standalone flag is set 2014-08-20 18:11:37 +01:00
John MacFarlane
60f3e777f3 EPUB writer: don't use page-progression-direction in EPUB2.
Also, if page-progression-direction not specified in metadata,
don't include the attribute even in EPUB3; not including it is
the same as including it with the value "default", as we did before.

Closes #1550.
2014-08-19 09:21:26 -07:00
John MacFarlane
716ad5fd8a Merge pull request #1547 from jkr/styleparse
Docx reader: parsing styles
2014-08-18 14:14:01 -07:00
John MacFarlane
6dce8c6760 HTML reader: improved handling of tags that can be block or inline.
Previously a section like this would be enclosed in a paragraph,
with RawInline for the video tags (since video is a tag that can
be either block or inline):

    <video controls="controls">
       <source src="../videos/test.mp4" type="video/mp4" />
       <source src="../videos/test.webm" type="video/webm" />
       <p>
          The videos can not be played back on your system.<br/>
          Try viewing on Youtube (requires Internet connection):
          <a href="http://youtu.be/etE5urBps_w">Relative Velocity on
    Youtube</a>.
       </p>
    </video>

This change will cause the video and source tags to be parsed
as RawBlock instead, giving better output.

The general change is this:  when we're parsing a "plain" sequence
of inlines, we don't parse anything that COULD be a block-level tag.
2014-08-18 12:41:09 -07:00
Jesse Rosenthal
4b38e9f1f0 Docx reader: whitespace fix. 2014-08-17 20:11:50 -04:00
Jesse Rosenthal
198aea190f Docx reader: remove emph styles and strong styles list.
We no longer need the explicit lists since we're deriving them from the
ground up.
2014-08-17 17:04:55 -04:00
Jesse Rosenthal
9da7b0946e Docx reader: Add "Hyperlink" to blacklisted styles.
This is the only one so far. We'll add others as they show up.
2014-08-17 17:04:14 -04:00
Jesse Rosenthal
15ce28b8ca Docx reader: Use style resolver.
We now no longer check against explicit styles.
2014-08-17 17:03:44 -04:00
Jesse Rosenthal
03d5d8e596 Docx Reader: Introduce function for resolving dependent run styles.
We always favor an explicit positive or negative in a style in a
descendent, and only turn to the ancestor if nothing is set.

We also introduce an (empty) list of styles that are black-listed. We
won't check them. (Think underlines in hyperlinks).
2014-08-17 16:54:11 -04:00
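The resolution rule in the commit above (an explicit on or off in a descendant wins, and the ancestor is consulted only when nothing is set) can be sketched with `Maybe Bool` fields and a parent pointer. The record and function names are simplified stand-ins, not the reader's real types.

```haskell
-- | Simplified, hypothetical character style: each property is
-- Just True (on), Just False (explicitly off) or Nothing (not set).
data CharStyle = CharStyle
  { styleName   :: String
  , styleBold   :: Maybe Bool
  , styleParent :: Maybe CharStyle  -- each style points to its parent
  }

-- | Resolve bold by walking up the parent chain until a style sets it.
resolveBold :: CharStyle -> Maybe Bool
resolveBold sty =
  case styleBold sty of
    Just b  -> Just b                          -- explicit value wins
    Nothing -> styleParent sty >>= resolveBold -- otherwise ask the ancestor
```

Because an explicit `Just False` stops the walk, a child style can switch off a property that its parent switches on.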
John MacFarlane
8e60d35d58 Merge pull request #1536 from considerate/master
Add row width to tables in Docx XML
2014-08-17 12:54:38 -07:00
John MacFarlane
b6103eeb83 Merge pull request #1543 from jkr/superSubVert
Docx reader: Change behavior of Super/Subscript
2014-08-17 12:54:10 -07:00
John MacFarlane
c14088ea93 Docx writer: Fixed regression, bungled list numbering.
In pandoc 1.13, all lists come out as basic ordered lists.
This fixes that bad regression.

Closes #1544.
2014-08-17 12:48:41 -07:00
Jesse Rosenthal
99491f0d98 Docx Parse: build a bottom-up style tree.
Two points here: (1) We're going bottom-up, from styles not based on
anything, to avoid circular dependencies or any other sort of
maliciousness/incompetence. And (2) each style points to its
parent. That way, we don't need the whole tree to pass a style over to
Docx.hs
2014-08-17 15:46:17 -04:00
Artyom Kazak
357172f13a Remove an unnecessary import. 2014-08-17 23:44:02 +04:00
Artyom Kazak
6a34cd3ddf Update Reader.EPUB to use MimeType. 2014-08-17 21:00:55 +04:00
Artyom Kazak
cca9e8feb4 MIME cleanup.
* Create a type synonym for MIME type (instead of `String`).
  * Add `getMimeTypeDef` function.
  * Avoid recreating MIME type `Map`s every time.
  * Move “Formula-...” case handling into `getMimeType`.
2014-08-17 21:00:50 +04:00
Jesse Rosenthal
b8f1658c36 Alias string and runStyle to CharStyle type. 2014-08-17 11:30:22 -04:00
Jesse Rosenthal
c4871ac790 Docx Style parser: Basic one now just takes a parent style.
This will make it easier to build the style map from the bottom up (to
avoid any infinite references).
2014-08-17 10:19:48 -04:00
Jesse Rosenthal
75eec0a6b8 Docx reader: work with new rStyle.
Just discards info at the moment, so at least it works the same.
2014-08-17 09:22:25 -04:00
Jesse Rosenthal
ea85a797c2 Parser: Framework for parsing styles.
We want to be able to read user-defined styles. Eventually we'll be able
to figure out styles in terms of inheritance as well. The actual
cascading will happen in the docx reader.
2014-08-17 09:22:21 -04:00
Jesse Rosenthal
dc5b0ba09b Docx reader: Change behavior of Super/Subscript
In docx, super- and subscript are attributes of Vertalign. It makes more
sense to follow this, and have different possible values of Vertalign in
runStyle. This is mainly a preparatory step for real style parsing,
since it can distinguish between vertical align being explicitly turned
off and it not being set.

In addition, it makes parsing a bit clearer, and makes sure we don't do
docx-impossible things like being simultaneously super and sub.
2014-08-17 08:20:00 -04:00
John MacFarlane
9d52ecdd42 HTML reader: Parse appropriately styled span as SmallCaps. 2014-08-16 22:57:00 -07:00
Viktor Kronvall
753e47194c Simplify row width calculation. 2014-08-17 03:24:44 +02:00
Christoffer Ackelman
9f3c34841b Include row width in table rows.
Added a property to all table rows where the sum of column widths
is specified in pct (fraction of 5000).
2014-08-17 03:24:44 +02:00
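As a worked example of the unit: in OOXML, `pct` widths are counted in fiftieths of a percent, so 5000 means 100%. The sketch below assumes column widths arrive as fractions of the page width (for example 0.25); the function name `rowWidthPct` is hypothetical.

```haskell
-- | Sum of column widths in fiftieths of a percent (5000 = full width).
-- Assumes each input is a fraction of the page width between 0 and 1.
rowWidthPct :: [Double] -> Int
rowWidthPct = sum . map (\w -> round (w * 5000))
```

Three columns of widths 0.25, 0.25 and 0.5 give 1250 + 1250 + 2500 = 5000, a full-width row.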
John MacFarlane
cb4ae6112e Markdown writer: don't escape $, ^, ~ when extensions are deactivated.
`tex_math_dollars`, `superscript`, and `subscript` extensions,
respectively.

Closes #1127.
2014-08-16 17:14:51 -07:00
Jesse Rosenthal
9bb0b99981 Docx reader: Remove unnecessary plural functions
functions like runElemsToInlines and parPartsToInlines are just defined
in terms of concatting and mapping their singular
version (e.g. `runElemToInlines`). Having two functions with almost
identical names makes it easier to introduce errors. It's easy enough to
just concat and map inline, and it makes it clearer what is going on in
the code.
2014-08-16 15:07:41 -04:00
Jesse Rosenthal
9969b2ebee Docx reader: Fix bug in character styles.
Style handling has been cleaned up, but introduced a bug here. There
wasn't previously a test to catch it.
2014-08-16 14:05:19 -04:00
Jesse Rosenthal
0ff9ec2f4e Rewrite Docx.hs and Reducible to use Builder.
The big news here is a rewrite of Docx to use the builder
functions. As opposed to previous attempts, we now see a significant
speedup -- times are cut in half (or more) in a few informal tests.

Reducible has also been rewritten. It can doubtless be simplified and
clarified further. We can consider this, at the moment, a reference for
correct behavior.
2014-08-16 10:22:55 -04:00
John MacFarlane
8bf39cf6d6 Markdown reader: Better handle quote characters in inline links.
This was previously failing to be recognized as a link:

    [Test](http://en.wikipedia.org/wiki/Ward's_method)

Closes #1534.
2014-08-14 10:59:27 -07:00
John MacFarlane
e917bcc124 Make raw_tex extension non-default for textile reader, writer.
Enable `raw_tex` extension in textile writer.

Closes #1532.
2014-08-14 09:49:31 -07:00
John MacFarlane
52a9ccce4f Merge pull request #1531 from jkr/morefonts
Docx reader: Interpret "Strong" and "Emphasis" run styles.
2014-08-13 19:47:42 -07:00
John MacFarlane
17b2fd567b Fixed haddock comment. 2014-08-13 13:59:50 -07:00
John MacFarlane
05b7fd8dee Removed unneeded import. 2014-08-13 11:35:09 -07:00
Jesse Rosenthal
6897905602 Docx reader: Interpret "Strong" and Emphasis run styles. 2014-08-13 12:23:03 -04:00
John MacFarlane
22ab3367c6 Removed unneeded CPP. 2014-08-12 22:50:51 -07:00
Jesse Rosenthal
a1320a76f9 Docx: Reducible forgot about smallcaps 2014-08-13 00:09:40 -04:00
Jesse Rosenthal
dca55630e6 Docx Reader: Trim line breaks from the beginning and end of Section
Headers.

We might also want to do this elsewhere (for pars, for example).
2014-08-12 23:42:01 -04:00
Jesse Rosenthal
378a795eaa Docx: More robust handling of multiple bookmarks in header. 2014-08-12 23:41:57 -04:00
Jesse Rosenthal
85579052b5 Docx reader: Check for null-id'd anchors too.
Otherwise they get left dangling in the document.
2014-08-12 23:33:03 -04:00
Jesse Rosenthal
194ed88852 Docx reader: accept explicit "Italic" and "Bold" rStyles.
Note that "Italic" can be on, and, from the last commit, `<w:i>` can be
present, but be turned off. In that case, the turned-off tag takes
precedence. So, we have to distinguish between something being off and
something not being there. Hence, isItalic, isBold, isStrike, and
isSmallCaps have become Maybes.
2014-08-12 22:39:18 -04:00
Jesse Rosenthal
aae71ad595 Docx reader: Add "BlockQuotation" to divs list. 2014-08-12 22:08:30 -04:00
Jesse Rosenthal
d4748038d7 Docx Reader: Fix font style parsing.
Before we just checked for the existence of a tag. Now, we make sure to
check for its on/off value.
2014-08-12 22:04:07 -04:00
John MacFarlane
e883ef4eb9 Merge pull request #1527 from mpickering/juicypixels
Attempts to convert gif, tiff and bmp to png in pdf writer
2014-08-12 16:57:22 -07:00
John MacFarlane
7684b24959 Merge pull request #1528 from mpickering/epubtitlepage
EPUB Reader: Ignores titlepage attribute
2014-08-12 16:56:38 -07:00
Matthew Pickering
2b31df32de LaTeX Writer: Added missing closing braces to hyperdef commands 2014-08-13 00:37:18 +01:00
Matthew Pickering
57bebe26df PDF Writer: Attempts to convert images to pdf renderable formats
Now depends on the JuicyPixels library.

Will attempt to convert an image (gif, tiff, bmp) to png when converting
to pdf.
2014-08-13 00:37:18 +01:00
John MacFarlane
81157c7cc6 HTML writer: use 'uri' or 'email' class for autolinks.
This allows them to be styled specially.

Closes #1501.
2014-08-12 15:49:43 -07:00
John MacFarlane
da507dcb84 ConTeXt writer: improved autolink detection.
It previously failed in some cases with escaped special characters.
2014-08-12 15:49:20 -07:00
Matthew Pickering
34cf016251 EPUB Reader: Ignore title pages 2014-08-12 23:03:24 +01:00
John MacFarlane
f16dd1bfdf DocBook: Support equations with mathml.
equation, informalequation, inlineequation and mml:math elements.
2014-08-12 14:00:46 -07:00
John MacFarlane
d6f0973128 Merge pull request #1524 from jkr/dropCap3
Docx reader: move dropcap combining logic to Reducible
2014-08-12 11:13:27 -07:00
John MacFarlane
f97ec6db2c Markdown reader: Improved parsing of indented code in list items.
Indented code at the beginning of a list item must be indented eight
spaces from the margin (or from the edge of the container), or four
spaces past the list marker, whichever is farther.

Some examples in `tests/markdown-reader-more.txt`.
2014-08-12 11:10:48 -07:00
John MacFarlane
ab75e1d3bd Beamer: Use \footnote<.->{..} for notes.
This ensures that the footnotes will not appear before the
overlays in which their corresponding note markers appear.

Closes #1525.
2014-08-12 10:56:57 -07:00
Jesse Rosenthal
9d0b390d48 Docx reader: move combining logic to Reducible
Introduces a new function in Reducibles, concatR.  The idea is that if we
have two lists of Reducibles (blocks or inlines), we can combine them and
just perform the reduction on the joining parts (the last element of the
first list, the first element of the second list). This is useful in cases
where the two lists are already reduced, and we're only worried about the
joining elements.

This actually improves the efficiency a bit further, because concatR can be
smart about empty lists.
2014-08-12 10:26:49 -04:00
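The joining idea can be sketched as follows. This is an illustration only: the combining step is modelled as a function returning `Maybe`, whereas the real Reducible class has its own reduction operation, and the names `concatWith`, `Inline` and `mergeInline` are hypothetical.

```haskell
-- | Concatenate two already-reduced lists, reducing only at the seam:
-- the last element of the first list and the first element of the second.
concatWith :: (a -> a -> Maybe a) -> [a] -> [a] -> [a]
concatWith _ [] ys = ys  -- nothing to join, so no work is done
concatWith _ xs [] = xs
concatWith combine xs (y : ys) =
  case combine (last xs) y of
    Just merged -> init xs ++ (merged : ys)
    Nothing     -> xs ++ (y : ys)

-- | Toy inline type, used only to demonstrate the seam reduction.
data Inline = Str String | Emph String
  deriving (Eq, Show)

-- | Adjacent emphasized runs merge; anything else stays separate.
mergeInline :: Inline -> Inline -> Maybe Inline
mergeInline (Emph a) (Emph b) = Just (Emph (a ++ b))
mergeInline _ _               = Nothing
```

Here `concatWith mergeInline [Str "a", Emph "b"] [Emph "c", Str "d"]` evaluates to `[Str "a", Emph "bc", Str "d"]`; only the two elements at the seam are inspected.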
Jesse Rosenthal
e4a8e4a636 Docx reader: Make dropcap combining more efficient.
Before, we had to run reduceList on the whole combined paragraph, which
was redundant, and could take some time for long paragraphs. We only
need to combine the drop cap with the first inline of the next
paragraph.
2014-08-12 09:00:53 -04:00
Jesse Rosenthal
45ec035e93 Docx reader: combine inlines properly in dropcaps.
Make sure that adjacent inlines are combined properly in dropcaps. This
updates the test results as well.
2014-08-11 23:31:16 -04:00
Jesse Rosenthal
3e32cd5bb1 Docx reader: Use dropcap state.
If we get to a dropcap, we hold on to the inlines until the next
paragraph, and combine it there.
2014-08-11 23:08:33 -04:00
Jesse Rosenthal
bca74a2bd0 Add dropCap to paragraph style. 2014-08-11 21:42:02 -04:00
John MacFarlane
86d4da994a EPUB reader: use walk instead of bottomUp.
This should be more efficient.
2014-08-11 14:48:42 -07:00
John MacFarlane
31811657fa Merge pull request #1521 from jkr/emptyEmph
Discard empty formatters
2014-08-11 14:44:08 -07:00
John MacFarlane
211fe266e0 LaTeX writer: Don't produce \label{} for Div or Span.
Just `\hyperdef`.
A slight amendment to #1519.
2014-08-11 12:20:44 -07:00
John MacFarlane
95d9b43b42 Merge pull request #1519 from mpickering/more
EPUB Normalisation and anchors for div blocks in tex
2014-08-11 11:28:11 -07:00
John MacFarlane
6fae136cbb Textile reader: list and HTML block parsing improvements.
Closes #1513.

Lists can now start without an intervening blank line.
Also, html block-level tags that don't start a line are parsed
as RawInline and don't interrupt paragraphs, as in RedCloth.
2014-08-11 11:22:39 -07:00
John MacFarlane
4a535211d8 Merge pull request #1365 from gbataille/docx-margin
Scale images to fit the page for DOCX
2014-08-11 10:44:52 -07:00
Jesse Rosenthal
0411fe7ccf Docx reader: handle empty reducibles. 2014-08-11 12:48:16 -04:00
Matthew Pickering
1952dd0592 TeX Writer: Write hyperdef and label for identifiers on Div blocks 2014-08-11 16:23:05 +01:00
Matthew Pickering
72b1470713 EPUB Reader: Fixed another normalisation problem.. 2014-08-11 16:23:05 +01:00
John MacFarlane
e690fe4a3e Merge pull request #1516 from mpickering/epubmetadata
EPUB improvements
2014-08-11 08:14:54 -07:00
Matthew Pickering
973ed469de Docx Parse: Improved font recognition when specified in rFonts element 2014-08-11 10:30:32 -04:00
Matthew Pickering
427466f80c Docx Fonts: Derives Show and Eq 2014-08-11 10:30:32 -04:00
Matthew Pickering
e02360d3d8 EPUB Reader: Can now parse multiple meta data fields 2014-08-11 13:12:42 +01:00
Matthew Pickering
285d56dea7 EPUB Writer: Added page-progression-direction meta field 2014-08-11 11:21:38 +01:00
Matthew Pickering
9eded27e32 EPUB reader: Fixed bug where filepaths weren't sufficiently normalised 2014-08-11 11:20:33 +01:00
Matthew Pickering
1f02ff60ba EPUB Writer: Added explicit imports 2014-08-11 10:21:52 +01:00
John MacFarlane
65b31e0cac Merge pull request #1510 from jkr/spacefix
Docx reader: Fix spacing issue.
2014-08-10 07:11:59 -07:00
John MacFarlane
7ec8dd956f Removed OMath module, depend on texmath >= 0.8. 2014-08-10 06:19:41 -07:00
Jesse Rosenthal
c15978ce5e Change head/tail to pattern guards. 2014-08-10 09:10:34 -04:00
Jesse Rosenthal
a02ce74acf Docx reader: Fix spacing issue.
Previously spaces at the beginning of Emph/Strong/etc were kept
inside. This makes sure they are moved out.
2014-08-09 23:35:09 -04:00
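A small sketch of moving leading spaces out of a formatting wrapper, using a toy inline type. The type and the name `hoistLeadingSpaces` are assumptions for illustration; the reader works on pandoc's own inline types and handles more wrappers than `Emph`.

```haskell
-- | Toy inline type, standing in for pandoc's real one.
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- | Move spaces at the start of an Emph in front of it, and drop the
-- Emph entirely if nothing but spaces was inside.
hoistLeadingSpaces :: Inline -> [Inline]
hoistLeadingSpaces (Emph inlines) =
  let (leading, rest) = span (== Space) inlines
  in leading ++ [Emph rest | not (null rest)]
hoistLeadingSpaces inline = [inline]
```

So `Emph [Space, Str "a"]` becomes `[Space, Emph [Str "a"]]`, which keeps the space from being emphasized.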
Matthew Pickering
3bb19307f6 Docx Parse: Recognises code points in sym elements which are in the private range 2014-08-09 22:37:12 -04:00
Matthew Pickering
edc57f77fc Added Text.Pandoc.Readers.Docx.Fonts 2014-08-09 22:37:12 -04:00
Matthew Pickering
2deaa7096f Docx Reader: Added recognition of sym element in paragraphs 2014-08-09 22:37:12 -04:00
Matthew Pickering
4ae61bdf8f EPUB: Fixed another mediabag related regression.. 2014-08-10 00:12:09 +01:00
Matthew Pickering
a6648e5a73 EPUB Reader: Changed image paths to be relative to manifest file 2014-08-09 23:06:16 +01:00
John MacFarlane
4983083079 HTML writer: Don't include empty TOC items for slide shows.
Previously creating a slide with a horizontal rule would result
in an empty list item in the TOC.  This patch fixes that.
2014-08-09 10:29:39 -07:00
John MacFarlane
bc06ef0edb Merge branch 'newbranch' of https://github.com/mpickering/pandoc into mpickering-newbranch
Conflicts:
	src/Text/Pandoc/Readers/EPUB.hs
2014-08-08 22:22:55 -07:00
John MacFarlane
19daf6cf0a Added native_divs and native_spans extensions.
This allows users to turn off the default pandoc behavior of
parsing contents of div and span tags in markdown and HTML
as native pandoc Div blocks and Span inlines.

Setting of default epub extensions has been moved from the EPUB
reader to Text.Pandoc.
2014-08-08 21:05:34 -07:00
John MacFarlane
a4a6b6f28c Plain writer: Use ALL CAPS for level 1 headers. 2014-08-08 15:20:29 -07:00
Matthew Pickering
cfd8c0214c EPUB Reader: Improved robustness of image extraction
We now maintain the invariant that when fetchImages is called,
all images have absolute paths.

This patch fixes several bugs relating to this as there are three places
where images can be introduced.
  (1) During the HTML parse
  (2) As spine elements
  (3) As a cover image

For (1), the paths are corrected by the transformation renameImages
For (2) and (3), we need to append the "root" to the path we parse from the
spine
2014-08-08 23:04:03 +01:00
Matthew Pickering
40ae8efddc EPUB Reader: Fixed regressions in image extraction
Before the images were relative to the position of the package file. The
collapse function changed this so that they were then absolute in the
archive but the fetchImages function wasn't updated to recognise this.
2014-08-08 22:31:27 +01:00
Matthew Pickering
8c551f6f43 EPUB Reader: Use collapseFilePath 2014-08-08 22:31:22 +01:00
Matthew Pickering
2d956677ef Shared: Added collapseFilePath function
This function removes intermediate "." and ".." from a path.
2014-08-08 22:31:02 +01:00
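What removing intermediate "." and ".." means can be sketched with a fold over path segments. This version splits on `/` only and is an illustration, not the code of `collapseFilePath`; the names `splitOnSlash` and `collapsePath` are hypothetical.

```haskell
import Data.List (intercalate)

-- | Split a path on '/' (hypothetical helper; keeps empty segments).
splitOnSlash :: String -> [String]
splitOnSlash s =
  case break (== '/') s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOnSlash rest

-- | Drop "." segments and cancel each ".." against the segment before it.
-- A leading ".." (nothing to cancel against) is kept as-is.
collapsePath :: String -> String
collapsePath = intercalate "/" . reverse . foldl step [] . splitOnSlash
  where
    -- The accumulator holds the segments seen so far, most recent first.
    step acc "." = acc
    step (prev : acc) ".."
      | prev /= ".." && prev /= "" = acc
    step acc seg = seg : acc
```

For instance, `collapsePath "a/./b/../c"` gives `"a/c"`, and `collapsePath "../a"` stays `"../a"` because there is no earlier segment to cancel.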
Matthew Pickering
116f03a70a EPUB Reader: Removed incorrectly set reader flag 2014-08-08 22:31:02 +01:00
John MacFarlane
aae90a8671 Merge pull request #1503 from jkr/streamlineMath
OMath parser: Change signature of exported function.
2014-08-08 13:45:30 -07:00
John MacFarlane
f723a0575d Markdown writer: Respect -raw_html.
pandoc -t markdown-raw_html should not emit any raw HTML, even
span and div tags that go with pandoc Span and Div elements.

Cleaned up a bit of the logic with extensions and plain.
2014-08-08 13:34:57 -07:00
Jesse Rosenthal
a426812ccc OMath parser: Change signature of exported function.
This changes the signature of the exported `readOMML` to `String ->
Either String [Exp]`, so it can now, in theory, be slotted into
TeXMath. It doesn't have any real error reporting yet, but that might
make more sense once I put it in a branch, and understand how it works
in the other readers.

It also now reads strings that parse to either oMath or oMathPara
elements. Note that the distinction is lost in the output. It's up to
the caller to remember the display type.
2014-08-08 16:34:38 -04:00
John MacFarlane
7b47042ae6 Textile reader: fixed list parsing bug. Closes #1500. 2014-08-08 12:18:47 -07:00
John MacFarlane
dd78dd6d1b Textile reader: don't allow inline formatting to extend over newline.
This matches behavior of RedCarpet, avoids some ugly bugs, and improves
performance.
2014-08-08 12:18:47 -07:00
Jesse Rosenthal
2f7a627f6d OMath: Finish initial cleanup.
This gets rid of commented-out functions, cleans up whitespace errors,
and exports and imports the correct functions.
2014-08-08 14:16:54 -04:00
Jesse Rosenthal
ba5804f9ec OMath: Remove Namespaces
We still need to test against prefixes, but this is only going to look
at oMath fragments, so we're not going to be worried about looking up
the real namespace.
2014-08-08 14:15:17 -04:00
Jesse Rosenthal
0acd139fb1 OMath: Start phasing out internal OMath type.
This is the first step in removing the intermediate OMath type, which we
no longer need since we're writing straight to TeXMath Exp.
2014-08-08 14:14:30 -04:00
Jesse Rosenthal
cf849443cb OMath parser: don't group expressions if there's only one. 2014-08-08 14:12:05 -04:00
Matthew Pickering
40602c3df6 HTML EPUB exts: switch element can now be in either the inline or block position 2014-08-08 10:25:40 -07:00
John MacFarlane
94466c0060 HTML reader: Really ignore DOCTYPE and xml declarations.
This actually does what d71b013841 said it did.

Revised epub tests to remove the repeated DOCTYPE and xml tags.
2014-08-07 22:12:44 -07:00
John MacFarlane
3c4079edc8 Merge pull request #1488 from mpickering/epubfixes
EPUB Reader: Improved image extraction
2014-08-07 19:00:32 -07:00
John MacFarlane
08bed142ba Merge pull request #1496 from mpickering/master
Org Writer: Write anchor elements
2014-08-07 16:29:20 -07:00
Matthew Pickering
07bb41d6da Org Writer: Write anchor elements
The Org Writer now writes empty span elements which have an id as an anchor.

For example `Span ("uid", [], []) []` becomes `<<uid>>`
2014-08-08 00:20:18 +01:00
Matthew Pickering
19d2ff68b1 EPUB Reader: Improved how images are extracted 2014-08-07 22:56:30 +01:00
John MacFarlane
17e48ba81e Merge pull request #1494 from jkr/math-module
Math module
2014-08-07 13:44:19 -07:00
Jesse Rosenthal
7bd7d4d476 Docx reader: Handle inline drawings.
Previously, drawings that were under some other toplevel run (i.e., a
hyperlink) wouldn't be properly handled. This should fix that.
2014-08-07 15:01:05 -04:00
Jesse Rosenthal
d293dd528b OMath module: Add new file. 2014-08-07 12:41:33 -04:00
Jesse Rosenthal
a7967d1aef Docx reader: Split math out into math module.
Could use some cleanup, but this is the first step for getting
an OMML reader into TeXMath.
2014-08-07 12:20:22 -04:00
Matthew Pickering
13f26af84f Docx Reader: Added Default instances and removed withDState
Signed-off-by: Jesse Rosenthal <jrosenthal@jhu.edu>
2014-08-06 19:15:33 -04:00
Jesse Rosenthal
91ab2f155f Get rid of unused docx variable.
Since changing the Docx type, this is no longer necessary. Thanks to
Matthew Pickering for picking up on this.
2014-08-06 12:19:24 -04:00
John MacFarlane
444b1c2ad8 Merge pull request #1491 from jkr/texmath-equations
Docx Reader: Use TeXMath for writing equations.
2014-08-06 09:07:00 -07:00
Jesse Rosenthal
cd9ca5a18a Docx reader: remove now-unnecessary state variable.
This also introduces a `defaultDState` value.
2014-08-06 11:20:41 -04:00
Jesse Rosenthal
cdd769624f Remove now-unnecessary TexChar
TeXMath does the work now.
2014-08-06 11:20:41 -04:00
Jesse Rosenthal
06488c95fa Add a note on how mapD works. 2014-08-06 11:20:41 -04:00
Jesse Rosenthal
3bc2ea4cf7 Docx reader: Use TeXMath to write math
The new version of TeXMath can translate from its type system into
LaTeX. So instead of writing the LaTeX ourselves, we write to the TeXMath
`Exp` type, and let TeXMath do the rest.
2014-08-06 11:20:27 -04:00
Uli Köhler
9d07db933c MediaWiki reader doesn't recognize German "Bild" 2014-08-06 00:47:23 +02:00
Matthew Pickering
b04bb3b6d2 MediaBag: Improved normalisation when writing files 2014-08-05 11:02:23 +01:00
John MacFarlane
2de2842bdd Merge pull request #1486 from Aelve/minor
Very minor cleanup and readability changes
2014-08-04 22:07:02 -07:00
John MacFarlane
39b59b7603 Merge pull request #1476 from jkr/endnote-fix
Docx Parser: Produce endnotes.
2014-08-04 21:59:58 -07:00
John MacFarlane
d71b013841 HTML reader: ignore <?xml..> and <DOCTYPE..> tags.
Previously they were parsed as raw.
2014-08-04 18:39:39 -07:00
John MacFarlane
40d8100d44 Use texmath 0.7 interface. 2014-08-04 11:13:09 -07:00
Artyom Kazak
141fdf944a Add PatternGuards pragmas. 2014-08-04 19:58:25 +04:00
Artyom Kazak
eb88444452 Remove redundant isHexDigit function. 2014-08-04 19:58:25 +04:00
Artyom Kazak
e51a2cedf9 Remove dangling where from one function. 2014-08-04 19:58:25 +04:00
Artyom Kazak
82118b3328 Use stripPrefix where appropriate. 2014-08-04 19:57:42 +04:00
Artyom Kazak
feebab9740 Clean up mediaTypeOf a bit. 2014-08-04 19:41:37 +04:00
Artyom Kazak
f659644fcc Use mapM_ instead of () <$ mapM in one place. 2014-08-04 19:41:37 +04:00
John MacFarlane
4630cff2a6 Merge branch 'epubend' of https://github.com/mpickering/pandoc into mpickering-epubend
Conflicts:
	pandoc.cabal
2014-08-04 07:36:18 -07:00
Artyom Kazak
ec88d47f23 Correctly implement capitalisation.
Using `map toUpper` to capitalise text is wrong, as e.g.
“Straße” should be converted to “STRASSE”, which is 1 character
longer. This commit adds a `capitalize` function and replaces
2 identical implementations in different modules (`toCaps` and
`capitalize`) with it.
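
The difference can be sketched as follows. This is an illustration assuming the `text` package, whose `Data.Text.toUpper` performs full case conversion. The names `naiveUpper` and `fullUpper` are hypothetical and are not the functions touched by this commit.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T

-- Character-by-character conversion: 'ß' has no single-character
-- uppercase form, so it is left unchanged.
naiveUpper :: String -> String
naiveUpper = map toUpper

-- Full case conversion via Data.Text: 'ß' expands to "SS",
-- so the result can be longer than the input.
fullUpper :: String -> String
fullUpper = T.unpack . T.toUpper . T.pack
```

With these definitions, `fullUpper "Straße"` gives `"STRASSE"` (7 characters), while `naiveUpper "Straße"` keeps the `ß` and stays at 6 characters.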
2014-08-03 17:37:37 +04:00
John MacFarlane
842c705097 SelfContained: Fixed determining of source URL from within CSS files.
(This fixes a bug introduced a couple commits back.)
2014-08-02 16:33:22 -07:00
John MacFarlane
85ff3c5771 fetchItem: improved mime type guessing.
Strip a fragment like `?#iefix` from the extension before doing
the mime lookup.
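
A rough sketch of the idea, not the code in this commit: cut the name at the first `?` or `#`, then take the extension and look it up. The function names and the two-entry table are assumptions made for the example; pandoc's real lookup table is much larger.

```haskell
import System.FilePath.Posix (takeExtension)

-- Hypothetical helper: extension without the dot, ignoring any
-- query or fragment that follows the file name.
extensionForLookup :: String -> String
extensionForLookup = drop 1 . takeExtension . takeWhile (`notElem` "?#")

-- Tiny illustrative table (assumed subset, not pandoc's list).
guessMime :: String -> Maybe String
guessMime name = lookup (extensionForLookup name)
  [ ("png", "image/png")
  , ("eot", "application/vnd.ms-fontobject")
  ]
```

Here `extensionForLookup "font.eot?#iefix"` yields `"eot"`, so the lookup is no longer thrown off by the trailing `?#iefix`.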
2014-08-02 16:32:11 -07:00
John MacFarlane
1d137fbed6 Shared: fetchItem improvements.
* More consistent logic:  absolute URIs are fetched from the net;
  other things are treated as relative URIs if sourceURL is a Just,
  otherwise as file paths.
* We escape characters that are not allowed in URIs before trying
  to parse them (e.g. '|', which often occurs in the wild).
* When treating relative paths as local file paths, we drop
  any fragment or query.  This is useful e.g. when you've downloaded
  web fonts locally, but your source still contains the original
  relative URLs.

Together with the previous commit, this should close #1477.
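
The decision logic described in the list above can be sketched roughly as below. This is a simplified illustration, not fetchItem itself: the `Source` type and `classify` are hypothetical, and the absolute-URI check only looks for an `http://` or `https://` prefix instead of doing real URI parsing (URI escaping is left out entirely).

```haskell
import Data.List (isPrefixOf)

-- Hypothetical result type for the three cases described above.
data Source
  = Remote String              -- absolute URI, fetched from the net
  | RelativeTo String String   -- base URL and relative reference
  | LocalFile FilePath         -- local path, query/fragment dropped
  deriving (Eq, Show)

-- Simplified classification: the first argument plays the role of sourceURL.
classify :: Maybe String -> String -> Source
classify sourceURL ref
  | isAbsolute ref = Remote ref
  | otherwise = case sourceURL of
      Just base -> RelativeTo base ref
      Nothing   -> LocalFile (takeWhile (`notElem` "?#") ref)
  where
    -- Assumption: only http(s) counts as absolute in this sketch.
    isAbsolute r = any (`isPrefixOf` r) ["http://", "https://"]
```

For instance, `classify Nothing "fonts/a.woff?#iefix"` gives `LocalFile "fonts/a.woff"`, which matches the downloaded-web-fonts case mentioned above.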
2014-08-02 16:12:05 -07:00