Commit graph

1326 commits

Author SHA1 Message Date
Jesse Rosenthal
aae71ad595 Docx reader: Add "BlockQuotation" to divs list. 2014-08-12 22:08:30 -04:00
Jesse Rosenthal
d4748038d7 Docx Reader: Fix font style parsing.
Before, we just checked for the existence of a tag. Now we also
check its on/off value.
2014-08-12 22:04:07 -04:00
John MacFarlane
7684b24959 Merge pull request from mpickering/epubtitlepage
EPUB Reader: Ignores titlepage attribute
2014-08-12 16:56:38 -07:00
Matthew Pickering
34cf016251 EPUB Reader: Ignore title pages 2014-08-12 23:03:24 +01:00
John MacFarlane
f16dd1bfdf DocBook: Support equations with mathml.
equation, informalequation, inlineequation and mml:math elements.
2014-08-12 14:00:46 -07:00
John MacFarlane
d6f0973128 Merge pull request from jkr/dropCap3
Docx reader: move dropcap combining logic to Reducible
2014-08-12 11:13:27 -07:00
John MacFarlane
f97ec6db2c Markdown reader: Improved parsing of indented code in list items.
Indented code at the beginning of a list item must be indented eight
spaces from the margin (or from the edge of the container), or four
spaces past the list marker, whichever is farther.

Some examples in `tests/markdown-reader-more.txt`.
2014-08-12 11:10:48 -07:00
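
The indentation rule in the commit above can be written as a one-line calculation. This is only a sketch, not the Markdown reader's code: the name `requiredCodeIndent` is hypothetical, and it assumes "past the list marker" is measured from the column where the marker ends, counting any indentation before the marker.

```haskell
-- | Minimum indentation (in columns from the margin or container edge)
-- for indented code at the beginning of a list item.
-- Assumption: 'markerEnd' is the column at which the list marker ends,
-- e.g. 1 for "-", 2 for "1.", 5 for "  10.".
requiredCodeIndent :: Int -> Int
requiredCodeIndent markerEnd = max 8 (markerEnd + 4)
```

Under that assumption, a short marker such as `-` or `1.` leaves the eight-space rule in force, while a marker ending at column 5 pushes the requirement to nine columns.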
Jesse Rosenthal
9d0b390d48 Docx reader: move combining logic to Reducible
Introduces a new function in Reducibles, concatR. The idea is that if we
have two lists of Reducibles (blocks or inlines), we can combine them and
just perform the reduction on the joining parts (the last element of the
first list, the first element of the second list). This is useful in cases
where the two lists are already reduced, and we're only worried about the
joining elements.

This actually improves the efficiency a bit further, because concatR can be
smart about empty lists.
2014-08-12 10:26:49 -04:00
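
The idea behind concatR can be sketched as follows. This is a simplified illustration, not the Docx reader's implementation: the `Inline` type, the `Combine` alias and `combineInline` are stand-ins invented here, and the pairwise merge returns at most one element, which may not match the real Reducible interface.

```haskell
-- Toy inline type, standing in for pandoc's real one.
data Inline = Str String | Emph [Inline]
  deriving (Show, Eq)

-- | Try to merge two adjacent elements; Nothing means they stay separate.
type Combine a = a -> a -> Maybe a

-- | Concatenate two already-reduced lists, reducing only at the seam:
-- the last element of the first list and the first of the second.
-- Empty lists are returned untouched, with no reduction work at all.
concatR :: Combine a -> [a] -> [a] -> [a]
concatR _ [] ys = ys
concatR _ xs [] = xs
concatR combine xs (y : ys) =
  case combine (last xs) y of
    Just z  -> init xs ++ (z : ys)
    Nothing -> xs ++ (y : ys)

-- | Hypothetical merge rule: adjacent strings and adjacent emphasis fuse.
combineInline :: Combine Inline
combineInline (Str a)  (Str b)  = Just (Str (a ++ b))
combineInline (Emph a) (Emph b) = Just (Emph (concatR combineInline a b))
combineInline _        _        = Nothing
```

With these definitions, `concatR combineInline [Str "a", Emph [Str "b"]] [Emph [Str "c"], Str "d"]` evaluates to `[Str "a", Emph [Str "bc"], Str "d"]`: only the two elements at the join are inspected.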
Jesse Rosenthal
e4a8e4a636 Docx reader: Make dropcap combining more efficient.
Before, we had to run reduceList on the whole combined paragraph, which
was redundant, and could take some time for long paragraphs. We only
need to combine the drop cap with the first inline of the next
paragraph.
2014-08-12 09:00:53 -04:00
Jesse Rosenthal
45ec035e93 Docx reader: combine inlines properly in dropcaps.
Make sure that adjacent inlines are combined properly in dropcaps. This
updates the test results as well.
2014-08-11 23:31:16 -04:00
Jesse Rosenthal
3e32cd5bb1 Docx reader: Use dropcap state.
If we get to a dropcap, we hold the inlines until the next
paragraph, and combine them there.
2014-08-11 23:08:33 -04:00
Jesse Rosenthal
bca74a2bd0 Add dropCap to paragraph style. 2014-08-11 21:42:02 -04:00
John MacFarlane
86d4da994a EPUB reader: use walk instead of bottomUp.
This should be more efficient.
2014-08-11 14:48:42 -07:00
John MacFarlane
31811657fa Merge pull request from jkr/emptyEmph
Discard empty formatters
2014-08-11 14:44:08 -07:00
John MacFarlane
95d9b43b42 Merge pull request from mpickering/more
EPUB Normalisation and anchors for div blocks in tex
2014-08-11 11:28:11 -07:00
John MacFarlane
6fae136cbb Textile reader: list and HTML block parsing improvements.
Closes .

Lists can now start without an intervening blank line.
Also, html block-level tags that don't start a line are parsed
as RawInline and don't interrupt paragraphs, as in RedCloth.
2014-08-11 11:22:39 -07:00
Jesse Rosenthal
0411fe7ccf Docx reader: handle empty reducibles. 2014-08-11 12:48:16 -04:00
Matthew Pickering
72b1470713 EPUB Reader: Fixed another normalisation problem. 2014-08-11 16:23:05 +01:00
John MacFarlane
e690fe4a3e Merge pull request from mpickering/epubmetadata
EPUB improvements
2014-08-11 08:14:54 -07:00
Matthew Pickering
973ed469de Docx Parse: Improved font recognition when specified in rFonts element 2014-08-11 10:30:32 -04:00
Matthew Pickering
427466f80c Docx Fonts: Derives Show and Eq 2014-08-11 10:30:32 -04:00
Matthew Pickering
e02360d3d8 EPUB Reader: Can now parse multiple meta data fields 2014-08-11 13:12:42 +01:00
Matthew Pickering
9eded27e32 EPUB reader: Fixed bug where filepaths weren't sufficiently normalised 2014-08-11 11:20:33 +01:00
John MacFarlane
65b31e0cac Merge pull request from jkr/spacefix
Docx reader: Fix spacing issue.
2014-08-10 07:11:59 -07:00
John MacFarlane
7ec8dd956f Removed OMath module, depend on texmath >= 0.8. 2014-08-10 06:19:41 -07:00
Jesse Rosenthal
c15978ce5e Change head/tail to pattern guards. 2014-08-10 09:10:34 -04:00
Jesse Rosenthal
a02ce74acf Docx reader: Fix spacing issue.
Previously spaces at the beginning of Emph/Strong/etc were kept
inside. This makes sure they are moved out.
2014-08-09 23:35:09 -04:00
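
A minimal sketch of the spacing fix described above, assuming a toy `Inline` type and a hypothetical helper `hoistLeadingSpaces`; the Docx reader's actual code is structured differently and also handles Strong and other formatters.

```haskell
-- Toy inline type, standing in for pandoc's real one.
data Inline = Str String | Space | Emph [Inline]
  deriving (Show, Eq)

-- | Move spaces at the start of an Emph out in front of it.
-- If nothing but spaces was inside, the empty Emph is dropped
-- (an assumption of this sketch).
hoistLeadingSpaces :: Inline -> [Inline]
hoistLeadingSpaces (Emph ils) =
  let (spaces, rest) = span (== Space) ils
  in  spaces ++ [Emph rest | not (null rest)]
hoistLeadingSpaces il = [il]
```

For example, `hoistLeadingSpaces (Emph [Space, Str "word"])` gives `[Space, Emph [Str "word"]]`, so the space ends up outside the emphasis.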
Matthew Pickering
3bb19307f6 Docx Parse: Recognises code points in sym elements which are in the private range 2014-08-09 22:37:12 -04:00
Matthew Pickering
edc57f77fc Added Text.Pandoc.Readers.Docx.Fonts 2014-08-09 22:37:12 -04:00
Matthew Pickering
2deaa7096f Docx Reader: Added recognition of sym element in paragraphs 2014-08-09 22:37:12 -04:00
Matthew Pickering
4ae61bdf8f EPUB: Fixed another mediabag related regression. 2014-08-10 00:12:09 +01:00
Matthew Pickering
a6648e5a73 EPUB Reader: Changed image paths to be relative to manifest file 2014-08-09 23:06:16 +01:00
John MacFarlane
bc06ef0edb Merge branch 'newbranch' of https://github.com/mpickering/pandoc into mpickering-newbranch
Conflicts:
	src/Text/Pandoc/Readers/EPUB.hs
2014-08-08 22:22:55 -07:00
John MacFarlane
19daf6cf0a Added native_divs and native_spans extensions.
This allows users to turn off the default pandoc behavior of
parsing contents of div and span tags in markdown and HTML
as native pandoc Div blocks and Span inlines.

Setting of default epub extensions has been moved from the EPUB
reader to Text.Pandoc.
2014-08-08 21:05:34 -07:00
Matthew Pickering
cfd8c0214c EPUB Reader: Improved robustness of image extraction
We now maintain the invariant that when fetchImages is called,
all images have absolute paths.

This patch fixes several bugs relating to this as there are three places
where images can be introduced.
  (1) During the HTML parse
  (2) As spine elements
  (3) As a cover image

For (1), the paths are corrected by the transformation renameImages
For (2) and (3), we need to append the "root" to the path we parse from the
spine
2014-08-08 23:04:03 +01:00
Matthew Pickering
40ae8efddc EPUB Reader: Fixed regressions in image extraction
Before, the images were relative to the position of the package file. The
collapse function changed this so that they were then absolute in the
archive, but the fetchImages function wasn't updated to recognise this.
2014-08-08 22:31:27 +01:00
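
The difference between a path relative to the package file and a path that is absolute within the archive can be illustrated with a small sketch. The helpers `collapsePath` and `resolveInArchive` are hypothetical stand-ins written for this example; they are not pandoc's collapseFilePath or fetchImages, and they assume `/`-separated archive entry names with no leading slash.

```haskell
import Data.List (intercalate)

-- | Split a path on '/'.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitOnSlash rest

-- | Remove "." and empty segments and resolve ".." against the
-- preceding segment. A ".." with nothing before it is kept.
collapsePath :: String -> String
collapsePath = intercalate "/" . reverse . foldl step [] . splitOnSlash
  where
    step acc "." = acc
    step acc ""  = acc
    step (prev : acc) ".." | prev /= ".." = acc
    step acc seg = seg : acc

-- | Turn an image reference relative to the package file's directory
-- ('root') into a path counted from the top of the archive.
resolveInArchive :: String -> String -> String
resolveInArchive root href = collapsePath (root ++ "/" ++ href)
```

Here `resolveInArchive "OEBPS" "../images/a.png"` yields `"images/a.png"`. Code that later looks the image up has to use this collapsed form consistently, which is the kind of mismatch the commit above describes.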
Matthew Pickering
8c551f6f43 EPUB Reader: Use collapseFilePath 2014-08-08 22:31:22 +01:00
Matthew Pickering
116f03a70a EPUB Reader: Removed incorrectly set reader flag 2014-08-08 22:31:02 +01:00
John MacFarlane
aae90a8671 Merge pull request from jkr/streamlineMath
OMath parser: Change signature of exported function.
2014-08-08 13:45:30 -07:00
Jesse Rosenthal
a426812ccc OMath parser: Change signature of exported function.
This changes the signature of the exported `readOMML` to `String ->
Either String [Exp]`, so it can now, in theory, be slotted into
TeXMath. It doesn't have any real error reporting yet, but that might
make more sense once I put it in a branch, and understand how it works
in the other readers.

It also now reads strings that parse to either oMath or oMathPara
elements. Note that the distinction is lost in the output. It's up to
the caller to remember the display type.
2014-08-08 16:34:38 -04:00
John MacFarlane
7b47042ae6 Textile reader: fixed list parsing bug. Closes . 2014-08-08 12:18:47 -07:00
John MacFarlane
dd78dd6d1b Textile reader: don't allow inline formatting to extend over newline.
This matches the behavior of RedCloth, avoids some ugly bugs, and improves
performance.
2014-08-08 12:18:47 -07:00
Jesse Rosenthal
2f7a627f6d OMath: Finish initial cleanup.
This gets rid of commented-out functions, cleans up whitespace errors,
and exports and imports the correct functions.
2014-08-08 14:16:54 -04:00
Jesse Rosenthal
ba5804f9ec OMath: Remove Namespaces
We still need to test against prefixes, but this is only going to look
at oMath fragments, so we're not going to be worried about looking up
the real namespace.
2014-08-08 14:15:17 -04:00
Jesse Rosenthal
0acd139fb1 OMath: Start phasing out internal OMath type.
This is the first step in removing the intermediate OMath type, which we
no longer need since we're writing straight to TeXMath Exp.
2014-08-08 14:14:30 -04:00
Jesse Rosenthal
cf849443cb OMath parser: don't group expressions if there's only one. 2014-08-08 14:12:05 -04:00
Matthew Pickering
40602c3df6 HTML EPUB exts: switch element can now be in either the inline or block position 2014-08-08 10:25:40 -07:00
John MacFarlane
94466c0060 HTML reader: Really ignore DOCTYPE and xml declarations.
This actually does what d71b013841 said it did.

Revised epub tests to remove the repeated DOCTYPE and xml tags.
2014-08-07 22:12:44 -07:00
John MacFarlane
3c4079edc8 Merge pull request from mpickering/epubfixes
EPUB Reader: Improved image extraction
2014-08-07 19:00:32 -07:00
Matthew Pickering
19d2ff68b1 EPUB Reader: Improved how images are extracted 2014-08-07 22:56:30 +01:00