1326 commits

Author SHA1 Message Date
John MacFarlane
31713d572a Merge pull request from tarleb/org-emphasis-fix
Org reader: fix rules for emphasis recognition
2014-10-18 13:19:42 -07:00
Albert Krewinkel
e3c36ed6ce Org reader: Drop COMMENT document trees
Document trees under a header starting with the word `COMMENT` are
comment trees and should not be exported.  Those trees are dropped
silently.

This closes .
2014-10-18 22:11:53 +02:00
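
The rule in this commit can be sketched in a few lines of Python. This is only an illustration, not pandoc's Haskell code: the function name `drop_comment_trees` and the line-based approach are assumptions, and Org details such as TODO keywords before `COMMENT` are ignored.

```python
import re

# An Org headline: one or more stars, whitespace, then the title.
HEADLINE = re.compile(r"^(\*+)\s+(.*)$")

def drop_comment_trees(lines):
    """Return the lines with every COMMENT subtree removed (hypothetical helper)."""
    kept = []
    skip_level = None  # level of the COMMENT headline being skipped, if any
    for line in lines:
        m = HEADLINE.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2)
            if skip_level is not None and level > skip_level:
                continue  # deeper headline inside the comment tree
            skip_level = None  # same or higher level ends the comment tree
            if title.split(" ", 1)[0] == "COMMENT":
                skip_level = level
                continue
        elif skip_level is not None:
            continue  # body text inside the comment tree
        kept.append(line)
    return kept
```

Only headlines whose first word is exactly `COMMENT` start a dropped tree, so a title such as `Commentary` is kept.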
Albert Krewinkel
d571bec454 Org reader: fix rules for emphasis recognition
Things like `/hello,/` or `/hi'/` were falsely recognized as emphasized
strings.  This is wrong, as `,` and `'` are forbidden border chars and
may not occur on the inner border of emphasized text.  This patch
enables the reader to match the reference implementation in that it
reads the above strings as plain text.
2014-10-18 12:47:59 +02:00
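
A minimal Python sketch of the border rule described above, assuming a single-character marker. The name `is_emphasis` is hypothetical, and treating whitespace as a forbidden border is an added assumption; only `,` and `'` come from the commit message.

```python
# Forbidden inner-border characters named in the commit message.
FORBIDDEN_BORDER = set(",'")

def is_emphasis(text, marker="/"):
    """True if text is marker + body + marker and both inner borders are allowed."""
    if len(text) < 3 or not (text.startswith(marker) and text.endswith(marker)):
        return False
    body = text[1:-1]
    # Check the first and last characters of the body (the inner borders).
    return not any(c in FORBIDDEN_BORDER or c.isspace() for c in (body[0], body[-1]))
```

With this check, `/hello/` counts as emphasis while `/hello,/` and `/hi'/` are left as plain text.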
Timothy Humphries
f1f56e8533 Fix indent issue for definition lists
Tidy up fix for ,  as per comments in .
Fix same issue for definition lists with the same method.
2014-10-17 20:06:25 -04:00
Timothy Humphries
4f4b0f031d Respect indent when parsing Org bullet lists
Fixes issue with top-level bullet list parsing.
Previously we would use `many1 spaceChars` rather than respecting
the list's indent level. We also permitted `*` bullets on unindented
lists, which should unambiguously parse as `header 1`.
Combined, this meant headers at a different indent level were
being unwittingly slurped into preceding bullet lists, as per
Issue .
2014-10-12 03:18:36 -04:00
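
To make the distinction concrete, here is a small Python sketch (not the parser combinators pandoc uses) that treats an unindented `*` as a header and records the indent of every bullet, so a caller can compare indent levels instead of accepting any run of spaces. The function name `classify` and the tuple layout are assumptions.

```python
import re

HEADER = re.compile(r"^\*+\s")                 # stars in column 0 start a header
BULLET = re.compile(r"^([ \t]*)([-+*])\s+\S")  # optional indent, marker, content

def classify(line):
    """Return (kind, indent) where kind is 'header', 'bullet', or 'text'."""
    if HEADER.match(line):
        return ("header", 0)
    m = BULLET.match(line)
    if m:
        return ("bullet", len(m.group(1)))
    return ("text", 0)
```

A list parser built on this would only continue a list with bullets at the same indent, so a later `* Header` line can no longer be absorbed into a preceding list.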
Wikiwide
678aa31561 cref, sep
Adding inlineCommands
2014-10-03 11:33:02 +10:00
John MacFarlane
fe6d43b3e0 Merge pull request from jkr/windowsfix
Fix path-slashes inside archive for windows
2014-09-27 16:21:17 -07:00
Matthew Pickering
5cb475c374 Org Reader: Parse multi-inline terms correctly in definition list
Closes 
2014-09-27 22:40:25 +01:00
Artyom
bc115ffc2d Fix 'Ext_lists_without_preceding_blankline' bug.
* Fixes .
  * Adds a test.
2014-09-26 13:32:08 +04:00
mpickering
6740a9592a HTML Reader: Recognise <br> tags inside <pre> blocks
Closes 
2014-09-25 19:20:12 +01:00
Jesse Rosenthal
132814aeb6 Docx Reader: Remove header class properly in other langs
When we encounter one of the polyglot header styles, we want to remove
that from the par styles after we convert to a header. To do that, we
have to keep track of the style name, and remove it appropriately.
2014-09-06 07:53:29 -04:00
Jesse Rosenthal
71452946d9 Docx reader: Use polyglot header list.
We're just keeping a list of header formats that different languages use
as their default styles. At the moment, we have English, German, Danish,
and French. We can continue to add to this.

This is simpler than parsing the styles file, and perhaps less
error-prone, since there seems to be some variations, even within a
language, of how a style file will define headers.
2014-09-05 21:59:58 -04:00
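
The lookup-list idea can be sketched as follows in Python. The prefixes below are illustrative assumptions based on Word's localized heading style names; the actual list and matching logic in pandoc's Docx reader may differ.

```python
import re

# Assumed heading-style prefixes for English, German, Danish, and French.
HEADER_STYLE_PREFIXES = ["Heading", "Überschrift", "Overskrift", "Titre"]

def header_level(style_name):
    """Return the header level for a known heading style name, else None."""
    for prefix in HEADER_STYLE_PREFIXES:
        m = re.fullmatch(re.escape(prefix) + r"\s*([1-9])", style_name,
                         flags=re.IGNORECASE)
        if m:
            return int(m.group(1))
    return None
```

Supporting another language means appending one prefix to the list, which is the simplicity argument the commit message makes against parsing the styles file.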
Jesse Rosenthal
13fefd7959 Docx Reader: Start list of polyglot section headers. 2014-09-05 17:31:24 -04:00
Jesse Rosenthal
73b887e2df Org reader: Added state changing blanklines.
This allows us to emphasize at the beginning of a new paragraph (or, in
general, after blank lines).
2014-09-04 19:55:53 -04:00
Jesse Rosenthal
ac8ed1fa93 Docx reader: Rewrite rewriteLink to work with new headers.
There could be new top-level headers after making lists, so we have to
rewrite links after that.
2014-09-04 16:44:21 -04:00
Jesse Rosenthal
7fe54505df Docx reader: Single-item headers in ordered lists are headers.
When users number their headers, Word understands that as a single item
enumerated list. We make the assumption that such a list is, in fact, a header.
2014-09-04 16:35:57 -04:00
Jesse Rosenthal
4ef850ded5 Docx reader: Fix Windows path for image lookup.
Don't use os-sensitive "combine", since we always want the paths in our
zip-archive to use forward-slashes.
2014-09-02 13:45:01 -04:00
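
The same pitfall exists in other languages. As a Python analogue (not the Haskell fix itself), `posixpath.join` always uses forward slashes, whereas the Windows flavor of the path module inserts backslashes. The helper name `archive_path` is hypothetical.

```python
import ntpath
import posixpath

def archive_path(*parts):
    """Join path components for use inside a zip archive (always '/')."""
    return posixpath.join(*parts)

# What an OS-sensitive join would produce on Windows, for comparison.
windows_style = ntpath.join("word", "media", "image1.png")
```

Here `archive_path("word", "media", "image1.png")` yields `word/media/image1.png` on every platform, while `windows_style` contains backslashes and would not match the entry name stored in the archive.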
John MacFarlane
598d3ee23b Markdown reader: better handling of paragraph in div.
Previously text that ended a div would be parsed as Plain
unless there was a blank line before the closing div tag.

Test case:

    <div class="first">
    This is a paragraph.

    This is another paragraph.
    </div>

Closes .
2014-08-31 12:55:47 -07:00
John MacFarlane
f70e3c3297 Merge branch 'mime' of https://github.com/Aelve/John into Aelve-mime
Conflicts:
	src/Text/Pandoc/Writers/Docx.hs
2014-08-30 11:49:50 -07:00
Jesse Rosenthal
c931be24e1 Docx Reader: Read single para in table cell as plain
This makes the docx reader's native output fit with the way the markdown
reader understands its markdown output. I.e., as far as table cells go:

docx -> native == docx -> native -> markdown -> native

(This identity isn't true for other things outside of table cells, of
course).
2014-08-28 14:35:33 -04:00
Calvin Beck
f813755c55 Fixed exampleLine parser to accept example lines which have indentation at the start of the line. 2014-08-26 21:56:40 -06:00
mpickering
aa808055f0 Txt2Tags Reader: Fixed crash when reading from stdin 2014-08-21 17:11:21 +01:00
mpickering
3b6d7afa71 Txt2Tags Reader: Corrected formatting of %%mtime macro 2014-08-21 17:11:16 +01:00
mpickering
2a7319541d Txt2Tags Reader: Parse Meta information
The header is now parsed as meta information. The first line is the
`title`, the second is the `author` and third line is the `date`.
2014-08-21 17:09:40 +01:00
mpickering
2cd049a1bf Txt2Tags reader: Header is now parsed only if standalone flag is set 2014-08-20 18:11:37 +01:00
John MacFarlane
716ad5fd8a Merge pull request from jkr/styleparse
Docx reader: parsing styles
2014-08-18 14:14:01 -07:00
John MacFarlane
6dce8c6760 HTML reader: improved handling of tags that can be block or inline.
Previously a section like this would be enclosed in a paragraph,
with RawInline for the video tags (since video is a tag that can
be either block or inline):

    <video controls="controls">
       <source src="../videos/test.mp4" type="video/mp4" />
       <source src="../videos/test.webm" type="video/webm" />
       <p>
          The videos can not be played back on your system.<br/>
          Try viewing on Youtube (requires Internet connection):
          <a href="http://youtu.be/etE5urBps_w">Relative Velocity on
    Youtube</a>.
       </p>
    </video>

This change will cause the video and source tags to be parsed
as RawBlock instead, giving better output.

The general change is this:  when we're parsing a "plain" sequence
of inlines, we don't parse anything that COULD be a block-level tag.
2014-08-18 12:41:09 -07:00
Jesse Rosenthal
4b38e9f1f0 Docx reader: whitespace fix. 2014-08-17 20:11:50 -04:00
Jesse Rosenthal
198aea190f Docx reader: remove emph styles and strong styles list.
We no longer need the explicit lists since we're deriving them from the
ground up.
2014-08-17 17:04:55 -04:00
Jesse Rosenthal
9da7b0946e Docx reader: Add "Hyperlink" to blacklisted styles.
This is the only one so far. We'll add others as they show up.
2014-08-17 17:04:14 -04:00
Jesse Rosenthal
15ce28b8ca Docx reader: Use style resolver.
We now no longer check against explicit styles.
2014-08-17 17:03:44 -04:00
Jesse Rosenthal
03d5d8e596 Docx Reader: Introduce function for resolving dependent run styles.
We always favor an explicit positive or negative setting in a
descendant style, and only turn to the ancestor if nothing is set.

We also introduce an (empty) list of styles that are black-listed. We
won't check them. (Think underlines in hyperlinks).
2014-08-17 16:54:11 -04:00
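
A rough Python sketch of this resolution rule, assuming each style holds optional flags and a link to its parent. The `CharStyle` fields and the `resolve` signature are made up for illustration and do not mirror pandoc's types.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CharStyle:
    name: str
    bold: Optional[bool] = None    # None means "not set in this style"
    italic: Optional[bool] = None
    parent: Optional["CharStyle"] = None

def resolve(style, attr, blacklist=frozenset()):
    """Walk from descendant to ancestor; the first explicit value wins."""
    while style is not None:
        if style.name not in blacklist:  # blacklisted styles are not checked
            value = getattr(style, attr)
            if value is not None:
                return value
        style = style.parent
    return None  # nothing set anywhere in the chain
```

An explicit `False` in a descendant therefore overrides a `True` in its ancestor, while an unset flag falls through to the ancestor.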
Jesse Rosenthal
99491f0d98 Docx Parse: build a bottom-up style tree.
Two points here: (1) We're going bottom-up, from styles not based on
anything, to avoid circular dependencies or any other sort of
maliciousness/incompetence. And (2) each style points to its
parent. That way, we don't need the whole tree to pass a style over to
Docx.hs
2014-08-17 15:46:17 -04:00
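
The bottom-up construction might look like the Python sketch below. The input format (a dict from style id to the id it is based on) and the name `build_style_tree` are assumptions for this example, not the data structures in the Haskell code.

```python
def build_style_tree(based_on):
    """Map each reachable style id to its parent id, starting from root styles.

    based_on: dict of style id -> parent style id, or None for a root style.
    Styles caught in a cycle or based on an unknown style are never reached
    and are therefore left out of the result.
    """
    resolved = {sid: None for sid, parent in based_on.items() if parent is None}
    frontier = set(resolved)
    while frontier:
        next_frontier = set()
        for sid, parent in based_on.items():
            # Attach a style only once its parent has already been resolved.
            if sid not in resolved and parent in frontier:
                resolved[sid] = parent
                next_frontier.add(sid)
        frontier = next_frontier
    return resolved
```

Because the walk starts at styles that are not based on anything, a circular `basedOn` chain simply never gets attached, and each entry carries only its parent pointer rather than the whole tree.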
Artyom Kazak
6a34cd3ddf Update Reader.EPUB to use MimeType. 2014-08-17 21:00:55 +04:00
Jesse Rosenthal
b8f1658c36 Alias string and runStyle to CharStyle type. 2014-08-17 11:30:22 -04:00
Jesse Rosenthal
c4871ac790 Docx Style parser: Basic one now just takes a parent style.
This will make it easier to build the style map from the bottom up (to
avoid any infinite references).
2014-08-17 10:19:48 -04:00
Jesse Rosenthal
75eec0a6b8 Docx reader: work with new rStyle.
Just discards info at the moment, so at least it works the same.
2014-08-17 09:22:25 -04:00
Jesse Rosenthal
ea85a797c2 Parser: Framework for parsing styles.
We want to be able to read user-defined styles. Eventually we'll be able
to figure out styles in terms of inheritance as well. The actual
cascading will happen in the docx reader.
2014-08-17 09:22:21 -04:00
Jesse Rosenthal
dc5b0ba09b Docx reader: Change behavior of Super/Subscript
In docx, super- and subscript are attributes of Vertalign. It makes more
sense to follow this, and have different possible values of Vertalign in
runStyle. This is mainly a preparatory step for real style parsing,
since it can distinguish between vertical align being explicitly turned
off and it not being set.

In addition, it makes parsing a bit clearer, and makes sure we don't do
docx-impossible things like being simultaneously super and sub.
2014-08-17 08:20:00 -04:00
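
The modelling choice can be illustrated in Python with an enum plus an optional wrapper. The names `VertAlign` and `parse_vert_align` are assumptions; the three string values are the ones WordprocessingML uses for `w:vertAlign`.

```python
from enum import Enum
from typing import Optional

class VertAlign(Enum):
    BASELINE = "baseline"        # explicitly turned off
    SUPERSCRIPT = "superscript"
    SUBSCRIPT = "subscript"

def parse_vert_align(val: Optional[str]) -> Optional[VertAlign]:
    """Return None when the attribute is absent or unrecognized."""
    if val is None:
        return None
    try:
        return VertAlign(val)
    except ValueError:
        return None
```

A single field of this type cannot be superscript and subscript at once, and `None` stays distinct from `VertAlign.BASELINE`, which is what later style inheritance needs.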
John MacFarlane
9d52ecdd42 HTML reader: Parse appropriately styled span as SmallCaps. 2014-08-16 22:57:00 -07:00
Jesse Rosenthal
9bb0b99981 Docx reader: Remove unnecessary plural functions
Functions like runElemsToInlines and parPartsToInlines are just defined
in terms of concatenating and mapping their singular
version (e.g. `runElemToInlines`). Having two functions with almost
identical names makes it easier to introduce errors. It's easy enough to
just concat and map inline, and it makes it clearer what is going on in
the code.
2014-08-16 15:07:41 -04:00
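
In Python terms, the refactoring amounts to the following. `run_elem_to_inlines` is a hypothetical stand-in for the singular converter, and splitting a string into words is just a placeholder for real conversion.

```python
from itertools import chain

def run_elem_to_inlines(elem):
    """Hypothetical singular converter: one element -> list of inlines."""
    return elem.split()

elems = ["Hello world", "again"]

# Map the singular function and concatenate at the call site,
# instead of keeping a separate plural helper around.
inlines = list(chain.from_iterable(map(run_elem_to_inlines, elems)))
```

The call site now states directly that it maps and concatenates, so no second, nearly identical function name can be confused with the first.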
Jesse Rosenthal
9969b2ebee Docx reader: Fix bug in character styles.
The cleanup of style handling introduced a bug here. There
wasn't previously a test to catch it.
2014-08-16 14:05:19 -04:00
Jesse Rosenthal
0ff9ec2f4e Rewrite Docx.hs and Reducible to use Builder.
The big news here is a rewrite of Docx to use the builder
functions. As opposed to previous attempts, we now see a significant
speedup -- times are cut in half (or more) in a few informal tests.

Reducible has also been rewritten. It can doubtless be simplified and
clarified further. We can consider this, at the moment, a reference for
correct behavior.
2014-08-16 10:22:55 -04:00
John MacFarlane
8bf39cf6d6 Markdown reader: Better handle quote characters in inline links.
This was previously failing to be recognized as a link:

    [Test](http://en.wikipedia.org/wiki/Ward's_method)

Closes .
2014-08-14 10:59:27 -07:00
Jesse Rosenthal
6897905602 Docx reader: Interpret "Strong" and Emphasis run styles. 2014-08-13 12:23:03 -04:00
Jesse Rosenthal
a1320a76f9 Docx: Reducible forgot about smallcaps 2014-08-13 00:09:40 -04:00
Jesse Rosenthal
dca55630e6 Docx Reader: Trim line breaks from the beginning and end of Section Headers.

We might also want to do this elsewhere (for pars, for example).
2014-08-12 23:42:01 -04:00
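
A small Python sketch of the trimming step, assuming inlines are held in a list and a line break is represented by the string `"LineBreak"`. Both the representation and the name `trim_line_breaks` are made up for this example.

```python
def trim_line_breaks(inlines, is_break=lambda x: x == "LineBreak"):
    """Drop leading and trailing line breaks; keep interior ones."""
    start, end = 0, len(inlines)
    while start < end and is_break(inlines[start]):
        start += 1
    while end > start and is_break(inlines[end - 1]):
        end -= 1
    return inlines[start:end]
```

The same helper would work for paragraphs, which the commit message mentions as a possible follow-up.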
Jesse Rosenthal
378a795eaa Docx: More robust handling of multiple bookmarks in header. 2014-08-12 23:41:57 -04:00
Jesse Rosenthal
85579052b5 Docx reader: Check for null-id'd anchors too.
Otherwise they get left dangling in the document.
2014-08-12 23:33:03 -04:00
Jesse Rosenthal
194ed88852 Docx reader: accept explicit "Italic" and "Bold" rStyles.
Note that "Italic" can be on, and, from the last commit, `<w:i>` can be
present, but be turned off. In that case, the turned-off tag takes
precedence. So, we have to distinguish between something being off and
something not being there. Hence, isItalic, isBold, isStrike, and
isSmallCaps have become Maybes.
2014-08-12 22:39:18 -04:00
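
The off-versus-absent distinction can be sketched in Python with `xml.etree.ElementTree`. The helper names `toggle` and `is_italic` are hypothetical, and the set of "off" values is an assumption based on the on/off attribute values WordprocessingML allows.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def toggle(rpr, tag):
    """Return True/False if the toggle element is present, None if absent."""
    el = rpr.find(f"{{{W}}}{tag}")
    if el is None:
        return None
    val = el.get(f"{{{W}}}val")
    # A toggle element without w:val counts as switched on.
    return val not in ("0", "false", "off")

def is_italic(rpr, style_is_italic):
    """An explicit tag, even a turned-off one, beats the rStyle setting."""
    explicit = toggle(rpr, "i")
    return style_is_italic if explicit is None else explicit
```

Returning `None` for a missing element plays the role of the `Maybe` mentioned in the commit message: only then does the value fall back to the "Italic" run style.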