pandoc/test/latex-reader.latex
John MacFarlane 0feb7504b1 Rewrote LaTeX reader with proper tokenization.
This rewrite is primarily motivated by the need to
get macros working properly.  A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).

We now tokenize the input text, then parse the token stream.

Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
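
As an illustration of the macro behaviour described above, here is a
minimal sketch of an input document. It is not part of the test file, and
the macro name `\half` is hypothetical. Assuming the `latex_macros`
extension is enabled, a reader that expands macros in the token stream
should treat the math below as if `\frac{1}{2}` had been written out:

```latex
\documentclass{article}
% Hypothetical macro, defined only for this illustration.
\newcommand{\half}{\frac{1}{2}}
\begin{document}
% The macro is used inside math mode, where token-level expansion matters.
The kinetic energy is $E = \half m v^2$.
\end{document}
```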

A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.

* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
  Exports Macro, Tok, TokType, Line, Column.  [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
  so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
  Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
  rawLaTeXInline and rawLaTeXBlock.  (Both now return a String,
  and they are polymorphic in state.)
* Added orgMacros field to OrgState.  [API change]
* Removed readerApplyMacros from ReaderOptions.
  Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
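
The last item can be illustrated with a short sketch. The macro name
`\baz` is a placeholder added here; `\foo` follows the example in the
item itself. Both definitions are valid LaTeX, and the second omits the
braces around the command name, which is the form this change is
described as accepting:

```latex
\documentclass{article}
% Braced form of the command name.
\newcommand{\foo}{blah}
% Unbraced form of the command name.
\newcommand\baz{blah}
\begin{document}
\foo{} and \baz{} both expand to the same text.
\end{document}
```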

Fixes #1390.
Fixes #2118.
Fixes #3236.
Fixes #3779.
Fixes #934.
Fixes #982.
2017-07-07 12:36:00 +02:00

\documentclass{article}
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\usepackage[breaklinks=true,unicode=true]{hyperref}
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
\usepackage{enumerate}
\usepackage{fancyvrb}
\usepackage{graphicx}
\usepackage{url}
\setcounter{secnumdepth}{0}
\VerbatimFootnotes % allows verbatim text in footnotes
\title{Pandoc Test Suite}
\author{John MacFarlane \and Anonymous}
\date{July 17, 2006}
\begin{document}
\maketitle
This is a set of tests for pandoc. Most of them are adapted from
John Gruber's markdown test suite.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Headers}
\subsection{Level 2 with an \href{/url}{embedded link}}
\subsubsection{Level 3 with \emph{emphasis}}
\paragraph{Level 4}
\subparagraph{Level 5}
\section[alt title ignored]{Level 1}
\subsection{Level 2 with \emph{emphasis}}
\subsubsection{Level 3}
with no blank line
\subsection{Level 2}
with no blank line
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Paragraphs}
Here's a regular paragraph.

In Markdown 1.0.0 and earlier. Version 8. This line turns into a
list item. Because a hard-wrapped line in the middle of a paragraph
looked like a list item.

Here's one with a bullet. * criminey.

There should be a hard line break\\here.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Block Quotes}
E-mail style:
\begin{quote}
This is a block quote. It is pretty short.
\end{quote}
\begin {quote}
Code in a block quote:
\begin{verbatim}
sub status {
print "working";
}
\end{verbatim}
A list:
\begin{enumerate}[1.]
\item
item one
\item
item two
\end{enumerate}
Nested block quotes:
\begin{quote}
nested
\end{quote}
\begin{quote}
nested
\end{quote}
\end{quote}
This should not be a block quote: 2 \textgreater{} 1.
Box-style:
\begin{quote}
Example:
\begin{verbatim}
sub status {
print "working";
}
\end{verbatim}
\end{quote}
\begin{quote}
\begin{enumerate}[1.]
\item
do laundry
\item
take out the trash
\end{enumerate}
\end{quote}
Here's a nested one:
\begin{quote}
Joe said:
\begin{quote}
Don't quote me.
\end{quote}
\end{quote}
And a following paragraph.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Code Blocks}
Code:
\begin{verbatim}
---- (should be four hyphens)
sub status {
print "working";
}
this code block is indented by one tab
\end{verbatim}
And:
\begin{verbatim}
this code block is indented by two tabs
These should not be escaped: \$ \\ \> \[ \{
\end{verbatim}
\begin{obeylines}
this has \emph{two
lines}
\end{obeylines}
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Lists}
\subsection{Unordered}
Asterisks tight:
\begin{itemize}
\item
asterisk 1
\item
asterisk 2
\item
asterisk 3
\end{itemize}
Asterisks loose:
\begin{itemize}
\item
asterisk 1
\item
asterisk 2
\item
asterisk 3
\end{itemize}
Pluses tight:
\begin{itemize}
\item
Plus 1
\item
Plus 2
\item
Plus 3
\end{itemize}
Pluses loose:
\begin{itemize}
\item
Plus 1
\item
Plus 2
\item
Plus 3
\end{itemize}
Minuses tight:
\begin{itemize}
\item
Minus 1
\item
Minus 2
\item
Minus 3
\end{itemize}
Minuses loose:
\begin{itemize}
\item
Minus 1
\item
Minus 2
\item
Minus 3
\end{itemize}
\subsection{Ordered}
Tight:
\begin{enumerate}[1.]
\item
First
\item
Second
\item
Third
\end{enumerate}
and:
\begin{enumerate}[1.]
\item
One
\item
Two
\item
Three
\end{enumerate}
Loose using tabs:
\begin{enumerate}[1.]
\item
First
\item
Second
\item
Third
\end{enumerate}
and using spaces:
\begin{enumerate}[1.]
\item
One
\item
Two
\item
Three
\end{enumerate}
Multiple paragraphs:
\begin{enumerate}[1.]
\item
Item 1, graf one.
Item 1. graf two. The quick brown fox jumped over the lazy dog's
back.
\item
Item 2.
\item
Item 3.
\end{enumerate}
\subsection{Nested}
\begin{itemize}
\item
Tab
\begin{itemize}
\item
Tab
\begin{itemize}
\item
Tab
\end{itemize}
\end{itemize}
\end{itemize}
Here's another:
\begin{enumerate}[1.]
\item
First
\item
Second:
\begin{itemize}
\item
Fee
\item
Fie
\item
Foe
\end{itemize}
\item
Third
\end{enumerate}
Same thing but with paragraphs:
\begin{enumerate}[1.]
\item
First
\item
Second:
\begin{itemize}
\item
Fee
\item
Fie
\item
Foe
\end{itemize}
\item
Third
\end{enumerate}
\subsection{Tabs and spaces}
\begin{itemize}
\item
this is a list item indented with tabs
\item
this is a list item indented with spaces
\begin{itemize}
\item
this is an example list item indented with tabs
\item
this is an example list item indented with spaces
\end{itemize}
\end{itemize}
\subsection{Fancy list markers}
\begin{enumerate}[(1)]
\setcounter{enumi}{1}
\item
begins with 2
\item
and now 3
with a continuation
\begin{enumerate}[i.]
\setcounter{enumii}{3}
\item
sublist with roman numerals, starting with 4
\item
more items
\begin{enumerate}[(A)]
\item
a subsublist
\item
a subsublist
\end{enumerate}
\end{enumerate}
\end{enumerate}
Nesting:
\begin{enumerate}[A.]
\item
Upper Alpha
\begin{enumerate}[I.]
\item
Upper Roman.
\begin{enumerate}[(1)]
\setcounter{enumiii}{5}
\item
Decimal start with 6
\begin{enumerate}[a)]
\setcounter{enumiv}{2}
\item
Lower alpha with paren
\end{enumerate}
\end{enumerate}
\end{enumerate}
\end{enumerate}
Autonumbering:
\begin{enumerate}
\item
Autonumber.
\item
More.
\begin{enumerate}
\item
Nested.
\end{enumerate}
\end{enumerate}
Should not be a list item:
M.A. 2007
B. Williams
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Definition Lists}
Tight using spaces:
\begin{description}
\item[apple]
red fruit
\item[orange]
orange fruit
\item[banana]
yellow fruit
\end{description}
Tight using tabs:
\begin{description}
\item[apple]
red fruit
\item[orange]
orange fruit
\item[banana]
yellow fruit
\end{description}
Loose:
\begin{description}
\item[apple]
red fruit
\item[orange]
orange fruit
\item[banana]
yellow fruit
\end{description}
Multiple blocks with italics:
\begin{description}
\item[\emph{apple}]
red fruit
contains seeds, crisp, pleasant to taste
\item[\emph{orange}]
orange fruit
\begin{verbatim}
{ orange code block }
\end{verbatim}
\begin{quote}
orange block quote
\end{quote}
\end{description}
\section{HTML Blocks}
Simple block on one line:
foo
And nested without indentation:
foo
bar
Interpreted markdown in a table:
This is \emph{emphasized}
And this is \textbf{strong}
Here's a simple block:
foo
This should be a code block, though:
\begin{verbatim}
<div>
foo
</div>
\end{verbatim}
As should this:
\begin{verbatim}
<div>foo</div>
\end{verbatim}
Now, nested:
foo
This should just be an HTML comment:
Multiline:
Code block:
\begin{verbatim}
<!-- Comment -->
\end{verbatim}
Just plain comment, with trailing spaces on the line:
Code:
\begin{verbatim}
<hr />
\end{verbatim}
Hr's:
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Inline Markup}
This is \emph{emphasized}, and so \emph{is this}.
This is \textbf{strong}, and so \textbf{is this}.
An \emph{\href{/url}{emphasized link}}.
\textbf{\emph{This is strong and em.}}
So is \textbf{\emph{this}} word.
\textbf{\emph{This is strong and em.}}
So is \textbf{\emph{this}} word.
This is code: \verb!>!, \verb!$!, \verb!\!, \verb!\$!,
\verb!<html>!.
\sout{This is \emph{strikeout}.}
Superscripts: a\textsuperscript{bc}d
a\textsuperscript{\emph{hello}} a\textsuperscript{hello there}.
Subscripts: H\textsubscript{2}O, H\textsubscript{23}O,
H\textsubscript{many of them}O.
These should not be superscripts or subscripts, because of the
unescaped spaces: a\^{}b c\^{}d, a\ensuremath{\sim}b
c\ensuremath{\sim}d.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Smart quotes, ellipses, dashes}
``Hello,'' said the spider. ``\,`Shelob' is my name.''
`A', `B', and `C' are letters.
`Oak,' `elm,' and `beech' are names of trees. So is `pine.'
`He said, ``I want to go.''\,' Were you alive in the 70's?
Here is some quoted `\verb!code!' and a
``\href{http://example.com/?foo=1&bar=2}{quoted link}''.
Some dashes: one---two---three---four---five.
Dashes between numbers: 5--7, 255--66, 1987--1999.
Ellipses\ldots{}and\ldots{}and\ldots{}.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{LaTeX}
\begin{itemize}
\item
\cite[22-23]{smith.1899}
\item
\doublespacing
\item
$2+2=4$
\item
$x \in y$
\item
$\alpha \wedge \omega$
\item
$223$
\item
$p$-Tree
\item
$\frac{d}{dx}f(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$
\item
Here's one that has a line break in it:
$\alpha + \omega \times x^2$.
\end{itemize}
These shouldn't be math:
\begin{itemize}
\item
To get the famous equation, write \verb!$e = mc^2$!.
\item
\$22,000 is a \emph{lot} of money. So is \$34,000. (It worked if
``lot'' is emphasized.)
\item
Escaped \verb!$!: \$73 \emph{this should be emphasized} 23\$.
\end{itemize}
Here's a LaTeX table:
\begin{tabular}{|l|l|}\hline
Animal & Number \\ \hline
Dog & 2 \\
Cat & 1 \\ \hline
\end{tabular}
A table with one column:
\begin{tabular}{c}
Animal \\
Vegetable
\end{tabular}
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Special Characters}
Here is some unicode:
\begin{itemize}
\item
I hat: Î
\item
o umlaut: ö
\item
section: §
\item
set membership: ∈
\item
copyright: ©
\end{itemize}
AT\&T has an ampersand in their name.
AT\&T is another way to write it.
This \& that.
4 \textless{} 5.
6 \textgreater{} 5.
Backslash: \textbackslash{}
Backtick: `
Asterisk: *
Underscore: \_
Left brace: \{
Right brace: \}
Left bracket: [
Right bracket: ]
Left paren: (
Right paren: )
Greater-than: \textgreater{}
Hash: \#
Period: .
Bang: !
Plus: +
Minus: -
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Links}
\subsection{Explicit}
Just a \href{/url/}{URL}.
\href{/url/}{URL and title}.
\href{/url/}{URL and title}.
\href{/url/}{URL and title}.
\href{/url/}{URL and title}
\href{/url/}{URL and title}
\href{/url/with_underscore}{with\_underscore}
\href{mailto:nobody@nowhere.net}{Email link}
\href{}{Empty}.
\subsection{Reference}
Foo \href{/url/}{bar}.
Foo \href{/url/}{bar}.
Foo \href{/url/}{bar}.
With \href{/url/}{embedded [brackets]}.
\href{/url/}{b} by itself should be a link.
Indented \href{/url}{once}.
Indented \href{/url}{twice}.
Indented \href{/url}{thrice}.
This should [not][] be a link.
\begin{verbatim}
[not]: /url
\end{verbatim}
Foo \href{/url/}{bar}.
Foo \href{/url/}{biz}.
\subsection{With ampersands}
Here's a
\href{http://example.com/?foo=1&bar=2}{link with an ampersand in the URL}.
Here's a link with an ampersand in the link text:
\href{http://att.com/}{AT\&T}.
Here's an \href{/script?foo=1&bar=2}{inline link}.
Here's an
\href{/script?foo=1&bar=2}{inline link in pointy braces}.
\subsection{Autolinks}
With an ampersand: \url{http://example.com/?foo=1&bar=2}
\begin{itemize}
\item
In a list?
\item
\url{http://example.com/}
\item
It should.
\end{itemize}
An e-mail address:
\href{mailto:nobody@nowhere.net}{nobody@nowhere.net}
\begin{quote}
Blockquoted: \url{http://example.com/}
\end{quote}
Auto-links should not occur here: \verb!<http://example.com/>!
\begin{verbatim}
or here: <http://example.com/>
\end{verbatim}
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Images}
From ``Voyage dans la Lune'' by Georges Melies (1902):
\includegraphics{lalune.jpg}
Here is a movie \includegraphics{movie.jpg} icon.
\begin{center}\rule{3in}{0.4pt}\end{center}
\section{Footnotes}
Here is a footnote
reference,\footnote{ Here is the footnote. It can go anywhere after the footnote
reference. It need not be placed at the end of the document.
}
and
another.\footnote{ Here's the long note. This one contains multiple blocks.
Subsequent blocks are indented to show that they belong to the
footnote (as with list items).
\begin{Verbatim}
{ <code> }
\end{Verbatim}
If you want, you can indent every line, but you can also be lazy
and just indent the first line of each block.
}
This should \emph{not} be a footnote reference, because it contains
a space.[\^{}my note] Here is an inline
note.\footnote{ This is \emph{easier} to type. Inline notes may contain
\href{http://google.com}{links} and \verb!]! verbatim characters,
as well as [bracketed text].
}
\begin{quote}
Notes can go in quotes.\footnote{ In quote.
}
\end{quote}
\begin{enumerate}[1.]
\item
And in list items.\footnote{ In list.
}
\end{enumerate}
This paragraph should not be part of the note, as it is not
indented.
\section{Escaped characters}
\$ \% \& \# \_ \{ \}
\end{document}