Markdown reader: use CommonMark rules for list item nesting.

Closes #3511.

Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block-level content had to be indented 4
spaces.  Now the indentation required is determined by the
first line of the list item:  to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if there are 5 or more
spaces after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker.  See the CommonMark
spec for more details and examples.
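
As a rough illustration of the rule just described, the sketch below
computes the indentation that later blocks need in order to stay inside
a list item.  It is not pandoc's parser: `continuationIndent` here is a
hypothetical pure function over a single line, and it assumes bullet
markers only, spaces rather than tabs, and a non-empty item (empty
items and horizontal rules are outside its scope).

```haskell
-- Sketch only: a hypothetical helper, not the code in this commit.
-- Assumes bullet markers (-, *, +), spaces instead of tabs, and an
-- item with content on its first line.

-- | Columns of indentation that later blocks need in order to belong
-- to the list item starting on the given line.  'Nothing' means the
-- line is not a bullet item this sketch handles.
continuationIndent :: String -> Maybe Int
continuationIndent line =
  case rest of
    (marker : afterMarker)
      | length leading <= 3
      , marker `elem` "-*+"
      , not (null gap)
      , not (null content) ->
          Just $ if length gap <= 4
                   then markerEnd + length gap  -- line up with first content
                   else markerEnd + 1           -- 5+ spaces: indented code
      where
        (gap, content) = span (== ' ') afterMarker
        markerEnd      = length leading + 1     -- column just past the marker
    _ -> Nothing
  where
    (leading, rest) = span (== ' ') line
```

For `"- a"` this gives `Just 2`, and for `"-     code"` (five spaces
after the marker) it also gives `Just 2`, matching the exception above.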

Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules.  Here are some examples
of texts that will be parsed differently:

    - a
      - b

will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.

    - a

          code

Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin.  With the
four-space rule, this would be a regular paragraph rather than a code
block.

    - a

            code

Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`.  With the four-space
rule, indented code under a list item must always be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).
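
The three examples above can be summarized with a small sketch.  The
names `Placement` and `classify` are hypothetical and only for
illustration; the function takes the required indentation for the item
(2 for `- a` under the new rules, 4 under the four-space rule) and a
later line, and ignores tabs, blank lines, and lazy continuation lines.

```haskell
-- Sketch only: hypothetical names, not part of pandoc.

data Placement
  = OutsideItem       -- not indented enough to belong to the item
  | Continuation      -- ordinary block content of the item
  | IndentedCode Int  -- code block; the Int counts leading spaces it keeps
  deriving (Eq, Show)

-- | Classify a line that follows a list item's first line, given the
-- indentation the item requires of its contents.
classify :: Int -> String -> Placement
classify indent line
  | spaces < indent     = OutsideItem
  | spaces < indent + 4 = Continuation
  | otherwise           = IndentedCode (spaces - indent - 4)
  where
    spaces = length (takeWhile (== ' ') line)
```

With a required indentation of 2, a line indented six spaces is
`IndentedCode 0` and a line indented eight spaces is `IndentedCode 2`.
With a required indentation of 4, the same lines are `Continuation` and
`IndentedCode 0`, and `  - b` falls outside the item.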

This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
#1990, #1137, #744, #172, #137, #128) and by the growing prevalence
of CommonMark (now used by GitHub, for example).

Users who want to use the old rules can select the `four_space_rule`
extension.

* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
  of `gobbleSpaces` has been changed so that a `ReaderOptions`
  parameter is not needed.
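
To make the behavior of `gobbleAtMostSpaces` easier to picture, here is
a minimal pure sketch over plain strings.  The name `gobbleAtMost` and
the explicit tab-stop argument are assumptions made for illustration;
the real function is a Parsec parser that reads the tab stop from the
reader options and raises an error for a negative count, whereas this
sketch simply consumes nothing in that case.

```haskell
-- Sketch only: a hypothetical string-based analogue, not the exported
-- Parsec parser.

-- | Consume up to n leading spaces.  A tab is expanded to tabStop
-- spaces, of which as many as needed are consumed and the rest are
-- left in the remaining input.  Returns the number of spaces consumed
-- and the remaining input.
gobbleAtMost :: Int -> Int -> String -> (Int, String)
gobbleAtMost tabStop n input
  | n <= 0    = (0, input)
  | otherwise =
      case input of
        ' '  : rest -> step rest
        '\t' : rest -> step (replicate (tabStop - 1) ' ' ++ rest)
        _           -> (0, input)
  where
    -- one space has been consumed; continue with the remainder
    step rest =
      let (k, rest') = gobbleAtMost tabStop (n - 1) rest
      in  (k + 1, rest')
```

For example, `gobbleAtMost 4 2 "\tx"` consumes two of the four columns
of the tab and leaves two spaces before `x`.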
John MacFarlane 2017-08-19 10:56:15 -07:00
parent 5ab1162def
commit a31241a08b
8 changed files with 178 additions and 117 deletions


@ -2078,12 +2078,12 @@ But Markdown also allows a "lazy" format:
list item. list item.
* and my second. * and my second.
### The four-space rule ### ### Block content in list items ###
A list item may contain multiple paragraphs and other block-level A list item may contain multiple paragraphs and other block-level
content. However, subsequent paragraphs must be preceded by a blank line content. However, subsequent paragraphs must be preceded by a blank line
and indented four spaces or a tab. The list will look better if the first and indented to line up with the first non-space content after
paragraph is aligned with the rest: the list marker.
* First paragraph. * First paragraph.
@ -2094,9 +2094,19 @@ paragraph is aligned with the rest:
{ code } { code }
Exception: if the list marker is followed by an indented code
block, which must begin 5 spaces after the list marker, then
subsequent paragraphs must begin two columns after the last
character of the list marker:
* code
continuation paragraph
List items may include other lists. In this case the preceding blank List items may include other lists. In this case the preceding blank
line is optional. The nested list must be indented four spaces or line is optional. The nested list must be indented to line up with
one tab: the first non-space character after the list marker of the
containing list item.
* fruits * fruits
+ apples + apples
@ -2121,21 +2131,6 @@ other blocks in a list item, the first line of each must be indented.
Second paragraph of second Second paragraph of second
list item. list item.
**Note:** Although the four-space rule for continuation paragraphs
comes from the official [Markdown syntax guide], the reference implementation,
`Markdown.pl`, does not follow it. So pandoc will give different results than
`Markdown.pl` when authors have indented continuation paragraphs fewer than
four spaces.
The [Markdown syntax guide] is not explicit whether the four-space
rule applies to *all* block-level content in a list item; it only
mentions paragraphs and code blocks. But it implies that the rule
applies to all block-level content (including nested lists), and
pandoc interprets it that way.
[Markdown syntax guide]:
http://daringfireball.net/projects/markdown/syntax#list
### Ordered lists ### ### Ordered lists ###
Ordered lists work just like bulleted lists, except that the items Ordered lists work just like bulleted lists, except that the items
@ -3606,6 +3601,12 @@ implied by pandoc's default `all_symbols_escapable`.
Allow a list to occur right after a paragraph, with no intervening Allow a list to occur right after a paragraph, with no intervening
blank space. blank space.
#### Extension: `four_space_rule` ####
Selects the pandoc <= 2.0 behavior for parsing lists, so that
a four-space indent is needed for list item continuation
paragraphs.
#### Extension: `spaced_reference_links` #### #### Extension: `spaced_reference_links` ####
Allow whitespace between the two components of a reference link, Allow whitespace between the two components of a reference link,


@ -111,6 +111,7 @@ data Extension =
| Ext_autolink_bare_uris -- ^ Make all absolute URIs into links | Ext_autolink_bare_uris -- ^ Make all absolute URIs into links
| Ext_fancy_lists -- ^ Enable fancy list numbers and delimiters | Ext_fancy_lists -- ^ Enable fancy list numbers and delimiters
| Ext_lists_without_preceding_blankline -- ^ Allow lists without preceding blank | Ext_lists_without_preceding_blankline -- ^ Allow lists without preceding blank
| Ext_four_space_rule -- ^ Require 4-space indent for list contents
| Ext_startnum -- ^ Make start number of ordered list significant | Ext_startnum -- ^ Make start number of ordered list significant
| Ext_definition_lists -- ^ Definition lists as in pandoc, mmd, php | Ext_definition_lists -- ^ Definition lists as in pandoc, mmd, php
| Ext_compact_definition_lists -- ^ Definition lists without | Ext_compact_definition_lists -- ^ Definition lists without


@ -50,6 +50,7 @@ module Text.Pandoc.Parsing ( takeWhileP,
blankline, blankline,
blanklines, blanklines,
gobbleSpaces, gobbleSpaces,
gobbleAtMostSpaces,
enclosed, enclosed,
stringAnyCase, stringAnyCase,
parseFromString, parseFromString,
@ -380,14 +381,33 @@ blanklines = many1 blankline
-- | Gobble n spaces; if tabs are encountered, expand them -- | Gobble n spaces; if tabs are encountered, expand them
-- and gobble some or all of their spaces, leaving the rest. -- and gobble some or all of their spaces, leaving the rest.
gobbleSpaces :: Monad m => ReaderOptions -> Int -> ParserT [Char] st m () gobbleSpaces :: (HasReaderOptions st, Monad m)
gobbleSpaces _ 0 = return () => Int -> ParserT [Char] st m ()
gobbleSpaces opts n = try $ do gobbleSpaces 0 = return ()
char ' ' <|> do char '\t' gobbleSpaces n
| n < 0 = error "gobbleSpaces called with negative number"
| otherwise = try $ do
char ' ' <|> eatOneSpaceOfTab
gobbleSpaces (n - 1)
eatOneSpaceOfTab :: (HasReaderOptions st, Monad m) => ParserT [Char] st m Char
eatOneSpaceOfTab = do
char '\t'
tabstop <- getOption readerTabStop
inp <- getInput inp <- getInput
setInput $ replicate (readerTabStop opts - 1) ' ' ++ inp setInput $ replicate (tabstop - 1) ' ' ++ inp
return ' ' return ' '
gobbleSpaces opts (n - 1)
-- | Gobble up to n spaces; if tabs are encountered, expand them
-- and gobble some or all of their spaces, leaving the rest.
gobbleAtMostSpaces :: (HasReaderOptions st, Monad m)
=> Int -> ParserT [Char] st m Int
gobbleAtMostSpaces 0 = return 0
gobbleAtMostSpaces n
| n < 0 = error "gobbleAtMostSpaces called with negative number"
| otherwise = option 0 $ do
char ' ' <|> eatOneSpaceOfTab
(+ 1) <$> gobbleAtMostSpaces (n - 1)
-- | Parses material enclosed between start and end parsers. -- | Parses material enclosed between start and end parsers.
enclosed :: (Show end, Stream s m Char) => ParserT s st m t -- ^ start parser enclosed :: (Show end, Stream s m Char) => ParserT s st m t -- ^ start parser


@ -138,12 +138,7 @@ nonindentSpaces = do
skipNonindentSpaces :: PandocMonad m => MarkdownParser m Int skipNonindentSpaces :: PandocMonad m => MarkdownParser m Int
skipNonindentSpaces = do skipNonindentSpaces = do
tabStop <- getOption readerTabStop tabStop <- getOption readerTabStop
atMostSpaces (tabStop - 1) <* notFollowedBy spaceChar gobbleAtMostSpaces (tabStop - 1) <* notFollowedBy spaceChar
atMostSpaces :: PandocMonad m => Int -> MarkdownParser m Int
atMostSpaces n
| n > 0 = (char ' ' >> (+1) <$> atMostSpaces (n-1)) <|> return 0
| otherwise = return 0
litChar :: PandocMonad m => MarkdownParser m Char litChar :: PandocMonad m => MarkdownParser m Char
litChar = escapedChar' litChar = escapedChar'
@ -809,49 +804,51 @@ blockQuote = do
bulletListStart :: PandocMonad m => MarkdownParser m () bulletListStart :: PandocMonad m => MarkdownParser m ()
bulletListStart = try $ do bulletListStart = try $ do
optional newline -- if preceded by a Plain block in a list context optional newline -- if preceded by a Plain block in a list context
startpos <- sourceColumn <$> getPosition
skipNonindentSpaces skipNonindentSpaces
notFollowedBy' (() <$ hrule) -- because hrules start out just like lists notFollowedBy' (() <$ hrule) -- because hrules start out just like lists
satisfy isBulletListMarker satisfy isBulletListMarker
endpos <- sourceColumn <$> getPosition gobbleSpaces 1 <|> () <$ lookAhead newline
tabStop <- getOption readerTabStop try (gobbleAtMostSpaces 3 >> notFollowedBy spaceChar) <|> return ()
lookAhead (newline <|> spaceChar)
() <$ atMostSpaces (tabStop - (endpos - startpos))
anyOrderedListStart :: PandocMonad m => MarkdownParser m (Int, ListNumberStyle, ListNumberDelim) orderedListStart :: PandocMonad m
anyOrderedListStart = try $ do => Maybe (ListNumberStyle, ListNumberDelim)
-> MarkdownParser m (Int, ListNumberStyle, ListNumberDelim)
orderedListStart mbstydelim = try $ do
optional newline -- if preceded by a Plain block in a list context optional newline -- if preceded by a Plain block in a list context
startpos <- sourceColumn <$> getPosition
skipNonindentSpaces skipNonindentSpaces
notFollowedBy $ string "p." >> spaceChar >> digit -- page number notFollowedBy $ string "p." >> spaceChar >> digit -- page number
res <- do guardDisabled Ext_fancy_lists (do guardDisabled Ext_fancy_lists
start <- many1 digit >>= safeRead start <- many1 digit >>= safeRead
char '.' char '.'
return (start, DefaultStyle, DefaultDelim) gobbleSpaces 1 <|> () <$ lookAhead newline
<|> do (num, style, delim) <- anyOrderedListMarker optional $ try (gobbleAtMostSpaces 3 >> notFollowedBy spaceChar)
return (start, DefaultStyle, DefaultDelim))
<|>
(do (num, style, delim) <- maybe
anyOrderedListMarker
(\(sty,delim) -> (\start -> (start,sty,delim)) <$>
orderedListMarker sty delim)
mbstydelim
gobbleSpaces 1 <|> () <$ lookAhead newline
-- if it could be an abbreviated first name, -- if it could be an abbreviated first name,
-- insist on more than one space -- insist on more than one space
when (delim == Period && (style == UpperAlpha || when (delim == Period && (style == UpperAlpha ||
(style == UpperRoman && (style == UpperRoman &&
num `elem` [1, 5, 10, 50, 100, 500, 1000]))) $ num `elem` [1, 5, 10, 50, 100, 500, 1000]))) $
() <$ spaceChar () <$ lookAhead (newline <|> spaceChar)
return (num, style, delim) optional $ try (gobbleAtMostSpaces 3 >> notFollowedBy spaceChar)
endpos <- sourceColumn <$> getPosition return (num, style, delim))
tabStop <- getOption readerTabStop
lookAhead (newline <|> spaceChar)
atMostSpaces (tabStop - (endpos - startpos))
return res
listStart :: PandocMonad m => MarkdownParser m () listStart :: PandocMonad m => MarkdownParser m ()
listStart = bulletListStart <|> (anyOrderedListStart >> return ()) listStart = bulletListStart <|> (orderedListStart Nothing >> return ())
listLine :: PandocMonad m => MarkdownParser m String listLine :: PandocMonad m => Int -> MarkdownParser m String
listLine = try $ do listLine continuationIndent = try $ do
notFollowedBy' (do indentSpaces notFollowedBy' (do gobbleSpaces continuationIndent
many spaceChar skipMany spaceChar
listStart) listStart)
notFollowedByHtmlCloser notFollowedByHtmlCloser
optional (() <$ indentSpaces) optional (() <$ gobbleSpaces continuationIndent)
listLineCommon listLineCommon
listLineCommon :: PandocMonad m => MarkdownParser m String listLineCommon :: PandocMonad m => MarkdownParser m String
@ -864,26 +861,39 @@ listLineCommon = concat <$> manyTill
-- parse raw text for one list item, excluding start marker and continuations -- parse raw text for one list item, excluding start marker and continuations
rawListItem :: PandocMonad m rawListItem :: PandocMonad m
=> MarkdownParser m a => MarkdownParser m a
-> MarkdownParser m String -> MarkdownParser m (String, Int)
rawListItem start = try $ do rawListItem start = try $ do
pos1 <- getPosition
start start
pos2 <- getPosition
continuationIndent <- (4 <$ guardEnabled Ext_four_space_rule)
<|> return (sourceColumn pos2 - sourceColumn pos1)
first <- listLineCommon first <- listLineCommon
rest <- many (do notFollowedBy listStart rest <- many (do notFollowedBy listStart
notFollowedBy (() <$ codeBlockFenced) notFollowedBy (() <$ codeBlockFenced)
notFollowedBy blankline notFollowedBy blankline
listLine) listLine continuationIndent)
blanks <- many blankline blanks <- many blankline
return $ unlines (first:rest) ++ blanks let result = unlines (first:rest) ++ blanks
return (result, continuationIndent)
-- continuation of a list item - indented and separated by blankline -- continuation of a list item - indented and separated by blankline
-- or (in compact lists) endline. -- or (in compact lists) endline.
-- note: nested lists are parsed as continuations -- note: nested lists are parsed as continuations
listContinuation :: PandocMonad m => MarkdownParser m String listContinuation :: PandocMonad m => Int -> MarkdownParser m String
listContinuation = try $ do listContinuation continuationIndent = try $ do
lookAhead indentSpaces x <- try $ do
result <- many1 listContinuationLine notFollowedBy blankline
notFollowedByHtmlCloser
gobbleSpaces continuationIndent
anyLineNewline
xs <- many $ try $ do
notFollowedBy blankline
notFollowedByHtmlCloser
gobbleSpaces continuationIndent <|> notFollowedBy' listStart
anyLineNewline
blanks <- many blankline blanks <- many blankline
return $ concat result ++ blanks return $ concat (x:xs) ++ blanks
notFollowedByHtmlCloser :: PandocMonad m => MarkdownParser m () notFollowedByHtmlCloser :: PandocMonad m => MarkdownParser m ()
notFollowedByHtmlCloser = do notFollowedByHtmlCloser = do
@ -892,20 +902,12 @@ notFollowedByHtmlCloser = do
Just t -> notFollowedBy' $ htmlTag (~== TagClose t) Just t -> notFollowedBy' $ htmlTag (~== TagClose t)
Nothing -> return () Nothing -> return ()
listContinuationLine :: PandocMonad m => MarkdownParser m String
listContinuationLine = try $ do
notFollowedBy blankline
notFollowedBy' listStart
notFollowedByHtmlCloser
optional indentSpaces
anyLineNewline
listItem :: PandocMonad m listItem :: PandocMonad m
=> MarkdownParser m a => MarkdownParser m a
-> MarkdownParser m (F Blocks) -> MarkdownParser m (F Blocks)
listItem start = try $ do listItem start = try $ do
first <- rawListItem start (first, continuationIndent) <- rawListItem start
continuations <- many listContinuation continuations <- many (listContinuation continuationIndent)
-- parsing with ListItemState forces markers at beginning of lines to -- parsing with ListItemState forces markers at beginning of lines to
-- count as list item markers, even if not separated by blank space. -- count as list item markers, even if not separated by blank space.
-- see definition of "endline" -- see definition of "endline"
@ -920,23 +922,14 @@ listItem start = try $ do
orderedList :: PandocMonad m => MarkdownParser m (F Blocks) orderedList :: PandocMonad m => MarkdownParser m (F Blocks)
orderedList = try $ do orderedList = try $ do
(start, style, delim) <- lookAhead anyOrderedListStart (start, style, delim) <- lookAhead (orderedListStart Nothing)
unless (style `elem` [DefaultStyle, Decimal, Example] && unless (style `elem` [DefaultStyle, Decimal, Example] &&
delim `elem` [DefaultDelim, Period]) $ delim `elem` [DefaultDelim, Period]) $
guardEnabled Ext_fancy_lists guardEnabled Ext_fancy_lists
when (style == Example) $ guardEnabled Ext_example_lists when (style == Example) $ guardEnabled Ext_example_lists
items <- fmap sequence $ many1 $ listItem items <- fmap sequence $ many1 $ listItem
( try $ do (orderedListStart (Just (style, delim)))
optional newline -- if preceded by Plain block in a list start' <- (start <$ guardEnabled Ext_startnum) <|> return 1
startpos <- sourceColumn <$> getPosition
skipNonindentSpaces
res <- orderedListMarker style delim
endpos <- sourceColumn <$> getPosition
tabStop <- getOption readerTabStop
lookAhead (newline <|> spaceChar)
atMostSpaces (tabStop - (endpos - startpos))
return res )
start' <- option 1 $ guardEnabled Ext_startnum >> return start
return $ B.orderedListWith (start', style, delim) <$> fmap compactify items return $ B.orderedListWith (start', style, delim) <$> fmap compactify items
bulletList :: PandocMonad m => MarkdownParser m (F Blocks) bulletList :: PandocMonad m => MarkdownParser m (F Blocks)
@ -1122,7 +1115,7 @@ rawHtmlBlocks = do
updateState $ \st -> st{ stateInHtmlBlock = Just tagtype } updateState $ \st -> st{ stateInHtmlBlock = Just tagtype }
let closer = htmlTag (\x -> x ~== TagClose tagtype) let closer = htmlTag (\x -> x ~== TagClose tagtype)
let block' = do notFollowedBy' closer let block' = do notFollowedBy' closer
atMostSpaces indentlevel gobbleAtMostSpaces indentlevel
block block
contents <- mconcat <$> many block' contents <- mconcat <$> many block'
result <- result <-

test/command/3511.md (new file)

@ -0,0 +1,46 @@
```
% pandoc -t native
- a
- b
- c
- code
1000. one
not continuation
^D
[BulletList
[[Plain [Str "a"]
,BulletList
[[Plain [Str "b"]
,BulletList
[[Plain [Str "c"]]]]]]
,[CodeBlock ("",[],[]) "code"]]
,OrderedList (1000,Decimal,Period)
[[Plain [Str "one"]]]
,CodeBlock ("",[],[]) "not continuation"]
```
```
% pandoc -t native -f markdown+four_space_rule
- a
- b
- c
- not code
1000. one
continuation
^D
[BulletList
[[Plain [Str "a"]]
,[Plain [Str "b"]
,BulletList
[[Plain [Str "c"]]]]
,[CodeBlock ("",[],[]) "not code"]]
,OrderedList (1000,Decimal,Period)
[[Para [Str "one"]
,Para [Str "continuation"]]]]
```


@ -29,13 +29,13 @@ Pandoc (Meta {unMeta = fromList [("author",MetaList [MetaInlines [Str "Author",S
,[Plain [Str "three"]]] ,[Plain [Str "three"]]]
,Header 2 ("indented-code-at-beginning-of-list",[],[]) [Str "Indented",Space,Str "code",Space,Str "at",Space,Str "beginning",Space,Str "of",Space,Str "list"] ,Header 2 ("indented-code-at-beginning-of-list",[],[]) [Str "Indented",Space,Str "code",Space,Str "at",Space,Str "beginning",Space,Str "of",Space,Str "list"]
,BulletList ,BulletList
[[CodeBlock ("",[],[]) "code\ncode"]] [[CodeBlock ("",[],[]) "code\ncode"
,OrderedList (1,Decimal,Period) ,OrderedList (1,Decimal,Period)
[[CodeBlock ("",[],[]) "code\ncode"] [[CodeBlock ("",[],[]) "code\ncode"]
,[CodeBlock ("",[],[]) "code\ncode"]] ,[CodeBlock ("",[],[]) "code\ncode"]]
,BulletList ,BulletList
[[CodeBlock ("",[],[]) "code\ncode"] [[CodeBlock ("",[],[]) "code\ncode"]
,[Plain [Str "no",Space,Str "code"]]] ,[Plain [Str "no",Space,Str "code"]]]]]
,Header 2 ("backslash-newline",[],[]) [Str "Backslash",Space,Str "newline"] ,Header 2 ("backslash-newline",[],[]) [Str "Backslash",Space,Str "newline"]
,Para [Str "hi",LineBreak,Str "there"] ,Para [Str "hi",LineBreak,Str "there"]
,Header 2 ("code-spans",[],[]) [Str "Code",Space,Str "spans"] ,Header 2 ("code-spans",[],[]) [Str "Code",Space,Str "spans"]