Docx reader: read table styles as custom styles...

...when `styles` extension is enabled.  Closes .

Also improve manual's coverage of custom styles.
John MacFarlane 2025-01-10 10:28:57 -08:00
parent b108cef9dd
commit 6051d62e56
6 changed files with 53 additions and 18 deletions
MANUAL.txt
pandoc.cabal
src/Text/Pandoc/Readers
test/command

@@ -3753,11 +3753,14 @@ output formats
 ### Extension: `styles` ### {#ext-styles}
-When converting from docx, read all docx styles as divs (for
-paragraph styles) and spans (for character styles) regardless
-of whether pandoc understands the meaning of these styles.
-This can be used with [docx custom styles](#custom-styles).
-Disabled by default.
+When converting from docx, add `custom-styles` attributes
+for all docx styles, regardless of whether pandoc understands
+the meanings of these styles. Because attributes cannot be
+added directly to paragraphs or text in the pandoc AST,
+paragraph styles will cause Divs to be created and character
+styles will cause Spans to be created to hold the attributes.
+(Table styles will be added to the Table elements directly.)
+This extension can be used with [docx custom styles](#custom-styles).
 input formats
 : `docx`
@@ -7354,11 +7357,11 @@ However, if you need to apply your own styles to blocks, or match a preexisting
 set of styles, pandoc allows you to define custom styles for blocks and text
 using `div`s and `span`s, respectively.
-If you define a `div` or `span` with the attribute `custom-style`,
-pandoc will apply your specified style to the contained elements (with
-the exception of elements whose function depends on a style, like
-headings, code blocks, block quotes, or links). So, for example, using
-the `bracketed_spans` syntax,
+If you define a Div, Span, or Table with the attribute
+`custom-style`, pandoc will apply your specified style to the
+contained elements (with the exception of elements whose function
+depends on a style, like headings, code blocks, block quotes, or
+links). So, for example, using the `bracketed_spans` syntax,
     [Get out]{custom-style="Emphatically"}, he said.
@@ -7400,9 +7403,10 @@ interpreting the derivation of the input document's styles.
 By enabling the [`styles` extension](#ext-styles) in the docx reader
 (`-f docx+styles`), you can produce output that maintains the styles
-of the input document, using the `custom-style` class. Paragraph
-styles are interpreted as divs, while character styles are interpreted
-as spans.
+of the input document, using the `custom-style` class. A `custom-style`
+attribute will be added for each style. Divs will be created to
+hold the paragraph styles, and Spans to hold the character styles.
+Table styles will be applied directly to the Table.
 For example, using the `custom-style-reference.docx` file in the test
 directory, we have the following different outputs:

@@ -219,6 +219,7 @@ extra-source-files:
 test/command/9391.docx
 test/command/9358.docx
 test/command/9002.docx
+test/command/9603.docx
 test/command/biblio.bib
 test/command/averroes.bib
 test/command/A.txt

@@ -807,9 +807,9 @@ bodyPartToBlocks (Captioned parstyle parparts bpart) = do
     [Para im@[Image{}]]
       -> pure $ singleton $ Figure nullAttr capt [Plain im]
     _ -> pure captContents
-bodyPartToBlocks (Tbl _ _ _ []) =
+bodyPartToBlocks (Tbl _ _ _ _ []) =
   return mempty
-bodyPartToBlocks (Tbl cap grid look parts) = do
+bodyPartToBlocks (Tbl mbsty cap grid look parts) = do
   let fullCaption = if T.null cap then mempty else plain (text cap)
   let shortCaption = if T.null cap then Nothing else Just (toList (text cap))
       cap' = caption shortCaption fullCaption
@@ -831,7 +831,11 @@ bodyPartToBlocks (Tbl cap grid look parts) = do
       totalWidth = sum grid
       widths = (\w -> ColWidth (fromInteger w / fromInteger totalWidth)) <$> grid
-  return $ table cap'
+  extStylesEnabled <- asks (isEnabled Ext_styles . docxOptions)
+  let attr = case mbsty of
+               Just sty | extStylesEnabled -> ("", [], [("custom-style", sty)])
+               _ -> nullAttr
+  return $ tableWith attr cap'
                  (zip alignments widths)
                  (TableHead nullAttr headerCells)
                  [TableBody nullAttr 0 [] bodyCells]

@@ -287,7 +287,7 @@ data BodyPart = Paragraph ParagraphStyle [ParPart]
               | Heading Int ParaStyleName ParagraphStyle T.Text T.Text (Maybe Level)
                         [ParPart]
               | ListItem ParagraphStyle T.Text T.Text (Maybe Level) [ParPart]
-              | Tbl T.Text TblGrid TblLook [Row]
+              | Tbl (Maybe T.Text) T.Text TblGrid TblLook [Row]
               | Captioned ParagraphStyle [ParPart] BodyPart
               | HRule
               deriving Show
@@ -855,6 +855,9 @@ elemToBodyPart ns element
         description = fromMaybe "" $ tblProperties
                        >>= findChildByName ns "w" "tblDescription"
                        >>= findAttrByName ns "w" "val"
+        mbstyle = tblProperties
+                   >>= findChildByName ns "w" "tblStyle"
+                   >>= findAttrByName ns "w" "val"
         grid' = case findChildByName ns "w" "tblGrid" element of
           Just g -> elemToTblGrid ns g
           Nothing -> return []
@@ -867,7 +870,7 @@ elemToBodyPart ns element
     grid <- grid'
     tblLook <- tblLook'
     rows <- mapD (elemToRow ns) (elChildren element)
-    return $ Tbl (caption <> description) grid tblLook rows
+    return $ Tbl mbstyle (caption <> description) grid tblLook rows
 elemToBodyPart _ _ = throwError WrongElem
 lookupRelationship :: DocumentLocation -> RelId -> [Relationship] -> Maybe Target

test/command/9603.docx: binary file, not shown.

test/command/9603.md: new file, 23 lines.

@@ -0,0 +1,23 @@
+```
+% pandoc command/9603.docx -t html -f docx+styles
+^D
+<p>A table with a contemporary style:</p>
+<table data-custom-style="TableContemporary">
+<colgroup>
+<col style="width: 50%" />
+<col style="width: 50%" />
+</colgroup>
+<thead>
+<tr>
+<th>Test</th>
+<th>Column</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>1</td>
+<td>2</td>
+</tr>
+</tbody>
+</table>
+```
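
The attribute selection that this commit adds to `bodyPartToBlocks` can be sketched in isolation. This is a minimal illustration under stated assumptions, not pandoc's own code: `Attr` and `nullAttr` are local stand-ins built on `String` (pandoc's real `Attr` uses `Text`), and `tableAttr` is a hypothetical helper name that does not appear in the commit.

```haskell
module Main where

-- Local stand-in for pandoc's Attr: (identifier, classes, key-value pairs).
-- Assumption: String is used here only to keep the sketch dependency-free.
type Attr = (String, [String], [(String, String)])

-- Local stand-in for pandoc's nullAttr.
nullAttr :: Attr
nullAttr = ("", [], [])

-- Hypothetical helper mirroring the case expression in the diff:
-- attach a custom-style pair only when the styles extension is enabled
-- AND the table actually carries a style name.
tableAttr :: Bool -> Maybe String -> Attr
tableAttr stylesEnabled mbsty =
  case mbsty of
    Just sty | stylesEnabled -> ("", [], [("custom-style", sty)])
    _                        -> nullAttr

main :: IO ()
main = do
  print (tableAttr True  (Just "TableContemporary"))  -- extension on, style present
  print (tableAttr False (Just "TableContemporary"))  -- extension off
  print (tableAttr True  Nothing)                     -- no table style in the docx
```

Only the first case yields a `custom-style` pair, which is consistent with the `data-custom-style="TableContemporary"` attribute in the HTML test output above; the other two cases fall through to the empty attribute, so tables read without `+styles` are unchanged.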