EPUB writer: Don't add sourceURL to absolute URIs!

Closes #1669.

If there are further issues, please open a new, targeted issue on the
tracker.  Some notes on the further issues you gestured at:

Data URIs are indeed dereferenced, but why is this a problem?
(The function used to fetch from URLs is shared by many different
output formats.  Preserving data URIs would make sense in EPUBs, but
not for, e.g., PDF output.  And by dereferencing we can get a smaller,
more efficient EPUB, with the data stored as bytes in a file rather
than encoded in a textual representation.)
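To make the size argument concrete: base64 packs 3 bytes into 4 characters, so before compression a data URI's textual payload is roughly a third larger than the bytes it encodes. The helpers below are hypothetical and not part of pandoc. `splitDataURI` splits a `data:` URI into its media type, base64 flag, and payload. `decodedSize` computes how many bytes a padded base64 payload decodes to without actually decoding it.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- Hypothetical helper (not pandoc's API): split a data URI of the form
-- "data:<mediatype>[;base64],<payload>" into (mediatype, isBase64, payload).
splitDataURI :: String -> Maybe (String, Bool, String)
splitDataURI uri = do
  rest <- stripPrefix "data:" uri
  case break (== ',') rest of
    (_, [])               -> Nothing          -- no comma: malformed data URI
    (header, _ : payload) ->
      let isBase64  = ";base64" `isSuffixOf` header
          mediaType = takeWhile (/= ';') header
      in Just (mediaType, isBase64, payload)

-- Bytes a padded base64 payload decodes to: every 4 characters carry
-- 3 bytes, minus one byte per trailing '=' padding character.
decodedSize :: String -> Int
decodedSize payload = (length payload `div` 4) * 3 - padding
  where
    padding = length (takeWhile (== '=') (reverse payload))
```

For `data:image/png;base64,iVBORw0KGgo=`, the 12-character payload decodes to the 8-byte PNG signature. For large payloads the ratio approaches 4/3.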

"absolute uris are not recognized" -- I assume that is the problem
just fixed.  If not, please open a new issue.
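The rule this commit introduces can be sketched in isolation. This is a minimal sketch, not pandoc's code. `resolveOld` and `resolveNew` are hypothetical names, and the sketch assumes `System.FilePath.Posix` so that the joining separator is always `/`. It contrasts two approaches. The old one joins the source URL onto every reference. The new one joins only when network-uri's `isURI` reports that the reference is not already an absolute URI.

```haskell
import Network.URI (isURI)
import System.FilePath.Posix ((</>))

-- Old rule: always join the reference onto the source URL, if there is one.
resolveOld :: Maybe String -> String -> String
resolveOld sourceURL ref = maybe ref (</> ref) sourceURL

-- New rule, mirroring the case expressions in the diff: join only when
-- the reference is not already an absolute URI (http:, data:, ...).
resolveNew :: Maybe String -> String -> String
resolveNew sourceURL ref = case sourceURL of
  Just u | not (isURI ref) -> u </> ref
  _                        -> ref
```

Take a source URL of `http://example.com/docs`. The old rule turns `http://cdn.example.org/a.png` into `http://example.com/docs/http://cdn.example.org/a.png`, while the new rule leaves it, and any `data:` URI, untouched. This sketch tests the reference it is resolving. In the `transformTag` hunk, however, `oldposter` is guarded by `isURI src` rather than `isURI poster`. As a result, an absolute `poster` next to a relative `src` would still be joined.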

"relative uris are resolved (wrongly) like file paths" -- can you
give an example?
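The thread doesn't say which case the reporter meant, so what follows is only an illustration of one possible reading. `</>` from `System.FilePath.Posix` treats the base as a directory. RFC 3986 resolution, done here with network-uri's `relativeTo`, instead drops the last path segment of the base and collapses `..` segments. The `URI -> URI -> URI` signature of `relativeTo` assumes network-uri 2.6 or later. `joinAsPath` and `resolveRFC3986` are hypothetical helpers.

```haskell
import Network.URI (parseURI, parseURIReference, relativeTo)
import System.FilePath.Posix ((</>))

-- Filepath-style joining: the base is treated as a directory name.
joinAsPath :: String -> String -> String
joinAsPath base ref = base </> ref

-- RFC 3986 reference resolution; Nothing if either string fails to parse.
resolveRFC3986 :: String -> String -> Maybe String
resolveRFC3986 base ref = do
  b <- parseURI base
  r <- parseURIReference ref
  pure (show (r `relativeTo` b))
```

Resolve `img/a.png` against `http://example.com/docs/index.html`. The first function yields `http://example.com/docs/index.html/img/a.png` and the second yields `http://example.com/docs/img/a.png`. Similarly, `../img/a.png` resolves to `http://example.com/img/a.png` under RFC 3986.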

"`<base>` tag is ignored" -- yes.  I didn't know about the base tag.
Could you open a new issue just for this?
John MacFarlane 2014-10-08 11:52:47 -07:00
parent ccd04add67
commit d60707eed0


@@ -64,7 +64,7 @@ import Text.XML.Light ( unode, Element(..), unqual, Attr(..), add_attrs
 import Text.Pandoc.UUID (getRandomUUID)
 import Text.Pandoc.Writers.HTML (writeHtmlString, writeHtml)
 import Data.Char ( toLower, isDigit, isAlphaNum )
-import Network.URI ( unEscapeString )
+import Network.URI ( unEscapeString, isURI )
 import Text.Pandoc.MIME (MimeType, getMimeType)
 import qualified Control.Exception as E
 import Text.Blaze.Html.Renderer.Utf8 (renderHtml)
@@ -773,8 +773,12 @@ transformTag opts mediaRef tag@(TagOpen name attr)
   | name `elem` ["video", "source", "img", "audio"] = do
     let src = fromAttrib "src" tag
     let poster = fromAttrib "poster" tag
-    let oldsrc = maybe src (</> src) $ writerSourceURL opts
-    let oldposter = maybe poster (</> poster) $ writerSourceURL opts
+    let oldsrc = case writerSourceURL opts of
+                      Just u | not (isURI src) -> u </> src
+                      _ -> src
+    let oldposter = case writerSourceURL opts of
+                      Just u | not (isURI src) -> u </> poster
+                      _ -> poster
     newsrc <- modifyMediaRef mediaRef oldsrc
     newposter <- modifyMediaRef mediaRef oldposter
     let attr' = filter (\(x,_) -> x /= "src" && x /= "poster") attr ++
@@ -811,8 +815,9 @@ transformInline :: WriterOptions
                 -> Inline
                 -> IO Inline
 transformInline opts mediaRef (Image lab (src,tit)) = do
-  let src' = unEscapeString src
-  let oldsrc = maybe src' (</> src) $ writerSourceURL opts
+  let oldsrc = case (unEscapeString src, writerSourceURL opts) of
+                    (s, Just u) | not (isURI s) -> u </> s
+                    (s, _) -> s
   newsrc <- modifyMediaRef mediaRef oldsrc
   return $ Image lab (newsrc, tit)
 transformInline opts _ (x@(Math _ _))
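The `transformInline` change also fixes a smaller mismatch. The old code unescaped `src` into `src'` but then joined the still-escaped `src` onto the source URL. The sketch below mirrors the old and new expressions as standalone functions. `oldImageSrc` and `newImageSrc` are hypothetical names, and `System.FilePath.Posix` is assumed so the separator is `/`.

```haskell
import Network.URI (isURI, unEscapeString)
import System.FilePath.Posix ((</>))

-- Old Image handling: the unescaped name is used only when there is no
-- source URL; otherwise the still-escaped src is joined.
oldImageSrc :: Maybe String -> String -> String
oldImageSrc sourceURL src =
  let src' = unEscapeString src
  in maybe src' (</> src) sourceURL

-- New Image handling: unescape first, then join only non-URIs.
newImageSrc :: Maybe String -> String -> String
newImageSrc sourceURL src = case (unEscapeString src, sourceURL) of
  (s, Just u) | not (isURI s) -> u </> s
  (s, _)                      -> s
```

Take source URL `http://example.com/docs` and `my%20pic.png`. The old form yields `http://example.com/docs/my%20pic.png`, while the new form yields `http://example.com/docs/my pic.png`. As in the hunk above, `isURI` runs on the unescaped string here. An absolute URL containing `%20` would therefore fail the check, since a literal space is not a valid URI character, and would be joined.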