Commit Graph

30 Commits

Author SHA1 Message Date
d9f69014a0 Make a couple improvements in performance + add an example script to extract pages from a PDF 2020-05-28 18:54:15 +02:00
c8a5e2b191 Wait, CachedFonts are indexed by Id Object so it could be an IdMap actually 2020-03-19 10:27:29 +01:00
11640c8465 Replace 'cacheFonts' by more versatile 'withFonts' inspired by 'withResources' that avoid having to declare an inline function to capture the 'layer' argument and pass it twice 2020-03-19 10:27:29 +01:00
ba7dd6a690 Make cacheFonts slightly more useful by passing layer directly to it and run the ReaderT underneath 2020-03-19 10:27:29 +01:00
24630a04a1 Implement 'w' for Pages Box instances 2020-03-19 10:27:28 +01:00
f31e9eb38b Generalize Ids out of Content to handle Object Ids too 2020-03-19 10:27:21 +01:00
0f857c457d Use a defined monadic stack in Pages to lift the MonadReader ambiguity and allow finishing to reimplement getText demo 2020-03-14 16:57:16 +01:00
5b8d951516 WIP: Try about everything that's possible to try, OrderedMap or [(,)], try to decouple Box instance for Content and the one for Indexed Text, breaks getText… will probably require some advanced effect library, there seems to be a weird MonadReader conflict in the errors messages 2020-03-11 18:55:18 +01:00
3b1a5152e4 Try connecting all the Box instance in the getText demo, try to encode pages contents with a simple assoc list 2020-03-10 22:57:11 +01:00
dce10ae63a Keep Page as only a reference object keeping the ObjectId explicit so we can modify the actual objects one day, write an OrderedMap data structure to help 2020-03-08 22:18:47 +01:00
90348c57d6 Disable text rendering and font loading from the Page abstraction, this code will have to be moved into a separate Box instance 2020-03-05 17:40:58 +01:00
50ac0692b2 Implement r for access by PageNumber and clean the mess a bit 2020-03-05 10:09:09 +01:00
2b9abc24b6 Add a separate instance for Raw streams that don't try to decode them 2020-03-04 18:31:30 +01:00
309f6ed461 Actually re-implement getText with the simpler Box instance 2020-03-04 18:19:10 +01:00
7cef65d799 Fixed vicious bug introduced by 6096a1a237 (since follow is now automatic for references, it's not called explicitely but should in case of 'several' Content, which is an array of references, each of which should be expended) — TODO: add a unit test for that 2020-03-04 18:14:33 +01:00
d288ecf0ac Start reimplementing getAll as a Box instance and try to separate the various monad run steps 2020-03-03 18:17:44 +01:00
9ce1a48030 Optimistically prepare the instance declaration for Pages that should replace get / getAll, not really getting out of the Monad 2020-02-28 18:15:40 +01:00
99014ff30d Recognize openStream was just an implementation of r for the Box m () Object ByteString, and extend it implementing the w operation while we're at it 2020-02-26 22:13:29 +01:00
bcf2e05bfb Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already) 2020-02-17 15:29:59 +01:00
6096a1a237 Simplify navigations by centering everything on Objects to avoid needing to many conversion tools between DirectObject / Object / Dictionary 2020-02-15 13:51:24 +01:00
4a6dbda7d3 Move Error type from Pages to Navigation as a candidate for MonadFail required by PDFContent defined there 2020-02-15 10:22:42 +01:00
eb4d76002c Finish the split of Navigation out of Page, generalize the use of MonadFail with a custom Error monad (~= Either String) 2020-02-11 22:41:46 +01:00
af994cb50c WIP: in the process of migrating to Object.Navigation in Pages, still unsure how to manage simple Content parsing and efficient font loading (+ giving a way to edit Contents) 2020-02-11 17:59:15 +01:00
aed7af376a WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell 2020-02-11 08:29:08 +01:00
195446e653 Allow resources with no /Font field, they won't cause any problem as long as no call to Tf (to load a font) is made 2020-02-10 17:41:44 +01:00
9f1b1afafe Implement Text rendering from parsed Content 2020-02-10 10:54:44 +01:00
325250383a Add support for fonts and implement MacRomanEncoding 2020-02-08 08:15:32 +01:00
c48ab22808 Forgot some useless parentheses when playing with operator precedences 2020-02-04 17:05:15 +01:00
a2b66ac6d6 Generalize the getFont function because some /Resources have a direct dictionary as value for their /Font property 2020-02-04 17:04:42 +01:00
f9f799c59b Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1) 2019-11-29 11:51:35 +01:00