Commit Graph

149 Commits

Author SHA1 Message Date
Tissevert d288ecf0ac Start reimplementing getAll as a Box instance and try to separate the various monad run steps 2020-03-03 18:17:44 +01:00
Tissevert 3b3eeef218 Maybe we need a MonadState s m => MonadReader s m instance some day ? 2020-03-03 18:16:49 +01:00
Tissevert 2c02e44adf Export the PDFContent monadic type used in PDF.Pages 2020-03-03 18:16:12 +01:00
Tissevert 9ce1a48030 Optimistically prepare the instance declaration for Pages that should replace get / getAll, not really getting out of the Monad 2020-02-28 18:15:40 +01:00
Tissevert 4969c6442e Simple String aliasing to prepare the day when we'll be able to have more complex Component than just PDF Names (and access elements in an array) 2020-02-28 18:14:27 +01:00
Tissevert cb257fc07e Rename function for clarity : actually it's doing just what w StreamContent does, but without checking the headers to re-zlib-encode the stream content 2020-02-27 17:30:42 +01:00
Tissevert d90eaf6f1c Add Box instances to allow handling some exceptions in monad and converting them to Traversable accessible from the data part of the type 2020-02-27 17:22:12 +01:00
Tissevert 99014ff30d Recognize openStream was just an implementation of r for the Box m () Object ByteString, and extend it implementing the w operation while we're at it 2020-02-26 22:13:29 +01:00
Tissevert f4df4aab22 Found a nicer formulation that doesn't require transitivity or index agglomeration and swapped argument of w for more reusability with at / atAll 2020-02-26 17:19:22 +01:00
Tissevert bdbc5f7351 Generalize the Box instances on containers from the particular cases of Document/Layers and Layers/Objects and move them to PDF.Box 2020-02-25 17:36:54 +01:00
Tissevert 30fece6537 Notice the 'edit' I exported earlier could be reused to simplify the w implementation of the proof that Box is a transitive relation 2020-02-25 09:27:56 +01:00
Tissevert 1a70f2972b Expose Box index flags in PDF and PDF.Layer 2020-02-24 21:37:09 +01:00
Tissevert 67faa06ea2 Lift unused restriction on MonadFail for AllObjects instance of Box Layer 2020-02-24 21:36:31 +01:00
Tissevert 83a63d4b02 Implement Box instance from Layer to Object, either all at once or indexed by an ObjectId 2020-02-24 17:29:22 +01:00
Tissevert 85ee8519c4 Implement Box instances from Document to Layers and EOLStyle 2020-02-24 17:28:17 +01:00
Tissevert e607f9cd37 Implement transitivity instance, extract a part of modifyAt as a convenient 'edit' function useful elsewhere and present a right-infix version of (,) to allow writing the nested tuple indexes more conveniently 2020-02-24 17:27:37 +01:00
Tissevert a9252b129a Start a Box module to describe inclusion relations between different types and get a MonadState action on the top type for any modification down there 2020-02-23 22:24:59 +01:00
Tissevert 71e62ee732 Add IDs to Instructions so that they can be selected in a given Content (and modified one day…) 2020-02-23 22:21:09 +01:00
Tissevert 160999a7d7 A small renaming for more clarity and because I thought «update» could be needed for a function name but after all maybe not but it's still better that way 2020-02-23 22:17:09 +01:00
Tissevert 36b1782464 Follow previous renaming for a local variable in Navigation for more clarity 2020-02-23 22:15:52 +01:00
Tissevert bcf2e05bfb Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already) 2020-02-17 15:29:59 +01:00
Tissevert 6096a1a237 Simplify navigations by centering everything on Objects to avoid needing to many conversion tools between DirectObject / Object / Dictionary 2020-02-15 13:51:24 +01:00
Tissevert 23186100a8 Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access 2020-02-15 10:25:09 +01:00
Tissevert b916ab5206 Just noticed Streams are a kind of Dictionary too, since they have a header 2020-02-15 10:23:32 +01:00
Tissevert 4a6dbda7d3 Move Error type from Pages to Navigation as a candidate for MonadFail required by PDFContent defined there 2020-02-15 10:22:42 +01:00
Tissevert 923d1800b0 Gain a bit of speed by using native Attoparsec for number types instead of reimplementing them with ByteString conversion and call to read 2020-02-14 18:02:40 +01:00
Tissevert 1c457d71d8 Fix the reading of Hexadecimal string objects detected by running the tests implemented from the spec 2020-02-14 18:00:12 +01:00
Tissevert a72d76e229 Add unit tests to make sure I'm not breaking things too much 2020-02-14 17:58:03 +01:00
Tissevert 919f640443 Merge branch 'extract-text' into navigation 2020-02-12 17:35:56 +01:00
Tissevert ae938acc02 Merge branch 'main' into extract-text 2020-02-12 17:34:56 +01:00
Tissevert 32f9866106 Use peek to improve directObject parser avoiding a large <|> disjunction 2020-02-12 17:34:27 +01:00
Tissevert eb4d76002c Finish the split of Navigation out of Page, generalize the use of MonadFail with a custom Error monad (~= Either String) 2020-02-11 22:41:46 +01:00
Tissevert af994cb50c WIP: in the process of migrating to Object.Navigation in Pages, still unsure how to manage simple Content parsing and efficient font loading (+ giving a way to edit Contents) 2020-02-11 17:59:15 +01:00
Tissevert 704d7a7fcf It turns out Output.concat wasn't necessary, OBuilder seems already is a Monoid so mconcat works (that fact was used in the very implementation of concat…) 2020-02-11 17:36:29 +01:00
Tissevert 11647eb4eb Implement output for Content streams 2020-02-11 17:26:47 +01:00
Tissevert aed7af376a WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell 2020-02-11 08:29:08 +01:00
Tissevert e77bbbcda9 WIP: start moving some navigation-related routines from Pages into Object directly and generalize them to multi-component to allow easier browsing 2020-02-10 17:43:04 +01:00
Tissevert 195446e653 Allow resources with no /Font field, they won't cause any problem as long as no call to Tf (to load a font) is made 2020-02-10 17:41:44 +01:00
Tissevert 9f1b1afafe Implement Text rendering from parsed Content 2020-02-10 10:54:44 +01:00
Tissevert 20466c4f13 WIP: Clean code parsing «pages» (now Content), separated from text rendering (will be reimplemented as an upper layer, also providing modification as stream filters) — Page is also forgotten for now, will need a big improvement in Object navigation 2020-02-09 22:42:57 +01:00
Tissevert 325250383a Add support for fonts and implement MacRomanEncoding 2020-02-08 08:15:32 +01:00
Tissevert c48ab22808 Forgot some useless parentheses when playing with operator precedences 2020-02-04 17:05:15 +01:00
Tissevert a2b66ac6d6 Generalize the getFont function because some /Resources have a direct dictionary as value for their /Font property 2020-02-04 17:04:42 +01:00
Tissevert cefb08ee50 Going a step further in «optimization» (slowing it even more…) by replacing choice by a search in a Map 2019-11-30 21:46:22 +01:00
Tissevert afbbcbffc5 Finish implementing the new stack-based call parser 2019-11-30 12:39:40 +01:00
Tissevert 8373bd1ea0 Removing +x permission on getText source that shouldn't ever have been set 2019-11-29 19:07:54 +01:00
Tissevert bac08446dd WIP: starting to fix this criminally inefficient parser for PDF's postfix-operator instructions 2019-11-29 17:42:57 +01:00
Tissevert 7eca875900 Improve getObj example to catch no-existing ObjectId and default to listing existing ObjectIds when none is provided 2019-11-29 11:53:08 +01:00
Tissevert f9f799c59b Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1) 2019-11-29 11:51:35 +01:00
Tissevert 08a9717b3a Get rid of wrapper PageContents structure returned by PageContent in the PDF.Text module (and return directly [ByteString] instead) 2019-11-29 11:48:28 +01:00