Commit Graph

118 Commits

SHA1 Message Date
b6c1f670ef Generalize the search for FlateDecode (there can be several filters in an array) 2020-03-11 10:47:52 +01:00
3b1a5152e4 Try connecting all the Box instances in the getText demo, try to encode pages contents with a simple assoc list 2020-03-10 22:57:11 +01:00
a04adff1d2 Prepare real instance of Box using renderText 2020-03-10 22:55:16 +01:00
103037ffb2 Fix mistake in arity of operator " 2020-03-10 22:53:27 +01:00
dce10ae63a Keep Page as only a reference object keeping the ObjectId explicit so we can modify the actual objects one day, write an OrderedMap data structure to help 2020-03-08 22:18:47 +01:00
f2986da96d Simplify Content abstracting over MonadParser for no reason and provide instead a parse that's in MonadFail to avoid having to handle Either outside 2020-03-08 22:16:23 +01:00
673321bf0a Implement encoder for good 2020-03-08 22:14:36 +01:00
0ade9cc2f5 Implement proper text formatting into PDF instructions using the new encode feature available in Fonts 2020-03-08 00:04:18 +01:00
457f1755e6 Prepare storing the reverse mapping for CMaps, divided by length to be able to implement encoding with a reasonable complexity 2020-03-08 00:02:24 +01:00
ca40d2df76 Don't use (!?) operator that doesn't exist before containers 0.5.9 for maximum compatibility 2020-03-08 00:00:24 +01:00
44bc898ed3 Generalize the Indexed type to handle both arbitrary Content instructions and text-related ones that can be viewed as text chunks 2020-03-06 19:21:16 +01:00
1ec47c5d07 Update Font type to cover both encoding and decoding — WIP for CMap, but complete though not tested yet for MacRoman encoding 2020-03-06 19:19:53 +01:00
6e245189fd Add a simple Box instance that exposes IndexedInstructions within a Content 2020-03-05 17:44:38 +01:00
90348c57d6 Disable text rendering and font loading from the Page abstraction, this code will have to be moved into a separate Box instance 2020-03-05 17:40:58 +01:00
50ac0692b2 Implement r for access by PageNumber and clean the mess a bit 2020-03-05 10:09:09 +01:00
2b9abc24b6 Add a separate instance for Raw streams that don't try to decode them 2020-03-04 18:31:30 +01:00
309f6ed461 Actually re-implement getText with the simpler Box instance 2020-03-04 18:19:10 +01:00
93c9863426 Remove accidentally committed trailing space on a line 2020-03-04 18:14:54 +01:00
7cef65d799 Fixed vicious bug introduced by 6096a1a237 (since follow is now automatic for references, it's not called explicitly but should in case of 'several' Content, which is an array of references, each of which should be expanded) — TODO: add a unit test for that 2020-03-04 18:14:33 +01:00
d288ecf0ac Start reimplementing getAll as a Box instance and try to separate the various monad run steps 2020-03-03 18:17:44 +01:00
3b3eeef218 Maybe we need a MonadState s m => MonadReader s m instance some day? 2020-03-03 18:16:49 +01:00
2c02e44adf Export the PDFContent monadic type used in PDF.Pages 2020-03-03 18:16:12 +01:00
9ce1a48030 Optimistically prepare the instance declaration for Pages that should replace get / getAll, not really getting out of the Monad 2020-02-28 18:15:40 +01:00
4969c6442e Simple String aliasing to prepare the day when we'll be able to have more complex Component than just PDF Names (and access elements in an array) 2020-02-28 18:14:27 +01:00
cb257fc07e Rename function for clarity: actually it's doing just what w StreamContent does, but without checking the headers to re-zlib-encode the stream content 2020-02-27 17:30:42 +01:00
d90eaf6f1c Add Box instances to allow handling some exceptions in monad and converting them to Traversable accessible from the data part of the type 2020-02-27 17:22:12 +01:00
99014ff30d Recognize openStream was just an implementation of r for the Box m () Object ByteString, and extend it implementing the w operation while we're at it 2020-02-26 22:13:29 +01:00
f4df4aab22 Found a nicer formulation that doesn't require transitivity or index agglomeration and swapped argument of w for more reusability with at / atAll 2020-02-26 17:19:22 +01:00
bdbc5f7351 Generalize the Box instances on containers from the particular cases of Document/Layers and Layers/Objects and move them to PDF.Box 2020-02-25 17:36:54 +01:00
30fece6537 Notice the 'edit' I exported earlier could be reused to simplify the w implementation of the proof that Box is a transitive relation 2020-02-25 09:27:56 +01:00
1a70f2972b Expose Box index flags in PDF and PDF.Layer 2020-02-24 21:37:09 +01:00
67faa06ea2 Lift unused restriction on MonadFail for AllObjects instance of Box Layer 2020-02-24 21:36:31 +01:00
83a63d4b02 Implement Box instance from Layer to Object, either all at once or indexed by an ObjectId 2020-02-24 17:29:22 +01:00
85ee8519c4 Implement Box instances from Document to Layers and EOLStyle 2020-02-24 17:28:17 +01:00
e607f9cd37 Implement transitivity instance, extract a part of modifyAt as a convenient 'edit' function useful elsewhere and present a right-infix version of (,) to allow writing the nested tuple indexes more conveniently 2020-02-24 17:27:37 +01:00
a9252b129a Start a Box module to describe inclusion relations between different types and get a MonadState action on the top type for any modification down there 2020-02-23 22:24:59 +01:00
71e62ee732 Add IDs to Instructions so that they can be selected in a given Content (and modified one day…) 2020-02-23 22:21:09 +01:00
160999a7d7 A small renaming for more clarity and because I thought «update» could be needed for a function name but after all maybe not but it's still better that way 2020-02-23 22:17:09 +01:00
36b1782464 Follow previous renaming for a local variable in Navigation for more clarity 2020-02-23 22:15:52 +01:00
bcf2e05bfb Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already) 2020-02-17 15:29:59 +01:00
6096a1a237 Simplify navigations by centering everything on Objects to avoid needing too many conversion tools between DirectObject / Object / Dictionary 2020-02-15 13:51:24 +01:00
23186100a8 Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access 2020-02-15 10:25:09 +01:00
b916ab5206 Just noticed Streams are a kind of Dictionary too, since they have a header 2020-02-15 10:23:32 +01:00
4a6dbda7d3 Move Error type from Pages to Navigation as a candidate for MonadFail required by PDFContent defined there 2020-02-15 10:22:42 +01:00
923d1800b0 Gain a bit of speed by using native Attoparsec for number types instead of reimplementing them with ByteString conversion and call to read 2020-02-14 18:02:40 +01:00
1c457d71d8 Fix the reading of Hexadecimal string objects detected by running the tests implemented from the spec 2020-02-14 18:00:12 +01:00
a72d76e229 Add unit tests to make sure I'm not breaking things too much 2020-02-14 17:58:03 +01:00
919f640443 Merge branch 'extract-text' into navigation 2020-02-12 17:35:56 +01:00
ae938acc02 Merge branch 'main' into extract-text 2020-02-12 17:34:56 +01:00
32f9866106 Use peek to improve directObject parser avoiding a large <|> disjunction 2020-02-12 17:34:27 +01:00