
148 commits

SHA1 Message Date
d9f69014a0 Make a couple improvements in performance + add an example script to extract pages from a PDF 2020-05-28 18:54:15 +02:00
f6664683c7 Once again something that should never have been committed 2020-03-20 09:34:53 +01:00
c491e8a70c Forgot to remove deprecated source file 2020-03-19 12:53:35 +01:00
09bd706748 Export Content operators, needed to write filters like reveal 2020-03-19 10:27:29 +01:00
729e312f90 Actually, the spec calls 'catalog' what we call 'origin' — use 'catalog' for more clarity in regard to the spec 2020-03-19 10:27:29 +01:00
6d265633e4 Export Instructions constructor from PDF.Content, used by reveal 2020-03-19 10:27:29 +01:00
1eb1c23053 Found a nicer way to handle the too long IndirectObjCoordinates for Object Navigation 2020-03-19 10:27:29 +01:00
44125f75a6 The orphan instance for MonadState s m => MonadReader s m really can't be used, so replace it with a mere function that runs an operation on a ReaderT inside the State monad, making it possible to borrow MonadReader operations in a MonadState context 2020-03-19 10:27:29 +01:00
c8a5e2b191 Wait, CachedFonts are indexed by Id Object so it could be an IdMap actually 2020-03-19 10:27:29 +01:00
11640c8465 Replace 'cacheFonts' by the more versatile 'withFonts', inspired by 'withResources', which avoids having to declare an inline function to capture the 'layer' argument and pass it twice 2020-03-19 10:27:29 +01:00
e94a09b3ec Add a Traversable instance for IdMap, needed in reveal and useful in general to be able to use atAll 2020-03-19 10:27:29 +01:00
ba7dd6a690 Make cacheFonts slightly more useful by passing layer directly to it and run the ReaderT underneath 2020-03-19 10:27:29 +01:00
d21e14f9a4 Hey, zlib isn't needed anymore for getText since all decoding is done directly in the Box instance for Streams 2020-03-19 10:27:28 +01:00
a1c2fbf110 Add an alias to Id to resolve type ambiguities like 'chunk' in PDF.Content.Text 2020-03-19 10:27:28 +01:00
24630a04a1 Implement 'w' for Pages Box instances 2020-03-19 10:27:28 +01:00
ee5e7500a8 Implement 'w' for Box m Chunks Content (Indexed Text) 2020-03-19 10:27:28 +01:00
d8aec5bf80 Add Box instance for IdMap a b, remove restriction on new keys in the Map instance since it's not really needed and could be better implemented like in OrderedMap by first using 'r' 2020-03-19 10:27:28 +01:00
25e2823c75 Generalize register to all IdMap a b, since it's gonna be needed by Indexed Text too 2020-03-19 10:27:28 +01:00
5027b079eb Include page numbers in chunks label, needed for long documents with many pages 2020-03-19 10:27:28 +01:00
5722dd1a04 Use IntMap for all Maps on Ids 2020-03-19 10:27:28 +01:00
f31e9eb38b Generalize Ids out of Content to handle Object Ids too 2020-03-19 10:27:21 +01:00
0f857c457d Use a defined monadic stack in Pages to resolve the MonadReader ambiguity and allow finishing the reimplementation of the getText demo 2020-03-14 16:57:16 +01:00
40475a3093 Clean unneeded stuff, separating the monadic type constraint from the actual monad stack used, one more step towards MonadFail -> MonadError 2020-03-14 16:55:34 +01:00
a9d3e5d326 Clean unused dependencies from Map + use a more defined Monad for the Box Chunks instance, hoping we will be able to clear the whole stack someday and stop requiring that RoContext type, unboxing and reboxing the FontSet for no good reason 2020-03-14 16:27:56 +01:00
f2a99e1fd2 Reorder module PDF.Body in alphabetical order 2020-03-14 16:25:26 +01:00
5bf2b08fa9 Try replacing general monadic type constraint by a definite monad stack 2020-03-11 22:35:19 +01:00
5b8d951516 WIP: Try just about everything that's possible to try, OrderedMap or [(,)], try to decouple the Box instance for Content from the one for Indexed Text, breaks getText… will probably require some advanced effect library, there seems to be a weird MonadReader conflict in the error messages 2020-03-11 18:55:18 +01:00
d3f1b97f3a Replace the fake instance of Box for Content over Indexed Text with the true one using renderText 2020-03-11 18:53:41 +01:00
c4c3e35e09 Write said instance 2020-03-11 18:52:09 +01:00
10f8c711da Implement set and mapi on OrderedMap for convenience and to write a Box instance over OrderedMap like the one over Map 2020-03-11 18:51:49 +01:00
b6c1f670ef Generalize the search for FlateDecode (there can be several filters in an array) 2020-03-11 10:47:52 +01:00
3b1a5152e4 Try connecting all the Box instances in the getText demo, try to encode pages contents with a simple assoc list 2020-03-10 22:57:11 +01:00
a04adff1d2 Prepare real instance of Box using renderText 2020-03-10 22:55:16 +01:00
103037ffb2 Fix mistake in arity of operator " 2020-03-10 22:53:27 +01:00
dce10ae63a Keep Page as only a reference object keeping the ObjectId explicit so we can modify the actual objects one day, write an OrderedMap data structure to help 2020-03-08 22:18:47 +01:00
f2986da96d Simplify Content, which abstracted over MonadParser for no reason, and instead provide a parse that's in MonadFail to avoid having to handle Either outside 2020-03-08 22:16:23 +01:00
673321bf0a Implement encoder for good 2020-03-08 22:14:36 +01:00
0ade9cc2f5 Implement proper text formatting into PDF instructions using the new encode feature available in Fonts 2020-03-08 00:04:18 +01:00
457f1755e6 Prepare storing the reverse mapping for CMaps, divided by length to be able to implement encoding with a reasonable complexity 2020-03-08 00:02:24 +01:00
ca40d2df76 Don't use the (!?) operator, which doesn't exist before containers 0.5.9, for maximum compatibility 2020-03-08 00:00:24 +01:00
44bc898ed3 Generalize the Indexed type to handle both arbitrary Content instructions and text-related ones that can be viewed as text chunks 2020-03-06 19:21:16 +01:00
1ec47c5d07 Update Font type to cover both encoding and decoding — WIP for CMap, but complete though not tested yet for MacRoman encoding 2020-03-06 19:19:53 +01:00
6e245189fd Add a simple Box instance that exposes IndexedInstructions within a Content 2020-03-05 17:44:38 +01:00
90348c57d6 Disable text rendering and font loading from the Page abstraction, this code will have to be moved into a separate Box instance 2020-03-05 17:40:58 +01:00
50ac0692b2 Implement r for access by PageNumber and clean the mess a bit 2020-03-05 10:09:09 +01:00
2b9abc24b6 Add a separate instance for Raw streams that doesn't try to decode them 2020-03-04 18:31:30 +01:00
309f6ed461 Actually re-implement getText with the simpler Box instance 2020-03-04 18:19:10 +01:00
93c9863426 Remove accidentally committed trailing space on a line 2020-03-04 18:14:54 +01:00
7cef65d799 Fixed vicious bug introduced by 6096a1a237 (since follow is now automatic for references, it's not called explicitly, but it should be in the case of 'several' Content, which is an array of references, each of which should be expanded); TODO: add a unit test for that 2020-03-04 18:14:33 +01:00
d288ecf0ac Start reimplementing getAll as a Box instance and try to separate the various monad run steps 2020-03-03 18:17:44 +01:00