path: root/Userland/Libraries/LibPDF
Age | Commit message | Author
2023-01-29 | AK: Deprecate the old `AK::Stream` | Tim Schumacher
This also removes a few cases where the respective header wasn't actually required to be included.
2023-01-27 | LibPDF: Remove declarations for non-existent methods | Sam Atkins
2023-01-26 | LibGfx: Remove `try_` prefix from bitmap creation functions | Tim Schumacher
Those don't have any non-try counterpart, so we might as well just omit it.
2023-01-25 | LibPDF: Load Type1C fonts when found | Rodrigo Tobar
Now that our CFF parser is working we can load Type1C fonts in PDF, which are backed by a CFF stream.
2023-01-25 | LibPDF: Add initial CFF parsing | Rodrigo Tobar
The Compact Font Format specification (Adobe's Technical Note #5176) is used by PDF's Type1C fonts to store their data. While similar in spirit to PS1 Type 1 Font Programs, it was designed for a more compact representation and thus a smaller size (at the cost of increased complexity). It also shares most of the charstring encoding logic, which is why the CFF class also inherits from Type1FontProgram.

This initial implementation is still lacking many details, e.g.:
* It doesn't include all the built-in CFF SIDs
* It doesn't support CFF-provided SIDs (defaults those glyphs to the space character)
* More checks in general
2023-01-25 | LibPDF: Add name -> char code conversion in Encoding | Rodrigo Tobar
This is an operation that was already being done (sub-optimally) in PS1FontProgram, so we are replacing that. We will use this during CFF parsing too.
2023-01-25 | LibPDF: Add Reader::try_read for easier error propagation | Rodrigo Tobar
This will allow us to use TRY(reader.try_read) instead of having to verify the result of reader.remaining() before calling reader.read().
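The pattern described above can be sketched in standard C++ roughly as follows. This is a minimal illustration, not LibPDF's actual Reader: the `ByteReader` class, its constructor, and the use of `std::optional` in place of AK's `ErrorOr`/`TRY` are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Hypothetical stand-in for a PDF byte reader; names and layout are assumptions.
class ByteReader {
public:
    ByteReader(const std::uint8_t* data, std::size_t size)
        : m_data(data)
        , m_size(size)
    {
    }

    std::size_t remaining() const { return m_size - m_offset; }

    // Returns the next T, or std::nullopt if fewer than sizeof(T) bytes remain.
    // The bounds check lives here, so callers no longer query remaining() first.
    template<typename T>
    std::optional<T> try_read()
    {
        static_assert(std::is_trivially_copyable_v<T>);
        if (remaining() < sizeof(T))
            return std::nullopt;
        T value;
        std::memcpy(&value, m_data + m_offset, sizeof(T));
        m_offset += sizeof(T);
        return value;
    }

private:
    const std::uint8_t* m_data;
    std::size_t m_size;
    std::size_t m_offset { 0 };
};
```

A failed read leaves the offset untouched in this sketch, so a caller can report the error without having consumed a partial value.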
2023-01-25 | LibPDF: Augment Type1FontProgram with Type2 capabilities | Rodrigo Tobar
The Type1FontProgram logic was based on the Adobe Type 1 Font Format; in particular, it implemented the CharStrings Dictionary section (charstring decoding, and most commands). In the case of Type1, these charstrings are read from a PS1 dictionary, with one entry per character in the font's charset. This has served us well for Type1 font rendering.

When implementing Type1C font rendering, this wasn't enough. Type1C PDF fonts are specified in embedded CFF (Compact Font Format) streams, which also contain a charstring dictionary with an entry for each character in the font's charset. These entries can be slightly different from those in a PS1 Font Program though: depending on a flag in the CFF, the entries will be encoded either in the original charstring format from the Adobe Type 1 Font Format, or in the "Type 2 Charstring Format" (Adobe's Technical Note #5177). This new format is for the most part a superset of the original, with small differences, all in the name of making the representation as compact as possible:
* The glyph's width is not specified via a separate command; instead it's an optional additional argument to the first command of the charstring stream (and even then, it's only the *difference* to a nominal character width specified in the CFF).
* The interpretation of a 4-byte number is different from Type 1: in Type 1 this is a 4-byte integer, whereas in Type 2 it's a fixed-point number with 16 bits of fractional part.
* Many commands accept a variable number of arguments, so they can draw more than one line/curve in a single go. These are all backwards-compatible with Type 1's commands.

All these changes are implemented in this patch in a backwards-compatible way. To ensure the right Type 1/2 behavior is accessed, a new parameter indicates which behavior is desired when decoding the charstring stream.

I also took the chance to centralise some logic that was previously duplicated across the parse_glyph function. Common lambdas capture the logic for moving to, or drawing a line/curve to, a given point and updating the glyph state. Similarly, some command logic, including reading parameters, is shared by several commands. Finally, I've re-organised the cases in the main switch to group together related commands.
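The difference in how the two charstring formats interpret a 4-byte number can be sketched as below. This is an illustration only: `CharstringKind` and `decode_four_byte_operand` are hypothetical names, and the sketch assumes the four bytes are big-endian and two's complement, as the Adobe specifications describe for the bytes following a 255 prefix byte.

```cpp
#include <cstdint>

// Hypothetical selector mirroring the "which behavior is desired" parameter.
enum class CharstringKind {
    Type1,
    Type2,
};

// Interprets the four big-endian bytes that follow a 255 prefix byte.
// Type 1 reads them as a 32-bit integer; Type 2 reads them as a 16.16
// fixed-point number (16 integer bits, 16 fractional bits).
double decode_four_byte_operand(const std::uint8_t bytes[4], CharstringKind kind)
{
    std::uint32_t raw = (std::uint32_t(bytes[0]) << 24)
        | (std::uint32_t(bytes[1]) << 16)
        | (std::uint32_t(bytes[2]) << 8)
        | std::uint32_t(bytes[3]);
    auto value = static_cast<std::int32_t>(raw);
    if (kind == CharstringKind::Type2)
        return value / 65536.0;
    return value;
}
```

For instance, the bytes 00 01 80 00 would decode to 98304 under the Type 1 rule and to 1.5 under the Type 2 rule.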
2023-01-25 | LibPDF: Remove unused member | Rodrigo Tobar
2023-01-25 | LibPDF: Add new Type1FontProgram base class | Rodrigo Tobar
We are planning to add support for CFF fonts to read Type1 fonts, and therefore much of the logic already found in PS1FontProgram will be useful for representing the Type1 fonts read from CFF. This commit moves the PS1-independent bits of PS1FontProgram into a new Type1FontProgram base class that can be used as the base for CFF-based Type1 fonts in the future. The Type1Font class uses this new type now instead of storing a PS1FontProgram pointer.

While doing this refactoring I also took care of making some minor adjustments to the PS1FontProgram API, namely:
* Its create() method is static and returns a NonnullRefPtr<Type1FontProgram>.
* Many (all?) of the parse_* methods are now static.
* Added const where possible.

Notably, the Type1FontProgram also contains at the moment the code that parses the CharString data from the PS1 program. This logic is very similar in CFF files, so after some minor adjustments later on it should be possible to reuse most of it.
2023-01-25 | LibPDF: Avoid reading fields from moved-from Data object | Rodrigo Tobar
This might not be an issue at the moment, but moved-from objects are usually in an unspecified but valid state, meaning that we shouldn't read from them.
2023-01-25 | LibPDF: Record base font name read from document | Rodrigo Tobar
This will be useful for debugging, or if we later on want to show all the fonts found in the document in an organised manner.
2023-01-21 | LibPDF: Use `Core::Stream` to parse the page offset hint table | Tim Schumacher
2023-01-20 | LibGfx: Re-structure the whole initialization pattern for image decoders | Liav A
When trying to figure out the correct implementation, we now have a very strong distinction between plugins that are well suited for sniffing and plugins that need a MIME type to be chosen. Instead of having multiple calls to non-static virtual sniff methods for each Image decoding plugin, we have 2 static methods for each implementation:
1. The sniff method, which in contrast to the old method, gets a ReadonlyBytes parameter and ensures we can figure out the result with zero heap allocations for most implementations.
2. The create method, which just creates a new instance so we don't expose the constructor to everyone anymore.

In addition to that, we have a new virtual method called initialize, which has a per-implementation initialization pattern to actually ensure each implementation can construct a decoder object, and then have a correct context being applied to it for the actual decoding.
2023-01-09 | AK+Everywhere: Rename FlyString to DeprecatedFlyString | Timothy Flynn
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, and B) be able to write a new FlyString class that is tied to String.
2023-01-09 | LibPDF: Propagate errors in PDFFont::create() | Julian Offenhäuser
2023-01-09 | LibPDF: Make glyphs from standard 14 fonts show up in Type1Font | Julian Offenhäuser
Previously, we would assume that all standard 14 fonts use a TrueTypeFont dictionary. Now we render them in Type1Font as well, given that it doesn't contain a PostScript font program.
2023-01-09 | LibPDF: Allow numbers to start with whitespace | Julian Offenhäuser
2023-01-06 | LibPDF: Load destinations from Catalogue -> Names -> Dests name tree | Rodrigo Tobar
PDF allows for named destinations to be provided as strings. These can be found either in the Dests dictionary in the document catalogue (as already implemented), or in the Name Tree specified by the Dests key in the Names dictionary of the document catalogue (missing). This commit adds this missing case. Once the named destination is found in the name tree, its value is interpreted just like in the first case, so a new utility method encapsulates the common behavior.
2023-01-06 | LibPDF: Implement name tree lookups | Rodrigo Tobar
Name Trees are hierarchical, string-keyed, sorted-by-key dictionary structures in PDF where each node (except the root) specifies the bounds of the values it holds, and either its kids (more nodes) or the key/value pairs it contains. This commit implements a series of lookup calls for finding a key in such name trees. This implementation follows the tree as needed on each lookup, but if that becomes inefficient in the long run we can switch to creating a HashMap with all the contents, which as a drawback will require more memory.
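A lookup that follows the tree on demand, as described above, might look roughly like this in standard C++. The `NameTreeNode` struct and `find_in_name_tree` function are hypothetical simplifications for illustration: values are plain integers here, whereas real name trees hold arbitrary PDF objects, and this is not LibPDF's actual code.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of a name tree node.
struct NameTreeNode {
    // Inclusive [least, greatest] key bounds; absent on the root node.
    std::optional<std::pair<std::string, std::string>> limits;
    // Child nodes, for intermediate nodes.
    std::vector<NameTreeNode> kids;
    // Key/value pairs, for leaf nodes (values simplified to int).
    std::vector<std::pair<std::string, int>> names;
};

// Descends only into subtrees whose limits can contain the key.
std::optional<int> find_in_name_tree(const NameTreeNode& node, const std::string& key)
{
    if (node.limits && (key < node.limits->first || key > node.limits->second))
        return std::nullopt;
    for (const auto& [name, value] : node.names) {
        if (name == key)
            return value;
    }
    for (const auto& kid : node.kids) {
        if (auto result = find_in_name_tree(kid, key))
            return result;
    }
    return std::nullopt;
}
```

The trade-off mentioned in the commit message applies here too: each lookup walks the tree again, which avoids the memory cost of flattening everything into a hash map up front.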
2023-01-06 | LibPDF: Add more utility methods to {Dict,Array}Object | Rodrigo Tobar
Since both of them are containers, these classes already offered a set of methods to retrieve an inner element by key or index, respectively, with different methods for the different subtypes of the PDF::Object type returning the element cast to the correct type pointer. On top of that, DictObject offered an additional method to obtain an element as an Object pointer.

While these methods were useful, they have some shortcomings:
* They always take a Document pointer to first perform an object resolution, in case the element is a Reference. This is not always necessary though, as there are values that are always meant to be immediate, and hence the resolution lookup adds overhead.
* There was no easy way to get an individual Object element from an ArrayObject like there is in DictObject. This makes it difficult to obtain such values, as one first needs to call dict.get() to get a Value, then cast it manually to a NonnullRefPtr<Object>.

This commit fixes these two issues by:
* Adding a new method that returns an Object for a given index.
* Adding overloads for this new method, and all the existing methods described above, that do *not* take a Document, and therefore do *not* perform an object resolution lookup.
2023-01-06 | LibPDF: Move casting code to its own cast_to function | Rodrigo Tobar
This functionality was previously part of the resolve_to() Document method, and thus available only when resolving objects through the Document class. There are many use cases where this casting can be used, but no resolution is needed. This commit moves this functionality into a new cast_to function, and makes the resolve_to function call it internally. With this new function in place we can now offer new versions of DictObject::get_* and ArrayObject::get_*_at that don't perform Document resolution when it is not required.
2023-01-06 | LibPDF: Support null destination parameters | Rodrigo Tobar
Destination arrays contain a page number, a mode name, and parameters specific to that mode. In many cases these parameters can be set to "null", which our code wasn't taking into consideration. This commit parses these parameters taking into account whether they are null or actual numbers, and stores them as Optional<float> instead of plain floats. The parameters are not yet used anywhere else other than when formatting a Destination object, so the change is fairly small.
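The idea of storing a parameter that may be "null" as an optional value can be sketched as follows. The function name and the string-token input are assumptions for illustration; LibPDF works on parsed PDF values and uses AK's Optional rather than `std::optional`.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: maps a destination parameter token to an optional float.
// "null" means the parameter is absent; anything else is read as a number.
std::optional<float> parse_destination_parameter(const std::string& token)
{
    if (token == "null")
        return std::nullopt;
    return std::stof(token);
}
```

Callers can then distinguish "not specified" from an explicit value such as 0, which a plain float could not express.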
2023-01-06 | LibPDF: Fix Destination formatting | Rodrigo Tobar
This was not correctly written, and thus printed confusing output.
2023-01-05 | LibGfx+LibPDF: Apply subpixel offset in affine transformation | MacDue
2023-01-05 | LibPDF: Use subpixel accurate text rendering | MacDue
This just enables the new tricks from LibGfx with the same nice improvements :^)
2023-01-04 | LibPDF: Fix calculation of encryption key | Simon Danner
Before this patch, the generation of the encryption key was not working correctly because the lifetime of the underlying data was too short; the same inputs would give random encryption keys. Fixes #16668
2023-01-02 | Everywhere: Remove unused includes of AK/StdLibExtras.h | Ben Wiederhake
These instances were detected by searching for files that include AK/StdLibExtras.h, but don't match the regex:

\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|forward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawPtr|round_up_to_power_of_two|swap|to_underlying)\\b

This regex is pessimistic, so there might be more files that don't actually use any "extra stdlib" functions. In theory, one might use LibCPP to detect things like this automatically, but let's do this one step after another.
2023-01-02 | Everywhere: Fix badly-formatted includes | Ben Wiederhake
In 7c5e30daaa615ad3a2ef55222423a747ac0a1227, the focus was "only" on Userland/Libraries/, whereas this commit cleans up the remaining headers in the repo, and any new badly-formatted include.
2022-12-21 | LibGfx: Rename TTF/TrueType to OpenType | Andreas Kling
OpenType is the backwards-compatible successor to TrueType, and the format we're actually parsing in LibGfx. So let's call it that.
2022-12-20 | LibPDF: Reset encryption key on failed user password attempt | Rodrigo Tobar
When an attempt is made to provide the user password to a SecurityHandler a user gets back a boolean result indicating success or failure on the attempt. However, the SecurityHandler is left in a state where it thinks it has a user password, regardless of the outcome of the attempt. This confuses the rest of the system, which continues as if the provided password is correct, resulting in garbled content. This commit fixes the situation by resetting the internal fields holding the encryption key (which is used to determine whether a user password has been successfully provided) in case of a failed attempt.
2022-12-20 | LibPDF: Treat Encryption's Length item as optional | Rodrigo Tobar
With the StandardSecurityHandler the Length item in the Encryption dictionary is optional, and needs to be given only if the encryption algorithm (V) is other than 1; otherwise we can assume a length of 40 bits for the encryption key.
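The defaulting rule described above can be sketched as a small lookup with a fallback. The dictionary is modeled as a `std::map` from key name to integer purely for illustration, and `encryption_key_length_bits` is a hypothetical name, not LibPDF's API.

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns the encryption key length in bits.
// If the dictionary has no Length entry, fall back to 40 bits.
int encryption_key_length_bits(const std::map<std::string, int>& encryption_dict)
{
    auto it = encryption_dict.find("Length");
    if (it != encryption_dict.end())
        return it->second;
    return 40;
}
```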
2022-12-17 | LibPDF: Store page number, not Value, in OutlineItem | Rodrigo Tobar
The Value previously stored corresponded to a Reference to a Page object in the PDF document. This isn't useful information, since what we want to display at the end of the day is the page an outline item refers to. This commit changes the page member on OutlineItem to be an Optional<u32> (some destinations don't necessarily refer to a Page), which we resolve while building OutlineItems.
2022-12-17 | LibPDF: Keep track of OutlineItem parents | Rodrigo Tobar
While OutlineItem had a parent field, it was never populated nor used. This commit populates it when possible (no parent means the OutlineItem is a top-level item).
2022-12-16 | LibPDF: Don't abort on unsupported drawing operations | Rodrigo Tobar
Instead of calling TODO(), which will abort the program, we now return an Error specifying that we haven't implemented the drawing operation yet. This will now nicely trickle up all the way through to the PDFViewer, which will then notify its clients about the problem.
2022-12-16 | LibPDF: Switch to best-effort PDF rendering | Rodrigo Tobar
The current rendering routine aborts as soon as an error is found during rendering, which potentially severely limits the contents we show on screen. Moreover, whenever an error happens the PDFViewer widget shows an error dialog, and doesn't display the bitmap that has been painted so far. This commit improves the situation on both fronts by implementing rendering with a best-effort approach. Firstly, execution of operations isn't halted after an operation results in an error; instead, execution of all operations is always attempted, and all collected errors are returned in bulk. Secondly, PDFViewer now always displays the resulting bitmap, regardless of whether errors were produced. To communicate errors, an on_render_errors callback has been added so clients can subscribe to these events and handle them as appropriate.
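The best-effort loop described above can be sketched in standard C++ as follows. This is only an illustration of the control flow: the `Errors` struct here is a hypothetical stand-in, operations are modeled as callables returning an error message (empty on success), and none of the names are LibPDF's real API.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical accumulator for the errors seen during one rendering pass.
struct Errors {
    std::vector<std::string> messages;

    void add(std::string message) { messages.push_back(std::move(message)); }
    bool is_empty() const { return messages.empty(); }
};

// An operation returns an empty string on success, or an error message.
using Operation = std::function<std::string()>;

// Attempts every operation; failures are recorded instead of aborting the loop.
Errors render_best_effort(const std::vector<Operation>& operations)
{
    Errors errors;
    for (const auto& operation : operations) {
        std::string message = operation();
        if (!message.empty())
            errors.add(std::move(message));
    }
    return errors;
}
```

A caller would display whatever was painted and, if the returned collection is not empty, hand it to an error callback.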
2022-12-16 | LibPDF: Add Errors class that accumulates multiple errors | Rodrigo Tobar
This will be used to perform a best-effort rendering, where an error in rendering won't abort the whole rendering operation, but instead will be stored for later reference while rendering continues.
2022-12-16 | LibPDF: Add support for multi-line comments | Rodrigo Tobar
The code parsing comments parsed only a single line of comments, but callers assumed they parsed all comments that appeared contiguously in a block. The latter is an easier to understand API, so this commit changes the parse_comment function to parse entire blocks of comments instead of single lines.
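Parsing a whole block of contiguous comment lines in one call could look roughly like this. The sketch assumes comments start with `%` at the current offset and that lines end with `\n` only; real PDF files also allow other end-of-line markers, and `parse_comment_block` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Consumes every consecutive comment line starting at `offset` and returns
// them joined with newlines. `offset` is advanced past the whole block.
std::string parse_comment_block(const std::string& input, std::size_t& offset)
{
    std::string comments;
    while (offset < input.size() && input[offset] == '%') {
        std::size_t end = input.find('\n', offset);
        if (end == std::string::npos)
            end = input.size();
        comments.append(input, offset, end - offset);
        comments.push_back('\n');
        // Skip the newline itself, unless the input ended without one.
        offset = (end < input.size()) ? end + 1 : end;
    }
    return comments;
}
```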
2022-12-16 | LibPDF: Follow a FontFile's Length values | Rodrigo Tobar
These can be references (at least from what I've found in some documents), so we want to resolve them before using them.
2022-12-16 | LibPDF: Simplify outline construction | Rodrigo Tobar
While the Outline Items making up the document's Outline have all sorts of cross-references (parent, first/last child, next/previous sibling, etc), not all documents out there have fully-consistent references. Our implementation already discarded some of that information too (e.g., /Parent and /Prev were never read), and trusted that /First and /Next were good enough to traverse the whole hierarchy. Where the current implementation failed was in assuming that /Last was also a good source of information. There are documents out there where /Last also points to dead ends, and these were therefore causing a crash when we verified that the last child found on a chain was the /Last child declared by the parent. To fix this I'm simply removing the check, and simplifying the function call to remove any references to /Last. This way we affirm our commitment to /First and /Next as the main sources of information.
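Traversing the outline using only /First and /Next, as described above, can be sketched as follows. Both structs are hypothetical simplifications: raw pointers stand in for PDF object references, and the real OutlineItem carries more fields.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an outline dictionary as found in the document.
struct OutlineDictSketch {
    std::string title;
    const OutlineDictSketch* first { nullptr }; // /First: first child, if any
    const OutlineDictSketch* next { nullptr };  // /Next: next sibling, if any
};

// Hypothetical model of the in-memory outline item that gets built.
struct OutlineItemSketch {
    std::string title;
    std::vector<OutlineItemSketch> children;
};

// Walks the sibling chain via /Next and recurses into children via /First.
// /Last is never consulted, so an inconsistent /Last cannot cause a failure.
std::vector<OutlineItemSketch> build_outline_items(const OutlineDictSketch* first)
{
    std::vector<OutlineItemSketch> items;
    for (const auto* dict = first; dict != nullptr; dict = dict->next)
        items.push_back({ dict->title, build_outline_items(dict->first) });
    return items;
}
```

Note that this sketch does not guard against cyclic /Next chains; a robust implementation would likely need to.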
2022-12-16 | LibPDF: Ignore seac PS1 commands for now | Rodrigo Tobar
This command is meant to print a Standard Encoding Accented Character. It's not critical to implement it yet, but if we want to render more documents we need to handle the instruction, even if we simply ignore it.
2022-12-14 | Everywhere: Stop shoving things into ::std and mentioning them as such | Ali Mohammad Pur
Note that this still keeps the old behaviour of putting things in std by default on serenity so the tools can be happy, but if USING_AK_GLOBALLY is unset, AK behaves like a good citizen and doesn't try to put things in the ::std namespace. std::nothrow_t and its friends get to stay because I'm being told that compilers assume things about them and I can't yeet them into a different namespace...for now.
2022-12-10 | LibPDF: Add initial image display support | Rodrigo Tobar
After adding support for XObject Form rendering, the next step was to display XObject images. This commit adds this initial support. Images come in many shapes and forms: encodings, color spaces, bits per component, width, height, etc. This initial support is constrained to the color spaces we currently support, to images that use 8 bits per component, to images that do *not* use the JPXDecode filter, and to images that are not Masks. There are surely other constraints that aren't considered in this initial support, so expect breakage here and there. In addition to supporting images, we also support applying an alpha mask (SMask) on them. Additionally, a new rendering preference allows skipping image loading and rendering altogether, instead showing an empty rectangle as a placeholder (useful for when actual images are not supported). Since RenderingPreferences is becoming a bit more complex, we add a hash option that will allow us to keep track of different preferences (e.g., in a HashMap).
2022-12-10 | LibPDF: Add first interpolation methods | Rodrigo Tobar
Interpolation is needed in more than one place, and I couldn't find a central place where I could borrow a readily available interpolation routine, so I've implemented the first simple interpolation object. More will follow for more complex scenarios.
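A simple one-dimensional linear interpolation object of the kind mentioned above might look like this. The class name and constructor parameters are assumptions for illustration and are not taken from the commit.

```cpp
// Hypothetical 1D linear interpolator: maps x in [x_min, x_max] to
// y in [y_min, y_max] along a straight line. Assumes x_min != x_max.
class LinearInterpolation1DSketch {
public:
    LinearInterpolation1DSketch(float x_min, float x_max, float y_min, float y_max)
        : m_x_min(x_min)
        , m_y_min(y_min)
        , m_slope((y_max - y_min) / (x_max - x_min))
    {
    }

    float interpolate(float x) const { return m_y_min + (x - m_x_min) * m_slope; }

private:
    float m_x_min;
    float m_y_min;
    float m_slope;
};
```

Precomputing the slope in the constructor keeps each `interpolate` call to one subtraction, one multiplication and one addition, which matters when it runs once per image sample.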
2022-12-10 | LibPDF: Add facility to obtain Vector<float> from ArrayObject | Rodrigo Tobar
Arrays of float numbers are common in many PDF objects, and thus to avoid code repetition I'm introducing a new method to ArrayObject that will return exactly that.
2022-12-10 | LibPDF: Add new Error::Type for unsupported rendering features | Rodrigo Tobar
2022-12-10 | LibPDF: Add more knowledge to ColorSpaces classes | Rodrigo Tobar
ColorSpaces now can tell users how many components they expect, and the default decode array that should be used when converting unit bit sequences into color space component input values during image rendering.
2022-12-10 | LibPDF: Refactor parsing of ColorSpaces | Rodrigo Tobar
ColorSpaces can be specified in two ways: within a stream as operands of the color space operations (CS/cs), or as a separate PDF object, which is then referred to by other means (e.g., from Image XObjects and other entities). These two modes of addressing a ColorSpace are slightly different and need to be handled separately. However, the current implementation embedded the full logic of the first case in the routine that created ColorSpace objects. This commit refactors the creation of ColorSpace to support both cases. First, a new ColorSpaceFamily class encapsulates the static aspects of a family, like its name or whether color space construction never requires parameters. Then we define the supported ColorSpaceFamily objects. On top of this also sits a split in how ColorSpaces are created. Two methods are now offered: one only providing construction of no-argument color spaces (and thus taking a simple name), and another taking an ArrayObject, hence used to create ColorSpaces requiring arguments. Finally, on top of *that* two ways to get a color space in the Renderer are made available: the first creates a ColorSpace with a name and a Resources dictionary, and the second takes an Object. These model the two addressing modes described above.
2022-12-10 | LibPDF: Return results directly and avoid unpacking+packing | Rodrigo Tobar
2022-12-08 | LibPDF: Add missing character quirk for WinAnsiEncoding fonts | Andreas Kling
Fonts with the encoding name "WinAnsiEncoding" should render missing characters above character code 040 (octal) as a "bullet" character. This patch adds Encoding::should_map_to_bullet(char_code) which is then called by char_code_to_code_point() to check if the given char code should be displayed as a bullet instead. I didn't have a good way to test this, so I've only verified that it works by manually overriding inputs to the function during the rendering stage. This takes care of a FIXME in the Annex D part of the PDF specification.
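The quirk described above can be sketched as follows. The function names `should_map_to_bullet` and `char_code_to_code_point` come from the commit message, but everything else is an assumption: the struct, the `std::map` table, and the choice to return the char code unchanged for unmapped codes at or below 040 are all hypothetical.

```cpp
#include <cstdint>
#include <map>

// U+2022 BULLET.
constexpr char32_t bullet_code_point = 0x2022;

// Hypothetical encoding table; the layout is an assumption for illustration.
struct WinAnsiLikeEncodingSketch {
    std::map<std::uint8_t, char32_t> table;

    // Missing characters above char code 040 (octal) are shown as a bullet.
    bool should_map_to_bullet(std::uint8_t char_code) const
    {
        return char_code > 040 && table.find(char_code) == table.end();
    }

    char32_t char_code_to_code_point(std::uint8_t char_code) const
    {
        if (should_map_to_bullet(char_code))
            return bullet_code_point;
        auto it = table.find(char_code);
        // Assumption: unmapped codes at or below 040 pass through unchanged.
        return it != table.end() ? it->second : char32_t(char_code);
    }
};
```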