Any developer working with large, network-bound documents should consider using linearization. We’ve found that linearization enables opening of large PDFs in 7 seconds on average when using a 4G connection. And while open time extends when a document has a very large and complex first page, most documents are shown to benefit from linearization so long as they have at least a few pages.

Linearized vs non-linearized documents opening online on an Android device via a 4G network

Linearization therefore delivers a much faster online experience overall. And it provides several other advantages when working with remote, online documents:

Linearization makes the viewing experience more resilient to network interruptions. A network interruption during a large document download, for example, might require that the user restart; at the very least, it can significantly delay the first page view.

It improves reliability where there is limited memory/storage, where it would be difficult to cache downloaded data locally (for example, when working in a browser and especially, in a mobile browser).

Some viewers such as our PDFTron SDK can be configured to download only those pages viewed by the user. This is critical when serving very large documents (1GB+) to mobile devices with limited or costly data plans, and beneficial even when serving smaller documents of 20MB+.

How Linearization Works - Fast Random Access via On-demand Streaming of Pages

Linearization, introduced with PDF 1.2, has a 20+ page appendix dedicated to it in the core PDF reference. But if you prefer a faster explanation, read on.

Linearization works by changing a PDF file’s internal structure in a way that enables fast on-demand streaming of partial content. Put simply, each PDF is an object tree, starting with a root node, and ascending from there. Pages can reference other objects hanging from that tree by object number. In the case of non-linearized PDFs, these objects, such as an embedded font, are often scattered across the file. And with no quick method to identify and grab a given page’s resources, a conventional viewer will need to download the entire document before it can open.

In contrast, linearized PDFs are reorganized so that page resources are grouped together logically according to document page order (hence the term “linearization”). A Linearization Dictionary and “Hint tables” are also added to the top of the document. These act as an inventory specifying the location of objects needed to render any given page, essentially enabling random online access to pages.

A system that uses linearization usually converts documents to linearized PDF upon upload. A viewer designed to handle linearized content can then request linearized PDF content from the web server via a URL. This information is then served as sequential content “chunks” of PDF binary. If the viewer detects linearization, it will stop the download after receiving the hint tables and first page. Remaining content chunks are then prioritized based on how the user navigates. For example: if the user skips ahead to page 475 in a 1000-page document, the viewer can request resources for page 475 and surrounding pages, and these will download first. The remainder of the document will then progressively download and render as the user session continues. And obsolete pages can be easily cleared from memory when required.

DocPub CLI and PageMaster CLI are easier-to-use, manual solutions. Both DocPub and PageMaster can perform batch conversion, and each leverages the same advanced PDF conversion engine as the API, including components that can be integrated into any app.

If you’ve never used a command line interface before, we recommend that you first read or watch a quick beginner’s guide.
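
To make the page-prioritization behavior described under “How Linearization Works” more concrete, here is a minimal Python sketch of how a viewer might decide what to fetch first. It is an illustration under stated assumptions, not how PDFTron SDK or any particular viewer is implemented: the names `ByteRange`, `build_toy_hint_table`, and `fetch_order` are hypothetical, the “hint table” is reduced to a simple page-to-byte-range mapping with equal-sized pages, and the two-page neighbor window is an arbitrary choice. Real hint tables are compact, bit-packed structures that also describe objects shared between pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Inclusive byte offsets of one page's resources (hypothetical type)."""
    start: int
    end: int

    def http_header(self) -> str:
        # Value for an HTTP Range request header, e.g. "bytes=0-1023".
        return f"bytes={self.start}-{self.end}"


def build_toy_hint_table(page_count: int, first_offset: int = 1024,
                         page_size: int = 4096) -> dict[int, ByteRange]:
    """Stand-in for a parsed hint table: maps 1-based page numbers to ranges.

    Assumes every page occupies exactly `page_size` bytes, which real
    documents do not; it only keeps the example self-contained.
    """
    return {
        page: ByteRange(first_offset + (page - 1) * page_size,
                        first_offset + page * page_size - 1)
        for page in range(1, page_count + 1)
    }


def fetch_order(requested_page: int, page_count: int,
                window: int = 2) -> list[int]:
    """Return page numbers in download priority order.

    The requested page comes first, then its neighbors by distance,
    then all remaining pages in document order.
    """
    if not 1 <= requested_page <= page_count:
        raise ValueError("requested_page is outside the document")
    nearby = [p for p in range(requested_page - window,
                               requested_page + window + 1)
              if 1 <= p <= page_count]
    nearby.sort(key=lambda p: (abs(p - requested_page), p))
    seen = set(nearby)
    rest = [p for p in range(1, page_count + 1) if p not in seen]
    return nearby + rest


if __name__ == "__main__":
    hints = build_toy_hint_table(page_count=1000)
    for page in fetch_order(475, page_count=1000)[:5]:
        print(page, hints[page].http_header())
```

With the user jumping to page 475 of a 1000-page document, the first five requests in this sketch cover pages 475, 474, 476, 473, and 477; the rest of the document follows in page order, mirroring the progressive download described above.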