
Patrick Maupin created a package he called pdfrw and released it back in 2012. The pdfrw package is a pure-Python library that you can use to read and write PDF files. At the time of writing, pdfrw was at version 0.4. With that version, it supports subsetting, merging, rotating and modifying data in PDFs. The pdfrw package has been used by the rst2pdf package (see chapter 18) since 2010 because pdfrw can "faithfully reproduce vector formats without rasterization". You can also use pdfrw in conjunction with ReportLab to re-use portions of existing PDFs in new PDFs that you create with ReportLab.

In this article, we will learn how to extract certain types of information from a PDF and how to combine the use of pdfrw and ReportLab.

Note: This article is based on my book, ReportLab: PDF Processing with Python. Code can be found on GitHub.

As you might expect, you can install pdfrw using pip. Let's get that done so we can start using pdfrw:

python -m pip install pdfrw

Now that we have pdfrw installed, let's learn how to extract some information from our PDFs.

The pdfrw package does not extract data in quite the same way that PyPDF2 does. If you have used PyPDF2 in the past, then you may recall that PyPDF2 lets you extract a document information object that you can use to pull out information like author, title, etc. While pdfrw does let you get the Info object, it displays it in a less friendly way.
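
To make the point about pdfrw's Info object more concrete, here is a minimal sketch of how you might tidy it up for display. It assumes that `PdfReader(path).Info` gives you a dictionary-like object whose keys look like `/Author` and whose string values are wrapped in parentheses, and that pdfrw's string objects offer a `decode()` method. The helper names `friendly_info` and `read_pdf_info` are hypothetical and are not part of pdfrw.

```python
def friendly_info(info):
    """Return a plain dict of readable strings from a pdfrw-style Info mapping."""
    result = {}
    for key, value in info.items():
        # Keys are PDF names such as '/Author'; drop the leading slash.
        name = str(key).lstrip("/")
        # Assumption: pdfrw string objects provide decode(); plain str does not.
        decode = getattr(value, "decode", None)
        if callable(decode):
            text = decode()
        else:
            text = str(value)
            # Literal PDF strings are shown wrapped in parentheses.
            if text.startswith("(") and text.endswith(")"):
                text = text[1:-1]
        result[name] = text
    return result


def read_pdf_info(path):
    """Read the Info dictionary of the PDF at `path` (hypothetical helper)."""
    # Imported lazily so friendly_info() can be used without pdfrw installed.
    from pdfrw import PdfReader

    pdf = PdfReader(path)
    # Info can be missing, so fall back to an empty mapping.
    return friendly_info(pdf.Info or {})
```

You could then call `read_pdf_info("some_file.pdf")`, where the file name is a placeholder for one of your own PDFs, and print the resulting dictionary. Values that are not PDF strings are simply converted with `str()`, so treat this as a starting point rather than a complete converter.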