Extract images from PDF without resampling, in Python?

You can use the PyMuPDF module. It writes all images as .png files, but it works out of the box and is fast.

    import fitz
    doc = fitz.open("file.pdf")
    for i in range(len(doc)):
        for img in doc.getPageImageList(i):
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            if pix.n < 5:  # this is GRAY or RGB
                pix.writePNG("p%s-%s.png" % (i, …
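As a rough sketch of the "without resampling" part of the question: PyMuPDF also exposes `Document.extract_image(xref)`, which returns the embedded image bytes together with a file extension, so the data can be written to disk without going through a Pixmap. The version below assumes a recent PyMuPDF release, where `getPageImageList` is spelled `get_page_images`. The function names `image_filename` and `extract_embedded_images` and the output naming scheme are illustrative only; they are not part of the library or of the original answer. Formats that PyMuPDF cannot hand back directly may still come out as PNG.

```python
from pathlib import Path


def image_filename(page_index, xref, ext):
    """Build an output name such as 'p0-12.png' (hypothetical naming scheme)."""
    return f"p{page_index}-{xref}.{ext}"


def extract_embedded_images(pdf_path, out_dir="."):
    """Write each embedded image to out_dir and return the written paths."""
    import fitz  # PyMuPDF; imported lazily so image_filename works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    seen = set()  # xrefs already saved (one image can appear on many pages)

    with fitz.open(pdf_path) as doc:
        for page_index in range(len(doc)):
            for img in doc.get_page_images(page_index):
                xref = img[0]
                if xref in seen:
                    continue
                seen.add(xref)
                info = doc.extract_image(xref)  # dict with "image" and "ext"
                target = out / image_filename(page_index, xref, info["ext"])
                target.write_bytes(info["image"])
                written.append(str(target))
    return written
```

Calling `extract_embedded_images("file.pdf", "images")` would write one file per distinct embedded image into an `images` directory, skipping images that repeat across pages.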

Merge PDF files

You can use PyPDF2's PdfMerger class.

File Concatenation

You can simply concatenate files by using the append method.

    from PyPDF2 import PdfMerger

    pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf', 'file4.pdf']

    merger = PdfMerger()
    for pdf in pdfs:
        merger.append(pdf)

    merger.write("result.pdf")
    merger.close()

You can pass file handles instead of file paths if you want.

File Merging

If you want more …
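To illustrate the remark about file handles, here is a minimal sketch that appends open binary handles rather than paths. It assumes a PyPDF2 release that still ships `PdfMerger` (the class is absent from some newer releases of the successor package `pypdf`); the helper name `merge_pdfs` is made up for this example. The input handles are kept open until the output has been written, which is the cautious choice in case the library reads from them lazily.

```python
from contextlib import ExitStack


def merge_pdfs(paths, output_path):
    """Append each PDF in order and write the combined file to output_path."""
    from PyPDF2 import PdfMerger  # assumes a PyPDF2 version providing PdfMerger

    merger = PdfMerger()
    try:
        with ExitStack() as stack:
            for path in paths:
                # Open in binary mode and hand the handle to the merger.
                handle = stack.enter_context(open(path, "rb"))
                merger.append(handle)
            # Write while the input handles are still open.
            with open(output_path, "wb") as out:
                merger.write(out)
    finally:
        merger.close()
    return output_path
```

For example, `merge_pdfs(["file1.pdf", "file2.pdf"], "result.pdf")` would produce the same output as the path-based loop, with the files opened and closed explicitly by the caller's code.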