Tuesday, July 16, 2019

How to Merge PDF Files Using Python Script - Step by Step Guide


Suppose you have a bunch of PDF files and you want to merge them into one PDF. Many websites and software tools can do this.

But most of them come with limitations: for example, you can merge only 5 times, or only a limited number of pages, and you have to pay to do more.

Here is a very simple Python script that you can use to merge PDF files without any limits and without paying anything.

Requirement: Python must be installed on your system. You can download the latest version from the Python website; it is free to use. The script also imports the pyPdf library, so that library must be installed as well.


How to merge pdf files using Python?

Suppose you have two PDF files named "first.pdf" and "second.pdf" in the folder "C:\Source Folder", and you want to merge them into one file named "mergefile.pdf" in the same folder. You can do this very easily with a Python script by following the steps below:

Step 1: Open Notepad on your system.

Step 2: Copy the code below into Notepad:

from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter
import os


def merge(path, output_filename):
    output = PdfFileWriter()
    # Absolute path of the output file, so it can be skipped reliably
    output_path = os.path.abspath(output_filename)

    # sorted() makes the merge order follow the file names
    for pdffile in sorted(glob(path + os.sep + '*.pdf')):
        if os.path.abspath(pdffile) == output_path:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))

    print("Start writing '%s'" % output_filename)
    with open(output_filename, "wb") as f:
        output.write(f)

if __name__ == "__main__":
    parser = ArgumentParser()

    parser.add_argument("-o", "--output",
                        dest="output_filename",
                        default="mergefile.pdf",
                        help="write merged PDF to FILE",
                        metavar="FILE")
    parser.add_argument("-p", "--path",
                        dest="path",
                        default=".",
                        help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.output_filename)




Below is the code again with explanatory comments:


# Import the modules needed to find and merge the PDF files
from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter
import os


# merge() collects the pages of every PDF in "path" into one writer object
def merge(path, output_filename):
    output = PdfFileWriter()
    # Absolute path of the output file, so it can be skipped reliably
    output_path = os.path.abspath(output_filename)

    # Loop over all PDF files in the folder, in file-name order
    for pdffile in sorted(glob(path + os.sep + '*.pdf')):
        # Skip the merged file itself if it is left over from an earlier run
        if os.path.abspath(pdffile) == output_path:
            continue
        print("Parse '%s'" % pdffile)
        document = PdfFileReader(open(pdffile, 'rb'))
        # Copy every page of this file into the output
        for i in range(document.getNumPages()):
            output.addPage(document.getPage(i))

    # Write the merged pages to the output file ("wb" = write in binary mode)
    print("Start writing '%s'" % output_filename)
    with open(output_filename, "wb") as f:
        output.write(f)


if __name__ == "__main__":
    parser = ArgumentParser()

    # -o sets the name of the new file; pass any name you want, like mergepdf.pdf
    parser.add_argument("-o", "--output",
                        dest="output_filename",
                        default="mergefile.pdf",
                        help="write merged PDF to FILE",
                        metavar="FILE")
    # -p sets the folder that holds the source PDF files (default: current folder)
    parser.add_argument("-p", "--path",
                        dest="path",
                        default=".",
                        help="path of source PDF files")

    args = parser.parse_args()
    merge(args.path, args.output_filename)
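
The script above imports pyPdf, an older library from the Python 2 days. As a rough sketch of the same idea for a current Python 3 install, the version below assumes the separately installed pypdf package (pip install pypdf), whose PdfReader and PdfWriter classes take the place of the ones used above. The function names list_source_pdfs and merge_pdfs are made up for this illustration and are not part of the original script.

```python
import os
from glob import glob


def list_source_pdfs(folder, output_filename):
    """Return the PDFs in folder, sorted by name, skipping the output file."""
    output_path = os.path.abspath(output_filename)
    found = []
    for pdffile in sorted(glob(os.path.join(folder, "*.pdf"))):
        # Compare absolute paths so a leftover merged file is never re-merged
        if os.path.abspath(pdffile) != output_path:
            found.append(pdffile)
    return found


def merge_pdfs(folder, output_filename):
    # Assumption: the pypdf package is installed. It is imported here so the
    # file listing above can be tried even without it.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for pdffile in list_source_pdfs(folder, output_filename):
        reader = PdfReader(pdffile)
        # Copy every page of this file into the output
        for page in reader.pages:
            writer.add_page(page)

    with open(output_filename, "wb") as f:
        writer.write(f)
```

For example, calling merge_pdfs(".", "mergefile.pdf") would merge the PDF files in the current folder into mergefile.pdf, in file-name order.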


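Step 3: Save the Notepad file in "Source Folder" under any name ending in .py, like "merger.py". In the Save dialog, choose "All Files" as the file type so that Notepad does not add .txt to the name.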

Now you can run the script: open Command Prompt in "Source Folder" and type python merger.py. It will merge both files into mergefile.pdf within seconds, in the same folder, i.e. "Source Folder".
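
The script also accepts the -o and -p options defined near the end of the code. To show how they are read, here is a small sketch that rebuilds the same parser and feeds it a sample command line. The build_parser name and the file name combined.pdf are for illustration only.

```python
from argparse import ArgumentParser


def build_parser():
    # Same two options as in merger.py
    parser = ArgumentParser()
    parser.add_argument("-o", "--output", dest="output_filename",
                        default="mergefile.pdf",
                        help="write merged PDF to FILE", metavar="FILE")
    parser.add_argument("-p", "--path", dest="path", default=".",
                        help="path of source PDF files")
    return parser


# Equivalent to typing: python merger.py -p "C:\Source Folder" -o combined.pdf
args = build_parser().parse_args(["-p", r"C:\Source Folder", "-o", "combined.pdf"])
print(args.path, "->", args.output_filename)

# With no options the defaults apply: the current folder and mergefile.pdf
defaults = build_parser().parse_args([])
print(defaults.path, "->", defaults.output_filename)
```

So if you leave both options out, the script reads the PDFs from the folder you run it in and writes mergefile.pdf there.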

You can even merge every PDF file in a folder. Just put the script file in that folder and run it.

Note: If you want to edit the script later, right-click on it and then click "Edit with IDLE", or open the Python script in Notepad, make your changes, and save it.



