Wednesday, July 27, 2022

[FIXED] Merge many PDF files into one PDF file in a Java web application

July 27, 2022 java, merge, pdf, spring-mvc

Issue

I have many PDF files and I have to merge them all into one big PDF file and render it in the browser. I am using iText. With it, I am able to merge the PDF files into one file on disk, but I cannot merge them for the browser: only the last PDF shows up there. Following is my code. Please help me with this.

Thanks in advance.

            Document document = new Document();
            List<PdfReader> readers = 
                    new ArrayList<PdfReader>();
            int totalPages = 0;

            ServletOutputStream servletOutPutStream = response.getOutputStream();;
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();;

            InputStream is=null;
            List<InputStream> inputPdfList = new ArrayList<InputStream>();
            System.err.println(imageMap.size());

            for(byte[] imageList:imageMap)
            {
                System.out.println(imageList.toString()+"   "+imageList.length);


                 byteArrayOutputStream.write(imageList);

                 byteArrayOutputStream.writeTo(response.getOutputStream());

                 is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray()); 
                 inputPdfList.add(is);

            }
            response.setContentType("application/pdf");
            response.setContentLength(byteArrayOutputStream.size());

            System.out.println(inputPdfList.size()+""+inputPdfList.toString());
            //Create pdf Iterator object using inputPdfList.
            Iterator<InputStream> pdfIterator = 
                    inputPdfList.iterator();

            // Create reader list for the input pdf files.
            while (pdfIterator.hasNext()) {
                    InputStream pdf = pdfIterator.next();
                    PdfReader pdfReader = new PdfReader(pdf);
                    readers.add(pdfReader);
                    totalPages = totalPages + pdfReader.getNumberOfPages();
            }

            // Create writer for the outputStream
            PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());

            //Open document.
            document.open();

            //Contain the pdf data.
            PdfContentByte pageContentByte = writer.getDirectContent();

            PdfImportedPage pdfImportedPage;
            int currentPdfReaderPage = 1;
            Iterator<PdfReader> iteratorPDFReader = readers.iterator();

            // Iterate and process the reader list.
            while (iteratorPDFReader.hasNext()) {
                    PdfReader pdfReader = iteratorPDFReader.next();
                    //Create page and add content.
                    while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
                          document.newPage();
                          pdfImportedPage = writer.getImportedPage(
                                  pdfReader,currentPdfReaderPage);
                          pageContentByte.addTemplate(pdfImportedPage, 0, 0);
                          currentPdfReaderPage++;
                    }
                    currentPdfReaderPage = 1;
            }

            //Close document and outputStream.
            servletOutPutStream.flush();
            outputStream.flush();
            document.close();
            outputStream.close();

            servletOutPutStream.close();
            System.out.println("Pdf files merged successfully.");

Solution

There are numerous errors in your code:

Only write to the response output stream what you want to return to the browser

Your code writes a wild collection of data to the response output stream:

ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
     [...]
     byteArrayOutputStream.writeTo(response.getOutputStream());
     [...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]

servletOutPutStream.flush();
document.close();

servletOutPutStream.close();

As a result, many copies of the imageMap elements are written to the response output stream first, and the merged file is only appended after them.

What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?

Thus, please only write the merged PDF to the response output stream.

Don't write a wrong content length

It is a good idea to write the content length to the response... but only if you use the correct value!

In your code you write a content length:

response.setContentLength(byteArrayOutputStream.size());

but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.

Thus, please do not add false headers to the response.
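One way to follow both points above is to build the complete result in memory and touch the response only once the result is finished. The following is a minimal sketch under assumptions, not the asker's actual setup: `Response` is a hypothetical stand-in for the three `HttpServletResponse` methods used in the question, and `PdfProducer` is a hypothetical callback that writes the finished merged PDF to whatever stream it is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BufferedResponseSketch {

    /** Hypothetical stand-in for the HttpServletResponse methods used in the question. */
    interface Response {
        void setContentType(String type);
        void setContentLength(int length);
        OutputStream getOutputStream() throws IOException;
    }

    /** Hypothetical callback that writes the complete merged PDF to the given stream. */
    interface PdfProducer {
        void writeTo(OutputStream target) throws IOException;
    }

    static void send(PdfProducer producer, Response response) throws IOException {
        // Build the final result in memory; nothing reaches the response yet.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        producer.writeTo(buffer);

        // Only now is the real size known, so the header matches the body.
        response.setContentType("application/pdf");
        response.setContentLength(buffer.size());

        // Write the final bytes to the response exactly once.
        buffer.writeTo(response.getOutputStream());
    }
}
```

Because the buffer holds exactly the bytes that are sent, `buffer.size()` is the correct content length. The price of this design is that the whole merged PDF is kept in memory before it is sent.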

Don't mangle your input data

In the loop

for(byte[] imageList:imageMap)
{
    System.out.println(imageList.toString()+"   "+imageList.length);

    byteArrayOutputStream.write(imageList);

    byteArrayOutputStream.writeTo(response.getOutputStream());

    is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray()); 
    inputPdfList.add(is);
}

you take byte arrays which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...

Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.

Thus, please start or end loops like this with a reset of the byteArrayOutputStream.

(Actually, you don't need that loop at all: PdfReader has a constructor that takes a byte[] directly, so there is no need to wrap it in a byte stream.)
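The growth of the shared buffer can be reproduced without any PDF library. The following is a minimal sketch using only `java.io`; the class and method names are made up for illustration, and the zero-filled byte arrays merely stand in for the source PDFs.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferReuseSketch {

    /** Mimics the loop from the question: one shared buffer that is never reset. */
    static List<Integer> snapshotSizesWithoutReset(List<byte[]> sources) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<>();
        for (byte[] source : sources) {
            shared.write(source, 0, source.length);
            // The snapshot contains everything written so far, not just this source.
            sizes.add(shared.toByteArray().length);
        }
        return sizes;
    }

    /** Same loop, but the buffer is emptied at the start of each iteration. */
    static List<Integer> snapshotSizesWithReset(List<byte[]> sources) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<>();
        for (byte[] source : sources) {
            shared.reset();
            shared.write(source, 0, source.length);
            sizes.add(shared.toByteArray().length);
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<byte[]> sources = Arrays.asList(new byte[10], new byte[20], new byte[30]);
        System.out.println(snapshotSizesWithoutReset(sources)); // prints [10, 30, 60]
        System.out.println(snapshotSizesWithReset(sources));    // prints [10, 20, 30]
    }
}
```

Without the reset, the second and third snapshots contain the earlier sources as well, which matches the concatenation effect described above.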

Don't merge PDFs using a plain PdfWriter, use a PdfCopy

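You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answers on Stack Overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy. For example, that approach imports only the page content, so interactive features such as links and form fields are lost, and the new pages do not automatically take on the size or rotation of the original pages.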

Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.

Don't flush or close streams only because you can

You finalize the response output by closing numerous streams:

//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();

servletOutPutStream.close();

I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close it because you already close that stream via the servletOutPutStream variable.

Thus, please remove unnecessary calls like this.
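The point about redundant calls can be illustrated with plain `java.io` classes. This is a minimal sketch, assuming a hypothetical `CountingStream` that stands in for the response output stream and merely counts `close()` calls: closing the outermost stream flushes it and closes the wrapped stream too, so a single close (here via try-with-resources) is enough.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SingleCloseSketch {

    /** Hypothetical stand-in for the response stream; records how often close() is called. */
    static class CountingStream extends ByteArrayOutputStream {
        int closeCalls = 0;

        @Override
        public void close() throws IOException {
            closeCalls++;
            super.close();
        }
    }

    static CountingStream writeAndClose(byte[] data) throws IOException {
        CountingStream target = new CountingStream();
        // No explicit flush() and no second close(): closing the wrapper
        // flushes its buffer and closes the wrapped stream as well.
        try (OutputStream out = new BufferedOutputStream(target)) {
            out.write(data);
        }
        return target;
    }
}
```

After `writeAndClose` returns, all bytes have arrived in the target and its `close()` has been called once, without any separate flush or close on the underlying stream.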

Answered By - mkl
Answer Checked By - Dawn Plyler (JavaFixing Volunteer)

This answer, collected from Stack Overflow and tested by JavaFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
