Wednesday, August 31, 2022

[FIXED] Problem with processing word document java

August 31, 2022 apache-poi, java, ms-word

Issue

I need to replace some fields in a Word document in Java. I am using the Apache POI library, and this is the code I use to replace the placeholders:

for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null) {
                System.out.println(text);
                if (text.contains("[Title]")) {
                    text = text.replace("[Title]", wordBody.getTitle()); // your content
                    r.setText(text, 0);
                }
                if (text.contains("[Ref_no]")) {
                    text = text.replace("[Ref_no]", wordBody.getRefNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[In_date]")) {
                    text = text.replace("[In_date]", wordBody.getDate());
                    r.setText(text, 0);
                }
                if (text.contains("[FirstName]")) {
                    text = text.replace("[FirstName]", wordBody.getFirstName());
                    r.setText(text, 0);
                }
                if (text.contains("[MiddleName]")) {
                    text = text.replace("[MiddleName]", wordBody.getMiddleName());
                    r.setText(text, 0);
                }
                if (text.contains("[Vehicle_Type]")) {
                    text = text.replace("[Vehicle_Type]", wordBody.getVehicleType());
                    r.setText(text, 0);
                }
                if (text.contains("[Reg_No]")) {
                    text = text.replace("[Reg_No]", wordBody.getRegNumber());
                    r.setText(text, 0);
                }
                if (text.contains("[Location]")) {
                    text = text.replace("[Location]", wordBody.getLocation());
                    r.setText(text, 0);
                }
                if (text.contains("[Issuer_Name]")) {
                    text = text.replace("[Issuer_Name]", wordBody.getLocation());
                    r.setText(text, 0);
                }
            }
        }
    }
}

Then I noticed that not all placeholders were replaced, and I didn't know how to fix it. So I printed all the text I get from the runs and got something like this:

This is to certify that [Title] [FirstName] [
MiddleName
] [Surname] has purchased [
Vehicle_Type
] 
having registration [
Reg_No
] from our [Location] Showroom.
Issued By,
[
Issuer

So I need to replace the fields in [] brackets. Some of them, such as [Surname], are printed correctly, but others, such as [MiddleName], are split across lines, and I think that is why it is not working.

This is my Word text (screenshot).

I am parsing a .docx file. Thank you.

Solution

If you look at your screenshot, you will see red wavy lines under MiddleName, Vehicle_Type and Reg_No. That means Word has detected a possible spelling problem there. This is also stored in the file, and that is why the texts [MiddleName], [Vehicle_Type] and [Reg_No] are not together with their surrounding brackets in one text run. The brackets are in their own text runs, and the text marked as a possible spelling problem is in a separate run as well.
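To make the effect concrete, here is a minimal sketch in plain Java (no POI involved). Each string stands for the text of one run, and the run boundaries are an assumption modeled on the printed output in the question. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model: each String stands for the text of one run (no POI involved).
class RunSplitDemo {

    // Mirrors the run-by-run check from the question.
    static boolean anyRunContains(List<String> runs, String placeholder) {
        for (String run : runs) {
            if (run.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    // Looks at the paragraph text as a whole instead.
    static boolean paragraphContains(List<String> runs, String placeholder) {
        return String.join("", runs).contains(placeholder);
    }

    public static void main(String[] args) {
        // Hypothetical run boundaries, modeled on the printed output in the question.
        List<String> runs = Arrays.asList("[Title] [FirstName] [", "MiddleName", "] [Surname]");
        for (String p : new String[] {"[Surname]", "[MiddleName]"}) {
            System.out.println(p + " in a single run: " + anyRunContains(runs, p)
                    + ", in the paragraph: " + paragraphContains(runs, p));
        }
    }
}
```

In this model [Surname] sits inside one run and is found, while [MiddleName] only exists in the concatenated paragraph text, so a per-run contains() check never sees it.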

This is a well-known problem, and some libraries already try to solve it by detecting the text variables in a more complex way than only searching for them within single text runs. templ4docx is one example.
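One way such a detection could work is sketched below in plain Java. This is not how templ4docx is implemented, only an illustration of the general idea under simplifying assumptions: runs are modeled as a mutable list of strings, and replaceAcrossRuns is a hypothetical helper. The placeholder is located in the concatenated text, the replacement is written into the first run it touches, and the placeholder's remaining pieces are cut out of the following runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model: each String stands for the text of one run (no POI involved).
class CrossRunReplace {

    // Replaces the first occurrence of placeholder, even if it spans several runs.
    // Returns true if a replacement was made.
    static boolean replaceAcrossRuns(List<String> runs, String placeholder, String replacement) {
        String joined = String.join("", runs);
        int start = joined.indexOf(placeholder);
        if (start < 0) {
            return false;
        }
        int end = start + placeholder.length(); // exclusive

        int offset = 0;          // position of the current run's first character in joined
        boolean written = false; // replacement goes into the first overlapping run only
        for (int i = 0; i < runs.size(); i++) {
            String run = runs.get(i);
            int runStart = offset;
            int runEnd = offset + run.length();
            offset = runEnd;
            if (runEnd <= start || runStart >= end) {
                continue; // this run does not overlap the placeholder
            }
            // Keep the parts of the run that lie outside the placeholder.
            String before = run.substring(0, Math.max(0, start - runStart));
            String after = run.substring(Math.min(run.length(), end - runStart));
            if (!written) {
                runs.set(i, before + replacement + after);
                written = true;
            } else {
                runs.set(i, after);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("Hello [", "MiddleName", "] world"));
        replaceAcrossRuns(runs, "[MiddleName]", "Lyn");
        System.out.println(String.join("", runs));
    }
}
```

With real XWPFRun objects the same bookkeeping would apply to the run texts, but formatting differences between the affected runs would then need a decision of their own, which is one reason the form field approach below is simpler.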

But I prefer another way. Word has long provided text form fields. See Working with Form Fields. Note that the legacy form fields are meant, not the ActiveX ones.

See Replace text templates inside .docx (Apache POI, Docx4j or other) for an example.

Modified example for your case:

WordTemplate.docx:

All gray fields are legacy text form fields inserted from the Developer tab. In their Text Form Field Options, the Bookmark: names are Text1, Text2, ... and the default texts are set as needed.

Code:

import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.SimpleValue;
import javax.xml.namespace.QName;

public class WordReplaceTextInFormFields {

 private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
  boolean foundformfield = false;
  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    XmlCursor cursor = run.getCTR().newCursor();
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
    while(cursor.hasNextSelection()) {
     cursor.toNextSelection();
     XmlObject obj = cursor.getObject();
     if ("begin".equals(((SimpleValue)obj).getStringValue())) {
      cursor.toParent();
      obj = cursor.getObject();
      XmlObject[] names = obj.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val");
      // Fields without form field data (for example a PAGE field) have no w:name; skip them instead of failing on names[0].
      foundformfield = names.length > 0 && ffname.equals(((SimpleValue)names[0]).getStringValue());
     } else if ("end".equals(((SimpleValue)obj).getStringValue())) {
      if (foundformfield) return;
      foundformfield = false;
     }
    }
    if (foundformfield && run.getCTR().getTList().size() > 0) {
     run.getCTR().getTList().get(0).setStringValue(text);
     foundformfield = false;
//System.out.println(run.getCTR());
    }
   }
  }
 }

 public static void main(String[] args) throws Exception {

  // try-with-resources closes the streams and the document even if writing fails
  try (FileInputStream in = new FileInputStream("WordTemplate.docx");
       XWPFDocument document = new XWPFDocument(in);
       FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {

   replaceFormFieldText(document, "Text1", "Mrs.");
   replaceFormFieldText(document, "Text2", "Janis");
   replaceFormFieldText(document, "Text3", "Lyn");
   replaceFormFieldText(document, "Text4", "Joplin");
   replaceFormFieldText(document, "Text5", "Mercedes Benz");
   replaceFormFieldText(document, "Text6", "1234-56-789");
   replaceFormFieldText(document, "Text7", "Stuttgart");

   document.write(out);
  }
 }
}

This code was tested with Apache POI 4.1.0 and needs the full jar of all of the schemas, ooxml-schemas-1.4.jar, as mentioned in FAQ-N10025.
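The loop in replaceFormFieldText can be read as a small state machine over the runs of a form field: a run with a "begin" field character carrying the field name, then the field instruction, a "separate" field character, the visible text, and finally an "end" field character. The following plain-Java sketch models only that control flow. The Kind enum, the Run class and the sample paragraph are hypothetical stand-ins for the XML, not POI types.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the run sequence of legacy text form fields (no POI involved).
class FormFieldStateMachine {

    enum Kind { BEGIN, INSTR, SEPARATE, TEXT, END }

    static final class Run {
        final Kind kind;
        String value; // field name for BEGIN, visible text for TEXT, otherwise null

        Run(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Sets the first text run after the matching BEGIN; returns true if one was found.
    static boolean fill(List<Run> runs, String fieldName, String text) {
        boolean inField = false;
        for (Run run : runs) {
            switch (run.kind) {
                case BEGIN:
                    inField = fieldName.equals(run.value);
                    break;
                case END:
                    inField = false;
                    break;
                case TEXT:
                    if (inField) {
                        run.value = text;
                        return true;
                    }
                    break;
                default:
                    break; // INSTR and SEPARATE carry no visible text
            }
        }
        return false;
    }

    // Concatenates the visible text of all TEXT runs.
    static String visibleText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            if (run.kind == Kind.TEXT) {
                sb.append(run.value);
            }
        }
        return sb.toString();
    }

    // Hypothetical paragraph: "Dear " + field Text1 + " " + field Text2.
    static List<Run> sampleParagraph() {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run(Kind.TEXT, "Dear "));
        addField(runs, "Text1", "Title");
        runs.add(new Run(Kind.TEXT, " "));
        addField(runs, "Text2", "FirstName");
        return runs;
    }

    private static void addField(List<Run> runs, String name, String defaultText) {
        runs.add(new Run(Kind.BEGIN, name));
        runs.add(new Run(Kind.INSTR, null));
        runs.add(new Run(Kind.SEPARATE, null));
        runs.add(new Run(Kind.TEXT, defaultText));
        runs.add(new Run(Kind.END, null));
    }
}
```

Like the POI code above, this sketch only rewrites the first text run inside the field. If a field's visible text were spread over several runs, the remaining runs would keep their old text.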

Result:

Note that the gray background of the text fields is only visible in the GUI. It will not be printed by default.

Advantages:

The form field content can only be formatted as a whole, so it will never be torn apart.

The document can be protected so only filling the form fields is possible. Then the template is usable as a form in Word GUI too.

Answered By - Axel Richter
Answer Checked By - Candace Johnson (JavaFixing Volunteer)

This Answer collected from stackoverflow and tested by JavaFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
