Convert Pdf to Word in Java Example
Required Jars
- itextpdf-5.4.4
- xmlbeans-xpath-2.3.0
- xmlbeans-2.6.0
- poi-3.9
- dom4j-1.6.1
- poi-ooxml-schemas-3.7
- poi-ooxml-3.7
Java Program to Convert PDF to Word
package com.ngdeveloper;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class ConvertPdf2Word {

    public static void main(String[] args) throws IOException {
        System.out.println("Document conversion started");

        // Target Word document and source PDF
        XWPFDocument doc = new XWPFDocument();
        String pdf = "D:\\javadomain.pdf";
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);

        // PDF pages are numbered from 1
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Extract the plain text of the page (images and table layout are not carried over)
            TextExtractionStrategy strategy =
                    parser.processContent(i, new SimpleTextExtractionStrategy());
            String text = strategy.getResultantText();

            // Write the page text into one paragraph, followed by a page break
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            run.addBreak(BreakType.PAGE);
        }

        // Save the Word document and release the PDF reader
        FileOutputStream out = new FileOutputStream("D:\\javadomain.docx");
        doc.write(out);
        out.close();
        reader.close();

        System.out.println("Document converted successfully");
    }
}
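The program above puts the whole text of a page into a single run. Assuming that newline characters inside one run are not rendered as line breaks by Word (worth checking against your POI version), one option is to split the extracted text into lines first and write each line as its own paragraph. The sketch below shows only the splitting step, using the plain JDK; PageTextSplitter and toLines are hypothetical names introduced here for illustration, not part of iText or POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class PageTextSplitter {

    // Splits extracted page text on line breaks, trims each line
    // and drops lines that are empty after trimming.
    static List<String> toLines(String pageText) {
        List<String> lines = new ArrayList<String>();
        if (pageText == null) {
            return lines;
        }
        for (String raw : pageText.split("\\r?\\n")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample text standing in for strategy.getResultantText()
        for (String line : toLines("First line\r\nSecond line\n\n  Third line  ")) {
            System.out.println(line);
        }
    }
}
```

Inside the page loop, each returned line could then go into its own paragraph with doc.createParagraph().createRun().setText(line), the same calls the program already uses. This still copies text only; tables and images are not reconstructed.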
Can you please send me the code to convert from Word to PDF using iText?
The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.
Can you please send me the code to convert from doc to pdf…
The source code is provided in the post itself. Are you facing any issue? If so, please post the errors here so we can look into them.
I am unable to preserve the exact format when converting a PDF to doc or docx if the PDF is in a tabular format.
The structure gets distorted. Can you please help?
Docx4j is the only open-source API that converts docx to PDF efficiently without compromising the format and styling, but the catch is that it does not handle spaces and tabs in documents, which leaves the problem unsolved. I have done a lot of research in this area and have not been able to find a single Java API that converts doc or docx to PDF without compromising the format and styling.
I'm not a developer; I always use this free online PDF to Word converter (http://www.online-code.net/pdf-to-word.html) to convert PDF to Word.
I am using your code but getting the following error. Can you please help me?
Exception in thread "main" java.lang.NoSuchMethodError: org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR.getPictList()Ljava/util/List;
at org.apache.poi.xwpf.usermodel.XWPFRun.<init>(XWPFRun.java:75)
at org.apache.poi.xwpf.usermodel.XWPFParagraph.createRun(XWPFParagraph.java:266)
at com.tcs.ConvertPdf2Word.main(ConvertPdf2Word.java:27)
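A NoSuchMethodError like the one above commonly means that the class making the call and the class being called were loaded from jars of different versions, for example a poi-ooxml jar paired with a poi-ooxml-schemas jar from another release. As a rough diagnostic, the sketch below prints which jar each class was loaded from, using only standard JDK calls. JarLocator and describe are hypothetical names introduced here; they are not part of the original post or of POI.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of the original program.
class JarLocator {

    // Returns a line naming the jar or directory a class was loaded from.
    static String describe(String className) {
        try {
            // Load without initializing the class; only its origin is needed.
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Core JDK classes have no code source, so report that instead of failing.
            if (src == null || src.getLocation() == null) {
                return className + " -> (bootstrap/JDK class)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // The two classes named in the stack trace above
        System.out.println(describe("org.apache.poi.xwpf.usermodel.XWPFRun"));
        System.out.println(describe("org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR"));
    }
}
```

Run it with the same classpath as the converter. If the two lines point at jars from different POI releases, aligning the poi, poi-ooxml and poi-ooxml-schemas versions is a reasonable first thing to try.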
Please send the code for converting an Arabic-language PDF to Word.
Hi, please share a sample PDF in Arabic so I can check it and share the snippet with you. You can send it to mirthbees@gmail.com
Nice post, useful for me.
This is not valid code; it only does a simple text copy. If your PDF has an image or a table, this will not work.