Monday, July 26, 2010

Reading MS Word Document With Java (Apache POI API)

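import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Prints every paragraph of a binary Word (.doc) file to standard output.
public void readDocument(String fileName) {
    // try-with-resources closes the input stream even if parsing fails
    try (FileInputStream in = new FileInputStream(fileName)) {
        POIFSFileSystem poifs = new POIFSFileSystem(in);
        HWPFDocument doc = new HWPFDocument(poifs);
        WordExtractor extractor = new WordExtractor(doc);
        String[] paragraphs = extractor.getParagraphText();
        for (String paragraph : paragraphs) {
            // Strip the line terminator from each paragraph (\cM is the carriage return)
            paragraph = paragraph.replaceAll("\\cM?\r?\n", "");
            System.out.println(paragraph);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}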
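
The replaceAll call in the method above is the only part that does not depend on POI, so it can be tried on its own. Below is a minimal sketch of that cleanup step. The class name ParagraphCleaner and the method name clean are made up for illustration, and the sample strings are assumptions about what the extractor might return, not output from a real document.

```java
public class ParagraphCleaner {

    // Hypothetical helper: applies the same regex as readDocument.
    // "\\cM" is the regex escape for Ctrl-M (carriage return); the pattern
    // matches up to two optional carriage returns followed by a line feed.
    static String clean(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // Assumed sample inputs, one per common line-ending style
        String[] samples = { "First paragraph\r\n", "Second paragraph\n", "Third paragraph" };
        for (String sample : samples) {
            System.out.println("[" + clean(sample) + "]");
        }
    }
}
```

Note that the pattern only removes a carriage return when a line feed follows it, so a paragraph ending in a bare "\r" is left unchanged.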
