Groovy snippet of the day - Using Apache POI to parse Word documents

I sincerely enjoy working with Groovy (a dynamic language for the JVM) combined with the huge ecosystem of the Java platform. Today I had to extract text from a Word 2010 document. As always, there is an app (well, a library) for that.

With Apache POI and Groovy's easy-to-use Grape dependency handling, reading a Word, Excel or PowerPoint document becomes a piece of cake:

@Grab(group='org.apache.poi', module='poi', version='3.7')
@Grab(group='org.apache.poi', module='poi-ooxml', version='3.7')
@Grab(group='org.apache.poi', module='poi-scratchpad', version='3.7')

import org.apache.poi.extractor.ExtractorFactory

class PoiTest {
    static def main(args) {
        // ExtractorFactory picks a matching text extractor for the given file
        def extractor = ExtractorFactory.createExtractor(new File(args[0]))

        // Print the document's plain text
        println extractor.getText()
    }
}
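
If you want to feed it more than one file, the same idea works as a plain Groovy script. This is only a rough sketch: extractText is a helper name I made up (it is not part of POI), the usage message assumes the script is saved as extractText.groovy, and I grouped the dependencies with @Grapes, which should pull in the same three artifacts as above.

```groovy
@Grapes([
    @Grab(group='org.apache.poi', module='poi', version='3.7'),
    @Grab(group='org.apache.poi', module='poi-ooxml', version='3.7'),
    @Grab(group='org.apache.poi', module='poi-scratchpad', version='3.7')
])
import org.apache.poi.extractor.ExtractorFactory

// Hypothetical helper (not part of POI): returns the plain text of one document.
String extractText(File file) {
    // Same call as in the class above; .text is Groovy shorthand for getText()
    ExtractorFactory.createExtractor(file).text
}

// Print a hint instead of failing when no file was given
if (!args) {
    System.err.println 'usage: groovy extractText.groovy <file> [<file> ...]'
}

// Extract and print the text of every file passed on the command line
args.each { path ->
    def file = new File(path)
    if (file.isFile()) {
        println "== ${file.name} =="
        println extractText(file)
    } else {
        System.err.println "skipping ${path}: not a file"
    }
}
```

Run it with something like groovy extractText.groovy report.docx notes.doc (both file names are placeholders). Grape downloads the POI jars on the first run, so that one may take a moment.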
