Groovy snippet of the day - Using Apache POI to parse Word documents
I sincerely enjoy working with Groovy (a dynamic language for the JVM) combined with the huge ecosystem of the Java platform. Today I had to extract text from a Word 2010 document. As always, there is a library for that.
With Apache POI and Groovy's super easy to use Grape dependency handling, reading a Word, Excel or PowerPoint document becomes a piece of cake:
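// Grape downloads the POI modules on first run
@Grab(group='org.apache.poi', module='poi', version='3.7')
@Grab(group='org.apache.poi', module='poi-ooxml', version='3.7')
@Grab(group='org.apache.poi', module='poi-scratchpad', version='3.7')
import org.apache.poi.extractor.ExtractorFactory

class PoiTest {
    static void main(String[] args) {
        // ExtractorFactory chooses a text extractor that matches the file's format
        def extractor = ExtractorFactory.createExtractor(new File(args[0]))
        // Print the plain text of the document given as the first argument
        println extractor.text
    }
}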
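The same idea also works as a plain Groovy script that takes several files at once. The following is only a minimal sketch, assuming the same POI 3.7 modules as above; extractText is a hypothetical helper of my own, not part of POI, and it simply returns null for paths that do not point to a file.

```groovy
@Grab(group='org.apache.poi', module='poi', version='3.7')
@Grab(group='org.apache.poi', module='poi-ooxml', version='3.7')
@Grab(group='org.apache.poi', module='poi-scratchpad', version='3.7')
import org.apache.poi.extractor.ExtractorFactory

// Hypothetical helper (not part of POI): returns the document's text,
// or null when the path does not point to an existing file.
def extractText(File file) {
    if (!file.isFile()) {
        return null
    }
    ExtractorFactory.createExtractor(file).text
}

// 'args' holds the command-line arguments of a Groovy script
args.each { path ->
    def text = extractText(new File(path))
    println(text == null ? "Skipping ${path}: not a file" : text)
}
```

Saved under a name of your choice, the script can be run with the groovy command followed by one or more document paths.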
