About Apache Tika
The project is hosted by the Apache Software Foundation. It detects a wide range of file and content types, and there is a full list of supported formats. That list includes many document formats, e.g. text/plain, text/xml, the proprietary Microsoft OOXML and the office standard OpenDocument. Furthermore, images (image/gif, image/jpeg, image/bmp or image/tiff), videos (video/avi, video/mpeg or video/mp4) and audio files (audio/ogg, audio/x-wav or audio/mpeg) can be recognized by Tika. Even feeds (application/rss+xml, application/atom+xml) may be recognized. And many, many more …
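To give a feel for what recognizing a content type involves, here is a minimal sketch of signature ("magic byte") sniffing for the four image types mentioned above. It uses only the Java standard library; the class MagicSniffer and its methods are hypothetical names made up for this illustration and are not part of Tika. Tika's own detection is considerably more thorough, so treat this as a toy model of the idea rather than a description of how Tika is implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not an Apache Tika class.
final class MagicSniffer {

    /** Returns true if data begins with the given byte values (0..255). */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses a media type from the leading bytes of a file. */
    static String sniff(byte[] head) {
        if (startsWith(head, 'G', 'I', 'F', '8')) {
            return "image/gif";               // "GIF87a" or "GIF89a"
        }
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";              // JPEG start-of-image marker
        }
        if (startsWith(head, 'B', 'M')) {
            return "image/bmp";               // very short signature, weak evidence
        }
        if (startsWith(head, 'I', 'I', 0x2A, 0x00)
                || startsWith(head, 'M', 'M', 0x00, 0x2A)) {
            return "image/tiff";              // little-endian or big-endian TIFF
        }
        return "application/octet-stream";    // fallback for unknown data
    }

    public static void main(String[] args) {
        byte[] gif = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] other = "hello".getBytes(StandardCharsets.US_ASCII);

        System.out.println(sniff(gif));   // image/gif
        System.out.println(sniff(jpeg));  // image/jpeg
        System.out.println(sniff(other)); // application/octet-stream
    }
}
```

A two-byte signature such as BMP's shows why leading bytes alone are not always enough: plain text that happens to start with "BM" would be misclassified by this sketch, which is one reason a real detector has to draw on more evidence than a handful of prefixes.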