Grep odt file
Overview
In the GNU/Linux world, you spend a lot of time on the command line. Searching
a text file is a piece of cake:

    grep -iE "expression" file1

You might even use a GUI, and inside that GUI you might even use an
open-source office suite for all those times that plain text isn't enough.
But what about when you want to search one of those odt files you vaguely
remember is some form of XML? Easy. Use unoconv or odt2txt (look those up in
your package manager) to convert the document to plain text, then grep the
output file:

    unoconv -f txt foo.odt
    grep -iE "Joe Schmoe" foo.txt

Or use the --stdout option and skip the intermediate file:

    unoconv -f txt --stdout foo.odt | grep -iE "Joe Schmoe"
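If you search odt files often, the pipeline above can be wrapped in a small shell function. This is only a sketch: the name odtgrep and the ODT_CONVERTER variable are made up for this example, and it assumes GNU grep (for --label) and a converter that writes plain text to stdout.

```shell
# odtgrep PATTERN FILE...
# Search the text of one or more odt files, case-insensitively.
# ODT_CONVERTER is a hypothetical override for the conversion command;
# it defaults to the unoconv invocation shown above.
odtgrep() {
    pattern=$1
    shift
    found=1
    for file in "$@"; do
        # Convert to plain text on stdout; --label makes GNU grep print
        # the document name instead of "(standard input)".
        if ${ODT_CONVERTER:-unoconv -f txt --stdout} "$file" |
            grep -iE -H --label="$file" -- "$pattern"; then
            found=0
        fi
    done
    # Like grep, return 0 only if at least one file matched.
    return "$found"
}
```

Usage would look like odtgrep "Joe Schmoe" *.odt. Setting ODT_CONVERTER=odt2txt should also work, as far as I know, since odt2txt prints to stdout by default.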
History
I first tackled this problem by figuring out how to get at the XML inside. I
learned that an odt file is a compressed archive, but a tar xf didn't help.
It turns out to be a zip archive, which unzip handles. I also had to actually
learn the tiniest bit of Perl, as regular GNU grep (and, I inferred, sed)
doesn't do non-greedy wildcard matching. So I had this super-complicated
one-liner going before I decided to try a different approach and discovered
the unoconv and odt2txt applications:

    time unzip -p foo.odt content.xml |
      sed -e 's/\([^n]\)>\n(.*)<\/\1>/\2/;s/<text:h.*?>(.*)<\/text:h>/\1/;' \
          -e 's/<style:(font-face|text-properties).*\/>//g;' |
      sed -e "s/&apos;/\'/g;s/&quot;/\"/g;s/<text:.*break\/>//g;"
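Looking back, the non-greedy problem can usually be sidestepped in sed with a negated character class: <[^>]*> stops at the first closing bracket, which is what a non-greedy .*? would do for a single tag. Here is a rough sketch of that idea, not the one-liner I actually ran. The function name strip_tags is made up, it assumes GNU sed (for -E and \n in the replacement), and it only handles the five standard XML entities.

```shell
# strip_tags: read ODF XML on stdin, write rough plain text on stdout.
strip_tags() {
    sed -E \
        -e 's/<\/text:(p|h)>/\n/g' \
        -e 's/<[^>]*>//g' \
        -e "s/&apos;/'/g; s/&quot;/\"/g; s/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g"
    # 1st expression: end each paragraph or heading with a newline
    # 2nd expression: drop every remaining tag (negated class, no .*? needed)
    # 3rd expression: decode entities, &amp; last to avoid double decoding
}
```

It would slot into the same pipeline as before: unzip -p foo.odt content.xml | strip_tags | grep -iE "Joe Schmoe".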
References
Weblinks
- Unzipping an odt file https://ubuntuforums.org/showthread.php?t=899179&s=3aa7c303c4a5655e039600c4082d7a2c&p=5653494#post5653494
- Perl non-greedy wildcard matching http://stackoverflow.com/a/1103177/3569534