Script day: grep in jar (or zip) files

Here is another script I wrote for work that I thought would be interesting enough to share:

Say you want to check which JAR files (or ZIP files for that matter, as Java ARchive files are just ZIP files with a different extension) contain files that contain some text. grep is the obvious answer, but how do you grep files inside JARs?
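
As a quick illustration of the "JARs are just ZIPs" point, here is a minimal sketch that checks a file's first four bytes for the usual ZIP local-file-header signature. The function name is_zip is made up for this example, and the check is deliberately naive: an empty archive, or one with data prepended to it, starts with different bytes.

```shell
# is_zip FILE  (hypothetical helper, not part of the original script)
# Succeeds if FILE starts with the ZIP local file header signature "PK\003\004".
is_zip() {
    # od prints the first four bytes as hex; tr strips the whitespace od adds
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d '[:space:]')" = "504b0304" ]
}
```

For example, `for f in /usr/share/java/*.jar; do is_zip "$f" || echo "not a ZIP: $f"; done` should list any file in that directory that does not look like a ZIP archive.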

A simple loop can go over the archives you are interested in, then unzip -c can extract the content of files to the standard output so that grep can be used on them:

pattern="smtp"
for jar in /usr/share/java/*.jar; do
    # unzip -Z1 lists the member names only, one per line
    # (names are assumed to contain no whitespace)
    for file in $(unzip -Z1 $jar); do
        unzip -c $jar $file | grep -qi "$pattern" && (
            # print the owning RPM package, or flag the jar as an orphan
            rpm -qf $jar >/dev/null 2>&1 && \
                rpm -qf $jar || \
                echo "orphan jar: $jar"
        );
    done
done | sort -u

This could also be written as a single line in the terminal, obviously.
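
For repeated use at the terminal, the same idea can also be wrapped in a small function instead of being retyped. The following is only a sketch under a few assumptions, not the script above: the name grep_archives is made up, it prints "archive: member" rather than looking up RPMs, it uses unzip -p (content only) and grep -a (treat binary data as text), and it reads member names line by line so that names containing spaces survive. Member names containing unzip wildcard characters such as [ or * are not handled.

```shell
# grep_archives PATTERN ARCHIVE...  (hypothetical helper name)
# Prints "archive: member" for every member whose content matches PATTERN,
# case-insensitively.
grep_archives() {
    local pattern=$1 archive member
    shift
    for archive in "$@"; do
        # unzip -Z1 prints one member name per line
        unzip -Z1 "$archive" 2>/dev/null | while IFS= read -r member; do
            # unzip -p writes only the member's bytes to standard output
            if unzip -p "$archive" "$member" 2>/dev/null | grep -qai -- "$pattern"; then
                printf '%s: %s\n' "$archive" "$member"
            fi
        done
    done
}
```

It would then be called as `grep_archives smtp /usr/share/java/*.jar`.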

For each file that matches, the body of my loop prints the RPM package that owns the containing JAR, or declares the JAR an orphan if no package owns it.

An alternative loop body might simply print the matching output and the file that contained it, à la grep -H:

output="$(unzip -c $jar $file | grep -i "$pattern")"
[ -n "$output" ] && echo "$jar, $file: $output"

There is an obvious bug in that last code example, though. See if you can figure it out.

6 Responses to “Script day: grep in jar (or zip) files”

  1. Shlomi Fish:

    Hi Oded!

    Your code here suffers from potential shell-variable injection. See my posts about:

    Code Injection

    Shell Variable Injection

    And why can’t we have comment previews here? Stupid and incredibly lame WordPress.

  2. Oded:

    The problem of shell variable escaping is known, and in this simple example (as well as most other scripts I write) I choose to ignore it unless I know for a fact that the values I’m dealing with are expected to include word delimiters (IFS in bash-speak).

    Specifically in this case, jar file names very rarely include white space (I could say “never”, as I haven’t seen such a case in all my years, but as they say, “never say never”), and class files are not allowed to include white space in their names by the Java language specification.

    I do quote the submitted pattern in the above example, because it is always important to make sure user input can’t escape into your script, regardless of what it is matched against 🙂 .

  3. Oded:

    P.S. I don’t like comment previews, as I think they are redundant. I will look into adding them anyway, though.

  4. John Ortega:

    Ok,

    That was easy.

    But what I need is to search a lib directory with a bunch of jar or zip files. Inside those archives there may be class files or xml files. I need to read the xml files and see whether a certain string occurs in them.

    Any help?

    John Ortega
    madridlinuxgroup.blogspot.com
    madridlinuxgroup@gmail.com

  5. Oded:

    The script would work fine for that purpose, just use the “alternative loop” code inside the loop instead of my “rpm finding” code.

  6. Michael Kutschke:

    Hi, btw, listing the contents of a jar or zip file like you did in your for header can be done more easily using

    jar tf [inputfile]
