Finding Strings in recursively zipped files
I had an itch to scratch. After I used Field Trip (which I like a lot) to determine unused fields, the team managing the external Informatica integration claimed they would need weeks to ensure none of the fields are used in any of their (hundreds of) pipelines.
ZIP inception
My first reaction (OK, the second, first one isn't PC) was: Let's go after the source code and just use an editor of choice to do a find in files. Turns out: not so fast. The source export offered by the team was a zip file with an elaborate directory structure containing, tada, zip files. So each of the pipes would need multiple zip operations.
Itch defined
I needed a tool that would start in a directory with a bunch of zip files and unpack them all, then check the unpacked result for more zip files, unzip these, and repeat. Once done, it would take a list of strings, search for occurrences of those, and generate a report showing the files that contain these strings.
Itch scratched
I created findstring, a command line tool that takes a directory as its starting point, unzips what can be unzipped (optional), and searches for occurrences of the strings provided in a text file.
Initially I contemplated rendering the output as XML, so the final report could be designed in whatever fashion using XSLT. However, following KISS, I ended up using Markdown. I might add the XML option later on.
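To illustrate why Markdown keeps things simple, here is a minimal sketch of a report renderer. It assumes the search results are collected as a map from search string to the files containing it; the names `MarkdownReport` and `render` are hypothetical and not taken from findstring itself.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of findstring: renders search hits as Markdown. */
final class MarkdownReport {

    private MarkdownReport() {
    }

    /**
     * @param hits search string -> paths of the files that contain it
     * @return a Markdown document with one section per search string
     */
    static String render(final Map<String, List<String>> hits) {
        final StringBuilder md = new StringBuilder("# Search report\n");
        // Copying into a TreeMap gives a stable, alphabetical section order
        for (final Map.Entry<String, List<String>> e : new TreeMap<>(hits).entrySet()) {
            md.append("\n## ").append(e.getKey()).append("\n\n");
            if (e.getValue().isEmpty()) {
                md.append("_No occurrences found._\n");
            }
            for (final String path : e.getValue()) {
                md.append("- `").append(path).append("`\n");
            }
        }
        return md.toString();
    }
}
```

The whole report is string concatenation, with no document model or stylesheet involved, which is roughly the trade-off against the XML plus XSLT route.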
Recursion
The heart of the tool is recursion (until you stack overflow ;-) ): read a directory, then dive into each directory found. I could have avoided that using Guava and its fileTraverser, but I like some Inception-style coding. The key piece is this:
    private boolean expandSources(final File sourceDir) throws IOException {
        boolean result = false;
        final File[] allFiles = sourceDir.listFiles();
        if (allFiles == null) {
            // Not a directory, or it could not be read
            return false;
        }
        for (final File f : allFiles) {
            if (f.isDirectory()) {
                // Recurse first: "result || ..." would skip the call once result is true
                result = this.expandSources(f) || result;
            } else if (f.getName().endsWith(".zip")) {
                // Strip only the trailing ".zip", not every occurrence in the path
                final String path = f.getAbsolutePath();
                final File newTarget = new File(path.substring(0, path.length() - 4));
                if (this.expandFile(f, newTarget)) {
                    result = true;
                    // Need to scan the new directory too
                    this.expandSources(newTarget);
                }
            }
        }
        return result;
    }
The function returns true if at least one zip file was unpacked. The string-finding operation (case insensitive) follows the same recursive approach.
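The snippet calls `expandFile`, which is not shown here. A minimal sketch of what such a method could look like with plain `java.util.zip` follows. It assumes the method returns true once at least one file was extracted; the class name `ZipExpander` and the check against entries escaping the target directory are my additions for illustration, not the tool's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical stand-in for the expandFile method that the post does not show. */
final class ZipExpander {

    private ZipExpander() {
    }

    /** Unpacks zip into targetDir; returns true if at least one file was written. */
    static boolean expandFile(final File zip, final File targetDir) throws IOException {
        final Path root = targetDir.toPath().toAbsolutePath().normalize();
        boolean extracted = false;
        try (ZipFile zf = new ZipFile(zip)) {
            final Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                final ZipEntry entry = entries.nextElement();
                final Path out = root.resolve(entry.getName()).normalize();
                // Skip entries that would land outside the target directory ("zip slip")
                if (!out.startsWith(root)) {
                    continue;
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = zf.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted = true;
            }
        }
        return extracted;
    }
}
```

Because each archive unpacks into a directory named after it, the recursive scan finds nested zip files at a predictable location.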
Use cases
- Find field usage in ZIP files. Works with a package downloaded from the Metadata API or with what Informatica exports
- Check a source directory (doesn't need to contain zips) for keywords like TODO, FIXME, XXX
The command line syntax is very simple:
java -jar findString.jar -d directory -s strings [-o output]
- -d,--dir <arg> directory with all zip files
- -s,--stringfile <arg> Filename with Strings to search, one per line
- -o,--output <arg> Output file name for report in MD format
- -nz,--nz Rerun find operation on an already unzipped structure - good for alternate finds
Limits
In its current form the utility checks for strings in every file except zip files, which get unpacked so that their contents are checked instead. When your directory contains binary files (e.g. images), it will still look for the strings inside them. File extension filters might be a future enhancement (share your opinion).
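As a rough idea of what such an extension filter could look like, here is a small sketch built on the standard `java.io.FileFilter` interface. The class `ExtensionFilter` is hypothetical; findstring does not currently contain it.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical filter, not part of findstring: accepts files by extension. */
final class ExtensionFilter implements FileFilter {

    private final Set<String> extensions = new HashSet<>();

    /** @param allowed extensions without the dot, e.g. "xml", "json" */
    ExtensionFilter(final String... allowed) {
        for (final String ext : allowed) {
            this.extensions.add(ext.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public boolean accept(final File f) {
        // Directories always pass so that a recursive scan can still descend into them
        if (f.isDirectory()) {
            return true;
        }
        final String name = f.getName();
        final int dot = name.lastIndexOf('.');
        // Files without an extension are skipped
        return dot >= 0
                && this.extensions.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

It could be passed to the directory listing, for example `sourceDir.listFiles(new ExtensionFilter("xml", "json"))`, so that images and other binaries never reach the search step.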
Files are read into memory, so if your directory contains huge files, you will blow your heap. Source code files hardly pose an issue, so the approach worked for me. Alternatively, a scanner could be used, should the need arise.
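For the streaming alternative, a minimal sketch could read each file line by line instead of loading it whole. `StreamingSearch` and `findIn` are hypothetical names, and the sketch assumes the search strings never span a line break; a single gigantic line would still be held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical line-by-line search, not the code findstring uses. */
final class StreamingSearch {

    private StreamingSearch() {
    }

    /** Returns the needles (lower-cased) that occur in file, ignoring case. */
    static Set<String> findIn(final Path file, final Collection<String> needles) throws IOException {
        final Set<String> remaining = new TreeSet<>();
        for (final String n : needles) {
            remaining.add(n.toLowerCase(Locale.ROOT));
        }
        final Set<String> found = new TreeSet<>();
        // InputStreamReader replaces undecodable bytes instead of throwing,
        // so binary files do not abort the scan
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            // Stop early once every needle has been seen
            while (!remaining.isEmpty() && (line = reader.readLine()) != null) {
                final String lower = line.toLowerCase(Locale.ROOT);
                // Move every needle seen on this line from remaining to found
                remaining.removeIf(n -> lower.contains(n) && found.add(n));
            }
        }
        return found;
    }
}
```

Only one line is held in memory at a time, and the scan stops as soon as all strings have been found in the file.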
Go give it a spin and keep in mind: YMMV
Posted by Stephan H Wissel on 16 March 2019 | categories: Salesforce Singapore