Limitations of the FindStr Utility

Readers have noted that I use the FindStr utility quite often. This utility is documented in both Windows Command-Line Administration Instant Reference and Administering Windows Server 2008 Server Core (and also appears a host of my other books). At the time I wrote that documentation, I had no idea of how much comment this particular utility would generate. I’ve written a number of posts about it, including Accessing Sample Database Data (Part 3), Understanding Line-, Token-, and String-Based Command Line UtilitiesUsing the FindStr Utility to Locate Errors in Visual Studio, and Regular Expressions with FindStr. It might be possible that people think that this utility is infallible, but it most certainly has limits. Of course, the FindStr utility is line-based and I’ve already documented that limitation. However, it has other limitations as well.

The most important limitation you must consider is how FindStr works. This utility works with raw files. So, you can use it to look inside executable files and locate those produced by a specific organization as long as the file contains unencrypted data. When an executable relies on obfuscation or other techniques to render the content less viewable by competitors, the strings that you normally locate using FindStr might become mangled as well—making them invisible to the utility. In practice, this particular problem rarely happens, but you need to be aware that it does happen and very likely will happen when the executable file’s creator has something to hide (think virus).

Another problem is that FindStr can’t look inside archives or other processed data. For example, you can’t look inside a .ZIP file and hope to locate that missing file. You might initially think that there is a way around this problem by using the functionality provided in Windows 7 and newer versions of Windows to look inside archive files and treat them as folders. However, this functionality only exists within Windows Explorer. You can’t open a command prompt inside an archive file and use FindStr with it.

Recently, a reader had written me about his Office installation. Previously, he had used FindStr to locate specific files based on their content—sometimes using content that wasn’t searchable in other ways. This feature suddenly stopped working and the reader wondered why. It turns out that .DOC files are raw, while .DOCX files are archives. Change the extension of a .DOCX file to .ZIP and you’ll suddenly find that your ZIP file utilities work great with it. Old Office files work well with FindStr—new files only work if you save them in .DOC format.

Another reader wrote to ask about network drives. It seems that the reader was having a problem locating files on a network drive unless the drive was mapped. This actually isn’t a limitation, but you do have to think about what you want to do. Let’s say you’re looking for a series of .DOC files on the C drive (with a shared name of Drive C) of a server named WinServer in the WinWord folder that contain the word Java in them. The command would look like this: FindStr /m /s “Java” “\\WinServer\Drive C\WinWord\*.doc”. When using network drives, you must include the server name, the share name, the path, and the file specification as part of the command. Otherwise, FindStr won’t work. What I have found though is that FindStr works best with Windows servers. If you try to use it with another server type, you might experience problems because FindStr won’t know how to navigate the directory structure.

There is a real limit on the length of your search string. Another reader wrote with this immense search string and wondered why FindStr was complaining about it. The utility appears to have a search string length limit of 127 characters (found through experimentation and not documented—your experience may differ). The workaround is to find a shorter search string or to perform multiple searches (refining the search by creating a more detailed file specification). If you can’t use either workaround, then you need to write an application using something like VBScript to perform the task.

These are the questions that readers have asked most about. Of course, I want to hear your question about limitations as well. If you encounter some strange FindStr behavior that affects my book’s content in some way, please be sure to write at [email protected].

 

Understanding Line-, Token-, and String-Based Command Line Utilities

My books, Windows Command-Line Administration Instant Reference and Administering Windows Server 2008 Server Core, both contain batch file sections that answer basic needs, but sometimes you need more than basic information to perform a task. A reader asked me how to perform a task using the FindStr utility the other day based on my Regular Expressions with FindStr post. The problem is that FindStr is a line-based utility, and the reader was trying to obtain a token-based result. Using FindStr alone won’t solve the problem. Here is the original reader comment:

 

If I have lines like below in a file called Sum.txt :

Total001 abcdefg
Total002 hijklmn
Total099 opqrstuv

and I use a regular expression to get all the results like “findstr Total[000-099] Sum.txt” the result printed is :

Total001 abcdefg
Total002 hijklmn
Total099 opqrstuv

But I want it to print only the matches to the regular expression like

Total001
Total002
Total099

How can this be done?


And my response:

 

The FindStr
utility is line oriented, which means you obtain an entire line as
output, not individual tokens. In order to accomplish what you want to
do, you need to create a For loop. Using a For
loop would allow you to process the individual tokens in the line. The
following command will do what I think you want to accomplish:




For /F “UseBackQ” %1 In (`FindStr Total[000-099] Sum.txt`) Do @Echo %1




There are two important things to notice here. First, you must provide the “UseBackQ”
option or the command won’t work. The command itself must appear in
back-quotes—not regular quotes. The back-quote normally appears above
the Tab button and to the left of the 1 on a keyboard. It usually
appears with the tilde (~) character.



Using For makes it possible to create a token-based output from the line-based FindStr output. The default For setting relies on the space and tab characters as delimiters, but you can use the Delimiters= option to change the default behavior. However, sometimes a token-based result isn’t enough. You may not want an entire word (or whatever element the delimiters define). In this case, you need a string-based output.

One of the undocumented features of the command line is to create substrings from variables. For example, let’s say you define the following variable:

 

Set MyVariable=Hello World


Now, you want to obtain just a piece of that variable to use somewhere in your batch file. To obtain the substring, you use the tilde (~) operator. This operator uses a 0-based offset. So, let’s say you issue the following command:

 

Echo %MyVariable:~3%


The output of this command is: lo World. The output begins with the forth character, which is an l and displays the remainder of the string. However, let’s say you don’t need the rest of the string. Well, in this case, you can add a second number to define the characters you do need. If you issue this command:

 

Echo %MyVariable:~3,6%


the output is: lo Wor. The output begins with the fourth character and proceeds to the ninth character. The output contains the six characters you requested. In short, it’s possible to perform some fancy string manipulation in batch files as long as you keep the short of output you need in mind. Let me know how you use batch files to perform various sorts of string manipulation at [email protected].

Regular Expressions with FindStr

Regular expressions are a powerful feature of the FindStr utility. However, they can also prove frustrating to use in some cases because the documentation Microsoft provides is lacking in good examples and difficult to follow. You can see some usage instructions for FindStr starting on page 82 of the Windows Command-Line Administration Instant Reference .

A reader recently commented that there is a problem with the dollar sign ($) regular expression. It must actually appear after the search term to be useful. Of course, the problem is creating a test file to sufficiently check the use of the regular expressions, so I came up with this test file:

TestFile

Now, let’s perform some tests with it.  Here is the result of some tests
that I performed using this test file and FindStr regular expressions:

TestResults

The first test case shows what happens when you try
the command on page 82 of the book.  It appears to work, but you’ll see
in a moment that it actually doesn’t.  Let’s take the two parts of the
regular expression apart.  Using
FindStr “^Hello” *.TXT seems to work just fine.  However, the command FindStr “$World” *.TXT doesn’t produce any output.
Only when the $ appears after World does the command produce an
output.  Consequently, page 82 should show the rather counterintuitive
command, FindStr “^Hello World$” *.TXT to produce the correct output.

It’s also important to be careful about making generalizations when
using FindStr. For example, when working with the test file originally
shown in this example, the FindStr /B /C:”Hello World” *.TXT command produces the same output as FindStr “^Hello” *.TXT as shown here:

TestResults2

If you change the test file like this though:

TestFile2

you’ll see these results:

TestResults3

As you can see, you must exercise care when using FindStr to obtain the
desired results.  What other odd things have you noticed when using
regular expressions with FindStr?  Add a comment here or write me at [email protected] to let me know.