Using LinqPad for ad-hoc text analysis

I think almost every dev has had the pleasure of extracting information from text files. The content may be logs, debug output or anything else.

Lately I was asked to analyze our codebase for some given keywords. I downloaded all sources to my local disk and ran a somewhat naive search using Visual Studio's "Find In Files" (shortcut: Ctrl+Shift+F).

Visual Studio offers the handy "Find In Files" option. You can tell it to look in a custom set of folders:

After selecting your folders you can start searching.

The search results show up in Visual Studio's output window. Copy and paste them into your favorite text editor (I'm using Notepad++) and save the file to disk.

The next step is to extract the information you need from this rather raw text file.

Fire up LinqPad. I suggest you download the latest version for .NET Framework 4.0.

To read in your file, use the File.ReadLines() method, which was introduced with .NET Framework 4.0. ReadLines returns an IEnumerable<string> and yields the lines lazily, one at a time. This makes processing large files possible, because only the current line has to be held in memory. Paul Jacksonville has written a good article about this.
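To make the streaming behaviour a bit more concrete, here is a minimal sketch that counts the lines containing a keyword without ever loading the whole file. The names LazyLineCounter and CountMatches are made up for this illustration, and the case-insensitive comparison is my own assumption, not something the search task required.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyLineCounter
{
    // Counts the lines of a text file that contain the given keyword.
    // File.ReadLines streams the file line by line, so the whole file
    // never has to fit into memory (unlike File.ReadAllLines).
    public static int CountMatches(string path, string keyword)
    {
        return File.ReadLines(path)
                   .Count(line => line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

In LinqPad you would paste the class into a "C# Program" query and call something like LazyLineCounter.CountMatches("c:\\temp\\SearchResults.txt", "keyword").Dump() from Main.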

My goal was to get every file together with the number of places where my keyword was found.

So let's look at how to read in Visual Studio's "Find In Files" output:

The format is as follows:


I’m using a regular expression to extract File and Line into an anonymous type:

// \d+ rather than [1-9]+, so line numbers containing a 0 (10, 105, ...) match as well
var pattern = new Regex(@"^\s*(.+?)\((\d+)\)");
var filesAndLines = File.ReadLines("c:\\temp\\SearchResults.txt")
                    .Select(x => pattern.Match(x))
                    .Where(m => m.Success)
                    .Select(m => new { File = m.Groups[1].Value, Line = m.Groups[2].Value });
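If you want to try the extraction without a real results file, the following sketch does the same parsing on in-memory lines. It assumes that each result line looks like "<path>(<line number>): <text>", possibly indented. The class FindResultParser, the method Parse and the use of KeyValuePair instead of an anonymous type are choices made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FindResultParser
{
    // Assumed line shape: optional indentation, file path, "(line number)", then ":".
    // The lazy (.+?) stops at the first "(digits):", so parentheses later in the
    // matched source text do not confuse the capture.
    private static readonly Regex ResultLine = new Regex(@"^\s*(.+?)\((\d+)\):");

    // Returns one (file, line number) pair per result line.
    // Lines that do not fit the pattern (headers, summaries) are skipped.
    public static IEnumerable<KeyValuePair<string, int>> Parse(IEnumerable<string> lines)
    {
        return lines.Select(l => ResultLine.Match(l))
                    .Where(m => m.Success)
                    .Select(m => new KeyValuePair<string, int>(
                        m.Groups[1].Value,
                        int.Parse(m.Groups[2].Value)));
    }
}
```

Because Parse takes any IEnumerable<string>, you can feed it File.ReadLines(...) directly and keep the lazy, line-by-line processing.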

With this structure in place, grouping by file is an easy step:

filesAndLines.GroupBy(x => x.File)
             .Select(y => new { File = y.Key, Count = y.Count() }).Dump();
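The grouping step can also be sketched on its own, detached from the file reading. Two things here go beyond what I actually did and are assumptions: file names are compared case-insensitively (handy for Windows paths), and the result is sorted so the files with the most hits come first. HitCounter and CountPerFile are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HitCounter
{
    // Groups the file names and returns each file with its number of hits,
    // ordered by hit count, highest first.
    public static List<KeyValuePair<string, int>> CountPerFile(IEnumerable<string> files)
    {
        return files.GroupBy(f => f, StringComparer.OrdinalIgnoreCase)
                    .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                    .OrderByDescending(p => p.Value)
                    .ToList();
    }
}
```

In LinqPad, calling .Dump() on the returned list renders it as a table.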

That's it. Hit F5 and see what LinqPad's beautiful little Dump() extension method produces:

The best thing is, you can easily export it to Excel or Word!

Conclusion: LinqPad is a handy and versatile tool. I'm using it not only to understand LINQ better, but also for little tasks like the one shown here.