Monday, March 26, 2007

Extracting fields from a file

Have you ever wanted to extract a particular field from a flat file that was hundreds or thousands of lines long?

Now, I bet you're saying: sure, you could import the file into Excel, OpenOffice.org Spreadsheet, or some other spreadsheet application, but what fun would that be? That would mean using your mouse, and you're a keyboard kind of person. In addition, opening very large files can leave spreadsheet applications very sluggish.

So what's the alternative? AWK (or gawk). Awk is a command line program that can do just what you wish. For example, let's say that you wanted to extract all the full names from your system's /etc/passwd file. Now you could import the file into your favorite spreadsheet application, specify a delimiter, and wait for it to process, which could take a while depending on the size of your passwd file. Alternatively, you could run the following awk command:

awk -F: '{print $5}' /etc/passwd

This command uses the '-F' flag to specify the colon as the delimiter. The 'print $5' section tells the program to print the fifth field of each line. Finally, /etc/passwd specifies the file to extract the field from.
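
To see the field splitting in isolation, here is a small sketch that runs the same kind of command on a single passwd-style line. The sample line and the variable names are made up for illustration; on a real system the fifth field may also hold extra comma-separated details such as a room or phone number.

```shell
# A made-up passwd-style line; fields are separated by colons.
line='jdoe:x:1000:1000:John Doe:/home/jdoe:/bin/bash'

# -F: sets the field separator to a colon.
# $1 is the login name and $5 is the full-name field.
result=$(printf '%s\n' "$line" | awk -F: '{print $1 " -> " $5}')
printf '%s\n' "$result"
# prints: jdoe -> John Doe
```

Changing the number after the dollar sign picks a different field, so '{print $1}' alone would list just the login names.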

Now I bet you are saying, so what? How did this help me at all, and why didn't I just use the spreadsheet? Well, not only did you avoid importing the file into a spreadsheet program, but you can also pipe the output of this command to other useful programs to accomplish what you need. A quick example is piping the output through the 'sort' command, which lists the users' full names in alphabetical order.
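
Here is a rough sketch of that sort pipeline. It runs against a small temporary file instead of the real /etc/passwd so the result is predictable; the three entries and the variable names are invented for the example.

```shell
# Build a small passwd-style sample file (made-up entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
zyoung:x:1001:1001:Zed Young:/home/zyoung:/bin/bash
aalba:x:1002:1002:Ann Alba:/home/aalba:/bin/bash
mreyes:x:1003:1003:Maria Reyes:/home/mreyes:/bin/sh
EOF

# Extract the fifth field of each line, then sort the names alphabetically.
sorted_names=$(awk -F: '{print $5}' "$sample" | sort)
printf '%s\n' "$sorted_names"

# Clean up the temporary file.
rm -f "$sample"
```

On a real system you would point awk at /etc/passwd instead of the sample file and pipe it to sort in the same way.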

Still not convinced that this is easier than opening your precious spreadsheet application? When you are referencing a data source that changes often, such as the passwd file on a heavily used *nix system, awk definitely pulls its weight. You could easily set up an alias or a shell script to execute the command for you. Then you simply type a few characters and hit Enter, as opposed to fiddling with your spreadsheet, waiting for it to import the data, and dealing with the sluggish response as it eats up your system memory.
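
One possible way to wrap the command, assuming a Bourne-style shell such as bash: the name fullnames is hypothetical, and the optional file argument and the check that skips empty names are additions for this sketch, not part of the original command.

```shell
# fullnames: print the full-name field of every entry, sorted.
# Entries with an empty fifth field are skipped.
# An optional first argument selects another passwd-style file;
# with no argument the function reads /etc/passwd.
fullnames() {
    awk -F: '$5 != "" {print $5}' "${1:-/etc/passwd}" | sort
}

# Roughly equivalent alias for the plain command (the \$ keeps the shell
# from expanding $5 when the alias is defined):
# alias fullnames="awk -F: '{print \$5}' /etc/passwd | sort"
```

Placed in a startup file such as ~/.bashrc, the function would be available in every new shell, so typing fullnames and hitting Enter always reflects the current contents of the file.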

Give it a try.

1 comment:

Anonymous said...

Good timely reminder of just how good the command line apps can be! Awk is one of those utilities I know little about, but it's always in my "top ten of things I need to learn about". Never no. 1 unfortunately though!!