I still remember with vivid clarity the day many years ago that a senior graduate student asked me the question "Do you know how to use awk?".  My embarrassed response, identical to my response to what felt like a thousand previous questions concerning my knowledge of the scientific world during that first semester of graduate school, was a heavy-hearted "No". Luckily for me, I've managed to put several thousand miles between myself and that naive, unknowing version of myself and can now happily shout from the rooftops - "Yes! And I LOVE it!".  

My reaction to learning about the power of awk! Ok, just kidding, here I was pretty excited about geology while on a trip to Urey, CO.  But hey, my reaction to discovering awk was pretty much the same just without the beautiful backdrop.


Awk is a powerhouse of a command in the Unix environment, primarily used for processing and manipulating text files.  Wait! Don't fall asleep yet! Yes, I know text processing sounds boring, but the number of hours that can be lost manually moving columns, lines, and individual entries in large text files is infinite, and that's where awk steps in.  Short disclaimer here: awk can be, and commonly is, used as a standalone script, but I'm only going to be discussing one-liner implementations as I use them in the terminal.  

By far my most common usage of awk is the simple printing of columns from a text file.  Say I have a text file with 7 columns and I just want to output the 2nd and 5th - don't break out your cut and paste shortcuts! The awk command would be:

awk '{print $2 "\t" $5}' example.txt

The print statement here tells awk to print out the 2nd and 5th columns with a tab in between.  This implementation assumes that our original file (example.txt) is already a tab-delimited or space-delimited file, but what if it's not? Easy peasy, lemon squeezy:

awk -F, '{print $2 "\t" $5}' example.txt

The character that follows the -F tells awk where you want to separate the columns.  In this example I've used a comma, as you would if you had a comma-delimited file.  You can get quite fancy with the delimiter if you'd like, using strings and combinations of strings and punctuation (a delimiter longer than one character is treated as a regular expression).  The world's your oyster there!
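If you'd like to try both commands on something concrete, here is a small sketch. The sample file and its contents are made up purely for illustration (they stand in for example.txt), and the variable names are my own. It prints columns 2 and 5 from a comma-delimited file, shows one way to get fancy by splitting on either a comma or a semicolon, and uses awk's built-in OFS (output field separator) variable as an alternative to typing "\t" between every column.

```shell
# Build a hypothetical 7-column, comma-delimited sample file.
tmp=$(mktemp)
printf 'a1,b1,c1,d1,e1,f1,g1\na2,b2,c2,d2,e2,f2,g2\n' > "$tmp"

# Print columns 2 and 5 with a tab in between.
cols=$(awk -F, '{print $2 "\t" $5}' "$tmp")
echo "$cols"

# -F also accepts a regular expression: split on a comma OR a semicolon.
mixed=$(printf 'a;b,c;d,e\n' | awk -F'[,;]' '{print $2 "\t" $5}')
echo "$mixed"

# Same output as the first command, but letting OFS supply the tab.
# A comma between print arguments inserts OFS between them.
with_ofs=$(awk -F, -v OFS='\t' '{print $2, $5}' "$tmp")
echo "$with_ofs"

rm -f "$tmp"
```

The OFS version is handy once you're printing more than two or three columns, since the separator only has to be written once.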

Another helpful awk recipe I use is to output data from a file when an entry matches a condition I've specified. For example, I have a file with the longitude, latitude, and elevation of seismic stations but I want a file with only the stations that are west of 34E.  No need to lose eyesight manually reading through a gigantic file or spend time writing a script to do this - with awk it's a one-liner!

awk '($1 < 34){print $1 " " $2 " " $3}'  station_locations.txt > Wof34.txt

The conditional statement is placed within the first set of parentheses.  This statement translates to "If the entry in column 1 is less than 34 then execute the following print command".   You'll see that here I've gotten lazy and instead of delimiting the output columns with a tab I've simply placed a space between them.  Also in this example I've sent the output of the command to a new file (done by the > symbol) rather than having it written out to the screen like in the first example.  I find that when dealing with large text files that perhaps took a lot of time and effort to generate, I like to awk specific bits and pieces out of them into separate files rather than trying to modify the original file itself.  It just feels cleaner and simpler that way.  
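Here's a quick sketch of that filter on a tiny, invented station file, so you can see exactly which rows survive. I'm assuming the value being compared (the longitude) sits in column 1, as the command implies, and the coordinates and elevations are made up. The second command is my own extension showing that conditions can be chained with &&.

```shell
# Hypothetical station file: longitude, latitude, elevation in meters.
stations=$(mktemp)
printf '31.5 -2.1 1200\n36.8 -1.3 1650\n29.9 0.4 900\n34.0 1.0 1100\n' > "$stations"

# Keep only the rows whose first column is numerically less than 34.
west=$(awk '($1 < 34){print $1 " " $2 " " $3}' "$stations")
echo "$west"

# Conditions can be combined: column 1 below 34 AND column 3 above 1000.
west_high=$(awk '($1 < 34 && $3 > 1000){print $1 " " $2 " " $3}' "$stations")
echo "$west_high"

rm -f "$stations"
```

Note that the station at exactly 34.0 is dropped because < is a strict comparison; use <= if you want to keep it.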

One last implementation of awk that I'll show is how to output values from specific lines in a file.  For example, I sometimes have files that are separated into sections with different numbers of columns (If you're curious and seismically minded look at the output files from mineos, sigh). I know that the information I care about occurs between lines 52 and 79 every time and, more importantly, the varying structure of the file will cause awk to fail if it tries to process the surrounding sections.  Thus, here comes some capital letters:

awk '(NR>52 && NR<79){print $2 " " $3}' example.txt

Here we again have our conditional statement within the first set of parentheses, but this time we've used one of awk's powerful built-in variables, NR (note that there are quite a few of these and you should definitely check the rest out in the awk manual).  NR stands for number of records and acts as a line number variable. As you might be able to guess, our example asks awk to only execute the print command when the line number is within the range we've given. Because > and < are strict comparisons, this prints lines 53 through 78; use >= and <= if you want lines 52 and 79 included.
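If you want to convince yourself which lines the NR condition actually selects, here is a small sketch. It uses seq as a stand-in for a real file (my assumption, just for the demo), since seq prints 1 through 100 with one number per line, so the content of each line is its own line number.

```shell
# Each line produced by seq holds its own line number.
picked=$(seq 1 100 | awk '(NR>52 && NR<79){print $1}')

first=$(echo "$picked" | head -n 1)
last=$(echo "$picked" | tail -n 1)
count=$(echo "$picked" | wc -l | tr -d ' ')
echo "strict: first=$first last=$last count=$count"

# Inclusive variant: >= and <= keep the endpoints as well.
inclusive=$(seq 1 100 | awk '(NR>=52 && NR<=79){print $1}' | wc -l | tr -d ' ')
echo "inclusive count=$inclusive"
```

The strict version selects lines 53 through 78 (26 lines), while the inclusive version selects lines 52 through 79 (28 lines).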

I think that about does it for now! It's raining here in NYC, which offers the perfect excuse to stay inside with some crisp iced coffee and a few awk tutorials, and to try to place a few more miles between myself and that tech-novice from all those years ago.  I hope that maybe you'll be tempted to do the same!