Wednesday 26 March 2014

Data formatting with vim regular expressions

Using the right tool for a job can save massive amounts of time. This is a quick post to demonstrate a practical application of the power of regular expressions in vim. The problem is transforming the output from a SQL query into a wiki ready table format.

Starting with the output from the query (test data for demonstration purposes):

 category_name                  total  grated  mashed  boiled
 -----------------------------------------------------------
  Artichoke            107   67     9    31
  Pepper                65   38     2    25
  Carrot                 46   32  NULL    14
  Lettuce                24   24  NULL  NULL
  Spinach               16    8     1     7
  Zuchini                 4    4  NULL  NULL

Paste this data excluding the heading line into vim (or pipe it to an output file and then edit the file). Firstly lets get rid of those ugly null values:

:%s/NULL/0/g

This searches for the character sequence "NULL" and replaces it with 0 globally. The data should now look like this

  Artichoke            107   67     9    31
  Pepper                65   38     2    25
  Carrot                 46   32  0    14
  Lettuce                24   24  0  0
  Spinach               16    8     1     7
  Zuchini                 4    4  0  0

Next lets format the lines and add separators:

:%s/^ /|-^M| /g

This one is a bit trickier, the ^M is actually a control character indicating a new line and is created by pressing Ctrl-V and the Enter. It searches for lines starting with white space (^ ) and inserts |- and a new line before prefixing the line with a |:

|-
| Artichoke            107   67     9    31
|-
| Pepper                65   38     2    25
|-
| Carrot                 46   32  0    14
|-
| Lettuce                24   24  0  0
|-
| Spinach               16    8     1     7
|-
| Zuchini                 4    4  0  0


Finally we want to split each number onto a separate line. This is where the power of regular expressions really shines:

:%s/ \+\(\d\+\)/^M| \1/g

The regex is actually quite simple, it says find all instances with one or more spaces ( \+) followed by one or more digits (\d\+) and save the digits for substitution (by putting them in brackets \( and \)). Then replace the pattern with a newline, a pipe and the saved digits (\1). And viola, columns to rows:

|-
| Artichoke
| 107
| 67
| 9
| 31
|-
| Pepper
| 65
| 38
| 2
| 25
|-
| Carrot
| 46
| 32
| 0
| 14
|-
| Lettuce
| 24
| 24
| 0
| 0
|-
| Spinach
| 16
| 8
| 1
| 7
|-
| Zuchini
| 4
| 4
| 0
| 0

Pretty cool.