TCOLS v2.00 - a table column filter
Revised 2-Jan-99. Copyright (c) 1996-99 by Rune Berg. TextTools Freeware.

Introduction
Usage
Options
Input Data: Fields And Separators
Expressions
Errors During Processing
More Examples
Expression Syntax
Function Library
Limitations
Return Codes
Version History


Introduction

tcols is a filter for projecting and transforming data columns in text files.

tcols runs from the command line or from batch files.

Input and output data are plain ASCII text lines. By default, tcols treats each line as a sequence of whitespace-separated fields (but see the -i option). Input and output data typically reside in files.

For example, consider a text file "data", containing the following table (3 columns, 4 lines):

The command:

writes the third and second columns (separated by a tab) to the screen:

What made tcols write just that was the expressions, $3 and $2, specified on the command line.

Here's another example, using the same file "data". The command:

writes the following to the file "results":

The last example shows a function applied to an expression. tcols has functions for string manipulation, formatting, decimal/hex/octal conversions, maths, and a few other things.

The above examples show only a few of tcols's capabilities, so read the next sections for a full description.

Note: All usage examples in this document are for tcols running on MS-DOS. Running tcols on a Unix shell requires quoting appropriate for the particular shell.


Usage

tcols [log logfile] [options] [from infile] [to outfile] expr [...]

Where:

[] denotes an optional item.

Upper/lower case for the 'log', 'from', and 'to' keywords is not significant. Also, these keywords should not be used as file names.


Options

tcols recognizes the following command line options:

Here are some example -o uses:


Input Data: Fields and Separators

Whitespace-separated | Character-separated | Fixed-position/length

The input data to tcols is ordinary ASCII text lines.
tcols sees each line as consisting of zero or more fields, denoted $1, $2, ...

Whitespace-separated fields (default)

tcols sees each field as separated by at least one tab or blank, e.g.:

     john 37  butcher  (end-of-line)
     <--> <>  <----->
     $1   $2  $3

If an input line has no fields (i.e., consists of whitespace only), then tcols will write an empty line to the output, without evaluating the expression(s).

If you want a field to contain whitespace, then the field must be surrounded by single quotes, e.g. 'hey you', or by double quotes, e.g. "hey you".

If you want a single quote inside a singly quoted field, precede it by a backslash, e.g. 'It\'s allright'.

If you want a double quote inside a doubly quoted field, precede it by a backslash, e.g. "She said \"yes\"".

If you want a backslash inside a singly/doubly quoted field, precede it by another backslash, e.g. "a backslash: \\".

If you want a single quote inside a doubly quoted field, no special care is needed, e.g. "It's allright".

If you want a double quote inside a singly quoted field, no special care is needed, e.g. 'She said "yes"'

'' and "" are valid fields.

If tcols finds an unmatched quote on an input line, then tcols reads that quote and the rest of the line as one field. For example:

     12 5654 'I feel good     8899     (newline)
     <> <--> <------------------------>
     $1 $2   $3

When tcols reads a quoted field from the input, tcols considers the surrounding quotes part of the field.
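
The splitting rules above can be sketched in Python. This is only an illustration of the rules as described, not tcols's actual code. The function name split_fields is made up for this sketch, and the handling of text that directly follows a closing quote is an assumption, since this section does not cover that case.

```python
def split_fields(line):
    """Split one input line (without its newline) into whitespace-separated fields."""
    fields = []
    i, n = 0, len(line)
    while i < n:
        # Skip the blanks and tabs that separate fields.
        if line[i] in " \t":
            i += 1
            continue
        start = i
        if line[i] in "'\"":
            quote = line[i]
            i += 1
            while i < n and line[i] != quote:
                # A backslash protects the next character inside a quoted field.
                i += 2 if line[i] == "\\" else 1
            if i >= n:
                # Unmatched quote: the quote and the rest of the line form one field.
                fields.append(line[start:])
                break
            i += 1  # the closing quote is part of the field
        else:
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields


print(split_fields("john 37  butcher"))
print(split_fields("12 5654 'I feel good     8899     "))
```

Note that the quotes stay in the returned fields, as described above. A line with no fields gives an empty list.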

Character-separated fields

If you use the -iC option, tcols uses the character C to separate the fields on an input line. In this discussion, we'll consider comma-separated input data.

As an example, this is how tcols -i, would see the following input line:

     Al,   42, shoe salesman,married,2, Dodge  (newline)
     <> <---> <------------> <-----> - <------>
     $1 $2    $3             $4      $5 $6

Any text between the start of the line and the first comma, between two commas, or between the last comma and the end of the line, constitutes a field.

Two commas right next to each other constitute an empty field; this is perfectly legal.

If an input line consists of whitespace only, then tcols will write an empty line to the output, without evaluating the expression(s). Otherwise, whitespace has no special significance when you're using the -i option.

Quotes have no special significance when you're using the -i option.

If you want a comma inside a field, precede it by a backslash: \,
If you want a backslash inside a field, precede it by a backslash: \\

Make sure your input data does not have unwanted spaces at the end of lines.
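
As a rough illustration of these rules, here is a Python sketch of comma splitting with backslash escapes. The name split_on is hypothetical, and the sketch assumes that the escaping backslash itself is dropped from the field value, which this section does not state. The whitespace-only line case is not handled here.

```python
def split_on(line, sep=","):
    """Split one input line (without its newline) on sep, honouring \\sep and \\\\."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in (sep, "\\"):
            # Escaped separator or backslash: take the next character literally.
            current.append(line[i + 1])
            i += 2
        elif ch == sep:
            # An unescaped separator ends the current field (which may be empty).
            fields.append("".join(current))
            current = []
            i += 1
        else:
            # Whitespace and quotes are ordinary characters here.
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(split_on("Al,   42, shoe salesman,married,2, Dodge  "))
```

The trailing spaces after "Dodge" end up in the last field, which is why unwanted spaces at the end of lines matter.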

Fixed-position/length fields

This section describes how to make tcols deal with fields starting at specific character positions.

The trick is to use the $r (raw line) expression and the subs function.

For example, the command:

where the file "alpha.txt" contains the line:

yields the following output:


Expressions

Simple | Arithmetic | Function-calls

Expressions specify how tcols should map input data to output data. tcols applies the expressions to each input line in turn, producing a corresponding output line. The only exception is empty input lines (lines that contain only whitespace); they result in an empty line on the output, without being evaluated.

Syntax errors in expressions will cause tcols to exit with an appropriate error message, before any processing.

An expression should not contain spaces, except in string literals (in which case the whole expression must be surrounded by double quotes, e.g.: "/hi there/".)

The Expression Syntax section describes the exact grammar.

Simple Expressions

This section describes tcols's basic expressions. They are best illustrated by examples:

Some notes on basic expressions:

Example: printing 3rd and 5th fields separated by just a colon:

Example: swapping the 3rd and 8th columns in a 12-column table:

Expressions with Arithmetic

This section describes how to use tcols's arithmetic operators: + - * / %.

All these operators work on integers. All except % work on floating point numbers. An operation involving only integers gives an integer result. An operation involving a floating point number gives a floating point result.

The use of arithmetic operators is best shown by examples:

Note 1: When invoking tcols from a batch file, you need an extra % to prevent the MS-DOS shell from treating %10 as the 10th command line argument: $1%%10

Note 2: If you invoke tcols to use standard input/output, and the first expression starts with a '-', then put that first expression in parentheses, e.g. (-$2), so tcols won't think it's a command line option.

Shortcuts are possible. For example, the expression:

applied to the input line:

yields the following output:

(Note default floating point precision of 6 digits.)

Note that the right hand side of + - * / and % must evaluate to exactly one number.

Unary - (minus) has the highest precedence, so the following are equivalent:

* / and % have equal, and next highest precedence. They're evaluated left to right, so the following are equivalent:

+ and binary - have the lowest precedence, and are evaluated left to right, so the following are equivalent:

Parentheses ( ) can be used to override precedence:

Expressions with Function-calls

This section describes how to form expressions with function calls.

A function call has one of the forms:

     expression.functionname
     expression.functionname(arguments)

Here are some example function calls:

As a shortcut, expressions can be grouped with ( ) and then fed to a function:

This saves you from writing:

Some functions are only meaningful when applied to several expressions:

Function calls can be chained:

Any expression can be used as a function argument:

If a function is given the wrong number of arguments, or the wrong type of arguments, tcols will print an error message to standard error (or logfile, if used) and exit. However, if you use the -w command line option, tcols will skip the offending input line, print a warning to standard error (or logfile, if used), and continue processing the next input line; see the Errors During Processing section.

Note: Due to the introduction of floating point support in version 2.00, you can no longer apply a function directly to a literal integer, as in 33.sqt, because tcols will consider the '33.' part a floating point number. Instead, write e.g. (33).sqt or /33/.sqt.

The Function Library section describes all functions and their required arguments.


Errors During Processing

A processing error occurs if the contents of an input line prevent tcols from evaluating your expressions.

tcols's default error action is to print a relevant error message and exit.

However, if you set the -w command line option, tcols will skip the bad input line and continue processing the next input line. tcols still prints a warning.

tcols prints error messages and warnings to standard error (or the logfile, if used).

Here are some typical processing errors:

tcols is rather strict about input data. For example, the sum function will only work on integer and floating point arguments, even though I could have made it ignore non-numeric arguments. My reasoning is: tcols will often be used for processing hand-typed data. Typists sometimes hit the wrong keys. If tcols were lax about bad input data, it might quietly produce bad output data.


More Examples

This section gives more examples of complete tcols commands.

These examples start with the file "books" which contains:

Now, this file looks a bit messy. You want to reformat it to look cleaner, with first names and surnames together, no single quotes around the names, and no year of publication. The command:

prints the following to "books2":

All right. To ease future processing, you want your book list in a field-oriented format. The command:

prints the following to "books3":

Now, you can use another TextTools program, trows, to print all your crime books. The command:

prints to the screen:

Or, you can sort your books on author name, using yet another TextTools program: tsort. The command:

prints to the screen:


Expression Syntax

This section defines the exact tcols expression syntax rules.

Note: The spaces used in these rules are for clarity; spaces are not allowed in actual expressions (except to denote a space in a literal string).

     expr      ::=  list

     list      ::=  arit , list
                |   arit

     arit      ::=  arit + term
                |   arit - term
                |   term

     term      ::=  term * neg
                |   term / neg
                |   term % neg
                |   neg

     neg       ::=  - neg
                |   call

     call      ::=  call . funcname ( list )
                |   call . funcname
                |   simple

     simple    ::=  $ integer               ; integer must be >= 1
                |   $ integer .. integer    ; integers must be >= 1, first <= second
                |   $ integer .. l          ; integer must be >= 1
                |   $ l
                |   $ c
                |   $ r
                |   $ n
                |   $ e ( list )   ; list should eval. to one string
                |   integer
                |   floating-point
                |   / string /
                |   ( list )

     integer   ::=  [+|-][0-9]+

     floating- ::=  [+|-][0-9]*.[0-9]* 
     point      |   [+|-][0-9]+.[0-9]*e[+|-][0-9]*  ; scientific notation
                |   [+|-][0-9]*.[0-9]+e[+|-][0-9]*  ; scientific notation

     string    ::=  one or more printable characters, but use
                    \/ for forward-slash, \\ for backslash


Function Library

Character
.... Change
.... Deletion
String
.... Substring
.... Case
.... Replacement
.... Formatting
.... Misc.
Conversion
.... Number Base
Maths
.... Basic
Miscellaneous
.... Misc.

This section describes all tcols's functions.

E, E1, etc., in this discussion denote expressions as far as syntax is concerned, and the results of evaluating those expressions as far as evaluation is concerned.

Character Change Functions

padl | padt | resc | desc | cc | cco | ccl | cct | ccto | ccp


padl - pad leading blanks

E.padl(s) yields E with leading blanks replaced by the first character of s.

s must be exactly one character long.

For example:

     /  55/.padl(/0/)   yields: 0055


padt - pad trailing blanks

E.padt(s) yields E with trailing blanks replaced by the first character of s.

s must be exactly one character long.

For example:

     /ok   /.padt(/./)   yields: ok...
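
A minimal Python sketch of padl and padt follows. It is only meant to show the behaviour described above and assumes that "blanks" means space characters only.

```python
def padl(e, s):
    """Replace the leading blanks of e by the single character s."""
    if len(s) != 1:
        raise ValueError("s must be exactly one character long")
    rest = e.lstrip(" ")
    # One pad character for every leading blank that was stripped.
    return s * (len(e) - len(rest)) + rest


def padt(e, s):
    """Replace the trailing blanks of e by the single character s."""
    if len(s) != 1:
        raise ValueError("s must be exactly one character long")
    rest = e.rstrip(" ")
    return rest + s * (len(e) - len(rest))


print(padl("  55", "0"))
print(padt("ok   ", "."))
```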


resc - re-escape

E.resc yields E with every:

     '        changed to  \'
     "        changed to  \"
     \        changed to  \\
     tab      changed to  \t
     newline  changed to  \n

For example, resc applied to:

     'ok'  yields:  \'ok\'
     a"b'  yields:  a\"b\'
     kh\k  yields:  kh\\k
     \'\"  yields:  \\\'\\\"

(Newlines can only occur as the result of desc applied to a string that contains \n)


desc - de-escape

E.desc yields E with every:

     \'  changed to  '
     \"  changed to  "                
     \\  changed to  \                
     \t  changed to  tab                
     \n  changed to  newline

desc changes every \xHH (where HH is exactly two hexadecimal digits) to the corresponding ASCII character.

desc changes every \O (where O is one, two, or three octal digits) to the corresponding ASCII character.

desc makes no other changes. For example, \z is not changed to z.
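
The resc and desc rules can be sketched in Python as shown below. This is an illustration under stated assumptions, not the tcols implementation: it assumes that an octal escape takes as many digits as are available (up to three), and it does not restrict the resulting character codes to the ASCII range.

```python
import re

# resc: each special character becomes a backslash sequence.
_RESC = {"'": "\\'", '"': '\\"', "\\": "\\\\", "\t": "\\t", "\n": "\\n"}

# desc: backslash sequences that stand for a single fixed character.
_DESC_SIMPLE = {"'": "'", '"': '"', "\\": "\\", "t": "\t", "n": "\n"}

# A backslash followed by xHH, by one to three octal digits, or by ' " \ t n.
_DESC_RE = re.compile(r"\\(x[0-9A-Fa-f]{2}|[0-7]{1,3}|['\"\\tn])")


def resc(e):
    """Re-escape: protect quotes, backslashes, tabs and newlines."""
    return "".join(_RESC.get(ch, ch) for ch in e)


def desc(e):
    """De-escape: expand the sequences listed above and leave everything else alone."""
    def expand(match):
        body = match.group(1)
        if body[0] == "x":
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        return _DESC_SIMPLE[body]
    return _DESC_RE.sub(expand, e)


print(resc("a\"b'"))
print(desc("\\x41\\101\\z"))
```

Because unknown sequences such as \z do not match the pattern, they pass through unchanged, as described above.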


cc - change characters

E.cc(s,t) yields E with any characters in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.
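
To make the two forms of t concrete, here is a small Python sketch of cc. The sample inputs are made up for this illustration.

```python
def cc(e, s, t):
    """Change every character of e that occurs in s, as E.cc(s,t) is described."""
    if len(s) < 1:
        raise ValueError("s must be at least one character long")
    if len(t) == 1:
        # Single replacement character: every character in s maps to t.
        table = {ord(ch): t for ch in s}
    elif len(t) == len(s):
        # Same length: the character at position k in s maps to position k in t.
        table = str.maketrans(s, t)
    else:
        raise ValueError("t must be one character long or as long as s")
    return e.translate(table)


print(cc("a-b_c", "-_", " "))   # both '-' and '_' become a blank
print(cc("hello", "el", "ip"))  # 'e' becomes 'i', 'l' becomes 'p'
```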

For example:

applied to the input lines:

yields the following output lines:


cco - change given occurrences of characters

E.cco(s,t,m,n) yields E with the mth..nth occurrences of the character(s) in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.
m and n must be integers >= 1, with m <= n.

For example:

applied to the input lines:

yields the following output lines:


ccl - change leading characters

E.ccl(s,t) yields E with leading characters in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.

For example:

applied to the input line:

yields the following output line:


cct - change trailing characters

E.cct(s,t) yields E with trailing characters in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.

For example:

applied to the input line:

yields the following output line:


ccto - change given number of trailing characters

E.ccto(s,t,n) yields E with (up to) n trailing characters in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.
n must be >= 0.

For example:

applied to the input lines:

yields the following output lines:


ccp - change characters within given position range

E.ccp(s,t,m,n) yields E with characters at positions m..n equal to the character(s) in s changed into the single or (by relative position) corresponding character in t.

s must be at least one character long.
t must be exactly one character long or the same length as s.
m and n must be integers >= 1, with m <= n.

For example:

applied to the input line:

yields the following output line:

Character Deletion Functions

trl | trt | tr | dc | dco | dcl | dct | dcto | dcp


trl - trim leading whitespace

E.trl yields E without leading whitespace.

For example:

     /  aaa/.trl.sqt   yields: 'aaa'


trt - trim trailing whitespace

E.trt yields E without trailing whitespace.

For example:

     /aaa  /.trt.sqt  yields: 'aaa'


tr - trim leading and trailing whitespace

E.tr yields E without leading or trailing whitespace.

For example:

     / aa a  /.tr.sqt  yields: 'aa a'


dc - delete certain characters

E.dc(s) yields E with any characters in s deleted.

s must be at least one character long.

For example:

applied to the input lines:

yields the following output lines:


dco - delete certain occurrences of certain characters

E.dco(s,m,n) yields E with the mth..nth occurrences of any characters in s deleted.

s must be at least one character long.
m and n must be integers >= 1, with m <= n.

For example:

applied to the input lines:

yields the following output lines:


dcl - delete leading occurrences of certain characters

E.dcl(s) yields E with leading occurrences of characters in s deleted.

s must be at least one character long.

For example:

applied to the input lines:

yields the following output lines:


dct - delete trailing occurrences of certain characters

E.dct(s) yields E with trailing occurrences of any characters in s deleted.

s must be at least one character long.

For example:

applied to the input lines:

yields the following output lines:


dcto - delete given number of trailing characters

E.dcto(s,n) yields E with (up to) n trailing occurrences of characters in s deleted.

s must be at least one character long.
n must be >= 0.

For example:

applied to the input lines:

yields the following output lines:


dcp - delete certain characters in a certain range of positions

E.dcp(s,m,n) yields E with characters at positions m..n equal to the character(s) in s deleted.

s must be at least one character long.
m and n must be integers >= 1, with m <= n.

For example:

applied to the input lines:

yields the following output lines:

Substring Functions

subs | rig | clip | app | pre | cat


subs - substring

E.subs(i,j) yields the i'th ... j'th characters of E.

i and j must be integers greater than or equal to 1.
j must be greater than or equal to i.

If i is greater than the length of E, E.subs(i,j) yields the empty string.
If j is greater than the length of E, E.subs(i,j) yields characters i .. length-of-E of E.

For example:

     /abcdefgh/.subs(3,6)   yields: cdef


rig - right substring

E.rig(i) yields the last i characters of E.

i must be an integer greater than or equal to 0.

If E has fewer than i characters, E.rig(i) yields E.


clip - clip off

E.clip(i,j) yields E with the i leftmost and j rightmost characters clipped off.

i and j must be integers greater than or equal to 0.

If the length of E is less than or equal to i + j, then E.clip(i,j) yields the empty string.

For example:

     /abcdefg/.clip(2,3)   yields: cd
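
The boundary rules of subs, rig and clip map closely onto Python slicing. The following sketch shows one way to express them; it illustrates the rules above and is not taken from tcols itself.

```python
def subs(e, i, j):
    """Characters i..j of e, counting from 1."""
    if i < 1 or j < i:
        raise ValueError("need 1 <= i <= j")
    # Slicing already yields '' when i is past the end and clips j to the length.
    return e[i - 1:j]


def rig(e, i):
    """The last i characters of e, or all of e if it is shorter."""
    if i < 0:
        raise ValueError("i must be >= 0")
    return e[max(len(e) - i, 0):]


def clip(e, i, j):
    """e without its i leftmost and j rightmost characters."""
    if i < 0 or j < 0:
        raise ValueError("i and j must be >= 0")
    return e[i:len(e) - j] if len(e) > i + j else ""


print(subs("abcdefgh", 3, 6))
print(rig("abcdefgh", 3))
print(clip("abcdefg", 2, 3))
```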


app - append

(E1,E2,...).app(s) yields s appended to E1, E2, ...

Useful for appending the same string to several expressions.

For example:

     (4,5,6).app(/.00/)   yields: 4.00 5.00 6.00


pre - prepend

(E1,E2,...).pre(s) yields s prepended to E1, E2, ...

Useful for prepending the same string to several expressions.

For example:

     (2,3,4).pre(/#/)   yields: #2 #3 #4


cat - concatenate

(E1,E2,...).cat yields the concatenation of E1, E2,...

For example:

     ($2,$3,$1).cat 

applied to the input line:

     56 john zap

yields:

     johnzap56
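
In Python terms, app and pre work on each value of a list, while cat joins the list into one value. A small sketch, in which a list of strings stands in for the evaluated expressions E1, E2, ...:

```python
def app(values, s):
    """Append s to every value."""
    return [v + s for v in values]


def pre(values, s):
    """Prepend s to every value."""
    return [s + v for v in values]


def cat(values):
    """Concatenate all values into a single string."""
    return "".join(values)


print(app(["4", "5", "6"], ".00"))
print(pre(["2", "3", "4"], "#"))
print(cat(["john", "zap", "56"]))
```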

String Case Functions

upp | low


upp - upper case

E.upp yields E with all letters in upper case.

upp does not touch non-letters.


low - lower case

E.low yields E with all letters in lower case.

low does not touch non-letters.

String Replacement Functions

if | ifel


if - if then

E.if(f,g) yields: g if E is equal to f; E if E is not equal to f.

If E and f are both integers, they are compared numerically; otherwise they are compared ASCII-wise.

For example:

     $1.if(20,/TWENTY/) 

applied to the input lines:

     20
     67      
     4
     0020

yields the following output lines:

     TWENTY
     67
     4
     TWENTY


ifel - if then else

E.ifel(f,g,h) yields: g if E is equal to f; h if E is not equal to f.

If E and f are both integers, they are compared numerically, otherwise they are compared ASCII-wise.

For example:

     $1.ifel(20,/TWENTY/,/other/)

applied to the input lines:

     20
     67
     4
     +0020

yields the following output lines:

     TWENTY
     other
     other
     TWENTY
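
The comparison rule shared by if and ifel can be sketched in Python as follows. The names if_ and ifel and the helper _equal are chosen for this sketch only, and the integer pattern is borrowed from the Expression Syntax section.

```python
import re

# Integer syntax as given in the Expression Syntax section: optional sign, then digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def _equal(e, f):
    """Numeric comparison if both are integers, plain string comparison otherwise."""
    if _INTEGER.fullmatch(e) and _INTEGER.fullmatch(f):
        return int(e) == int(f)
    return e == f


def if_(e, f, g):
    """g if e equals f, otherwise e itself."""
    return g if _equal(e, f) else e


def ifel(e, f, g, h):
    """g if e equals f, otherwise h."""
    return g if _equal(e, f) else h


for field in ["20", "67", "4", "0020"]:
    print(if_(field, "20", "TWENTY"))
```

This is why 0020 and +0020 match 20 in the examples above: both sides are integers, so they are compared as numbers.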

String Formatting Functions

prf | rjf | ljf


prf - print formatted (minimal printf)

(E1,E2,...).prf(s) yields (the format string) s expanded according to format specifications and Es.

A format specification has the general form ([] denotes an optional item):

     #[flags][width][.prec]format

flags are one or more of:

width is an integer in the range 1..[DOS: 255; Win32: 1024]; a leading 0 (as in #04) means left-pad an integer or floating point number with 0s.

prec is an integer in the range 0..15 (default 6). Only relevant for use with f, g and e formats (see below).

format is one of:

Notes:

Also, prf replaces every ## in s by #, every \t by a real tab, and every \n by a real newline.

For example:

     (/abc/,55,-123).prf(/#-5s:###05d:#7dX/)

yields:

     abc  :#00055:   -123X
     .....  ..... .......     ; author's comment
       5      5      7        ; 

There must be enough Es for the format specifiers. Extra Es are ignored.

Here's an example of formatting floating point values. The command:

     tcols "-o " $1.prf(/#08.3f/) $2.prf(/#+6.2g/) $3.prf(/#15.4e/) 

or, simpler:

     tcols "$1..3.prf(/#08.3f #+6.2g #15.4e/)" 

applied to the following input data:

     1.1  2.20 3.0034
     55.6777 -0.0345 0.01

gives the following output:

     0001.100  +2.2       3.0034e+00
     0055.678  -0.03      1.0000e-02
     ........ ...... ...............    ; author's comment
        8       6         15            ;

Notice especially how the second column is aligned along '.', and only has significant digits.

Bug note: Due to an error in my C compiler's I/O library, combining the blank or + flag, a width indicating leading 0s, and one of the f/g/e formats, will sometimes produce an output field that has one leading 0 too many. Currently, I have no solution for this problem. As a workaround, you can reduce the width by one.


rjf - right justified field

E.rjf(w) yields E right justified in a field of at least w spaces.

w must be an integer in the range 1 .. [DOS: 255; Win32: 1024].

For example:

     45.rjf(7).sqt  yields:  '     45'
     45.rjf(2).sqt  yields:  '45'    
     45.rjf(1).sqt  yields:  '45'


ljf - left justified field

E.ljf(w) yields E left justified in a field of at least w spaces.

w must be an integer in the range 1 .. [DOS: 255; Win32: 1024].

For example:

     45.ljf(7).sqt  yields:  '45     '
     45.ljf(2).sqt  yields:  '45'
     45.ljf(1).sqt  yields:  '45'
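
Python's str.rjust and str.ljust behave the same way as described for rjf and ljf: they pad with spaces up to the given width and never truncate. A brief sketch, which leaves out the platform-specific upper limit on w:

```python
def rjf(e, w):
    """e right justified in a field of at least w spaces."""
    if w < 1:
        raise ValueError("w must be >= 1")
    return str(e).rjust(w)


def ljf(e, w):
    """e left justified in a field of at least w spaces."""
    if w < 1:
        raise ValueError("w must be >= 1")
    return str(e).ljust(w)


print(repr(rjf(45, 7)))
print(repr(ljf(45, 7)))
```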

Misc. String Functions

sqt | suqt | dqt | duqt | rev | len | nl


sqt - single quote

E.sqt yields E surrounded by single quotes (').

For example, sqt applied to:

     hey   yields: 'hey'
     'hey  yields: 'hey'
     hey'  yields: 'hey'
     'hey' yields: 'hey'
     '     yields: ''
     ''    yields: ''
     hey\' yields: 'hey\''

sqt applied to the empty string yields: ''


suqt - single unquote

E.suqt yields E without surrounding single quotes (').

For example, suqt applied to:

     'hey'  yields: hey
     'hey   yields: hey
     hey'   yields: hey
     ''     yields: the empty string
     '      yields: the empty string
     hey\'  yields: hey\'


dqt - double quote

dqt works exactly like sqt, but handles double quotes (").


duqt - double unquote

duqt works exactly like suqt, but handles double quotes (").
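
The quoting examples above are consistent with a simple rule: unquote first, then add one quote at each end. The Python sketch below is inferred from those examples only; in particular, treating a trailing \' as an escaped quote (and therefore not as a surrounding quote) is an assumption drawn from the hey\' cases.

```python
def suqt(e, q="'"):
    """Remove a leading and a trailing quote character, if present."""
    if e.startswith(q):
        e = e[1:]
    # A quote preceded by a backslash is escaped and is left in place.
    if e.endswith(q) and not e.endswith("\\" + q):
        e = e[:-1]
    return e


def sqt(e, q="'"):
    """Surround e by quotes without doubling quotes that are already there."""
    return q + suqt(e, q) + q


def duqt(e):
    """Same as suqt, for double quotes."""
    return suqt(e, '"')


def dqt(e):
    """Same as sqt, for double quotes."""
    return sqt(e, '"')


for sample in ["hey", "'hey", "hey'", "'", "hey\\'"]:
    print(sample, "->", sqt(sample))
```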


rev - reverse

E.rev yields E reversed.

For example:

     /istanbul/.rev   yields: lubnatsi

Note that rev changes \' to '\, etc.


len - length

E.len yields the number of characters in E.

For example:

     /mama/.len   yields: 4


nl - append newline char.

E.nl appends a string containing just a newline character to E.

For example, the command:

     tcols -o, from myfile $1 $2.nl $3 $4 

applied to the file "myfile" containing:

     this is line 1
     this is line 2

prints the following to the screen:

     this,is
     line,1
     this,is
     line,2

Number Base Conversion Functions

d2h | h2d | d2o | o2d


d2h - convert decimal to hexadecimal

E.d2h yields E in hexadecimal form.

E must be an integer in decimal form.

For example:

     256.d2h   yields: 100

If E is negative, the number of hexadecimal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)


h2d - convert hexadecimal to decimal

E.h2d yields E in decimal form, possibly preceded by a minus sign.

E must contain only hexadecimal digits (0..9 a..f A..F).


d2o - convert decimal to octal

E.d2o yields E in octal form.

E must be an integer in decimal form.

If E is negative, the number of octal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)


o2d - convert octal to decimal

E.o2d yields E in decimal form, possibly preceded by a minus sign.

E must contain only octal digits (0..7).
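
The remarks about negative numbers and minus signs suggest ordinary two's complement behaviour of a C long integer. The Python sketch below models that under loudly stated assumptions: a 32-bit long (the constant BITS is made up for this sketch; the real width depends on the platform), lower-case hexadecimal output, and inputs that fit in BITS bits. None of these details are confirmed by this document.

```python
import re

BITS = 32  # ASSUMPTION: width of the C 'long integer' on the platform at hand


def _to_base(n, spec):
    # A negative number is shown as its two's complement bit pattern.
    if n < 0:
        n &= (1 << BITS) - 1
    return format(n, spec)


def _from_base(text, base, digits):
    if not re.fullmatch("[" + digits + "]+", text):
        raise ValueError("invalid digits for base %d: %r" % (base, text))
    value = int(text, base)
    # A set top bit means the bit pattern stands for a negative number.
    if value >= 1 << (BITS - 1):
        value -= 1 << BITS
    return value


def d2h(n):
    return _to_base(n, "x")


def d2o(n):
    return _to_base(n, "o")


def h2d(text):
    return _from_base(text, 16, "0-9a-fA-F")


def o2d(text):
    return _from_base(text, 8, "0-7")


print(d2h(256))
print(d2h(-1))
print(h2d("FFFFFFFF"))
```

With a different BITS value, the number of digits printed for negative input changes, which is the CPU dependence mentioned above.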

Basic Maths Functions

abs | sum | nmax | nmin | rnd


abs - absolute value

E.abs yields the absolute value of E.

E must be an integer or floating point number.


sum - add up

(E1,E2,...).sum yields: E1+E2+..

E1, E2, ... must all be integer or floating point numbers.

Note: When used on a mix of integers and floating point numbers, this function will always yield a floating point result.


nmax - greatest number

(E1,E2,...).nmax yields the numerically greatest of E1, E2, ..., which must all be integer or floating point numbers.

Note: When used on a mix of integers and floating point numbers, this function will always yield a floating point result.


nmin - smallest number

(E1,E2,...).nmin yields the numerically smallest of E1, E2, ..., which must all be integer or floating point numbers.

Note: When used on a mix of integers and floating point numbers, this function will always yield a floating point result.
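
The mixed integer/floating point rule shared by sum, nmax and nmin can be sketched in Python like this. The names sum_ and _numbers are chosen for the sketch, and Python's own number parsing is used as a stand-in, so it accepts a few spellings (such as "inf") that tcols may well reject.

```python
def _numbers(values):
    """Parse all values; promote everything to float if any value is a float."""
    parsed = []
    for text in values:
        try:
            parsed.append(int(text))
        except ValueError:
            # float() raises ValueError for non-numeric input, so bad data is not ignored.
            parsed.append(float(text))
    if any(isinstance(n, float) for n in parsed):
        parsed = [float(n) for n in parsed]
    return parsed


def sum_(values):
    return sum(_numbers(values))


def nmax(values):
    return max(_numbers(values))


def nmin(values):
    return min(_numbers(values))


print(sum_(["1", "2", "3"]))   # integers only: integer result
print(nmax(["3", "2.5"]))      # mixed: floating point result
```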


rnd - round floating point number to a specified precision

E.rnd(n) yields E rounded to n decimal places, in a format obeying the -fpfF option.

E must be a floating point number.

n must be an integer in the range 0..15.

Miscellaneous Functions

amax | amin | turn | rng | dup


amax - ASCII-wise greatest string

(E1,E2,...).amax yields the greatest of E1, E2, ... when compared as ASCII strings.

For example:

     ($1,$2,$3).amax

applied to the input line:

     lemonade gin port 

yields:

     port


amin - ASCII-wise smallest string

(E1,E2,...).amin yields the smallest of E1, E2, ... when compared as ASCII strings.


turn - turn

(E1,E2,...).turn yields ... E2 E1

For example:

     $1..l.turn 

applied to the input line:

     56 4 11 899 66

yields:

     66 899 11 4 56


rng - range

(E1,E2,...).rng(i,j) yields: Ei ... Ej

i and j must be integers greater than or equal to 1.
i must be within the count of E1,E2,...
j must be greater than or equal to i.

For example:

     $1..l.rng(2,4)

applied to the input line:

     56 4 11 899 66

yields:

     4 11 899   


dup - duplicate

E.dup(i) yields: E i times

i must be an integer greater than or equal to 1.

For example:

     $1.dup(3)

applied to the input line:

     56

yields:

     56 56 56   

Another example:

     $1..3.dup(2)

applied to the input line:

     a b

yields:

     a b a b
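
The miscellaneous functions all treat E1, E2, ... as a list of values. A Python sketch with a list of strings standing in for the evaluated expressions is shown below. Clipping j to the number of values in rng is an assumption of this sketch; the description above only constrains i.

```python
def amax(values):
    """ASCII-wise greatest string (Python compares strings by character code)."""
    return max(values)


def amin(values):
    """ASCII-wise smallest string."""
    return min(values)


def turn(values):
    """The values in reverse order."""
    return values[::-1]


def rng(values, i, j):
    """Values i..j, counting from 1."""
    if i < 1 or j < i or i > len(values):
        raise ValueError("need 1 <= i <= j, with i within the number of values")
    return values[i - 1:j]


def dup(values, i):
    """The values repeated i times."""
    if i < 1:
        raise ValueError("i must be >= 1")
    return values * i


fields = "56 4 11 899 66".split()
print(" ".join(turn(fields)))
print(" ".join(rng(fields, 2, 4)))
print(" ".join(dup(["56"], 3)))
```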


Limitations

This section describes tcols's limitations. Normally these limitations won't bother you, but anyway, here they are:

tcols will print an error message to standard error (or logfile, if used), if any of the above error situations occurs.


Return Codes

tcols returns with one of the following codes ("error levels"):


Version History

These are the released versions:

End of document