By Federico Giorgi


2009-11-13 15:13:46 8 Comments

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like to transpose it efficiently using only bash commands (I could write a Perl script of ten or so lines to do that, but I expect it would be slower to execute than the native bash functions). So the output should look like

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

I thought of a solution like this

cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do cut -f $i input | tr $'\n' $'\t' | sed -e "s/\t$/\n/g" >> output
done

But it's slow and doesn't seem the most efficient solution. I've seen a solution for vi in this post, but it's still too slow. Any thoughts/suggestions/brilliant ideas? :-)

25 comments

@agc 2016-04-10 10:52:41

Some *nix standard util one-liners, no temp files needed. NB: the OP wanted an efficient fix, (i.e. faster), and the top answers are usually faster than this answer. These one-liners are for those who like *nix software tools, for whatever reasons. In rare cases, (e.g. scarce IO & memory), these snippets can actually be faster than some of the top answers.

Call the input file foo.

  1. If we know foo has four columns:

    for f in 1 2 3 4 ; do cut -d ' ' -f $f foo | xargs echo ; done
    
  2. If we don't know how many columns foo has:

    n=$(head -n 1 foo | wc -w)
    for f in $(seq 1 $n) ; do cut -d ' ' -f $f foo | xargs echo ; done
    

    xargs has a size limit and therefore would produce incomplete output for a long file. The limit is system dependent, e.g.:

    { timeout '.01' xargs --show-limits ; } 2>&1 | grep Max
    

    Maximum length of command we could actually use: 2088944

  3. tr & echo:

    for f in 1 2 3 4; do cut -d ' ' -f $f foo | tr '\n' ' ' ; echo; done
    

    ...or if the # of columns is unknown:

    n=$(head -n 1 foo | wc -w)
    for f in $(seq 1 $n); do 
        cut -d ' ' -f $f foo | tr '\n' ' ' ; echo
    done
    
  4. Using set, which, like xargs, has command line size based limitations:

    for f in 1 2 3 4 ; do set - $(cut -d ' ' -f $f foo) ; echo $@ ; done
    

@Ed Morton 2016-04-10 15:25:14

Those would all be orders of magnitude slower than an awk or perl solution and fragile. Read unix.stackexchange.com/questions/169716/….

@agc 2016-04-10 18:17:35

@EdMorton, thanks, qualified intro of my answer to address your speed concerns. Re "fragile": not 3), and nor the others when the programmer knows the data is safe for a given technique; and isn't POSIX compatible shell code a more stable standard than perl?

@Ed Morton 2016-04-10 19:12:55

sorry, idk much about perl. In this case the tool to use would be awk. cut, head, echo, etc. are no more POSIX compatible shell code than an awk script is - they all are standard on every UNIX installation. There's simply no reason to use a set of tools that in combination require you to be careful about the contents of your input file and the directory you execute the script from when you can just use awk and the end result is faster as well as more robust.

@agc 2016-04-10 20:30:11

Please, I'm not anti-awk, but conditions vary. Reason #1: for f in cut head xargs seq awk ; do wc -c $(which $f) ; done When storage is too slow or IO is too low, bigger interpreters make things worse no matter how good they'd be under more ideal circumstances. Reason #2: awk, (or most any language), also suffers from a steeper learning curve than a small util designed to do one thing well. When run-time is cheaper than coder man hours, easy coding with "software tools" saves money.

@Pal 2017-09-07 15:00:34

GNU datamash is perfectly suited for this problem with only one line of code and potentially arbitrarily large filesize!

datamash -W transpose infile > outfile

@αғsнιη 2018-09-19 16:51:14

Another awk solution; the input size is limited by the memory you have.

awk '{ for (i=1; i<=NF; i++) RtoC[i]= (NR>1? RtoC[i] FS $i: $i); if (NF>n) n=NF }
    END{ for (i=1; i<=n; i++) print RtoC[i] }' infile

This joins the fields at each position into one string and, in END, prints the results, so the first input column becomes the first output row, the second column the second row, and so on. It will output:

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

@nisetama 2015-05-11 17:28:41

Another option is to use rs:

rs -c' ' -C' ' -T

-c changes the input column separator, -C changes the output column separator, and -T transposes rows and columns. Do not use -t instead of -T, because it uses an automatically calculated number of rows and columns that is not usually correct. rs, which is named after the reshape function in APL, comes with BSDs and OS X, but it should be available from package managers on other platforms.

A second option is to use Ruby:

ruby -e'puts readlines.map(&:split).transpose.map{|x|x*" "}'

A third option is to use jq:

jq -R .|jq -sr 'map(./" ")|transpose|map(join(" "))[]'

jq -R . prints each input line as a JSON string literal, -s (--slurp) creates an array for the input lines after parsing each line as JSON, and -r (--raw-output) outputs the contents of strings instead of JSON string literals. The / operator is overloaded to split strings.

@tripleee 2015-11-26 13:00:09

I wasn't familiar with rs -- thanks for the pointer! (The link is to Debian; the upstream appears to be mirbsd.org/MirOS/dist/mir/rs)

@lalebarde 2016-02-27 09:05:56

It looks like it cannot use tabs as separator. I tried '\t' and '^t'. This is not documented.

@nisetama 2016-03-05 12:20:44

@lalebarde At least in the implementation of rs that comes with OS X, -c alone sets the input column separator to a tab.

@glenn jackman 2016-04-10 11:51:12

@lalebarde, try bash's ANSI-C quoting to get a tab character: $'\t'

@Nathan S. Watson-Haigh 2016-11-21 22:13:34

This worked for me when transposing a tab-delimited file: rs -c$'\t' -C$'\t' -T

@jrm 2017-07-06 09:35:00

This is an extreme case, but for a very large file with many rows like TTC TTA TTC TTC TTT, running rs -c' ' -C' ' -T < rows.seq > cols.seq gives rs: no memory: Cannot allocate memory. This is a system running FreeBSD 11.0-RELEASE with 32 GB of ram. So, my guess is that rs puts everything in RAM, which is good for speed, but not for large data.

@Glubbdrubb 2018-03-20 09:50:17

jq used 21Gb of ram on a 766MB file. I killed it after 40 minutes without any output.

@simlev 2018-08-29 08:28:06

@jrm Your guess is correct, it says so in the manual: BUGS The algorithm currently reads the whole file into memory, so files that do not fit in memory will not be reshaped.

@kirill_igum 2014-12-07 03:08:45

Here is a Bash one-liner that is based on simply converting each line to a column and paste-ing them together:

echo '' > tmp1;  \
cat m.txt | while read l ; \
            do    paste tmp1 <(echo $l | tr -s ' ' \\n) > tmp2; \
                  cp tmp2 tmp1; \
            done; \
cat tmp1

m.txt:

0 1 2
4 5 6
7 8 9
10 11 12

  1. creates tmp1 file so it's not empty.

  2. reads each line and transforms it into a column using tr

  3. pastes the new column to the tmp1 file

  4. copies result back into tmp1.

PS: I really wanted to use io-descriptors but couldn't get them to work.

@Ed Morton 2016-04-10 15:46:03

Make sure to set an alarm clock if you're going to execute that on a large file. Read unix.stackexchange.com/questions/169716/… to understand some, but not all, of the problems with that approach.

@nelaaro 2017-04-07 09:00:40

There is a purpose built utility for this,

GNU datamash utility

apt install datamash  

datamash transpose < yourfile

Taken from this site, https://www.gnu.org/software/datamash/ and http://www.thelinuxrain.com/articles/transposing-rows-and-columns-3-methods

@Felipe 2014-05-06 21:41:21

Not very elegant, but this "single-line" command solves the problem quickly:

cols=4; for((i=1;i<=$cols;i++)); do \
            awk '{print $'$i'}' input | tr '\n' ' '; echo; \
        done

Here cols is the number of columns, where you can replace 4 by head -n 1 input | wc -w.

@ghostdog74 2009-11-13 15:34:46

awk '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' file

output

$ more file
0 1 2
3 4 5
6 7 8
9 10 11

$ ./shell.sh
0 3 6 9
1 4 7 10
2 5 8 11

Performance against Perl solution by Jonathan on a 10000 lines file

$ head -5 file
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
1 0 1 2

$  wc -l < file
10000

$ time perl test.pl file >/dev/null

real    0m0.480s
user    0m0.442s
sys     0m0.026s

$ time awk -f test.awk file >/dev/null

real    0m0.382s
user    0m0.367s
sys     0m0.011s

$ time perl test.pl file >/dev/null

real    0m0.481s
user    0m0.431s
sys     0m0.022s

$ time awk -f test.awk file >/dev/null

real    0m0.390s
user    0m0.370s
sys     0m0.010s

EDIT by Ed Morton (@ghostdog74 feel free to delete if you disapprove).

Maybe this version with some more explicit variable names will help answer some of the questions below and generally clarify what the script is doing. It also uses tabs as the separator which the OP had originally asked for so it'd handle empty fields and it coincidentally pretties-up the output a bit for this particular case.

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    for (rowNr=1;rowNr<=NF;rowNr++) {
        cell[rowNr,NR] = $rowNr
    }
    maxRows = (NF > maxRows ? NF : maxRows)
    maxCols = NR
}
END {
    for (rowNr=1;rowNr<=maxRows;rowNr++) {
        for (colNr=1;colNr<=maxCols;colNr++) {
            printf "%s%s", cell[rowNr,colNr], (colNr < maxCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

The above solutions will work in any awk (except old, broken awk of course - there YMMV).

The above solutions do read the whole file into memory though - if the input files are too large for that then you can do this:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", (FNR>1 ? OFS : ""), $ARGIND }
ENDFILE {
    print ""
    if (ARGIND < NF) {
        ARGV[ARGC] = FILENAME
        ARGC++
    }
}
$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

which uses almost no memory but reads the input file once per number of fields on a line so it will be much slower than the version that reads the whole file into memory. It also assumes the number of fields is the same on each line and it uses GNU awk for ENDFILE and ARGIND but any awk can do the same with tests on FNR==1 and END.
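
As a rough sketch of that last remark, the same multi-pass idea can be written without ENDFILE and ARGIND. This assumes an awk that honours additions to ARGV while it is running (POSIX describes this behaviour); transpose_lowmem and pass are names made up for the example, and it still assumes every line has the same number of fields.

```shell
# transpose_lowmem FILE: hypothetical helper, one pass over FILE per output row.
transpose_lowmem() {
    awk '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {
        if (pass > 0) print ""        # finish the row built during the previous pass
        pass++                        # counts passes, standing in for ARGIND
        if (pass < NF) {              # more columns left: queue the file again
            ARGV[ARGC] = FILENAME
            ARGC++
        }
    }
    { printf "%s%s", (FNR > 1 ? OFS : ""), $pass }
    END { if (pass > 0) print "" }    # finish the last row
    ' "$1"
}

# Demo on a small tab-separated sample (made-up data).
sample=$(mktemp)
printf 'X\tcolumn1\tcolumn2\nrow1\t0\t1\nrow2\t3\t4\n' > "$sample"
transpose_lowmem "$sample"
rm -f "$sample"
```

Like the gawk version, this trades speed for memory: the file is read once per column, so it only pays off when the whole-file versions do not fit in RAM.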

@Jonathan Leffler 2009-11-13 15:54:49

And now to handle row and column labels too?

@ghostdog74 2009-11-13 17:14:15

no requirement for that.

@Jonathan Leffler 2009-11-13 17:20:26

OK - you're correct; your sample data doesn't match the question's sample data, but your code works fine on the question's sample data and gives the required output (give or take blank vs tab spacing). Mainly my mistake.

@Jonathan Leffler 2009-11-16 09:43:01

Interesting timings - I agree you see a performance benefit in awk. I was using MacOS X 10.5.8, which does not use 'gawk'; and I was using Perl 5.10.1 (32-bit build). I gather that your data was 10000 lines with 4 columns per line? Anyway, it doesn't matter a great deal; both awk and perl are viable solutions (and the awk solution is neater - the 'defined' checks in my Perl are necessary for warning free runs under strict/warnings) and neither is a slouch and both are likely to be way faster than the original shell script solution.

@ghostdog74 2009-11-16 09:48:05

yes, my data is just repetition till 10000 lines.

@Federico Giorgi 2009-11-16 10:18:47

On my original 2.2GB matrix, the perl solution is slightly faster than awk: 350.103s vs. 369.410s. I was using perl 5.8.8 64bit

@ghostdog74 2009-11-16 10:27:55

i am using gawk 3.16a, Perl 5.10.0.

@porges 2009-11-16 20:48:17

mawk should be even faster

@tommy.carstensen 2013-04-07 23:40:49

What are the memory requirements of each of the two methods on your 2.2GB matrix/file?

@tommy.carstensen 2013-04-08 08:59:48

Which solution(s) did you end up using?

@Vytenis Bivainis 2014-05-25 10:38:08

Result can be piped through | column -t to make the result easier to read.

@zx8754 2014-06-05 14:56:52

Note: awk has maximum number of fields size=32767.

@keypoint 2015-10-03 00:38:03

Hi what's the meaning of this line "NF>p { p = NF }"? I didn't get it...thanks a lot

@daknowles 2016-04-03 00:24:48

Beautiful, but you might want to point out to non-awkers like me that what goes in test.awk is the bit between the quotes!

@user5359531 2016-04-05 01:36:38

I really like the awk solution, but is there an easy way to pipe the output of find, or somehow pass many files at once and output the transposed results of them all together?

@Ed Morton 2016-04-10 14:49:35

@tommy.carstensen the memory requirements are the same for both solutions since they both read the whole file into memory before printing in the new order

@Ed Morton 2016-04-10 14:50:36

@zx8754 that max number of fields only applies to an old, non-POSIX awk. Possibly the incredibly unfortunately named "nawk". It does not apply to gawk or other modern awks.

@Ed Morton 2016-04-10 14:51:37

@keypoint NF>p { p = NF } is identifying the max number of fields across all lines in the file just in case not all lines have the same number of fields so the tool can later print the max number of rows.
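
A small sketch of why that maximum matters, using made-up ragged input whose last line is shorter than the first. The function name transpose_ragged is hypothetical, and ':' is used as the output separator only so that the empty cell is visible.

```shell
# transpose_ragged: hypothetical wrapper around the awk program discussed above,
# reading stdin and joining cells with ":".
transpose_ragged() {
    awk '
    { for (i = 1; i <= NF; i++) a[NR, i] = $i }
    NF > p { p = NF }                 # p ends up as the widest line seen
    END {
        for (j = 1; j <= p; j++) {    # loop to p, not to the NF of the last line
            str = a[1, j]
            for (i = 2; i <= NR; i++) str = str ":" a[i, j]
            print str
        }
    }'
}

printf '1 2 3\n4 5\n' | transpose_ragged
# prints:
# 1:4
# 2:5
# 3:
```

Had the END loop stopped at the field count of the last line (2 here), the third output row would have been dropped.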

@Ed Morton 2016-04-10 14:52:27

@user5359531 just list all the files you're interested in on the command line: awk '...' file1 file2 ... fileN.

@Sigur 2017-04-08 02:46:42

I used your first awk code but after transposing, the leading zeros from 1st column data vanished. Any idea?

@Alex Reynolds 2019-03-18 22:46:01

Your awk solution worked much faster than GNU datamash on a very large file that would not fit into memory.

@user2350426 2016-01-28 22:46:04

An awk solution that stores the whole array in memory

    awk '$0!~/^$/{    i++;
                  split($0,arr,FS);
                  for (j in arr) {
                      out[i,j]=arr[j];
                      if (maxr<j+0){ maxr=j+0}     # max number of output rows (j+0 forces a numeric comparison).
                  }
            }
    END {
        maxc=i                 # max number of output columns.
        for     (j=1; j<=maxr; j++) {
            for (i=1; i<=maxc; i++) {
                printf( "%s:", out[i,j])
            }
            printf( "%s\n","" )
        }
    }' infile

But we may "walk" the file as many times as output rows are needed:

#!/bin/bash
maxf="$(awk '{if (mf<NF) mf=NF} END{print mf}' infile)"
rowcount=$maxf
for (( i=1; i<=rowcount; i++ )); do
    awk -v i="$i" -F " " '{printf("%s\t ", $i)}' infile
    echo
done

Which (for a low count of output rows) is faster than the previous code.

@pixelbeat 2016-01-07 09:08:01

Have a look at GNU datamash which can be used like datamash transpose. A future version will also support cross tabulation (pivot tables)

@agc 2016-04-10 09:44:33

This works: datamash -t ' ' transpose < input

@allanbcampbell 2014-11-06 12:06:48

If you only want to grab a single (comma delimited) line $N out of a file and turn it into a column:

head -$N file | tail -1 | tr ',' '\n'

@Dyno Fu 2015-08-19 07:43:32

#!/bin/bash

aline="$(head -n 1 file.txt)"
set -- $aline
colNum=$#

#set -x
while read line; do
  set -- $line
  for i in $(seq $colNum); do
    eval col$i="\"\$col$i \$$i\""
  done
done < file.txt

for i in $(seq $colNum); do
  eval echo \${col$i}
done

another version with set eval

@Ed Morton 2016-04-10 15:43:43

Read unix.stackexchange.com/questions/169716/… to understand some, but not all, of the problems with that solution.

@Guilherme Freitas 2015-06-10 17:57:51

Assuming all your rows have the same number of fields, this awk program solves the problem:

{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for (f=1;f<=NF;f++) print col[f]}

In words, as you loop over the rows, for every field f grow a ':'-separated string col[f] containing the elements of that field. After you are done with all the rows, print each one of those strings in a separate line. You can then substitute ':' for the separator you want (say, a space) by piping the output through tr ':' ' '.

Example:

$ printf "1 2 3\n4 5 6\n"
1 2 3
4 5 6

$ printf "1 2 3\n4 5 6\n" | awk '{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for (f=1;f<=NF;f++) print col[f]}' | tr ':' ' '
 1 4
 2 5
 3 6

@fedorqui 2015-05-12 07:48:30

I normally use this little awk snippet for this requirement:

  awk '{for (i=1; i<=NF; i++) a[i,NR]=$i
        max=(max<NF?NF:max)}
        END {for (i=1; i<=max; i++)
              {for (j=1; j<=NR; j++) 
                  printf "%s%s", a[i,j], (j==NR?RS:FS)
              }
        }' file

This just loads all the data into a two-dimensional array indexed as a[column,line] and then prints it back one column index per output line, so that it transposes the given input.

This needs to keep track of the maximum number of columns the initial file has, so that it is used as the number of rows to print back.

@stelleg 2014-08-26 03:03:40

Here's a Haskell solution. When compiled with -O2, it runs slightly faster than ghostdog's awk and slightly slower than Stephan's thinly wrapped c python on my machine for repeated "Hello world" input lines. Unfortunately GHC's support for passing command line code is non-existent as far as I can tell, so you will have to write it to a file yourself. It will truncate the rows to the length of the shortest row.

transpose :: [[a]] -> [[a]]
transpose = foldr (zipWith (:)) (repeat [])

main :: IO ()
main = interact $ unlines . map unwords . transpose . map words . lines

@Another.Chemist 2014-08-06 01:10:29

I was looking for a solution to transpose any kind of matrix (nxn or mxn) with any kind of data (numbers or data) and got the following solution:

Row2Trans=number1
Col2Trans=number2

for ((i=1; $i <= Row2Trans; i++));do
    for ((j=1; $j <=Col2Trans ; j++));do
        awk -v var1="$i" -v var2="$j" 'BEGIN { FS = "," }  ; NR==var1 {print $((var2)) }' $ARCHIVO >> Column_$i
    done
done

paste -d',' `ls -mv Column_* | sed 's/,//g'` >> $ARCHIVO

@user3251704 2014-01-30 05:27:17

I was just looking for a similar bash transpose but with support for padding. Here is the script I wrote based on fgm's solution, which seems to work. If it can be of help...

#!/bin/bash 
declare -a array=( )                      # we build a 1-D-array
declare -a ncols=( )                      # we build a 1-D-array containing number of elements of each row

SEPARATOR="\t";
PADDING="";
MAXROWS=0;
index=0
indexCol=0
while read -a line; do
    ncols[$indexCol]=${#line[@]};
    ((indexCol++))
    if [ ${#line[@]} -gt ${MAXROWS} ]
    then
        MAXROWS=${#line[@]}
    fi
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < MAXROWS; ROW++ )); do
  COUNTER=$ROW;
  for (( indexCol=0; indexCol < ${#ncols[@]}; indexCol++ )); do
    if [ $ROW -ge ${ncols[indexCol]} ]
    then
      printf "%s" "$PADDING"            # quoted: an empty PADDING must not leave printf without a format
    else
      printf "%s" "${array[$COUNTER]}"  # quoted: keeps cells such as * from being glob-expanded
    fi
    if [ $((indexCol+1)) -lt ${#ncols[@]} ]
    then
      printf "$SEPARATOR"
    fi
    COUNTER=$(( COUNTER + ncols[indexCol] ))
  done
  printf "\n"
done

@flying sheep 2013-02-08 17:36:25

the transpose project on sourceforge is a coreutil-like C program for exactly that.

gcc transpose.c -o transpose
./transpose -t input > output #works with stdin, too.

@tommy.carstensen 2013-04-08 09:41:21

Thanks for the link. However, it requires too much memory, when dealing with large matrices/files.

@flying sheep 2013-04-08 14:54:54

it has arguments for blocksize and fieldsize: try tweaking the -b and -f arguments.

@tommy.carstensen 2013-04-10 16:27:41

Default block size (--block or -b) is 10kb and default field size (--fieldmax or -f) is 64, so that can't be it. I tried. Thanks for the suggestion though.

@discipulus 2016-11-08 03:10:01

Worked well with a csv of size 2 GB.

@ncemami 2016-11-28 06:40:59

For a matrix file with dimensions roughly 11k by 5k, I found transpose.c to be ~7x faster and ~5x more memory-efficient than ghostdog74's first awk solution. Also, I found that the "uses almost no memory" awk code from ghostdog74 didn't work properly. Also, watch out for the --limit flag in the transpose.c program, which by default limits the output to dimension 1k by 1k.

@jan-glx 2018-03-04 20:14:19

In case you don't like sourceforge or it's down, I made a github mirror of the project.

@dtw 2010-03-21 22:39:57

I used fgm's solution (thanks fgm!), but needed to eliminate the tab characters at the end of each row, so modified the script thus:

#!/bin/bash 
declare -a array=( )                      # we build a 1-D-array

read -a line < "$1"                       # read the headline

COLS=${#line[@]}                          # save number of columns

index=0
while read -a line; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s" ${array[$COUNTER]}
    if [ $COUNTER -lt $(( ${#array[@]} - $COLS )) ]
    then
        printf "\t"
    fi
  done
  printf "\n" 
done

@Fritz G. Mehner 2009-11-19 15:11:58

Pure BASH, no additional process. A nice exercise:

declare -a array=( )                      # we build a 1-D-array

read -a line < "$1"                       # read the headline

COLS=${#line[@]}                          # save number of columns

index=0
while read -a line ; do
    for (( COUNTER=0; COUNTER<${#line[@]}; COUNTER++ )); do
        array[$index]=${line[$COUNTER]}
        ((index++))
    done
done < "$1"

for (( ROW = 0; ROW < COLS; ROW++ )); do
  for (( COUNTER = ROW; COUNTER < ${#array[@]}; COUNTER += COLS )); do
    printf "%s\t" ${array[$COUNTER]}
  done
  printf "\n" 
done

@bugloaf 2013-06-19 19:50:19

This worked for my file, although interestingly it prints out a directory listing for the first line of the table. I don't know enough BASH to figure out why.

@Hello71 2014-08-27 23:26:23

@bugloaf your table has a * in the corner.

@Dennis Williamson 2014-11-26 16:12:12

@bugloaf: Properly quoting variables should prevent that: printf "%s\t" "${array[$COUNTER]}"
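
The effect is easy to reproduce. A minimal sketch, assuming a scratch directory containing two files (the names a.txt and b.txt are made up): the unquoted expansion is subject to pathname expansion, the quoted one is not.

```shell
# Scratch directory with two files for the glob to match (hypothetical names).
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

cell='*'                                         # a table cell holding a bare asterisk
unquoted=$(cd "$dir" && printf '[%s]' $cell)     # expands to the file names
quoted=$(cd "$dir" && printf '[%s]' "$cell")     # stays a literal asterisk

echo "unquoted: $unquoted"    # unquoted: [a.txt][b.txt]
echo "quoted:   $quoted"      # quoted:   [*]
rm -rf "$dir"
```

This is why a table with * in its corner cell produced a directory listing in the first output line.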

@Jonathan Leffler 2009-11-14 19:54:04

Here is a moderately solid Perl script to do the job. There are many structural analogies with @ghostdog74's awk solution.

#!/bin/perl -w
#
# SO 1729824

use strict;

my(%data);          # main storage
my($maxcol) = 0;
my($rownum) = 0;
while (<>)
{
    my(@row) = split /\s+/;
    my($colnum) = 0;
    foreach my $val (@row)
    {
        $data{$rownum}{$colnum++} = $val;
    }
    $rownum++;
    $maxcol = $colnum if $colnum > $maxcol;
}

my $maxrow = $rownum;
for (my $col = 0; $col < $maxcol; $col++)
{
    for (my $row = 0; $row < $maxrow; $row++)
    {
        printf "%s%s", ($row == 0) ? "" : "\t",
                defined $data{$row}{$col} ? $data{$row}{$col} : "";
    }
    print "\n";
}

With the sample data size, the performance difference between perl and awk was negligible (1 millisecond out of 7 total). With a larger data set (100x100 matrix, entries 6-8 characters each), perl slightly outperformed awk - 0.026s vs 0.042s. Neither is likely to be a problem.


Representative timings for Perl 5.10.1 (32-bit) vs awk (version 20040207 when given '-V') vs gawk 3.1.7 (32-bit) on MacOS X 10.5.8 on a file containing 10,000 lines with 5 columns per line:

Osiris JL: time gawk -f tr.awk xxx  > /dev/null

real    0m0.367s
user    0m0.279s
sys     0m0.085s
Osiris JL: time perl -f transpose.pl xxx > /dev/null

real    0m0.138s
user    0m0.128s
sys     0m0.008s
Osiris JL: time awk -f tr.awk xxx  > /dev/null

real    0m1.891s
user    0m0.924s
sys     0m0.961s
Osiris-2 JL: 

Note that gawk is vastly faster than awk on this machine, but still slower than perl. Clearly, your mileage will vary.

@ghostdog74 2009-11-16 09:34:45

on my system, gawk outperforms perl. you can see my results in my edited post

@ghostdog74 2009-11-16 16:11:45

conclusion gathered: different platform, different software version, different results.

@Stephan202 2009-11-13 17:21:00

A Python solution:

python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < input > output

The above is based on the following:

import sys

for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip())):
    print(' '.join(c))

This code does assume that every line has the same number of columns (no padding is performed).

@krlmlr 2012-10-02 04:18:34

One minor problem here: Replace l.split() by l.strip().split() (Python 2.7), else the last line of the output is crippled. Works for arbitrary column separators, use l.strip().split(sep) and sep.join(c) if your separator is stored in variable sep.

@tommy.carstensen 2013-05-20 23:09:57

This solution reads everything into memory...

@Dennis Williamson 2009-11-13 16:54:28

If you have sc installed, you can do:

psc -r < inputfile | sc -W% - > outputfile

@Thor 2012-11-08 10:38:44

Note that this supports a limited number of lines because sc names its columns as one or a combination of two characters. The limit is 26 + 26^2 = 702.
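
The arithmetic behind that figure, as a quick check (assuming, as the comment says, that column names are one letter or two letters drawn from a 26-letter alphabet):

```shell
letters=26
one_letter=$letters                    # A .. Z
two_letter=$((letters * letters))      # AA .. ZZ
limit=$((one_letter + two_letter))
echo "$limit"                          # 702
```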

@lalebarde 2016-02-27 09:20:08

Does not work for me.

@Simon C 2009-11-13 16:08:51

The only improvement I can see to your own example is using awk which will reduce the number of processes that are run and the amount of data that is piped between them:

/bin/rm output 2> /dev/null

cols=`head -n 1 input | wc -w` 
for (( i=1; i <= $cols; i++))
do
  awk '{printf ("%s%s", tab, $'$i'); tab="\t"} END {print ""}' input
done >> output

@Federico Giorgi 2009-11-13 15:49:11

A hackish perl solution can be like this. It's nice because it doesn't load all the file in memory, prints intermediate temp files, and then uses the all-wonderful paste

#!/usr/bin/perl
use warnings;
use strict;

my $counter;
open INPUT, "<$ARGV[0]" or die ("Unable to open input file!");
while (my $line = <INPUT>) {
    chomp $line;
    my @array = split ("\t",$line);
    open OUTPUT, ">temp$." or die ("unable to open output file!");
    print OUTPUT join ("\n",@array);
    close OUTPUT;
    $counter=$.;
}
close INPUT;

# paste files together
my $execute = "paste ";
foreach (1..$counter) {
    $execute.="temp$_ ";
}
$execute.="> $ARGV[1]";
system $execute;

@ghostdog74 2009-11-13 17:11:09

using paste and temp files are just extra unnecessary operations. you can just do manipulation inside memory itself, eg arrays/hashes

@Federico Giorgi 2009-11-16 11:49:01

Yep, but wouldn't that mean keeping everything in memory? The files I'm dealing with are around 2-20gb in size.
