Distance matrix file conversion

How to convert a PHYLIP-formatted square distance matrix into a PHYLIP-formatted lower-triangular one?
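
A minimal awk sketch of this conversion, given as an illustration rather than a reference implementation. It assumes that labels contain no blank characters and that each matrix row sits on a single line; the function name phylip_sq2lt is hypothetical.

```shell
# Sketch: PHYLIP square -> PHYLIP lower-triangular.
# Assumptions: whitespace-free labels, one matrix row per line.
phylip_sq2lt() {
  awk '
    NR == 1 { print; next }          # header line (number of taxa) is copied as is
    NF == 0 { next }                 # skip blank lines
    {
      ++row                          # 1-based index of the current matrix row
      line = $1                      # the row starts with its label
      for (j = 1; j < row; j++)      # keep only entries strictly below the diagonal
        line = line " " $(j + 1)
      print line
    }
  ' "$1"
}
```

It could be called as phylip_sq2lt $infile > $outfile.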

How to convert a PHYLIP-formatted lower-triangular distance matrix into a PHYLIP-formatted square one?
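
A minimal awk sketch of the reverse conversion, under the same assumptions (labels without blank characters, one matrix row per line). It additionally assumes that the matrix is symmetric with a zero diagonal, which is written as 0; the function name phylip_lt2sq is hypothetical.

```shell
# Sketch: PHYLIP lower-triangular -> PHYLIP square.
# Assumptions: symmetric matrix, zero diagonal, whitespace-free labels.
phylip_lt2sq() {
  awk '
    NR == 1 { print; next }          # header line (number of taxa) is copied as is
    NF == 0 { next }                 # skip blank lines
    {
      i = ++row
      lab[i] = $1
      d[i, i] = 0                    # diagonal entry, assumed to be zero
      for (j = 1; j < i; j++) {
        d[i, j] = $(j + 1)           # entry read from the file
        d[j, i] = $(j + 1)           # mirrored entry above the diagonal
      }
    }
    END {
      for (i = 1; i <= row; i++) {
        line = lab[i]
        for (j = 1; j <= row; j++) line = line " " d[i, j]
        print line
      }
    }
  ' "$1"
}
```

It could be called as phylip_lt2sq $infile > $outfile. The whole matrix is held in memory, because the entries above the diagonal of a row are only known once the later rows have been read.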

How to extract a submatrix from a PHYLIP-formatted distance matrix?

Given a PHYLIP-formatted square distance matrix file $infile and a list of labels $taxfile (one label per line), the following gawk one-liner extracts the corresponding submatrix and writes it into $outfile:

Note that the above one-liner can also be used to reorder the distance matrix so that it follows the order of the labels in $taxfile.
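
A minimal awk sketch of such an extraction, given as an illustration and not as the one-liner referred to above. It assumes a square input matrix, labels without blank characters, one matrix row per line, and a non-empty label list; the function name phylip_submatrix is hypothetical. Rows and columns are written in the order of the label list, so the same sketch also reorders a matrix.

```shell
# Sketch: extract (or reorder) a submatrix of a square PHYLIP matrix.
# $1: square PHYLIP matrix, $2: label list (one label per line, non-empty).
phylip_submatrix() {
  awk '
    FNR == NR {                      # first file read: the label list
      if (NF) want[++m] = $1
      next
    }
    FNR == 1 { next }                # second file: skip the header line
    NF == 0 { next }                 # skip blank lines
    {
      ++row
      idx[$1] = row                  # label -> row index in the full matrix
      for (j = 2; j <= NF; j++) d[row, j - 1] = $j
    }
    END {
      for (a = 1; a <= m; a++)       # stop if a requested label is unknown
        if (!(want[a] in idx)) {
          print "label not found: " want[a] > "/dev/stderr"
          exit 1
        }
      print m                        # new header: number of selected taxa
      for (a = 1; a <= m; a++) {
        line = want[a]
        for (b = 1; b <= m; b++) line = line " " d[idx[want[a]], idx[want[b]]]
        print line
      }
    }
  ' "$2" "$1"
}
```

It could be called as phylip_submatrix $infile $taxfile > $outfile.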

[181212ac]

How to deal with the One Entry Per Line (OEPL) matrix format?

The OEPL format is useful for dealing with matrix files, especially when the entries are estimated simultaneously (e.g. by parallel computing). The format is very simple: the first line contains the n labels separated by blank spaces, and each remaining line contains three columns: the row index i, the column index j, and the value of entry (i, j). Row and column indices start at 1.

A PHYLIP-formatted distance matrix file $infile (either square or lower-triangular) can be transformed into an OEPL-formatted file $outfile with the following command line:

or the following one without tac:
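
A minimal awk sketch of this transformation, given as an illustration and not as either of the commands referred to above. It buffers the entries in memory so that the label line can be written first, which avoids tac. It assumes labels without blank characters and one matrix row per line, and it writes exactly the entries present in the input (all n x n entries for a square matrix, only those below the diagonal for a lower-triangular one). The function name phylip2oepl is hypothetical.

```shell
# Sketch: PHYLIP (square or lower-triangular) -> OEPL.
# Assumptions: whitespace-free labels, one matrix row per line.
phylip2oepl() {
  awk '
    NR == 1 { next }                 # the taxon count is not needed in OEPL
    NF == 0 { next }                 # skip blank lines
    {
      ++i
      labels = (i == 1 ? "" : labels " ") $1   # build the label line
      for (j = 2; j <= NF; j++)      # every entry present on this row
        entry[++k] = i " " (j - 1) " " $j
    }
    END {
      print labels                   # first line: the n labels
      for (e = 1; e <= k; e++) print entry[e]
    }
  ' "$1"
}
```

It could be called as phylip2oepl $infile > $outfile. For the lower-triangular file made up of the lines "3", "A", "B 1" and "C 2 3", it writes "A B C", "2 1 1", "3 1 2" and "3 2 3".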

Conversely, an OEPL-formatted distance matrix file $infile can be transformed into a PHYLIP-formatted file $outfile with the following gawk one-liner ($prec is the number of decimal places):

Note that the above gawk one-liner returns a square matrix. A lower-triangular matrix can be obtained by replacing while(++j<n) with while(++j<i).
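
A minimal awk sketch of the OEPL-to-PHYLIP transformation, given as an illustration and not as the one-liner referred to above. It assumes a symmetric matrix: an entry (i, j) that is absent from the file is taken from (j, i) when available, and is otherwise written as zero (e.g. on the diagonal). The function name oepl2phylip and its arguments are hypothetical: the second argument plays the role of $prec (default 6), and an optional third argument "lower" selects a lower-triangular output.

```shell
# Sketch: OEPL -> PHYLIP (square by default, lower-triangular on request).
# $1: OEPL file, $2: number of decimal places, $3: "lower" or "square".
oepl2phylip() {
  awk -v prec="${2:-6}" -v shape="${3:-square}" '
    NR == 1 {                        # first line: the n labels
      n = NF
      for (i = 1; i <= n; i++) lab[i] = $i
      next
    }
    NF >= 3 {                        # remaining lines: i, j, value
      d[$1, $2] = $3
      if (!(($2, $1) in d)) d[$2, $1] = $3   # mirror when (j, i) is not known yet
    }
    END {
      fmt = " %." prec "f"
      print n
      for (i = 1; i <= n; i++) {
        printf "%s", lab[i]
        last = (shape == "lower") ? i - 1 : n
        for (j = 1; j <= last; j++) printf fmt, d[i, j] + 0   # missing entry -> 0
        printf "\n"
      }
    }
  ' "$1"
}
```

It could be called as oepl2phylip $infile $prec > $outfile for a square matrix, or as oepl2phylip $infile $prec lower > $outfile for a lower-triangular one.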

[181212ac]