Distance matrix manipulation

How to symmetrize a PHYLIP-formatted matrix file?
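
A minimal sketch of one way to do it, not the author's one-liner. It assumes a relaxed PHYLIP layout: the first line holds the number of taxa, then each matrix row sits on a single line that starts with a whitespace-free taxon name. The helper name `symmetrize` is hypothetical. When an entry is given in both triangles (square input), d(i,j) and d(j,i) are replaced by their mean. When only one triangle is given (lower-triangular input), it is mirrored. The diagonal is set to 0. Only POSIX awk features are used, so it should run under gawk as well as other awks.

```shell
# symmetrize FILE: hypothetical helper printing a square, symmetric PHYLIP matrix.
# Assumes one matrix row per line and whitespace-free taxon names.
symmetrize() {
  awk '
    NR == 1 { n = $1; next }                   # header line: number of taxa
    NF > 0 {
      r++; name[r] = $1
      for (c = 2; c <= NF; c++) d[r, c - 1] = $c
      nv[r] = NF - 1                            # how many distances this row holds
    }
    END {
      print n
      for (i = 1; i <= n; i++) {
        line = name[i]
        for (j = 1; j <= n; j++) {
          if (i == j) v = 0                     # zero diagonal
          else if (j <= nv[i] && i <= nv[j]) v = (d[i, j] + d[j, i]) / 2  # both given: mean
          else if (j <= nv[i]) v = d[i, j]      # only d(i,j) was given
          else v = d[j, i]                      # mirror the lower triangle
          line = line " " v
        }
        print line
      }
    }' "$1"
}
```

Usage: `symmetrize $infile > symmetric.txt`. Averaging the two triangles is one reasonable choice for an asymmetric square matrix; other conventions (e.g. keeping the minimum) are equally possible.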

How to estimate the arboricity coefficient from a matrix of evolutionary distances?

The following gawk one-liner estimates the arboricity coefficient (Guénoche and Garreta 2001) from the PHYLIP-formatted distance matrix file $infile (either square or lower-triangular). The arboricity coefficient assesses the overall treelikeness of the evolutionary distances: the closer it is to 1, the stronger the phylogenetic signal they share.
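
A minimal sketch in portable awk, not the original one-liner. For every quartet {x, y, z, t} it forms the three sums d(x,y)+d(z,t), d(x,z)+d(y,t) and d(x,t)+d(y,z), sorted as s1 ≥ s2 ≥ s3; for tree-like distances the two largest sums are equal (four-point condition). The precise strong-violation criterion of Guénoche and Garreta (2001) should be checked against the paper: this sketch ASSUMES that a quartet strongly violates the condition when s1 - s2 > s2 - s3, and reports 1 minus the proportion of such quartets. The name `arboricity` is hypothetical; the input is assumed to hold one row per line with whitespace-free taxon names, and only the lower triangle of a square matrix is read (so the matrix should be symmetric).

```shell
# arboricity FILE: hypothetical helper printing 1 - (proportion of quartets that
# strongly violate the four-point condition), under the ASSUMED criterion
# s1 - s2 > s2 - s3 with s1 >= s2 >= s3.
arboricity() {
  awk '
    NR == 1 { n = $1; next }
    NF > 0 {
      r++
      # strict lower triangle only: works for square and lower-triangular input
      for (c = 2; c <= NF && c - 1 < r; c++) { d[r, c - 1] = $c; d[c - 1, r] = $c }
    }
    END {
      q = bad = 0
      for (i = 1; i <= n - 3; i++)
        for (j = i + 1; j <= n - 2; j++)
          for (k = j + 1; k <= n - 1; k++)
            for (l = k + 1; l <= n; l++) {
              # the three pairwise sums of quartet {i, j, k, l}
              s1 = d[i, j] + d[k, l]; s2 = d[i, k] + d[j, l]; s3 = d[i, l] + d[j, k]
              # sort so that s1 >= s2 >= s3
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }
              if (s2 < s3) { t = s2; s2 = s3; s3 = t }
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }
              q++
              # ASSUMED criterion: the two largest sums are further apart
              # than the two smallest ones (they are equal for a tree metric)
              if (s1 - s2 > s2 - s3) bad++
            }
      if (q == 0) { print "NA" } else { printf "%.4f\n", 1 - bad / q }
    }' "$1"
}
```

Usage: `arboricity $infile`. All C(n, 4) quartets are enumerated, so running time grows as n^4; a matrix with fewer than 4 taxa yields `NA`.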

[180328ac]

How to perform a δ plot analysis from a matrix of evolutionary distances?

A δ plot (Holland et al. 2002) summarizes the treelikeness of all taxon quartets as a histogram of their δ values over $nb intervals. The following gawk one-liner estimates the δ plot from the PHYLIP-formatted distance matrix file $infile (either square or lower-triangular):
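
A minimal portable-awk sketch of the δ plot computation, not the original one-liner. For each quartet with sorted sums s1 ≥ s2 ≥ s3 (as in the four-point condition), δ = (s1 - s2) / (s1 - s3), which is 0 for a perfectly tree-like quartet. Quartets whose three sums are all equal get δ = 0 here, a convention this sketch assumes. The δ values are binned into nb equal intervals of [0, 1], and the mean δ is printed last. The name `delta_plot` is hypothetical; input is assumed to hold one row per line with whitespace-free taxon names, and only the lower triangle of a square matrix is read.

```shell
# delta_plot FILE NB: hypothetical helper printing NB histogram rows
# "lower upper count", then the mean delta on a final "mean" row.
delta_plot() {
  awk -v nb="$2" '
    BEGIN { nb += 0 }                           # force a numeric number of intervals
    NR == 1 { n = $1; next }
    NF > 0 {
      r++
      # strict lower triangle only: works for square and lower-triangular input
      for (c = 2; c <= NF && c - 1 < r; c++) { d[r, c - 1] = $c; d[c - 1, r] = $c }
    }
    END {
      q = sum = 0
      for (i = 1; i <= n - 3; i++)
        for (j = i + 1; j <= n - 2; j++)
          for (k = j + 1; k <= n - 1; k++)
            for (l = k + 1; l <= n; l++) {
              s1 = d[i, j] + d[k, l]; s2 = d[i, k] + d[j, l]; s3 = d[i, l] + d[j, k]
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }   # sort so that s1 >= s2 >= s3
              if (s2 < s3) { t = s2; s2 = s3; s3 = t }
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }
              # delta of Holland et al. (2002); ASSUMED 0 when all three sums are equal
              delta = (s1 > s3) ? (s1 - s2) / (s1 - s3) : 0
              h = int(delta * nb); if (h >= nb) h = nb - 1   # delta = 1 goes to the last bin
              hist[h]++; sum += delta; q++
            }
      for (h = 0; h < nb; h++) printf "%.3f %.3f %d\n", h / nb, (h + 1) / nb, hist[h]
      mean = (q > 0) ? sum / q : 0
      printf "mean %.4f\n", mean
    }' "$1"
}
```

Usage: `delta_plot $infile $nb`. A histogram concentrated near 0 (and a small mean δ) indicates tree-like distances.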

The troublesome taxon index δx of every taxon x (see Holland et al. 2002) can also be estimated, and the taxa sorted accordingly, with the following gawk one-liner:
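
In Holland et al. (2002), δx is the mean δ over all quartets that contain taxon x. A minimal portable-awk sketch, not the original one-liner: each quartet's δ (computed as in the δ plot, with δ = 0 ASSUMED when all three sums are equal) is credited to its four taxa, and each total is divided by C(n-1, 3), the number of quartets containing one taxon. The name `delta_x` is hypothetical; input assumptions are the same as above (one row per line, whitespace-free names, lower triangle read).

```shell
# delta_x FILE: hypothetical helper printing "taxon delta_x", most troublesome first.
delta_x() {
  awk '
    NR == 1 { n = $1; next }
    NF > 0 {
      r++; name[r] = $1
      # strict lower triangle only: works for square and lower-triangular input
      for (c = 2; c <= NF && c - 1 < r; c++) { d[r, c - 1] = $c; d[c - 1, r] = $c }
    }
    END {
      for (i = 1; i <= n - 3; i++)
        for (j = i + 1; j <= n - 2; j++)
          for (k = j + 1; k <= n - 1; k++)
            for (l = k + 1; l <= n; l++) {
              s1 = d[i, j] + d[k, l]; s2 = d[i, k] + d[j, l]; s3 = d[i, l] + d[j, k]
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }   # sort so that s1 >= s2 >= s3
              if (s2 < s3) { t = s2; s2 = s3; s3 = t }
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }
              delta = (s1 > s3) ? (s1 - s2) / (s1 - s3) : 0
              # credit the quartet delta to each of its four taxa
              tot[i] += delta; tot[j] += delta; tot[k] += delta; tot[l] += delta
            }
      m = (n - 1) * (n - 2) * (n - 3) / 6       # quartets containing a given taxon
      for (x = 1; x <= n; x++) {
        v = (m > 0) ? tot[x] / m : 0
        printf "%s %.4f\n", name[x], v
      }
    }' "$1" | LC_ALL=C sort -k2,2nr
}
```

Usage: `delta_x $infile`. `LC_ALL=C` keeps `sort` from misreading the decimal point under locales that use a comma.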

[180128ac]

How to estimate a troublesome taxon index from a matrix of evolutionary distances?

As a complement to the δ plot method, the following gawk one-liner estimates a troublesome taxon index (tti) for each taxon from the PHYLIP-formatted distance matrix file $infile (either square or lower-triangular). On the same basis as the arboricity coefficient, the tti of a taxon x is the proportion of quartets containing x that strongly violate the quadrangular inequality (four-point condition). Results are sorted from the most to the least troublesome taxon.
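
A minimal portable-awk sketch, not the original one-liner. For each quartet, the three sums d(x,y)+d(z,t), d(x,z)+d(y,t), d(x,t)+d(y,z) are sorted as s1 ≥ s2 ≥ s3. The exact strong-violation criterion should be checked against Guénoche and Garreta (2001); this sketch ASSUMES a quartet strongly violates the four-point condition when s1 - s2 > s2 - s3. Each violating quartet is counted for its four taxa, and each count is divided by C(n-1, 3), the number of quartets containing one taxon. The name `tti` is hypothetical; input is assumed to hold one row per line with whitespace-free taxon names, and only the lower triangle of a square matrix is read.

```shell
# tti FILE: hypothetical helper printing "taxon tti", most troublesome first.
tti() {
  awk '
    NR == 1 { n = $1; next }
    NF > 0 {
      r++; name[r] = $1
      # strict lower triangle only: works for square and lower-triangular input
      for (c = 2; c <= NF && c - 1 < r; c++) { d[r, c - 1] = $c; d[c - 1, r] = $c }
    }
    END {
      for (i = 1; i <= n - 3; i++)
        for (j = i + 1; j <= n - 2; j++)
          for (k = j + 1; k <= n - 1; k++)
            for (l = k + 1; l <= n; l++) {
              s1 = d[i, j] + d[k, l]; s2 = d[i, k] + d[j, l]; s3 = d[i, l] + d[j, k]
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }   # sort so that s1 >= s2 >= s3
              if (s2 < s3) { t = s2; s2 = s3; s3 = t }
              if (s1 < s2) { t = s1; s1 = s2; s2 = t }
              # ASSUMED criterion: the two largest sums are further apart
              # than the two smallest ones
              if (s1 - s2 > s2 - s3) { bad[i]++; bad[j]++; bad[k]++; bad[l]++ }
            }
      m = (n - 1) * (n - 2) * (n - 3) / 6       # quartets containing a given taxon
      for (x = 1; x <= n; x++) {
        v = (m > 0) ? bad[x] / m : 0
        printf "%s %.4f\n", name[x], v
      }
    }' "$1" | LC_ALL=C sort -k2,2nr
}
```

Usage: `tti $infile`. A tti close to 0 means that almost every quartet involving the taxon is tree-like.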

[180128ac]