Multiple sequence alignment conversion

How to convert a FASTA-formatted multiple sequence alignment file into a PHYLIP sequential one?
How to convert a FASTA-formatted multiple sequence alignment file into a NEXUS one?
How to convert a relaxed PHYLIP-formatted multiple sequence alignment file into a FASTA one?
How to convert a PHYLIP-formatted multiple sequence alignment into a NEXUS one?

How to convert a FASTA-formatted multiple sequence alignment file into a PHYLIP sequential one?

The following awk one-liner converts the FASTA-formatted sequence file $infile into PHYLIP sequential format and writes the result to $outfile:

awk '/^>/{seq[n]=s;fh=substr($0,2);gsub(/^[ \t]+|[ \t]+$/,"",fh);gsub(" ","_",fh);lbl[++n]=fh;(m<(l=length(fh)))&&m=l;s="";next} {gsub(" ","",$0);s=s$0}
     END{seq[n]=s;print(b=" ")n" "length(seq[1]);x=0.5;while((x*=2)<m)b=b""b;while(++i<=n){print substr(lbl[i]""b,1,m)" "seq[i]}}'  $infile > $outfile

[190129ac]
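
To make the steps of the one-liner above easier to follow, they can be spelled out as a commented shell function. This is only a sketch and not the original author's code: the function name fasta2phylip is hypothetical, and labels are padded with printf instead of a doubled blank string. It assumes, like the one-liner, that all sequences have the same length as the first one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: FASTA alignment -> PHYLIP sequential (relaxed labels).
fasta2phylip() {
  awk '
    /^>/ {                               # header line: start a new record
      name = substr($0, 2)               # drop the leading ">"
      gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim surrounding blanks
      gsub(/ /, "_", name)               # replace inner blanks in the label
      lbl[++n] = name
      if (length(name) > w) w = length(name)   # remember the longest label
      next
    }
    {                                    # sequence line: append to current record
      gsub(/[ \t\r]/, "")
      seq[n] = seq[n] $0
    }
    END {
      print " " n " " length(seq[1])     # header: number of taxa, alignment length
      for (i = 1; i <= n; i++)           # one line per taxon, labels left-aligned
        printf "%-" w "s %s\n", lbl[i], seq[i]
    }
  ' "$1"
}

# Usage: fasta2phylip "$infile" > "$outfile"
```

Quoting "$infile" and "$outfile" keeps the call working when a file name contains blanks.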

How to convert a FASTA-formatted multiple sequence alignment file into a NEXUS one?

Several programs can convert a FASTA-formatted multiple sequence alignment file $infile into a NEXUS-formatted one $outfile. Below are some examples.

Clustalw

The program Clustalw converts FASTA-formatted files into NEXUS ones with the following command line:

clustalw -CONVERT -OUTPUT=NEXUS -QUIET -INFILE=$infile -OUTFILE=$outfile

[170222ac]

goalign

The program goalign converts FASTA-formatted files into NEXUS ones with the following command line:

goalign reformat nexus -i $infile -o $outfile

[1702312fl]

How to convert a relaxed PHYLIP-formatted multiple sequence alignment file into a FASTA one?

The following command lines convert a relaxed PHYLIP-formatted sequence file $infile into a FASTA-formatted one $outfile.

goalign

The program goalign converts PHYLIP-formatted files into FASTA ones with the following command line:

goalign reformat fasta -p -i $infile -o $outfile

[1702312fl]

awk

PHYLIP interleaved:

awk '(NR==1){n=$1;next} 
     /^$/{next}
     (!nbl){lbl[++i]=substr($0,1,(c=index($0," "))-1);seq[i]=substr($0,++c);if(i==n){nbl=1;i=0}next}
     {++i;seq[i]=seq[i]$0;if(i==n)i=0}
     END{while(++i<=n){print">"lbl[i];gsub(" ","",seq[i]);print seq[i]}}' $infile > $outfile

PHYLIP sequential:

awk '(NR==1){n=$1;l=$2;next}
     /^$/{next}
     (li==0){lbl[++i]=substr($0,1,(c=index($0," "))-1);si=substr($0,++c);gsub(" ","",si);if((li=length(si))==l){seq[i]=si;li=0}next}
     (li<l){si=si$0;gsub(" ","",si);if((li=length(si))==l){seq[i]=si;li=0}}
     END{i=0;while(++i<=n){print">"lbl[i];print seq[i]}}' $infile > $outfile

[170222ac]
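
Whichever variant is used, the resulting FASTA file can be checked against the first line of the PHYLIP file, which gives the number of sequences and the alignment length. The following is a minimal sketch: the function name check_fasta_dims is hypothetical, and it assumes that the PHYLIP header line holds exactly these two integers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a FASTA file ($2) with the dimensions
# announced on the first line of the PHYLIP file ($1) it was converted from.
check_fasta_dims() {
  local ntax nchar
  read -r ntax nchar < "$1" || return 2      # PHYLIP header: "ntax nchar"
  awk -v n="$ntax" -v l="$nchar" '
    # count a problem when the record just finished has the wrong length
    function close_record() { if (cnt > 0 && len != l) bad++ }
    /^>/ { close_record(); cnt++; len = 0; next }   # new FASTA record
    { gsub(/[ \t\r]/, ""); len += length($0) }      # add up residues
    END {
      close_record()
      if (cnt != n) bad++                           # wrong number of records
      printf "%d sequences, %d problem(s)\n", cnt, bad
      exit (bad > 0)                                # non-zero status on mismatch
    }
  ' "$2"
}

# Usage: check_fasta_dims "$infile" "$outfile" || echo "conversion mismatch"
```

The exit status is 0 when both the number of sequences and every sequence length agree with the header, so the function can be used directly in a shell conditional.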

How to convert a PHYLIP-formatted multiple sequence alignment into a NEXUS one?

The following command lines convert a PHYLIP-formatted sequence file $infile into a NEXUS-formatted one $outfile.

goalign

The program goalign converts PHYLIP-formatted files into NEXUS ones with the following command line:

goalign reformat nexus -i $infile -p -o $outfile

[1702312fl]

Bash

nucleotide sequences

ntax=$(awk 'NR==1{print$1;exit}' $infile); nchar=$(awk 'NR==1{print$2;exit}' $infile);
(echo -e "#NEXUS\n\nbegin data;\ndimensions ntax=$ntax nchar=$nchar;\nformat datatype=DNA missing=? gap=-;\nmatrix"; sed 1d $infile; echo -e ";\nend;";) > $outfile

amino acid sequences

ntax=$(awk 'NR==1{print$1;exit}' $infile); nchar=$(awk 'NR==1{print$2;exit}' $infile);
(echo -e "#NEXUS\n\nbegin data;\ndimensions ntax=$ntax nchar=$nchar;\nformat datatype=PROTEIN missing=? gap=-;\nmatrix"; sed 1d $infile; echo -e ";\nend;";) > $outfile

[181212ac]
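
The nucleotide and amino acid commands above differ only in the datatype keyword, so they can be folded into a single function. This is a sketch under assumptions: the function name phylip2nexus and its argument order are hypothetical, and printf is used in place of echo -e. Like the commands above, it copies the alignment rows unchanged, so the input is assumed to be sequential PHYLIP with blank-free labels.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap a PHYLIP alignment in a NEXUS data block.
# $1 = PHYLIP file, $2 = NEXUS datatype (DNA by default, e.g. PROTEIN)
phylip2nexus() {
  local infile=$1 datatype=${2:-DNA} ntax nchar
  read -r ntax nchar < "$infile" || return 1   # header line: "ntax nchar"
  printf '#NEXUS\n\nbegin data;\n'
  printf 'dimensions ntax=%s nchar=%s;\n' "$ntax" "$nchar"
  printf 'format datatype=%s missing=? gap=-;\nmatrix\n' "$datatype"
  sed 1d "$infile"                             # alignment rows, copied unchanged
  printf ';\nend;\n'
}

# Usage: phylip2nexus "$infile" PROTEIN > "$outfile"
```

Reading the header once with read avoids calling awk twice on the same file.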