The following awk one-liner converts the FASTA-formatted sequence file $infile into PHYLIP sequential format and writes the result to $outfile:
awk '/^>/{seq[n]=s;fh=substr($0,2);gsub(/^[ \t]+|[ \t]+$/,"",fh);gsub(" ","_",fh);lbl[++n]=fh;(m<(l=length(fh)))&&m=l;s="";next} {gsub(" ","",$0);s=s$0}
END{seq[n]=s;print(b=" ")n" "length(seq[1]);x=0.5;while((x*=2)<m)b=b""b;while(++i<=n){print substr(lbl[i]""b,1,m)" "seq[i]}}' $infile > $outfile
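The same conversion can be easier to follow when spread over several lines. The sketch below is a readable variant, not a replacement for the one-liner above. The function name `fasta2phylip` is hypothetical, and the sketch assumes every sequence has the same length (the length of the first one is reported in the header).

```shell
# fasta2phylip FILE : hypothetical helper, prints PHYLIP sequential format on stdout.
# Assumption: FILE is an aligned FASTA file (all sequences have equal length).
fasta2phylip() {
  awk '
    /^>/ {                                  # header line: start a new record
      name = substr($0, 2)
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim leading and trailing blanks
      gsub(/ /, "_", name)                  # labels must not contain spaces
      lbl[++n] = name
      if (length(name) > width) width = length(name)
      next
    }
    n > 0 {                                 # sequence line: append to the current record
      gsub(/[ \t\r]/, "")
      seq[n] = seq[n] $0
    }
    END {
      print " " n " " length(seq[1])        # header: number of taxa, alignment length
      fmt = "%-" width "s %s\n"             # pad every label to the longest one
      for (i = 1; i <= n; i++) printf fmt, lbl[i], seq[i]
    }
  ' "$1"
}
```

It would be called as `fasta2phylip "$infile" > "$outfile"`.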
Several programs can convert a FASTA-formatted multiple sequence alignment file $infile into a NEXUS-formatted file $outfile. Below are some examples.
The program Clustalw allows converting FASTA-formatted files into NEXUS ones with the following command line:
The program goalign allows converting FASTA-formatted files into NEXUS ones with the following command line:
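When neither program is at hand, the conversion can also be sketched with awk alone. This is an illustrative sketch, not the method used by either program: the function name `fasta2nexus` is hypothetical, the datatype defaults to DNA unless a second argument is given, and the input is assumed to be an aligned FASTA file whose sequences all have the same length.

```shell
# fasta2nexus FILE [DATATYPE] : hypothetical helper, prints a NEXUS data block on stdout.
# Assumption: FILE is an aligned FASTA file; DATATYPE defaults to DNA.
fasta2nexus() {
  awk -v datatype="${2:-DNA}" '
    /^>/ {                                  # header line: start a new record
      name = substr($0, 2)
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim leading and trailing blanks
      gsub(/ /, "_", name)                  # avoid spaces inside taxon names
      lbl[++n] = name
      next
    }
    n > 0 {                                 # sequence line: append to the current record
      gsub(/[ \t\r]/, "")
      seq[n] = seq[n] $0
    }
    END {
      print "#NEXUS"
      print ""
      print "begin data;"
      print "dimensions ntax=" n " nchar=" length(seq[1]) ";"
      print "format datatype=" datatype " missing=? gap=-;"
      print "matrix"
      for (i = 1; i <= n; i++) print lbl[i] " " seq[i]
      print ";"
      print "end;"
    }
  ' "$1"
}
```

For a protein alignment it would be called as `fasta2nexus "$infile" PROTEIN > "$outfile"`.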
The following command lines convert a PHYLIP-formatted sequence file $infile into a FASTA-formatted file $outfile.
The program goalign allows converting PHYLIP-formatted files into FASTA ones with the following command line:
awk '(NR==1){n=$1;next}
/^$/{next}
(!nbl){lbl[++i]=substr($0,1,(c=index($0," ")));seq[i]=substr($0,++c);if(i==n){nbl=(i=1);--i}next}
{seq[i]=seq[++i]$0;i=(i==n)?0:i}
END{while(++i<=n){print">"lbl[i];gsub(" ","",seq[i]);print seq[i]}}' $infile > $outfile
awk '(NR==1){n=$1;l=$2;next}
/^$/{next}
(li==0){lbl[++i]=substr($0,1,(c=index($0," ")));si=substr($0,++c);gsub(" ","",si);if((li=length(si))==l){seq[i]=si;li=0}next}
(li<l){si=si$0;gsub(" ","",si);if((li=length(si))==l){seq[i]=si;li=0}}
END{i=0;while(++i<=n){print">"lbl[i];print seq[i]}}' $infile > $outfile
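The sequential case, where the residues of each record are read until the alignment length given in the header is reached, can be sketched in a more readable form as follows. The function name `phylip_seq2fasta` is hypothetical, and the sketch assumes labels contain no spaces (relaxed PHYLIP), which is a stricter assumption than the one-liners above make.

```shell
# phylip_seq2fasta FILE : hypothetical helper, prints FASTA on stdout.
# Assumptions: FILE is sequential PHYLIP; labels contain no blanks.
phylip_seq2fasta() {
  awk '
    NR == 1 { nchar = $2; next }            # header: number of taxa, alignment length
    /^[ \t]*$/ { next }                     # skip blank lines
    {
      line = $0
      if (!inrec) {                         # first line of a record: label, then residues
        name = $1
        sub(/^[ \t]*[^ \t]+/, "", line)     # drop the label, keep what follows
        cur = ""
        inrec = 1
      }
      gsub(/[ \t\r]/, "", line)             # residues may be split into blocks by blanks
      cur = cur line
      if (length(cur) >= nchar) {           # record complete: emit it
        print ">" name
        print cur
        inrec = 0
      }
    }
  ' "$1"
}
```

It would be called as `phylip_seq2fasta "$infile" > "$outfile"`.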
The following command lines convert a PHYLIP-formatted sequence file $infile into a NEXUS-formatted file $outfile.
The program goalign allows converting PHYLIP-formatted files into NEXUS ones with the following command line:
ntax=$(awk 'NR==1{print$1;exit}' $infile); nchar=$(awk 'NR==1{print$2;exit}' $infile);
(echo -e "#NEXUS\n\nbegin data;\ndimensions ntax=$ntax nchar=$nchar;\nformat datatype=DNA missing=? gap=-;\nmatrix"; sed 1d $infile; echo -e ";\nend;";) > $outfile
ntax=$(awk 'NR==1{print$1;exit}' $infile); nchar=$(awk 'NR==1{print$2;exit}' $infile);
(echo -e "#NEXUS\n\nbegin data;\ndimensions ntax=$ntax nchar=$nchar;\nformat datatype=PROTEIN missing=? gap=-;\nmatrix"; sed 1d $infile; echo -e ";\nend;";) > $outfile
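The DNA and protein commands above differ only in the `datatype` value, so they can be folded into one function. The sketch below does that under a few assumptions: the name `phylip2nexus` and its variable names are hypothetical, the datatype is passed as an optional second argument (DNA by default), and `printf` is used in place of `echo -e` because the latter is not portable across shells.

```shell
# phylip2nexus FILE [DATATYPE] : hypothetical helper, prints a NEXUS data block on stdout.
# Assumption: the first line of FILE holds the number of taxa and the alignment length.
phylip2nexus() {
  p2n_in=$1
  p2n_type=${2:-DNA}
  p2n_ntax=$(awk 'NR==1{print $1; exit}' "$p2n_in")
  p2n_nchar=$(awk 'NR==1{print $2; exit}' "$p2n_in")
  {
    printf '#NEXUS\n\nbegin data;\n'
    printf 'dimensions ntax=%s nchar=%s;\n' "$p2n_ntax" "$p2n_nchar"
    printf 'format datatype=%s missing=? gap=-;\n' "$p2n_type"
    printf 'matrix\n'
    sed 1d "$p2n_in"                        # everything after the PHYLIP header line
    printf ';\nend;\n'
  }
}
```

It would be called as `phylip2nexus "$infile" DNA > "$outfile"` or `phylip2nexus "$infile" PROTEIN > "$outfile"`.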