Multiple sequence alignment

How to perform a multiple sequence alignment from amino acid sequences?

The following command lines write into $outfile the multiple sequence alignment inferred from the amino acid sequences in $infile.

MAFFT

MAFFT offers a good tradeoff between accuracy and overall running time. Several options are available, depending on where gapped regions are expected within the inferred multiple sequence alignment. If the sequences to align are expected to contain unalignable regions, the following command line is recommended:
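A minimal sketch, assuming the intended strategy is MAFFT's E-INS-i (option --genafpair with iterative refinement, --maxiterate 1000). The wrapper name align_aa_einsi is hypothetical, and --amino is added here only to force protein mode:

```shell
# Hypothetical helper: E-INS-i alignment of amino acid sequences.
# $1 = input FASTA file; the alignment is written to standard output.
align_aa_einsi() {
    mafft --amino --genafpair --maxiterate 1000 "$1"
}
# Usage: align_aa_einsi "$infile" > "$outfile"
```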

If the sequences are expected to contain one alignable domain flanked by unalignable segments, the following command line is recommended:
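A minimal sketch, assuming the intended strategy is MAFFT's L-INS-i (option --localpair with --maxiterate 1000). The wrapper name align_aa_linsi is hypothetical:

```shell
# Hypothetical helper: L-INS-i alignment of amino acid sequences
# (local pairwise alignments, then iterative refinement).
# $1 = input FASTA file; the alignment is written to standard output.
align_aa_linsi() {
    mafft --amino --localpair --maxiterate 1000 "$1"
}
# Usage: align_aa_linsi "$infile" > "$outfile"
```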

If every residue of every sequence is expected to be alignable, the following command line is recommended:
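A minimal sketch, assuming the intended strategy is MAFFT's G-INS-i (option --globalpair with --maxiterate 1000). The wrapper name align_aa_ginsi is hypothetical:

```shell
# Hypothetical helper: G-INS-i alignment of amino acid sequences
# (global pairwise alignments, then iterative refinement).
# $1 = input FASTA file; the alignment is written to standard output.
align_aa_ginsi() {
    mafft --amino --globalpair --maxiterate 1000 "$1"
}
# Usage: align_aa_ginsi "$infile" > "$outfile"
```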

Finally, the following command line can be used when the overall structure of the multiple sequence alignment is uncertain:
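A minimal sketch, assuming the intended option is --auto, which lets MAFFT select a strategy from the size of the data set. The wrapper name align_aa_auto is hypothetical:

```shell
# Hypothetical helper: let MAFFT choose the alignment strategy itself.
# $1 = input FASTA file; the alignment is written to standard output.
align_aa_auto() {
    mafft --amino --auto "$1"
}
# Usage: align_aa_auto "$infile" > "$outfile"
```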

Of note, the overall speed can be improved by running MAFFT on multiple threads (option --thread), especially with options --genafpair or --localpair, which are known to require long running times. Slightly more accurate multiple sequence alignments can also be expected when tuning the substitution matrix (option --bl) and the gap opening and extension penalties (options --op and --ep, respectively; see e.g. Long et al. 2016).
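As an illustration of how these options combine, here is a sketch built on the E-INS-i strategy. The wrapper name align_aa_tuned and the values in the usage line are placeholders chosen for the example, not recommendations; suitable values depend on the data set:

```shell
# Hypothetical helper: multithreaded E-INS-i run with a user-chosen
# BLOSUM matrix and gap penalties.
# $1 = input FASTA, $2 = number of threads, $3 = BLOSUM number (--bl),
# $4 = gap opening penalty (--op), $5 = gap extension value (--ep)
align_aa_tuned() {
    mafft --amino --thread "$2" --bl "$3" --op "$4" --ep "$5" \
          --genafpair --maxiterate 1000 "$1"
}
# Usage (placeholder values):
#   align_aa_tuned "$infile" 4 45 1.5 0.1 > "$outfile"
```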

[170328ac]

How to perform a multiple sequence alignment from nucleotide sequences?

The following command lines write into $outfile the multiple sequence alignment inferred from the nucleotide sequences in $infile.

MAFFT

MAFFT offers a good tradeoff between accuracy and overall running time. Several options are available, depending on where gapped regions are expected within the inferred multiple sequence alignment. If the sequences to align are expected to contain unalignable regions, the following command line is recommended:
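A minimal sketch for nucleotide input, assuming the E-INS-i strategy (--genafpair with --maxiterate 1000). The wrapper name align_nt_einsi is hypothetical, and --nuc is added here only to force nucleotide mode:

```shell
# Hypothetical helper: E-INS-i alignment of nucleotide sequences.
# $1 = input FASTA file; the alignment is written to standard output.
align_nt_einsi() {
    mafft --nuc --genafpair --maxiterate 1000 "$1"
}
# Usage: align_nt_einsi "$infile" > "$outfile"
```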

If the sequences are expected to contain one alignable domain flanked by unalignable segments, the following command line is recommended:
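A minimal sketch for nucleotide input, assuming the L-INS-i strategy (--localpair with --maxiterate 1000). The wrapper name align_nt_linsi is hypothetical:

```shell
# Hypothetical helper: L-INS-i alignment of nucleotide sequences.
# $1 = input FASTA file; the alignment is written to standard output.
align_nt_linsi() {
    mafft --nuc --localpair --maxiterate 1000 "$1"
}
# Usage: align_nt_linsi "$infile" > "$outfile"
```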

If every residue of every sequence is expected to be alignable, the following command line is recommended:
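A minimal sketch for nucleotide input, assuming the G-INS-i strategy (--globalpair with --maxiterate 1000). The wrapper name align_nt_ginsi is hypothetical:

```shell
# Hypothetical helper: G-INS-i alignment of nucleotide sequences.
# $1 = input FASTA file; the alignment is written to standard output.
align_nt_ginsi() {
    mafft --nuc --globalpair --maxiterate 1000 "$1"
}
# Usage: align_nt_ginsi "$infile" > "$outfile"
```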

Finally, the following command line can be used when the overall structure of the multiple sequence alignment is uncertain:
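A minimal sketch for nucleotide input, assuming the intended option is --auto, which lets MAFFT select a strategy from the size of the data set. The wrapper name align_nt_auto is hypothetical:

```shell
# Hypothetical helper: let MAFFT choose the alignment strategy itself.
# $1 = input FASTA file; the alignment is written to standard output.
align_nt_auto() {
    mafft --nuc --auto "$1"
}
# Usage: align_nt_auto "$infile" > "$outfile"
```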

Of note, the overall speed can be improved by running MAFFT on multiple threads (option --thread), especially with options --genafpair or --localpair, which are known to require long running times.
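As an illustration, a sketch of a multithreaded L-INS-i run on nucleotide input. The wrapper name align_nt_threaded and the thread count in the usage line are placeholders; the appropriate count depends on the machine:

```shell
# Hypothetical helper: L-INS-i alignment of nucleotide sequences
# spread over several threads.
# $1 = input FASTA file, $2 = number of threads
align_nt_threaded() {
    mafft --nuc --thread "$2" --localpair --maxiterate 1000 "$1"
}
# Usage (placeholder thread count):
#   align_nt_threaded "$infile" 4 > "$outfile"
```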

[170328ac]