Institut Pasteur | C3BI | Bioinformatics and Biostatistics Hub | GIPhy

PhyloM


Description

PhyloM is a selection of phylogenetic markers that are well suited for phylogenetic tree inference. These markers are recommended for phylogenetic reconstruction because they have been shown to correspond to genes that are conserved within specific phyla. For each phylogenetic marker, a reference multiple amino acid sequence alignment (MSA) and its associated position-specific scoring matrix (PSSM) are available for performing BLAST sequence similarity searches.


Phyla

PhyloM: bacteria
  226 phylogenetic markers for phylogenetic analyses within bacterial phyla (Bratlie et al. 2010, Creevey et al. 2011, Wu et al. 2013)

PhyloM: NCLDV
  8 phylogenetic markers for phylogenetic analyses within Nucleocytoplasmic large DNA virus phyla (Guglielmini et al. 2018)


Usage

Using an MSA for performing a BLAST search against an amino acid sequence databank

Each of the PhyloM MSA files can be used as a query for performing a psiblast search with the BLAST+ tools (Camacho et al. 2009). Let cds.faa be a FASTA-formatted amino acid sequence file (e.g. every CDS from a bacterial or viral genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in cds.faa  -dbtype prot

Next, a PhyloM MSA file msa.faa can be used directly as a query for performing a BLAST search, following this Linux command line model:

 psiblast  -in_msa msa.faa  -db cds.faa  -seg no  -word_size 2  -evalue 1E-20  -xdrop_gap_final 1000
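
Since each PhyloM MSA file can serve as a query, the same search can be repeated over a whole set of markers. The following is only a minimal sketch and is not part of PhyloM: the function name search_markers, the assumption that the MSA files sit in one directory with the .faa extension, and the naming of the result files are all illustrative choices. The psiblast options are those of the model above, with the standard BLAST+ options -outfmt 6 and -out added to write one tabular file per marker.

```shell
#!/bin/sh
# Hypothetical helper (not part of PhyloM): run the psiblast model above
# for every MSA file found in a directory.
#   $1 = directory holding the PhyloM MSA files (assumed to be named *.faa)
#   $2 = formatted amino acid databank (e.g. cds.faa)
#   $3 = directory receiving one tabular result file per marker
search_markers() {
  msa_dir=$1; db=$2; out_dir=$3
  mkdir -p "$out_dir" || return 1
  for msa in "$msa_dir"/*.faa; do
    [ -e "$msa" ] || continue          # the pattern matched no file
    marker=$(basename "$msa" .faa)     # marker name = file name without .faa
    psiblast -in_msa "$msa" -db "$db" -seg no -word_size 2 \
             -evalue 1E-20 -xdrop_gap_final 1000 \
             -outfmt 6 -out "$out_dir/$marker.tsv"
  done
}

# Example call (directory names are placeholders):
#   search_markers PhyloM_bacteria cds.faa hits
```

An empty result file then simply means that no sequence of the databank matched the corresponding marker at the chosen E-value threshold.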

Using a PSSM for performing a BLAST search against a nucleotide sequence databank

Each of the PhyloM PSSM files can be used as a query for performing a tblastn search with the BLAST+ tools (Camacho et al. 2009). Let seq.fna be a FASTA-formatted nucleotide sequence file (e.g. de novo assembly of a bacterial or viral genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in seq.fna  -dbtype nucl

Next, a PhyloM PSSM file pssm.smp can be used directly as a query for performing a BLAST search, following this Linux command line model:

 tblastn  -in_pssm pssm.smp  -db seq.fna  -seg no  -word_size 2  -evalue 1E-20  -xdrop_gap_final 1000

Of note, the corresponding full CDS can be easily extracted by using the program eFASTA along with the fields 2, 9 and 10 output by the tblastn option -outfmt 6.
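
In the default -outfmt 6 column order, field 2 is the subject sequence identifier and fields 9 and 10 are the start and end positions of the hit on the subject sequence. As a hedged illustration of how these fields could be collected, the awk filter below prints them for each hit, swapping the positions when the hit lies on the reverse strand (reported by tblastn with start greater than end). The function name hit_coords and the four-column output are assumptions made for this sketch; the input expected by eFASTA is not described here, so the output may need adapting.

```shell
#!/bin/sh
# Hypothetical helper: read tblastn tabular output (-outfmt 6, default columns)
# on standard input and print, for each hit:
#   subject id <TAB> start <TAB> end <TAB> strand
# Reverse-strand hits (start > end) are swapped so that start <= end.
hit_coords() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
       NF >= 10 {
         s = $9 + 0; e = $10 + 0; strand = "+"
         if (s > e) { t = s; s = e; e = t; strand = "-" }
         print $2, s, e, strand
       }'
}

# Example call, reusing the tblastn model above:
#   tblastn -in_pssm pssm.smp -db seq.fna -seg no -word_size 2 \
#           -evalue 1E-20 -xdrop_gap_final 1000 -outfmt 6 | hit_coords
```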


Literature cited

Bratlie MS, Johansen J, Drablos F (2010) Relationship between operon preference and functional properties of persistent genes in bacterial genomes. BMC Genomics, 11:71. doi:10.1186/1471-2164-11-71

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10:421. doi:10.1186/1471-2105-10-421

Creevey CJ, Doerks T, Fitzpatrick DA, Raes J, Bork P (2011) Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS ONE, 6(8):e22099. doi:10.1371/journal.pone.0022099

Guglielmini J, Woo A, Krupovic M, Forterre P, Gaia M (2018) Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. bioRxiv. doi:10.1101/455816

Wu D, Jospin G, Eisen JA (2013) Systematic identification of gene families for use as "markers" for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS ONE, 8(10):e77033. doi:10.1371/journal.pone.0077033