Institut Pasteur | DBC | Bioinformatics and Biostatistics Hub | GIPhy


PhyloM


Description

PhyloM is a collection of phylogenetic markers that are well suited for phylogenetic tree inference. These markers are recommended for phylogenetic reconstruction because they have been shown to correspond to genes conserved within specific phyla. For each phylogenetic marker, a reference multiple amino acid sequence alignment (MSA) and the associated position-specific scoring matrix (PSSM) are available for performing BLAST sequence similarity searches.


Phyla

PhyloM: Bacillaceae
  91 markers for phylogenetic analyses of Bacillaceae taxa (Wu et al. 2013, Patel and Gupta 2020, Gupta et al. 2020)

PhyloM: bacteria
  236 markers for phylogenetic analyses within bacterial phyla (Bratlie et al. 2010, Creevey et al. 2011, Wu et al. 2013, Parks et al. 2017)

PhyloM: NCLDV
  8 markers for phylogenetic analyses within nucleocytoplasmic large DNA virus (NCLDV) phyla (Guglielmini et al. 2019)


Usage

Using an MSA for performing a BLAST search against an amino acid sequence databank

Each PhyloM MSA file can be used as a query for a psiblast search with the BLAST+ tools (Camacho et al. 2009). Let cds.faa be a FASTA-formatted amino acid sequence file (e.g. every CDS from a bacterial or viral genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in cds.faa  -dbtype prot

Next, a PhyloM MSA file msa.faa can be used directly as a query for a BLAST search, following this Linux command line model:

 psiblast  -in_msa msa.faa  -db cds.faa  -seg no  -word_size 2  -evalue 0.05  -xdrop_gap_final 1000
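When several markers are searched against the same databank, the command line model above can be wrapped in a loop. The following is only a minimal sketch under stated assumptions: the function name search_markers, the directory layout (one MSA per file, named with the .faa extension) and the output file names are hypothetical and are not part of PhyloM. The -outfmt 6 and -out options are added here only to write one tabular result file per marker.

```shell
# Hypothetical helper: run the psiblast model once per MSA file.
# Assumes the databank was already formatted with makeblastdb.
search_markers() {
  msa_dir=$1   # directory holding the MSA files (assumed pattern: *.faa)
  db=$2        # formatted amino acid databank, e.g. cds.faa
  out_dir=$3   # directory receiving one tabular result file per marker
  mkdir -p "$out_dir" || return 1
  for msa in "$msa_dir"/*.faa; do
    [ -e "$msa" ] || continue          # skip when the pattern matches nothing
    marker=$(basename "$msa" .faa)     # marker name taken from the file name
    psiblast -in_msa "$msa" -db "$db" -seg no -word_size 2 \
             -evalue 0.05 -xdrop_gap_final 1000 \
             -outfmt 6 -out "$out_dir/$marker.tsv"
  done
}

# Example call (all three paths are placeholders):
# search_markers markers cds.faa hits
```

Markers without any hit in the databank are expected to yield an empty result file, which makes absent markers easy to list afterwards.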

Using a PSSM for performing a BLAST search against a nucleotide sequence databank

Each PhyloM PSSM file can be used as a query for a tblastn search with the BLAST+ tools (Camacho et al. 2009). Let seq.fna be a FASTA-formatted nucleotide sequence file (e.g. a de novo assembly of a bacterial or viral genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in seq.fna  -dbtype nucl

Next, a PhyloM PSSM file pssm.smp can be used directly as a query for a BLAST search, following this Linux command line model:

 tblastn  -in_pssm pssm.smp  -db seq.fna  -seg no  -word_size 2  -evalue 0.05  -xdrop_gap_final 1000

Of note, the corresponding full CDS can be easily extracted with the program eFASTA, using fields 2, 9 and 10 of the tabular output produced by the tblastn option -outfmt 6. The tool eCDS can also be used to extract the full CDS associated with each tblastn hit.
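With the default -outfmt 6 columns, fields 2, 9 and 10 are the subject sequence identifier, the subject start and the subject end; for hits on the reverse strand, tblastn reports a start greater than the end. As an illustration only, the hypothetical helper below (hit_coords is not part of PhyloM, eFASTA or eCDS) reorders these three fields so that the leftmost position always comes first and appends the strand. The input format expected by eFASTA is not described here, so the output layout is an assumption to be adapted.

```shell
# Hypothetical helper: read tblastn -outfmt 6 lines on stdin and print,
# per hit: subject id, leftmost position, rightmost position, strand.
hit_coords() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      # $2 = subject id, $9 = subject start, $10 = subject end
      if ($9 + 0 <= $10 + 0) print $2, $9, $10, "+"
      else                   print $2, $10, $9, "-"
    }'
}

# Example pipeline (same options as the tblastn model, plus -outfmt 6):
# tblastn -in_pssm pssm.smp -db seq.fna -seg no -word_size 2 \
#         -evalue 0.05 -xdrop_gap_final 1000 -outfmt 6 | hit_coords
```

The coordinates delimit the aligned region of each hit only; recovering the full CDS still requires a dedicated tool such as those mentioned above.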


Literature cited

Bratlie MS, Johansen J, Drablos F (2010) Relationship between operon preference and functional properties of persistent genes in bacterial genomes. BMC Genomics, 11:71. doi:10.1186/1471-2164-11-71

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10:421. doi:10.1186/1471-2105-10-421

Creevey CJ, Doerks T, Fitzpatrick DA, Raes J, Bork P (2011) Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS ONE, 6(8):e22099. doi:10.1371/journal.pone.0022099

Guglielmini J, Woo A, Krupovic M, Forterre P, Gaia M (2019) Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proceedings of the National Academy of Sciences, 116(39):19585-19592. doi:10.1073/pnas.1912006116

Gupta RS, Patel S, Saini N, Chen S (2020) Robust demarcation of 17 distinct Bacillus species clades, proposed as novel Bacillaceae genera, by phylogenomics and comparative genomic analyses: description of Robertmurraya kyonggiensis sp. nov. and proposal for an emended genus Bacillus limiting it only to the members of the Subtilis and Cereus clades of species. International Journal of Systematic and Evolutionary Microbiology, 70(11):5753-5798. doi:10.1099/ijsem.0.004475

Parks DH, Rinke C, Chuvochina M, Chaumeil P-A, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW (2017) Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology, 2:1533-1542. doi:10.1038/s41564-017-0012-7

Patel S, Gupta RS (2020) A phylogenomic and comparative genomic framework for resolving the polyphyly of the genus Bacillus: Proposal for six new genera of Bacillus species, Peribacillus gen. nov., Cytobacillus gen. nov., Mesobacillus gen. nov., Neobacillus gen. nov., Metabacillus gen. nov. and Alkalihalobacillus gen. nov. International Journal of Systematic and Evolutionary Microbiology, 70:406-438. doi:10.1099/ijsem.0.003775

Wu D, Jospin G, Eisen JA (2013) Systematic identification of gene families for use as "markers" for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS ONE, 8(10):e77033. doi:10.1371/journal.pone.0077033