PhyloM is a selection of phylogenetic markers that are well-suited for phylogenetic tree inference.
These selected markers are recommended for phylogenetic reconstruction because they have been shown to correspond to conserved genes within specific phyla.
For each phylogenetic marker, a reference multiple amino acid sequence alignment (MSA) and the associated position-specific scoring matrix (PSSM) are available for performing BLAST sequence similarity searches.
PhyloM: Bacillaceae
91 markers for phylogenetic analyses of Bacillaceae taxa (Wu et al. 2013, Patel and Gupta 2020, Gupta et al. 2020)
PhyloM: bacteria
74 markers for phylogenetic analyses within any bacterial phylum (derived from the meta-analysis of 10 different marker sets)
PhyloM: NCLDV
8 markers for phylogenetic analyses within Nucleocytoplasmic large DNA virus phyla (Guglielmini et al. 2019)
Each of the PhyloM MSA files can be used as a query for performing a psiblast search using the BLAST+ tools (Camacho et al. 2009).
Let cds.faa be a FASTA-formatted amino acid sequence file (e.g. every CDS from a bacterial or virus genome).
This databank should first be formatted with the following Linux command line:
makeblastdb -in cds.faa -dbtype prot
Next, a PhyloM MSA file msa.faa can be directly used as a query for performing a BLAST search with the following Linux command line model:
psiblast -in_msa msa.faa -db cds.faa -seg no -word_size 2 -evalue 0.05 -xdrop_gap_final 1000
Each of the PhyloM PSSM files can be used as a query for performing a tblastn search using the BLAST+ tools (Camacho et al. 2009).
Let seq.fna be a FASTA-formatted nucleotide sequence file (e.g. de novo assembly of a bacterial or virus genome).
This databank should first be formatted with the following Linux command line:
makeblastdb -in seq.fna -dbtype nucl
Next, a PhyloM PSSM file pssm.smp can be directly used as a query for performing a BLAST search with the following Linux command line model:
tblastn -in_pssm pssm.smp -db seq.fna -seg no -word_size 2 -evalue 0.05 -xdrop_gap_final 1000
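When several markers are searched against the same assembly, the command model above can be wrapped in a loop. The following is only a minimal sketch: the function name search_markers, the assumption that all PSSM files sit in one directory with the .smp extension, and the per-marker .tsv output files are illustrative choices and are not part of PhyloM. The -outfmt 6 option is added so that each result is written as a tab-separated table.

```shell
#!/bin/sh
# Sketch: run tblastn once per PSSM file and save the tabular hits per marker.
# Assumptions (hypothetical, not from PhyloM): PSSM files end in .smp and are
# stored in a single directory; the database was already built with makeblastdb.
search_markers() {
  pssm_dir=$1   # directory holding the *.smp files
  db=$2         # formatted nucleotide database, e.g. seq.fna
  out_dir=$3    # directory receiving one hit table per marker
  mkdir -p "$out_dir"
  for pssm in "$pssm_dir"/*.smp; do
    [ -e "$pssm" ] || continue            # skip when the glob matches nothing
    marker=$(basename "$pssm" .smp)       # marker name taken from the file name
    tblastn -in_pssm "$pssm" -db "$db" -seg no -word_size 2 \
            -evalue 0.05 -xdrop_gap_final 1000 -outfmt 6 \
            > "$out_dir/$marker.tsv"
  done
}
```

A call such as search_markers pssm_dir seq.fna hits would then leave one file per marker in the hits directory.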
Of note, the corresponding full CDS can be easily extracted by using the program eFASTA along with the fields 2, 9 and 10 output by the tblastn option -outfmt 6.
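In the default -outfmt 6 table, fields 2, 9 and 10 hold the subject sequence identifier and the start and end of the alignment on the subject. For tblastn hits on the reverse strand the start is larger than the end, so it can help to normalise the coordinates first. The sketch below does only that; the function name hit_coords is hypothetical, and the way the resulting columns are passed to eFASTA is not shown because it depends on that program's own options.

```shell
#!/bin/sh
# Sketch: read tblastn -outfmt 6 lines on stdin and print, tab-separated,
# the subject id (field 2), the leftmost and rightmost aligned positions
# (fields 9 and 10, reordered if needed) and the inferred strand.
# Note: these are the coordinates of the aligned region, not of the full CDS.
hit_coords() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      if ($9 + 0 <= $10 + 0) print $2, $9, $10, "+"   # forward-strand hit
      else                   print $2, $10, $9, "-"   # reverse-strand hit
    }'
}
```

For example, hit_coords < hits/marker.tsv would list one line per hit, with marker.tsv standing for any tblastn result saved with -outfmt 6.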
The tool eCDS can also be used to extract the full CDS associated with each tblastn hit.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10:421. doi:10.1186/1471-2105-10-421
Guglielmini J, Woo A, Krupovic M, Forterre P, Gaia M (2019) Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proceedings of the National Academy of Sciences, 116(39):19585-19592. doi:10.1073/pnas.1912006116
Gupta RS, Patel S, Saini N, Chen S (2020) Robust demarcation of 17 distinct Bacillus species clades, proposed as novel Bacillaceae genera, by phylogenomics and comparative genomic analyses: description of Robertmurraya kyonggiensis sp. nov. and proposal for an emended genus Bacillus limiting it only to the members of the Subtilis and Cereus clades of species. International Journal of Systematic and Evolutionary Microbiology, 70(11):5753-5798. doi:10.1099/ijsem.0.004475
Patel S, Gupta RS (2020) A phylogenomic and comparative genomic framework for resolving the polyphyly of the genus Bacillus: Proposal for six new genera of Bacillus species, Peribacillus gen. nov., Cytobacillus gen. nov., Mesobacillus gen. nov., Neobacillus gen. nov., Metabacillus gen. nov. and Alkalihalobacillus gen. nov. International Journal of Systematic and Evolutionary Microbiology, 70:406-438. doi:10.1099/ijsem.0.003775
Wu D, Jospin G, Eisen JA (2013) Systematic identification of gene families for use as "markers" for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One, 8(10):e77033. doi:10.1371/journal.pone.0077033