PhyloM is a selection of phylogenetic markers that are well suited to phylogenetic tree inference. These markers are recommended for phylogenetic reconstruction because they have been shown to correspond to conserved genes within specific phyla. For each phylogenetic marker, a reference multiple amino acid sequence alignment (MSA) and the associated position-specific scoring matrix (PSSM) are available for performing BLAST sequence similarity searches.


PhyloM: Bacillaceae
  91 markers for phylogenetic analyses of Bacillaceae taxa (Wu et al. 2013, Patel and Gupta 2020, Gupta et al. 2020)

PhyloM: bacteria
  74 markers for phylogenetic analyses within any bacterial phylum (derived from the meta-analysis of 10 different marker sets)

  8 markers for phylogenetic analyses within Nucleocytoplasmic large DNA virus phyla (Guglielmini et al. 2018)


Using an MSA for performing a BLAST search against an amino acid sequence databank

Each of the PhyloM MSA files can be used as a query for performing a psiblast search using the BLAST+ tools (Camacho et al. 2009). Let cds.faa be a FASTA-formatted amino acid sequence file (e.g. every CDS from a bacterial or virus genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in cds.faa  -dbtype prot

Next, a PhyloM MSA file msa.faa can be used directly as a query for performing a BLAST search with the following Linux command line model:

 psiblast  -in_msa msa.faa  -db cds.faa  -seg no  -word_size 2  -evalue 0.05  -xdrop_gap_final 1000
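
When only one candidate sequence per marker is wanted, the report can be reduced to its top-scoring hit. The following is a minimal sketch, not part of PhyloM: the function name best_hit and the file name hits.tsv are hypothetical, and it assumes the search was run with the additional options -outfmt 6 -out hits.tsv, so that the bit score is in column 12 of the default tabular layout.

```shell
# Hypothetical helper: read -outfmt 6 lines on stdin and print the single
# line with the highest bit score (column 12); the first one wins on ties.
best_hit() {
  awk -F '\t' 'NR == 1 || $12 + 0 > best { best = $12 + 0; line = $0 }
               END { if (NR > 0) print line }'
}

# Usage (assumes hits.tsv was written with -outfmt 6 -out hits.tsv):
#   best_hit < hits.tsv
```

Keeping the whole line rather than only the subject identifier leaves the e-value and alignment coordinates available for later checks.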

Using a PSSM for performing a BLAST search against a nucleotide sequence databank

Each of the PhyloM PSSM files can be used as a query for performing a tblastn search using the BLAST+ tools (Camacho et al. 2009). Let seq.fna be a FASTA-formatted nucleotide sequence file (e.g. de novo assembly of a bacterial or virus genome). This databank should first be formatted with the following Linux command line:

 makeblastdb  -in seq.fna  -dbtype nucl

Next, a PhyloM PSSM file pssm.smp can be used directly as a query for performing a BLAST search with the following Linux command line model:

 tblastn  -in_pssm pssm.smp  -db seq.fna  -seg no  -word_size 2  -evalue 0.05  -xdrop_gap_final 1000

Of note, the corresponding full CDS can be easily extracted by using the program eFASTA along with fields 2, 9 and 10 (subject sequence identifier, start and end of the alignment in the subject) output by tblastn with the option -outfmt 6. The tool eCDS can also be used to easily extract the full CDS associated with each tblastn hit.
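
As an illustration of how these three fields can be obtained, here is a minimal sketch based on awk. The function name hits_to_regions and the file name hits.tsv are hypothetical. The sketch assumes only the default -outfmt 6 column order and that hits on the reverse strand are reported with a start greater than the end. It does not reproduce what eFASTA or eCDS do, and the regions it prints cover the aligned segment, not necessarily the full CDS.

```shell
# Hypothetical helper: read tblastn -outfmt 6 lines on stdin and print
# "sseqid <TAB> start <TAB> end <TAB> strand", with start <= end.
hits_to_regions() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         s = $9 + 0; e = $10 + 0
         if (s <= e) print $2, s, e, "+"
         else        print $2, e, s, "-"
       }'
}

# Usage (assumes hits.tsv was written by tblastn with -outfmt 6 -out hits.tsv):
#   hits_to_regions < hits.tsv
```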

