PhyloM: bacteria is a selection of 74 universal single-copy genes (USCG) that can be used as well-suited markers for phylogenetic tree inference of bacterial taxa. These selected markers are recommended for phylogenetic reconstruction, as they have been shown to correspond to persistent genes within bacterial phyla (close to universal distribution). This collection is derived from the meta-analysis of ten previously published sets of USCG that are also available from this webpage.
For each phylogenetic marker, a standard gene name is provided, together with three different files:
multiple amino acid sequence alignments (MSA),
hidden Markov models (HMM),
position specific scoring matrices (PSSM).
These files were gathered from reference databanks (when available):
• COG (Tatusov et al. 1997, 2003; Galperin et al. 2015, 2021),
• Pfam (Sonnhammer et al. 1997, 1998; Finn et al. 2016),
• TIGRFAMs (Haft et al. 2001, 2003, 2013).
For each marker, selected MSA, HMM and PSSM files (among the COG, Pfam and TIGRFAMs ones) are also provided (labelled PMB).
[last update: 24.10.31]
For each compiled marker set, this section provides the list of the corresponding gene names, together with three selected (PMB) data files (contained in tar.gz archives):
multiple amino acid sequence alignments (MSA),
hidden Markov models (HMM),
position specific scoring matrices (PSSM).
This small set of 31 genes was elaborated to infer a global phylogenetic tree of archaea, bacteria and eukaryotes.
This dataset was also used by Sorek et al. (2007).
marker set | no. loci | gene names | MSA | HMM | PSSM |
Cic31 | 31 |
These 40 genes were originally compiled to study horizontal gene transfers within bacteria.
The same marker set was also used by Sunagawa et al. (2013), and by Mende et al. (2013) in their tool specI.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Cre40 | 40 |
Originally designed to assess the completeness of different metagenomes, these 107 phylogenetic markers were also used by Ankenbrand and Keller (2016) in their phylogenetic tree reconstruction tool bcgTree.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Dup107 | 107 |
These 114 phylogenetic markers comprise 40 genes spanning the archaeal and bacterial domains, together with 74 bacteria-specific ones.
Before the development of the PhyEco set, the same research group had previously described a limited set of 31 USCG for the phylogenetic reconstruction tool AMPHORA (Wu and Eisen 2008; see also Wu and Scott 2012).
marker set | no. loci | gene names | MSA | HMM | PSSM |
PhyEco | 114 |
This limited set of 36 genes was designed for the metagenomic data binning tool CONCOCT.
The same USCG set was used by Quince et al. (2017) for the alternative tool DESMAN.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Aln36 | 36 |
These 73 phylogenetic markers were gathered to quantify the level of overall relatedness between prokaryote genomes.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Lan73 | 73 |
First compiled to classify genome assemblies derived from metagenomic data, this set of 120 markers, named bac120 by Parks et al. (2017), was next used to infer phylogenetic trees in order to assess and revise bacterial species classifications (Parks et al. 2018, 2020, 2022).
marker set | no. loci | gene names | MSA | HMM | PSSM |
bac120 | 120 |
This USCG set was developed to infer a bacterial phylogenetic tree and discuss its putative rooting. Of note, Coleman et al. (2021) described 62 loci, but the published list contains a duplicated one (i.e. radA: K04485; see Supplementary Table S1); as a consequence, only 61 loci are compiled in the present marker set.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Col61 | 61 |
This revision of a first USCG set (UBCG: 92 genes; Na et al. 2018) was computed using two other marker sets (i.e. Dup107 and bac120), leading to an updated list of 81 genes.
marker set | no. loci | gene names | MSA | HMM | PSSM |
UBCG2 | 81 |
These 85 genes are derived from a reanalysis of the three marker sets Dup107, bac120 and UBCG2. Tian and Imanian (2023) also described a smaller subset of 20 genes (i.e. VBCG) that was not considered here.
marker set | no. loci | gene names | MSA | HMM | PSSM |
Tia85 | 85 |
The ten marker sets above correspond to a total of 194 putative USCG, each occurring in a varying number of these sets (see the zoomable upset plot below). This discrepancy stems from the number and diversity of the genomes considered in each analysis, as well as from the disparate criteria used to define a USCG across the different works. The PhyloM set contains the 74 genes that were assessed as a USCG in at least 50% of the previous studies.
marker set | no. loci | gene names | MSA | HMM | PSSM |
PhyloM | 74 |
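The 50% rule described above can be illustrated with a small shell sketch. It assumes that set membership is available as plain `SET GENE` lines (one pair per line); the function name `select_majority` and the toy sets A to D are hypothetical, and the actual PhyloM selection may have involved curation steps that are not described here.

```shell
# select_majority NSETS
# Reads "SET GENE" pairs on stdin and prints the genes found in at least
# half of the NSETS marker sets (hypothetical helper, not part of PhyloM).
select_majority() {
  local nsets="$1"
  # sort -u drops repeated pairs so that a gene is counted once per set
  sort -u | awk -v n="$nsets" '
    { count[$2]++ }
    END { for (g in count) if (2 * count[g] >= n) print g }
  ' | sort
}

# Toy membership data, invented for illustration (4 hypothetical sets A-D).
toy='A geneX
A geneY
B geneX
C geneX
C geneZ
D geneX
D geneY'

# geneX occurs in 4 sets, geneY in 2, geneZ in 1: only geneX and geneY pass.
printf '%s\n' "$toy" | select_majority 4
```

With the ten marker sets of this page, the same call would use `select_majority 10`, i.e. a gene would need to occur in at least five sets.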
The following table itemizes the overall 194 phylogenetic markers.
Each marker is labelled by a common gene name.
The corresponding coding sequences (CDS) from the Escherichia coli strain K12 substr. MG1655 genome (GenBank accn: NC_000913) are also indicated (if any).
For each gene, the corresponding COG, Pfam and TIGR identifiers are given (if any), together with the associated MSA, HMM and PSSM files.
The column PMB lists a selection of recommended MSA, HMM and PSSM files for each gene.
Presence/absence of each phylogenetic marker is ticked in columns Cic31, Cre40, Dup107, PhyEco, Aln36, Lan73, bac120, Col61, UBCG2, Tia85 and PhyloM.
name | E. coli CDS | COG | Pfam | TIGRFAMs | PMB | Cic31 | Cre40 | Dup107 | PhyEco | Aln36 | Lan73 | bac120 | Col61 | UBCG2 | Tia85 | PhyloM |
alaS | NP_417177 | COG0013 | pfam01411 | TIGR00344 | PMB051 | |||||||||||
arfB | NP_414733 | COG1186 | pfam00472 | TIGR00020 | PMB086 | |||||||||||
argS | NP_416390 | COG0018 | pfam00750 | TIGR00456 | PMB075 | |||||||||||
aspS | NP_416380 | COG0173 | pfam00152 | TIGR00459 | PMB106 | |||||||||||
atpD | NP_418188 | COG0055 | pfam00006 | TIGR01039 | PMB107 | |||||||||||
atpG | NP_418189 | COG0224 | pfam00231 | TIGR01146 | PMB135 | |||||||||||
cdsA | NP_414717 | COG0575 | pfam01148 | | PMB136 | |||||||||||
cgtA | NP_417650 | COG0536 | pfam01018 | TIGR02729 | PMB052 | |||||||||||
clpP | NP_414971 | COG0740 | pfam00574 | TIGR00493 | PMB137 | |||||||||||
clpX | NP_414972 | COG1219 | pfam07724 | TIGR00382 | PMB108 | |||||||||||
coaD | NP_418091 | COG0669 | pfam01467 | TIGR01510 | PMB138 | |||||||||||
coaE | NP_414645 | COG0237 | pfam01121 | TIGR00152 | PMB076 | |||||||||||
cysS | NP_415059 | COG0215 | pfam01406 | TIGR00435 | PMB087 | |||||||||||
der | NP_417006 | COG1160 | | TIGR03594 | PMB053 | |||||||||||
dnaA | NP_418157 | COG0593 | pfam00308 | TIGR00362 | PMB077 | |||||||||||
dnaB | NP_418476 | COG0305 | pfam03796 | TIGR00665 | PMB139 | |||||||||||
dnaE | NP_414726 | COG0587 | pfam07733 | TIGR00594 | PMB140 | |||||||||||
dnaG | NP_417538 | COG0358 | pfam08275 | TIGR01391 | PMB054 | |||||||||||
dnaK | NP_414555 | COG0443 | pfam00012 | TIGR02350 | PMB109 | |||||||||||
dnaN | NP_418156 | COG0592 | pfam02767 | TIGR00663 | PMB088 | |||||||||||
dnaX | NP_415003 | COG2812 | pfam13177 | TIGR02397 | PMB089 | |||||||||||
dxr | NP_414715 | COG0743 | pfam02670 | TIGR00243 | PMB141 | |||||||||||
efp | NP_418571 | COG0231 | pfam09285 | TIGR00038 | PMB142 | |||||||||||
era | NP_417061 | COG1159 | | TIGR00436 | PMB090 | |||||||||||
exoIX | - | COG0258 | pfam02739 | | PMB143 | |||||||||||
ffh | NP_417101 | COG0541 | pfam00448 | TIGR00959 | PMB021 | |||||||||||
fmt | NP_417746 | COG0223 | pfam00551 | TIGR00460 | PMB078 | |||||||||||
frr | NP_414714 | COG0233 | pfam01765 | TIGR00496 | PMB041 | |||||||||||
ftsA | NP_414636 | COG0849 | pfam14450 | TIGR01174 | PMB110 | |||||||||||
ftsY | NP_417921 | COG0552 | pfam00448 | TIGR00064 | PMB034 | |||||||||||
ftsZ | NP_414637 | COG0206 | pfam12327 | TIGR00065 | PMB144 | |||||||||||
fusA | NP_417799 | COG0480 | | TIGR00484 | PMB111 | |||||||||||
gatA | - | COG0154 | pfam01425 | TIGR00132 | PMB145 | |||||||||||
gidA | NP_418197 | COG0445 | pfam01134 | TIGR00136 | PMB146 | |||||||||||
glnS | NP_416899 | COG0008 | pfam00749 | TIGR00464 | PMB147 | |||||||||||
glyA | NP_417046 | COG0112 | pfam00464 | | PMB148 | |||||||||||
glyS | NP_418016 | COG0751 | pfam02092 | TIGR00211 | PMB112 | |||||||||||
gmk | NP_418105 | COG0194 | pfam00625 | TIGR03263 | PMB079 | |||||||||||
groEL | NP_418567 | COG0459 | pfam00118 | TIGR02348 | PMB113 | |||||||||||
grpE | NP_417104 | COG0576 | pfam01025 | | PMB114 | |||||||||||
guaB | NP_417003 | COG0516 | pfam00478 | TIGR01302 | PMB149 | |||||||||||
gyrA | NP_416734 | COG0188 | pfam00521 | TIGR01063 | PMB091 | |||||||||||
gyrB | YP_026241 | COG0187 | pfam00204 | TIGR01059 | PMB092 | |||||||||||
hemC | YP_026260 | COG0181 | pfam01379 | TIGR00212 | PMB150 | |||||||||||
hemE | NP_418425 | COG0407 | pfam01208 | TIGR01464 | PMB151 | |||||||||||
hemN | NP_418303 | COG0635 | pfam04055 | TIGR00538 | PMB152 | |||||||||||
hisS | NP_417009 | COG0124 | pfam13393 | TIGR00442 | PMB042 | |||||||||||
holA | NP_415173 | COG1466 | pfam06144 | TIGR01128 | PMB153 | |||||||||||
ileS | NP_414567 | COG0060 | pfam00133 | TIGR00392 | PMB043 | |||||||||||
infA | NP_415404 | COG0361 | pfam01176 | TIGR00008 | PMB115 | |||||||||||
infB | NP_417637 | COG0532 | pfam11987 | TIGR00487 | PMB035 | |||||||||||
infC | NP_416233 | COG0290 | pfam00707 | TIGR00168 | PMB055 | |||||||||||
ispF | NP_417226 | COG0245 | pfam02542 | TIGR00151 | PMB154 | |||||||||||
lepA | NP_417064 | COG0481 | pfam06421 | TIGR01393 | PMB056 | |||||||||||
leuS | NP_415175 | COG0495 | pfam13603 | TIGR00396 | PMB022 | |||||||||||
ligA | NP_416906 | COG0272 | pfam01653 | TIGR00575 | PMB093 | |||||||||||
manB | - | COG1109 | pfam02878 | | PMB155 | |||||||||||
map | NP_414710 | COG0024 | pfam00557 | TIGR00500 | PMB156 | |||||||||||
metG | NP_416617 | COG0143 | pfam09334 | TIGR00398 | PMB116 | |||||||||||
mfd | NP_415632 | COG1197 | | TIGR00580 | PMB117 | |||||||||||
miaA | NP_418592 | COG0324 | pfam01715 | TIGR00174 | PMB157 | |||||||||||
mraY | NP_414629 | COG0472 | pfam00953 | TIGR00445 | PMB158 | |||||||||||
mreC | NP_417716 | COG1792 | pfam04085 | TIGR00219 | PMB159 | |||||||||||
murB | NP_418403 | COG0812 | pfam02873 | TIGR00179 | PMB160 | |||||||||||
murC | NP_414633 | COG0773 | | TIGR01082 | PMB161 | |||||||||||
murD | NP_414630 | COG0771 | | TIGR01087 | PMB118 | |||||||||||
murE | NP_414627 | COG0769 | | TIGR01085 | PMB162 | |||||||||||
murG | NP_414632 | COG0707 | pfam03033 | TIGR01133 | PMB163 | |||||||||||
mutL | NP_418591 | COG0323 | pfam08676 | TIGR00585 | PMB164 | |||||||||||
mutS | NP_417213 | COG0249 | pfam00488 | TIGR01070 | PMB165 | |||||||||||
nrdA | NP_416737 | COG0209 | pfam02867 | TIGR02506 | PMB166 | |||||||||||
nrdR | NP_414947 | COG1327 | pfam03477 | TIGR00244 | PMB167 | |||||||||||
nusA | NP_417638 | COG0195 | pfam08529 | TIGR01953 | PMB057 | |||||||||||
nusB | NP_414950 | COG0781 | pfam01029 | TIGR01951 | PMB168 | |||||||||||
nusG | NP_418409 | COG0250 | pfam02357 | TIGR00922 | PMB044 | |||||||||||
pepP | - | COG0006 | pfam01321 | | PMB169 | |||||||||||
pgk | NP_417401 | COG0126 | pfam00162 | | PMB094 | |||||||||||
pheS | NP_416229 | COG0016 | pfam01409 | TIGR00468 | PMB001 | |||||||||||
pheT | NP_416228 | COG0072 | pfam17759 | TIGR00472 | PMB036 | |||||||||||
plsX | NP_415608 | COG0416 | pfam02504 | TIGR00182 | PMB170 | |||||||||||
pnp | NP_417633 | COG1185 | pfam01138 | TIGR03591 | PMB095 | |||||||||||
prfA | NP_415729 | COG0216 | pfam03462 | TIGR00019 | PMB045 | |||||||||||
prfB | NP_418300 | COG0749 | pfam00476 | TIGR00593 | PMB096 | |||||||||||
priA | NP_418370 | COG1198 | pfam17764 | TIGR00595 | PMB171 | |||||||||||
proS | NP_414736 | COG0442 | pfam04073 | TIGR00409 | PMB119 | |||||||||||
pth | NP_415722 | COG0193 | pfam01195 | TIGR00447 | PMB172 | |||||||||||
purB | NP_415649 | COG0015 | pfam00206 | TIGR00928 | PMB173 | |||||||||||
purM | NP_416994 | COG0150 | pfam02769 | TIGR00878 | PMB174 | |||||||||||
pyrG | NP_417260 | COG0504 | pfam06418 | TIGR00337 | PMB058 | |||||||||||
pyrH | NP_414713 | COG0528 | pfam00696 | TIGR02075 | PMB120 | |||||||||||
radA | NP_418806 | COG1066 | | TIGR00416 | PMB097 | |||||||||||
rbfA | NP_417636 | COG0858 | pfam02033 | TIGR00082 | PMB098 | |||||||||||
recA | NP_417179 | COG0468 | pfam00154 | TIGR02012 | PMB059 | |||||||||||
recF | NP_418155 | COG1195 | | TIGR00611 | PMB175 | |||||||||||
recG | NP_418109 | COG1200 | pfam00270 | TIGR00643 | PMB176 | |||||||||||
recN | YP_026172 | COG0497 | pfam13476 | TIGR00634 | PMB121 | |||||||||||
recR | NP_415005 | COG0353 | pfam13662 | TIGR00615 | PMB099 | |||||||||||
ribF | NP_414566 | COG0196 | pfam06574 | TIGR00083 | PMB122 | |||||||||||
rimM | NP_417099 | COG0806 | pfam01782 | TIGR02273 | PMB123 | |||||||||||
rimP | NP_417639 | COG0779 | pfam02576 | | PMB177 | |||||||||||
rlmB | NP_418601 | COG0566 | pfam00588 | TIGR00186 | PMB178 | |||||||||||
rnc | NP_417062 | COG0571 | pfam14622 | TIGR02191 | PMB100 | |||||||||||
rnhB | NP_414725 | COG0164 | pfam01351 | | PMB124 | |||||||||||
rpL1 | NP_418411 | COG0081 | pfam00687 | TIGR01169 | PMB002 | |||||||||||
rpL2 | NP_417776 | COG0090 | pfam03947 | TIGR01171 | PMB014 | |||||||||||
rpL3 | NP_417779 | COG0087 | pfam00297 | TIGR03625 | PMB003 | |||||||||||
rpL4 | NP_417778 | COG0088 | pfam00573 | TIGR03953 | PMB023 | |||||||||||
rpL5 | NP_417767 | COG0094 | pfam00673 | | PMB015 | |||||||||||
rpL6 | NP_417764 | COG0097 | pfam00347 | TIGR03654 | PMB004 | |||||||||||
rpL7L12 | NP_418413 | COG0222 | pfam00542 | TIGR00855 | PMB060 | |||||||||||
rpL9 | NP_418624 | COG0359 | pfam03948 | TIGR00158 | PMB061 | |||||||||||
rpL10 | NP_418412 | COG0244 | pfam00466 | | PMB037 | |||||||||||
rpL11 | NP_418410 | COG0080 | pfam00298 | TIGR01632 | PMB005 | |||||||||||
rpL13 | NP_417698 | COG0102 | pfam00572 | TIGR01066 | PMB006 | |||||||||||
rpL14 | NP_417769 | COG0093 | pfam00238 | TIGR01067 | PMB016 | |||||||||||
rpL15 | NP_417760 | COG0200 | pfam00828 | TIGR01071 | PMB024 | |||||||||||
rpL16 | NP_417772 | COG0197 | pfam00252 | TIGR01164 | PMB007 | |||||||||||
rpL17 | NP_417753 | COG0203 | pfam01196 | TIGR00059 | PMB062 | |||||||||||
rpL18 | NP_417763 | COG0256 | pfam00861 | TIGR00060 | PMB025 | |||||||||||
rpL19 | NP_417097 | COG0335 | pfam01245 | TIGR01024 | PMB080 | |||||||||||
rpL20 | NP_416231 | COG0292 | pfam00453 | TIGR01032 | PMB046 | |||||||||||
rpL21 | NP_417653 | COG0261 | pfam00829 | TIGR00061 | PMB063 | |||||||||||
rpL22 | NP_417774 | COG0091 | pfam00237 | TIGR01044 | PMB017 | |||||||||||
rpL23 | NP_417777 | COG0089 | pfam00276 | TIGR03636 | PMB064 | |||||||||||
rpL24 | NP_417768 | COG0198 | pfam17136 | TIGR01079 | PMB038 | |||||||||||
rpL25 | NP_416690 | COG1825 | pfam01386 | TIGR00731 | PMB179 | |||||||||||
rpL27 | NP_417652 | COG0211 | pfam01016 | TIGR00062 | PMB081 | |||||||||||
rpL28 | NP_418094 | COG0227 | pfam00830 | TIGR00009 | PMB180 | |||||||||||
rpL29 | NP_417771 | COG0255 | pfam00831 | TIGR00012 | PMB082 | |||||||||||
rpL32 | NP_415607 | COG0333 | pfam01783 | TIGR01031 | PMB125 | |||||||||||
rpL34 | NP_418158 | COG0230 | pfam00468 | TIGR01030 | PMB126 | |||||||||||
rpL35 | NP_416232 | COG0291 | pfam01632 | TIGR00001 | PMB083 | |||||||||||
rpoA | NP_417754 | COG0202 | pfam01193 | TIGR02027 | PMB026 | |||||||||||
rpoB | NP_418414 | COG0085 | pfam00562 | TIGR02013 | PMB027 | |||||||||||
rpoC | NP_418415 | COG0086 | pfam04997 | TIGR02386 | PMB065 | |||||||||||
rpS1 | NP_415431 | COG0539 | pfam00575 | TIGR00717 | PMB181 | |||||||||||
rpS2 | NP_414711 | COG0052 | pfam00318 | TIGR01011 | PMB008 | |||||||||||
rpS3 | NP_417773 | COG0092 | pfam00189 | TIGR01009 | PMB009 | |||||||||||
rpS4 | NP_417755 | COG0522 | pfam00163 | TIGR01017 | PMB039 | |||||||||||
rpS5 | NP_417762 | COG0098 | pfam03719 | TIGR01021 | PMB018 | |||||||||||
rpS6 | NP_418621 | COG0360 | pfam01250 | TIGR00166 | PMB066 | |||||||||||
rpS7 | NP_417800 | COG0049 | pfam00177 | TIGR01029 | PMB010 | |||||||||||
rpS8 | NP_417765 | COG0096 | pfam00410 | | PMB011 | |||||||||||
rpS9 | NP_417697 | COG0103 | pfam00380 | TIGR03627 | PMB012 | |||||||||||
rpS10 | NP_417780 | COG0051 | pfam00338 | TIGR01049 | PMB040 | |||||||||||
rpS11 | NP_417756 | COG0100 | pfam00411 | TIGR03632 | PMB019 | |||||||||||
rpS12 | NP_417801 | COG0048 | pfam00164 | TIGR00981 | PMB028 | |||||||||||
rpS13 | NP_417757 | COG0099 | pfam00416 | TIGR03631 | PMB029 | |||||||||||
rpS14 | NP_417766 | COG0199 | pfam00253 | | PMB182 | |||||||||||
rpS15 | NP_417634 | COG0184 | pfam00312 | TIGR00952 | PMB020 | |||||||||||
rpS16 | NP_417100 | COG0228 | pfam00886 | TIGR00002 | PMB084 | |||||||||||
rpS17 | NP_417770 | COG0186 | pfam00366 | TIGR03635 | PMB030 | |||||||||||
rpS18 | NP_418623 | COG0238 | pfam01084 | TIGR00165 | PMB101 | |||||||||||
rpS19 | NP_417775 | COG0185 | pfam00203 | TIGR01050 | PMB031 | |||||||||||
rpS20 | NP_414564 | COG0268 | pfam01649 | TIGR00029 | PMB067 | |||||||||||
rseP | NP_414718 | COG0750 | pfam02163 | TIGR00054 | PMB183 | |||||||||||
rsfS | NP_415170 | COG0799 | pfam02410 | TIGR00090 | PMB127 | |||||||||||
rsmA | NP_414593 | COG0030 | pfam00398 | TIGR00755 | PMB068 | |||||||||||
rsmD | NP_417922 | COG0742 | pfam03602 | TIGR00095 | PMB184 | |||||||||||
rsmG | NP_418196 | COG0357 | pfam02527 | TIGR00138 | PMB185 | |||||||||||
rsmH | NP_414624 | COG0275 | pfam01795 | TIGR00006 | PMB047 | |||||||||||
ruvA | NP_416375 | COG0632 | pfam01330 | TIGR00084 | PMB128 | |||||||||||
ruvB | NP_416374 | COG2255 | pfam05496 | TIGR00635 | PMB069 | |||||||||||
ruvC | NP_416377 | COG0817 | pfam02075 | TIGR00228 | PMB186 | |||||||||||
secA | NP_414640 | COG0653 | pfam07517 | TIGR00963 | PMB085 | |||||||||||
secE | NP_418408 | COG0690 | pfam00584 | TIGR00964 | PMB102 | |||||||||||
secG | NP_417642 | COG1314 | pfam03840 | TIGR00810 | PMB103 | |||||||||||
secY | NP_417759 | COG0201 | pfam00344 | TIGR00967 | PMB013 | |||||||||||
serS | NP_415413 | COG0172 | pfam00587 | TIGR00414 | PMB032 | |||||||||||
smpB | NP_417110 | COG0691 | pfam01668 | TIGR00086 | PMB048 | |||||||||||
thrS | NP_416234 | COG0441 | pfam00587 | TIGR00418 | PMB129 | |||||||||||
tig | NP_414970 | COG0544 | pfam05697 | TIGR00115 | PMB130 | |||||||||||
tilS | NP_414730 | COG0037 | pfam01171 | TIGR02432 | PMB070 | |||||||||||
tmk | NP_415616 | COG0125 | pfam02223 | TIGR00041 | PMB187 | |||||||||||
topA | NP_415790 | COG0550 | pfam01131 | TIGR01051 | PMB188 | |||||||||||
tpiA | NP_418354 | COG0149 | pfam00121 | TIGR00419 | PMB189 | |||||||||||
trmD | NP_417098 | COG0336 | pfam01746 | TIGR00088 | PMB071 | |||||||||||
trmU | NP_415651 | COG0482 | pfam03054 | TIGR00420 | PMB104 | |||||||||||
trpS | NP_417843 | COG0180 | pfam00579 | TIGR00233 | PMB190 | |||||||||||
truB | NP_417635 | COG0130 | pfam01509 | TIGR00431 | PMB049 | |||||||||||
trxA | NP_416699 | COG0526 | pfam08534 | TIGR00385 | PMB191 | |||||||||||
trxB | NP_415408 | COG0492 | pfam07992 | TIGR01292 | PMB192 | |||||||||||
tsaD | NP_417536 | COG0533 | pfam00814 | TIGR03723 | PMB050 | |||||||||||
tsf | NP_414712 | COG0264 | pfam00889 | TIGR00116 | PMB072 | |||||||||||
tufA | NP_417798 | COG0050 | | TIGR00485 | PMB131 | |||||||||||
typA | YP_026274 | COG1217 | | TIGR01394 | PMB132 | |||||||||||
tyrS | NP_416154 | COG0162 | pfam00579 | TIGR00234 | PMB133 | |||||||||||
uvrB | NP_415300 | COG0556 | pfam17757 | TIGR00631 | PMB105 | |||||||||||
uvrC | NP_416423 | COG0322 | pfam08459 | TIGR00194 | PMB193 | |||||||||||
valS | NP_418679 | COG0525 | pfam00133 | TIGR00422 | PMB073 | |||||||||||
ybeY | NP_415192 | COG0319 | pfam02130 | TIGR00043 | PMB074 | |||||||||||
ychF | NP_415721 | COG0012 | pfam06071 | TIGR00092 | PMB033 | |||||||||||
yeaZ | NP_416321 | COG1214 | pfam00814 | TIGR03725 | PMB194 | |||||||||||
yqgF | NP_417424 | COG0816 | pfam03652 | TIGR00250 | PMB134 |
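As a rough illustration of how rows of the table above might be processed from a plain-text copy, the sketch below extracts the COG, Pfam, TIGRFAMs and PMB identifiers of each row. Because some genes lack a Pfam or a TIGRFAMs identifier, it classifies each cell by its prefix rather than by its column position. The function name `row_ids` is hypothetical, and the pipe-separated input format is an assumption based on the rows shown here.

```shell
# row_ids: read pipe-separated table rows on stdin and print, for each row,
# the gene name and its COG, Pfam, TIGRFAMs and PMB identifiers as
# tab-separated columns, with "-" standing for a missing identifier.
# (Hypothetical helper; a header row would simply yield "-" everywhere.)
row_ids() {
  awk -F'|' '
    {
      name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name)
      cog = "-"; pfam = "-"; tigr = "-"; pmb = "-"
      for (i = 2; i <= NF; i++) {
        f = $i; gsub(/^[ \t]+|[ \t]+$/, "", f)
        # classify each cell by its identifier prefix
        if      (f ~ /^COG[0-9]+$/)  cog  = f
        else if (f ~ /^pfam[0-9]+$/) pfam = f
        else if (f ~ /^TIGR[0-9]+$/) tigr = f
        else if (f ~ /^PMB[0-9]+$/)  pmb  = f
      }
      print name "\t" cog "\t" pfam "\t" tigr "\t" pmb
    }'
}

# Two rows copied from the table: one complete, one without Pfam identifier.
printf '%s\n' \
  'alaS | NP_417177 | COG0013 | pfam01411 | TIGR00344 | PMB051 |' \
  'mfd | NP_415632 | COG1197 | TIGR00580 | PMB117 |' | row_ids
```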
For each gene name GENE and each accession identifier ACCN available in the above table, the reference multiple amino acid sequence alignment (MSA), hidden Markov model (HMM) and position-specific scoring matrix (PSSM) files can be accessed via the following URL models, respectively:
https://giphy.pasteur.fr/PhyloM/bacteria/aln/GENE.ACCN.faa
https://giphy.pasteur.fr/PhyloM/bacteria/hmm/GENE.ACCN.hmm
https://giphy.pasteur.fr/PhyloM/bacteria/smp/GENE.ACCN.smp
For example, the reference MSA PMB034 for the gene rpL9 can be downloaded using wget or curl with the following Linux command lines, respectively:
wget -q https://giphy.pasteur.fr/PhyloM/bacteria/aln/rpL9.PMB034.faa
curl --silent -O https://giphy.pasteur.fr/PhyloM/bacteria/aln/rpL9.PMB034.faa
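When several files are needed, the URL models above can be wrapped in a small shell function. The following is a minimal sketch: the function name `phylom_file_url` and the variable `PHYLOM_BASE` are illustrative choices, not part of PhyloM, and the gene/accession pair (ffh, PMB021) is taken from the table above purely as an example.

```shell
# Base URL taken from the URL models above.
PHYLOM_BASE="https://giphy.pasteur.fr/PhyloM/bacteria"

# phylom_file_url KIND GENE ACCN
# KIND is one of: aln (MSA, .faa), hmm (HMM, .hmm), smp (PSSM, .smp).
# Prints the URL of the corresponding file (hypothetical helper).
phylom_file_url() {
  local kind="$1" gene="$2" accn="$3" ext
  case "$kind" in
    aln) ext="faa" ;;
    hmm) ext="hmm" ;;
    smp) ext="smp" ;;
    *) echo "unknown file kind: $kind" >&2; return 1 ;;
  esac
  printf '%s/%s/%s.%s.%s\n' "$PHYLOM_BASE" "$kind" "$gene" "$accn" "$ext"
}

# Print the MSA, HMM and PSSM URLs for one gene/accession pair.
for kind in aln hmm smp; do
  phylom_file_url "$kind" ffh PMB021
done
```

Each printed URL can then be passed to wget or curl as in the command lines above, e.g. `curl --silent -O "$(phylom_file_url hmm ffh PMB021)"`.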
For each marker set MSET (i.e. Cic31, Cre40, Dup107, PhyEco, Aln36, Lan73, bac120, Col61, UBCG2, Tia85, PhyloM), three different files can be downloaded:
• a tar.gz archive containing the recommended reference MSA files (PMB identifiers),
• a tar.gz archive containing the associated HMM files,
• a tar.gz archive containing the associated PSSM files.
The MSA, HMM and PSSM archives associated with a marker set MSET can be accessed via the following URL models, respectively:
https://giphy.pasteur.fr/PhyloM/bacteria/tgz/MSET.aln.tar.gz
https://giphy.pasteur.fr/PhyloM/bacteria/tgz/MSET.hmm.tar.gz
https://giphy.pasteur.fr/PhyloM/bacteria/tgz/MSET.smp.tar.gz
For example, the 74 HMM files associated with the marker set PhyloM can be downloaded using wget or curl with the following Linux command lines, respectively:
wget -q https://giphy.pasteur.fr/PhyloM/bacteria/tgz/PhyloM.hmm.tar.gz
curl --silent -O https://giphy.pasteur.fr/PhyloM/bacteria/tgz/PhyloM.hmm.tar.gz
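The archive URL models can be scripted in the same spirit. The sketch below is only a suggestion: the names `phylom_archive_url`, `phylom_fetch_archive`, `PHYLOM_TGZ` and `MARKER_SETS` are hypothetical, and the internal layout of the archives is not described on this page, so the extraction step simply unpacks whatever the archive contains into a target directory.

```shell
# Base URL and marker set names taken from the text above.
PHYLOM_TGZ="https://giphy.pasteur.fr/PhyloM/bacteria/tgz"
MARKER_SETS="Cic31 Cre40 Dup107 PhyEco Aln36 Lan73 bac120 Col61 UBCG2 Tia85 PhyloM"

# phylom_archive_url MSET KIND
# KIND is one of: aln (MSA), hmm (HMM), smp (PSSM).
phylom_archive_url() {
  local mset="$1" kind="$2"
  case " $MARKER_SETS " in
    *" $mset "*) ;;
    *) echo "unknown marker set: $mset" >&2; return 1 ;;
  esac
  case "$kind" in
    aln|hmm|smp) ;;
    *) echo "unknown file kind: $kind" >&2; return 1 ;;
  esac
  printf '%s/%s.%s.tar.gz\n' "$PHYLOM_TGZ" "$mset" "$kind"
}

# phylom_fetch_archive MSET KIND [DIR]
# Downloads one archive into the current directory and unpacks it into DIR
# (default: current directory). Defined here but not called, since it
# requires network access.
phylom_fetch_archive() {
  local url dir="${3:-.}"
  url="$(phylom_archive_url "$1" "$2")" || return 1
  mkdir -p "$dir" &&
    curl --silent --fail -O "$url" &&
    tar -xzf "${url##*/}" -C "$dir"
}

# Print the three archive URLs of the PhyloM marker set.
for kind in aln hmm smp; do
  phylom_archive_url PhyloM "$kind"
done
```

For instance, `phylom_fetch_archive PhyloM hmm hmm_dir` would download PhyloM.hmm.tar.gz and unpack it into the directory hmm_dir.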
Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C (2014) Binning metagenomic contigs by coverage and composition. Nature Methods, 11:1144–1146. doi:10.1038/nmeth.3103
Ankenbrand MJ, Keller A (2016) bcgTree: automatized phylogenetic tree building from bacterial core genomes. Genome, 59(10). doi:10.1139/gen-2015-0175
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P (2006) Toward automatic reconstruction of a highly resolved tree of life. Science, 311(5765):1283-1287. doi:10.1126/science.1123061
Coleman GA, Davín AA, Mahendrarajah TA, Szánthó LL, Spang A, Hugenholtz P, Szöllősi GJ, Williams TA (2021) A rooted phylogeny resolves early bacterial evolution. Science, 372(6542):eabe0511. doi:10.1126/science.abe0511
Creevey CJ, Doerks T, Fitzpatrick DA, Raes J, Bork P (2011) Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS ONE, 6(8):e22099. doi:10.1371/journal.pone.0022099
Dupont CL, Rusch DB, Yooseph S, Lombardo M-J, Richter RA, Valas R, Novotny M, Yee-Greenbaum J, Selengut JD, Haft DH, Halpern AL, Lasken RS, Nealson K, Friedman R, Venter JC (2012) Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. The ISME Journal, 6(6):1186–1199. doi:10.1038/ismej.2011.189
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A (2016) The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research, 44:D279-285. doi:10.1093/nar/gkv1344
Galperin MY, Makarova KS, Wolf YI, Koonin EV (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Research, 43:D261-9. doi:10.1093/nar/gku1223
Galperin MY, Wolf YI, Makarova KS, Alvarez RV, Landsman D, Koonin EV (2021) COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Research, 49(D1):D274-D281. doi:10.1093/nar/gkaa1018
Haft DH, Loftus BJ, Richardson DL, Yang F, Eisen JA, Paulsen IT, White O (2001) TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Research, 29(1):41-43. doi:10.1093/nar/29.1.41
Haft DH, Selengut JD, White O (2003) The TIGRFAMs database of protein families. Nucleic Acids Research, 31(1):371-373. doi:10.1093/nar/gkg128
Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E (2013) TIGRFAMs and Genome Properties in 2013. Nucleic Acids Research, 41:D387-95. doi:10.1093/nar/gks1234
Kim J, Na S-I, Kim D, Chun J (2021) UBCG2: Up-to-date bacterial core genes and pipeline for phylogenomic analysis. Journal of Microbiology, 59(6):609-615. doi:10.1007/s12275-021-1231-4
Mende DR, Sunagawa S, Zeller G, Bork P (2013) Accurate and universal delineation of prokaryotic species. Nature Methods, 10:881-884. doi:10.1038/nmeth.2575
Na S-I, Kim YO, Yoon SH, Ha SM, Baek I, Chun J (2018) UBCG: Up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction. Journal of Microbiology, 56:280–285. doi:10.1007/s12275-018-8014-6
Parks DH, Chuvochina M, Chaumeil P-A, Rinke C, Mussig AJ, Hugenholtz P (2020) A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology, 38(9):1079-1086. doi:10.1038/s41587-020-0501-8
Parks DH, Chuvochina M, Rinke C, Mussig AJ, Chaumeil P-A, Hugenholtz P (2022) GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 50(D1):D785-D794. doi:10.1093/nar/gkab776
Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36:996-1004. doi:10.1038/nbt.4229
Parks DH, Rinke C, Chuvochina M, Chaumeil P-A, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW (2017) Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology, 2:1533-1542. doi:10.1038/s41564-017-0012-7
Quince C, Delmont TO, Raguideau S, Alneberg J, Darling AE, Collins G, Eren AM (2017) DESMAN: a new tool for de novo extraction of strains from metagenomes. Genome Biology, 18:181. doi:10.1186/s13059-017-1309-9
Sonnhammer ELL, Eddy SR, Birney E, Bateman A, Durbin R (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Research, 26:320-322. doi:10.1093/nar/26.1.320
Sonnhammer ELL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein families based on seed alignments. Proteins, 28:405-420. doi:10.1002/(sici)1097-0134(199707)28:3<405::aid-prot10>3.0.co;2-l
Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science, 318(5855):1449-52. doi:10.1126/science.1147112
Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, Coelho LP, Arumugam M, Tap J, Nielsen HB, Rasmussen S, Brunak S, Pedersen O, Guarner F, de Vos WM, Wang J, Li J, Doré J, Ehrlich SD, Stamatakis A, Bork P (2013) Metagenomic species profiling using universal phylogenetic marker genes. Nature Methods, 10(12):1196-1199. doi:10.1038/nmeth.2693
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4:41. doi:10.1186/1471-2105-4-41
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science, 278(5338):631-637. doi:10.1126/science.278.5338.631
Tian R, Imanian B (2023) VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution. Microbiome, 11:247. doi:10.1186/s40168-023-01705-9
Wu M, Eisen JA (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biology, 9(10):R151. doi:10.1186/gb-2008-9-10-r151
Wu D, Jospin G, Eisen JA (2013) Systematic identification of gene families for use as "markers" for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups. PLoS One, 8(10):e77033. doi:10.1371/journal.pone.0077033
Wu M, Scott AJ (2012) Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics, 28(7):1033–1034. doi:10.1093/bioinformatics/bts079
file type | COG | Pfam | TIGR |
MSA | CDD FTP (fasta.tar.gz) | CDD FTP (fasta.tar.gz) | CDD FTP (fasta.tar.gz) |
HMM | COGcollator | Pfam FTP (Pfam-A.hmm.gz) | TIGRFAMs FTP (release 15.0) |
PSSM | CDD FTP (cdd.tar.gz) | CDD FTP (cdd.tar.gz) | CDD FTP (cdd.tar.gz) |