GIPhy

Groupe d'Inférence Phylogénétique

Genome Informatics and Phylogenetics


Welcome to the homepage of GIPhy, one of the expert groups of the Bioinformatics and Biostatistics Hub within the Research and Resource Centre for Scientific Informatics, Institut Pasteur, Paris, France. GIPhy's research focuses on biological classification: its members work on projects in systematics, taxonomy, homology, and related fields.

Institutional webpage

More details about GIPhy (members, main projects, publication list):

Databases and Datasets

Empirical Models of Amino Acid Substitution   ‖   a complete list of amino acid replacement matrices for model-based sequence evolution analyses

PhyloM   ‖   phylogenetic markers (along with multiple sequence alignments and position specific scoring matrices) that are well-suited for the phylogenetic analysis of specific phyla

RVDB-prot   ‖   reference viral coding sequence and associated HMM database developed for enhancing virus detection from High-Throughput Sequencing data

Programs and Tools

AlienRemover   ‖   removing contaminating reads from High-Throughput Sequencing data

AlienTrimmer   ‖   clipping and trimming High-Throughput Sequencing reads

BMGE   ‖   selecting characters or encoding character states from a multiple sequence alignment for phylogenetic inference

C2A/A2C   ‖   two tools for translating and back-translating codon and amino-acid sequence files, respectively

Concatenate   ‖   building a supermatrix of characters by concatenating multiple sequence alignments

contig_info   ‖   estimating standard descriptive statistics from contig sequences

DNA2ORF   ‖   efficient genome partitioning into open reading frames

eCDS   ‖   extracting coding sequences from a FASTA-formatted contig sequence file

eFASTA   ‖   extracting nucleotide segments from a FASTA-formatted file

FASTA2AGP   ‖   creating AGP files from FASTA-formatted scaffold sequence files

fastq_info   ‖   estimating standard descriptive statistics from FASTQ files

findSynapomorphies   ‖   finding characters shared by a group of aligned sequences

fqCleanER   ‖   FASTQ file Cleaning and Enhancing Routine

fq2dna   ‖   genome de novo assembly from raw paired-end FASTQ files

Gklust   ‖   fast genome sequence clustering

GenoLayout   ‖   creating figures showing linear maps between genomes

GenoMed   ‖   determining the medoid of a set of genomes

JolyTree   ‖   inferring distance-based phylogenetic trees from unaligned genome sequences

gbk2ENA   ‖   converting Genbank files into EMBL-like files suitable for submission to the ENA

MSAshrink   ‖   Multiple Sequence Alignment shrinking

MSTclust   ‖   Minimum Spanning Tree-based clustering

OGRI   ‖   estimating Overall Genome Relatedness Indices

REQ   ‖   estimating branch supports in distance-based phylogenetic trees with the rate of elementary quartets

RepeatPlot   ‖   creating figures that represent the positions of long repeats in a chromosome

ROCK   ‖   fast and accurate digital normalization of high-throughput sequencing reads

SAM2MSA   ‖   building a multiple sequence alignment well-suited for phylogenetic analysis from read mapping data

SimiPlot   ‖   creating figures showing overall similarity between genomes

wgetENAHTS   ‖   downloading FASTQ files from the ENA repositories

wgetGenBankWGS   ‖   downloading genome assembly FASTA files from the GenBank or RefSeq repositories

YACO   ‖   yet another contig ordering

Supplementary Data

Supplementary data accompanying some of our published analyses