I am currently a postdoc at the CEA (Commissariat à l'Énergie Atomique). My main research topics are algorithmics and combinatorial optimization applied to challenging bioinformatics problems. During my postdoc, I have been working on STAMPS, an application that efficiently searches for 3D motifs of residues in protein structures.

Guillaume Collet
CEA Saclay
91191 Gif-sur-Yvette Cedex

E-mail: guillaume@gcollet.fr
Phone: +33(0)1 69 08 17 30



Teaching and Research Experience

  • Post-doctoral Research, 2010-2012
    French Alternative Energies and Atomic Energy Commission (CEA), Life-Science Division, Saclay, France
  • Teaching Assistant, 2009-2010
    Université Européenne de Bretagne, Rennes, France
    Courses: Boolean logic and Graph theory
  • Teaching Assistant, 2007-2009
    Lycée Chateaubriand, Rennes, France
    Courses: Algorithms and Programming in Pascal
  • International internship, February 2007
    IPP-BAS Laboratory, Sofia, Bulgaria


  • Doctor of Philosophy in Computer Science, July 2010
    Université Européenne de Bretagne, Rennes, France
    Thesis title: Alignement local pour la reconnaissance de repliements des protéines par programmation linéaire en nombres entiers (Local alignment for protein fold recognition using integer linear programming)
  • Master of Science in Biology, June 2006
    Université Européenne de Bretagne, Rennes, France
    Thesis title: Développement d'un modèle d'alignements flexibles pour la reconnaissance de repliements des protéines (Development of a flexible alignment model for protein fold recognition)
  • Master of Science in Computer Science, June 2005
    Université Européenne de Bretagne, Rennes, France
    Thesis title: Alignement semi-global de séquences sur des structures de protéines (Semi-global alignment of sequences onto protein structures)
  • Bachelor of Science in Computer Science, June 2003
    Université Européenne de Bretagne, Rennes, France


I share many projects on my GitHub account: http://github.com/gcollet.


The design of novel protein functions, by transferring a functional motif onto an existing scaffold, is a major goal of protein engineering. Computational protein design approaches have led to many successful compounds, but only a few methods search for suitable sites on which to graft a functional motif. These methods face two main limitations: the huge number of potential grafting sites to explore and the evaluation of the functionality of the grafted motif. To keep the problem tractable, they restrict the number of potential sites to explore. As a consequence, motifs are limited to 5 or 6 residues in pockets or on the protein surface.

We proposed STAMPS, an approach based on Cα-Cβ distance compatibility and an optimized clique search algorithm, which can screen the entire PDB in a reasonable amount of time (3 h) to find suitable grafting sites. The quality of the identified sites is then evaluated using an RMSD optimization and a measure of steric hindrance with the protein target.
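The combination of distance compatibility and clique search can be illustrated with a minimal sketch. It is not the STAMPS implementation: the data layout (dictionaries with "CA" and "CB" coordinates), the tolerance `tol`, the function names, and the use of the textbook Bron-Kerbosch algorithm with pivoting are all assumptions made for illustration. Each node pairs a motif residue with a scaffold residue, two nodes are linked when their Cα-Cα and Cβ-Cβ distances agree within the tolerance, and a clique covering every motif residue is a candidate grafting site.

```python
from itertools import combinations
from math import dist


def compatible(m1, m2, s1, s2, tol):
    """True if the motif pair and the scaffold pair have similar
    Calpha-Calpha and Cbeta-Cbeta distances (within tol angstroms)."""
    return all(
        abs(dist(m1[atom], m2[atom]) - dist(s1[atom], s2[atom])) <= tol
        for atom in ("CA", "CB")
    )


def build_graph(motif, scaffold, tol):
    """Compatibility graph: a node (i, a) maps motif residue i to scaffold residue a."""
    nodes = [(i, a) for i in range(len(motif)) for a in range(len(scaffold))]
    adj = {node: set() for node in nodes}
    for (i, a), (j, b) in combinations(nodes, 2):
        if i != j and a != b and compatible(motif[i], motif[j],
                                            scaffold[a], scaffold[b], tol):
            adj[(i, a)].add((j, b))
            adj[(j, b)].add((i, a))
    return adj


def bron_kerbosch(adj, r, p, x, out):
    """Enumerate maximal cliques (Bron-Kerbosch with pivoting)."""
    if not p and not x:
        out.append(r)
        return
    # Pivot on the vertex with the most neighbours among the candidates.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}


def find_sites(motif, scaffold, tol=1.0):
    """Return candidate sites as {motif index: scaffold index} mappings."""
    adj = build_graph(motif, scaffold, tol)
    cliques = []
    bron_kerbosch(adj, frozenset(), set(adj), set(), cliques)
    # Keep only cliques that place every motif residue on the scaffold.
    return [dict(sorted(c)) for c in cliques if len(c) == len(motif)]
```

Calling `find_sites(motif, scaffold, tol=0.5)` with two lists of residues returns one mapping per candidate site. Distances alone cannot distinguish a site from its mirror image, so a later superposition step such as the RMSD optimization mentioned above is still needed to rank or reject candidates.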

STAMPS has been successfully applied to the design of Kv1.2 potassium channel blockers, the analysis of the facial two-histidine one-carboxylate binding motif across the entire PDB, and the detection of potential scaffolds for designing artificial inhibitors of metalloenzymes. STAMPS is available through the RASMOT3D-PRO web site: http://biodev.cea.fr/rasmot3d/.


MSTATX is a personal project that I try to maintain in my free time. This program calculates a conservation score for each column of a multiple alignment. The average over all columns can also be calculated. Many different conservation scores exist, but I did not find a program able to calculate all of them. This was the starting point of the project.

MSTATX is written in C++ and is easily extensible thanks to a simple structure. Moreover, the code is freely distributed, so anyone can add a conservation score and thus extend the capabilities of MSTATX. The code is available on my GitHub account: http://github.com/gcollet/MstatX.
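To make the idea of a per-column conservation score concrete, here is a minimal sketch in Python rather than in the C++ of MSTATX. It uses Shannon entropy, a commonly used conservation measure, chosen here only as an example. The function names, the gap character, and the decision to ignore gaps are assumptions and do not describe the MSTATX code. A lower entropy means a more conserved column.

```python
from collections import Counter
from math import log2


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.
    0.0 means fully conserved; a column made only of gaps also yields 0.0."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((k / total) * log2(k / total) for k in counts.values())


def conservation_scores(alignment):
    """One entropy value per column of a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("expected at least one sequence, all of the same length")
    return [column_entropy(column) for column in zip(*alignment)]


def mean_score(alignment):
    """Average of the per-column scores."""
    scores = conservation_scores(alignment)
    return sum(scores) / len(scores)
```

For example, `conservation_scores(["AC", "AG", "AC", "AG"])` scores the first column as fully conserved and the second as an even split between two residues. Another score can be added by writing a function with the same signature as `column_entropy`.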


FROST (Fold Recognition Oriented Search Tool) is a piece of software I designed during my PhD, based on previous work by Antoine Marin. It performs pairwise alignments of protein sequences. Classical pairwise alignment methods use substitution scores based on amino acid substitution probabilities. In FROST, the substitution scores are not restricted to single amino acids but are defined on pairs of amino acids. Indeed, some amino acids co-evolve, as can be seen from their positions in a multiple sequence alignment. Using this co-evolution information can improve the quality of the pairwise alignment and help detect remote homologs. FROST is available here: http://genome.jouy.inra.fr/frost/.
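For comparison, the classical approach mentioned above, where each aligned position is scored independently from a single-residue substitution score, can be sketched with the Needleman-Wunsch global alignment algorithm. This is only the baseline, not FROST: the pair-based scores are not implemented here, and the function name, the linear gap penalty, and the match/mismatch scoring used in the usage note are assumptions for illustration.

```python
def needleman_wunsch(a, b, score, gap=-2):
    """Global alignment of sequences a and b.
    score(x, y) is a single-residue substitution score; gap is a linear gap penalty.
    Returns (optimal score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

A toy call is `needleman_wunsch("GATTACA", "GATACA", lambda x, y: 1 if x == y else -1)`. In practice the scoring function would look up a substitution matrix. Scoring pairs of positions, as FROST does, breaks the independence between columns that this dynamic programming recurrence relies on.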