Complete Cross Docking Data Of Mintseris Benchmark 2.0

A study of protein-protein interactions in a crowded environment: an analysis via cross-docking simulations and evolutionary information.

 

Overview:

For each receptor-ligand docking simulation, we provide about 2000 ligand orientations. Each orientation is defined by the distance between the centers of mass of the ligand and the receptor and by the two Euler angles θ and φ (where θ = φ = 0° was chosen to pass through the center of the binding interface of the receptor protein). This leads to 28224 (168x168) files, one for each possible pair of monomers from the Mintseris Benchmark 2.0. The files are distributed, according to the receptor used for the docking calculation, among the 84 directories corresponding to the complexes of the Mintseris Benchmark 2.0. The full set of directories is packaged in the CCD_MintserisBenchmark2.0.tar.gz archive. The data files are named, for example, "Cond.1KAC_l_u.1EAW_r_u.UB.global.dat", where 1KAC_l plays the role of the receptor and 1EAW_r plays the role of the ligand during the docking procedure. Each line of a file contains all the information needed to build the coordinates of the corresponding ligand (in this example, 1EAW_r) for a given ligand position. Approximately 2000 positions are described for each pair, so each file has about 2000 lines. The precise description of each line is given in the section "Format of the files Cond.*.*.UB.global.dat" below.
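The naming convention above can be sketched in Python as follows. This is a minimal illustration, not part of the distributed tools: the regular expression is inferred from the single example file name given above, and the names `DockingPair` and `parse_cond_filename` are hypothetical. The pair count assumes two monomers per complex, which is consistent with 84 complexes and 168x168 files.

```python
import re
from typing import NamedTuple


class DockingPair(NamedTuple):
    receptor: str  # monomer kept fixed during docking, e.g. "1KAC_l"
    ligand: str    # monomer that moves during docking, e.g. "1EAW_r"


# Pattern inferred from the example "Cond.1KAC_l_u.1EAW_r_u.UB.global.dat";
# other file names in the archive are assumed to follow the same layout.
_NAME_RE = re.compile(
    r"^Cond\.(?P<receptor>[0-9A-Za-z]{4}_[lr])_u"
    r"\.(?P<ligand>[0-9A-Za-z]{4}_[lr])_u\.UB\.global\.dat$"
)


def parse_cond_filename(name: str) -> DockingPair:
    """Split a Cond.*.*.UB.global.dat file name into receptor and ligand."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return DockingPair(match["receptor"], match["ligand"])


pair = parse_cond_filename("Cond.1KAC_l_u.1EAW_r_u.UB.global.dat")
print(pair.receptor, pair.ligand)

# 84 complexes, assumed to contribute two monomers each, docked all-against-all.
n_complexes = 84
n_monomers = 2 * n_complexes
n_files = n_monomers ** 2
print(n_monomers, n_files)
```

The receptor always comes first in the file name, so the same two monomers appear in two distinct files with their roles swapped.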

To build a pdb file of a pair of proteins starting from the coordinates of the receptor and a given ligand position, a Fortran program, "Interface.out", is provided in the archive PDBBuilder.tar available in the "Download" section. Note that this program builds only the coordinates of the ligand for the desired position; those of the receptor remain unchanged (the receptor is fixed during the docking procedure). Thus, you only need to take the receptor coordinates from its original pdb file, in its unbound form, from the Mintseris Benchmark 2.0. Note that all the docking calculations were made from the unbound pdb files. These original files are stored in the directory "run/Proteins" contained in the archive PDBBuilder.tar.

The large computational effort required for the Complete Cross Docking of the Mintseris dataset was carried out with the help of World Community Grid (WCG), which coordinated thousands of Internet users who donated computer time to dock about 300000 conformations per protein pair for the set of 28224 possible pairs in the Mintseris Benchmark 2.0. The computations lasted 7 months. Information on the project can be found here.

 

Download:

The Complete Cross Docking Data of the Mintseris Benchmark 2.0 are available here.

You can unpack the archive through the command

tar -zxf CCD_MintserisBenchmark2.0.tar.gz

 

The source for the PDB builder program to be applied to the "Cond.*.*.UB.global.dat" files is available here.

You can unpack the archive through the command

tar -zxf PDBBuilder.tar.gz

 

The binding site predictions based on evolutionary sequence analysis of the proteins forming the complexes of the Mintseris Benchmark 2.0 were made with the JET program and are available here. A description of the JET output format is found here.

You can unpack the archive through the command

tar -zxf JEToutputfiles.tar.gz

 

System requirements:

  • A Fortran compiler. The Makefile is set up for gfortran, but you can edit it to use your own compiler.
  • The bash environment should be installed.

 

How to compile the PDBBuilder source code:

  • Use the Makefile to compile the program. You need to change the paths to your gfortran or Fortran 90 compiler (COMPILERDIR and F90), specify your compiler name ("main" line) and possibly change the options (CFLAGS) depending on your configuration. The output program is named "Interface.out".
  • Do not forget to delete all *.o and *.out files before compiling.

 

Building the PDB files:

  • Use the WCGlongint.sh script stored in the directory "run/" to run the program. Make sure to change the path to the Interface.out program at the end of the script.
  • The script writes a "proteins.dat" file which tells the program which protein structures have been docked (PROT for the receptor and PROTT for the ligand), where to find them (in the run/Proteins directory), and whether you want to select a specific docking position (via the DOCKPOS parameter) or rebuild all of them (type "selection/" or "no selection/" on the fifth line of the proteins.dat file).
  • The script also needs the "Cond.PDB1_x_u.PDB2_y_u.UB.global.dat" file(s) listing the ligand coordinates for all the docking positions with a given receptor. These files come from the CCD_Mintseris database and must be stored in the "run/Globfiles" directory. The simplest way is to copy or move all the Cond.*.*.UB.global.dat files from the CCD_Mintseris database directly into the "run/Globfiles" directory.
  • All original pdb files used for the docking procedure must be stored in the "run/Proteins" directory.
  • The resulting pdb files for the ligand are stored in the "pdb_files" directory. The receptor protein did not move during the docking process, so you just have to keep the original pdb file of the receptor in its unbound form, taken from the Mintseris database or from the "run/Proteins" directory. Note that "receptor" here does not mean receptor in the biological sense, as defined in the original pdb file from the Mintseris database; it refers to the protein that was kept fixed during the docking procedure. In docking terms, the fixed monomer is called the "receptor" and its coordinates remain unchanged whatever role the monomer is given in the pdb file (that is, receptor or ligand).
  • Be aware that the WCGlongint.sh script, in its current form, builds positions 38 and 46 for 1TMQ_r docked on 1TMQ_l. This means that the coordinates of 1TMQ_l remain unchanged, so they can be found in the file "Proteins/1TMQ_l_u.pdb". The new coordinates of the ligand 1TMQ_r after the docking procedure and the pdb building are stored in the files "pdb_files/1TMQ_l-1TMQ_r.min38.pdb" and "pdb_files/1TMQ_l-1TMQ_r.min46.pdb" for docking positions 38 and 46 respectively. Just edit the "foreach PROT", "foreach PROTT" and "foreach DOCKPOS" lines to change the name of the receptor (i.e. the monomer fixed during the docking procedure), the name of the ligand (the moving monomer) and the position number you want to build.
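The file layout described in the steps above can be summarized with a short Python sketch. It only assembles the paths quoted in this section for a given receptor, ligand and docking position; it does not run WCGlongint.sh or Interface.out. The function names are hypothetical, and the path patterns are generalized from the 1TMQ example, so they are an assumption for other pairs.

```python
from pathlib import PurePosixPath


def cond_file_path(receptor: str, ligand: str) -> PurePosixPath:
    """Input file expected in run/Globfiles for a receptor/ligand pair."""
    name = f"Cond.{receptor}_u.{ligand}_u.UB.global.dat"
    return PurePosixPath("run/Globfiles") / name


def receptor_pdb_path(receptor: str) -> PurePosixPath:
    """Unbound pdb file of the fixed monomer; its coordinates are used as-is."""
    return PurePosixPath("run/Proteins") / f"{receptor}_u.pdb"


def ligand_pdb_path(receptor: str, ligand: str, position: int) -> PurePosixPath:
    """Output file holding the rebuilt ligand for one docking position."""
    name = f"{receptor}-{ligand}.min{position}.pdb"
    return PurePosixPath("pdb_files") / name


# The example shipped in WCGlongint.sh: 1TMQ_r docked on 1TMQ_l, positions 38 and 46.
receptor, ligand = "1TMQ_l", "1TMQ_r"
print(cond_file_path(receptor, ligand))
print(receptor_pdb_path(receptor))
for position in (38, 46):
    print(ligand_pdb_path(receptor, ligand, position))
```

A complete model of the pair for one position is then the receptor file together with the matching ligand file.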

Format of the files Cond.*.*.UB.global.dat:

The "Cond.*.*.UB.global.dat" files stored in the CCD_MintserisBenchmark2.0.tar.gz archive are structured in 11 columns of integers or reals, as follows:

  • integer(i4): ligand position identifier
  • integer(i4): ligand rotation identifier
  • real(f13.6): distance between the center of mass of the ligand and the center of mass of the receptor
  • 2 real(f13.6): two Euler angles θ and φ (where θ = φ = 0° was chosen to pass through the center of the binding interface of the receptor protein) as shown in Fig 9 of [Sacquin-Mora, Carbone, Lavery (2008) JMB 382, 1276–1289]
  • 3 real(f13.6): three angles that define the orientation of the ligand relative to the receptor
  • 3 real(f13.6): Lennard-Jones (LJ) energy term for this position, Coulomb energy term, total energy
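As an illustration of the column layout above, here is a minimal Python sketch that reads one line into a named record. It assumes the 11 fields are separated by whitespace; if wide values run together in the fixed-width output, slicing by field width (4, 4, then nine fields of 13 characters) would be needed instead. The class and function names are hypothetical, and the sample line contains invented values, not data from the archive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DockingPose:
    position_id: int                         # ligand position identifier
    rotation_id: int                         # ligand rotation identifier
    distance: float                          # center-of-mass distance
    theta: float                             # first Euler angle
    phi: float                               # second Euler angle
    orientation: Tuple[float, float, float]  # ligand orientation angles
    e_lj: float                              # Lennard-Jones energy term
    e_coulomb: float                         # Coulomb energy term
    e_total: float                           # total energy


def parse_pose_line(line: str) -> DockingPose:
    """Parse one whitespace-separated line of a Cond.*.*.UB.global.dat file."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, found {len(fields)}")
    position_id, rotation_id = int(fields[0]), int(fields[1])
    reals = [float(value) for value in fields[2:]]
    return DockingPose(
        position_id, rotation_id,
        reals[0], reals[1], reals[2],
        (reals[3], reals[4], reals[5]),
        reals[6], reals[7], reals[8],
    )


# Invented sample line, for illustration only.
sample = ("  38   1    25.500000     0.000000     0.000000"
          "    10.000000    20.000000    30.000000"
          "   -12.500000    -3.250000   -15.750000")
pose = parse_pose_line(sample)
print(pose.position_id, pose.distance, pose.e_total)
```

With every line of a file parsed this way, `min(poses, key=lambda p: p.e_total)` would pick the entry with the lowest total energy.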

 

The MAXDo program:

Docking calculations were performed using the MAXDo program. Binaries and source code are available here together with detailed instructions for installing and running the program.

You can unpack the archive through the command

tar -zxf MAXDo.tar.gz

 


Contacts:

For questions, comments or suggestions feel free to contact Alessandra Carbone or Anne Lopes.

 

Reference:

If you are using our data, please cite:

  • A. Lopes, S. Sacquin-Mora, V. Dimitrova, E. Laine, Y. Ponty, A. Carbone. (2013) Protein-protein interactions in a crowded environment: an analysis via cross-docking simulations and evolutionary information. PLoS Computational Biology, in print.

Last Update May 2013