SHREC 2024: Protein shapes docking

Envisioned task

The aim of this track is to assess the performance of shape complementarity algorithms on a dataset of protein-protein interactions.

Proteins are complex macromolecules made up of hundreds to millions of atoms that constantly interact with each other in the cellular environment. When complementary proteins meet, they form a protein complex; such complexes are responsible for numerous biological processes, such as signal transduction or immune recognition. One of the primary determinants of this complementarity is the shape complementarity of the two molecular surfaces. However, each protein can adopt a set of different structures rather than a single, rigid one. In an era where whole-proteome structures are becoming available, the ability to detect protein-protein interactions among a set of surfaces derived from non-rigid structures is a major scientific concern.

This track proposes a set of 387 query surfaces and 520 target surfaces (provided as .ply files), representing 52 protein-protein interactions. For each query, the participants are asked to produce protein-protein complex poses using the target proteins.

Dataset and Ground Truth

The dataset is based on the docking benchmark version 5 (Vreven et al., Journal of Molecular Biology, 2015). This set of protein-protein complexes differs from others in that the structures of both the reference complexes and the free proteins (meaning the proteins without their counterpart) are available in the Protein Data Bank (PDB). We retained the complexes composed of two chains only, one chain for the query and one chain for the target. Starting from the free structures of these chains, we queried the PDBFlex database (Hrabe et al., Nucleic Acids Res, 2016) for alternative, flexible structures. To avoid an imbalanced dataset, exactly 10 alternative structures per target protein were retained, and up to 10 per query protein.

The structures were then retrieved and deprotonated, and their solvent-excluded surfaces (SES) were calculated using EDTSurf (Xu et al., PLoS ONE, 2009).

The participants are asked to produce a 387x520 score matrix reflecting the likelihood of each query-target complex and, for each query surface, the top 10 (most probable) query-target complexes. The submitted filenames should follow the convention queryID_targetID.ply, as in 0_0.ply.
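As an illustration of the submission described above, the following minimal Python sketch selects the top 10 targets for each query from a 387x520 score matrix and builds the corresponding queryID_targetID.ply filenames. It rests on several assumptions that are not stated by the track: higher scores mean more probable complexes, IDs are zero-based row and column indices (suggested by the example 0_0.ply), and the scores here are random placeholders. The function name top_k_filenames is hypothetical.

```python
import random

N_QUERIES, N_TARGETS, TOP_K = 387, 520, 10


def top_k_filenames(scores, k=TOP_K):
    """Return, for each query row, the k best targets as 'queryID_targetID.ply'.

    Assumption: a higher score means a more probable complex, and IDs are
    zero-based row/column indices of the score matrix.
    """
    result = []
    for q, row in enumerate(scores):
        # Rank target indices by decreasing score; ties go to the lower index.
        ranked = sorted(range(len(row)), key=lambda t: (-row[t], t))
        result.append([f"{q}_{t}.ply" for t in ranked[:k]])
    return result


# Placeholder scores standing in for a real docking or complementarity method.
rng = random.Random(0)
scores = [[rng.random() for _ in range(N_TARGETS)] for _ in range(N_QUERIES)]

names = top_k_filenames(scores)
print(len(names), len(names[0]))  # number of queries, poses per query
```

On a toy one-query matrix, top_k_filenames([[0.1, 0.9, 0.5]], k=2) returns ['0_1.ply', '0_2.ply'].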

The ground truth is derived from the docking benchmark version 5 (Vreven et al., Journal of Molecular Biology, 2015): the actual complex will be used as the reference for evaluating the poses.

Evaluation

The evaluation is two-fold:

First, the standard retrieval metrics of previous shape retrieval experiments will be used: precision-recall evaluation, first-tier and second-tier.
Second, the correct query-target poses will be compared to the reference complexes. The participants' meshes will be converted back to PDB complexes. These poses will be compared to the reference complex using standard metrics from the structural bioinformatics community (DockQ, ICS, IPS, lDDT and TM-score).
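For readers unfamiliar with the retrieval metrics named in the first point, the sketch below shows one common way to compute first-tier, second-tier, and precision/recall at a cutoff for a single query. It is not the organizers' evaluation code. It assumes that the ranked list contains target IDs sorted from most to least probable, and that C is the number of relevant targets for the query (queries and targets being separate sets here). The function names and the toy data are hypothetical.

```python
def tier_scores(ranked_targets, relevant):
    """First-tier and second-tier for one query.

    ranked_targets: target IDs sorted from most to least probable.
    relevant: set of target IDs that truly interact with the query.
    First-tier is the fraction of relevant targets found in the top C,
    second-tier the fraction found in the top 2C, with C = len(relevant).
    """
    c = len(relevant)
    first = sum(1 for t in ranked_targets[:c] if t in relevant) / c
    second = sum(1 for t in ranked_targets[:2 * c] if t in relevant) / c
    return first, second


def precision_recall_at(ranked_targets, relevant, k):
    """Precision and recall when only the top k targets are retrieved."""
    hits = sum(1 for t in ranked_targets[:k] if t in relevant)
    return hits / k, hits / len(relevant)


# Toy example: 6 targets, 2 of which are relevant to the query.
ranking = [3, 1, 7, 2, 9, 4]
truth = {1, 2}

print(tier_scores(ranking, truth))             # (0.5, 1.0)
print(precision_recall_at(ranking, truth, 4))  # (0.5, 1.0)
```

In the toy example, one of the two relevant targets appears in the top 2 and both appear in the top 4, hence a first-tier of 0.5 and a second-tier of 1.0.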

It is important for the participants to report the runtimes of their calculations, since runtime is critical information when processing large datasets.

Expected number of participants

This track may interest all 3DOR experts who work on non-conventional shapes with inherent complexity, such as molecular shapes. To make the track easily accessible to most participants, we provide the meshes of the proteins' solvent-excluded surfaces in the .ply format.

Schedule timeline

Feb 19, 2024 - The dataset is made available on shrec2024.drugdesign.fr. The participants are allowed to run their calculations.

Mar 4, 2024 - Registration deadline. Registration must be sent to Matthieu Montès and Florent Langenfeld.

Mar 13, 2024 - Submission deadline of the results to the organizers. A brief summary to be included in the track report is written by each participant and submitted with the results. A link to download / compile the method is expected as well.

Mar 15, 2024 - The organizers circulate the evaluation of all participants of the track and release the ground truth.

Mar 15-29, 2024 - The organizers and participants draft the joint paper of the results.

Mar 29, 2024 - The track report is submitted for review.

Aug 26-27, 2024 - Eurographics Workshop on 3D Object Retrieval 2024 (3DOR)

queries.tar.gz

targets.tar.gz

Organizers

Matthieu Montès - Conservatoire National des Arts-et-Métiers
Florent Langenfeld - Conservatoire National des Arts-et-Métiers