ECML-PKDD 2015 Machine Learning in Life Sciences
GMUM challenge


Final results

With great pleasure, the organizing committee of the challenge would like to announce that the winner of the competition is team

AlibabaAndFortyThreeLinesOfCode

  • Damian Leśniak, Jagiellonian University
  • Piotr Kruk, Jagiellonian University
  • Michał Kowalik, Jagiellonian University
which achieved a score of 68.85% on the test set, beating the second-place team by a large margin of over 3%. Results have been reproduced and confirmed. We would like to thank all the participants and hopefully see you soon in Portugal!

Predict Chemical Compound Biological Activity towards multiple Proteins


The Group of Machine Learning Research at the Jagiellonian University in Cracow (represented by Wojciech Czarnecki, Igor Podolak and Jacek Tabor), in cooperation with the Institute of Pharmacology, Polish Academy of Sciences, Cracow (represented by Andrzej Bojarski and Sabina Smusz), is proud to announce a competition whose objective is to predict the activity of selected chemical structures against a set of given proteins. This competition is associated with the Workshop on Machine Learning in Life Sciences at the European Conference on Machine Learning, ECML PKDD 2015, to be held in Porto, Portugal, on 11th September 2015, and organized by Bartosz Krawczyk and Michal Wozniak from the Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland.

Problem description

Imagine that you are the head of a biochemistry research group whose objective is to find new chemical structures which might become the basis of future commercial drugs. People at your laboratory can test these structures against some proteins to find out whether they are worth further research.


Figure: from Sotriffer et al., "Virtual Screening: Principles, Challenges, and Practical Guidelines"

Naturally, all these chemical compounds are costly, but they can all be ordered in sufficient quantities from other laboratories specialising in their creation. As frequently happens, money is scarce. Similarly, the human resources at your laboratory are limited; therefore, it is not possible to order all the compounds at the same time.

But you can apply for grants, which are of limited size. The more active compounds you find, the better off your laboratory will be, and the more grants will be assigned to you next time. Therefore, it is best to order the most promising compounds first. If the compounds are found to be active, you will be able to get more money more easily, extend your laboratory and make yourself famous.

For simplicity, let us assume that ordering each compound costs the same amount of money, say C €. Therefore, if you get a grant of M €, you will be able to buy M/C different compounds. Which compounds, from the huge list of more than 30 thousand, should you buy?

The objective of this project is to predict the activities of the given compounds, so that the next time you are awarded a grant, you will be able to make a list of the most promising compounds that you can buy with the money from that grant.
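
The purchasing rule described above can be illustrated with a short sketch. It is not part of the competition system or its evaluation code; the function name select_compounds, the compound identifiers and the activity scores are all hypothetical and chosen only for illustration. The sketch assumes that some model has already assigned each compound a predicted activity score, where higher means more promising, and that only whole compounds can be bought, so M/C is rounded down.

```python
def select_compounds(predicted_activity, grant, cost_per_compound):
    """Return the IDs of the compounds to order, most promising first.

    predicted_activity: dict mapping a compound ID to a predicted activity
        score (hypothetical model output; higher means more promising).
    grant: available money M.
    cost_per_compound: price C of a single compound (same for all).
    """
    if cost_per_compound <= 0:
        raise ValueError("cost_per_compound must be positive")

    # Number of compounds the grant pays for: M / C, rounded down,
    # and never negative.
    n_affordable = max(0, int(grant // cost_per_compound))

    # Rank compound IDs by predicted activity, highest score first.
    ranked = sorted(predicted_activity, key=predicted_activity.get, reverse=True)

    # Order only as many compounds as the grant allows.
    return ranked[:n_affordable]


# Illustrative data only: four made-up compounds with made-up scores.
scores = {"cmpd_a": 0.91, "cmpd_b": 0.15, "cmpd_c": 0.67, "cmpd_d": 0.42}

# With M = 250 and C = 100, two compounds can be bought.
print(select_compounds(scores, grant=250, cost_per_compound=100))
# prints ['cmpd_a', 'cmpd_c']
```

Under these assumptions, a better ranking places more truly active compounds within the first M/C positions, which is why the quality of the predicted ordering matters more than the exact score values.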

Deadlines

Solution submission: 22 June 2015
Test set results publication: 24 June 2015
Submission of winning code: 26 June 2015
Final results after reproducing results: around 1 July 2015
Manuscript submission deadline: 8 July 2015
Paper reviews: 15 July 2015
Camera ready paper submission: 27 July 2015

Organizers

Group of Machine Learning Research, Jagiellonian University

Institute of Pharmacology, Polish Academy of Sciences

Department of Systems and Computer Networks, Wroclaw University of Technology

In case of any doubts or problems with the competition system, please contact Wojciech Czarnecki.