Thursday, January 7, 2010

SIGMOD 2010 Programming Contest Distributed Query Engine

A programming contest is being organized in parallel with the ACM SIGMOD 2010 conference, following the success of the first annual SIGMOD programming contest last year. Student teams from degree-granting institutions are invited to compete to develop a distributed query engine over relational data. Submissions will be judged on the overall performance of the system on a variety of workloads. A shortlist of finalists will be invited to present their implementations at the SIGMOD conference in June 2010 in Indianapolis, USA. The winning team, to be selected during the conference, will be awarded a prize of 5,000 USD and will be invited to a one-week research visit in Paris. The winning system, released as open source, will form a building block of a complete distributed database system that will be built up over successive editions of the programming contest.

News

2009-12-13: Task details made available
The full task details are made available. The contest is now open!
2009-12-13: Mailing list
A mailing list has been created to publish technical information about the competition. Contestants are encouraged to subscribe to it. This complements the Atom feed that will provide more general news.
2009-11-19: Poster
A letter-sized poster is made available for advertising the contest.
2009-11-17: Prize announced
We can now announce the amount of the prize awarded to the winner of the contest: 5,000 USD. This comes in addition to an invited research visit in Paris.
2009-10-09: Initial description of the contest
The initial description of the contest is available on the SIGMOD 2010 programming contest website.

Contestants and other interested parties are invited to subscribe to the Atom feed of the SIGMOD 2010 programming contest, which will carry general news about the competition.

Task Overview

The system to implement is a simple distributed query executor built on top of last year's main-memory index (an implementation of which will be provided). Centralized query plans will be supplied and will have to be translated into distributed query plans, to be executed on each peer of a cluster of machines. An initial computation of statistics can be run over each peer in order to optimize the distributed query plan. The system must be able to efficiently execute the query over each peer, with the help of the in-memory index, gather the results from each peer, and perform any other necessary operation on a monitoring peer.

The system will be tested on a collection of synthetic and real-world datasets, with appropriate query loads. The interface is planned so that the distributed query engine can be tested either on a single machine (local processes acting as peers), on an ad-hoc cluster of peers, or on an evaluation cluster made available to test the performance of the system in the conditions of the final evaluation.

To help contestants test their implementation, any team whose system passes a collection of unit tests can be provided with an Amazon Web Services account worth 100 USD (up to 30 accounts are available on a first-come, first-served basis; up to two accounts per team can be provided, depending on demand). This is made possible by the support of Amazon.

Important Dates

Sunday, December 13, 2009
Interfacing code and example workload made available.
Friday, February 6, 2010
Evaluation infrastructure made available.
Sunday, April 4, 2010
Submissions due.
Friday, April 16, 2010
Notification of a shortlist of finalists.
Sunday, May 30, 2010
Final submission by finalists due.
June 6–11, 2010
SIGMOD conference in Indianapolis, USA. Presentation of the works of finalists, and announcement of the winner.

All deadlines are 5:00pm UTC (10:00am PDT, 1:00pm EDT).

Task Details

Given a parsed SQL query, you have to return the right results as fast as possible. The data is stored on disk, while the indexes are all in memory. The SQL queries will always have the following form:

SELECT alias_name.field_name, ... FROM table_name AS alias_name, ... WHERE condition1 AND ... AND conditionN

A condition may be either:

  • alias_name.field_name = fixed value
  • alias_name.field_name > fixed value
  • alias_name.field_name1 = alias_name.field_name2
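
As an illustration only, the restricted query form above can be modelled as a conjunction of three kinds of conditions. The sketch below is in Python for brevity and is not the contest interface (which is the C header client.h); the names Field, Condition, holds and matches, and the representation of a row as a dictionary keyed by (alias, field name), are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Field:
    """A reference of the form alias_name.field_name (hypothetical type)."""
    alias: str
    name: str


@dataclass(frozen=True)
class Condition:
    """One conjunct of the WHERE clause (hypothetical type).

    op is "=" or ">"; right is either a fixed value or another Field.
    """
    left: Field
    op: str
    right: Union[Field, int, str]


def holds(cond, row):
    """Evaluate one condition on a row mapping (alias, field name) to a value."""
    lhs = row[(cond.left.alias, cond.left.name)]
    right_is_field = isinstance(cond.right, Field)
    rhs = row[(cond.right.alias, cond.right.name)] if right_is_field else cond.right
    if cond.op == "=":
        return lhs == rhs
    if cond.op == ">":
        # The task only allows ">" against a fixed value.
        if right_is_field:
            raise ValueError("'>' is only defined against a fixed value")
        return lhs > rhs
    raise ValueError("unsupported operator: " + cond.op)


def matches(conditions, row):
    """The WHERE clause is a pure conjunction: every condition must hold."""
    return all(holds(c, row) for c in conditions)
```

Because the WHERE clause contains only AND, each condition can be checked independently, and in any order, which is what leaves room for reordering them when building a plan.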

The data is distributed on multiple nodes, and can be replicated. The implementation of the indexes is provided and cannot be changed. Up to 50 queries will be sent at the same time by 50 different threads, but only the total amount of time will be measured. You do not have to take care of the partitioning, replication or creation of the indexes: these are done before the beginning of the benchmark of your client.
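
To make the measurement concrete, here is a rough sketch of a driver that submits queries from up to 50 threads and records only the total wall-clock time. It is an assumption about how such a benchmark could be structured, not the organizers' actual harness; run_benchmark and the run_query callback are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(queries, run_query, max_threads=50):
    """Run every query on a pool of worker threads and time the whole batch.

    run_query is a caller-supplied function (hypothetical) that executes one
    query and returns its result. Only the total elapsed time is measured,
    not the latency of each individual query.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map preserves the order of the input queries in its results.
        results = list(pool.map(run_query, queries))
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Since only the total is measured, an implementation is free to delay a cheap query in favour of an expensive one if that shortens the overall run.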

Before the actual start of the benchmarks, you are given a predefined number of seconds to run some preprocessing on the data. You are also given a set of queries which is representative of the benchmark to help you run the preprocessing.

There are 7 methods to implement. They are fully described in the client.h file. The following diagrams show the way they are called.

In the initialization phase, the startPretreatmentMaster function is called on the master node, while the StartSlave function is called on the slave nodes. In the connection phase, the createConnection function is called on the master node. In the query phase, the performQuery and fetchRow functions are called in succession on the master node. In the closing phase, the closeConnection and then the closeProcess functions are called on the master node.
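
The master-side call order described above can be summarized as a small state machine. The Python sketch below is illustrative only: the function names are taken from the text, but the table ALLOWED_NEXT and the checker is_valid_master_trace are hypothetical, and it assumes a single connection, that fetchRow may be called repeatedly after a performQuery, and that several queries may be run on the same connection. The authoritative description is in client.h and the README.

```python
# For each call, the set of calls that may follow it on the master node.
# None stands for "nothing has been called yet".
ALLOWED_NEXT = {
    None: {"startPretreatmentMaster"},
    "startPretreatmentMaster": {"createConnection"},
    "createConnection": {"performQuery", "closeConnection"},
    "performQuery": {"fetchRow"},
    "fetchRow": {"fetchRow", "performQuery", "closeConnection"},
    "closeConnection": {"closeProcess"},
    "closeProcess": set(),
}


def is_valid_master_trace(calls):
    """Return True if the sequence of call names follows the four phases."""
    previous = None
    for call in calls:
        if call not in ALLOWED_NEXT.get(previous, set()):
            return False
        previous = call
    # A complete run must end with the closing phase.
    return previous == "closeProcess"
```

For example, the trace startPretreatmentMaster, createConnection, performQuery, fetchRow, fetchRow, closeConnection, closeProcess is accepted, while a trace that calls fetchRow before any performQuery is rejected.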

For more details, see the README file inside the ZIP archive.

ZIP archive with all instructions, example workload, and example implementation

Contestants are encouraged to subscribe to the programming contest mailing list that will be used to communicate technical information about the task.

Regulations

The contest is open to undergraduate and graduate students registered at degree-granting institutions worldwide (contestants must be registered students at the beginning of the competition, Sunday, December 13, 2009). A team can consist of one or several students from one or several institutions (if a team consists of several students, only a fixed number of them will receive travel grants to attend the conference). Several teams from the same institution can compete independently. Students can work under the guidance of a professor or researcher, but the implementation must be written exclusively by the students. Contestants must supply the source code for their entries, and agree to license their code under the BSD or MIT open-source license should their system win the contest. Using open-source third-party libraries is allowed, as long as the core of the system is novel and the licenses of the libraries are compatible with the BSD or MIT licenses. Students supervised by the organizers cannot enter the contest.

Organization

Organizers
Advisory board

For any information, please contact Pierre Senellart and Clément Genzmer at dbweb@telecom-paristech.fr.
