This year's contest, GISCUP 2016, will be the fifth GIS-focused algorithm competition and is co-located with the 2016 ACM SIGSPATIAL GIS conference. ACM SIGSPATIAL hosts this annual algorithm contest to encourage innovation in a fun way. The winners will be announced at the ACM SIGSPATIAL GIS conference in November 2016. Contest participants will submit original computer programs, which the contest organizers will evaluate on a common dataset.
This year’s contest will focus on applying spatial statistics to spatio-temporal big data in order to identify statistically significant hot spots using a distributed computing framework and functional programming languages. Now that spatio-temporal observational data (e.g., vehicle tracking data) is collected ubiquitously, identifying unusual patterns in that data in a statistically significant manner has become a key problem that numerous businesses and organizations are trying to solve. This year’s SIGSPATIAL Cup addresses this programming challenge.
For this year’s contest, each team will use a very large collection of spatio-temporal observational data – the New York City Taxi and Limousine Commission Yellow Cab trip data. This dataset consists of over one billion records representing all Yellow Cab taxi trips in New York City from January 2009 through June 2015. Each record in the dataset contains key information such as the pickup and dropoff date, time, and location (latitude, longitude), the trip distance, the passenger count, and the fare amount.
Given a certain subset of this dataset, each team will identify the fifty most statistically significant dropoff locations by passenger count, in both time and space, using the Getis-Ord statistic. Space will be aggregated into a grid (using latitude and longitude); time will be aggregated into time windows (for example, two-hour periods). Each team will use the Apache Spark open source cluster computing framework; the tests will be run on a cluster of twenty-five commodity-level PCs (3 GHz, quad core), each equipped with 24 GB of RAM and 3 TB of disk storage. The applications that run on top of the Spark framework can be implemented in Java, Python, or Scala using functional programming techniques. The dataset will be provided in HDFS as a collection of uncompressed CSV files.
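To make the task above more concrete, here is a minimal single-machine sketch in Python of the three steps: binning dropoffs into space-time cells, computing a Getis-Ord Gi* z-score per cell, and picking the top cells. It is not the contest's reference implementation and does not use Spark. It assumes a rectangular study area indexed from a chosen origin, binary weights over each cell and its 26 adjacent space-time cells (a common choice, not confirmed by this announcement), and empty cells counted as zero. The function names (to_cell, aggregate, gi_star, top_hot_spots) and the parameters cell_deg and window_hours are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def to_cell(lat, lon, t_hours, origin, cell_deg, window_hours):
    """Map a dropoff to (lat index, lon index, time-window index) relative to origin."""
    lat0, lon0 = origin
    return (
        int((lat - lat0) // cell_deg),
        int((lon - lon0) // cell_deg),
        int(t_hours // window_hours),
    )


def in_area(cell, shape):
    """True if every index of cell lies inside the rectangular study area."""
    return all(0 <= c < size for c, size in zip(cell, shape))


def aggregate(dropoffs, origin, shape, cell_deg=0.01, window_hours=2.0):
    """Sum passenger counts per space-time cell, skipping dropoffs outside the study area.

    dropoffs: iterable of (lat, lon, hours_since_start, passenger_count) tuples.
    cell_deg and window_hours are illustrative defaults, not contest parameters.
    """
    counts = defaultdict(int)
    for lat, lon, t_hours, passengers in dropoffs:
        cell = to_cell(lat, lon, t_hours, origin, cell_deg, window_hours)
        if in_area(cell, shape):
            counts[cell] += passengers
    return dict(counts)


def gi_star(counts, shape):
    """Getis-Ord Gi* z-score for every cell of the study area.

    G*_i = (sum_j w_ij x_j - X_bar * sum_j w_ij)
           / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    Assumed weights: w_ij = 1 for the cell itself and its in-area adjacent cells,
    0 otherwise, so sum_j w_ij == sum_j w_ij^2 == w. Empty cells count as x_j = 0.
    Assumes the study area is larger than one 3x3x3 neighbourhood.
    """
    n = math.prod(shape)
    mean = sum(counts.values()) / n
    s = math.sqrt(max(0.0, sum(v * v for v in counts.values()) / n - mean ** 2))
    if s == 0:
        raise ValueError("all cells have the same value; Gi* is undefined")
    scores = {}
    for cell in product(*(range(size) for size in shape)):
        # The 3x3x3 block centred on the cell (cell included), clipped to the study area.
        candidates = (
            tuple(c + d for c, d in zip(cell, delta))
            for delta in product((-1, 0, 1), repeat=3)
        )
        neighbourhood = [nb for nb in candidates if in_area(nb, shape)]
        w = len(neighbourhood)
        weighted = sum(counts.get(nb, 0) for nb in neighbourhood)
        scores[cell] = (weighted - mean * w) / (s * math.sqrt((n * w - w * w) / (n - 1)))
    return scores


def top_hot_spots(scores, k=50):
    """Return the k (cell, z-score) pairs with the highest Gi* scores."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, gi_star({(2, 2, 2): 100}, (5, 5, 5)) gives the same positive score to all 27 cells whose neighbourhood contains (2, 2, 2) and a negative score to every other cell. On Spark, the aggregation step would naturally become a map from each record to a (cell, passengers) pair followed by reduceByKey; the Gi* step needs each cell's neighbourhood sums, which is where most of the distributed design effort would go.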
In summary, the key topics that are being addressed in this year’s cup are the following:
- Spatio-temporal data
- Big data
- Spatial statistics – cluster analysis
- Distributed processing
- Functional programming
The top three teams will be provided with cash and/or other prizes. In addition to these prizes, the top three teams will be invited to submit a four-page paper for a contest paper session to be held at the 2016 ACM SIGSPATIAL GIS conference. These papers will be subject to review and acceptance by the contest organizers, but each of the top three teams is expected to have its paper in the conference proceedings, a ten-minute presentation in the contest session, and a poster presentation in the conference’s regular poster session. At least one member of each winning team must register for the 2016 ACM SIGSPATIAL GIS conference.
Previous ACM SIGSPATIAL GIS Cups:
- 4th ACM SIGSPATIAL GIS Cup (2015)
- 3rd ACM SIGSPATIAL GIS Cup (2014)
- 2nd ACM SIGSPATIAL GIS Cup (2013)
- 1st ACM SIGSPATIAL GIS Cup (2012)