Submission and Evaluation
Submissions will be evaluated by comparing the accuracy of their results against the reference implementation and by measuring the program's execution time. Additionally, a manual review of the source code may be used to judge unclear cases. Source code is expected to follow good programming practices and to be well commented.
Deadline: Participants should submit their programs by Monday, August 22, 2016, 5:00 PM (PST).
Submission System: Please send your submission via email to both Cup masters and CC your whole team. Emails are accepted until the final deadline; only the newest email is considered in the evaluation. Please begin the subject of each email with your team name, formed by concatenating the names of your team members. Be prepared for small technical questions from the Cup masters when they try to run your submission; these will be sent via email to all addresses on the submission email.
Only one submission per team is necessary.
Note: Each participant on a team must be a member of ACM SIGSPATIAL. Information about the membership is available at http://www.sigspatial.org
What to submit?
Each participant is expected to submit:
- A single .zip file that contains the original source code and all dependencies. Please include a readme.txt file for any special instructions on how to compile the submitted code. Submission of the source code is mandatory to ensure originality of the submitted work.
- The submitted program will be invoked using the following syntax:
./bin/spark-submit [spark properties] --class [submission class] [submission jar] [path to input] [path to output] [cell size in degrees] [time step size in days]
- Cell Size = 0.001 degrees (~80 m east/west in New York City)
- Time Step Size = 7 days (1 week)
./bin/spark-submit \
  --master local[*] \
  --class com.example.Submission \
  /path/to/submission.jar \
  hdfs://path/to/directory_with_csvs \
  hdfs://path/to/output \
  0.001 \
  7
Space – cell size is defined in degrees with the origin being at latitude = 0 and longitude = 0
Time – time step size is defined in days with the origin being at 2015-01-01T00:00:00.000Z
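To make the two origin conventions concrete, the following sketch maps a pickup record to its space-time cell. It assumes indices are obtained by flooring the offset from each origin; `to_cell` and its boundary handling are illustrative, not the reference implementation's definition.

```python
import math
from datetime import datetime, timezone

# Time origin per the problem statement: 2015-01-01T00:00:00.000Z
TIME_ORIGIN = datetime(2015, 1, 1, tzinfo=timezone.utc)

def to_cell(lon, lat, ts, cell_size=0.001, step_days=7):
    # Spatial indices: floor of the coordinate offset from (lat=0, lon=0),
    # measured in units of the cell size (0.001 degrees by default).
    cell_x = math.floor(lon / cell_size)
    cell_y = math.floor(lat / cell_size)
    # Temporal index: whole days since the origin, in units of the step size.
    time_step = (ts - TIME_ORIGIN).days // step_days
    return cell_x, cell_y, time_step

# A trip near the Empire State Building during the second week of 2015
# falls into time step 1 (days 7-13 after the origin):
print(to_cell(-73.9857, 40.7484,
              datetime(2015, 1, 9, 12, 0, tzinfo=timezone.utc)))
# → (-73986, 40748, 1)
```

Note that with this convention, cells west of the prime meridian get negative `cell_x` values, which is expected for New York City longitudes.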
Output will be a CSV in the following format:
cell_x, cell_y, time_step, zscore, pvalue
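As a sketch of producing one output row, the snippet below writes the header and a sample record with Python's csv module. The p-value here is a one-tailed value under the standard normal distribution, as is common for Getis-Ord Gi* hot-spot scores; the reference implementation may compute it differently (e.g. two-tailed), so treat the formula as an assumption.

```python
import csv
import io
import math

def p_value(z):
    # One-tailed p-value P(Z > z) for a standard normal variable,
    # via the complementary error function. Hot-spot analyses often
    # report this for positive z-scores; this choice is an assumption.
    return 0.5 * math.erfc(z / math.sqrt(2))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["cell_x", "cell_y", "time_step", "zscore", "pvalue"])
# Illustrative values only, not real results:
writer.writerow([-73986, 40748, 1, 3.29, round(p_value(3.29), 6)])
print(buf.getvalue())
```

A z-score of 0 maps to a p-value of 0.5, and larger positive z-scores map to smaller p-values, matching the intuition that stronger hot spots are less likely under the null hypothesis.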
How to evaluate?
We are interested in evaluating the following items:
- Correctness of the provided spatio-temporal hot spots
- Computation (CPU) time (actual runtime)
We will not discount the time consumed to read the input files or write the output result. The specifics of the evaluation rules are detailed on the Problem Definition page.
Note: to simplify the problem for the participants, we will only consider the taxi data from 2015. This is due to schema changes in the dataset between 2009 and 2015 and the resultant complexities in dealing with the taxi data as a single logical dataset.
We will provide the following system to evaluate all submissions. If you require any additional components (or additional libraries), you should inform us at least a week before the actual submission deadline. It is important to note that the Yellow Cab trip data will be available on the cluster, stored as a file in HDFS.
Hardware: Cluster of 25 nodes, each with one Intel Xeon CPU (4 cores, hyper-threaded) @ 3.1 GHz
Operating System: CentOS Linux 6.7
RAM: 24 GB
Frameworks: Apache Spark 1.6.0, Apache Hadoop 2.6