GSoC Update 0 – Starting Summer of Code

This is the first entry in a series of posts on my participation in this year’s Google Summer of Code program working on the reproducible research tool Sumatra with mentor Andrew Davison under the mentoring organization INCF.

Summer of Code! In its annual program, taking place for the 10th time since 2005, Google supports students in working with a mentor on a free and open-source project over the summer. My proposal “Data-centric provenance capture with Sumatra” was accepted in March and I’m happy to post a first update on my work on the project here. Sumatra is a tool promoting reproducible research in computational sciences – “a lab notebook for computational projects”. I stumbled upon the software while looking up best practices in computational research and have come to highly appreciate what the tool can do.

But of course, it can always be better! This is why I wrote to Andrew Davison, the maintainer of Sumatra and now mentor of my GSoC project, back in January about a potential Summer of Code participation. I suggested a stronger connection, in both Sumatra’s architecture and its display, between process records and the data generated by these processes.

I use Sumatra extensively in my work and, even before thinking about Summer of Code, had written some bash scripts to achieve better data-to-process associations purely through the data and process labels. As the program is not called Summer of Text, let’s look at some code! This is an example of a custom bash script I use to first get a label string, possibly depending on parameters, and then use it as both the Sumatra label and the file label for the plots generated in plot_data.py:

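#!/bin/bash

# Data files to plot, passed as arguments to this script
inputfiles=("$@")

# Label string, reused as the Sumatra label and as the file label for the plots
labelstr=$(python comp/figure_label.py)

smt run --executable=python \
        --main=comp/plot_data.py "${inputfiles[@]}" "$labelstr" \
        --reason="Test graphic" \
        --tag=graphic \
        --label="$labelstr" \
        comp/params/plot_data_params_template.py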
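
The script above only assumes that comp/figure_label.py prints a single label string to standard output. As a rough illustration of what such a helper could look like, here is a minimal sketch. It is not the actual file from my repository: the function name make_label, the key=value argument convention and the timestamp-plus-parameters layout are all assumptions made up for this example.

```python
"""Hypothetical sketch of a label helper in the spirit of comp/figure_label.py."""
import re
import sys
from datetime import datetime


def make_label(params=None, now=None):
    """Build a label usable both as a record label and as a file name stem.

    The layout (timestamp followed by sorted key/value pairs) is an
    assumption for illustration, not a Sumatra requirement.
    """
    now = now or datetime.now()
    parts = [now.strftime("%Y%m%d-%H%M%S")]
    # Sort the keys so the same parameters always give the same label suffix
    for key in sorted(params or {}):
        parts.append(f"{key}{params[key]}")
    label = "_".join(parts)
    # Replace every character outside a conservative, file-name-safe set
    return re.sub(r"[^A-Za-z0-9._-]", "-", label)


if __name__ == "__main__":
    # Optional command-line arguments of the form key=value; others are ignored
    cli_params = dict(arg.split("=", 1) for arg in sys.argv[1:] if "=" in arg)
    print(make_label(cli_params))
```

Keeping the label free of spaces and slashes is what makes it safe to reuse as part of a plot file name, so the record and the files it produced can be matched by label alone.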

The paths to the data to plot are simply passed as arguments when calling the bash script. You can find a full repository with example usage here. After a good week of coding on the project, I now have a working prototype that displays the records associated with a data file in the web interface, and I have opened a first pull request. Once it has been reviewed by the maintainer, I hope to use it as a base to expand upon in the coming weeks!

[Image: data_view]

More in the next update!
