GSoC Update 0 – Starting Summer of Code

This is the first entry in a series of posts on my participation in this year’s Google Summer of Code program working on the reproducible research tool Sumatra with mentor Andrew Davison under the mentoring organization INCF.

Summer of Code! In its annual program, now taking place for the 10th time since 2005, Google supports students who work with a mentor on a free and open-source project over the summer. My proposal “Data-centric provenance capture with Sumatra” was accepted in March and I’m happy to post a first update on my work on the project here. Sumatra is a tool promoting reproducible research in computational sciences – “a lab notebook for computational projects”. Having stumbled upon the software while looking up best practices in computational research, I have come to highly appreciate what the tool can do.

But of course, it can always be better! This is why I wrote to Andrew Davison, the maintainer of Sumatra and now mentor of my GSoC project, back in January about possibly participating in Summer of Code. I suggested a stronger connection in Sumatra’s architecture and display between process records and the data generated in these processes.

In my work I use Sumatra extensively and had, even before thinking about Summer of Code, written some bash scripts to achieve better data-to-process associations purely through the data and process labels. As the program is not called Summer of Text, let’s look at some code! This is an example of a custom bash script I use to first get a label string, possibly depending on parameters, and then use it as both the Sumatra label and the file label for the plots generated in plot_data.py:

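#!/bin/bash

# all command-line arguments are paths to the data files to plot
inputfiles=("$@")

# generate the label string (may depend on parameters)
labelstr=$(python comp/figure_label.py)

# use the same label for the Sumatra record and the generated plot files
smt run --executable=python \
        --main=comp/plot_data.py "${inputfiles[@]}" "$labelstr" \
        --reason="Test graphic" \
        --tag=graphic \
        --label="$labelstr" \
        comp/params/plot_data_params_template.py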

Paths to the data to plot are simply passed as arguments when calling the bash script. You can find a full repository with example usage here. After a good week of coding on the project, I have a working prototype for displaying associated records of data in the web interface and have opened a first pull request. Once it has been reviewed by the maintainer, I hope to use it as a base to expand upon in the coming weeks!

data_view

More with the next update!

Arrowheads for axis in Matplotlib

This is a short demo showing how to make abstract plots in matplotlib that have arrows pointing in the x and y directions as axes.

arrow_axis

The idea is to remove the default axes completely and insert arrows with the correct dimensions as substitute axes:

import pylab as pl

fig = pl.figure()
ax = fig.add_subplot(111)

x = pl.arange(-5,5,0.1)
ax.plot(x, x**2-8.8)

xmin, xmax = ax.get_xlim() 
ymin, ymax = ax.get_ylim()

# removing the default axis on all sides:
for side in ['bottom','right','top','left']:
    ax.spines[side].set_visible(False)

# removing the axis ticks
pl.xticks([]) # labels 
pl.yticks([])
ax.xaxis.set_ticks_position('none') # tick markers
ax.yaxis.set_ticks_position('none')

# wider figure for demonstration
fig.set_size_inches(4,2.2)

# get width and height of axes object to compute 
# matching arrowhead length and width
dps = fig.dpi_scale_trans.inverted()
bbox = ax.get_window_extent().transformed(dps)
width, height = bbox.width, bbox.height

# manual arrowhead width and length
hw = 1./20.*(ymax-ymin) 
hl = 1./20.*(xmax-xmin)
lw = 1. # axis line width
ohg = 0.3 # arrow overhang

# compute matching arrowhead length and width
yhw = hw/(ymax-ymin)*(xmax-xmin)* height/width 
yhl = hl/(xmax-xmin)*(ymax-ymin)* width/height

# draw x and y axis
ax.arrow(xmin, 0, xmax-xmin, 0., fc='k', ec='k', lw = lw, 
         head_width=hw, head_length=hl, overhang = ohg, 
         length_includes_head= True, clip_on = False) 

ax.arrow(0, ymin, 0., ymax-ymin, fc='k', ec='k', lw = lw, 
         head_width=yhw, head_length=yhl, overhang = ohg, 
         length_includes_head= True, clip_on = False) 

# clip_on = False if only positive x or y values.

pl.savefig('arrow_axis.png', dpi = 300)
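
The only non-obvious step above is the rescaling of the arrowhead for the y axis. Head width and length are given in data units, so the same numbers would look different on the two axes whenever the data ranges or the axes width and height differ. The following is a minimal sketch of that conversion in plain Python. The function name matching_head_size and the example limits and sizes are made up for illustration; only the two formulas are taken from the script above.

```python
def matching_head_size(hw, hl, xlim, ylim, width, height):
    """Return (yhw, yhl): head width and length for the y-axis arrow
    that appear the same size on screen as the x-axis arrowhead.

    hw is measured in y data units, hl in x data units;
    width and height are the axes dimensions in inches.
    """
    xspan = xlim[1] - xlim[0]
    yspan = ylim[1] - ylim[0]
    # the y-arrow head width runs along x, so convert y units -> inches -> x units
    yhw = hw / yspan * xspan * height / width
    # the y-arrow head length runs along y, so convert x units -> inches -> y units
    yhl = hl / xspan * yspan * width / height
    return yhw, yhl


# hypothetical example values, not taken from the figure above
xlim, ylim = (-5.0, 5.0), (0.0, 20.0)
width, height = 4.0, 2.0

hw = (ylim[1] - ylim[0]) / 20.0
hl = (xlim[1] - xlim[0]) / 20.0
yhw, yhl = matching_head_size(hw, hl, xlim, ylim, width, height)

# physical head widths in inches for both arrows; they should agree
x_head_width_in = hw / (ylim[1] - ylim[0]) * height
y_head_width_in = yhw / (xlim[1] - xlim[0]) * width
print(yhw, yhl, x_head_width_in, y_head_width_in)
```

Converting both head widths back to inches, as in the last lines, is a quick way to check that the two arrowheads will be drawn at the same physical size.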


Berlin11 Satellite Conference – Outcomes

Important: The recordings are now available at the conference website!

The Berlin11 Satellite Conference for Students and Early Stage Researchers took place on November 18th, 2013. Not being able to attend, I only watched parts of the livestream broadcast, relying on the promise that video recordings of the presentations would be posted shortly after the conference. Three months after the event, and after a month of unsuccessfully trying to get a statement from the organizers on whether there’s still any chance to see the videos, I realize that neither recordings nor any other outcomes of the meeting will be published. It’s hard for me to understand how a conference on Open Access ends up displaying exactly the closed and inaccessible behaviour that the concept of Open Access itself is trying to battle.

So I decided to do something about it. Following the example of the Beyond the PDF2 conference, I’m trying to compile an outcomes page with any resources, summaries and reports on the conference I can find. I’m hoping that eventually I will be able to collect at least the presentations of all speakers. So far I have 4 out of 9 and I want to thank those presenters for making their slides accessible to everyone who couldn’t attend the conference. Thank you!

Presentations:

Jack Andraka:
Carl-Christian Buhr: Making EU Open Access Policies Work (Slideshare)
Heather Joseph: Open Access Update (direct download)
Iryna Kuchma:
Cameron Neylon:
Ulrich Pöschl:
Bernard Rentier:
Alek Tarkowski: From Open Access to Open All (Slideshare)
Mike Taylor: Towards universal Open Access: what we can do about it, and who should do it. (blog post, direct download)

Reflections and Summaries:

Open Access Working Group: Recap of the Berlin 11 conference: the call for a change in scientific culture becomes stronger (blog post)
Open Access Button: The Open Access Button Launch Roundup (blog post)

Please help! If you have access to any resources, presentations, summaries from the Berlin11 Satellite Conference, anything that you think could contribute to this list, please contact me and I will add it here.

————————————————————————————————————————–

EDIT1 (Feb 22, 2014): There’s good news! One of the organizers of the conference contacted me and explained the situation. Audio recordings of the talks will be published soon (1-2 weeks). I’m also hoping that all presentations will be posted and will link from here once the official site has been updated.

EDIT2 (May 7, 2014): Almost three months after my conversation with the organizers about recordings of the talks, neither audio nor slides have been officially posted. I still feel strongly about this topic and I realize that the good intentions of the organizers to make the talks publicly available were (are?) present. For now I hope that in planning potential future events the “post-production” will be considered an important part of the conference, as it is the best opportunity to still participate for those who cannot attend in person.

EDIT3 (August 7, 2014): The recordings have been posted! You can find them at the conference website. This is fantastic, and with OpenCon 2014, a follow-up event to the successful satellite conference in Berlin, coming up, I hope that quite a few people will be interested in going back and listening to these talks!

————————————————————————————————————————–


Pygame presentation for Python User Group

In our most recent Python User Group meeting, I presented a short introduction to Pygame:

You can find the source for the slides here.


Template for code highlighting with minted in LaTeX beamer

Syntax highlighting can be achieved in LaTeX via listings or more recently with minted. The latter package uses Pygments to create beautiful code highlighting and includes fantastic additional features such as line numbering.

Minted’s compatibility with the LaTeX beamer class, however, is restricted, and some workarounds (as laid out by Tristan Ravitch in his blog post) are needed to ensure full functionality of both the beamer class and minted.

Here’s a template I created for anyone who wants to present code with the beamer class and the minted package. Slides created this way can then, for example, look like this:

beamer_syntaxhighlight

Get the TeX source code!


Mathjax theorems – CSS for LaTeX-like environments – Custom names

EDIT: As mentioned in the comments, this solution does not currently work in all browsers.
EDIT2: Paul Siegel has commented below and shared a solution that works again in all browsers. His implementation should be preferred. For ease of reading, I’ve posted his solution on my new blog. Many thanks to him!

————————————————

In a fantastic post on his blog earlier this year, Zachary T. Harmany explained how LaTeX-like environments, such as definitions, theorems and proofs, can be done with CSS.

His solution is a great way for writing mathematics on the web and I took his idea one step further, looking for a way to have custom names for theorems, which we would TeX as

\begin{theorem}[Prime numbers]
All odd numbers are prime.
\end{theorem}

To combine such functionality with Zachary’s approach, I got the right idea from BigMacAttack’s answer to a Stack Overflow question about a related topic.

Here’s what I am suggesting:

.theorem {
    display: block;
    font-style: italic;
    margin-left: 4.5%; margin-right: 4.5%; margin-top: 2em;
    content: "Theorem ";
}
 
.theorem:before {
    content: inherit;
    font-weight: bold;
    font-style: normal;
}

With this one could write

<div class="theorem" style="content:'Theorem (Prime numbers)';"> 
All odd numbers are prime.</div>

to get

Theorem (Prime numbers) All odd numbers are prime.

For more about this, please visit Zachary’s original post!


Ellipse Grid Python

Recently, I had to deal with the problem of distributing (roughly) N points in a grid-like manner on an elliptical surface. A quick search for a programmatic solution brought up John Burkardt’s Ellipse Grid.

As I work almost exclusively in Python, I made a quick translation of his C++ script. View and download it on Pastebin.

ellp
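
To give a rough idea of the problem without following the link, here is a minimal sketch of one simple way to place roughly N grid points inside an ellipse. This is not John Burkardt’s algorithm and not the translated script. It is a simplified stand-in that assumes an axis-aligned ellipse with semi-axes a and b, picks the grid spacing so that the ellipse area divided by the cell area is about N, and keeps the grid points that fall inside. The function name ellipse_grid and the example values are hypothetical.

```python
import math


def ellipse_grid(n_target, a, b, center=(0.0, 0.0)):
    """Return square-grid points inside the ellipse (x/a)^2 + (y/b)^2 <= 1.

    The spacing h is chosen so that (ellipse area) / h^2 is about n_target,
    so the number of returned points is only approximately n_target.
    """
    h = math.sqrt(math.pi * a * b / n_target)
    nx = int(a // h)  # grid steps that fit along the semi-axis a
    ny = int(b // h)  # grid steps that fit along the semi-axis b
    cx, cy = center
    points = []
    for i in range(-nx, nx + 1):
        for j in range(-ny, ny + 1):
            x, y = i * h, j * h
            # keep only grid points inside (or on) the ellipse
            if (x / a) ** 2 + (y / b) ** 2 <= 1.0:
                points.append((cx + x, cy + y))
    return points


# hypothetical example: about 200 points on an ellipse with semi-axes 2 and 1
points = ellipse_grid(200, a=2.0, b=1.0)
print(len(points))
```

The grid is centered on the ellipse, so the point set is symmetric about both axes. The actual point count deviates from N because points near the boundary are either kept or dropped as a whole.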