
In Defense of Open Source Software

Over at Small Pond Science, Terry McGlynn, Amy Parachnowitsch, and Catherine Scott regularly post informative and useful blog entries about “how scientists research, teach and mentor in all kinds of academic institutions, including teaching-centered universities”. I’m a fan, and for the most part a lurker, which means that I mostly read and rarely contribute. One of the main reasons I lurk is because I want to keep myself out of trouble. I find it hard to convey tone in less formal and more conversational online platforms like Twitter and comments sections. Sometimes what I intend as a good-natured chiding can cause offense 1. I also forget that there is an asymmetry of experience between someone who engages actively and a lurker – lurkers can feel like they “know” someone because they have been reading public posts from an author or a site for a long time. From a site author’s perspective, however, a comment from an infrequent poster carries none of that history and familiarity.

This all serves as a preamble for an expansion of my thoughts on a Small Pond Science post. In a recent comments section there, I stopped lurking and made a snarky and ill-considered analogy criticizing the title of this post: Open source software doesn’t necessarily mean we’ll have better stats. I’ll spare myself the embarrassment of repeating the analogy here (you can find the comment section here), but I was encouraged to take all the space I needed to explain my position. I’ve thought about it for a bit and decided that, in addition to apologizing for the tone of my comment, I’ll add an explanation of my position to this blog entry.

I simply disagree with the premise that any open source statistical package promises better stats. Rather, open source statistical packages promise the same stats, just more accessible to users regardless of the size (or existence) of user bank accounts. This Wikipedia page on statistical packages places software into four categories: open source, public domain, freeware, and proprietary. For the most part, you’d probably be hard-pressed to find any specific analysis that you can conduct in one package that you could not recreate in another. The main story of the Small Pond Science post compares an experience running a statistical analysis on a dataset with the open source package “R” as well as with the proprietary package “JMP”. In recent years and in some biological subfields, R has become the de facto standard for statistical analysis. This has been difficult for me, because, for the most part, I received my statistical training on an older, proprietary package (in fact, I still have the tattered lab instructions for my first zoology lab in the late ’90s in which we used JMP to compare body lengths of soldier beetles!). The R user experience can still improve, but on the whole, this change from the previous status quo is a marked improvement. Here’s a short list of reasons why:

  1. The software license for JMP costs $1620. In contrast, R is completely free to download.

That’s the end of my list. The cost of JMP is prohibitive and, as a consequence, denies participation by a huge section of the scientific community. I work at a small regional university that caters mainly to commuter, first-generation students. Our students sometimes have trouble purchasing our textbooks, which cost an order of magnitude less than a single JMP license. In a sense then, open source is “better” because it allows for greater inclusion and more participation.

At its core, though, the post Open source software doesn’t necessarily mean we’ll have better stats wasn’t about the actual statistical analyses that R vs JMP produce, or about the user base that each program serves due to their costs. Rather, the post was a valid critique of the user experience of R compared with JMP. I fully agree that the learning curve for R is steep and can lead to user error. In grad school, I ran into issues with collaborators using R that were eventually fixed by ensuring that we both had downloaded the most up-to-date package (i.e., a “clean” install). It can present real problems for collaboration, troubleshooting, and teaching if a product is overly customizable. In my opinion, user-experience issues are endemic to neither open-source software nor statistical software (in fact, I have had greater frustrations with bioinformatics software on the whole than with statistical software). My good friend Katie Hinde has made a similar critique of open access with one of my absolutely favorite terms: it’s not open access to everyone if it’s hidden behind a paywall of jargon. While this comment was made specifically about publications, I think it also applies to software.

To recap, I think the R vs JMP themed post on Small Pond Science was, like nearly all the posts there, informative and useful. I disagreed with the choice of title and the framing of the critique as an open access issue. In my opinion, user experience problems affect many software packages – both open source and proprietary. Fortunately, as R has become more and more common, there is a terrific community of folks that are willing to help out. Many use the #rstats tag on Twitter.

  1. It bears repeating that it is the offense that matters, not a speaker’s intent.


Where to find me

In case you can’t tell, this webpage is infrequently updated. My last post was about accepting the Visiting Assistant Professor gig at Oberlin College, and I have since completed that two-year stint and begun a tenure-track Assistant Professor position at Dominican University in River Forest, IL. If you are looking for more regularly updated information about me, here is my Google Scholar page, my ResearchGate page, and my Twitter feed. I’ve also posted an updated CV here (1 Sept 2016).

Back in Ohio!

As of July 1st, I accepted a visiting assistant professor position in the Oberlin College department of Biology. This means I have left my postdoctoral researcher position in the Cordoba lab at UNAM and my assistant researcher position in the Grether lab at UCLA.  While I am sad that I will not see my friends in Mexico City and Los Angeles as often as before, I am excited to move on to this next stage in my academic career.  I have been fortunate to acquire some adjunct teaching experience at UCLA and Occidental College and hope that this position at Oberlin will further prepare me for the tenure track academic job market. I will be teaching Freshwater Invertebrate Biology in the Fall Semester and helping with the second course of the year-long intro series during the Spring Semester. I am looking forward to an exciting year at Oberlin!

Conservation genetics of odonates

A recent research aim of mine has been acquiring molecular genetic data for Hetaerina and Paraphlebia damselflies. The costs of generating these types of data are a fraction of what they were just a few years ago. Last year, I was lucky enough to get to run a Hetaerina americana specimen and a Paraphlebia zoe specimen on the UCLA Genotyping and Sequencing Core’s 454 High-Throughput Sequencer. These data can be used for many applications, but I have been using the sequence data to develop microsatellite primers. The students and staff at the UCLA Conservation Genetics Resource Center have been super helpful in providing logistical support and advice. We have recently published the Paraphlebia markers here. I expect to publish the Hetaerina markers later this spring. I am especially excited about the Hetaerina markers because our results show that many of the markers that amplify for H. americana also amplify for other species of Hetaerina. Ultimately, these markers will be used to help resolve questions about population structure and gene flow among populations of damselflies.

Agonistic character displacement

An excellent video explaining the concept of agonistic character displacement.  Great work by Neil Losin.

Is Kin Selection Dead?

I went to a nice seminar today hosted by the UCLA Center for Behavior, Evolution and Culture. Peter Nonacs gave a talk titled Is Kin Selection Dead and Is It Time to Move On in Understanding the Evolution of Cooperation? E.O. Wilson has been publishing a critique of kin selection for years, but his most recent article in Nature has generated a lot of activity. In particular, this response’s author list reads like a “who’s who” of eminent evolutionary biologists.

Peter spent the first part of his seminar introducing the debate, which, as he puts it, centers on the difference between an “actors-view” perspective on the evolution of cooperation and a “genes-view” perspective on the evolution of cooperation. Traditional kin-selection theory takes an actors-view perspective, which focuses on evolution at the level of the individual or indirect benefits to the individual through indirect fitness. The “genes-view” perspective is more of a multi-level selection argument a la David Sloan Wilson. In the genes-view perspective you don’t need to take relatedness into account to explain the evolution of cooperation. The math is a bit beyond me, but Peter summarized it briefly by saying that the central observation is that individuals breed better in groups than in solitary arrangements and you don’t need to take relatedness into account to explain group formation. I believe you do need to get some level of population structure across groups to get the evolution of cooperation in the multi-level selection framework – so in a way, even in multi-level selection theory you do need to take genetic similarity into account. Some have suggested that the differences between the two groups are a matter of semantics. I think that Peter thinks that the differences between the groups are real, but that each framework may be at times useful.

Or at other times, perhaps neither framework works well. Peter spent the second half of his talk introducing the concept of social heterosis. It’s a very interesting topic, but I am not quite sure how well it fits in the kin selection debate. Peter claims that social heterosis provides a pathway to cooperation that does not require the clustering of related individuals. Heterosis is another term for hybrid vigor – in many breeding experiments, outcrossing leads to healthier offspring than inbreeding does. Inbreeding typically leads to the expression of deleterious recessive alleles. If you know a couple of folks with pets, you may be aware that a mutt will generally be a healthier, longer-lived animal than a “pure-bred”, which is often susceptible to diseases and genetic abnormalities. Social heterosis takes the same logic that applies to the reproduction of an individual and applies it at the level of a group. In 2007 and 2008, Peter and his PhD student Karen Kapheim published a series of papers that model the social heterosis mechanism. The model estimates a within-group fitness benefit and an across-group fitness benefit. Across-group fitness benefits are primarily determined by the genetic diversity of the group composition, and increasing group diversity leads to higher fitness. Peter has a nice slide illustrating the benefits of diversity, which shows a photograph of Karen and another grad student, Brittany Enzmann, picking fruit in an orchard. Karen is tall and picks the fruit near the tops of the trees while Brittany is short and picks the fruit from the bottom. If they were the same height, they would not be able to exploit the resources from the environment as efficiently. As it is the NBA playoffs right now, I thought of another illustrative example. A basketball team composed of all point guards or all centers would not do as well as a basketball team whose players specialize in different rebounding, shooting, and dribbling skills. I see the social heterosis concept as a potential answer to the question “what maintains genetic diversity” in some traits, but missed how it provides a solution to the central differences between the kin-selection and the multi-level selection viewpoints. I am not sure Peter claims that it is a potential solution; it may instead be simply another model of group-trait evolution.

I need to read Peter and Karen’s paper a little more closely to look at their potential case studies, but I wonder whether the social heterosis mechanism is widespread. I am especially interested in whether there are any natural examples of group formation where close relatives are excluded and foreign individuals are readily incorporated. There are many examples of sex-biased dispersal to prevent inbreeding, but the other sex typically remains around. In cooperatively breeding groups, aren’t helpers almost always relatives? It seems to me that kin-structured groups are prevalent in nature – primarily due to philopatry. It will be interesting to see if the social heterosis concept takes off, but it is clear that the evolution of cooperation is a hot topic in the community right now.

The Future of the PhD

Nature this week has a special issue dedicated to the future of the PhD. Some good points raised on Metafilter.