peer reviewed journals for source code and data: narrative forms for the modern scientific method
greg wilson has a great post about a new journal devoted to source code for biology and medicine. it’s a fascinating idea, and greg asks, “why don’t we do this?” clearly there are many reasons; one is the significant financial reward awaiting programmers who can solve problems better than others over an extended period of time. in the research community, by contrast, the prestige of publishing a result often outweighs the financial rewards of keeping that result to yourself.
even for computer science professors, the allure of financial rewards can keep them from providing their code to the open source community for criticism: see, for example, ken birman’s licensing of the astrolabe source code to amazon for a large sum.
suppose we could align incentives appropriately and a healthy community of peer reviewed journals for source code emerged. how would you structure one of the articles in these journals? they would probably take inspiration from don knuth’s literate programming (see also knuth’s book on the topic). tools for literate programming seem to be getting more sophisticated recently (especially in the two languages i use most, python and r). another related idea is reproducible research, which involves publishing empirical data in addition to code. it’s a logical extension of literate programming to the scientific realm.
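to make the idea concrete, here is a minimal sketch of what such an "article" might look like in python, using the standard library's doctest module so that prose, data, and code travel in one file. this is only one possible shape, not knuth's system or any journal's actual format: the data values and the names MEASUREMENTS and mean are hypothetical, invented purely for illustration.

```python
"""a tiny 'article' in which prose, data, and code travel together.

claim: the mean of the measurements below is 2.5.

the claim is written as an executable example, so a reviewer can
re-check it mechanically instead of taking the prose on faith:

>>> mean(MEASUREMENTS)
2.5
"""

# hypothetical data, invented for illustration only
MEASUREMENTS = [1.0, 2.0, 3.0, 4.0]


def mean(values):
    """return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)
```

if this were saved as a file (say, article.py, also a made-up name), running `python -m doctest -v article.py` would re-execute the example embedded in the prose and report whether the stated result still holds.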
as an aside, it’s somewhat notable that these journals are emerging outside of the computer science community, in fields where code is needed primarily to manufacture results. i suppose it’s indicative of the strange relationship computer science professors have with programming, as pointed out to me by a professor of statistics this weekend.
in fields where i meander from time to time, there has been some progress on this issue. well, sort of: none of the following are quite analogous to the journal highlighted by greg’s post. statistics has the journal of statistical software; machine learning has mloss; and databases have the vldb experiments and analyses papers.
i’ve always enjoyed books that use actual code to illustrate their ideas (example one, example two), and i’d love to see this trend extend to the academic literature. as we all adjust to the new tools of modern science (hypotheses, code, and data), having a unified narrative method for conveying your results to other researchers will grow in importance. it looks like medicine and biology are moving pretty quickly on this front; if you know of any other examples that illustrate how certain fields are moving forward with literate programming or reproducible research, please send them my way!