The scientific world has been rocked recently by some notable instances of allegedly falsified data: Andrew Wakefield, Marc Hauser, Phil Jones and others. These prominent researchers have suffered public humiliation, some would say justifiably, and have engendered scepticism towards their fields of research.
However, to point out their failings without addressing the broader issue of how science proceeds from data collection to publication would be myopic. Scientists collect data of one form or another, then put it through the machinations of statistical techniques and find... well, that's where it gets tricky!
There's often a lot riding on findings: research grants, prestige, even bets. A certain bias is therefore inevitable. How can this be minimised? I would argue that we need to overturn the idea that we keep data close to our chests and only publish digested results. In this endeavour, we can learn much from the free software community. It is certainly possible to pursue a career and profit from software without making it closed and proprietary; likewise, having other scientists inspect the 'private parts' of our data and statistics can be a useful move, and not just for preventing fraud.
More eyes mean more hypotheses, and data can be added to create a corpus for new analyses. This raises the important issues of which statistical methods we use (more on Bayesian and mega-analysis methods in an upcoming post) and which statistics packages we choose (more on open-source statistics coming too, most notably the R project).
A number of neuroscience sites are leading the charge in providing open data. Here are a few (please post any others you know of in the comments and I'll add them to the list).
- Brainmaps and Brainmaps:Links
- OpenfMRI, run by Russ Poldrack at the University of Texas at Austin, is for sharing raw data, which is much better for mega-analysis than pre-processed data. It also adheres to the Open Data Commons guidelines.
- The fMRI data centre - currently not accepting submissions
- The Open Access Series of Imaging Studies (OASIS)
- International Neuroimaging Data-sharing Initiative (INDI) - started with resting connectivity data, now more general
| Name | Active | Maintainer | Size | Access tools | Raw data | Processed data | Licence | Notes |
|---|---|---|---|---|---|---|---|---|
| | | University of Texas | | | | | | |
| INDI - International Neuroimaging Data-sharing Initiative | Yes | Maarten Mennes | | | Yes | No | Attribution Non-Commercial | resting mode data with phenotypes, also R-fMRI package |
| Brainmap | Yes | Research Imaging Institute, University of Texas | 2155 papers | custom, closed source software | No | Yes | Copyrighted with limited use licence | coregistered data from preprocessed, categorised studies |
Many of these sites provide co-registered data, for example EEG and MRI recorded in the same participants. As computing power improves, I believe that many more discoveries will come from re-analysing these massive data sets than from running new experiments, or at least from using the data sets to develop appropriate hypotheses for testing.
To this end, I will be publishing the raw data for my current neuroscience experiments in parallel with journal articles, and encouraging you all to reuse the data under copyleft. Hopefully you'll all feel empowered to do the same. More to follow on how and where to do the publishing...