The Command Line Is More Effective for Reproducibility and Re-Use
Jan T Kim, The Pirbright Institute
15 January 2015
An awkward command line interface (CLI) can surely be a pain, but fixing or mitigating that is very easy compared to dealing with a program that can only be used interactively via a GUI. Many CLI-based programs have had a more "user friendly" interface wrapped around them: for example, the PHYLIP [1] programs have been wrapped for EMBOSS [2] in EMBASSY, and in turn there are various GUIs for EMBOSS; Galaxy [3] provides a framework for wrapping a web interface around any tool that can be driven non-interactively. From this perspective, any CLI is much more conducive to re-use than a GUI that is the only way to use a program.
Perhaps even more importantly, GUI-only programs often don't keep a log of the input users provide interactively, and tend to allow users to manipulate data ad hoc. In doing so, they promote violation of the "Ten Simple Rules for Reproducible Computational Research" [4].
For these reasons I think that anyone who provides a CLI at all isn't trying hard enough to not be a bioinformatician.
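As a rough illustration of the logging point above, here is a minimal Python sketch of how a tool that can be driven non-interactively might be wrapped so that every invocation is recorded and can be replayed later. It is not taken from PHYLIP, EMBOSS or Galaxy. The names `run_logged`, `replay` and `commands.jsonl` are hypothetical, and the Python interpreter stands in for a real command line program so that the example runs without assuming any particular tool is installed.

```python
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_logged(argv, log_path):
    """Run a command non-interactively and append a record of it to a JSON-lines log."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(argv, capture_output=True, text=True)
    record = {
        "started_utc": started,
        "argv": list(argv),  # the exact input the user provided
        "returncode": result.returncode,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result


def replay(log_path):
    """Re-run every logged command, in order, and return the new results."""
    results = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        argv = json.loads(line)["argv"]
        results.append(subprocess.run(argv, capture_output=True, text=True))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real tool: the Python interpreter itself.
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "commands.jsonl"
        first = run_logged([sys.executable, "-c", "print('ACGT'[::-1])"], log_file)
        again = replay(log_file)[0]
        print(first.stdout.strip(), again.stdout.strip())
```

Nothing comparable is possible with a GUI-only program unless its author builds the logging in. With a CLI, a wrapper of this kind can be added by anyone after the fact.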
[1] http://evolution.genetics.washington.edu/phylip.html
[2] http://emboss.sourceforge.net/
[3] http://galaxyproject.org/
[4] Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
Competing interests
None declared
I agree, but..........
12 June 2012
I agree with most of what you are saying, but many of the issues around code reuse and documentation are a problem of time. Too few bioinformaticians are hired (the key word is hired) at most institutions. This means that most bioinformaticians are seriously overcommitted and are looking for easy (cheap) ways to complete work before they are yelled at by the PI. PIs do not understand software development AT ALL, and if you do not have someone who can stand up and explain it, then these issues will continue. I can't tell you how many times I have heard "What do you mean that you need 4 weeks to properly analyze that data? I have the paper going out in 2 weeks" or "How can it take 6 months to develop a pipeline? I could get a grad student to do it in 2 weeks." The other refrain I hear is "It costs that much? I could add 1000 samples to my experiment for that amount of money." Sorry, but you should have talked to me before writing the grant. Bioinformaticians and biologists share in the problems here.
Competing interests
none.
Reply to David Sexton
Manuel Corpas, manuelcorpas.com
15 January 2015
Thanks a lot for your comment, David. I totally agree with your concern that many PIs have no idea what it takes for bioinformaticians to do their work.
Nevertheless, even if more bioinformaticians are hired, will unsavvy PIs ever understand what it takes for a bioinformatics pipeline or analysis to be carried out?
Competing interests
none