Monday, February 27, 2017

Release the kraken

What happens when you collaborate with non-ecology scientists?

- You create a Transposable Element (TE) simulation model called TEWorld, with this logo:

- One of your collaborators on this project has the initials TE (and you only realize this after collaborating for 4 years)

- You need to run said simulations on SHARCNET, and one of the clusters you will use is called kraken (or orca, or requin, or saw; obviously computer scientists are not biologists, or they are and the inclusion of orca is reverse reverse psychology).

- You receive links to these types of YouTube videos:


- You descend into philosophical rabbit holes, and come out transformed.

Tuesday, February 21, 2017

Is doing a PhD a waste of time?

Some people definitely think so. These types of articles crop up regularly. Here is a recent version of this type of analysis that re-appeared on Medium, but is originally from 2010. These articles are like the villain in a horror movie: every time you think he is finally dead, ominous music fills the theatre. The linked-to article has some aspects that set it apart, though. It gives a voice to faculty members, and it contains some gems of concise writing. The quote below has both:
"Monica Harris, a professor of psychology at the University of Kentucky, is a rare exception. She believes that too many PhDs are being produced, and has stopped admitting them. But such unilateral academic birth control is rare."

Monday, February 13, 2017

I will never look at a boxplot in the same way...

https://xkcd.com/1798/

Friday, February 3, 2017

Standing on the shoulders of giants

I am currently teaching a graduate "Stats" course, which is more of a historical exploration of statistical issues in ecology, led by grad students. As part of the course, we are also exploring best practices in R and ecological data management. So naturally we covered Brian McGill's 10 commandments for good data management, and his follow-up post with an example application of these recommendations to a toy data set.
I decided afterwards to do the challenge, and with our weekly University of Guelph R Users group (UGRU) we walked through the code line by line, discussing why certain lines were included, alternative ways to code them, and the advantages and disadvantages of these alternative approaches. It took us 3 hours of exploration, and I have captured our discussion in an alternative R script file, where our notes are preceded by "###" to differentiate them from Brian's comments.

Here is a link to this updated script file: https://drive.google.com/open?id=0B6C_pml53BPUQ1JKWV96NDFFVUU

Here is a summary of some of our observations:

  • The tidyverse package makes everything easy
  • read_csv is preferable to read.csv
  • tibbles are the way to go
  • reproducible code is very difficult (paths to files, outdated packages)
  • different philosophies with respect to keeping/creating intermediate files, and the value of long versus short file names
  • the flexibility of ggplot is awesome, and just as in base R, there are multiple ways to reach the same goal
  • and the biggest revelation for some of us: when you are piping, and your code is structured in multiple lines, you can still execute the whole block with one cmd/ctrl-enter, without the need to highlight the block or step through it line by line!
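
To make a few of these points concrete, here is a minimal sketch (not taken from Brian's script or from our annotated version). The toy data, the column names, and the object names are all made up for illustration, and it assumes the readr and dplyr packages are installed.

```r
library(readr)
library(dplyr)

# Hypothetical toy data, written to a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("site,species,count",
    "A,oak,3",
    "A,maple,5",
    "B,oak,2"),
  csv_path
)

# Base R returns a plain data.frame
base_df <- read.csv(csv_path)

# readr returns a tibble; declaring the column types makes the import explicit
tidy_df <- read_csv(
  csv_path,
  col_types = cols(
    site    = col_character(),
    species = col_character(),
    count   = col_integer()
  )
)

class(base_df)
class(tidy_df)

# A pipe chain spread over several lines: total count per site
site_totals <- tidy_df %>%
  group_by(site) %>%
  summarise(total = sum(count))

site_totals
```

In RStudio, placing the cursor anywhere inside the `site_totals` chain and pressing cmd/ctrl-enter once runs the whole multi-line expression.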
Thank you Brian for the nice tutorial, RStudio for the functionality, and Maddie for the cmd-enter combination in a piping block; coding will be so much more efficient now.