Monday, February 27, 2017

Release the kraken

What happens when you collaborate with non-ecology scientists?

- You create a Transposable Element (TE) simulation model called TEWorld, with this logo:

- One of your collaborators on this project has the initials TE (and you only realize this after collaborating for 4 years)

- You need to run said simulations on SHARCNET, and one of the clusters you will use is called kraken (or orca, or requin, or saw; obviously computer scientists are not biologists, or they are and the inclusion of orca is reverse reverse psychology).

- You receive links to these types of YouTube videos:

- You descend into philosophical rabbit holes, and come out transformed.

Tuesday, February 21, 2017

Is doing a PhD a waste of time?

Some people definitely think so. These types of articles crop up regularly. Here is a recent version of this type of analysis that re-appeared on Medium, but is originally from 2010. These articles are like the villain in a horror movie: every time you think he is finally dead, ominous music fills the theatre. The linked-to article has some aspects that set it apart, though. It gives faculty members a voice, in addition to containing some short-writing gems. The quote below has both:
"Monica Harris, a professor of psychology at the University of Kentucky, is a rare exception. She believes that too many PhDs are being produced, and has stopped admitting them. But such unilateral academic birth control is rare."

Monday, February 13, 2017

I will never look at a boxplot in the same way...

Friday, February 3, 2017

Standing on the shoulders of giants

I am currently teaching a graduate "Stats" course, which is more a historical exploration of statistical issues in ecology, led by grad students. As part of the course, we are also exploring best practices in R and ecological data management. So naturally we covered Brian McGill's 10 commandments for good data management, and his follow-up post with an example application of these recommendations with a toy data set.
I decided afterwards to do the challenge, and with our weekly University of Guelph R Users group (UGRU) we walked through the code line by line, and discussed why certain lines were included, alternative ways to code them, and the advantages and disadvantages of these alternative approaches. It took us 3 hours of exploration, and I have captured our discussion in an alternative R script file, where our notes are preceded by "###" to differentiate them from Brian's comments.

Here is a link to this updated script file:

Here is a summary of some of our observations:

  • The tidyverse package makes everything easy
  • read_csv is preferable over read.csv
  • tibbles are the way to go
  • reproducible code is very difficult (paths to files, outdated packages)
  • different philosophies with respect to keeping/creating intermediate files, and the value of long versus short file names
  • the flexibility of ggplot is awesome, and just as in base R, there are multiple ways to reach the same goal
  • and the biggest revelation for some of us: when you are piping, and your code is structured in multiple lines, you can still execute the whole block with one cmd/ctrl-enter, without the need to highlight the block or step through it line by line!
Thank you Brian for the nice tutorial, RStudio for the functionality, and Maddie for the cmd-enter combination in a piping block. Coding will be so much more efficient now.
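
As a minimal sketch of the read_csv, tibble, and piping points above (the toy data, the temporary file, and all object names are made up for illustration and are not taken from Brian's script):

```r
library(readr)
library(dplyr)

# Hypothetical toy data, written to a temporary file so the example
# does not depend on any path on your own machine
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("site,species,count",
    "A,oak,3",
    "A,maple,5",
    "B,oak,2"),
  toy_path
)

# read_csv() returns a tibble; column types are declared explicitly
# here instead of being guessed
toy <- read_csv(
  toy_path,
  col_types = cols(
    site = col_character(),
    species = col_character(),
    count = col_integer()
  )
)

# A pipe spread over several lines: one statement, three steps
site_totals <- toy %>%
  group_by(site) %>%
  summarise(total = sum(count))

print(site_totals)
```

In RStudio, with the cursor anywhere inside the piped statement, a single cmd/ctrl-enter should run all three lines of the pipe as one block.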

Wednesday, January 11, 2017

R user group - UGRU

For the last 2 years, I have "been running" an R user group, the University of Guelph R Users group (or UGRU). Normally I am the worst when it comes to acronyms, and this time I was only bad, because UGRU has Gru in it, the main character of Despicable Me. During a field course, students compared my accent to Gru's, and there is a scene in the movie that has its own meme: light bulb.

I hoped that having people work together and solve similar problems would make the R light bulb go off. And last semester there were a lot of light bulbs that lit up. We worked through the first four chapters of the Grolemund and Wickham R for Data Science book, and I have convinced my first physiology colleague that R is awesome.

At the end of the semester, several participants shared their exploratory data analysis. After they put their code together for the meeting and received feedback and questions from the other participants, the word that kept coming up was "surprise": patterns they had missed, variables not included, approaches not considered, etc. And this was a mix of people, from those with years of coding in R to beginners. So the feedback and working together did lead to several light bulbs and happy R users!