
Sophia N. Wassermann, PhD


Marine ecologist interested in quantitative approaches to issues at the intersection of fisheries and climate change. Postdoc in the Punt Lab, School of Aquatic & Fisheries Science, University of Washington, in collaboration with the NOAA Alaska Fisheries Science Center & Northwest Fisheries Science Center. PhD in Earth & Ocean Science from the National University of Ireland, Galway.


Tools for Learning Python: End of the Summer of Data Science 2017 and Back to School

Ahh back to school. The time when I used to fawn over Lisa Frank school supplies. Now, I watch as undergrads, so hopeful, with so few gray hairs, once again clomp through my building.

my childhood

via https://goo.gl/MpdeZ5

In the spirit of the season and at the end of the Summer of Data Science 2017 (#SoDS17), I thought I’d give you a rundown of the resources I’ve used to teach myself Python. I didn’t get very far with the deep learning goals I outlined at the beginning of the summer (blog post here), as I realized that I needed to focus more on the basics of Python before I tried anything more advanced (blog post here). All in all, though I didn’t progress as far as I had wished, I do feel like I have a better foundation that will benefit me further down the road. If you’re also interested in learning Python, read on!

First off, why should you learn Python?

Computers are a huge part of our lives and as much as I would like to keep believing they’re controlled by gnomes, it’s a good idea to have some concept of how programming and coding work, even if computers are still a mystery.

Also, if you’re in any flavor of scientific field, programming is super useful. I’m sure those of you who have done any degree of quantitative work have dabbled in some statistical software (Excel, SPSS, JMP, Minitab, MATLAB, etc.) and have probably been exposed to R. R is great. With R and other programming languages, running a statistical test doesn’t mean filling in boxes that then disappear into a black box. You must, to some degree, understand the test if you want to make sense of the output. This requires a bit more work at the beginning, but it is incredibly rewarding for conducting better science. Perhaps this is a story for another time, but I thought I wasn’t great at math and have ended up carving out a niche for myself in computational ecology, which is heavy on the math. It’s really not that scary, and languages like R and Python can actually help you understand statistics and modelling as you write your code.
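To make that point concrete, here is a minimal sketch of Welch’s two-sample t statistic written out by hand in Python, using only the standard library. The function name `welch_t` and the fish-length numbers are made up for illustration. A real analysis would normally call a tested library routine (for example, SciPy’s `scipy.stats.ttest_ind` with `equal_var=False`) rather than this hand-rolled version, which also stops short of computing a p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (variances not assumed equal)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Standard error of the difference between the two means
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / se


# Hypothetical fish lengths (cm) from two sites: made-up numbers for illustration
site_a = [31.0, 29.5, 33.2, 30.8, 32.1]
site_b = [27.4, 28.9, 26.5, 29.0, 27.8]
print(round(welch_t(site_a, site_b), 2))
```

Writing out each piece (the means, the sample variances, the standard error) is exactly the kind of exercise that makes the output of a statistics package feel less like a black box.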

Additionally, R is open source, meaning it’s free, and all sorts of smart people have developed extensions (called packages in R) that might do exactly the statistical test or model you need. There’s also tons of support online, and often the answer to your question is a Google search away.

Python is similar to R in that it’s open source and widely used, but there is one major difference. While R is used almost exclusively for data science, Python is a general-purpose programming language, meaning it is used for a huge number of applications, from building software and websites to conducting statistics. Python is still very useful for data science, however, since it has very robust libraries for managing data, running statistical tests and models, and building graphs.

Also, it’s named after Monty Python and while that has no practical bearing on how it works, it’s a fun fact.

So, how do you learn Python?

That depends mostly on how you learn. Also, one caveat: since I learned Python for data science, that’s the type of resource that I am familiar with.

There are great books, explanatory videos, and online material that allow you to dive deep into the theory and practice of Python before getting your hands dirty, if that’s your preferred learning style. It’s not mine, so the best I can do is turn you towards DataSciGuide.com, which has a list of Python books, podcasts, and other resources. Full disclosure, I help maintain DataSciGuide, so if you have any recommendations, please let me know.

I prefer interactive tutorials, where you’re shown a short video or read a short description and then fill in code for yourself. I’ve used two sites for introductory Python tutorials: DataQuest.io and DataCamp. Both sites have a variety of interactive courses aimed at absolute beginners who want to learn Python, R, and other technologies specifically for data science. The main difference between the two is that DataQuest’s tutorials consist of a block of text that you read before completing an exercise, while DataCamp uses short video lectures to introduce its exercises. Both sites are great for learning Python, and which to choose will depend mostly on your preferred learning style. They’re both free for the introductory classes, so you can check out both before committing.

After the intro level, you’ll have to start paying for classes. DataCamp has a larger variety of courses and I’ve had good luck with the ones I’ve taken so far. I followed up their intro course with the intermediate Python and Python toolbox courses, then data visualization in Python. I’ve also started their Deep Learning course, which is pretty different in terms of material, but I like it so far!

I hope you’ve found these resources helpful, and best of luck on your programming or data science journey. Soon I’ll be posting a rundown of the software and resources I use day to day in my own research. In the meantime, I’ll be here, moving along slowly :þ

Best,

-Sophia