Inspiring Resources for Data Science

I suffered from information paralysis when I started learning about data science. There are a hundred and one different websites, books, and videos to choose from. I didn’t know if DataCamp was better than DataQuest.io or if I should just start reading one of the many O’Reilly books. I spent a fair amount of time on both of those sites and have read a good part of a few O’Reilly books like Think Bayes and Doing Data Science. I’ve read An Introduction to Statistical Learning and plan on reading The Elements of Statistical Learning in a few months. But all of those resources are very technical. As a new data scientist, you will often be encouraged to learn some of the many technical aspects of the field. I, however, am always looking at the big picture. How does this new technique fit into solving problems with data science? This is the question I think is sometimes lost when you spend your time in the recesses of books and journal articles. I have a few places I go when I feel like I am bogged down in the weeds of data science.

The first place I like to go is YouTube. Past the entertaining DriveTribe videos and fail compilations, there is a great channel called PyData. PyData “provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.” The YouTube channel offers a multitude of lectures spanning the full spectrum of data science topics. I will often find myself listening to a lecture and following along with the lecturer’s Jupyter Notebook on my local machine. This is a great way to learn from folks using data science in the field. I recently listened to a great lecture by Patrick Harrison on modern natural language processing techniques. I would never have been able to learn from Harrison if it weren’t for PyData.

Books have always offered me a certain level of inspiration. Nate Silver’s The Signal and the Noise, Seth Stephens-Davidowitz’s Everybody Lies, and Christian Rudder’s Dataclysm are all great examples of books that provide context to data science. Each of these veterans of data science offers a unique perspective and subject matter expertise on a wide range of topics. All three of these books leave the technical details out, which may be unfortunate for the data scientist. Nonetheless, each book provides the reader with a new use case for data science and can give context to your newfound data science skills.

I believe in having both a macro and a micro view of data science. Your current data science project is a micro problem. How it fits into the world is a macro problem. At its core, data science is about solving problems and acting like a detective with data. If you ever find yourself bogged down hunting for the lambda tuning parameter in scikit-learn’s regularization function, remember that data science problems need context. You are solving interesting real-world problems. Always remember that.
