Becoming a Data Scientist


It’s been a busy time since my last post: switching jobs, taking a full load of classes, and spending the remaining time with my family. Now that things are settling into a predictable routine with some time to write, I wanted to give an update on my path to becoming a data scientist.

I’m kicking off the fall session at the community college, where I’m pursuing a certificate in computer programming, with a class in C++. So far it has been rather easy, even though much of the curriculum is based on lecture notes written by the instructor, who is friendly and funny but not an expert in English. So, we’ll see how that goes.

The latest class I’ve taken in UCSD’s Data Mining certificate program, Data Preparation, was just about as bad as the Data Mining I class. The video lectures were not very instructive: the teacher mostly read the bullet points off the slides, constantly adding her seemingly favorite expression, “and so on and so forth”, without really explaining anything. On occasion, she can even be heard sighing deeply, as if she were annoyed to be teaching the class. Never mind the fact that all the lessons were pre-recorded and that she never participated in the class’s online discussion board.

Thankfully, it seems that the data gods have taken notice of the suffering of newbie data geeks and have made available a handful of awesome and FREE online learning courses devoted to data mining.

  • My latest find is the perfect antidote for the overpriced crap that is UCSD: Data Mining with Weka, by the creator of Weka itself, Prof. Ian Witten. The video lectures are thoughtful, thankfully brief, and incorporate relevant hands-on exercises – all of which have been woefully absent in UCSD’s classes.
  • Not as comprehensive, but a fantastic and accessible introduction to R is Google’s very own Intro to R video series.
  • Another one that I haven’t started yet but that looks very promising is Caltech’s Learning from Data course, which teaches the basics of machine learning through the open-source tool Octave, which is like a free version of the popular MATLAB application.
  • Finally, another great resource I’ve stumbled across is the UCLA Institute for Digital Research and Education’s online library of tutorials and references for a variety of statistical tools, including R, SAS, SPSS, and Stata.
