Archive | September 2013

Mapping Data Made Easy

By the end of the Fusion Tables tutorial, you can create a useful population map of the Bay Area.

Wow. I’m loving Google more and more every day. It seems like they have a free service for just about anything you can imagine. Although I’d heard about Google Fusion Tables before, I hadn’t really looked into it – until today.

First, I mapped a couple of addresses and did simple things like changing the color of the markers. Then, I went through this handy tutorial, which showed how to merge tables with publicly available data to create more powerful maps, with polygons colored according to selected variables.

By the end of the day, I had created a map of all the school districts in California. Next step: add metadata like district API (Academic Performance Index) scores.
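For anyone curious what that merge step amounts to, here is a minimal Python sketch. It is not how Fusion Tables works internally, just my own illustration of joining a boundary table to a data table on a shared column and then picking a color per polygon. The district names, scores, color cut-offs, and the helper names `merge_tables` and `color_for` are all made up for the example.

```python
# Hypothetical sample data: names, scores, and geometry placeholders are invented.
boundaries = [
    {"district": "Alpha Unified", "geometry": "<polygon A>"},
    {"district": "Beta Unified", "geometry": "<polygon B>"},
    {"district": "Gamma Unified", "geometry": "<polygon C>"},
]
scores = [
    {"district": "Alpha Unified", "score": 850},
    {"district": "Beta Unified", "score": 720},
]

# Assumed cut-offs: (upper bound, color), checked in order.
BUCKETS = [(750, "red"), (800, "yellow"), (float("inf"), "green")]


def merge_tables(left, right, key):
    """Left-join two lists of row dicts on a shared key column."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        # Keep every row from the left table; left values win on conflicts.
        merged.append({**match, **row})
    return merged


def color_for(value, buckets):
    """Return the color of the first bucket whose upper bound exceeds value."""
    if value is None:
        return "gray"  # no data available for this polygon
    for upper, color in buckets:
        if value < upper:
            return color
    return buckets[-1][1]


rows = merge_tables(boundaries, scores, "district")
for row in rows:
    row["color"] = color_for(row.get("score"), BUCKETS)
    print(row["district"], row.get("score"), row["color"])
```

Districts with no matching row in the data table stay on the map but fall back to gray, which is roughly what you would want when the public data set is incomplete.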

Data Is the New Punk

“There are a few steps beyond just learning three chords when doing data journalism.”

I knew I was onto something when I got into data. As this article by Simon Rogers points out, data journalism is the new punk. Anyone can learn it, thanks to free tutorials, references, and tools online, and create compelling data visualizations right in their bedroom. As Dan Sinker, a former editor for Punk Planet, says in the article:

“Where I think there are more parallels are in the fact that this is a young community (in years if not always age), and one that’s actively teaching itself new tricks every day. That same vitality and excitement that motivated punk, it’s motivating news hackers right now.”

I Am a Miracle

Most infographics smell too much of marketing and are so unimaginative with their visualizations that I’ve grown to ignore them. But this one called What Are the Odds? on Visual.ly was intriguing enough to get me to read the fine print to figure it out, and when I did, it was fascinating: the odds of my, or anyone’s, existence are near zero. Each of us is a miracle.

My Own Private Tower of Babel

Just as with the tower of Babel, there are many programming languages out there.

Trying to learn multiple programming languages is a pain in the head. Although many of the concepts and structures are similar, the syntax can be very different, as far as I can tell from my limited experience and knowledge. Currently, I’m tackling C++, R, and Stata, in addition to machine learning via the Weka application.

Despite the headache and nausea induced, I’m looking forward to the day when I master all these tools and am making a good living doing something gratifying.

Becoming a Data Scientist

Weka data mining program

It’s been a busy time since my last post: switching jobs, taking a full load of classes, and spending the remaining time with my family. Now that things are settling down into a predictable routine with some time to write, I want to give an update on my path to becoming a data scientist.

I’m kicking off the fall session at the community college, where I’m pursuing a certificate in computer programming, with a class in C++. So far it has been rather easy, even though much of the curriculum is based on lecture notes written by the instructor, who is friendly and funny but not an expert in English. So, we’ll see how that goes.

The latest class I’ve taken in UCSD’s Data Mining certificate program, Data Preparation, was just about as bad as the Data Mining I class. The video lectures were not very instructive: the teacher mostly read the bullet points off the slides, constantly adding her seemingly favorite expression, “and so on and so forth,” without really explaining anything. On occasion, she can even be heard sighing deeply, as if she were annoyed to be teaching the class. Never mind that all the lessons were pre-recorded and that she never participated in the class’s online discussion board.

Thankfully, it seems that the data gods have taken notice of the suffering of newbie data geeks and have made available a handful of awesome and FREE online learning courses devoted to data mining.

  • My latest find is the perfect antidote for the overpriced crap that is UCSD: Data Mining with Weka, by the creator of Weka itself, Prof. Ian Witten. The video lectures are thoughtful, thankfully brief, and incorporate relevant hands-on exercises – all of which have been woefully absent in UCSD’s classes.
  • Not as comprehensive, but a fantastic and accessible introduction to R is Google’s very own Intro to R video series.
  • Another one that I haven’t started yet but that looks very promising is Caltech’s Learning from Data course, which teaches the basics of machine learning through the open-source tool Octave, which is like a free version of the popular MATLAB application.
  • Finally, another great resource I’ve stumbled across is the UCLA Institute for Digital Research and Education’s online library of tutorials and references for a variety of statistical tools, including R, SAS, SPSS, and Stata.