Monday, April 16, 2012

Studying data science is a lot like being a data scientist

Note: Last year I decided to commit to data science as a career. I did this analysis to brush up on some skills, and to show potential employers how I solve problems. It worked like a charm, and I recommend it to new data science people. - K2 - 2013-04-01

Data science is like anything else: the best way to learn to do it is to do it. This is challenging if you’re winging it, because there isn’t a clear path laid out for newbies. There are lots of free / low-cost resources out there, but most of them assume some previous knowledge from the other resources. It’s unclear what comes first, which data philosophy an author / instructor is operating from (there are several), or which techniques are most practical in the real world. Thus, learning data science is a lot like doing data science: you start with some half-formed questions, search and slice until you have some half-formed answers, organize them somehow, refine your questions and start again. Making it work takes curiosity, and a knack for sorting through giant piles of unsorted information and turning them into categories. The good news is: you’re probably already good at that, which is why you’re interested in data in the first place.

The other good news is that I’m going to lay out some of those steps & terms for you here. Personally, it drives me nuts when things are made to seem harder or more forbidding than they have to be. While data science isn’t for everybody, there are way more people out there who would be great at it than there are people who know they would. The industry is going to need all of us: the ones who know they can do it and the ones who don’t. So, I figure, let’s lower the barrier to entry. If each newbie works to make it easier on the next newbie, before we know it there’ll be an army of us well poised to ask and answer fascinating new questions about human behavior.

The flip side is that I’m just getting oriented myself. Collaboration is the steam that makes data science go: if you want to add a resource, a step in the process, or some advice to this ground-up tutorial series, let me know.

Next: we start by doing. I’ll set you up with a couple of user-friendly tools that let you circumvent some (though not all) of the initial technical hurdles, so you can get directly into the fun part: data analysis.
