The internet is full of noise about what counts as “real” data science. This genre is generally a waste of time, but for beginners it can be particularly pernicious. If you’re already feeling like a fish out of water, the last thing you need to hear is that you’re a “fake” data scientist. Nothing more demoralizing!
The biggest challenge facing a beginner isn’t actually technical at all -- it is the difficulty of staying motivated. Learning only happens when you dive in and get your hands dirty, and the problem with these snobby articles is that they keep you from doing just that. They plant seeds of doubt. Maybe you are wasting your time by learning this tool. Maybe you shouldn’t even bother with a project like this. Doubts like these just encumber you while you strive to practice and improve.
So let’s dispel two of the biggest myths that hold people back from learning data science.
#1: “You have to learn all the math first”
In traditional academics, knowledge is built “bottom-up.” First, you lay the foundation by learning the history and theory. Then you progress, layer-by-layer, from the fundamentals up to more advanced concepts. Finally, you reach the most modern and useful material.
How frustrating! It’s a long wait before you can accomplish anything practical. Some will argue that this is the proper way to learn data science, but it’s the opposite of how people normally acquire skills in tech. Does a twelve-year-old computer whiz learn about CPU architecture, and then slowly work their way up to making web pages? No way. They just mess around with HTML and see what happens. At first, they make a mess, but through iteration, they improve their code and their understanding of computer science as well.
Learning in this “top-down” style means that you treat a lot of tools like a black box initially. You don’t know how they work internally, but you know which buttons to press to accomplish practical tasks. And that’s OK! If you’re able to get things done and stay motivated, you will piece together how everything works with time.
You will also get a better sense of day-to-day data science, compared to someone who goes with the “bottom-up” approach. Studying math doesn’t give you any sense of what a practicing data scientist does with their time. In the big picture, this kind of knowledge can be more important than anything technical. Lots of beginners are surprised to learn how much time data scientists spend cleaning up data, for instance. If that isn’t your cup of tea, it’s better to discover that now!
#2: “Data scientists only rely on rigorous statistics”
If you only learn from online data science tutorials, you might get the impression that all data science begins with a sacred data set. You download it, and by applying lots of math and statistics you get to the underlying truth. That is not how it works. In reality, a lot of outside knowledge goes into understanding a data set.
How was the data set created? Your data is only as accurate as the process that created it. Was it created by a machine, or hand entered by an amateur? Is it a random sample, or just collected based on convenience? Questions like these are important and determine which techniques you can safely apply to your data.
The broader theme here is that domain expertise -- hands-on experience with the subject of your data -- is invaluable. Although data is a powerful tool, it doesn’t contain the full story. A common pattern these days is for data scientists to be paired with a domain expert, who can provide reasonable assumptions where the data is lacking. Teams like these are less likely to make embarrassing mistakes because knowing the ground truth helps in deciding which data to trust and in recognizing unrealistic results.
If you are transitioning into data science from another career, the good news is that you may be able to provide that domain expertise in a short bit of time. Check out our Data Science Certificate.