Any way you measure it, Python is booming in popularity. According to the most recent survey by Stack Overflow, Python is the “most wanted” programming language for the second year running. More people wish they could work with Python, in other words, than any other language. Usage numbers show the same kind of growth. Python has recently climbed into the top three on the TIOBE index, which tracks the number of engineers, courses, and vendors associated with each programming language. As for the future, this trend is likely to only strengthen, since Python has recently become the most common introductory programming language at top universities as well.
The Rise of Data
So what is driving all of this demand for Python? The single most significant factor is the rising importance of data. In recent years, data has become central to all kinds of businesses and professions. Today executives, marketers and UX designers are all expected to be “data-driven” in their work. Companies like Facebook and Google have proven that an advantage in data can be massively profitable, and a plethora of startups based on AI and machine learning are seeking to follow suit. Wall Street itself is increasingly run by machine learning algorithms. And, from all of this demand, a vast array of tools for manipulating data has emerged. However, among all these tools Python is unique. Python alone combines:
- modern software infrastructure
- a broad open-source community
- short, readable code
To make clear why these particular benefits are paramount, let’s take a tour. We’ll look at a few types of data-centric work, and why people are choosing Python over the previous state of the art.
Spreadsheets and Their Limits
The obvious place to start is, of course, with spreadsheets. Spreadsheets have been a massive success over the last four decades, helping a wide range of industries make better decisions with data. However, as the complexity of a spreadsheet grows, there are painful limitations to the tool.
One unfortunate feature of spreadsheets is that data and code are mixed in the graphical user interface. That is, cells store data as well as formulas applied to that data. While this makes spreadsheets easier to learn, it makes debugging them much more difficult. Formulas can link a given cell to cells from different spreadsheets, which in turn might be linked to cells from even more spreadsheets. This chain of dependencies can quickly become unmanageable and unpredictable, even for the smartest of users. As an example, one mistake in an Excel formula ended up completely undercutting a Harvard study that supported budget cuts following the 2008 financial crisis. Unfortunately, by the time the error was discovered, the study’s policy recommendations had already been adopted by several countries across the world. When you rely on data for important decisions, the limitations of your tools can have a big impact. Finding bugs in a complicated spreadsheet can be like looking for a needle in a haystack.
This issue is exacerbated by a lack of tools for managing collaboration and versioning. When there are multiple contributors and versions of a spreadsheet floating around, the likelihood of introducing bugs is multiplied. Microsoft Excel and Google Sheets do not have good systems for tracking different versions of one file and merging them back together, so users end up relying on half-baked strategies like holding down the “undo” hotkey or creating similarly named backup files. Strategies like these are very error-prone, and a single mistaken keystroke can result in lost work.
Modern programming languages like Python have proper infrastructure for managing problems like this. A “version control system” (or VCS) is a tool that allows you to create checkpoints at any moment in the evolution of your code. No matter what changes happen after that point, you can always safely revert to a checkpoint. Multiple contributors can start out from the same checkpoint, work separately, and then merge their changes back together. If there are any conflicting changes, the VCS will direct you to the exact lines where you need to decide which change to keep. For Python (and other modern programming languages), using a VCS is standard procedure and makes it very difficult to lose work permanently.
In general, Python has features that make it easier to use for complex projects. First, data and code are kept separate from the beginning. Python code can pull in tabular data and manipulate it, but the two are not stored in the same file. Second, a standard Python installation comes with support for “unit testing,” a method that allows users to write automated tests for each separate part of their code. Whenever you make changes, these unit tests can be run to make sure you have not introduced any major bugs. When a test fails, it points you to the part of the code that broke. Features like this make working in Python feel fundamentally different from working in a spreadsheet.
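The two ideas above can be sketched in a few lines using only the standard library. This is a minimal illustration, not a recommended project layout: the function name `total_sales_by_region`, the column names, and the sample figures are all made up for the example, and the data is held in a string only so the sketch is self-contained (in practice it would sit in its own CSV file).

```python
import csv
import io
import unittest

# Hypothetical sample data. In a real project this would live in a separate
# CSV file, apart from the code that analyzes it.
SAMPLE_CSV = """region,sales
North,120
South,80
North,30
"""


def total_sales_by_region(csv_file):
    """Sum the 'sales' column for each distinct value in the 'region' column."""
    totals = {}
    for row in csv.DictReader(csv_file):
        region = row["region"]
        totals[region] = totals.get(region, 0) + int(row["sales"])
    return totals


class TotalSalesTest(unittest.TestCase):
    """An automated check for one small piece of the analysis."""

    def test_sums_each_region(self):
        result = total_sales_by_region(io.StringIO(SAMPLE_CSV))
        self.assertEqual(result, {"North": 150, "South": 80})
```

Saved in a file, the test can be run with `python -m unittest`. If a later change breaks the summing logic, the failure report names `test_sums_each_region`, which narrows the search to that one function.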
Now let’s move on from spreadsheets to another category: proprietary tools for data. Proprietary languages like MATLAB, SPSS, SAS, and Stata have traditionally been used in engineering and social science because of the need for specialized statistical and mathematical features. Tools like these flourished in previous decades when collaborating on code was much more difficult. Under those conditions, the best way to create quality software was to have a team of paid engineers all working under one roof.
However, in the age of the internet, Python benefits much more from having an open-source ecosystem. Unlike with proprietary tools, users of Python can contribute back any enhancements they come up with. Through this process of sharing code, Python has been able to match and surpass many of the shiny features that were previously only available through expensive proprietary tools. As an example, Python’s largest graphing library is Matplotlib. It was initially intended to replicate graphing capabilities from MATLAB (as the name suggests), but today it has far surpassed anything MATLAB can do.
Similarly, one of the best features of Mathematica was being able to run an analysis in a notebook format, where code could be interwoven with tables and graphs of the underlying data. Now Python users have an even better version of the notebook with Project Jupyter. Thanks to the internet and open-source software, Python has sped past these specialized proprietary tools.
Open Source Alternatives
But what about other open-source languages? For those languages that are newer or more specialized, like Julia or R, their benefits seem to be outweighed by Python’s massive open source ecosystem. People use Python for a wide variety of purposes—web development, data science, automation, and system administration to name a few. In contrast, Julia and R are more narrowly focused and so naturally have fewer contributors.
Working with a widely used language means that you are less likely to get stuck with weird bugs, since you can be sure that someone else has been down the path you are on. With more obscure languages you might be stuck with a bug nobody has seen before. The other benefit of scale is that you get access to cutting-edge tools earlier. In case you’ve heard the hype around Apache Spark or Kafka, these are examples of tools that were integrated with Python years before they were integrated with languages the size of R or Julia. The advantage of Python’s scale is, broadly speaking, that others will have done much of your work for you.
Finally, compared with the other large, general-purpose languages like Java or C#, Python is much more pleasant to work with. Python’s short, readable syntax might seem like a superficial feature, but at its heart it reflects a decision to value the productivity of the programmer. This value proposition might seem obvious, but it has not been a focus for programming languages until recently. Making code that is easy for humans to read comes at the cost of being slower for computers to run. Older programming languages, designed when computer time was much more expensive and scarce, took the opposite side of the trade-off and forced humans to write more machine-friendly code. As computers become exponentially more powerful and cheap, the foresight of Python’s focus on developer productivity becomes ever clearer.
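As a rough illustration of what “short, readable” means in practice, here is a small sketch that finds the most frequent words in a piece of text. The function name `most_common_words` and the sample sentence are invented for this example; `Counter` comes from Python’s standard library.

```python
from collections import Counter


def most_common_words(text, n=3):
    """Return the n most frequent words in text, ignoring case."""
    words = text.lower().split()
    return Counter(words).most_common(n)


print(most_common_words("the cat and the hat and the bat", n=2))
# prints [('the', 3), ('and', 2)]
```

The body of the function reads almost like the sentence describing it: lowercase the text, split it into words, count them, and take the top few.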
In the broad landscape of data tools, Python occupies quite a sweet spot. Although not as easy to pick up as a spreadsheet, it has concepts and infrastructure that make it far less limiting and frustrating in the long run. Compared with proprietary or narrowly focused languages, it is improving far faster due to a vibrant open-source community. It is also a joy to work with, having been designed with the developer in mind. This unique combination of benefits has been pushing Python up the charts in popularity, and makes it a great skill to invest in.