Understand the fundamentals of linear regression and how it's applied to continuous classification, where the goal is to predict a numeric value. Learn to distinguish between discrete and continuous classification scenarios.
Key Insights
- Linear regression is a method used for continuous classification scenarios, where data can take on numerous possible numeric values, such as prices or weights, rather than discrete categories like animal types or car cylinder counts.
- A linear regression model works by fitting a line (or, in polynomial regression, a curve described by multiple slopes) to find an equation that best predicts numeric outcomes from the given data points.
- Creating a linear regression model involves minimal coding effort using libraries such as scikit-learn, yet understanding the underlying concepts requires thorough learning and comprehension.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
We're going to use a linear regression model, and I want to introduce exactly why we're doing that. We are classifying the data, giving a label or a value to some data. But there are two different kinds of classification, discrete and continuous classes.
Now, discrete is where there's a small number of possible classes. Is this a dog or a cat? Is this a mammal, reptile, fish, bird, or amphibian? Should this car have four, six, or eight cylinders? Was this car sold or unsold? Only a couple of possible options. On the other hand, continuous classification means there are a lot of possible answers, and that's what we're looking at here.
Things like prices, or distances, or weights. These are numeric values that could be anything. It could cost $39,000.
It could cost $39,001. It could cost $39,002. And then we could dive into cents, right? It gets as granular as you want, and there isn't a finite number of possible values.
So, unlike, you know, hey, is this a dog? Is this a cat? We are asking it to predict any possible value. So that makes it a continuous classification, not a discrete one. And that's what linear regression is for.
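To make the discrete versus continuous distinction concrete, here is a minimal sketch in plain Python. All of the data is invented for illustration and the variable names are hypothetical. It only shows that a discrete target draws from a small fixed set, while a continuous target can take nearly any value.

```python
# Hypothetical example data, invented for illustration only.
animal_labels = ["dog", "cat", "cat", "dog", "cat"]    # discrete target
cylinder_counts = [4, 6, 8, 4, 6]                      # discrete, even though numeric
car_prices = [39000.00, 39001.00, 39002.50, 41250.99, 18775.10]  # continuous target

# A discrete target draws from a small, fixed set of classes.
print(sorted(set(animal_labels)))
print(sorted(set(cylinder_counts)))

# A continuous target can land anywhere in a range, so values rarely repeat.
print(len(set(car_prices)), "distinct prices out of", len(car_prices))
```

Note that cylinder counts are numbers but still discrete: what matters is how many possible answers there are, not whether the answer is numeric.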
Linear regression, again, is going to plot a line, just like we did with attendance and concessions, and try to make predictions based on that. It's going to find the slope of that line, or possibly more than one slope, as we did with the curvy line in the polynomial regression. In other words, it's going to find an equation that best predicts this value. That's what a linear regression is.
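As a rough sketch of what "finding an equation that best predicts this value" can mean, the code below fits a straight line by ordinary least squares in plain Python. The attendance and concession numbers are made up for illustration, and fit_line and predict are hypothetical helper names, not part of any library or of the course code.

```python
# Hypothetical data: attendance at five events and concession sales in dollars.
attendance = [100, 200, 300, 400, 500]
concessions = [520.0, 980.0, 1510.0, 2040.0, 2450.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(attendance, concessions)


def predict(x):
    """Predict concession sales for a given attendance using the fitted line."""
    return slope * x + intercept


print(f"concessions ~ {slope:.2f} * attendance + {intercept:.2f}")
print(f"predicted sales for 350 attendees: {predict(350):.2f}")
```

The fitted slope and intercept are the equation: plug in any attendance figure and you get a numeric prediction, which is what makes the output continuous rather than one of a few labels.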
It's a line, and that's why we're doing a linear regression. Now, a funny thing with linear regressions is it's really pretty easy to create and then train the model. There's not much code involved.
There's a lot of work. There's a lot of concepts to understand, but it's not a lot of actual code. So, let's start with instantiating a linear regression model, meaning create it.
And the way we do that is we just say model = LinearRegression(), calling it like a function, and that's straight from scikit-learn. If we run that, great, we have a model. Now, we haven't trained it on our data yet.
That'll come next.
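The instantiation step described above might look like the following minimal sketch. It assumes scikit-learn is installed. The two print calls are additions for illustration, showing that the new model has not learned anything yet.

```python
from sklearn.linear_model import LinearRegression

# Instantiate (create) the model. It has not seen any data yet.
model = LinearRegression()

# Until fit() is called, learned attributes such as coef_ do not exist.
print(type(model).__name__)
print(hasattr(model, "coef_"))
```

Creating the model and training it are separate steps: the object exists after the first line, but it only gets a slope and intercept once it is fit to data.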