Understand the fundamentals of linear regression and how it's applied to predicting continuous numeric values. Gain clarity on the difference between discrete and continuous classification scenarios.
Key Insights
- Linear regression is used for what this lesson calls continuous classification (more commonly called regression): predicting a value that can take on many possible numeric values, such as a price or a weight, rather than choosing among discrete categories like animal types or car cylinder counts.
- A linear regression model fits a line, or a curve with multiple coefficients in the polynomial regression case, to find the equation that best predicts a numeric outcome from the given data points.
- Creating a linear regression model with a library such as scikit-learn takes very little code, but understanding the underlying concepts takes real study.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
We're going to use a linear regression model, and I want to introduce exactly why we're doing that. We are assigning a label or value to data. But there are two different kinds of classification, discrete classes and continuous classes.
Now, discrete classes are where there's a small number of possible classes. Think of questions like: Is this a dog or a cat? Is this a mammal, reptile, fish, bird, or amphibian? Should this car have four, six, or eight cylinders? Was this car sold or unsold? There are only a handful of possible options. On the other hand, continuous classification means there are a lot of possible answers, and that's what we're looking at here.
Things like prices, or distances, or weights. These are numeric values that could be anything. It could cost $39,000.
It could cost $39,001. It could cost $39,002. And then we could dive into cents, right? It gets as granular as you want, and there isn't a finite number of possible values.
So, unlike asking, hey, is this a dog or is this a cat, we are asking the model to predict any possible value. So that makes it a continuous classification, not a discrete classification. And that's what linear regression is for.
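To make the distinction concrete, here is a minimal sketch. The lists are hypothetical values invented for illustration (they are not course data), and count_distinct is just a helper name chosen for this example.

```python
# Hypothetical targets, invented purely for illustration.
animal_labels = ["dog", "cat", "dog", "cat", "dog"]   # discrete: two possible classes
cylinder_labels = [4, 6, 8, 4, 6]                     # discrete, even though the labels are numbers
car_prices = [39000.00, 39001.00, 39002.50, 41250.75, 28999.99]  # continuous: could be any amount


def count_distinct(values):
    """Return how many different values appear in the list."""
    return len(set(values))


print(count_distinct(animal_labels))    # a small, fixed set of classes
print(count_distinct(cylinder_labels))  # still a small, fixed set
print(count_distinct(car_prices))       # tends to grow with every new row
```

Note that cylinder counts are numeric but still discrete: only a few values are possible. Prices are the case where a regression model makes sense.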
Linear regression, again, is going to plot a line, just like we did with attendance and concessions, and try to make predictions based on that. Okay, and we're going to eventually find a slope, or possibly more than one slope, like we did with polynomial regression. It's going to find an equation that best predicts this value. That's what a linear regression is.
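As a rough sketch of what "finding the equation of the line" means, the snippet below fits a slope and intercept by ordinary least squares in plain Python. The attendance and concessions numbers are hypothetical, made up so the arithmetic is easy to follow, and fit_line is a helper name chosen for this example rather than anything from the course.

```python
# Hypothetical data: these numbers are invented for illustration only.
attendance = [100, 200, 300, 400]    # input (x)
concessions = [250, 450, 650, 850]   # value we want to predict (y)


def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(attendance, concessions)
predicted = slope * 500 + intercept  # prediction for an attendance of 500
print(slope, intercept, predicted)   # 2.0 50.0 1050.0
```

With a single input there is one slope. Polynomial regression adds more coefficients to the same kind of equation.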
It's a line, and that's why we're doing a linear regression. Now, a funny thing with linear regression is that it's really easy to create and then train the model. There's not much code involved.
There's a lot of work, and there are a lot of concepts to understand, but it's not a lot of actual code. So, let's start with instantiating a linear regression model, meaning creating it.
And the way we do that is we just say model = LinearRegression(), calling the LinearRegression class, and that's straight from scikit-learn. If we run that, great, we have a model. Now, we haven't trained it on our data yet.
That'll come next.
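The instantiation step described above looks roughly like this. It is a minimal sketch, assuming scikit-learn is installed; the is_trained variable is added here only to show that the model has not learned anything yet.

```python
from sklearn.linear_model import LinearRegression

# Create (instantiate) the model. No data is involved at this point.
model = LinearRegression()

# A fitted model stores its learned slope(s) in coef_.
# Before training, that attribute does not exist yet.
is_trained = hasattr(model, "coef_")

print(model)
print(is_trained)  # False
```

Training happens in a separate step, by calling the model's fit method on the data.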