Visualizing a Normal Distribution with NumPy

Demonstrate generating and graphing a normal distribution using NumPy and matplotlib.

Understand how NumPy's random number generation creates normal distributions and visualize how increasing sample size refines the bell curve. Learn to adjust parameters like mean, standard deviation, and bins to generate clearer statistical insights.

Key Insights

  • NumPy's np.random.normal function generates random numbers that follow a normal distribution, given a mean, a standard deviation, and a sample size. The example here creates a dataset centered on a mean of 100 with a standard deviation of 15.
  • Increasing the sample size from 1,000 to 250,000 significantly smooths out the bell-shaped curve, demonstrating that larger datasets yield clearer and more accurate visual representations of normal distributions.
  • Adjusting histogram bins from 20 to 100 further refines the visualization, highlighting the importance of selecting appropriate granularity to effectively interpret statistical distributions.


If we take a look at a normal distribution now, again, like the bell curve, we can use a NumPy method to get a normal distribution of random numbers. So let's do scores, 100, let's call it that. Actually, let's make 1,000 scores.

And we'll say np.random.normal this time, as opposed to the uniform distribution we made before. We say what the numbers are clustered around, that is, the mean, which here is 100. And we say, give me a standard deviation of 15, please.

I mean, maybe we're not polite, but I like to be polite. And then we say how many of those we want. So it's not a range like before; it's a mean and a standard deviation.
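As a minimal sketch of the call described so far: the mean of 100, the standard deviation of 15, and the count of 1,000 come from the walkthrough, while the variable name scores, the keyword-argument style, and the fixed seed are assumptions added here so that reruns produce the same numbers.

```python
import numpy as np

# Assumption: a fixed seed, added only so the output is repeatable.
np.random.seed(0)

# loc is the mean the values cluster around, scale is the standard
# deviation, and size is how many random numbers to draw.
scores = np.random.normal(loc=100, scale=15, size=1000)

print(scores.shape)  # (1000,)
```

The same call can be written positionally as np.random.normal(100, 15, 1000); the keywords are used here only to make each argument's role visible.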

So about 68% of them will be within 15 of the mean. That is, about 68% of them will be between 85 and 115. Okay, now if we just printed out a sample of it, say the first 20, there they are.

You can see they're all clustered around 100, roughly. But there are some outliers, right? Here's an outlier right there. Not too much of an outlier.


Again, it's about one standard deviation away. And the same is true if we look at the last 20, right? Most of them, about two-thirds, are within one standard deviation. Okay, now let's go and graph this.
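The spot check on the first and last 20 values can be backed up by counting across the whole sample. The sketch below is one way to do that, not necessarily what was typed in the video. The names mean, std, and fraction and the fixed seed are assumptions introduced for illustration.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed for repeatable output

mean, std = 100, 15
scores = np.random.normal(mean, std, 1000)

# Slices give a quick look at the first and last 20 values.
print(scores[:20])
print(scores[-20:])

# A boolean mask marks every score within one standard deviation of the
# mean; the mean of the mask is the fraction of True values.
within_one_sd = np.abs(scores - mean) <= std
fraction = within_one_sd.mean()
print(f"{fraction:.1%} of scores fall between {mean - std} and {mean + std}")
```

With only 1,000 samples the printed fraction will hover near 68% rather than match it exactly, which is the same sampling noise the histogram shows.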

And I think you'll see something interesting starting to emerge. Our X is our 1,000 scores. And our bins, let's do 20 bins.

Oh, and we tell pyplot to show the graph. And here they are. And you can see it's a bell curve.
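The plotting step might look roughly like the sketch below. The 1,000 scores and the 20 bins come from the walkthrough. The fixed seed, the axis labels, and capturing the values that plt.hist returns are assumptions added for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed for a repeatable picture
scores = np.random.normal(100, 15, 1000)

# plt.hist draws the histogram and also returns the per-bin counts,
# the bin edges, and the drawn bar objects.
counts, edges, _ = plt.hist(x=scores, bins=20)

# Axis labels are illustrative additions, not part of the walkthrough.
plt.xlabel("Score")
plt.ylabel("Number of scores")
plt.show()
```

Twenty bins produce 21 edges, and the counts add up to the number of scores, which is a quick way to confirm that nothing was dropped.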

It's a little off because our sample size is so small that it doesn't quite match the ideal curve. It seems like we got nothing under 60, but more values far out on the other side. However, the more samples we draw, and the more granular we are with our bins, the more we're going to see the curve smooth out.

So let's make the same thing, but with 250,000 of them. We'll say, I still want the mean to be 100 and the standard deviation to be 15, but give me 250,000 of them. And then let's make a histogram where X is those scores.

And for our bins, let's keep it at 20. Actually, let's increase the number of bins to get a little more granular. We'll use 100 bins.

Yeah, I think that'll look much better. And what we're getting is a much, much, much more standard bell curve, but maybe a little weirdness still up there. It's a little jagged still, it's not fully smooth, but it's much smoother.
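Here is a sketch of the larger run, assuming the same np.random.normal and plt.hist calls as before with only the sample size and the bin count changed. The variable name many_scores and the fixed seed are hypothetical, added for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed for a repeatable picture

# Same mean and standard deviation as before, but 250,000 samples.
many_scores = np.random.normal(100, 15, 250_000)

# 100 bins give a finer-grained picture than 20.
counts, edges, _ = plt.hist(x=many_scores, bins=100)
plt.show()
```

Changing bins back to 20, or the sample size back to 1,000, and rerunning is a simple way to see how each setting affects the smoothness of the curve.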

And you can see that the greater the sample, the more things even out over time, even with randomness.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
