Data Science Capstone Projects

Machine Learning Capstone Project Description:

Choose your own structured dataset (e.g., housing prices, customer churn, or loan default) to build a machine learning pipeline from scratch, including data cleaning, feature engineering, model selection, and performance evaluation. Put together a presentation highlighting your process, tools, and insights. 

Deliverables:

  1. Select and Explore a Structured Dataset
    • Choose a publicly available dataset (e.g., from Kaggle or UCI) relevant to a classification or regression problem; perform initial exploration to understand data structure and context.
  2. Clean and Engineer Features
    • Handle missing values, encode categorical data, and create meaningful new features that may improve model performance.
  3. Train and Evaluate Machine Learning Models
    • Apply at least one appropriate model (e.g., logistic regression, decision tree, random forest), perform data splitting, and evaluate performance using metrics such as accuracy or RMSE.
  4. Visualize Patterns and Results
    • Create clear visualizations (e.g., correlation heatmaps, prediction vs actual plots) that illustrate relationships in the data and support your findings.
  5. Presentation
    • A final presentation explaining your problem statement, approach, tools used (e.g., pandas, scikit-learn, matplotlib), patterns discovered, model results, and key takeaways.

Python for AI Capstone Project Description (Choose One of Two):

Capstone Project I:

Build an AI chat assistant for a live website that helps users answer questions about the product offering and services offered. 

Deliverables:

  1. Embedded AI Chat Assistant on a Web Page
    • Create a chat interface (embedded or pop-up) that users can interact with on a webpage. 
    • Messages should be sent by JavaSCript fetch() to the Flask server
    • Flask should send entire conversation to OpenAI API
    • AI response / answer should be returned by Flask to JS fetch()
    • JS then() then() should parse incoming AI json and render newest chat message to the chat window
  2. Custom Flask Backend with OpenAI Integration
    • Set up a Flask server that handles POST requests from the frontend
    • Sends them to the OpenAI API, and returns appropriate answers based on injected business context.
  3. Business-Aware Prompt Engineering
    • Include company/product/service info in the system prompt so that the AI assistant responds knowledgeably. 
    • Tailor prompt structure to support Q&A, with fallback messaging if a question is out of scope.
  4. Session Memory or Context Handling 
    • Engineer the prompt to include the relevant website information to improve the answers
    • Store chat history temporarily on the client side or backend to maintain conversational context. 
  5. Slide Deck or Demo Walkthrough
    • Prepare a short presentation that explains the assistant’s purpose, target users, tech stack used, prompt structure, limitations, and ideas for expanding it (e.g., connecting to a real knowledge base or support system).

Capstone Project II:

Create a web app that allows users to upload images of personal collections — such as vintage books, vinyl records, rare sneakers, collectible cards, or antiques — and uses AI to identify the item, generate descriptive metadata, and log it in a searchable session history.

Deliverables:

  1. Interactive Web App for Uploading and Identifying Items
    • Users can upload an image of a collectible item; the app previews it and provides an AI-generated identification and description (e.g., title, era, material, condition estimate).
  2. OpenAI Integration with Custom Prompting
    • A backend Flask server routes image data to the OpenAI API with a structured prompt designed for collection analysis and identification.
  3. Dynamic Frontend with Session-Based Logging
    • JavaScript displays AI results and stores a session history of analyzed items, allowing users to review or delete past entries without persistent database storage.
  4. Pattern Recognition and Insights
    • The app identifies patterns across uploaded items (e.g., most common era, condition trends, types of items), which are summarized visually or as bullet-point findings.
  5. Presentation
    • A final presentation or slide deck explaining the problem being solved, how OpenAI and Flask were used, user flow demo, and key findings, plus reflection on future improvements (e.g., adding persistent storage or export feature).

Python Data Visualization Capstone Project Description:

Analyze global CO₂ emissions alongside GDP and population data. You’ll clean, explore, and visualize the data, then build an interactive dashboard in Dash. Your final presentation should highlight key patterns, tools used, and insights discovered.

Deliverables:

  1. Dataset Used: The student will use a publicly available dataset on global CO₂ emissions by country and year, along with GDP and population data for context. (e.g., Our World in Data)
  2. Exploratory Data Analysis (EDA): Clean and preprocess the data to handle missing values and inconsistent formats. Use correlation analysis and group-by techniques to understand trends and relationships between emissions, GDP, and population over time.
  3. Visualizations: Create insightful charts such as time series plots of emissions by continent, correlation heatmaps between GDP and emissions, and a bar chart of top polluting countries. Use matplotlib, seaborn, and plotly for varied visual appeal.
  4. Dashboard Implementation: Build a responsive, interactive dashboard using Dash, where users can filter data by region, select time ranges, and visualize the top emitters or GDP/emission ratios over time.
  5. Findings and Patterns: Present observations such as which countries have decoupled GDP growth from emissions, regional emission trends, or anomalies. Emphasize tools used (Pandas, Plotly, Dash, correlation analysis, apply() and lambda functions, etc.) and how they were used to derive insights.

Yelp Facebook LinkedIn YouTube Twitter Instagram