As part of the 2023 Data Science Conference (DSCO 23), AWS partnered with the Data Institute at the University of San Francisco (USF) to conduct a datathon. Participants, both high school and undergraduate students, competed on a data science project that focused on air quality and sustainability. The Data Institute at the USF aims to support cross-disciplinary research and education in the field of data science. The Data Institute and the Data Science Conference provide a distinctive fusion of cutting-edge academic research and the entrepreneurial culture of the technology industry in the San Francisco Bay Area.
The students used Amazon SageMaker Studio Lab, which is a free platform that provides a JupyterLab environment with compute (CPU and GPU) and storage (up to 15GB). Because most of the students were unfamiliar with machine learning (ML), they were given a brief tutorial illustrating how to set up an ML pipeline: how to conduct exploratory data analysis, feature engineering, model building, and model evaluation, and how to set up inference and monitoring. The tutorial referenced Amazon Sustainability Data Initiative (ASDI) datasets from the National Oceanic and Atmospheric Administration (NOAA) and OpenAQ to build an ML model to predict air quality levels using weather data via a binary classification AutoGluon model. Next, the students were turned loose to work on their own projects in their teams. The winning teams were led by Peter Ma, Ben Welner, and Ei Coltin, who were all awarded prizes at the opening ceremony of the Data Science Conference at USF.
Response from the event
“This was a fun event, and a great way to work with others. I learned some Python coding in class but this helped make it real. During the datathon, my team member and I conducted research on different ML models (LightGBM, logistic regression, SVM models, Random Forest Classifier, etc.) and their performance on an AQI dataset from NOAA aimed at detecting the toxicity of the atmosphere under specific weather conditions. We built a gradient boosting classifier to predict air quality from weather statistics.”
– Anay Pant, a junior at the Athenian School, Danville, California, and one of the winners of the datathon.
“AI is becoming increasingly important in the workplace, and 82% of companies need employees with machine learning skills. It’s critical that we develop the talent needed to build products and experiences that we will all benefit from, this includes software engineering, data science, domain knowledge, and more. We were thrilled to help the next generation of builders explore machine learning and experiment with its capabilities. Our hope is that they take this forward and expand their ML knowledge. I personally hope to one day use an app built by one of the students at this datathon!”
– Sherry Marcus, Director of AWS ML Solutions Lab.
“This is the first year we used SageMaker Studio Lab. We were pleased by how quickly high school/undergraduate students and our graduate student mentors could start their projects and collaborate using SageMaker Studio.”
– Diane Woodbridge from the Data Institute of the University of San Francisco.
Get started with Studio Lab
If you missed this datathon, you can still register for your own Studio Lab account and work on your own project. If you’re interested in running your own hackathon, reach out to your AWS representative for a Studio Lab referral code, which will give your participants immediate access to the service. Finally, you can look for next year’s challenge at the USF Data Institute.
About the Authors
Neha Narwal is a Machine Learning Engineer at AWS Bedrock where she contributes to development of large language models for generative AI applications. Her focus lies at the intersection of science and engineering to influence research in Natural Language Processing domain.
Vidya Sagar Ravipati is a Applied Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.