CS 109b Final Project
During the spring semester of my junior year at Harvard, I took CS 109b: Advanced Topics in Data Science. This course extended the concepts first covered in CS 109a, further exploring statistical modeling and prediction through nonlinear statistical models and deep learning. In the middle of the semester, the scale of the COVID-19 pandemic led Harvard to send us off campus and move classes online. The severity of the pandemic unfortunately presented a direct opportunity to apply the skills we learned in CS 109a and 109b to track and forecast an epidemic outbreak in real time. For our final project, our team of four therefore set out to use machine learning techniques and Internet-based data sources for real-time monitoring and short-term forecasting of population-level disease activity.
The struggling response to the pandemic in the United States shaped our team's project. We aimed to use Google and Apple mobility data as a proxy for the level of social distancing within the U.S. in order to estimate the changing R0 (the expected number of people an infected person infects) in an SIR (susceptible-infected-recovered) model. We then considered three social distancing scenarios: staying in lockdown, slowly lifting lockdown, and immediately lifting lockdown. For each scenario, we used the SIR model with the mobility-based R0 proxy to forecast the corresponding spread of COVID-19 in the United States. We hoped these forecasts would show the importance of staying in lockdown in order to flatten the curve of infection and avoid overwhelming health care facilities.
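As a rough illustration of the idea, and not the project's actual code, the sketch below runs a discrete-time SIR model in which R0 changes each day according to a mobility index. Everything specific here is an assumption made for the example: the linear mapping in the hypothetical `r0_from_mobility` helper, the baseline and floor R0 values, the recovery rate `gamma`, the mobility level for each scenario, and the population and initial case counts.

```python
import numpy as np


def r0_from_mobility(mobility, r0_baseline=2.5, r0_floor=0.8):
    """Map a mobility index in [0, 1] (1 = pre-pandemic activity) onto R0.

    The linear form and both R0 endpoints are illustrative assumptions.
    """
    mobility = np.clip(np.asarray(mobility, dtype=float), 0.0, 1.0)
    return r0_floor + (r0_baseline - r0_floor) * mobility


def simulate_sir(r0_series, population, initial_infected, gamma=0.1):
    """Forward-Euler SIR with a one-day step and a time-varying R0.

    gamma is the assumed daily recovery rate; the transmission rate on
    each day is beta_t = R0_t * gamma.
    Returns an array of shape (days, 3) with columns S, I, R.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = []
    for r0 in r0_series:
        beta = r0 * gamma
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return np.array(history)


days = 180
# Hypothetical mobility paths for the three scenarios.
scenarios = {
    "stay in lockdown": np.full(days, 0.3),
    "lift slowly": np.linspace(0.3, 1.0, days),
    "lift immediately": np.full(days, 1.0),
}

peaks = {}
for name, mobility in scenarios.items():
    trajectory = simulate_sir(
        r0_from_mobility(mobility),
        population=330e6,       # assumed round figure for the U.S.
        initial_infected=1e6,   # assumed starting point
    )
    peaks[name] = trajectory[:, 1].max()
    print(f"{name}: peak infected ~ {peaks[name]:,.0f}")
```

Comparing the peak of the infected column across scenarios is one simple way to express "flattening the curve": a lower peak means fewer people are sick at the same time. In a real analysis, the mapping from mobility to R0 would be fit to observed case data rather than fixed by hand.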