Linear Regression for R_t (difference) for Indian States
- Load india_state_data worksheet
- MinMax Normalization of all columns
- Add Interaction Term
- Misc Analysis
We run a linear regression on the difference in R_t for 15 Indian states between April 2 and May 9. The endogenous (dependent) variable is the difference in R_t. The exogenous (explanatory) variables are the difference in Covid-19 testing rates, the state's Human Development Index, the state's Governance Index, and the state's per capita health care expenditure.
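As a reading aid, the regression described above can be written out as an equation. This is a sketch, not the author's stated formula: the symbols $\beta_k$ and $\varepsilon_i$ are notation assumed here, and the variable names follow the column names used in the code (`rt_diff`, `test_diff`, `hdi`, `governance`, `pche`). No intercept term appears because `sm.OLS` does not add one unless a constant column is supplied, for example via `sm.add_constant`.

```latex
\mathrm{rt\_diff}_i
  = \beta_1\,\mathrm{test\_diff}_i
  + \beta_2\,\mathrm{hdi}_i
  + \beta_3\,\mathrm{governance}_i
  + \beta_4\,\mathrm{pche}_i
  + \varepsilon_i,
\qquad i = 1, \dots, 15
```

Here $i$ indexes the states, and every variable is min-max normalized before fitting.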
#!pip install statsmodels
import pandas as pd
import statsmodels.api as sm
ind_data = pd.read_csv('india_state_data.csv')
ind_data = ind_data.rename(columns={'diff': 'rt_diff', 'testing2ndapr':'test1', 'test 9may': 'test2'})
ind_data['test_diff'] = ind_data['test2'] - ind_data['test1']
ind_data.head()
ind_data2 = ind_data[['rt_diff', 'test_diff', 'hdi', 'governance', 'pche']].copy()
ind_data2.columns
ind_data2.head()
for c in ind_data2.columns.values:
    mmin = ind_data2[c].min()
    mmax = ind_data2[c].max()
    ind_data2[c] = (ind_data2[c] - mmin) / (mmax - mmin)
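To make the normalization step concrete, here is a minimal pure-Python sketch of the same min-max formula applied to one column. The helper name `min_max_scale` and the toy values are hypothetical and not taken from `india_state_data.csv`. The sketch also guards against a constant column, where `mmax - mmin` is zero and the pandas loop would produce NaN values.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the range is zero, so return zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy column, for illustration only.
toy_column = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(toy_column)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because every column is mapped onto [0, 1], each fitted coefficient describes the effect of moving a variable across its full observed range rather than by one original unit.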
y = ind_data2.rt_diff.values
X = ind_data2[['test_diff', 'hdi', 'governance', 'pche']]
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
ind_data2['xterm'] = ind_data2['test_diff']*ind_data2['hdi']
X = ind_data2[['test_diff', 'hdi', 'governance', 'pche', 'xterm']]
model2 = sm.OLS(y, X)
results2 = model2.fit()
print(results2.summary())
sm.stats.linear_rainbow(results)
sm.graphics.plot_partregress('rt_diff', 'test_diff', ['hdi', 'governance', 'pche'], data=ind_data2, obs_labels=False)
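# Refit with regularization. X still includes the interaction term 'xterm' from above.
# L1_wt=0.0 selects a pure ridge (L2) penalty; alpha is left at its default value.
model = sm.OLS(y, X)
results = model.fit_regularized(L1_wt=0.0)
print(results.params)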
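For reference, the statsmodels documentation for `OLS.fit_regularized` gives the elastic-net objective below, where $n$ is the number of observations, $\alpha$ is the `alpha` argument and $w$ is `L1_wt`. The notation is adapted here and should be checked against the installed statsmodels version.

```latex
\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \alpha \left( \frac{1 - w}{2}\,\lVert \beta \rVert_2^2
  + w\,\lVert \beta \rVert_1 \right)
```

With $w = 0$ only the squared (ridge) penalty remains. Since `alpha` defaults to `0.0` and is not set in the call above, the penalty term vanishes and the fit should coincide with the unpenalized least-squares solution; passing a positive `alpha` would be needed to apply actual shrinkage.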