We run a linear regression of the change in R_t for 15 Indian states between April 2 and May 9. The endogenous variable is the difference in R_t (the April 2 value minus the May 9 value, so a positive value means R_t fell). The exogenous variables are the change in COVID-19 testing over the same period, the state's Human Development Index (HDI), the state's Governance Index, and the state's per capita health care expenditure (PCHE).
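In symbols, the first model fitted in this notebook can be written as follows. The notation is introduced here for illustration only: ΔR_i is the change in R_t for state i, ΔT_i the change in testing, G_i the Governance Index. Because the code calls sm.OLS without sm.add_constant, the model has no intercept, and all variables are min-max scaled before fitting.

```latex
\begin{aligned}
\Delta R_i &= \beta_1\,\Delta T_i + \beta_2\,\mathrm{HDI}_i + \beta_3\,G_i + \beta_4\,\mathrm{PCHE}_i + \varepsilon_i,
  \qquad i = 1, \dots, 15 \\
\Delta R_i &= R_{t,i}(\text{Apr 2}) - R_{t,i}(\text{May 9}),
  \qquad \Delta T_i = T_i(\text{May 9}) - T_i(\text{Apr 2})
\end{aligned}
```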

#!pip install statsmodels
import pandas as pd
import statsmodels.api as sm

Load the india_state_data worksheet (read here from india_state_data.csv)

ind_data = pd.read_csv('india_state_data.csv')
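# Shorter names: rt_diff = change in Rt, test1 / test2 = testing on 2 Apr / 9 May
ind_data = ind_data.rename(columns={'diff': 'rt_diff', 'testing2ndapr': 'test1', 'test 9may': 'test2'})
# Change in testing between the two dates (9 May minus 2 Apr)
ind_data['test_diff'] = ind_data['test2'] - ind_data['test1']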
ind_data.head()
   States  Rt2ndapril  Rt9may  rt_diff  test1  test2  test diff  log(test diff)    hdi  governance  pche  interaction  test_diff
0      AP        2.18    0.98     1.20     21   3121       3100           8.039  0.650        5.05  1013     5.225349       3100
1      BR        1.72    1.10     0.62     25    286        261           5.565  0.576        4.40   491     3.205440        261
2      DL        2.05    1.12     0.93    143   4591       4448           8.400  0.746        5.62  1992     6.266400       4448
3      GJ        1.35    1.06     0.29     66   1716       1650           7.409  0.672        5.04  1189     4.978848       1650
4      HR        1.96    1.15     0.81     48   1945       1897           7.548  0.708        5.00  1119     5.343984       1897
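The printed rows suggest how the worksheet's derived columns were built: rt_diff appears to be Rt2ndapril minus Rt9may, log(test diff) the natural log of test2 minus test1, and interaction the product log(test diff) × hdi. The sketch below checks those relationships on the five rows shown only; it is an inference from the printed values, not documentation of how the worksheet was produced.

```python
import math

# The five rows printed by ind_data.head(), copied as shown.
# Fields: Rt on 2 Apr, Rt on 9 May, rt_diff, tests on 2 Apr, tests on 9 May,
# "log(test diff)", hdi, "interaction".
HEAD_ROWS = {
    "AP": (2.18, 0.98, 1.20, 21, 3121, 8.039, 0.650, 5.225349),
    "BR": (1.72, 1.10, 0.62, 25, 286, 5.565, 0.576, 3.205440),
    "DL": (2.05, 1.12, 0.93, 143, 4591, 8.400, 0.746, 6.266400),
    "GJ": (1.35, 1.06, 0.29, 66, 1716, 7.409, 0.672, 4.978848),
    "HR": (1.96, 1.15, 0.81, 48, 1945, 7.548, 0.708, 5.343984),
}


def check_row(rt_apr, rt_may, rt_diff, test1, test2, log_td, hdi, interaction):
    """Return True if the derived columns agree with the raw ones, within rounding."""
    decline_ok = math.isclose(rt_apr - rt_may, rt_diff, abs_tol=1e-6)
    log_ok = math.isclose(math.log(test2 - test1), log_td, abs_tol=1e-3)
    inter_ok = math.isclose(log_td * hdi, interaction, abs_tol=1e-5)
    return decline_ok and log_ok and inter_ok


checks = {state: check_row(*vals) for state, vals in HEAD_ROWS.items()}
print(checks)
```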
ind_data2 = ind_data[['rt_diff', 'test_diff', 'hdi', 'governance', 'pche']].copy()
ind_data2.columns
Index(['rt_diff', 'test_diff', 'hdi', 'governance', 'pche'], dtype='object')
ind_data2.head()
rt_diff test_diff hdi governance pche
0 1.20 3100 0.650 5.05 1013
1 0.62 261 0.576 4.40 491
2 0.93 4448 0.746 5.62 1992
3 0.29 1650 0.672 5.04 1189
4 0.81 1897 0.708 5.00 1119

Min-max normalization of all columns (each column rescaled to the range [0, 1])
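Min-max scaling maps each value x of a column to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A minimal vectorized sketch of the same transformation as the column loop in the next cell, applied only to the five rows printed above (the notebook scales all 15 states, so its scaled values differ from these). The helper name min_max_scale is hypothetical.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] with (x - min) / (max - min).

    A constant column gives max - min == 0, so its scaled values are NaN.
    """
    col_min = df.min()
    col_max = df.max()
    return (df - col_min) / (col_max - col_min)


# Only the five rows shown by head(); not the full 15-state sample.
sample = pd.DataFrame({
    "rt_diff": [1.20, 0.62, 0.93, 0.29, 0.81],
    "test_diff": [3100, 261, 4448, 1650, 1897],
    "hdi": [0.650, 0.576, 0.746, 0.672, 0.708],
})
scaled = min_max_scale(sample)
print(scaled.round(3))
```

Subtracting and dividing by Series aligns on column names, which is why no explicit loop is needed.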

# Rescale each column to [0, 1] in place
for c in ind_data2.columns.values:
    mmin = ind_data2[c].min()
    mmax = ind_data2[c].max()
    ind_data2[c] = (ind_data2[c] - mmin) / (mmax - mmin)
y = ind_data2.rt_diff.values
X = ind_data2[['test_diff', 'hdi', 'governance', 'pche']]
# sm.add_constant is not called, so the model has no intercept and R-squared is uncentered
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.732
Model:                            OLS   Adj. R-squared (uncentered):              0.634
Method:                 Least Squares   F-statistic:                              7.505
Date:                Fri, 11 Nov 2022   Prob (F-statistic):                     0.00361
Time:                        04:23:45   Log-Likelihood:                        -0.89814
No. Observations:                  15   AIC:                                      9.796
Df Residuals:                      11   BIC:                                      12.63
Df Model:                           4                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
test_diff      0.5534      0.555      0.998      0.340      -0.668       1.774
hdi           -0.4396      0.520     -0.846      0.416      -1.583       0.704
governance     0.7055      0.369      1.912      0.082      -0.107       1.518
pche          -0.0758      0.668     -0.114      0.912      -1.545       1.394
==============================================================================
Omnibus:                        0.302   Durbin-Watson:                   1.605
Prob(Omnibus):                  0.860   Jarque-Bera (JB):                0.434
Skew:                          -0.249   Prob(JB):                        0.805
Kurtosis:                       2.332   Cond. No.                         12.2
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=15

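Because the model has no constant, statsmodels reports uncentered statistics. Assuming the standard no-intercept formulas (adjusted R-squared scales by n rather than n - 1, and the F-test compares against the all-zero model), the adjusted R-squared, F-statistic, AIC and BIC in the first summary can be reproduced from R-squared, n, k and the log-likelihood. Small differences come from R-squared being rounded to three decimals in the table.

```python
import math

# Values read off the first OLS summary (model without a constant).
n, k = 15, 4          # observations, regressors
r2 = 0.732            # R-squared (uncentered), rounded in the table
loglik = -0.89814     # log-likelihood

df_resid = n - k
adj_r2 = 1 - (n / df_resid) * (1 - r2)        # no constant: n, not n - 1
f_stat = (r2 / k) / ((1 - r2) / df_resid)     # test of all coefficients = 0
aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * math.log(n)
print(round(adj_r2, 3), round(f_stat, 2), round(aic, 3), round(bic, 2))
```

Applying the same AIC and BIC formulas to the interaction model (k = 5, log-likelihood -0.060217) gives about 10.12 and 13.66, matching its summary; both are higher than the first model's 9.796 and 12.63, even though its uncentered R-squared is higher.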
Add an interaction term (normalized test_diff × hdi)

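# Product of the normalized test_diff and hdi columns; this is not the worksheet's
# 'interaction' column, which appears to be log(test diff) * hdi
ind_data2['xterm'] = ind_data2['test_diff'] * ind_data2['hdi']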
X = ind_data2[['test_diff', 'hdi', 'governance', 'pche', 'xterm']]
model2 = sm.OLS(y, X)
results2 = model2.fit()
print(results2.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.760
Model:                            OLS   Adj. R-squared (uncentered):              0.640
Method:                 Least Squares   F-statistic:                              6.340
Date:                Fri, 11 Nov 2022   Prob (F-statistic):                     0.00666
Time:                        04:23:45   Log-Likelihood:                       -0.060217
No. Observations:                  15   AIC:                                      10.12
Df Residuals:                      10   BIC:                                      13.66
Df Model:                           5                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
test_diff      1.2916      0.874      1.478      0.170      -0.656       3.239
hdi           -0.2276      0.551     -0.413      0.688      -1.455       1.000
governance     0.5859      0.382      1.533      0.156      -0.266       1.437
pche          -0.2638      0.684     -0.385      0.708      -1.789       1.261
xterm         -1.0424      0.959     -1.087      0.302      -3.179       1.094
==============================================================================
Omnibus:                        0.960   Durbin-Watson:                   1.801
Prob(Omnibus):                  0.619   Jarque-Bera (JB):                0.446
Skew:                          -0.415   Prob(JB):                        0.800
Kurtosis:                       2.848   Cond. No.                         17.8
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=15

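With the interaction term, the effect of test_diff on rt_diff is no longer a single coefficient: differentiating the model gives β_test_diff + β_xterm × hdi, evaluated on the min-max scale (hdi = 0 is the lowest-HDI state, 1 the highest). The sketch plugs in the point estimates from the second summary. It gives point estimates only; neither coefficient is significant at the 5% level (p = 0.170 and 0.302), and a standard error for the combined effect would need the coefficient covariance (for example results2.cov_params()), which is not shown here. The function name is hypothetical.

```python
# Point estimates copied from the interaction-model summary.
BETA_TEST_DIFF = 1.2916
BETA_XTERM = -1.0424


def marginal_effect_of_testing(hdi_scaled: float) -> float:
    """d(rt_diff)/d(test_diff) at a given min-max-scaled HDI value."""
    return BETA_TEST_DIFF + BETA_XTERM * hdi_scaled


for h in (0.0, 0.5, 1.0):
    print(h, round(marginal_effect_of_testing(h), 4))
```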
Miscellaneous diagnostics

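# Rainbow test for linearity of the first model: returns (F statistic, p-value);
# a large p-value means no evidence against linearity
sm.stats.linear_rainbow(results)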
(1.56889693867147, 0.38823518172143395)
# Partial regression plot of rt_diff on test_diff, controlling for hdi, governance and pche
sm.graphics.plot_partregress('rt_diff', 'test_diff', ['hdi', 'governance', 'pche'], data=ind_data2, obs_labels=False)
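A partial regression (added-variable) plot for test_diff plots the residuals of rt_diff regressed on the other regressors against the residuals of test_diff regressed on those same regressors. By the Frisch-Waugh-Lovell theorem, the slope of that scatter equals test_diff's coefficient in the full regression. The sketch checks this with NumPy on synthetic random data (not the India data), without an intercept to mirror the models above; it makes no assumption about whether plot_partregress adds an intercept when given column names and a DataFrame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 15 rows, 4 regressors, no intercept.
n = 15
X = rng.uniform(size=(n, 4))
y = X @ np.array([0.5, -0.4, 0.7, -0.1]) + rng.normal(scale=0.2, size=n)


def ols(A, b):
    """Least-squares coefficients of b on the columns of A (no intercept added)."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


full_coef = ols(X, y)

# Residualize y and the first regressor on the remaining regressors.
others = X[:, 1:]
y_resid = y - others @ ols(others, y)
x_resid = X[:, 0] - others @ ols(others, X[:, 0])

# Slope of the residual-on-residual regression: the line in a partial regression plot.
partial_slope = (x_resid @ y_resid) / (x_resid @ x_resid)
print(full_coef[0], partial_slope)
```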
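# X here is the 5-column matrix with xterm; this overwrites the earlier model and results
model = sm.OLS(y, X)
# alpha defaults to 0.0, so no penalty is applied and the estimates equal model2's OLS coefficients
results = model.fit_regularized(alpha=0.0, L1_wt=0.0)
print(results.params)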
[ 1.29155843 -0.22758562  0.58588266 -0.2638172  -1.04244082]
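These parameters are identical to the interaction model's OLS coefficients because fit_regularized uses alpha=0.0 unless told otherwise, so no penalty is applied. With L1_wt=0.0, a positive alpha would give a ridge (pure L2) fit instead. The sketch uses the textbook closed form (X'X + λI)⁻¹X'y on synthetic data to show that λ = 0 reproduces OLS and that larger λ shrinks the coefficients. statsmodels scales its alpha by its own convention, so λ here should not be read as the same number as alpha. The ridge helper is hypothetical.

```python
import numpy as np


def ridge(X, y, lam):
    """Textbook ridge estimate (X'X + lam*I)^{-1} X'y; lam = 0 gives plain OLS."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


rng = np.random.default_rng(1)
X = rng.uniform(size=(15, 5))   # synthetic: 15 rows, 5 regressors like model2
y = rng.uniform(size=15)

ols_coef = np.linalg.lstsq(X, y, rcond=None)[0]
for lam in (0.0, 0.1, 1.0):
    coef = ridge(X, y, lam)
    print(lam, np.round(coef, 3), round(float(np.linalg.norm(coef)), 3))
```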