randomization inference p-values #26

Open
alexanderthclark opened this issue Mar 17, 2021 · 1 comment

@alexanderthclark
Contributor

I'm getting a different p-value from the one calculated in ri.py and noted in the comments of ri.do. I only know Python (and I'm excited about the newly added Python code!), so I'll reference the .py file.

I believe the issue is with the line p_value = p_value[p_value['permutation'] == 1], which computes the p-value using neither a weak nor a strict inequality. Before that line, the signed t-statistics are sorted and ranked, and several rows share the same t-stat of 1. So, if we want the p-value calculation to use a weak inequality (the share of rows with a weakly higher ATE), the following minimal edits would do the job.

p_value = p_value[p_value['ate'] == 1] 
p_value['rank'].max() / n

This gives 0.4285.
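
To show why the choice of inequality matters when there are ties, here is a minimal sketch. The statistics below are made up for illustration (they are not the values from ri.dta), and `stats` and `observed` are hypothetical names.

```python
import numpy as np

# Hypothetical placebo statistics with three ties at the observed value.
stats = np.array([3.0, 2.0, 1.0, 1.0, 1.0, 0.0, -1.0, -2.0, -2.0, -3.0])
observed = 1.0

# Weak inequality: ties with the observed statistic count toward the p-value.
p_weak = (stats >= observed).mean()

# Strict inequality: ties are excluded.
p_strict = (stats > observed).mean()

print(p_weak, p_strict)
```

With ties, the two definitions give different answers, and selecting a single row by its permutation index matches neither one in general.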

The simplest code I can think of that does the same thing is below. It isn't the most efficient, because it enumerates permutations rather than combinations.

from itertools import permutations
import pandas as pd
import numpy as np

url = 'https://github.com/scunning1975/mixtape/raw/master/ri.dta'
df = pd.read_stata(url, index_col = 'name')
observed_t_stat = 1
y_vec = df.y.values 

# create vector of treatment assignments
# use -1 instead of 0 for dot product assist
d = np.concatenate( [np.ones(4), (-1)*np.ones(4)] ) 

# signed t-stat (difference in means) for every ordering of the assignment vector
t_stats = np.array([np.dot(y_vec, d_vec) / 4 for d_vec in permutations(d)])

p_value = (t_stats >= observed_t_stat).mean()
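
For comparison, here is a rough sketch of the same idea using combinations. Among the 8! = 40,320 orderings of the assignment vector, each distinct choice of 4 treated units appears 4! × 4! = 576 times, so enumerating the 70 combinations should give the same share. The outcome values are hypothetical (not read from ri.dta), I'm assuming the first four units are the treated ones, and `diff_in_means` is a helper name I made up.

```python
from itertools import combinations
import numpy as np

# Hypothetical outcomes for 8 units (NOT the values in ri.dta).
# Assumption: the first 4 units are the ones actually treated.
y_vec = np.array([7.0, 5.0, 5.0, 7.0, 4.0, 10.0, 1.0, 5.0])
n_treated = 4

def diff_in_means(y, treated_idx):
    """Mean outcome of the treated units minus mean outcome of the rest."""
    mask = np.zeros(len(y), dtype=bool)
    mask[list(treated_idx)] = True
    return y[mask].mean() - y[~mask].mean()

# Statistic under the actual assignment.
observed = diff_in_means(y_vec, range(n_treated))

# One statistic per distinct way of choosing the treated group.
stats = np.array([
    diff_in_means(y_vec, idx)
    for idx in combinations(range(len(y_vec)), n_treated)
])

# One-sided p-value with a weak inequality.
p_value = (stats >= observed).mean()
print(len(stats), observed, p_value)
```

This also avoids hard-coding the observed statistic, since it is computed from the same helper as the placebo statistics.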

I'm opening this as an issue because I want to check my own understanding (I'm self-studying). I think there's also a question of whether the code should use absolute values of the t-statistics to match the book.
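
To make the absolute-value question concrete, here is a small sketch contrasting the one-sided calculation with a two-sided one based on absolute values. The numbers are made up for illustration, and I'm not asserting which version the book intends.

```python
import numpy as np

# Hypothetical placebo statistics (NOT the values from ri.dta).
stats = np.array([-3.0, -2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 3.0])
observed = 1.0

# One-sided: share of statistics at least as large as the observed one.
p_one_sided = (stats >= observed).mean()

# Two-sided: share of statistics at least as extreme in absolute value.
p_two_sided = (np.abs(stats) >= abs(observed)).mean()

print(p_one_sided, p_two_sided)
```

When the placebo distribution is symmetric around zero, as in this toy example, the two-sided value is twice the one-sided one, so the choice changes the reported p-value substantially.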

@scunning1975
Owner

I never saw this, and I apologize for not responding. I didn't write the python code, so I need to look into this more.
