A T · tags: statistics, regression, probability, non-linear-regression, hypothesis-test
What is the interaction between perception, outcome and outlook?
I have converted them into categorical variables to [potentially] simplify things.
import numpy as np
import pandas as pd

high, size = 100, 20
df = pd.DataFrame({'perception': np.random.randint(0, high, size),
                   'age': np.random.randint(0, high, size),
                   'smokes_cat': pd.Categorical(np.tile(['lots', 'little', 'not'],
                                                        size//3+1)[:size]),
                   'outcome': np.random.randint(0, high, size),
                   'outlook_cat': pd.Categorical(np.tile(['positive', 'neutral',
                                                          'negative'],
                                                         size//3+1)[:size])
                   })

bin_width = 10  # width of each age bin, in years
df.insert(2, 'age_cat', pd.cut(df.age, range(0, high + bin_width, bin_width),
                               right=False,
                               labels=["{0} - {1}".format(i, i + bin_width - 1)
                                       for i in range(0, high, bin_width)]))


def tierify(i):
    # Map a 0-99 score onto four ordered tiers
    if i <= 25:
        return 'lowest'
    elif i <= 50:
        return 'low'
    elif i <= 75:
        return 'med'
    return 'high'


df.insert(1, 'perception_cat', df['perception'].map(tierify))
df.insert(6, 'outcome_cat', df['outcome'].map(tierify))
# Shuffle smokes_cat by reassigning the column; an in-place np.random.shuffle
# of df['smokes_cat'] may not write back to df in newer pandas (copy-on-write)
df['smokes_cat'] = df['smokes_cat'].sample(frac=1).reset_index(drop=True)
Run online: http://ideone.com/fftuSv or https://repl.it/repls/MicroLeftSequences
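The `tierify` thresholds can also be expressed with `pd.cut`, which returns a true categorical column rather than plain strings. A minimal sketch, assuming integer scores in [0, 100) as in the generated data; it checks itself against `tierify` on every value in that range.

```python
import numpy as np
import pandas as pd


def tierify(i):
    # Same thresholds as the question's helper
    if i <= 25:
        return 'lowest'
    elif i <= 50:
        return 'low'
    elif i <= 75:
        return 'med'
    return 'high'


values = pd.Series(np.arange(100))
labels = ['lowest', 'low', 'med', 'high']
# right=True (the default) closes each bin (a, b] on the right,
# which matches the "<=" comparisons in tierify
binned = pd.cut(values, bins=[-np.inf, 25, 50, 75, np.inf], labels=labels)

assert (binned.astype(str) == values.map(tierify)).all()
print(binned.cat.categories.tolist())  # ['lowest', 'low', 'med', 'high']
```

Applied to the question's frame this would be, for example, `pd.cut(df['perception'], bins=[-np.inf, 25, 50, 75, np.inf], labels=labels)`.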
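This is fake data, but it should illustrate the problem. An individual holds a perceived view, `perception`, is then presented with the actual view, `outcome`, and from that can decide their `outlook`.
Using Python (pandas, or really anything open source), how can I show the probabilities and p-values of the interactions between these 3 dependent columns (possibly using `age` and `smokes_cat` as potential confounders)?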
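A simpler first look, swapped in here and not taken from the question or the answer, is to cross-tabulate each pair of the three `*_cat` columns and run a chi-square test of independence with `scipy.stats.chi2_contingency`, which yields one p-value per pair; `pd.crosstab(..., normalize='index')` then gives conditional probabilities such as P(outlook_cat | perception_cat). This is a sketch under assumptions: the DataFrame below is a hypothetical stand-in generated with a fixed seed (n = 300, no real signal), because 20 rows are too few for the usual "expected count of at least 5 per cell" rule of thumb. Pairwise tests also do not capture a three-way interaction and do not adjust for `age` or `smokes_cat`.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for the question's df: only the three *_cat columns,
# drawn independently at random with a fixed seed (so no real association)
rng = np.random.default_rng(0)
n = 300
tiers = ['lowest', 'low', 'med', 'high']
df = pd.DataFrame({
    'perception_cat': rng.choice(tiers, n),
    'outcome_cat': rng.choice(tiers, n),
    'outlook_cat': rng.choice(['positive', 'neutral', 'negative'], n),
})

# Pairwise chi-square tests of independence: a small p-value suggests the two
# columns are associated (it says nothing about direction or causality)
results = {}
for a, b in combinations(['perception_cat', 'outcome_cat', 'outlook_cat'], 2):
    table = pd.crosstab(df[a], df[b])
    chi2, p, dof, _expected = chi2_contingency(table)
    results[(a, b)] = p
    print(f'{a} vs {b}: chi2={chi2:.2f}, dof={dof}, p={p:.3f}')

# Conditional probabilities P(outlook_cat | perception_cat); each row sums to 1
cond = pd.crosstab(df['perception_cat'], df['outlook_cat'], normalize='index')
print(cond.round(2))
```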
One option is a multinomial logit model:
# Integer-encode each categorical column. LabelEncoder gives codes 0..k-1
# (not a one-hot encoding), so the predictors enter the model as numbers.
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
all_enc_df = pd.DataFrame({column: enc.fit_transform(df[column])
                           for column in ('perception_cat', 'age_cat',
                                          'smokes_cat', 'outlook_cat')})

# Regression
from sklearn.linear_model import LogisticRegression
X, y = (all_enc_df[['age_cat', 'smokes_cat', 'outlook_cat']],
        all_enc_df['perception_cat'])
#clf = LogisticRegression(random_state=0, solver='lbfgs',
#                         multi_class='multinomial').fit(X, y)

import statsmodels.api as sm
mullogit = sm.MNLogit(y, sm.add_constant(X))  # include an intercept column
mulfit = mullogit.fit(method='bfgs', maxiter=100)
print(mulfit.summary())
https://repl.it/repls/MicroLeftSequences