Statistical test: how do (perception; actual results; and next) interact?

A T*_*A T 5 statistics regression probability non-linear-regression hypothesis-test

What is the interaction between perception, outcome, and outlook?

I've turned them into categorical variables to [potentially] simplify things.

import pandas as pd
import numpy as np

high, size = 100, 20
df = pd.DataFrame({'perception': np.random.randint(0, high, size),
                   'age': np.random.randint(0, high, size),
                   'smokes_cat': pd.Categorical(np.tile(['lots', 'little', 'not'],
                                                        size//3+1)[:size]),
                   'outcome': np.random.randint(0, high, size),
                   'outlook_cat': pd.Categorical(np.tile(['positive', 'neutral',
                                                          'negative'],
                                                          size//3+1)[:size])
                  })
df.insert(2, 'age_cat', pd.Categorical(pd.cut(df.age, range(0, high+5, size//2),
                                              right=False, labels=[
                                               "{0} - {1}".format(i, i + 9)
                                               for i in range(0, high, size//2)])))

def tierify(i):
    if i <= 25:
        return 'lowest'
    elif i <= 50:
        return 'low'
    elif i <= 75:
        return 'med'
    return 'high'

df.insert(1, 'perception_cat', df['perception'].map(tierify))
df.insert(6, 'outcome_cat', df['outcome'].map(tierify))

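# Shuffle the smoking categories and assign the result back to the column;
# shuffling a Series in place with np.random.shuffle is unreliable.
df['smokes_cat'] = df['smokes_cat'].sample(frac=1).values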

Runnable online: http://ideone.com/fftuSv or https://repl.it/repls/MicroLeftSequences


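This is fake data, but it should get the idea across. An individual has a perceived view (perception), is then presented with the actual view (outcome), and from that can decide their outlook.

Using Python (pandas, or really anything open source), how do I show the probability (p-value) from the interaction between these 3 dependent columns, possibly using age and smokes_cat as potential confounding factors?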

A T*_*A T 0

One option is a multinomial logit model:

# Integer-encode the categorical variables (LabelEncoder assigns one integer
# code per category; note that this is label encoding, not one-hot encoding)
from sklearn.preprocessing import LabelEncoder
enc = LabelEncoder()
all_enc_df = pd.DataFrame({column: enc.fit_transform(df[column])
                           for column in ('perception_cat', 'age_cat',
                                          'smokes_cat', 'outlook_cat')})

# Regression
from sklearn.linear_model import LogisticRegression

X, y = (all_enc_df[['age_cat', 'smokes_cat', 'outlook_cat']],
        all_enc_df[['perception_cat']])

#clf = LogisticRegression(random_state=0, solver='lbfgs',
#                         multi_class='multinomial').fit(X, y)

import statsmodels.api as sm

mullogit = sm.MNLogit(y,X)
mulfit = mullogit.fit(method='bfgs', maxiter=100)

print(mulfit.summary())

https://repl.it/repls/MicroLeftSequences
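
To get a single p-value for whether outlook is associated with perception once age and smoking are accounted for, one possible approach is a likelihood-ratio test between two nested multinomial logit models. The sketch below is an illustration under several assumptions, not part of the code above: it builds its own larger synthetic data set (the names `data`, `perception_code`, `full`, and `reduced` are made up for this example), it uses the statsmodels formula interface so that `C(...)` dummy-codes the categorical predictors, and it relies on SciPy for the chi-squared tail probability. Since the data are random, the resulting p-value carries no meaning beyond showing the mechanics.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic stand-in data (an assumption): same column names as the question,
# but more rows so every category has a reasonable number of observations.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    'perception_cat': rng.choice(['lowest', 'low', 'med', 'high'], n),
    'outlook_cat': rng.choice(['positive', 'neutral', 'negative'], n),
    'smokes_cat': rng.choice(['lots', 'little', 'not'], n),
    'age': rng.integers(0, 100, n),
})

# Use integer codes for the four perception tiers as the response variable.
data['perception_code'] = pd.Categorical(data['perception_cat']).codes.astype(int)

# Full model: outlook plus the potential confounders.
full = smf.mnlogit('perception_code ~ C(outlook_cat) + C(smokes_cat) + age',
                   data).fit(disp=False)
# Reduced model: confounders only.
reduced = smf.mnlogit('perception_code ~ C(smokes_cat) + age',
                      data).fit(disp=False)

# Likelihood-ratio test: twice the gain in log-likelihood is compared with a
# chi-squared distribution whose degrees of freedom equal the number of
# extra parameters in the full model.
lr_stat = 2 * (full.llf - reduced.llf)
df_diff = int(full.params.size - reduced.params.size)
p_value = stats.chi2.sf(lr_stat, df_diff)

print(full.pvalues)  # per-coefficient Wald p-values
print(f'LR = {lr_stat:.3f}, df = {df_diff}, p = {p_value:.4f}')
```

The per-coefficient p-values in `full.pvalues` answer a narrower question (is one dummy coefficient in one equation different from zero), whereas the likelihood-ratio p-value tests all outlook terms jointly. A product term such as `C(outlook_cat):C(smokes_cat)` could be added to the full model in the same way to test an interaction, at the cost of many more parameters.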