Posts by R_Q*_*ery

PatsyError: Number of rows mismatch between data argument and column (statsmodels)

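I am using R-style formulas (via the Patsy package) with statsmodels and I am getting an error I don't understand. Any hints or tips would be much appreciated.

PatsyError: Number of rows mismatch between data argument and C('Industry_Banking&CapitalMarkets') (8137 versus 1)

The DataFrame has 8137 rows and no missing data.

The full code is below.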

mixed = smf.mixedlm("""count_SoldServiceName ~ date_int + AzureActiveEngagementCount + AzureEngagementPartnerCount 
                     + DCount_learning_path_name + Industry_Automotive + C('Industry_Banking&CapitalMarkets') + C('Industry_Chemicals&Agrochemicals') + Industry_CivilianGovernment
                     + Industry_ConsumerGoods + C('Industry_Defense&Intelligence') + Industry_DiscreteManufacturing + Industry_Energy + Industry_Gaming 
                     + Industry_HealthPayor + Industry_HealthProvider + Industry_HigherEducation + Industry_Insurance + C('Industry_Media&Entertainment') + Industry_Nonprofit 
                     + Industry_PartnerProfessionalServices + Industry_Pharmaceuticals + C('Industry_Primary&SecondaryEdu/K-12') + Industry_ProfessionalServices 
                     + C('Industry_PublicSafety&Justice') + Industry_Retailers + Industry_SmartSpaces + Industry_Telecommunications +  C('Industry_Travel,Transport&Hospitality') 
                     + Industry_other + InvestmentArea_AA + …

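The "8137 versus 1" in the message is consistent with patsy evaluating `C('Industry_Banking&CapitalMarkets')` as a categorical built from the string literal itself (one value) rather than from the column of that name. A minimal sketch of the difference, assuming a hypothetical 4-row toy DataFrame in place of the real data; patsy's built-in `Q()` looks up a column whose name is not a valid Python identifier, and `C(Q('...'))` would additionally treat it as categorical:

```python
import numpy as np
import pandas as pd
import patsy

# Hypothetical toy data; only the awkward column name mirrors the question.
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "Industry_Banking&CapitalMarkets": [0, 1, 0, 1],
})

# C('...') wraps the quoted string itself (a single scalar) as a categorical,
# so patsy compares 1 row against the 4 rows of the data argument.
try:
    patsy.dmatrices("y ~ C('Industry_Banking&CapitalMarkets')", df)
    literal_error = None
except patsy.PatsyError as err:
    literal_error = err

# Q('...') looks the quoted name up as a variable in the data instead.
y, X = patsy.dmatrices("y ~ Q('Industry_Banking&CapitalMarkets')", df)

print(literal_error)
print(np.asarray(X).shape)
```

The same `Q('...')` terms should also work inside the formula string passed to `smf.mixedlm`, since statsmodels hands the formula to patsy; renaming the columns to valid identifiers is an alternative that avoids quoting altogether.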

pandas statsmodels patsy

R_Q*_*ery

lucky-day

8
votes

1
answer

6914
views

More efficient way to mean center a subset of columns in a pandas dataframe and retain column names

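I have a dataframe with roughly 370 columns. I am testing a series of hypotheses, each of which requires me to fit a cubic regression model on a subset of the columns. I plan to use statsmodels to model this data.

Part of the polynomial regression process involves mean centering the variables (subtracting a feature's mean from each case of that feature).

I can do this with 3 lines of code, but it seems inefficient, since I need to repeat the process for six hypotheses. Keep in mind that I need coefficient-level data from the statsmodels output, so I need to retain the column names.

Here is a peek at the data. It is the subset of columns I need for one of my hypothesis tests.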

      i  we  you  shehe  they  ipron
0  0.51   0    0   0.26  0.00   1.02
1  1.24   0    0   0.00  0.00   1.66
2  0.00   0    0   0.00  0.72   1.45
3  0.00   0    0   0.00  0.00   0.53


Here is the code that mean centers the data and retains the column names.

from sklearn import preprocessing

# create df of features for hypothesis, from full dataframe
h2 = df[['i', 'we', 'you', 'shehe', 'they', 'ipron']]

# center the variables; pass booleans, not strings: the string 'False' is
# truthy, so with_std='False' would still scale to unit variance
x_centered = preprocessing.scale(h2, with_mean=True, with_std=False)

#convert back into a …

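One way to avoid the round trip through a NumPy array is to subtract the column means in pandas directly, which keeps the column names and index. A minimal sketch, assuming the four sample rows shown above as the full data; the helper name `center_columns` is hypothetical and not part of pandas or scikit-learn:

```python
import pandas as pd


def center_columns(frame, columns):
    """Return the chosen columns with each column's own mean subtracted."""
    subset = frame[columns]
    # subset.mean() is a Series indexed by column name, so the subtraction
    # aligns on columns and the result keeps the original labels.
    return subset - subset.mean()


# Sample rows copied from the question's data preview.
df = pd.DataFrame({
    "i": [0.51, 1.24, 0.00, 0.00],
    "we": [0, 0, 0, 0],
    "you": [0, 0, 0, 0],
    "shehe": [0.26, 0.00, 0.00, 0.00],
    "they": [0.00, 0.00, 0.72, 0.00],
    "ipron": [1.02, 1.66, 1.45, 0.53],
})

h2_centered = center_columns(df, ["i", "we", "you", "shehe", "they", "ipron"])
print(h2_centered)
```

Calling the helper once per hypothesis, each time with a different column list, would cover the six subsets without repeating the conversion step.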

python machine-learning pandas scikit-learn statsmodels

R_Q*_*ery

2016 06-21

4
votes

1
answer

10k
views