Lum*_*mos 4 python numpy dataframe pandas valueerror
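My DataFrame has subcategories: under each category (cat, dog, bird), summary statistics are listed. I need to drop the rows that hold the count and freq information and keep only the rows with the sd and mean values. Some values are NaN.
A ValueError is raised by my code.
DF: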
var stats A B C
cat mean 2 3 4
NaN sd 2 1 3
NaN count 5 2 6
NaN freq 3 1 19
dog mean 8 1 2
NaN sd 2 1 3
NaN count 4 6 1
NaN freq 3 1 19
bird mean 2 3 4
NaN sd 2 1 3
NaN count 5 2 6
NaN freq NaN NaN NaN
My code:
rows = ['count', 'freq']
df = [df.stats != rows]
Expected result:
var stats A B C
cat mean 2 3 4
NaN sd 2 1 3
dog mean 8 1 2
NaN sd 2 1 3
bird mean 2 3 4
NaN sd 2 1 3
Error:
File "pandas/_libs/lib.pyx", line 805, in pandas._libs.lib.vec_compare
(pandas/_libs/lib.c:14288)
ValueError: Arrays were different lengths: 819 vs 9
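I'm not sure how to check the array lengths, but in my Excel spreadsheet all columns and rows have the same length. Is this error caused by the NaN/empty cells in my data?
Thanks!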
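!= will not work here: it compares element by element, so the right-hand side must be a scalar or have the same length as the column. Use pd.Series.isin to build a boolean mask, then use that mask to filter your DataFrame.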
m = ~df.stats.isin(['count', 'freq'])
print(m)
0 True
1 True
2 False
3 False
4 True
5 True
6 False
7 False
8 True
9 True
10 False
11 False
Name: stats, dtype: bool
print(df[m])
var stats A B C
0 cat mean 2.0 3.0 4.0
1 NaN sd 2.0 1.0 3.0
4 dog mean 8.0 1.0 2.0
5 NaN sd 2.0 1.0 3.0
8 bird mean 2.0 3.0 4.0
9 NaN sd 2.0 1.0 3.0
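For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the values in the question (the real data has 819 rows, so this 12-row frame is only a stand-in), shows that the element-wise comparison against a shorter list raises ValueError, and then applies the isin mask. The names comparison_failed, mask and filtered are illustrative and not from the original post. The exact error message is likely to differ between pandas versions, so the sketch only checks the exception type.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame (values copied from the post).
df = pd.DataFrame({
    "var": ["cat", np.nan, np.nan, np.nan,
            "dog", np.nan, np.nan, np.nan,
            "bird", np.nan, np.nan, np.nan],
    "stats": ["mean", "sd", "count", "freq"] * 3,
    "A": [2, 2, 5, 3, 8, 2, 4, 3, 2, 2, 5, np.nan],
    "B": [3, 1, 2, 1, 1, 1, 6, 1, 3, 1, 2, np.nan],
    "C": [4, 3, 6, 19, 2, 3, 1, 19, 4, 3, 6, np.nan],
})

rows = ["count", "freq"]

# Element-wise comparison: a 12-row column cannot be lined up
# against a 2-item list, so pandas raises ValueError.
try:
    df["stats"] != rows
    comparison_failed = False
except ValueError:
    comparison_failed = True

# Membership test: True where stats is NOT one of the unwanted labels.
mask = ~df["stats"].isin(rows)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Note that the question's code also wraps the comparison in a plain list (df = [...]) instead of indexing the frame with it (df[...]), so even a valid mask would not have filtered anything there.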