I have a DataFrame like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9
200100001 23 1 2 4 4 1 5 5 5
200100002 21 1 12 3 1 55 7 7
200100003 12 3 3 6 3
200100004 4
200100005 6 5 3 9 3 5 6
200100005 23 4 4 2 4 3 6 5
I want to know how many trips each individual made, so I would like to add a new column so that the new table looks like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9 Chains
200100001 23 1 2 4 4 1 5 5 5 9
200100002 21 1 12 3 1 55 7 7 8
200100003 12 3 3 6 3 5
200100004 4 1
200100005 6 5 3 9 3 5 6 7
200100005 23 4 4 2 4 3 6 5 8
Is there a possible solution? I would really appreciate any help. Thanks in advance!
Use iloc to select the trip columns and count to count the values in each row; NaN is ignored by default:
df.iloc[:, 1:].count(1)
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
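For reference, a minimal self-contained sketch of this approach. The frame is rebuilt by hand from the question's table, assuming the missing trips were loaded as NaN, and the row counts are assigned to the Chains column the question asks for.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Rebuilt from the question's table; missing trips are assumed to be NaN.
df = pd.DataFrame(
    [
        [200100001, 23, 1, 2, 4, 4, 1, 5, 5, 5],
        [200100002, 21, 1, 12, 3, 1, 55, 7, 7, nan],
        [200100003, 12, 3, 3, 6, 3, nan, nan, nan, nan],
        [200100004, 4, nan, nan, nan, nan, nan, nan, nan, nan],
        [200100005, 6, 5, 3, 9, 3, 5, 6, nan, nan],
        [200100005, 23, 4, 4, 2, 4, 3, 6, 5, nan],
    ],
    columns=["IndividualID"] + [f"Trip{i}" for i in range(1, 10)],
)

# Skip the ID column, then count the non-NaN cells in each row.
df["Chains"] = df.iloc[:, 1:].count(axis=1)
print(df[["IndividualID", "Chains"]])
```

Writing axis=1 instead of the positional 1 does the same thing and makes it clearer that the count runs across columns.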
If the missing values are empty strings rather than NaN, just replace the empty strings with NaN first:
df.iloc[:, 1:].replace('', np.nan).count(1)
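A small sketch of why the replacement matters. The sample frame below is hypothetical: it assumes every cell was read as a string and that missing trips are empty strings, which count treats as ordinary values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: all cells are strings, missing trips are ''.
raw = pd.DataFrame(
    {
        "IndividualID": ["200100003", "200100004"],
        "Trip1": ["12", "4"],
        "Trip2": ["3", ""],
        "Trip3": ["3", ""],
    }
)
trips = raw.iloc[:, 1:]

# Without the replacement, '' is counted like any other value.
naive = trips.count(axis=1)

# After turning '' into NaN, count skips the missing trips.
chains = trips.replace("", np.nan).count(axis=1)

print(naive.tolist())   # [3, 3]
print(chains.tolist())  # [3, 1]
```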
Using ne and sum (subtract 1 for the IndividualID column, which is never empty):
df.ne('').sum(1)-1
Out[287]:
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
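A short sketch of what the subtraction is doing, on a hypothetical two-row frame. It also shows a caveat worth knowing: ne('') only tests for empty strings, so a cell that is NaN is still counted as a trip.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing '' and NaN as "missing" markers.
df = pd.DataFrame(
    {
        "IndividualID": [200100003, 200100004],
        "Trip1": ["12", "4"],
        "Trip2": ["3", ""],
        "Trip3": ["3", np.nan],
    }
)

# Whole frame: the ID column is always non-empty, hence the -1.
whole = df.ne("").sum(axis=1) - 1

# Equivalent: drop the ID column first, no correction needed.
trips_only = df.iloc[:, 1:].ne("").sum(axis=1)

print(whole.tolist())       # [3, 2]
print(trips_only.tolist())  # [3, 2]
```

The second row has only one real trip, but the NaN in Trip3 is not equal to '' and is therefore counted. If the data can contain NaN, the count-based approach is the safer choice.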
If the missing values are NaN, you can use info on the transposed trip columns:
df.iloc[:,1:].T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, Trip1 to Trip9
Data columns (total 6 columns):
0 9 non-null float64
1 8 non-null float64
2 5 non-null float64
3 1 non-null float64
4 7 non-null float64
5 8 non-null float64
dtypes: float64(6)
memory usage: 504.0+ bytes
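Note that info only prints its summary and returns None, so it is handy for inspection but cannot be assigned to a column. A minimal sketch on a hypothetical two-row frame, using count on the same transposed frame to get the numbers as a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with NaN for missing trips.
df = pd.DataFrame(
    {
        "IndividualID": [200100003, 200100004],
        "Trip1": [12, 4],
        "Trip2": [3.0, np.nan],
        "Trip3": [3.0, np.nan],
    }
)
trips_t = df.iloc[:, 1:].T

# info() writes the non-null summary to stdout and returns None.
result = trips_t.info()

# count() on the transposed frame yields the same non-null counts,
# indexed by the original row labels, so it aligns with df.
df["Chains"] = trips_t.count()
print(df["Chains"].tolist())  # [3, 1]
```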
Replace all blank values with NaN, then use notnull and count the values row-wise with sum(1):
df['Chains'] = df.iloc[:,1:].replace('',np.nan).notnull().sum(1)
>>> df
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 \
0 200100001 23 1.0 2.0 4.0 4.0 1.0 5.0 5.0
1 200100002 21 1.0 12.0 3.0 1.0 55.0 7.0 7.0
2 200100003 12 3.0 3.0 6.0 3.0 NaN NaN NaN
3 200100004 4 NaN NaN NaN NaN NaN NaN NaN
4 200100005 6 5.0 3.0 9.0 3.0 5.0 6.0 NaN
5 200100005 23 4.0 4.0 2.0 4.0 3.0 6.0 5.0
Trip9 Chains
0 5.0 9
1 NaN 8
2 NaN 5
3 NaN 1
4 NaN 7
5 NaN 8
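If "blank" can also mean cells containing only spaces, replacing '' alone will miss them. A possible variation, sketched on a hypothetical frame, uses a regular expression in replace so that empty and whitespace-only strings both become NaN; notna is the same check as notnull.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one empty cell and one whitespace-only cell.
df = pd.DataFrame(
    {
        "IndividualID": [200100003, 200100004],
        "Trip1": ["12", "4"],
        "Trip2": ["3", "  "],
        "Trip3": ["3", ""],
    }
)
trips = df.iloc[:, 1:]

# ^\s*$ matches empty strings and strings made only of whitespace.
blank_as_nan = trips.replace(r"^\s*$", np.nan, regex=True)

# True for real values, False for NaN; summing counts the True cells.
df["Chains"] = blank_as_nan.notna().sum(axis=1)
print(df["Chains"].tolist())  # [3, 1]
```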