How can I flag the rows in a dataframe every time a column changes its string value?
For example:
Input
ColumnA ColumnB
1 Blue
2 Blue
3 Red
4 Red
5 Yellow
# diff() won't work here with strings; it only works on numeric values
dataframe['changed'] = dataframe['ColumnB'].diff()
ColumnA ColumnB changed
1 Blue 0
2 Blue 0
3 Red 1
4 Red 0
5 Yellow 1
roo*_*oot 21
I get better performance with ne than with the actual != comparison:
df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
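As a minimal, self-contained sketch of the line above, assuming the sample data from the question: shift() moves each value down one row, bfill() fills the resulting NaN in the first row with the first value, and ne() then marks the rows that differ from their predecessor. The intermediate name `previous` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Value of the previous row; bfill() replaces the NaN created in the
# first row with the first value, so the first row is never flagged.
previous = df["ColumnB"].shift().bfill()

# ne() is the method form of !=; astype(int) turns True/False into 1/0.
df["changed"] = df["ColumnB"].ne(previous).astype(int)
print(df)
```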
Timings
Using the following setup to produce a larger dataframe:
df = pd.concat([df]*10**5, ignore_index=True)
I get the following timings:
%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
10 loops, best of 3: 38.1 ms per loop
%timeit (df.ColumnB != df.ColumnB.shift()).astype(int)
10 loops, best of 3: 77.7 ms per loop
%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB'])
10 loops, best of 3: 99.6 ms per loop
%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
10 loops, best of 3: 19.3 ms per loop
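Outside IPython, a comparison like this could be reproduced with the standard timeit module. The following is only a rough sketch: the repeat factor of 10**4 (smaller than the 10**5 used above), the names `small`, `big`, `candidates` and `timings`, and the loop counts are all assumptions, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

# Assumed repeat factor, chosen so the sketch runs quickly.
big = pd.concat([small] * 10**4, ignore_index=True)

# The expressions being compared, wrapped as zero-argument callables.
candidates = {
    "ne + shift": lambda: big["ColumnB"].ne(big["ColumnB"].shift()).astype(int),
    "!= + shift": lambda: (big["ColumnB"] != big["ColumnB"].shift()).astype(int),
    "ne + shift + bfill": lambda: big["ColumnB"].ne(big["ColumnB"].shift().bfill()).astype(int),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 runs of 10 loops each, reported per loop.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    timings[name] = best
    print(f"{name}: {best * 1e3:.2f} ms per loop")
```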
Use .shift and compare:
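# Use != rather than == so that a change is flagged as 1, matching the desired output
dataframe['changed'] = (dataframe['ColumnB'] != dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])).astype(int)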
For me, comparing with shift works; the first row is compared against NaN, so it is then reset to 0 because it has no previous value:
df['diff'] = (df.ColumnB != df.ColumnB.shift()).astype(int)
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
df.loc[0, 'diff'] = 0
print(df)
ColumnA ColumnB diff
0 1 Blue 0
1 2 Blue 0
2 3 Red 1
3 4 Red 0
4 5 Yellow 1
Edit, following the timings in another answer: using ne is fastest:
df['diff'] = (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
# .ix was removed in pandas 1.0; .loc is the label-based equivalent
df.loc[0, 'diff'] = 0
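The first-row reset can also be done by position, which does not depend on the index labels. A minimal sketch, assuming a hypothetical string index (the labels 'a' to 'e' and the intermediate name `changed` are not from the original post):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ColumnA": [1, 2, 3, 4, 5],
        "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
    },
    index=list("abcde"),  # hypothetical non-default index
)

# Compare every row with the previous one; the first row is compared
# against NaN and therefore comes out as 1.
changed = df["ColumnB"].ne(df["ColumnB"].shift()).astype(int)

# Reset the first row by position, since it has no previous value.
changed.iloc[0] = 0

df["diff"] = changed
print(df)
```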