Pandas "diff()" with strings

gui*_*cgs 26 python pandas

How can I flag the rows in a DataFrame where a column's string value changes?

For example:

Input

ColumnA   ColumnB
1            Blue
2            Blue
3            Red
4            Red
5            Yellow


# diff() won't work here with strings; it only works on numeric values
dataframe['changed'] = dataframe['ColumnB'].diff()        


ColumnA   ColumnB      changed
1            Blue         0
2            Blue         0
3            Red          1
4            Red          0
5            Yellow       1

roo*_*oot 21

I get better performance using `ne` rather than the actual `!=` comparison:

df['changed'] = df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
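To make the one-liner easier to follow, here is a minimal sketch that rebuilds the question's sample data and applies the same steps one at a time. The intermediate name `previous` is only for illustration and is not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ColumnA": [1, 2, 3, 4, 5],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
})

previous = df["ColumnB"].shift()  # value from the row above; NaN in the first row
previous = previous.bfill()       # fill that NaN with the next value, i.e. the first row's own value
df["changed"] = df["ColumnB"].ne(previous).astype(int)

print(df)
```

Because the first row ends up being compared with itself, it is never flagged as a change.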

Timings

Using the following setup to produce a larger DataFrame:

df = pd.concat([df]*10**5, ignore_index=True) 

I get the following timings:

%timeit df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(int)
10 loops, best of 3: 38.1 ms per loop

%timeit (df.ColumnB != df.ColumnB.shift()).astype(int)
10 loops, best of 3: 77.7 ms per loop

%timeit df['ColumnB'] == df['ColumnB'].shift(1).fillna(df['ColumnB'])
10 loops, best of 3: 99.6 ms per loop

%timeit (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
10 loops, best of 3: 19.3 ms per loop
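For anyone without IPython at hand, a rough way to repeat the comparison is the standard-library `timeit` module. This is only a sketch: the names `base`, `big` and `candidates` are mine, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"]})
big = pd.concat([base] * 10**5, ignore_index=True)  # 500,000 rows, as in the setup above

# Two of the variants timed above, wrapped so they can be called repeatedly.
candidates = {
    "ne + shift": lambda: big["ColumnB"].ne(big["ColumnB"].shift()).astype(int),
    "!= + shift": lambda: (big["ColumnB"] != big["ColumnB"].shift()).astype(int),
}

for name, func in candidates.items():
    # Best of 3 repeats, 3 calls each, reported as time per call.
    seconds = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```

Both variants return the same values, so any difference is purely in speed.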


Kar*_*tik 8

Use `.shift` and compare:

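# != flags rows that differ from the previous row; fillna makes the first row compare equal to itself
dataframe['changed'] = (dataframe['ColumnB'] != dataframe['ColumnB'].shift(1).fillna(dataframe['ColumnB'])).astype(int)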


jez*_*ael 6

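For me, comparing with `shift` works. The first row is then set to 0, because it is compared against NaN (there is no previous value):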

df['diff'] = (df.ColumnB != df.ColumnB.shift()).astype(int)
df.loc[0, 'diff'] = 0  # the first row has no previous value, so reset it to 0
print(df)
   ColumnA ColumnB  diff
0        1    Blue     0
1        2    Blue     0
2        3     Red     1
3        4     Red     0
4        5  Yellow     1

Edit, based on the timings in another answer: the fastest is to use `ne`:

df['diff'] = (df.ColumnB.ne(df.ColumnB.shift())).astype(int)
df.loc[0, 'diff'] = 0  # the first row has no previous value, so reset it to 0
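Resetting the first row by label assumes the index starts at 0. If the index might be something else, resetting by position is one option. A minimal sketch, where the letter index is a made-up example and not from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ColumnA": [1, 2, 3, 4, 5],
        "ColumnB": ["Blue", "Blue", "Red", "Red", "Yellow"],
    },
    index=list("abcde"),  # hypothetical non-default index, to show why position matters
)

flags = df["ColumnB"].ne(df["ColumnB"].shift()).astype(int)
print(flags.tolist())  # [1, 0, 1, 0, 1]: the first row is compared against NaN

flags.iloc[0] = 0  # reset by position, so it works whatever the index labels are
df["diff"] = flags
print(df["diff"].tolist())  # [0, 0, 1, 0, 1]
```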

  • @Navroop - What do you think about `df[['ColumnA','ColumnB']].ne(df[['ColumnA','ColumnB']].shift()).any(axis=1).astype(int)`? (2 upvotes)
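
The multi-column idea in the comment above could look roughly like this. The data is made up for illustration (the question's ColumnA never repeats, so it would flag every row), and resetting the first row is my own addition, not part of the comment.

```python
import pandas as pd

# Hypothetical data: ColumnA values are made up so that both columns contain repeats.
df = pd.DataFrame({
    "ColumnA": ["x", "x", "x", "y", "y"],
    "ColumnB": ["Blue", "Blue", "Red", "Red", "Red"],
})

cols = ["ColumnA", "ColumnB"]
# A row counts as changed if any of the watched columns differs from the row above.
changed = df[cols].ne(df[cols].shift()).any(axis=1).astype(int)
changed.iloc[0] = 0  # the first row has no predecessor, so do not count it as a change
df["changed"] = changed

print(df)
```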