Modify all values in a pandas Series column

Question

I have a column of 9-digit numbers. I need to modify every value in the column so that it becomes 12 digits long. This is the original data:

493    123456789
494    123456789
496    115098765
497    123456789
498    987654321
499    987654321

Now I need to make two changes to each number:

Insert 20 after the first digit.
Insert 0 before the last 5 digits.

The desired result is:

493    120234056789
494    120234056789
496    120150098765
497    120234056789
498    920876054321
499    920876054321

How can I do this? Thanks in advance.

Answer 1

jez*_*ael 6

Convert the column to strings and slice the values with str indexing:

s = df['col'].astype(str)
df['new'] = s.str[0] + '20' + s.str[1:-5] + '0' + s.str[-5:]
print(df)
           col           new
493  123456789  120234056789
494  123456789  120234056789
496  115098765  120150098765
497  123456789  120234056789
498  987654321  920876054321
499  987654321  920876054321

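For anyone who wants to run the slicing approach end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand; the column name col and the index values are taken from the output shown above, and everything else is only an illustration.

```python
import pandas as pd

# Sample data copied from the question; 'col' is the column name used in the answer.
df = pd.DataFrame(
    {"col": [123456789, 123456789, 115098765, 123456789, 987654321, 987654321]},
    index=[493, 494, 496, 497, 498, 499],
)

s = df["col"].astype(str)
# first digit + '20' + middle digits + '0' + last five digits
df["new"] = s.str[0] + "20" + s.str[1:-5] + "0" + s.str[-5:]
print(df)
```

The new column holds strings. If integers are needed, df['new'].astype('int64') should convert it back; the values are too large for a 32-bit integer.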

A similar solution with apply:

df['new'] = df['col'].astype(str).apply(lambda x:x[0] + '20' + x[1:-5] + '0' + x[-5:])

Timings, for @Mark Wang:

# 6k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [241]: %%timeit
     ...: s = df['col'].astype(str)
     ...: df['new1'] = s.str[0] + '20' + s.str[1:-5] + '0' + s.str[-5:]
     ...: 
19.5 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [242]: %%timeit 
     ...: df['new2'] = df['col'].astype(str).apply(lambda x:x[0] + '20' + x[1:-5] + '0' + x[-5:])
     ...: 
11.4 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

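The %%timeit cells above need IPython. A rough way to repeat the comparison in a plain script is sketched below with the standard timeit module. The function names with_str_accessor and with_apply are made up for this sketch, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical 6,000-row frame built from the question's six values.
base = pd.DataFrame(
    {"col": [123456789, 123456789, 115098765, 123456789, 987654321, 987654321]}
)
df = pd.concat([base] * 1000, ignore_index=True)


def with_str_accessor(frame):
    # Vectorized slicing through the .str accessor.
    s = frame["col"].astype(str)
    return s.str[0] + "20" + s.str[1:-5] + "0" + s.str[-5:]


def with_apply(frame):
    # Plain Python string slicing, one value at a time.
    return frame["col"].astype(str).apply(
        lambda x: x[0] + "20" + x[1:-5] + "0" + x[-5:]
    )


# Both variants must agree before their timings are worth comparing.
assert with_str_accessor(df).tolist() == with_apply(df).tolist()

for func in (with_str_accessor, with_apply):
    seconds = timeit.timeit(lambda: func(df), number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```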

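In this test the second version is faster, because the pandas text functions are slow. One reason is that they handle missing values correctly.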
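One way to see the missing-value point is the small sketch below. It assumes a Series that already holds strings plus one NaN and skips the astype(str) step; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A string Series with one missing value (assumed input for this sketch).
s = pd.Series(["123456789", np.nan, "987654321"])

# The .str accessor skips the missing entry and leaves it missing in the result.
via_str = s.str[0] + "20" + s.str[1:-5] + "0" + s.str[-5:]
print(via_str)

# A plain lambda tries to subscript the float NaN, which raises TypeError.
try:
    s.apply(lambda x: x[0] + "20" + x[1:-5] + "0" + x[-5:])
    apply_failed = False
except TypeError:
    apply_failed = True
print("apply raised TypeError:", apply_failed)
```

So the .str version does extra work per element to check for missing data, which is part of what the timings above are measuring.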
