pandas DataFrame: adding and removing a prefix/suffix from every cell value in the whole DataFrame

mur*_*310 8 python string dataframe pandas suffix

To add a prefix/suffix to a DataFrame, I usually do the following.

For example, to add the suffix '@':

df = df.astype(str) + '@'

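This basically appends '@' to every cell value.

I would like to know how to remove this suffix. Does the pandas.DataFrame class have a method that removes a particular prefix/suffix character directly from the entire DataFrame?

I tried iterating over the rows (as Series) and calling rstrip('@') on each one, as follows: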

for index in range(df.shape[0]):
    row = df.iloc[index]
    row = row.str.rstrip('@')

Now, to build a DataFrame out of these Series,

new_df = pd.DataFrame(columns=list(df))
new_df = new_df.append(row)

However, this does not work. It gives an empty DataFrame.

Is there something really basic that I am missing?

Ale*_*exG 6

You can use applymap to apply a string method to each element:

df = df.applymap(lambda x: str(x).rstrip('@'))

Note: I would not expect this to be as fast as a vectorized approach, i.e. converting each column separately with pd.Series.str.rstrip.
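
As a hedged side note on the element-wise call above: in pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map. The sketch below, using a small made-up DataFrame, picks whichever of the two methods is available so that the same line runs on older and newer versions. The name `elementwise` is only an illustrative helper, not part of pandas.

```python
import pandas as pd

# Small made-up frame whose values all carry the '@' suffix.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "fox@"]})

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Apply the plain Python string method to every cell.
stripped = elementwise(lambda x: str(x).rstrip("@"))
print(stripped)
```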


jua*_*aga 5

You can use apply with the str.strip method of pd.Series:

In [13]: df
Out[13]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the

In [14]: df = df + "@"

In [15]: df
Out[15]:
        a        b       c
0    dog@   quick@    the@
1   lazy@    lazy@    fox@
2  brown@   quick@    dog@
3  quick@     the@   over@
4  brown@    over@   lazy@
5    fox@   brown@  quick@
6  quick@     fox@    the@
7    dog@  jumped@    the@
8   lazy@   brown@    the@
9    dog@    lazy@    the@

In [16]: df = df.apply(lambda S:S.str.strip('@'))

In [17]: df
Out[17]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the
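
One caveat worth sketching: str.strip removes the given characters from both ends, and both strip and rstrip remove every repeated occurrence, which may be more than the single suffix that was added. The example below uses made-up data to compare the three variants. It assumes pandas 1.4 or later, where Series.str.removesuffix is available.

```python
import pandas as pd

# Made-up values with '@' at the start, doubled at the end, or missing.
df = pd.DataFrame({"a": ["@dog@", "lazy@@"], "b": ["quick@", "fox"]})

# strip: removes every leading and trailing '@'.
both_ends = df.apply(lambda s: s.str.strip("@"))

# rstrip: removes every trailing '@', leaves leading ones alone.
right_only = df.apply(lambda s: s.str.rstrip("@"))

# removesuffix (pandas >= 1.4): removes at most one trailing '@'.
one_suffix = df.apply(lambda s: s.str.removesuffix("@"))

print(both_ends, right_only, one_suffix, sep="\n\n")
```

If the values themselves can legitimately begin or end with '@', removesuffix (or removeprefix for a prefix) is probably the closest inverse of df + '@'.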

Note that your approach does not work because, when you perform the following assignment inside your for loop:

row = row.str.rstrip('@')

this merely assigns the result of row.str.rstrip('@') to the name row, without mutating the DataFrame. The same holds for all Python objects and simple name assignment:

In [18]: rows = [[1,2,3],[4,5,6],[7,8,9]]

In [19]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [20]: for row in rows:
    ...:     row = ['look','at','me']
    ...:

In [21]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

To actually change the underlying data structure, you need to use a mutator method:

In [22]: rows
Out[22]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [23]: for row in rows:
    ...:     row.append("LOOKATME")
    ...:

In [24]: rows
Out[24]: [[1, 2, 3, 'LOOKATME'], [4, 5, 6, 'LOOKATME'], [7, 8, 9, 'LOOKATME']]

Note that slice assignment is just syntactic sugar for a mutator method:

In [26]: rows
Out[26]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [27]: for row in rows:
    ...:     row[:] = ['look','at','me']
    ...:
    ...:

In [28]: rows
Out[28]: [['look', 'at', 'me'], ['look', 'at', 'me'], ['look', 'at', 'me']]

This is analogous to pandas loc- or iloc-based assignment.
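
To make that analogy concrete, here is a minimal sketch of the loop from the question, rewritten so that each stripped row is written back through iloc rather than rebound to a local name. The DataFrame is made up for illustration, and the column-wise apply shown earlier is likely the simpler choice in practice.

```python
import pandas as pd

# Made-up frame whose values all carry the '@' suffix.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "fox@"]})

for index in range(df.shape[0]):
    row = df.iloc[index]
    # Assigning through iloc writes into the DataFrame itself;
    # a plain `row = ...` would only rebind the local name.
    df.iloc[index] = row.str.rstrip("@").to_numpy()

print(df)
```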