How to remove carriage returns from a DataFrame

Question

Sar*_*thy 5 python replace carriage-return pandas data-cleaning

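I have a DataFrame with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value with a '\r' attached to it. After cleaning, I store the resulting DataFrame in a destination.csv file. Because that particular row has a \r appended, it always creates a new line in the file.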

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

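I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it didn't work for me.

Is there any other solution? Can somebody help?

Edit:

In the process above, I iterate over df to see whether a \r is present. If it is, it needs to be replaced. Here row.replace() and row.str.strip() don't seem to work, or I may be doing it the wrong way.

I don't want to specify a column name or row number when using replace(), because I can't be sure that only the "location" column will contain \r. Please find the code below.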

import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))               # Return type is pandas.Series
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1

Answer 1

jez*_*ael 10

Another solution is to use str.strip (its argument is a set of characters to remove from both ends, not a substring):

df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
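
The call above targets the two-character text \r (a backslash followed by r). If the cells instead hold a real carriage return character, something like the following sketch may be closer to what is needed. The sample Series and the helper name strip_line_endings are assumptions made for illustration and are not part of the original data.

```python
import pandas as pd


def strip_line_endings(series):
    """Remove trailing carriage returns and line feeds from each string."""
    # rstrip treats "\r\n" as a set of characters, so any trailing mix of
    # CR and LF is removed while the rest of the value is left alone.
    return series.str.rstrip("\r\n")


# Hypothetical sample: a real CR, a CR+LF pair, and a clean value.
s = pd.Series(["Uttar Pradesh\r", "Orissa\r\n", "India"])

# The lengths include the invisible trailing characters.
print(s.str.len().tolist())

cleaned = strip_line_endings(s)
print(cleaned.tolist())
```

Because rstrip here only touches line-ending characters, a value such as Kashmir keeps its final r, which a character set containing r would strip away.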

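If you want to use replace, add the r prefix (raw string) and an extra \ so that the regex matches a literal backslash followed by r:

print(df.replace({r'\\r': ''}, regex=True))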
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
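
One possible reason the attempt in the question appeared to do nothing is that replace returns a new DataFrame rather than modifying df in place, so the result has to be assigned. A minimal sketch, assuming the cells hold real carriage return or line feed characters, with made-up rows in the question's column layout:

```python
import pandas as pd

# Hypothetical rows; two string cells carry stray line-ending characters.
df = pd.DataFrame({
    "id": [29, 30],
    "location": ["Uttar Pradesh\r", "Kerala"],
    "country_name": ["India", "India\r\n"],
    "total_deaths": [20, 128],
})

# The character class matches any run of CR and/or LF in every string cell.
# replace returns a new frame, so the result is assigned to a new name.
cleaned = df.replace({r"[\r\n]+": ""}, regex=True)

print(cleaned["location"].tolist())
print(cleaned["country_name"].tolist())
```

Regex replacement only affects string cells, so numeric columns such as total_deaths pass through unchanged, and no column names need to be listed.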

In replace you can also restrict the replacement to specific columns:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

Edit, following the comments:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
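
Instead of iterating with iterrows to find out whether a \r is present, the check can be done per column. The sketch below is one way to do it. The helper name columns_with_line_breaks and the sample rows are assumptions for illustration only.

```python
import pandas as pd


def columns_with_line_breaks(frame):
    """Return the names of columns where any cell contains CR or LF."""
    found = []
    for name in frame.columns:
        # astype(str) lets the same check run on numeric columns too.
        as_text = frame[name].astype(str)
        if as_text.str.contains(r"[\r\n]", regex=True).any():
            found.append(name)
    return found


# Hypothetical rows; only the location column is affected.
df = pd.DataFrame({
    "id": [9, 10],
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [20, 69],
})

affected = columns_with_line_breaks(df)
print(affected)
```

The returned list can then be used to decide whether a frame-wide replace is needed or whether a single column is enough.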

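If you need to replace only in the location column:

# regex=True is needed in pandas 2.0 and later, where str.replace matches literally by default
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)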
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
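
To confirm that the cleaned frame no longer produces the extra line described in the question, the CSV text can be inspected before writing destination.csv. This is only a sketch with made-up rows. It relies on to_csv returning the CSV as a string when no path is given.

```python
import pandas as pd

# Hypothetical rows; the first location carries a stray CR+LF.
df = pd.DataFrame({
    "id": [9, 10],
    "country_name": ["India", "India"],
    "location": ["Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [20, 69],
})

cleaned = df.replace({r"[\r\n]+": ""}, regex=True)

# Split the generated CSV text on every kind of line break.
dirty_lines = df.to_csv(index=False).splitlines()
clean_lines = cleaned.to_csv(index=False).splitlines()

# The cleaned output has exactly one header line plus one line per row;
# the uncleaned output has more because of the embedded line break.
print(len(dirty_lines), len(clean_lines))
```

If the counts match the number of rows plus the header, cleaned.to_csv('destination.csv', index=False) should write one line per record.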
