Pandas: convert an object column to str when the column contains unicode, float, etc.

add*_*ons 5 utf-8 python-2.7 pandas

I have a Pandas DataFrame in which a column's dtype shows as object, but when I try to convert it to string with

df['column'] = df['column'].astype('str')

a UnicodeEncodeError is raised: *** UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

My next approach was to handle the encoding explicitly: df['column'] = filtered_df['column'].apply(lambda x: x.encode('utf-8').strip())

But this results in the following error: *** AttributeError: 'float' object has no attribute 'encode'

What is the best way to convert this column to string?

Sample strings from the column:

Thank you :)
Thank You !!!
responsibilities/assigned job.

小智 5

I had the same problem in Python 2.7 when trying to run a script that was originally written for Python 3. In Python 2.7, converting a unicode object to str implicitly encodes it with the ASCII codec, which fails as soon as the data contains a non-ASCII character. This can be reproduced with a simple example:

# -*- coding: utf-8 -*-
import pandas as pd
df = pd.DataFrame({'column': ['asdf', u'uh ™ oh', 123]})
df['column'] = df['column'].astype('str')

Results in:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 3: ordinal not in range(128)
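The same failure can be shown without pandas, which may make the cause easier to see. This is only a minimal sketch: the variable names are made up for illustration, and it calls `encode('ascii')` explicitly, which is what I assume Python 2's `str()` does implicitly on a unicode value.

```python
# -*- coding: utf-8 -*-
# Illustrative names only; not part of the original example.
text = u'uh \u2122 oh'  # same string as above, with the trademark sign escaped

try:
    # Explicit ASCII encoding fails on the non-ASCII character.
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding with UTF-8 succeeds, because UTF-8 can represent U+2122.
encoded = text.encode('utf-8')
```

The point is that the error comes from the ASCII codec, not from pandas itself.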

Instead, you can specify unicode:

df['column'] = df['column'].astype('unicode')

Verify that the number has been converted to a string:

df['column'][2]

This outputs u'123', so it has been converted to a unicode string. The special character ™ has been properly preserved as well.
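If you would rather convert the values one at a time (for example because the column also holds floats, which is what caused the AttributeError in the question), something like the following might work. It is a rough sketch, not a tested recipe for your data: `to_text` and `text_type` are hypothetical names I made up, and it assumes any byte strings in the column are UTF-8 encoded. It is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import pandas as pd

try:
    text_type = unicode  # Python 2: the unicode text type
except NameError:
    text_type = str      # Python 3: str is already unicode

def to_text(value):
    """Hypothetical helper: return `value` as a unicode text string."""
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8 encoded.
        # On Python 2, plain 'abc' literals are bytes and take this branch.
        return value.decode('utf-8')
    # Floats, ints and existing unicode strings are converted directly,
    # so no .encode() call is made on a float.
    return text_type(value)

df = pd.DataFrame({'column': ['asdf', u'uh \u2122 oh', 123, 1.5]})
df['column'] = df['column'].map(to_text)
```

Note that with this approach a missing value (NaN) would become the text 'nan', so you may want to handle those separately if that matters for your data.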