CAB*_*CAB 2 python dictionary pandas
I am trying to use the map function to convert the string values in my data to numbers.
Here is the data:
label sms_message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
I am trying to change 'spam' to 1 and 'ham' to 0 with the following command:
df['label'] = df.label.map({'ham':0, 'spam':1})
But the result is:
label sms_message
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
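Can anyone spot the problem?
Your statement is correct; I think you executed the same statement twice (one run right after the other). The following statements, executed in a Python interactive terminal, illustrate this.
Note: if you pass a dictionary, map() replaces every value in the Series that does not match a key of the dictionary with NaN (I think this is what happened to you, i.e. you executed the statement twice). Check the pandas map() and apply() documentation. The pandas documentation notes: when arg is a dictionary, values in the Series that are not in the dictionary (as keys) are converted to NaN.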
>>> import pandas as pd
>>>
>>> d = {
...     "label": ["ham", "ham", "spam", "ham", "ham"],
...     "sms_message": [
...         "Go until jurong point, crazy.. Available only ...",
...         "Ok lar... Joking wif u oni...",
...         "Free entry in 2 a wkly comp to win FA Cup fina...",
...         "U dun say so early hor... U c already then say...",
...         "Nah I don't think he goes to usf, he lives aro..."
...     ]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
>>>
>>> # First run: every label is a key of the dictionary, so it is mapped.
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_message
0 0 Go until jurong point, crazy.. Available only ...
1 0 Ok lar... Joking wif u oni...
2 1 Free entry in 2 a wkly comp to win FA Cup fina...
3 0 U dun say so early hor... U c already then say...
4 0 Nah I don't think he goes to usf, he lives aro...
>>>
>>> # Second run: the labels are now 0 and 1, which are not keys, so they become NaN.
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
label sms_message
0 NaN Go until jurong point, crazy.. Available only ...
1 NaN Ok lar... Joking wif u oni...
2 NaN Free entry in 2 a wkly comp to win FA Cup fina...
3 NaN U dun say so early hor... U c already then say...
4 NaN Nah I don't think he goes to usf, he lives aro...
>>>
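One way to avoid the accidental second run shown above is to leave the original column alone and write the numbers to a new column. The following is only a minimal sketch under that assumption: the helper name `encode_labels` and the column name `label_num` are made up for illustration and are not part of pandas or of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["ham", "ham", "spam", "ham", "ham"],
    "sms_message": ["M1", "M2", "M3", "M4", "M5"],
})
mapping = {"ham": 0, "spam": 1}


def encode_labels(frame, mapping):
    """Hypothetical helper: return a copy of `frame` with a numeric
    `label_num` column, leaving the string `label` column untouched."""
    # Collect labels that are not keys of the mapping; map() would turn
    # these into NaN, so fail loudly instead.
    unmatched = frame.loc[~frame["label"].isin(list(mapping)), "label"].unique()
    if len(unmatched) > 0:
        raise ValueError(f"labels not in mapping: {list(unmatched)}")
    out = frame.copy()
    out["label_num"] = out["label"].map(mapping)
    return out


encoded = encode_labels(df, mapping)
# Running it again is harmless because `label` still holds the strings.
encoded_again = encode_labels(encoded, mapping)
print(encoded_again)
```

Because the source column is never overwritten, re-executing the cell or statement gives the same result each time, and a label that is missing from the dictionary raises an error instead of silently turning into NaN.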
>>> import pandas as pd
>>>
>>> d = {
... "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
... "sms_message": ["M1", "M2", "M3", "M4", "M5"]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
First way - use map() with a dictionary argument
>>> new_values = {'spam': 1, 'ham': 0}
>>>
>>> df
label sms_message
0 spam M1
1 ham M2
2 ham M3
3 ham M4
4 spam M5
>>>
>>> df.label = df.label.map(new_values)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
Second way - use map() with a function argument
>>> df = pd.DataFrame(d)  # start again from the original string labels
>>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
Third way - use apply() with a function argument
>>> df = pd.DataFrame(d)  # start again from the original string labels
>>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
>>>
>>> df
label sms_message
0 1 M1
1 0 M2
2 0 M3
3 0 M4
4 1 M5
>>>
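The three ways agree on clean data, but they treat unexpected values differently. The short comparison below is a sketch for illustration only; the label "eggs" is a made-up value that is not in the question's data.

```python
import pandas as pd

# "eggs" is a hypothetical label that is neither 'ham' nor 'spam'.
labels = pd.Series(["spam", "ham", "eggs"])

# Dictionary argument: a value that is not a key becomes NaN.
by_dict = labels.map({"ham": 0, "spam": 1})

# Function argument: anything that is not 'ham' becomes 1.
by_map_func = labels.map(lambda v: 0 if v == "ham" else 1)
by_apply_func = labels.apply(lambda v: 0 if v == "ham" else 1)

print(by_dict.isna().tolist())
print(by_map_func.tolist())
print(by_apply_func.tolist())
```

So the dictionary form exposes unexpected labels as NaN, while the two lambda forms quietly count them as spam. This is also why running the lambda forms a second time on already converted data would turn every label into 1 instead of NaN.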
Thanks.