cru*_*xer 21 python csv pandas
import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-20 Cigarettes/day":4}
print(mydf['Cigarettes'])
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
print(mydf['CigarNum'])
mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')
The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, and Alcohol.
No int64
Age int64
BMI float64
Alcohol object
Cigarettes object
dtype: object
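The Cigarettes column contains "Never", "1-5 Cigarettes/day", and "10-20 Cigarettes/day". I want to assign weights to these values (Never, 1-5 Cigarettes/day, ...).
The expected output is a new appended column, CigarNum, that contains only the numbers 0, 1, 2. CigarNum is as expected up to row 8 and then shows NaN all the way to the last row of the column.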
0 Never
1 Never
2 1-5 Cigarettes/day
3 Never
4 Never
5 Never
6 Never
7 Never
8 Never
9 Never
10 Never
11 Never
12 10-20 Cigarettes/day
13 1-5 Cigarettes/day
14 Never
...
167 Never
168 Never
169 10-20 Cigarettes/day
170 Never
171 Never
172 Never
173 Never
174 Never
175 Never
176 Never
177 Never
178 Never
179 Never
180 Never
181 Never
Name: Cigarettes, Length: 182, dtype: object
The output I get should not contain NaN after the first few rows, but it does:
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 NaN
11 NaN
12 NaN
13 NaN
14 0
...
167 NaN
168 NaN
169 NaN
170 NaN
171 NaN
172 NaN
173 NaN
174 NaN
175 NaN
176 NaN
177 NaN
178 NaN
179 NaN
180 NaN
181 NaN
Name: CigarNum, Length: 182, dtype: float64
EdC*_*ica 33
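OK, the first problem is that you have embedded spaces, which cause the function to be applied incorrectly.
Fix this using the vectorised str methods: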
mydf['Cigarettes'] = mydf['Cigarettes'].str.replace(' ', '')
Creating the new column should now work:
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
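One caveat worth checking: removing every space also changes values such as "1-5 Cigarettes/day", which would then no longer match the keys of numcigar. The following is only a minimal sketch, assuming the stray whitespace is leading or trailing (the question does not show the raw values). The sample Series is hypothetical, and Series.map is used in place of apply(numcigar.get) purely for brevity.

```python
import pandas as pd

# Hypothetical sample: some values carry leading/trailing whitespace,
# as a hand-edited CSV export might.
cigs = pd.Series(["Never", "Never ", "1-5 Cigarettes/day", " 10-20 Cigarettes/day"])
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# Printing the unique values as a plain list makes stray whitespace visible.
print(cigs.unique().tolist())

# Without cleaning, the padded values miss the dictionary and become NaN.
raw = cigs.map(numcigar)

# strip() removes only leading/trailing whitespace, so the spaces
# inside the dictionary keys are preserved and every value matches.
clean = cigs.str.strip().map(numcigar)
print(clean)
```

If the whitespace turns out to be inside the values instead (for example doubled spaces), a different clean-up would be needed, so inspecting the unique values first is the safer starting point.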
UPDATE
Thanks to @Jeff, as always, for pointing out a superior way of doing things.
You can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
You could also use the factorize method.
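As a rough illustration of the factorize route, here is a small sketch on a hypothetical sample Series. Note that factorize assigns codes in order of first appearance, so the codes are arbitrary labels rather than chosen weights such as 4.

```python
import pandas as pd

# Hypothetical sample standing in for mydf['Cigarettes'].
cigs = pd.Series(["Never", "Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# factorize returns integer codes plus the unique labels the codes refer to.
codes, uniques = pd.factorize(cigs)

print(codes)          # one integer code per row
print(list(uniques))  # label for code 0, code 1, ...
```

This is convenient when any consistent numbering will do; when specific weights matter, an explicit dictionary remains the more direct choice.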
Thinking about it, why not just set the dict values to floats, so that you avoid the type conversion altogether?
So:
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
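To see the effect of float values on the resulting dtype, here is a minimal sketch on a hypothetical sample. It uses Series.map rather than replace, which is a swapped-in alternative: with map, values missing from the dictionary become NaN, whereas replace leaves them as the original strings.

```python
import pandas as pd

numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# Hypothetical sample standing in for mydf['Cigarettes'] (already clean).
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day"])

# Mapping through a dict of floats yields a float column directly,
# so no separate numeric conversion step is needed.
cigar_num = cigs.map(numcigar)
print(cigar_num.dtype)
```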
Version 0.17.0 or newer
convert_objects is deprecated as of 0.17.0 and has been replaced by to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' returns NaN wherever a value cannot be converted to a number; without it, an exception is raised.
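The difference between the two modes can be seen in a small sketch. The sample Series and the "unknown" entry are made up for illustration.

```python
import pandas as pd

# Hypothetical column that mixes numeric strings with one unparseable entry.
mixed = pd.Series(["0", "1", "4", "unknown"])

# errors='coerce' turns the unparseable entry into NaN.
coerced = pd.to_numeric(mixed, errors="coerce")
print(coerced)

# With the default errors='raise', the same input raises a ValueError.
try:
    pd.to_numeric(mixed)
    raised = False
except ValueError as exc:
    raised = True
    print("conversion failed:", exc)
```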