Lit*_*tle 6 python ordinal pandas
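I have a DataFrame in which blank strings stand for missing values, so I use a regular expression to replace them with NaN. The problem appears when I try to use ordinal encoding to replace the categorical values. My code so far is: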
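import numpy as np
import pandas as pd
from sklearn import preprocessing

x = pd.DataFrame(np.array([30, "lawyer", "France",
                           25, "clerk", "Italy",
                           22, " ", "Germany",
                           40, "salesman", "EEUU",
                           34, "lawyer", " ",
                           50, "salesman", "France"]
                          ).reshape(6, 3))
x.columns = ["age", "job", "country"]
# Replace empty or whitespace-only cells with NaN
x = x.replace(r'^\s*$', np.nan, regex=True)
oe = preprocessing.OrdinalEncoder()
# The encoder is fitted on the job column, which now contains NaN
x.job = oe.fit_transform(x["job"].values.reshape(-1, 1))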
I get the following error:
Input contains NaN
I would like the job column to be replaced with numbers, for example: [1,2,-1,3,1,3].
You can try factorize. Note that the category codes start at 0 here, and missing values are coded as -1:
x.job.mask(x.job==' ').factorize()[0]
Out[210]: array([ 0, 1, -1, 2, 0, 2], dtype=int32)
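If the 1-based codes from the question ([1,2,-1,3,1,3]) are wanted, one possible approach is to shift the factorize codes by one while leaving the -1 sentinel untouched. This is only a sketch: the `jobs` series below is a standalone stand-in for the job column, and `one_based` is an illustrative name, not something from the original post.

```python
import numpy as np
import pandas as pd

# Standalone stand-in for the job column from the question
jobs = pd.Series(["lawyer", "clerk", " ", "salesman", "lawyer", "salesman"])

# Treat empty or whitespace-only strings as missing
jobs = jobs.replace(r"^\s*$", np.nan, regex=True)

# factorize assigns codes in order of first appearance, starting at 0;
# missing values get the sentinel code -1
codes, uniques = pd.factorize(jobs)

# Shift real categories to start at 1, keep -1 for missing values
one_based = np.where(codes == -1, -1, codes + 1)

print(one_based.tolist())
print(list(uniques))
```

The `uniques` index keeps the mapping from code back to label, so the original category can be recovered later if needed.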