Problem using OrdinalEncoder with NaN values

Lit*_*tle 6 python ordinal pandas

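I have a DataFrame in which missing values appear as blank strings, so I use a regular expression to replace them with NaN. The problem arises when I try to use ordinal encoding to replace the categorical values. My code so far is: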

    import numpy as np
    import pandas as pd
    from sklearn import preprocessing

    x = pd.DataFrame(np.array([30, "lawyer", "France",
                               25, "clerk", "Italy",
                               22, " ", "Germany",
                               40, "salesman", "EEUU",
                               34, "lawyer", " ",
                               50, "salesman", "France"]
            ).reshape(6, 3))
    x.columns = ["age", "job", "country"]
    # Replace empty or whitespace-only cells with NaN
    x = x.replace(r'^\s*$', np.nan, regex=True)

    oe = preprocessing.OrdinalEncoder()
    x.job = oe.fit_transform(x["job"].values.reshape(-1, 1))

I get the following error:

Input contains NaN

I would like the job column to be replaced with numbers, for example: [1,2,-1,3,1,3].

WeN*_*Ben 4

You can try factorize. Note that the category codes here start at 0, and missing values are coded as -1:

x.job.mask(x.job==' ').factorize()[0]
Out[210]: array([ 0,  1, -1,  2,  0,  2], dtype=int32)
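To get the 1-based codes asked for in the question, the factorize result can be shifted by one while leaving the -1 sentinel alone. The following is a minimal sketch rather than a definitive solution; it rebuilds only the job column from the question, and the names `job`, `codes`, and `encoded` are illustrative.

```python
import numpy as np
import pandas as pd

# The "job" column from the question, with a blank string as the missing value
job = pd.Series(["lawyer", "clerk", " ", "salesman", "lawyer", "salesman"])

# Turn blank / whitespace-only entries into NaN
job = job.mask(job.str.strip() == "")

# factorize numbers categories in order of first appearance and codes NaN as -1
codes, uniques = pd.factorize(job)

# Shift real categories to start at 1; keep -1 for missing values
encoded = np.where(codes >= 0, codes + 1, -1)
print(encoded.tolist())
```

Because factorize numbers categories in order of first appearance, the codes depend on row order rather than on alphabetical order.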