How can I create a label encoder using only numpy (and not sklearn LabelEncoder)?

Question

Sco*_*man 4 python numpy pandas scikit-learn

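I'm trying to recreate something similar to sklearn.preprocessing.LabelEncoder.

However, I don't want to use sklearn or pandas. I'd like to use only numpy and the Python standard library. Here's what I'm trying to achieve: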

import numpy as np
input = np.array([['hi', 'there'],
                  ['scott', 'james'],
                  ['hi', 'scott'],
                  ['please', 'there']])

# Output would look like
np.array([[0, 0],
          [1, 1],
          [0, 2],
          [2, 0]])


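It would also be great to be able to map it back, so that the result looks exactly like the input again.

If this were in a spreadsheet, the input would look like this: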

Answer 1

ALo*_*llz 5

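Here's a simple comprehension that uses the return_inverse result of np.unique. Note that np.unique sorts the unique values, so the codes follow sorted order rather than order of first appearance: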

import numpy as np

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])

# Encode each column separately, then stack the integer codes side by side
np.column_stack([np.unique(arr[:, i], return_inverse=True)[1] for i in range(arr.shape[1])])

array([[0, 2],
       [2, 0],
       [0, 1],
       [1, 2]], dtype=int64)

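The question also asks for a way to map the codes back to the original labels. The comprehension above discards the unique values, which are what decoding needs. A minimal sketch that keeps them, assuming a 2-D string array; the helper names `encode_columns` and `decode_columns` are hypothetical and not part of numpy:

```python
import numpy as np

def encode_columns(arr):
    """Label-encode each column; return the codes and the per-column classes."""
    classes, codes = [], []
    for col in arr.T:  # iterating over arr.T yields one column at a time
        uniq, inv = np.unique(col, return_inverse=True)
        classes.append(uniq)  # sorted unique labels for this column
        codes.append(inv)     # position of each entry within uniq
    return np.column_stack(codes), classes

def decode_columns(codes, classes):
    """Map integer codes back to labels by indexing each column's classes."""
    return np.column_stack([cls[codes[:, i]] for i, cls in enumerate(classes)])

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])

codes, classes = encode_columns(arr)
restored = decode_columns(codes, classes)
print(np.array_equal(restored, arr))  # True
```

Each column gets its own class array because the same string (for example 'scott') can receive a different code in different columns.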

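Or apply it along the axis (this relies on NumPy packing the ragged (unique, inverse) tuple returned for each column into an object array, which newer NumPy releases may reject with a ValueError):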

np.column_stack(np.apply_along_axis(np.unique, 0, arr, return_inverse=True)[1])

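To reproduce the numbering in the question's expected output, where codes are assigned in order of first appearance, one option is to re-rank the sorted codes using the first-occurrence indices that np.unique can also return. This is a sketch under that assumption; `encode_first_seen` is a hypothetical helper name:

```python
import numpy as np

def encode_first_seen(col):
    """Encode a 1-D array with codes numbered by order of first appearance."""
    uniq, first_idx, inv = np.unique(col, return_index=True, return_inverse=True)
    order = np.argsort(first_idx)        # sorted-unique positions, by first occurrence
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)  # sorted code -> first-appearance code
    return rank[inv], uniq[order]        # codes, and classes in first-seen order

arr = np.array([['hi', 'there'], ['scott', 'james'],
                ['hi', 'scott'], ['please', 'there']])

encoded = [encode_first_seen(arr[:, i]) for i in range(arr.shape[1])]
codes_fs = np.column_stack([c for c, _ in encoded])
classes_fs = [k for _, k in encoded]

print(codes_fs)
# [[0 0]
#  [1 1]
#  [0 2]
#  [2 0]]
```

Decoding works the same way as with sorted codes: indexing `classes_fs[i]` with column `i` of `codes_fs` recovers the original labels.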
