han*_*nil 2 vectorization python-3.x categorical-data
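Response coding is a technique for vectorizing categorical data. Suppose we have a categorical feature named "grade_category" with the following unique labels: ["grades_3_5", "grades_prek_2", "grades_9_12", "grades_6_8"]. Suppose also that we are working on a classification problem whose target class labels are 0 and 1.
In response coding, you output, for each label of the feature, the probability that it occurs together with each class label. For example, grades_prek_2 = [probability that it occurs with class 0, probability that it occurs with class 1].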

    from prettytable import PrettyTable
    from tqdm import tqdm

    def response_coding(xtrain, ytrain, feature):
        """Encode a categorical feature using the response coding technique.

        args:
            xtrain (pandas.DataFrame): training data containing the column `feature`
            ytrain (pandas.Series or ndarray): class labels (0 or 1), one per row of xtrain
            feature (str): name of the categorical column to encode
        returns:
            dictionary (dict): maps each label to [P(class 1), P(class 0)]
        """
        dictionary = dict()
        x = PrettyTable([feature, 'class 1', 'class 0'])
        unique_cat_labels = xtrain[feature].unique()
        for label in tqdm(unique_cat_labels):
            # Boolean mask of the rows that carry this label
            is_label = xtrain[feature] == label
            total_count = is_label.sum()
            count_0 = (is_label & (ytrain == 0)).sum()
            count_1 = (is_label & (ytrain == 1)).sum()
            # Note the order: class 1 first, then class 0, matching the table columns
            dictionary[label] = [count_1 / total_count, count_0 / total_count]
            x.add_row([label, count_1 / total_count, count_0 / total_count])
        print()
        print(x)
        return dictionary
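
As a sanity check for the definition, the same probabilities can be computed on a toy dataset with only the standard library. This is a minimal sketch, not part of the original code: the helper name `response_table` and the sample data are hypothetical, and the output order here is [class 0, class 1] as in the prose description, whereas `response_coding` above stores [class 1, class 0].

```python
from collections import Counter, defaultdict

def response_table(labels, targets):
    """Map each category label to [P(class 0 | label), P(class 1 | label)]."""
    counts = defaultdict(Counter)
    for label, target in zip(labels, targets):
        counts[label][target] += 1  # per-label count of each class
    table = {}
    for label, per_class in counts.items():
        total = sum(per_class.values())
        # A class that never occurs with this label counts as 0
        table[label] = [per_class[0] / total, per_class[1] / total]
    return table

# Hypothetical toy data, not taken from the original question.
grades = ["grades_prek_2", "grades_prek_2", "grades_prek_2",
          "grades_prek_2", "grades_3_5", "grades_3_5"]
classes = [0, 1, 1, 1, 0, 0]

table = response_table(grades, classes)
print(table["grades_prek_2"])  # [0.25, 0.75]
print(table["grades_3_5"])     # [1.0, 0.0]
```

Here grades_prek_2 appears four times, once with class 0 and three times with class 1, so it is encoded as [0.25, 0.75].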