我正在使用基于特定列中的值的特定分类器对 DataFrame 中的行进行分类。我的目标是根据特定条件将结果附加到一个新列或另一列。代码看起来像这样:
df = pd.DataFrame({'A': [list with classifier ids], # Only 3 ids, One word strings
'B': [List of text to be classified], # Millions of unique rows, lines of text around 5-25 words long
'C': [List of the old classes]} # Hundreds of possible classes, four digit integers stored as strings
df.sort_values('A', inplace=True)
new_col1, new_col2 = [], []
for name, group in df.groupby('A', sort=False):
classifier = classy_dict[name]
vectors = vectorize(group.B.values)
preds = classifier.predict(vectors)
scores = classifier.decision_function(vectors)
for …Run Code Online (Sandbox Code Playgroud)