Goal: use sklearn to predict the result from both int-typed and object-typed features.
I am using the following dataset from Kaggle: Soccer Dataset
Here is my notebook: Kaggle Notebook
Libraries
I have created a pipeline that almost works:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
# Read the data
df = total_df.copy()
# Remove rows with missing target
df.dropna(axis=0, subset=['result'], inplace=True)
# Separate target from predictors
y = df.result
X = df.drop(['result'], axis=1) …
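The snippet above is cut off before the preprocessing and model steps. For reference, here is a minimal sketch of how the imported pieces are commonly wired together for mixed int and object columns. It is not the notebook's actual code. The synthetic `total_df` and its column names (`home_goals_avg`, `away_rank`, `home_team`) are hypothetical stand-ins, since the real dataset columns are not shown here, and the imputation strategies and hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for total_df; the real Kaggle columns are not shown.
rng = np.random.default_rng(0)
n = 200
total_df = pd.DataFrame({
    "home_goals_avg": rng.normal(1.5, 0.5, n),         # float feature (assumed)
    "away_rank": rng.integers(1, 21, n),               # int feature (assumed)
    "home_team": rng.choice(["A", "B", "C"], n),       # object feature (assumed)
    "result": rng.choice(["win", "draw", "loss"], n),  # target
})

# Drop rows with a missing target, then separate target from predictors.
df = total_df.dropna(axis=0, subset=["result"])
y = df["result"]
X = df.drop(columns=["result"])

# Route columns by dtype: numeric ones are scaled, the rest are one-hot encoded.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Preprocessing and the classifier live in one Pipeline, so the same
# transformations are applied at fit time and at predict time.
model = Pipeline([
    ("preprocess", preprocessor),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
preds = model.predict(X_valid)
```

Selecting columns by dtype keeps the int and object features on separate preprocessing paths, and `handle_unknown="ignore"` prevents categories that appear only in the validation split from raising an error at predict time.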