A. *_*nha 4 python machine-learning dataframe pandas
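Suppose I have a dataframe with 500 rows and want to perform 10-fold cross-validation. I therefore need to split the data into 10 groups of 50 rows each. Dividing the whole dataset into 10 groups in a single pass seems too random to me.
Is there a way to do this with a library such as pandas or numpy?
You can use sklearn's KFold: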
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
# create dummy dataframe with 500 rows
features = np.random.randint(1, 100, 500)
labels = np.random.randint(1, 100, 500)
df = pd.DataFrame(data = {"X": features, "Y": labels})
kf = KFold(n_splits=10, random_state=42, shuffle=True) # Define the split - into 10 folds
kf.get_n_splits(df) # returns the number of splitting iterations in the cross-validator
print(kf)
for train_index, test_index in kf.split(df):
    print("TRAIN:", train_index)
    print("TEST:", test_index)
    # split() yields positional indices, so select rows with .iloc
    X_train, X_test = df["X"].iloc[train_index], df["X"].iloc[test_index]
    y_train, y_test = df["Y"].iloc[train_index], df["Y"].iloc[test_index]
The example is taken from here.
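Since the question asks specifically about pandas and numpy, here is a minimal sketch of the same split without sklearn. It assumes one shuffle of the row positions followed by `np.array_split` is acceptable; the seed, the variable names (`rng`, `folds`, `train_df`, `test_df`), and the dummy data are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Dummy 500-row dataframe, similar to the one in the answer above
rng = np.random.default_rng(42)
df = pd.DataFrame({"X": rng.integers(1, 100, 500),
                   "Y": rng.integers(1, 100, 500)})

# Shuffle the row positions once, then cut them into 10 nearly equal chunks
shuffled = rng.permutation(len(df))
folds = np.array_split(shuffled, 10)

for i, test_pos in enumerate(folds):
    # Every fold except the i-th one forms the training set
    train_pos = np.concatenate([f for j, f in enumerate(folds) if j != i])
    train_df = df.iloc[train_pos]
    test_df = df.iloc[test_pos]
    print(f"fold {i}: train={len(train_df)} rows, test={len(test_df)} rows")
```

Each row lands in exactly one test fold, which is the same guarantee `KFold` gives; `KFold` is probably still the more convenient choice if sklearn is already a dependency.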