Building an ensemble model with the Python merf library

use*_*827 10 python ensemble-learning mlxtend

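I want to use the merf (Mixed Effects Random Forest) library in an ensemble model, for example via the mlens or mlxtend Python libraries. However, because merf's fit and predict methods are structured in a non-standard way, I can't figure out how to do this: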

from merf import MERF
merf = MERF()
merf.fit(X_train, Z_train, clusters_train, y_train)
y_hat = merf.predict(X_test, Z_test, clusters_test)

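Is there a way to use the merf library in an ensemble model? The problem is that building an ensemble with mlens or other ensemble libraries assumes the scikit-learn structure, where the fit method takes X, y as input and the predict method takes X as input. merf, however, takes additional inputs in both its fit and predict methods. Here is a simplified version of the mlens syntax: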

from mlens.ensemble import SuperLearner 
ensemble = SuperLearner()
ensemble.add(estimators)
ensemble.add_meta(meta_estimator)
ensemble.fit(X, y).predict(X)

I am not restricted to mlens or mlxtend. Any other way of building an ensemble model with merf would also be fine.

Dia*_*ost -1

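I mean, you could always sneak it in through the data-preparation step with merf :P. Most of the data generation below comes from the Manifold merf example: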

from merf.utils import MERFDataGenerator
import numpy as np
from mlens.ensemble import SuperLearner
from sklearn.svm import SVR
from sklearn.linear_model import Lasso
from mlens.metrics.metrics import rmse

dgm = MERFDataGenerator(m = .6, sigma_b = np.sqrt(4.5), sigma_e = 1)

num_clusters_each_size = 20
train_sizes = [1, 3, 5, 7, 9]
known_sizes = [9, 27, 45, 63, 81]
new_sizes = [10, 30, 50, 70, 90]

train_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(train_sizes, num_clusters_each_size)
known_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(known_sizes, num_clusters_each_size)
new_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(new_sizes, num_clusters_each_size)

train, test_known, test_new, training_cluster_ids, ptev, prev = dgm.generate_split_samples(train_cluster_sizes, known_cluster_sizes, new_cluster_sizes)

X_train = train[['X_0', 'X_1', 'X_2']]
Z_train = train[['Z']]
clusters_train = train['cluster']
y_train = train['y']

The fitting and prediction step then follows Flennerhag's mlens.ensemble superlearner.py (GitHub), with some modifications:

ensemble = SuperLearner()
ensemble.add([SVR(), Lasso()])
ensemble.add_meta(SVR())
pred = ensemble.fit(X_train, y_train).predict(X_train)

root = rmse(y_train, pred)

print(root)

>>>

2.345318341087564
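
Note that the ensemble above is fit on X_train and y_train only, so Z_train and clusters_train never reach a mixed-effects model. One way to get merf-style inputs through a fit(X, y) / predict(X) interface might be a thin adapter that packs X, Z and the cluster ids into one array and splits them apart again internally. The sketch below is only an illustration under assumptions: ClusteredEstimatorAdapter, n_fixed, n_random and ClusterMeanStub are hypothetical names, the stub merely mimics the call signature shown in the question, and cluster ids are assumed to be numeric so that they survive being stored in a float array.

```python
import numpy as np


class ClusterMeanStub:
    """Hypothetical stand-in with the fit/predict signature shown in the question."""

    def fit(self, X, Z, clusters, y):
        y = np.asarray(y, dtype=float)
        self.global_mean_ = y.mean()
        # One mean per cluster id seen during training.
        self.cluster_means_ = {c: y[clusters == c].mean() for c in np.unique(clusters)}
        return self

    def predict(self, X, Z, clusters):
        # Unseen clusters fall back to the global mean.
        return np.array([self.cluster_means_.get(c, self.global_mean_) for c in clusters])


class ClusteredEstimatorAdapter:
    """Expose fit(X, y) / predict(X) for a model that needs (X, Z, clusters)."""

    def __init__(self, model, n_fixed, n_random):
        self.model = model
        self.n_fixed = n_fixed    # number of fixed-effect columns (X)
        self.n_random = n_random  # number of random-effect columns (Z)

    def _split(self, packed):
        # Assumed column layout: [X columns | Z columns | cluster id].
        packed = np.asarray(packed)
        X = packed[:, :self.n_fixed]
        Z = packed[:, self.n_fixed:self.n_fixed + self.n_random]
        clusters = packed[:, -1]
        return X, Z, clusters

    def fit(self, packed, y):
        X, Z, clusters = self._split(packed)
        self.model.fit(X, Z, clusters, y)
        return self

    def predict(self, packed):
        X, Z, clusters = self._split(packed)
        return self.model.predict(X, Z, clusters)


# Made-up toy data: 4 rows, 3 fixed-effect columns, 1 random-effect column, 2 clusters.
X = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]])
Z = np.ones((4, 1))
clusters = np.array([0, 0, 1, 1])
y = np.array([1.0, 3.0, 10.0, 20.0])

packed = np.column_stack([X, Z, clusters])  # single 2-D array of shape (4, 5)
adapter = ClusteredEstimatorAdapter(ClusterMeanStub(), n_fixed=3, n_random=1)
pred = adapter.fit(packed, y).predict(packed)
print(pred)
```

With the real library one would pass MERF() instead of the stub. It may also be necessary to convert the cluster column back to a pandas Series before calling fit, and to inherit from sklearn.base.BaseEstimator so that ensemble libraries can clone the adapter. Neither is tested here.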

But of course, if you aren't set on using merf with an ensemble specifically, there are better approaches overall.

Keras approach

from keras.models import Sequential
from keras.layers import Dense
from keras import backend
import matplotlib.pyplot as plt
import numpy as np
 
def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

X = X_train.to_numpy().flatten()
model = Sequential()
model.add(Dense(2, input_dim=1, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=[rmse])
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)
plt.plot(history.history['rmse'])
plt.title("keras loss function")
plt.show()

>>>

[Image: plot of the rmse metric per training epoch, titled "keras loss function"]
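
One detail of the rmse metric may be worth checking. With a single output per sample, taking the mean over axis=-1 before the square root yields the absolute error of each sample. If Keras then averages those per-sample values (as far as I can tell that is how metric functions are reduced, but treat it as an assumption), the reported number behaves like a mean absolute error rather than an RMSE. The NumPy sketch below reproduces only the arithmetic, on made-up values:

```python
import numpy as np

# Hypothetical toy values: four samples, one output each, shape (4, 1).
y_true = np.zeros((4, 1))
y_pred = np.array([[1.0], [1.0], [1.0], [3.0]])

# Same arithmetic as the metric: reduce over the last axis, then take the square root.
per_sample = np.sqrt(np.mean(np.square(y_pred - y_true), axis=-1))  # shape (4,)
averaged = per_sample.mean()  # average of the per-sample values

# RMSE over the whole batch: mean over all elements, then the square root.
batch_rmse = np.sqrt(np.mean(np.square(y_pred - y_true)))

print(averaged, batch_rmse)
```

The two numbers differ whenever the errors are not all equal in magnitude, so it is worth deciding which of the two the plot is meant to show.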

Note that the X_train used in the Keras code above comes from the earlier merf code:

X_train = train[['X_0', 'X_1', 'X_2']]
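
Since X_train has three columns, to_numpy().flatten() in the Keras code turns the (n, 3) table into a single vector of 3n values, so each feature value becomes its own one-dimensional sample for Dense(..., input_dim=1). A small NumPy illustration with made-up numbers (X_demo is a hypothetical stand-in for X_train.to_numpy()):

```python
import numpy as np

# Hypothetical stand-in for X_train.to_numpy(): 4 rows, 3 feature columns.
X_demo = np.arange(12, dtype=float).reshape(4, 3)

flat = X_demo.flatten()          # row-major 1-D copy
print(X_demo.shape, flat.shape)  # (4, 3) (12,)
print(flat[:3])                  # first row of X_demo: [0. 1. 2.]
```

Keeping the array two-dimensional and setting input_dim=X_demo.shape[1] would instead give the network one sample per row.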