Is there a convenient mechanism for locking steps in a scikit-learn pipeline so that they are not refit on pipeline.fit()? For example:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups
data = fetch_20newsgroups(subset='train')
firsttwoclasses = data.target<=1
y = data.target[firsttwoclasses]
X = np.array(data.data)[firsttwoclasses]
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("estimator", LinearSVC())
])
# fit initial step on subset of data, perhaps an entirely different subset
# this particular example would not be very useful in practice
pipeline.named_steps["vectorizer"].fit(X[:400])
X2 = pipeline.named_steps["vectorizer"].transform(X)
# fit estimator on all data without …

I have the following function, which does what I want for a single call:
open System

let shuffle (arr : 'a array) =
    let array = Array.copy arr
    let rng = new Random()
    let n = array.Length
    // Fisher-Yates: i runs from the last index down to 0, swapping with a random j <= i
    for x in 1..n do
        let i = n-x
        let j = rng.Next(i+1)
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp
    array
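However, across multiple calls like the following (x is not used for anything), it produces the same shuffle for every call. How do I get it to produce a different shuffle each time?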
[for x in 1..3 do yield shuffle [|1;2;3|]]
>
val it : int [] list = [[|1; 3; 2|]; [|1; 3; 2|]; [|1; 3; 2|]]
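
A likely cause, assuming this runs on .NET Framework: `new Random()` with no argument is seeded from the system clock, so generators created in quick succession can start from the same seed and replay the same sequence. One possible fix, sketched below, is to create a single generator outside the function and reuse it. The module-level name `rng` and the list `shuffles` are choices made here for illustration.

```fsharp
open System

// Assumption: one shared generator, created once, so successive calls
// continue the same random sequence instead of restarting from a
// clock-derived seed.
let rng = Random()

let shuffle (arr : 'a array) =
    let array = Array.copy arr
    let n = array.Length
    // Fisher-Yates: i runs from the last index down to 0
    for x in 1..n do
        let i = n - x
        let j = rng.Next(i + 1)   // random index in 0..i
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp
    array

// Each call now draws from the same generator.
let shuffles = [ for _ in 1..3 -> shuffle [|1; 2; 3|] ]
```

Note that `System.Random` is not thread-safe, so a shared instance like this suits single-threaded use only.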

What is the best way to map the following:
[|"A"; "B"; "C"; "D"|]
to
[|("","A","B"); ("A","B","C"); ("B","C","D"); ("C","D","")|]
?
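
One way this mapping could be approached, as a sketch rather than a definitive answer: pad the array with an empty string at each end, then slide a window of three over it with `Seq.windowed`. The function name `triples` is made up for this example.

```fsharp
// Hypothetical helper: pads both ends with "" and turns every
// window of three consecutive elements into a tuple.
let triples (xs : string[]) =
    Array.concat [ [|""|]; xs; [|""|] ]
    |> Seq.windowed 3
    |> Seq.map (fun w -> (w.[0], w.[1], w.[2]))
    |> Array.ofSeq

let result = triples [|"A"; "B"; "C"; "D"|]
// result = [|("", "A", "B"); ("A", "B", "C"); ("B", "C", "D"); ("C", "D", "")|]
```

An empty input pads to only two elements, which is shorter than the window, so the result is an empty array.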

Is it possible to get the labels of a record type as a list of strings? For example, given the following type:
type Person = {
    Name: string
    Age: int
}
I would like a function that gives me ["Name"; "Age";]
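
A minimal sketch of how this might be done with reflection, using `FSharpType.GetRecordFields` from `Microsoft.FSharp.Reflection`. The helper name `recordLabels` is an assumption introduced here, not an existing library function.

```fsharp
open Microsoft.FSharp.Reflection

type Person = {
    Name: string
    Age: int
}

// Hypothetical helper: returns the field labels of record type 'T
// in declaration order. GetRecordFields raises an ArgumentException
// if 'T is not an F# record type.
let recordLabels<'T> () =
    FSharpType.GetRecordFields(typeof<'T>)
    |> Array.map (fun p -> p.Name)
    |> List.ofArray

let labels = recordLabels<Person> ()
// labels = ["Name"; "Age"]
```

Because this relies on runtime reflection, it may be worth caching the result if the labels are needed repeatedly.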