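I have just started using FastText. I am cross-validating a small dataset, using the dataset's .csv file as input. To process the dataset, I use the following parameters: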
model = fasttext.train_supervised(input=train_file,
                                  lr=1.0,
                                  epoch=100,
                                  wordNgrams=2,
                                  bucket=200000,
                                  dim=50,
                                  loss='hs')
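However, I would like to use the pre-trained Wikipedia embeddings available on the FastText website. Is that feasible? If so, do I have to add a specific parameter to the parameter list?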
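For context, here is a minimal sketch of how this could look, not a confirmed solution. The Python binding's `train_supervised` accepts a `pretrainedVectors` argument that expects the text-format `.vec` file, and `dim` has to match the dimension of those vectors (the published Wikipedia vectors are 300-dimensional, so `dim=50` would not fit). The helper names `read_vec_dim`, `supervised_params`, and `train_with_pretrained`, and the file names in the usage comment, are assumptions made for illustration.

```python
def read_vec_dim(vec_path):
    """Return the vector dimension declared in the header of a .vec file.

    The first line of a text-format .vec file is "<number_of_words> <dim>".
    """
    with open(vec_path, encoding="utf-8") as f:
        _n_words, dim = f.readline().split()
    return int(dim)


def supervised_params(train_file, vec_path):
    """Build the keyword arguments for fasttext.train_supervised.

    The values mirror the parameters from the question, except that dim is
    taken from the .vec header so it always matches the pre-trained vectors.
    """
    return dict(
        input=train_file,
        lr=1.0,
        epoch=100,
        wordNgrams=2,
        bucket=200000,
        dim=read_vec_dim(vec_path),
        loss="hs",
        pretrainedVectors=vec_path,
    )


def train_with_pretrained(train_file, vec_path):
    """Train a supervised model initialised from pre-trained vectors."""
    import fasttext  # imported here so the helpers above work without it

    return fasttext.train_supervised(**supervised_params(train_file, vec_path))


# Hypothetical usage (file names are placeholders):
# model = train_with_pretrained("train.txt", "wiki.en.vec")
```

Reading `dim` from the file header rather than hard-coding it avoids a dimension mismatch when switching between different sets of pre-trained vectors.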
I am doing k-fold cross-validation in scikit-learn. Here is the script:
import pandas as pd
import numpy as np
from time import time
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics
from sklearn.metrics import classification_report, accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold
r_filenameTSV = "TSV/A19784.tsv"
#DF 300 dimension start
tsv_read = pd.read_csv(r_filenameTSV, sep='\t', names=["vector"])
df = pd.DataFrame(tsv_read)
df = pd.DataFrame(df.vector.str.split(" ", n=1).tolist(), columns=['label', 'vector'])
print(df)
#DF 300 dimension end
y = df.label.astype(int).to_numpy()
print(y.shape)
X = pd.DataFrame([dict(y.split(':') for y …