Is there a way to implement skip-grams with the scikit-learn library? I manually generated a list of k-skip n-grams and passed it to scikit-learn's CountVectorizer() as its vocabulary.
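For reference, here is one common definition of k-skip n-grams. It is an assumption here, because the question does not say which variant GetVocabulary produces. A k-skip n-gram keeps n tokens in their original order and allows at most k skipped tokens in total between the first and the last:

$$
\mathrm{SG}_{k,n}(w_1 \dots w_m) = \{\, w_{i_1} w_{i_2} \dots w_{i_n} \;:\; 1 \le i_1 < i_2 < \dots < i_n \le m,\; i_n - i_1 - (n-1) \le k \,\}
$$

For example, for the sentence "a b c d" with n = 2 and k = 1 this gives "a b", "a c", "b c", "b d", "c d". With k = 0 it reduces to the ordinary bigrams "a b", "b c", "c d".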
Unfortunately, its prediction performance is very poor: only 63% accuracy. However, with the default CountVectorizer() code I get 77-80% accuracy.
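One likely cause of the gap is my reading of how CountVectorizer works, not something stated in the question. When `vocabulary=` is supplied, CountVectorizer still builds features by running its analyzer over each document. With `ngram_range=(2, 2)`, that analyzer yields only contiguous bigrams, and CountVectorizer keeps the ones found in the vocabulary. A skip-gram entry whose words are never adjacent in the text is therefore never counted. The toy document and vocabulary below are made up purely to show this:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat"]
# Hypothetical vocabulary: one contiguous bigram and one 1-skip bigram.
vocab = ["the cat", "the sat"]

vec = CountVectorizer(
    tokenizer=lambda x: x.split(),
    token_pattern=None,      # silence the "token_pattern unused" warning
    ngram_range=(2, 2),      # only contiguous bigrams are generated
    vocabulary=vocab,
)
counts = vec.fit_transform(docs).toarray()
print(counts.tolist())  # [[1, 0]]: "the sat" is in the vocabulary but never counted
```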
Is there a better way to implement skip-grams in scikit-learn?
Here is the relevant part of my code:
from sklearn.feature_extraction.text import CountVectorizer

corpus = GetCorpus()  # reads the text from a file and returns it as a list
vocabulary = list(GetVocabulary(corpus, k, n))  # returns the k-skip n-grams
vec = CountVectorizer(
    tokenizer=lambda x: x.split(),
    ngram_range=(2, 2),
    stop_words=stopWords,
    vocabulary=vocabulary)
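One possible alternative is to let CountVectorizer generate the skip-grams itself. Its `analyzer` parameter accepts a callable that maps a raw document to a list of features. The sketch below assumes whitespace tokenisation, lowercasing, and the "at most k skipped tokens in total" definition. `skipgram_analyzer` is a hypothetical helper, not part of scikit-learn:

```python
from itertools import combinations

from sklearn.feature_extraction.text import CountVectorizer


def skipgram_analyzer(n=2, k=1, stop_words=frozenset()):
    """Hypothetical helper: build an analyzer yielding k-skip-n-grams.

    Assumes whitespace tokens, lowercasing and optional stop-word removal.
    """
    def analyze(doc):
        tokens = [t for t in doc.lower().split() if t not in stop_words]
        grams = []
        for start in range(len(tokens)):
            # Candidates for the remaining n-1 tokens lie within n+k-1 positions.
            window = tokens[start + 1:start + n + k]
            # Anchor the first token so each index combination appears once.
            for rest in combinations(range(len(window)), n - 1):
                grams.append(" ".join([tokens[start]] + [window[i] for i in rest]))
        return grams
    return analyze


docs = ["the cat sat on the mat", "the dog sat on the log"]
vec = CountVectorizer(analyzer=skipgram_analyzer(n=2, k=1))
X = vec.fit_transform(docs)
print(X.shape)  # (2, 14)
```

A callable analyzer bypasses CountVectorizer's own `tokenizer`, `stop_words`, `lowercase` and `ngram_range`, so those steps happen inside `analyze`. The stop words from the question could be passed as `stop_words=set(stopWords)`. Note that removing stop words before building skip-grams changes which words count as neighbours. Whether these features beat the 77-80% bigram baseline has to be checked on the real data.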
public class MainActivity extends AppCompatActivity {
    private static final String TAG = "MainActivity";
    private static final int REQUEST_CODE = 1234;
    private int mScreenDensity;
    private MediaProjectionManager mProjectionManager;
    private static final int DISPLAY_WIDTH = 720;
    private static final int DISPLAY_HEIGHT = 1280;
    private MediaProjection mMediaProjection;
    private VirtualDisplay mVirtualDisplay;
    private ToggleButton mToggleButton;
    private MediaRecorder mMediaRecorder;
    private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
    private static final int REQUEST_PERMISSIONS = 10;
    static {
        ORIENTATIONS.append(Surface.ROTATION_0, 90);
        ORIENTATIONS.append(Surface.ROTATION_90, 0);
        ORIENTATIONS.append(Surface.ROTATION_180, 270);
        ORIENTATIONS.append(Surface.ROTATION_270, 180);
    }
    @Override
    public …