I create a DataFrame from a subset that meets the following conditions:
subset_df = df_eq.loc[(df_eq['place'].str.contains('Chile')) & (df_eq['mag'] > 7.5),['time','latitude','longitude','mag','place']]
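I want to reproduce the subset above with query() in Pandas, but I'm not sure how to write the equivalent of str.contains() inside a query. Using "like" in the query does not seem to work: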
query_df = df_eq[['time','latitude','longitude','mag','place']].query('place like \'%Chile\' and mag > 7.5')
place like '%Chile'and mag >7.5
^
SyntaxError: invalid syntax
Any help would be appreciated.
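Below is a minimal sketch of a query() equivalent. It assumes a small hypothetical df_eq that uses the question's column names, with made-up values. Series.str methods can be called inside the query string. Here engine="python" is passed explicitly, because method calls such as .str.contains are not reliably supported by the default numexpr engine.

```python
import pandas as pd

# Hypothetical stand-in for df_eq: made-up rows using the question's column names
df_eq = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "latitude": [-33.0, -20.0, 35.0, -30.0],
    "longitude": [-71.0, -70.0, 139.0, -71.5],
    "mag": [8.0, 6.5, 8.1, 7.8],
    "place": ["Somewhere, Chile", "Elsewhere, Chile", "Somewhere, Japan", "Offshore, Chile"],
})
cols = ["time", "latitude", "longitude", "mag", "place"]

# Boolean-indexing version from the question
subset_df = df_eq.loc[df_eq["place"].str.contains("Chile") & (df_eq["mag"] > 7.5), cols]

# query() version: .str.contains is called inside the expression string and
# evaluated with the Python engine; "and" combines the two masks element-wise
query_df = df_eq[cols].query('place.str.contains("Chile") and mag > 7.5', engine="python")

print(query_df)
```

With these sample rows, both versions keep only the Chilean rows whose magnitude exceeds 7.5.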
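Using sklearn, I want 3 splits (i.e. n_splits=3) on a sample dataset with a 70:30 train/test ratio. I can split the set into 3 folds, but I can't define the test size (as I can with the train_test_split method). Is there a way to define the test sample size in StratifiedKFold?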
from sklearn.model_selection import StratifiedKFold as SKF
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)  # returns 3, the number of folds
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations, producing one stratified train/test split each time
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
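StratifiedKFold takes no test_size parameter. Each sample lands in exactly one test fold, so every test fold holds roughly 1/n_splits of the data. The quick check below uses hypothetical toy data with 30 samples and two balanced classes. It shows that n_splits=3 yields a test fraction of 1/3 rather than 30%.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 30 samples, 15 per class
X = np.arange(60).reshape(30, 2)
y = np.array([0, 1] * 15)

skf = StratifiedKFold(n_splits=3)
test_fractions = []
for train_index, test_index in skf.split(X, y):
    # Fraction of all samples held out in this fold
    test_fractions.append(len(test_index) / len(y))

print(test_fractions)  # here each fold's test fraction is 1/n_splits
```

A 30% test fraction would correspond to n_splits of about 3.33. Since n_splits must be an integer, k-fold alone cannot express a 70:30 ratio with 3 splits.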
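One option is scikit-learn's StratifiedShuffleSplit, which accepts both n_splits and test_size. This assumes that overlapping test sets across iterations are acceptable. Unlike StratifiedKFold, it draws each split independently, so the test sets of different iterations may share samples. The data below is a hypothetical toy set with a 60/40 class balance.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 100 samples, 60 of class 0 and 40 of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 60 + [1] * 40)

# 3 independent stratified splits, each holding out 30% of the samples
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
splits = []
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Record split sizes and the class-1 share of the test set
    splits.append((len(train_index), len(test_index), y_test.mean()))

print(splits)
```

If disjoint test folds matter more than the exact 70:30 ratio, StratifiedKFold with n_splits=3 remains the closer fit.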