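I want to store the last digit of "UserId" in a new variable (UserId is a string).
I came up with this, but the df is very long and it takes forever. Any tips on how to optimize it or avoid the for loop?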
df['LastDigit'] = np.nan
for i in range(0, len(df['UserId'])):
    # A single .loc call writes to df itself; chained df.loc[i][...] writes to a copy
    df.loc[i, 'LastDigit'] = df.loc[i, 'UserId'].strip()[-1]
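One way to avoid the loop is pandas' vectorized string accessor. This is a minimal sketch, assuming UserId is a string column as described above; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df is assumed to have a string column 'UserId'.
df = pd.DataFrame({'UserId': ['abc123', ' user7 ', 'x90']})

# Strip surrounding whitespace, then take the last character of every string at once.
df['LastDigit'] = df['UserId'].str.strip().str[-1]

print(df)
```

The result is still a string column; if an integer is needed, appending .astype(int) should work as long as every UserId really ends in a digit.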
This is my attempt at applying BayesSearch to CatBoost:
from catboost import CatBoostClassifier
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold
# Classifier
bayes_cv_tuner = BayesSearchCV(
    estimator=CatBoostClassifier(
        silent=True
    ),
    search_spaces={
        'depth': (2, 16),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    },
    scoring='roc_auc',
    cv=StratifiedKFold(
        n_splits=2,
        shuffle=True,
        random_state=72
    ),
    n_jobs=1,
    n_iter=100,
    verbose=1,
    refit=True,
    random_state=72
)
Tracking the results:
def status_print(optim_result):
    """Status callback during bayesian …
I have a DataFrame containing multiple users and time zones, as shown below:
import pandas as pd

cols = ['user', 'zone_name', 'utc_datetime']
data = [
    [1, 'Europe/Amsterdam', pd.to_datetime('2019-11-13 11:14:15')],
    [2, 'Europe/London', pd.to_datetime('2019-11-13 11:14:15')],
]
df = pd.DataFrame(data, columns=cols)
Based on another post, I applied the following change to get each user's local datetime:
df['local_datetime'] = df.groupby('zone_name')[
'utc_datetime'
].transform(lambda x: x.dt.tz_localize(x.name))
The output is as follows:
user  zone_name         utc_datetime         local_datetime
1     Europe/Amsterdam  2019-11-13 11:14:15  2019-11-13 11:14:15+01:00
2     Europe/London     2019-11-13 11:14:15  2019-11-13 11:14:15+00:00
However, the local_datetime column is an object, and I can't find a way to get it as datetime64[ns] in the following format (desired output):
user  zone_name         utc_datetime         local_datetime
1     Europe/Amsterdam  2019-11-13 11:14:15  2019-11-13 12:14:15
2     Europe/London     2019-11-13 11:14:15  2019-11-13 11:14:15
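One possible way to reach the desired output, sketched under the assumption that utc_datetime holds timezone-naive values that represent UTC: for each zone, mark the values as UTC, convert them to that zone, and then drop the timezone information so that every group ends up with the same timezone-naive datetime dtype. The variable name pieces is only illustrative.

```python
import pandas as pd

cols = ['user', 'zone_name', 'utc_datetime']
data = [
    [1, 'Europe/Amsterdam', pd.to_datetime('2019-11-13 11:14:15')],
    [2, 'Europe/London', pd.to_datetime('2019-11-13 11:14:15')],
]
df = pd.DataFrame(data, columns=cols)

# For each zone: interpret the naive values as UTC, convert to the zone,
# then strip the timezone so only the local wall-clock time remains.
pieces = [
    group.dt.tz_localize('UTC').dt.tz_convert(zone).dt.tz_localize(None)
    for zone, group in df.groupby('zone_name')['utc_datetime']
]

# The pieces keep the original row index, so the assignment aligns row by row.
df['local_datetime'] = pd.concat(pieces)

print(df)
print(df['local_datetime'].dtype)
```

Because the timezone is removed within each group, the combined column no longer mixes different UTC offsets, which is the likely reason the earlier attempt fell back to object. The trade-off is that the offset is lost, so zone_name is still needed to interpret the values.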