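I ran the example code from the scikit-learn documentation for sklearn.manifold.TSNE, but I get the error described in the question title.
I have already tried updating scikit-learn to the latest version (with !pip install -U scikit-learn, which gives scikit-learn=1.0.1). However, the problem persists.
Does anyone know how to fix it?
Example code: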
import numpy as np
from sklearn.manifold import TSNE
X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
X_embedded = TSNE(n_components=2, learning_rate='auto',
init='random').fit_transform(X)
X_embedded.shape
The error occurs on this line:
X_embedded = TSNE(n_components=2, learning_rate='auto',
init='random').fit_transform(X)
Error message:
UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
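One possible explanation, offered as an assumption rather than a confirmed diagnosis: the string `'auto'` for `learning_rate` is only understood by newer scikit-learn releases, and a notebook kernel that was not restarted after `pip install -U` may still be running the older version. The sketch below prints the version that is actually imported and passes the numeric value that the documentation gives for `'auto'`, `max(n_samples / early_exaggeration / 4, 50)`, so it does not depend on the string option. The `perplexity=3` and `random_state=0` arguments are additions for this sketch and are not part of the original example.

```python
import numpy as np
import sklearn
from sklearn.manifold import TSNE

# Confirm which version the running interpreter actually imported.
print("scikit-learn version:", sklearn.__version__)

X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])

# Numeric equivalent of learning_rate='auto' as documented:
# max(n_samples / early_exaggeration / 4, 50).
early_exaggeration = 12.0
learning_rate = max(X.shape[0] / early_exaggeration / 4, 50.0)

X_embedded = TSNE(
    n_components=2,
    learning_rate=learning_rate,
    init="random",
    perplexity=3,  # assumption: kept below n_samples, which newer releases require
    early_exaggeration=early_exaggeration,
    random_state=0,  # assumption: fixed seed only to make the sketch repeatable
).fit_transform(X)
print(X_embedded.shape)
```

If the printed version is older than expected, restarting the kernel after the upgrade is worth trying before changing any code.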
I have four categorical features and a fifth numeric feature (Var5). When I try the following code:
cat_attribs = ['var1','var2','var3','var4']
full_pipeline = ColumnTransformer([('cat', OneHotEncoder(handle_unknown = 'ignore'), cat_attribs)], remainder = 'passthrough')
X_train = full_pipeline.fit_transform(X_train)
model = XGBRegressor(n_estimators=10, max_depth=20, verbosity=2)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
When the model tries to predict, I get the following error message:
ValueError: DataFrame.dtypes for data must be int, float, bool or categorical. When categorical type is supplied, DMatrix parameter enable_categorical must be set to True.Var1, Var2, Var3, Var4
Does anyone know what is going wrong here?
If it helps, here is a small sample of the X_train and y_train data:
         Var1  Var2  Var3  Var4        Var5
1507856    JP  2009  6581   OME  325.787218
839624     FR  2018  5783   I_S   11.956326
1395729    BE  2015  6719   OME   42.888565
1971169    DK  2011  3506   RPP   70.094146
1140120    AT  2019  5474   NMM  …
Tags: python machine-learning scikit-learn xgboost one-hot-encoding
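A point worth checking in the code above: `X_train` is passed through `full_pipeline`, but `X_test` reaches `model.predict` as the raw DataFrame, which still contains the string columns named in the error. The sketch below shows only the preprocessing step, under the assumption that this mismatch is the cause. The two small frames reuse rows from the sample above, and the names `X_train_enc` and `X_test_enc` are introduced here for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy frames built from the sample rows shown in the question.
X_train = pd.DataFrame({
    "Var1": ["JP", "FR", "BE"],
    "Var2": [2009, 2018, 2015],
    "Var3": [6581, 5783, 6719],
    "Var4": ["OME", "I_S", "OME"],
    "Var5": [325.787218, 11.956326, 42.888565],
})
X_test = pd.DataFrame({
    "Var1": ["DK"],
    "Var2": [2011],
    "Var3": [3506],
    "Var4": ["RPP"],
    "Var5": [70.094146],
})

cat_attribs = ["Var1", "Var2", "Var3", "Var4"]
full_pipeline = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs)],
    remainder="passthrough",
)

# Fit the encoder on the training data only ...
X_train_enc = full_pipeline.fit_transform(X_train)
# ... then apply the already fitted transformer to the test data.
X_test_enc = full_pipeline.transform(X_test)

# Both matrices now have the same purely numeric columns.
print(X_train_enc.shape, X_test_enc.shape)
```

With this in place, the model would be fitted on `X_train_enc` and asked to predict on `X_test_enc` rather than on the untransformed `X_test`.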
Why do I keep getting this error? I have tried other code as well, but the error appears as soon as the get_feature_names_out function is used.
Here is my code:
from sklearn.datasets._twenty_newsgroups import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB # fast to train and achieves a decent F-score
from sklearn import metrics
import numpy as np
def show_top10(classifier, vectorizer, categories):
    feature_names = vectorizer.get_feature_names_out()
    for i, category in enumerate(categories):
        top10 = np.argsort(classifier.coef_[i])[-10:]
        print("%s: %s" % (category, " ".join(feature_names[top10])))
newsgroups_train = fetch_20newsgroups(subset='train')
print(list(newsgroups_train.target_names))
cats = ['alt.atheism', 'sci.space', 'rec.sport.baseball', 'rec.sport.hockey']
newsgroups_train = fetch_20newsgroups(subset='train', categories=cats)
print(list(newsgroups_train.target_names))
print(newsgroups_train.filenames.shape)
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(newsgroups_train.data)
print(vectors.shape)
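The error text itself is not quoted above. If it is an `AttributeError` saying the vectorizer has no `get_feature_names_out`, one plausible cause is a scikit-learn release older than 1.0, where only the earlier `get_feature_names` method exists. The sketch below is a small compatibility helper under that assumption. The helper name `feature_names_of` and the two-sentence corpus are made up for illustration and stand in for the newsgroups data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def feature_names_of(vectorizer):
    """Return the fitted vocabulary as an array on old and new scikit-learn."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # Available from scikit-learn 1.0 onwards.
        return np.asarray(vectorizer.get_feature_names_out())
    # Older releases only provide get_feature_names, which returns a list.
    return np.asarray(vectorizer.get_feature_names())


# Tiny illustrative corpus, not the 20 newsgroups data.
docs = ["space rocket launch", "hockey puck ice"]
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(docs)

names = feature_names_of(vectorizer)
print(names)
```

Returning an array in both branches keeps the fancy indexing `feature_names[top10]` in `show_top10` working on either version.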
hovertemplate='Continent: %{df['continent']}<br>'+
'Country: %{df['country']}<br>'+
'gdpPercap: %{x:,.4f} <br>'+
'lifeExp: %{y}'+
'<extra></extra>'
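I am trying to use hovertemplate to customize the hover information, but I can't get it to display what I want. x and y work fine, but I can't figure out how to add other fields to the hover template. Any help would be appreciated.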
import numpy as np
df = df[df['year'] == 1952]
customdata = np.stack((df['continent'], df['country']), axis=-1)
fig = go.Figure()
for i in df['continent'].unique():
    df_by_continent = df[df['continent'] == i]
    fig.add_trace(go.Scatter(x=df_by_continent['gdpPercap'],
                             y=df_by_continent['lifeExp'],
                             mode='markers',
                             opacity=.7,
                             marker = {'size':15},
                             name=i,
                             hovertemplate=
                             'Continent: %{customdata[0]}<br>'+
                             'Country: %{customdata[1]}<br>'+
                             'gdpPercap: %{x:,.4f} <br>'+
                             'lifeExp: %{y}'+
                             '<extra></extra>',
                             ))
fig.update_layout(title="My Plot",
                  xaxis={'title':'GDP Per Cap',
                         'type':'log'},
                  yaxis={'title':'Life Expectancy'},
                  )
fig.show()
Updated with more code. The first answer did not work; it only returned the literal text of customdata.
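I want to vertically align all the options of a dash_core_components.RadioItems.
According to the Dash documentation, the default behavior should be vertical alignment of the RadioItems options. If you want to align the options horizontally, you have to specify: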
labelStyle={'display': 'inline-block'}
Instead, I get horizontal alignment by default, and I don't know what to specify for the display item to get the RadioItems options aligned vertically.
Here is my attempt so far:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
app = dash.Dash()
app.layout = html.Div([dcc.RadioItems(id = 'input-radio-button',
                                      options = [dict(label = 'A', value = 'A'),
                                                 dict(label = 'B', value = 'B')],
                                      value = 'A'),
                       html.P(id = 'output-text')])
@app.callback(Output('output-text', 'children'),
              [Input('input-radio-button', 'value')])
def update_graph(value):
    return f'The selected value is {value}'
if __name__ == "__main__":
    app.run_server()
What I get: …
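I am trying to use XGBoost on a dataset with about 500,000 observations and 10 features. I am doing some hyperparameter tuning with RandomizedSearchCV, and the model with the best parameters performs worse than the model with the default parameters.
Model with default parameters: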
model = XGBRegressor()
model.fit(X_train,y_train["speed"])
y_predict_speed = model.predict(X_test)
from sklearn.metrics import r2_score
print("R2 score:", r2_score(y_test["speed"],y_predict_speed, multioutput='variance_weighted'))
R2 score: 0.3540656307310167
Best model from the random search:
booster=['gbtree','gblinear']
base_score=[0.25,0.5,0.75,1]
## Hyper Parameter Optimization
n_estimators = [100, 500, 900, 1100, 1500]
max_depth = [2, 3, 5, 10, 15]
booster=['gbtree','gblinear']
learning_rate=[0.05,0.1,0.15,0.20]
min_child_weight=[1,2,3,4]
# Define the grid of hyperparameters to search
hyperparameter_grid = {
    'n_estimators': n_estimators,
    'max_depth':max_depth,
    'learning_rate':learning_rate,
    'min_child_weight':min_child_weight,
    'booster':booster,
    'base_score':base_score
    }
# Set up the random search with 4-fold …
It was working until today. I don't know why it stopped working today.
import yfinance as yf
df = yf.Ticker('MMM').history(start='2021-01-01',end='2021-07-10')
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\yfinance\base.py", line 157, in history
    data = data.json()
  File "D:\anaconda3\envs\tensorflow\lib\site-packages\requests\models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "D:\anaconda3\envs\tensorflow\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "D:\anaconda3\envs\tensorflow\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "D:\anaconda3\envs\tensorflow\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
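I am trying to use LightGBM as a multi-output predictor, as suggested here. I am trying to predict values for thirty consecutive days. I have a panel dataset, so I cannot use traditional time series methods.
I have a very large dataset, so training the model without early stopping takes a long time. I therefore tried passing the eval_set, early_stopping_rounds and eval_metric parameters as follows: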
from lightgbm import LGBMRegressor
from sklearn.multioutput import MultiOutputRegressor
hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l1','l2'],
    'learning_rate': 0.01,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    "max_depth": 8,
    "num_leaves": 128,
    "max_bin": 512,
    "num_iterations": 10000
}
lgbc_fit_params = {
    'early_stopping_rounds' : 300,
    'eval_set': (X_test, y_test_array),
    'eval_metric':'l1'
}
# LGBMRegressor is imported directly above, so it is used without an lgb. prefix
gbm = LGBMRegressor(**hyper_params)
regr_multiglb = MultiOutputRegressor(gbm)
regr_multiglb.fit(X_train, y_train_array, **lgbc_fit_params)
Here, y_train_array and y_test_array are both 2-D numpy arrays, with shapes (1953395, 30) and … respectively.
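This code predicts the value of a specified stock up to the current date, but not for dates beyond the training dataset. The code comes from an earlier question of mine, so my understanding of it is fairly limited. I assume the solution is a simple variable change to add extra time, but I don't know which value needs to be changed.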
import pandas as pd
import numpy as np
import yfinance as yf
import os
import matplotlib.pyplot as plt
from IPython.display import display
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
pd.options.mode.chained_assignment = None
# download the data
df = yf.download(tickers=['AAPL'], period='2y')
# split the data
train_data = df[['Close']].iloc[: - 200, :]
valid_data = df[['Close']].iloc[- 200:, :]
# scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train_data)
train_data = scaler.transform(train_data)
valid_data = scaler.transform(valid_data) …
I am trying to add the total on top of each stacked bar, along with the individual bar values, in Plotly Express in Python.
import plotly.express as px
df = px.data.medals_long()
fig = px.bar(df, x="medal", y="count", color="nation", text_auto=True)
fig.show()
However, I would like a chart like the following: