fig.align is not working with R Markdown HTML output and plotly plots, and I'm looking for some help!
Here is an MWE:
---
title: "Untitled"
output: html_document
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
library(plotly)
```
```{r}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF Zoo",
  type = "bar")
```
```{r, fig.align = 'right'}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF Zoo",
  type = "bar")
```
```{r, fig.align = 'center'}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF …

I'm trying to figure out how to structure my dataset and build X and y so that they can be used with a stacked LSTM in Keras for sequence classification.
I have panel data whose classes I'm trying to predict. I'm not entirely sure how to think about the time steps, or how to shape the data correctly given my panel structure.
# Libraries
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
import pandas as pd
# Here is an example of my data
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/sample2.csv')
df
# Contains a handful of features, a target, year, and id of the observation
  id  year  x1  x2  x3  y
0  A  2015   1   1   1  1
1  A  2016   2   2   2  1
2  A  2017   3   3  …
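To make the question concrete, here is a rough sketch of the framing I have in mind, assuming each id becomes one sample, each year one time step, and x1/x2/x3 the features; it also assumes every id covers the same number of years (a ragged panel would need padding/masking) and that the target is binary:

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense

df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/sample2.csv')
features = ['x1', 'x2', 'x3']

# one sample per id, one time step per year: X is [samples, timesteps, features]
X = np.stack([g[features].to_numpy()
              for _, g in df.sort_values('year').groupby('id')])

# one label per id, since the target is constant within each panel above
y = df.groupby('id')['y'].first().to_numpy()

# stacked LSTM: return_sequences=True passes the full sequence to the next layer
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(X.shape[1], X.shape[2])))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))  # assumes a binary target
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=8)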
I'm wondering if someone can help me understand how to convert a short TF model to Torch.

Consider this TF setup:
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape = (386, 1024, 1), dtype = tf.float32)
x = layers.Dense(2)(inp) # [None, 386, 1024, 2]
start, end = tf.split(x, 2, axis=-1)
start = tf.squeeze(start, axis = -1)
end = tf.squeeze(end, axis = -1)
model = Model(inputs = inp, outputs = [start, end])
Specifically, I'm not sure which Torch commands would take my data from (386, 1024, 1) to (386, 1024, 2), and I also don't understand what this does: Model(inputs = inp, outputs = [start, end]). That is:
inp = layers.Input(shape = (386, 1024, 1), dtype = …
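For what it's worth, my reading is that Dense(2) here is just a linear map over the last axis, and that Model(inputs = inp, outputs = [start, end]) merely declares a graph with two outputs; the torch analogue of a multi-output Model is simply returning both tensors from forward. A minimal torch sketch under that reading (the class name is mine):

import torch
import torch.nn as nn

class StartEnd(nn.Module):
    def __init__(self):
        super().__init__()
        # like TF's Dense, nn.Linear acts on the last axis:
        # (batch, 386, 1024, 1) -> (batch, 386, 1024, 2)
        self.dense = nn.Linear(1, 2)

    def forward(self, x):
        x = self.dense(x)                       # (batch, 386, 1024, 2)
        start, end = torch.split(x, 1, dim=-1)  # two (batch, 386, 1024, 1) halves
        # returning a tuple plays the role of outputs = [start, end]
        return start.squeeze(-1), end.squeeze(-1)

model = StartEnd()
start, end = model(torch.randn(4, 386, 1024, 1))
print(start.shape, end.shape)  # torch.Size([4, 386, 1024]) twice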
I'm trying to figure out why my dataset is giving an index-out-of-range error.

Consider this torch Dataset:
# prepare torch data set
import h5py
import torch

class MSRH5Processor(torch.utils.data.Dataset):
    def __init__(self, type, shard=False, **args):
        # init configurable string
        self.type = type
        # init shard for sampling large ds if specified
        self.shard = shard
        # set seed if given
        self.seed = args
        # set loc
        self.file_path = 'C:\\data\\h5py_embeds\\'
        # set file paths
        self.val_embed_path = self.file_path + 'msr_dev_bert_embeds.h5'
        # if true, initialize the dev data
        if self.type == 'dev':
            # embeds are shaped: [layers, tokens, features]
            self.embeddings = h5py.File(self.val_embed_path, 'r')["embeds"]

    def __len__(self): …
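For reference, here is a self-contained toy version showing how __len__ and __getitem__ have to line up: both must index the same axis. If the real embeds are stored as [layers, tokens, features], then axis 0 counts layers rather than examples, and a DataLoader could request indices past the true sample count. The file name and shapes below are made up:

import h5py
import numpy as np
import torch

# build a tiny file with a [examples, tokens, features] layout
with h5py.File('toy.h5', 'w') as f:
    f.create_dataset('embeds', data=np.zeros((10, 4, 8)))

class ToyH5Dataset(torch.utils.data.Dataset):
    def __init__(self, path):
        self.embeddings = h5py.File(path, 'r')['embeds']

    def __len__(self):
        # must count the axis that __getitem__ indexes; if axis 0 were
        # layers instead of examples, the reported length would be wrong
        # and indexing past the real sample count would raise
        return self.embeddings.shape[0]

    def __getitem__(self, idx):
        return torch.tensor(self.embeddings[idx])

ds = ToyH5Dataset('toy.h5')
print(len(ds), ds[0].shape)  # 10 torch.Size([4, 8])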
I can't figure out how to use the index results from np.where inside a for loop. I want to use the for loop to change a column's values only at the indices that np.where returned.

Here is a hypothetical example: I want to find the index positions of certain problems or anomalies in my dataset, get their locations with np.where, and then run a loop over the dataframe to recode them as NaN while leaving every other index unchanged.
Here is my simple attempt so far:
import pandas as pd
import numpy as np

# import iris
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/iris.csv')

# conditional np.where -- hypothetical problem data
find_error = np.where((df['petal_length'] == 1.6) &
                      (df['petal_width'] == 0.2))

# loop over column to change error into NA
for i in enumerate(find_error):
    df = df['species'].replace({'setosa': np.nan})
    # df[i] is a problem but I cannot figure out how to get around this or an alternative
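One loop-free alternative I am considering, continuing from the snippet above: np.where on a condition returns a tuple whose first element is the array of matching row positions, which can be passed straight to .loc (this assumes the frame keeps its default RangeIndex, so positions and labels coincide):

# np.where returns a tuple of index arrays; element 0 holds the row positions
rows = find_error[0]

# recode 'species' only at those rows, leaving every other row untouched
df.loc[rows, 'species'] = np.nan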
I'm having a hard time interpreting the results of the cluster_centers_ array output.

Consider the following MWE:
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import numpy as np
# Load the data
iris = load_iris()
X, y = iris.data, iris.target
# shuffle the data
shuffle = np.random.permutation(np.arange(X.shape[0]))
X = X[shuffle]
# scale X
X = (X - X.mean()) / X.std()
# plot K-means centroids
km = KMeans(n_clusters = 2, n_init = 10) # establish the model
# fit the data
km.fit(X);
# km centers
km.cluster_centers_
array([[ 1.43706001, -0.29278015, 0.75703227, -0.89603057],
       [ …
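As far as I understand it, cluster_centers_ has shape (n_clusters, n_features): one row per centroid, with columns in the same order as the columns of X, expressed in the scaled units. A small self-contained sketch that also maps the centers back to the original units (variable names are mine):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# keep the scaling constants so the centers can be mapped back later
X_mean, X_std = X.mean(), X.std()
X_scaled = (X - X_mean) / X_std

km = KMeans(n_clusters=2, n_init=10).fit(X_scaled)

# (2, 4): one row per cluster, one column per feature, in the same
# order as load_iris().feature_names, but in scaled units
print(km.cluster_centers_.shape)

# undo the scaling to read each centroid in the original units (cm)
print(km.cluster_centers_ * X_std + X_mean)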
I have some tables whose files are only a few MB in size, and I want to capture them as Delta tables. Inserting new data into them takes a very long time, over 15 minutes, which surprised me.

My guess is that the culprit is that, while the tables are small, they have more than 300 columns.
I tried the following approaches, and the former was faster than the latter (unsurprisingly(?)): (1) INSERT INTO, (2) MERGE INTO.

Before inserting the data into the Delta table, I apply some Spark functions to clean it, and finally register the result as a temp table (e.g., INSERT INTO DELTA_TBL_OF_INTEREST (cols) SELECT * FROM tempTable).

Any suggestions for speeding up this process on such trivially small data?
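For concreteness, a minimal sketch of the flow described above, with placeholder source table and column names, and the spark session that Databricks provides assumed to be in scope:

from pyspark.sql import functions as F

# clean the incoming rows with a few Spark functions (placeholder step)
cleaned = (spark.read.table('SOURCE_TBL')
                .withColumn('x1', F.trim(F.col('x1'))))

# register the cleaned frame and append it into the ~300-column Delta table
cleaned.createOrReplaceTempView('tempTable')
spark.sql('INSERT INTO DELTA_TBL_OF_INTEREST SELECT * FROM tempTable')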
I have data of the following shape:
BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols
However, when I try to do the string_agg, my code takes more than 10 minutes to execute over just 500 rows. How can I improve this query (the data is not available)?
select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
        FROM …