Posts by Joh*_*tud

R Markdown and Plotly: fig.align not working with HTML output

fig.align is not working with R Markdown HTML output and plotly, and I am looking for some help!

Here is an MWE:

---
title: "Untitled"
output: html_document
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
library(plotly)
```


```{r}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF Zoo",
  type = "bar")
```


```{r, fig.align = 'right'}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF Zoo",
  type = "bar")
```

```{r, fig.align = 'center'}
plot_ly(
  x = c("giraffes", "orangutans"),
  y = c(20, 14),
  name = "SF …


r r-markdown plotly

Joh*_*tud

lucky-day

5
votes

2
answers

1470
views

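How to structure panel data for LSTM and Keras?

I am trying to figure out how to structure my dataset and build X and y so that they work with Keras' stacked LSTM for sequence classification.

I have panel data for which I am trying to predict a class. I am not entirely sure how to think about time steps, or how to shape the data correctly given its panel structure.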

# Libraries
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
import pandas as pd

# Here is an example of my data
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/sample2.csv')
df


# Contains a handful of features, a target, year, and id of the observation
   id        year  x1 x2  x3  y
0   A       2015   1   1   1  1
1   A       2016   2   2   2  1
2   A       2017   3   3 …

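A minimal sketch of one common way to shape panel data for an LSTM, assuming a balanced panel (every id has the same years) and one label per id. The `B` rows below are made up for illustration, since the sample data in the question is truncated. Each id becomes one sample and each year becomes one time step, which gives the `(samples, timesteps, features)` array that Keras LSTM layers expect.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel with the same columns as the question.
# The rows for id "B" are invented for illustration only.
df = pd.DataFrame({
    "id":   ["A", "A", "A", "B", "B", "B"],
    "year": [2015, 2016, 2017, 2015, 2016, 2017],
    "x1":   [1, 2, 3, 4, 5, 6],
    "x2":   [1, 2, 3, 4, 5, 6],
    "x3":   [1, 2, 3, 4, 5, 6],
    "y":    [1, 1, 1, 0, 0, 0],
})

feature_cols = ["x1", "x2", "x3"]

# Sort so that rows of the same id are contiguous and ordered in time.
df = df.sort_values(["id", "year"])

n_ids = df["id"].nunique()
n_years = df["year"].nunique()  # assumes every id has every year

# One sample per id, one time step per year: (samples, timesteps, features).
X = df[feature_cols].to_numpy(dtype="float32").reshape(
    n_ids, n_years, len(feature_cols)
)

# One label per id (assumes y does not change within an id).
y = df.groupby("id")["y"].last().to_numpy()

print(X.shape)  # (2, 3, 3)
print(y.shape)  # (2,)
```

With this layout the first LSTM layer would take `input_shape=(n_years, len(feature_cols))`. An unbalanced panel would need padding or per-id windowing first, which this sketch does not cover.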

python panel-data keras

Joh*_*tud

2019-02-03

5
votes

0
answers

934
views

How to convert a TF Dense layer to PyTorch?

I was wondering if someone could help me understand how to translate a short TF model into Torch.

Consider this TF setup:

import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape = (386, 1024, 1), dtype = tf.float32)
x = layers.Dense(2)(inp)  # [None, 386, 1024, 2]
start, end = tf.split(x, 2, axis=-1)
start = tf.squeeze(start, axis = -1)
end = tf.squeeze(end, axis = -1)
model = Model(inputs = inp, outputs = [start, end])


Specifically, I am not sure which Torch command would transform my data from 386, 1024, 1 to 386, 1024, 2, and I also do not understand what this does: Model(inputs = inp, outputs = [start, end])

Is it:

inp = layers.Input(shape = (386, 1024, 1), dtype = …

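A minimal sketch of a PyTorch counterpart, not taken from the original answers. `nn.Linear`, like a Keras `Dense` layer, acts on the last dimension only, so `nn.Linear(1, 2)` maps `[batch, 386, 1024, 1]` to `[batch, 386, 1024, 2]`. `Model(inputs=..., outputs=[start, end])` simply declares a model with one input and two outputs; in PyTorch the usual equivalent is a `forward` method that returns a tuple. The class name `StartEndHead` is hypothetical.

```python
import torch
import torch.nn as nn


class StartEndHead(nn.Module):  # hypothetical name for illustration
    def __init__(self):
        super().__init__()
        # Applies to the last dimension only, like Dense(2) on a [..., 1] input.
        self.dense = nn.Linear(in_features=1, out_features=2)

    def forward(self, x):
        x = self.dense(x)  # [batch, 386, 1024, 2]
        # unbind removes the last dimension and returns one tensor per index,
        # which matches tf.split(..., 2, axis=-1) followed by tf.squeeze.
        start, end = x.unbind(dim=-1)  # each [batch, 386, 1024]
        return start, end


model = StartEndHead()
start, end = model(torch.randn(2, 386, 1024, 1))
print(start.shape, end.shape)
```

The weights are initialized differently in the two frameworks, so outputs will only match numerically if the TF weights are copied over.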

python torch tensorflow

Joh*_*tud

2021-01-14

5
votes

1
answer

3036
views

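PyTorch: Why does my Dataset class give an index out of range error?

I am trying to figure out why my dataset raises an index out of range error.

Consider this torch dataset: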

# prepare torch data set
class MSRH5Processor(torch.utils.data.Dataset):
    def __init__(self, type, shard=False, **args):
        # init configurable string
        self.type = type
        # init shard for sampling large ds if specified
        self.shard = shard
        # set seed if given
        self.seed = args
        # set loc
        self.file_path = 'C:\\data\\h5py_embeds\\'
        # set file paths
        self.val_embed_path = self.file_path + 'msr_dev_bert_embeds.h5'

        # if true, initialize the dev data
        if self.type == 'dev':
            # embeds are shaped: [layers, tokens, features]
            self.embeddings = h5py.File(self.val_embed_path, 'r')["embeds"]

    def __len__(self): …


python h5py pytorch

Joh*_*tud

2021-02-22

5
votes

1
answer

2069
views

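Using np.where to convert column values to NaN

I cannot figure out how to use the index results of np.where in a for loop. I want to use the loop to change the values of a column only at the positions returned by np.where.

Here is a hypothetical example: I want to find the index positions of certain problems or anomalies in my dataset, get their locations with np.where, and then run a loop over the DataFrame to recode them as NaN while leaving every other index unchanged.

Here is my simple attempt so far: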

import pandas as pd
import numpy as np

# import iris
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/iris.csv')

# conditional np.where -- hypothetical problem data
find_error = np.where((df['petal_length'] == 1.6) & 
                  (df['petal_width'] == 0.2))

# loop over column to change error into NA
for i in enumerate(find_error):
    df = df['species'].replace({'setosa': np.nan})

# df[i] is a problem but I cannot figure out how to get around this or an alternative

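A hedged sketch of one way to do this without a loop. It uses a small made-up DataFrame in place of the iris CSV, so the values below are assumptions. The positions returned by `np.where` are mapped back to index labels and passed to `df.loc`, which changes only the matching rows and leaves `df` a DataFrame (the attempt above reassigns `df` to a single column and replaces every 'setosa').

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris data used in the question.
df = pd.DataFrame({
    "petal_length": [1.4, 1.6, 1.6, 4.7],
    "petal_width":  [0.2, 0.2, 0.4, 1.4],
    "species":      ["setosa", "setosa", "setosa", "versicolor"],
})

# Boolean mask for the hypothetical problem rows.
mask = (df["petal_length"] == 1.6) & (df["petal_width"] == 0.2)

# np.where returns a tuple of arrays; element 0 holds the row positions.
rows = np.where(mask)[0]

# Recode only those rows, leaving every other row untouched.
df.loc[df.index[rows], "species"] = np.nan

print(rows)                    # [1]
print(df["species"].isna().tolist())  # [False, True, False, False]
```

Passing the mask directly, as in `df.loc[mask, "species"] = np.nan`, gives the same result and skips `np.where` entirely.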

python numpy python-3.x pandas

Joh*_*tud

2019-01-31

3
votes

1
answer

395
views

Interpreting K-Means cluster_centers_ output

I am having a hard time interpreting the output of the cluster_centers_ array.

Consider the following MWE:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import numpy as np

# Load the data
iris = load_iris()
X, y = iris.data, iris.target

# shuffle the data
shuffle = np.random.permutation(np.arange(X.shape[0]))
X = X[shuffle]

# scale X
X = (X - X.mean()) / X.std()

# plot K-means centroids
km = KMeans(n_clusters = 2, n_init = 10)  # establish the model

# fit the data
km.fit(X);

# km centers
km.cluster_centers_


array([[ 1.43706001, -0.29278015,  0.75703227, -0.89603057],
       [ …

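A small sketch of how the array can be read, assuming scikit-learn's `KMeans` on the iris data. `cluster_centers_` has shape `(n_clusters, n_features)`: each row is one centroid and each column is one input feature, in the same order as the columns of `X`. Each centroid is the per-feature mean of the points assigned to that cluster. This sketch skips the scaling step from the question so the centers stay in the original units; the `random_state` value is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# random_state is an arbitrary choice, only here for reproducibility.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_

print(centers.shape)  # (2, 4): 2 clusters x 4 features

# Pair each centroid coordinate with the feature it belongs to.
for k, center in enumerate(centers):
    print(f"cluster {k}:", dict(zip(iris.feature_names, center.round(2))))

# Compare each centroid with the mean of the points assigned to it;
# the differences should be close to zero once the fit has converged.
for k in range(km.n_clusters):
    members = X[km.labels_ == k]
    print(k, np.abs(centers[k] - members.mean(axis=0)).max())
```

In the question, `X` is shifted and scaled by the global mean and standard deviation before fitting, so the centroid values there are in those standardized units, which is why some of them are negative.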

k-means python-3.x unsupervised-learning

Joh*_*tud

lucky-day

2
votes

1
answer

1620
views

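DataBricks: Fastest way to insert data into a Delta table?

I have a handful of tables, each only a few MB in file size, that I want to capture as Delta tables. Inserting new data into them takes an extraordinarily long time, more than 15 minutes, which surprises me.

My guess is that the culprit is the width of the tables: although they are small, they have more than 300 columns.

I have tried the following methods, the former being faster than the latter (unsurprisingly?): (1) INSERT INTO, (2) MERGE INTO.

Before inserting the data into the Delta tables, I apply a handful of Spark functions to clean the data and then register it as a temp table (e.g., INSERT INTO DELTA_TBL_OF_INTEREST (cols) SELECT * FROM tempTable).

Any suggestions for speeding up this process for such trivially small data?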

pyspark databricks delta-lake

Joh*_*tud

2022-05-26

2
votes

1
answer

5834
views

Improve the speed of this string_agg?

I have data of the following shape:

BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols


However, when I try to do this string_agg, my code takes more than 10 minutes to execute on 500 rows. How can I improve this query (data not available)?

select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
    FROM …


sql sql-server string-aggregation

Joh*_*tud

2022-03-02

1
vote

1
answer

2083
views