Posts by Hen*_*ski

Error: Unable to find conda binary. Is Anaconda installed? (reticulate, RStudio)

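I installed reticulate through RStudio. Now I want to use conda_create(), but I installed Anaconda in a directory other than the default one. How can I change the directory in which RStudio searches for Anaconda?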

Error: Unable to find conda binary. Is Anaconda installed?

python r anaconda reticulate

Hen*_*ski

lucky-day

8
votes

1
answer

10k
views

BertTokenizer - extra spaces appear when encoding and decoding sequences

When using HuggingFace's Transformers, I ran into a problem with the encode and decode methods.

I have the following string:

test_string = 'text with percentage%'

Then I run the following code:

import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

test_string = 'text with percentage%'

# encode converts a string into a sequence of ids (integers), using the tokenizer and vocabulary.
input_ids = tokenizer.encode(test_string)
output = tokenizer.decode(input_ids)

The output looks like this:

'text with percentage %'

There is an extra space before the %. I have already tried the extra argument clean_up_tokenization_spaces, but that addresses something different.

What should I use in decoding and encoding to get exactly the same text before and after? This also happens with other special characters.
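A likely explanation, sketched under assumptions: BERT's basic tokenizer splits punctuation such as % into its own token, and decoding joins tokens with spaces, so the original spacing is lost. The toy functions below (toy_tokenize, toy_decode, tokenize_with_offsets and rebuild are hypothetical names, not part of Transformers) imitate that behaviour with a regular expression and show that keeping character offsets is one way to recover the exact input.

```python
import re

# Toy stand-in for a punctuation-splitting tokenizer (NOT the real BertTokenizer):
# runs of word characters stay together, every punctuation mark is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def toy_tokenize(text):
    """Return the list of tokens only; spacing information is discarded."""
    return TOKEN_PATTERN.findall(text)


def toy_decode(tokens):
    """Join tokens with single spaces, as a naive decoder would."""
    return " ".join(tokens)


def tokenize_with_offsets(text):
    """Return (token, start, end) triples so the spacing can be restored."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]


def rebuild(spans):
    """Rebuild the text from offsets; assumes gaps were plain spaces."""
    pieces = []
    previous_end = 0
    for token, start, end in spans:
        pieces.append(" " * (start - previous_end))
        pieces.append(token)
        previous_end = end
    return "".join(pieces)


test_string = "text with percentage%"
lossy = toy_decode(toy_tokenize(test_string))
exact = rebuild(tokenize_with_offsets(test_string))
print(repr(lossy))
print(repr(exact))
```

The design point is that a token list alone cannot tell "percentage%" from "percentage %"; some extra record of the original positions is needed for a lossless round trip.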

python tokenize torch pytorch bert-language-model

Hen*_*ski

lucky-day

7
votes

1
answer

4265
views

Model() got multiple values for argument 'nr_class' - SpaCy multi-class classification model (BERT integration)

Hi, I am implementing a multi-class classification model (5 classes) with the new SpaCy model en_pytt_bertbaseuncased_lg. The code for the new pipeline is here:

import spacy

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class":5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last = True)

textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")

The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):

def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key

# get names of other pipes to disable them during training
n_iter = 250 # number of epochs

train_data = list(zip(train_texts, [{"cats": cats} for cats in train_cats]))


dev_cats_single   = [extract_cat(x) for x in dev_cats]
train_cats_single = [extract_cat(x) for x in train_cats] …

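The error in the title is a general Python TypeError: it is raised when a function receives the same parameter both positionally and through forwarded keyword arguments. The sketch below reproduces that mechanism with two hypothetical functions, build_model and create_pipe, which are stand-ins and not the spaCy API. It assumes, without confirmation from the post, that the pipeline factory derives nr_class on its own and also forwards the user's config.

```python
def build_model(nr_class, exclusive_classes=False):
    """Hypothetical model constructor; returns its settings for inspection."""
    return {"nr_class": nr_class, "exclusive_classes": exclusive_classes}


def create_pipe(labels, **cfg):
    """Hypothetical factory: passes nr_class itself AND forwards the config."""
    return build_model(len(labels), **cfg)


labels = ["class1", "class2", "class3", "class4", "class5"]

# Supplying nr_class in the config collides with the value the factory passes.
try:
    create_pipe(labels, nr_class=5, exclusive_classes=True)
except TypeError as err:
    print(err)

# Leaving nr_class out of the config avoids the collision in this sketch.
print(create_pipe(labels, exclusive_classes=True))
```

In this toy setup the fix is to drop nr_class from the config and let the labels determine the class count; whether the same holds for the real component is an assumption.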

python spacy multiclass-classification pytorch

Hen*_*ski

2019 08-16

5
votes

1
answer

131
views

SpaCy - ValueError: operands could not be broadcast together with shapes (1,2) (1,5)

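This relates to my previous Stack Overflow post, "Model() got multiple values for argument 'nr_class' - SpaCy multi-class classification model (BERT integration)", in which my problem was partly solved. I want to share the issue that came up after implementing the solution.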

If I remove the nr_class parameter, I get this error:

ValueError: operands could not be broadcast together with shapes (1,2) (1,5)

I actually expected this to happen, because I did not specify the nr_class parameter. Is that correct?
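The message itself comes from NumPy-style broadcasting: two shapes are compatible only if, compared from the last axis backwards, each pair of sizes is equal or one of them is 1. The sketch below re-implements that rule in plain Python (broadcast_shape is a hypothetical helper, not a spaCy or NumPy function). Reading (1,2) as a two-class model output and (1,5) as a five-label target vector is an assumption, not something the post confirms.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of a and b, or raise ValueError."""
    result = []
    # Compare axes from the last one backwards; missing axes count as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                f"operands could not be broadcast together with shapes {a} {b}"
            )
    return tuple(reversed(result))


print(broadcast_shape((1, 5), (1, 5)))  # matching class counts are compatible

try:
    broadcast_shape((1, 2), (1, 5))  # 2 vs 5 on the last axis, neither is 1
except ValueError as err:
    print(err)
```

Under that reading, the two arrays disagree about the number of classes, which would be consistent with a model built with a default class count and labels added afterwards.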

Once again, my code for the multi-class model:

import spacy

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class":5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last = True)

textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")

The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):

def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key

# get names of other pipes to disable them during training
n_iter = 250 # number of epochs

train_data = list(zip(train_texts, [{"cats": cats} for …

python spacy multiclass-classification pytorch

Hen*_*ski

lucky-day

5
votes

1
answer

98
views

Encoding numbers as categorical vectors

I have an integer vector, y <- c(1, 2, 3, 3), and now I want to convert it into a one-hot encoded list.

I tried to find a solution using to_categorical, but I ran into problems with data types... Does anyone know a smart and smooth solution for this task?

Here is my attempt:

for (i in 1:length(y)) {
  one_character <- list(as.vector(to_categorical(y[[i]], num_classes = 3)))
  list_test <- rbind(list_test, one_character)
}

But I get the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  IndexError: index 3 is out of bounds for axis 1 with size 3

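The IndexError points to an off-by-one: class indices for this kind of one-hot encoding are zero-based, so with 3 classes the valid values are 0, 1 and 2, while y contains 3. The sketch below shows the idea in plain Python with a hypothetical one_hot helper (not the Keras function); shifting the 1-based labels down by one is the assumed fix.

```python
def one_hot(index, num_classes):
    """Return a one-hot row for a ZERO-based class index."""
    if not 0 <= index < num_classes:
        raise IndexError(
            f"index {index} is out of bounds for {num_classes} classes"
        )
    row = [0.0] * num_classes
    row[index] = 1.0
    return row


y = [1, 2, 3, 3]  # 1-based labels, as in the question

# Shift to zero-based indices before encoding.
encoded = [one_hot(value - 1, num_classes=3) for value in y]
for row in encoded:
    print(row)
```

An alternative under the same assumption is to keep the labels as they are and request one extra class, at the cost of an unused first column.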

r list one-hot-encoding

Hen*_*ski

lucky-day

3
votes

1
answer

83
views

InvalidArgumentError: indices[127,7] = 43 is not in [0, 43) in Keras R

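This question is related to: InvalidArgumentError (see above for traceback): indices[1] = 10 is not in [0, 10). I need it for R, so I am looking for a different solution from the one given in the link above.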

maxlen <- 40
chars <- c("'",  "-",  " ",  "!",  "\"", "(",  ")",  ",",  ".",  ":",  ";",  "?",  "[",  "]",  "_",  "=",  "0", "a",  "b",  "c",  "d",  "e", "f",  "g",  "h",  "i",  "j",  "k",  "l",  "m",  "n",  "o",  "p",  "q",  "r",  "s",  "t",  "u",  "v",  "w",  "x",  "y",  "z")

tokenizer <- text_tokenizer(char_level = TRUE, filters = NULL)

tokenizer %>% fit_text_tokenizer(chars)
unlist(tokenizer$word_index)

The output is:

 '  -     !  "  (  )  , …

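The numbers in the error fit a common off-by-one: the chars vector has 43 entries, and Keras tokenizers number their vocabulary from 1, so the largest index is 43, which lies outside the range [0, 43). The plain-Python sketch below mimics that numbering (word_index here is a hand-built dictionary, not the Keras object) and assumes, since the model code is not shown, that the error comes from a layer sized with the vocabulary length instead of the vocabulary length plus one.

```python
import string

# Same 43 characters as in the question: 17 symbols plus the letters a-z.
chars = list("'- !\"(),.:;?[]_=0") + list(string.ascii_lowercase)

# Mimic a tokenizer that numbers entries from 1 (index 0 is left unused).
word_index = {ch: i for i, ch in enumerate(chars, start=1)}

vocab_size = len(word_index)
max_index = max(word_index.values())


def fits(input_dim):
    """True if every index lies in the half-open range [0, input_dim)."""
    return 0 <= max_index < input_dim


print(vocab_size, max_index)
print(fits(vocab_size))      # sized by vocabulary length only
print(fits(vocab_size + 1))  # one extra slot for the unused index 0
```

If that assumption holds, sizing the layer as vocabulary length plus one would keep every index in range.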

indexing r tokenize keras

Hen*_*ski

lucky-day

3
votes

1
answer

6027
views

reticulate does not work with an R data frame and Python's fit() function (TypeError: 'float' object cannot be interpreted as an integer)

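I am trying to take an R data frame and use it with the "reticulate" package. I could not find an answer on the internet. Sorry if this is a basic question.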

# Sample Data
n <- 5000
n_outlier <- .05 * n

set.seed(11212)
inlier <- mvtnorm::rmvnorm(n, mean = c(0,0))
outlier <- mvtnorm::rmvnorm(n_outlier, mean = c(20, 20))
testdata <- rbind(inlier, outlier)
smp_size <- floor(0.5 * nrow(testdata))
train_ind <- sample(seq_len(nrow(testdata)), size = smp_size)
train_lof <- as.data.frame(testdata[train_ind, ])
test_lof <- as.data.frame(testdata[-train_ind, ])

sklearn.neighbors <- import("sklearn.neighbors")

lof1 = sklearn.neighbors$LocalOutlierFactor(n_neighbors=15)
lof1$fit(train_lof)

This gives the following error: