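I installed reticulate through RStudio. Now I want to use conda_create(), but I installed Anaconda in a directory other than the default one. How can I change the directory in which RStudio searches for Anaconda?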
Error: Unable to find conda binary. Is Anaconda installed?

When using HuggingFace's Transformers, I ran into a problem with the encode and decode methods.
I have the following string:
test_string = 'text with percentage%'
Then I run the following code:
import torch
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
test_string = 'text with percentage%'
# encode converts a string into a sequence of ids (integers), using the tokenizer and vocabulary.
input_ids = tokenizer.encode(test_string)
output = tokenizer.decode(input_ids)
The output looks like this:
'text with percentage %'
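There is an extra space before the %. I tried the extra argument clean_up_tokenization_spaces, but that does something different.
What should I use when encoding and decoding to get exactly the same text before and after? This also happens with other special characters.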
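
The extra space is consistent with how BERT-style tokenizers behave: punctuation is split off into its own token, and decoding joins tokens with spaces, so the ids do not record where the original spaces were. The following is a toy sketch in plain Python, not the Transformers implementation; `toy_encode`, `toy_decode` and `CLEANUP_MARKS` are hypothetical stand-ins chosen to mirror the behaviour described in the question.

```python
import re

# Hypothetical cleanup list: only a few punctuation marks get re-attached.
# '%' is deliberately absent, mirroring the behaviour seen in the question.
CLEANUP_MARKS = [".", ",", "!", "?"]


def toy_encode(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def toy_decode(tokens, clean_up=True):
    """Join tokens with spaces, optionally re-attaching a few marks."""
    text = " ".join(tokens)
    if clean_up:
        for mark in CLEANUP_MARKS:
            text = text.replace(" " + mark, mark)
    return text


test_string = "text with percentage%"
tokens = toy_encode(test_string)
print(tokens)              # the '%' has become a separate token
print(toy_decode(tokens))  # the space before '%' is not removed
```

If an exact round trip is required, keeping the original string alongside the ids is the safe option, because the ids alone do not say where the spaces were.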

Hi, I am implementing a multi-class classification model (5 classes) with the new spaCy model en_pytt_bertbaseuncased_lg. The code for the new pipeline is here:
import spacy

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class": 5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last=True)
textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")
The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):
def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key
# get names of other pipes to disable them during training
n_iter = 250 # number of epochs
train_data = list(zip(train_texts, [{"cats": cats} for cats in train_cats]))
dev_cats_single = [extract_cat(x) for x in dev_cats]
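train_cats_single = [extract_cat(x) for x in train_cats]
…

This relates to an earlier Stack Overflow post, "Model() got multiple values for argument 'nr_class' - SpaCy multi-classification model (BERT integration)", in which my problem was partly solved. I want to share the issue that appeared after I implemented the solution.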
If I remove the nr_class parameter, I get this error:
ValueError: operands could not be broadcast together with shapes (1,2) (1,5)
I actually expected this to happen, because I did not specify the nr_class parameter. Is that correct?
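
The message itself says that an array of shape (1,2) was combined with one of shape (1,5). One plausible reading, which is an assumption and not confirmed by the post, is that one side holds 2 class scores while the other holds 5 label values. The sketch below only illustrates the NumPy-style broadcasting rule behind the message; `can_broadcast` is a hypothetical helper, not part of spaCy or NumPy.

```python
from itertools import zip_longest


def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible under NumPy-style rules."""
    # Compare dimensions from the trailing end; missing ones count as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        # Two dimensions are compatible if they are equal or one of them is 1.
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


print(can_broadcast((1, 2), (1, 5)))  # the shapes from the error message
print(can_broadcast((1, 5), (1, 5)))  # matching class counts
```

Under this rule the trailing dimensions 2 and 5 differ and neither is 1, which is why the two arrays cannot be combined.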
Once again, my code for the multi-class model:
import spacy

nlp = spacy.load('en_pytt_bertbaseuncased_lg')
textcat = nlp.create_pipe(
    'pytt_textcat',
    config={
        "nr_class": 5,
        "exclusive_classes": True,
    }
)
nlp.add_pipe(textcat, last=True)
textcat.add_label("class1")
textcat.add_label("class2")
textcat.add_label("class3")
textcat.add_label("class4")
textcat.add_label("class5")
The training code is as follows and is based on the example here (https://pypi.org/project/spacy-pytorch-transformers/):
def extract_cat(x):
    for key in x.keys():
        if x[key]:
            return key
# get names of other pipes to disable them during training
n_iter = 250 # number of epochs
train_data = list(zip(train_texts, [{"cats": cats} for …

I have an integer vector, y <- c(1, 2, 3, 3), and I want to convert it into a list like this (one-hot encoding):
1 0 0
0 1 0
0 0 1
0 0 1
I tried to find a solution with to_categorical, but I ran into problems with data types. Does anyone know a smart and smooth solution for this task?
Here is my attempt:
for (i in 1:length(y)) {
  one_character <- list(as.vector(to_categorical(y[[i]], num_classes = 3)))
  list_test <- rbind(list_test, one_character)
}
But I get the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
IndexError: index 3 is out of bounds for axis 1 with size 3
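
The IndexError is consistent with to_categorical treating labels as zero-based class indices: with num_classes = 3 the valid values are 0, 1 and 2, so the label 3 falls outside the range. Below is a minimal sketch of the idea in plain Python (not Keras), assuming the labels are 1-based as in y; `one_hot` is a hypothetical helper.

```python
def one_hot(labels, num_classes):
    """One-hot encode 1-based integer labels as a list of rows."""
    rows = []
    for label in labels:
        index = label - 1  # shift 1-based labels to zero-based positions
        if not 0 <= index < num_classes:
            raise ValueError(f"label {label} is outside 1..{num_classes}")
        row = [0] * num_classes
        row[index] = 1
        rows.append(row)
    return rows


y = [1, 2, 3, 3]
for row in one_hot(y, num_classes=3):
    print(*row)
```

The same shift applied in the question's setting would mean passing y - 1 rather than y, which again is an assumption about the intended 1-based labels.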

This question is related to "InvalidArgumentError (see above for traceback): indices[1] = 10 is not in [0, 10)". I need it for R, hence a different solution from the one given in the link above.
maxlen <- 40
chars <- c("'", "-", " ", "!", "\"", "(", ")", ",", ".", ":", ";", "?", "[", "]", "_", "=", "0", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")
tokenizer <- text_tokenizer(char_level = T, filters = NULL)
tokenizer %>% fit_text_tokenizer(chars)
unlist(tokenizer$word_index)
The output is:
' - ! " ( ) , …

I am trying to take an R data frame and use it with the "reticulate" package. I could not find an answer on the internet. Sorry if this is a basic question.
# Sample Data
n <- 5000
n_outlier <- .05 * n
set.seed(11212)
inlier <- mvtnorm::rmvnorm(n, mean = c(0,0))
outlier <- mvtnorm::rmvnorm(n_outlier, mean = c(20, 20))
testdata <- rbind(inlier, outlier)
smp_size <- floor(0.5 * nrow(testdata))
train_ind <- sample(seq_len(nrow(testdata)), size = smp_size)
train_lof <- as.data.frame(testdata[train_ind, ])
test_lof <- as.data.frame(testdata[-train_ind, ])
library(reticulate)
sklearn.neighbors <- import("sklearn.neighbors")
lof1 <- sklearn.neighbors$LocalOutlierFactor(n_neighbors=15)
lof1$fit(train_lof)
This gives the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) : TypeError: 'float' object cannot be interpreted as an integer
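
This message is what Python raises when a float is passed where an integer is required. With reticulate, a plain R numeric such as 15 is a double and is converted to a Python float, while an R integer literal such as 15L is converted to a Python int. The sketch below reproduces only the Python side using the built-in range(); `check_neighbors` is a hypothetical helper and scikit-learn is not involved.

```python
def check_neighbors(n_neighbors):
    """Report whether a value can be used where Python needs an integer."""
    try:
        range(n_neighbors)  # range() accepts only true integers
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


print(check_neighbors(15.0))  # what an R double such as 15 turns into
print(check_neighbors(15))    # what an R integer such as 15L turns into
```

On that basis, writing n_neighbors = 15L in the R call is the usual way to hand Python an integer.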

I have the following situation:
df1
a b c d
1 2 3 4
df2
a c
5 6
The result I want is to add to the second data.frame the columns of df1 that it is missing, and to fill them with zeros. So the result should be:
df3
a b c d
5 0 6 0
The data frames are very large, which is why a way to do this automatically would be welcome.
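
The operation described is to reorder the second table to the reference column list and to fill every missing column with zeros. Below is a minimal sketch in plain Python, assuming each table is stored as a mapping from column name to a list of values; `align_columns` is a hypothetical helper, not an R or pandas function.

```python
def align_columns(reference_columns, table, fill_value=0):
    """Return a copy of `table` with exactly the reference columns, in order.

    `table` maps column name -> list of values (all of the same length).
    Columns missing from `table` are filled with `fill_value`.
    Columns of `table` that are not in the reference are dropped.
    """
    n_rows = len(next(iter(table.values()))) if table else 0
    return {
        name: list(table[name]) if name in table else [fill_value] * n_rows
        for name in reference_columns
    }


df1 = {"a": [1], "b": [2], "c": [3], "d": [4]}
df2 = {"a": [5], "c": [6]}
df3 = align_columns(df1.keys(), df2)
print(df3)
```

Because only the column names of df1 are used, the size of df1 does not affect the cost; the work is proportional to the size of the result.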