支持向量机火车插入符错误kernlab类概率计算失败; 返回NAs

我有一些数据,Y变量是一个因素 - 好或坏.我正在使用'caret'包中的'train'方法构建一个支持向量机.使用'train'功能,我能够最终确定各种调整参数的值,并获得最终的支持向量机.对于测试数据,我可以预测"类".但是当我试图预测测试数据的概率时,我得到的误差低于(例如我的模型告诉我测试数据中的第一个数据点y ='good',但我想知道获得'好'的概率是多少...通常在支持向量机的情况下,模型将计算预测的概率.如果Y变量有2个结果,则模型将预测每个结果的概率.具有最大概率的结果被认为是最终解决方案)

**Warning message:  
In probFunction(method, modelFit, ppUnk) :  
  kernlab class probability calculations failed; returning NAs**

Run Code Online (Sandbox Code Playgroud)

示例代码如下

library(caret)
trainset <- data.frame( 
     class=factor(c("Good",    "Bad",   "Good", "Good", "Bad",  "Good", "Good", "Good", "Good", "Bad",  "Bad",  "Bad")),
     age=c(67,  22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))

testset <- data.frame( 
     class=factor(c("Good",    "Bad",   "Good"  )),
    age=c(64,   23, 50))



library(kernlab)
set.seed(231)

### finding optimal value of a tuning parameter
sigDist <- sigest(class ~ ., data = trainset, frac = 1)
### creating …

Run Code Online (Sandbox Code Playgroud)

testing r r-caret

use*_*622

2015 01-11

5
推荐指数

1
解决办法

1万
查看次数

如何确定对象是否是PySpark中的有效键值对

如果我有一个rdd,我怎么理解数据是关键:值格式？有没有办法找到相同的东西 - 像类型(对象)这样的东西告诉我一个对象的类型.我试过了print type(rdd.take(1)),但它只是说<type 'list'>.
假设我有一个类似的数据(x,1),(x,2),(y,1),(y,3),我使用 groupByKey并得到了(x,(1,2)),(y,(1,3)).有没有一种方法来定义 (1,2),并(1,3)为其中x和y键的值？或者密钥必须是单个值吗？我注意到,如果我使用reduceByKey和sum函数来获取数据,((x,3),(y,4))那么将这些数据定义为键值对变得更加容易

python key key-value apache-spark pyspark

use*_*622

2018 08-24

5
推荐指数

1
解决办法

1992
查看次数

R:基于其他列填充和/或复制行

我的问题是基于这个问题.

我有如下数据.我想首先向下看,然后通过向上看,只要bom是相同的,就可以填充细胞.在bom = A的情况下,我想填充如图所示的行.但是在bom = B的情况下,由于type_p列不同,我想复制行并感觉空白

bom=c(rep("A",4),rep("B",3))
Part=c("","lambda","beta","","tim","tom","")
type_p=c("","sub","sub","","sub","pan","")
ww=c(1,2,3,4,1,2,3)
df=data.frame(bom,Part,type_p,ww)

> df
  bom   Part type_p ww
1   A                1
2   A lambda    sub  2
3   A   beta    sub  3
4   A                4
5   B    tim    sub  1
6   B    tom    pan  2
7   B                3

Run Code Online (Sandbox Code Playgroud)

我想要的最终数据如下

    bom Part    type_p  ww
1   A   lambda  sub      1
2   A   lambda  sub      2
3   A   beta    sub      3
4   A   beta    sub      4
5   B   tim     sub      1
6   B …

Run Code Online (Sandbox Code Playgroud)

r rows missing-data dataframe

use*_*622

2017 05-23

5
推荐指数

1
解决办法

529
查看次数

SparkReduce函数：了解它是如何工作的

我正在学习这门课程。

它说 RDD 上的归约操作一次在一台机器上完成。这意味着，如果您的数据分布在两台计算机上，那么以下函数将处理第一台计算机中的数据，找到该数据的结果，然后将从第二台计算机中获取单个值，运行该函数，然后它将继续如此直到完成机器 2 中的所有值。这是正确的吗？

我认为该函数将同时在两台机器上开始运行，然后一旦获得两台机器的结果，它将再次最后一次运行该函数

rdd1=rdd.reduce(lambda x,y: x+y)

Run Code Online (Sandbox Code Playgroud)

更新1--------------------------------------------------------

与reduce函数相比，下面的步骤会给出更快的答案吗？

Rdd=[3,5,4,7,4]
seqOp = (lambda x, y: x+y)
combOp = (lambda x, y: x+y)
collData.aggregate(0, seqOp, combOp)

Run Code Online (Sandbox Code Playgroud)

更新2------------------------------------------------

下面的两组代码是否应该在相同的时间内执行？我检查了一下，似乎两者花费的时间相同。

import datetime

data=range(1,1000000000)
distData = sc.parallelize(data,4)
print(datetime.datetime.now())
a=distData.reduce(lambda x,y:x+y)
print(a)
print(datetime.datetime.now())

seqOp = (lambda x, y: x+y)
combOp = (lambda x, y: x+y)
print(datetime.datetime.now())
b=distData.aggregate(0, seqOp, combOp)
print(b)
print(datetime.datetime.now())

Run Code Online (Sandbox Code Playgroud)

reduce apache-spark

use*_*622

2016 03-25

5
推荐指数

1
解决办法

2万
查看次数

在Windows上升级tensorflow

如何升级Windows机器上的tensoflow库？Tensorflow目前已安装并运行正常.我正在使用anaconda发行版.

import tensorflow as tf

tf.VERSION
Out[37]: '1.2.1'

Run Code Online (Sandbox Code Playgroud)

我的python版本如下

>python -V
Python 3.5.3 :: Anaconda custom (64-bit)

Run Code Online (Sandbox Code Playgroud)

---------------更新1 --------------------------------- ---------

尝试了第一个答案,得到了以下信息

>pip install tensorflow --upgrade
Collecting tensorflow
  Downloading tensorflow-1.3.0-cp35-cp35m-win_amd64.whl (25.5MB)
    100% |################################| 25.5MB 47kB/s
Collecting protobuf>=3.3.0 (from tensorflow)
  Downloading protobuf-3.4.0-py2.py3-none-any.whl (375kB)
    100% |################################| 378kB 1.4MB/s
Requirement already up-to-date: numpy>=1.11.0 in c:\users\johndoe\appdata\local\conda\conda\envs\tensorflowspyder\lib\site-packages (from tensorflow)
Collecting tensorflow-tensorboard<0.2.0,>=0.1.0 (from tensorflow)
  Downloading tensorflow_tensorboard-0.1.8-py3-none-any.whl (1.6MB)
    100% |################################| 1.6MB 591kB/s
Requirement already up-to-date: six>=1.10.0 in c:\users\johndoe\appdata\local\conda\conda\envs\tensorflowspyder\lib\site-packages (from tensorflow)
Collecting wheel>=0.26 (from tensorflow)
  Downloading wheel-0.30.0-py2.py3-none-any.whl (49kB)
    100% …

Run Code Online (Sandbox Code Playgroud)

python anaconda tensorflow

use*_*622

2017 10-20

5
推荐指数

2
解决办法

2万
查看次数

将pdf转换为图像，但放大后

此链接显示了pdfs可以如何转换为图像。有没有一种方法可以将我pdf的缩放到转换成图像的位置？在我的项目中，我将pdfs 转换为pngs，然后使用Python-tesseract库提取文本。我注意到，如果我缩放pdfs，然后将零件另存为pngs，那么OCR会提供更好的结果。那么有没有办法在转换为png之前先缩放pdf？

pdf ocr image poppler python-tesseract

use*_*622

lucky-day

5
推荐指数

1
解决办法

531
查看次数

powerbi 如何为多个孩子创建家谱/层次结构图

此链接显示了如何使用 Akvelon 的层次结构图自定义可视化工具在 powerBI 中创建家谱/层次结构图。它显示了一个节点如何可以有多个父节点。但是有没有办法让同一个父母有多个孩子？

视觉图：

在上面的例子中，我想证明这一点，Jeffrey Knight并Sandra Knight有一个儿子叫my brother.

问题的可重现数据（在 Power Query 的高级编辑器中插入）

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY5dC4IwFIb/yti1wqbp7DaEoA8IvFxeDDo2oZaMFfjvO5uJ2Y3BYOfdnufdpKScRrSEV2vI3rRX7TDytRAx7un5yViSr3Cs4NbQOpI0wXDsycY+nAYb4IL9wOOl51MvK3Ox6rs/C0o2KDmOJ2XBuGD4hh00jYV+pqReEYNSzJUsxE6R6t46HXDO4gRJnHFtLf6gm/j8c2b6fw0xvnAAWKaLqX+Zr98=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Self ID" = _t, Name = _t, Years = _t, #"Parent ID" = _t, Relationship = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Self ID", Int64.Type}, {"Name", type text}, {"Years", type text}, {"Parent ID", type text}, {"Relationship", type …

Run Code Online (Sandbox Code Playgroud)

customization hierarchy family-tree powerbi

use*_*622

2020 06-23

5
推荐指数

0
解决办法

856
查看次数

convert to proper case using dax

I have a text column and would like to convert it to Proper. Is there a way to do it using DAX only? I dont want to use inbuilt powerbi functions

Proper case is any text that is written with each of the first letters of every word being capitalized. For example, "This Is An Example Of Proper Case." is an example of sentence in proper case. Tip. Proper case should not be confused with Title case, which is most …

dax

use*_*622

lucky-day

5
推荐指数

2
解决办法

2万
查看次数

Huggingface Transformer Longformer 优化器警告 AdamW

当我尝试从此页面运行代码时，出现以下警告。

/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use thePyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,

Run Code Online (Sandbox Code Playgroud)

我非常困惑，因为代码似乎根本没有设置优化器。最有可能设置优化器的地方可能在下面，但我不知道如何更改优化器

# define the training arguments
training_args = TrainingArguments(
    output_dir = '/media/data_files/github/website_tutorials/results',
    num_train_epochs = 5,
    per_device_train_batch_size = 8,
    gradient_accumulation_steps = 8,    
    per_device_eval_batch_size= 16,
    evaluation_strategy = "epoch",
    disable_tqdm = False, 
    load_best_model_at_end=True,
    warmup_steps=200,
    weight_decay=0.01,
    logging_steps = 4,
    fp16 = True,
    logging_dir='/media/data_files/github/website_tutorials/logs',
    dataloader_num_workers = 0,
    run_name = 'longformer-classification-updated-rtx3090_paper_replication_2_warm'
)

# instantiate the …

Run Code Online (Sandbox Code Playgroud)

python nlp huggingface-transformers

use*_*622

2022 02-14

5
推荐指数

2
解决办法

3333
查看次数

使用 num_labels 1 vs 2 进行 Huggingface 变形金刚分类

问题1）

这个问题的答案表明，对于二元分类问题，我可以使用num_labels1（正或负）或 2（正和负）。有关于哪种设置更好的指南吗？看起来如果我们使用 1 那么概率将使用sigmoid函数计算，如果我们使用 2 那么概率将使用函数计算softmax。

问题2）

在这两种情况下，我的 y 标签是否相同？每个数据点都有 0 或 1 而不是一个热编码？例如，如果我有 2 个数据点，那么 y 将为0,1而不是[0,0],[0,1]

我遇到了非常不平衡的分类问题，其中 1 类仅出现 2% 的次数。在我的训练数据中，我进行了过采样

问题3）

我的数据已输入pandas dataframe，我正在将其转换为 adataset并使用下面的命令创建 y 变量。label如果我打算使用num_labels=1 ，我应该如何投射我的 y 列？

`train_dataset=Dataset.from_pandas(train_df).cast_column("label", ClassLabel(num_classes=2, names=['neg', 'pos'], names_file=None, id=None))`

Run Code Online (Sandbox Code Playgroud)

python classification huggingface-transformers

use*_*622

2022 04-09

5
推荐指数

1
解决办法

3116
查看次数

标签统计

python ×4

apache-spark ×2

huggingface-transformers ×2

r ×2

anaconda ×1

classification ×1

customization ×1

dataframe ×1

dax ×1

family-tree ×1

hierarchy ×1

image ×1

key ×1

key-value ×1

missing-data ×1

nlp ×1

ocr ×1

pdf ×1

poppler ×1

powerbi ×1

pyspark ×1

python-tesseract ×1

r-caret ×1

reduce ×1

rows ×1

tensorflow ×1

testing ×1

小编use*_*622的帖子

更新1--------------------------------------------------------

更新2------------------------------------------------

标签 统计

小编use_622的帖子

标签统计