Posts by Fra*_* M.

Add a column to a DataFrame in SparkR

I want to add a column filled with the character N to a DataFrame in SparkR. In non-SparkR code I would do it like this:

df$new_column <- "N"


But with SparkR, I get the following error:

Error: class(value) == "Column" || is.null(value) is not TRUE


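I have tried all sorts of crazy things to make it work. I was able to create a column from another (existing) column with df <- withColumn(df, "new_column", df$existing_column), but this simple thing doesn't work...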

Any help?

Thanks.

r sparkr

Fra*_* M.


13 votes · 1 answer · 3211 views

Ranger importance

I trained a random forest using caret + ranger.

library(caret)  # provides train() and trainControl()

fit <- train(
    y ~ x1 + x2
    ,data = total_set
    ,method = "ranger"
    ,trControl = trainControl(method="cv", number = 5, allowParallel = TRUE, verbose = TRUE)
    ,tuneGrid = expand.grid(mtry = c(4,5,6))
    ,importance = 'impurity'
)


Now I want to see the variable importance. However, none of these work:

> importance(fit)
Error in UseMethod("importance") : no applicable method for 'importance' applied to an object of class "c('train', 'train.formula')"
> fit$variable.importance
NULL
> fit$importance
NULL

> fit
Random Forest 

217380 samples
    32 predictors

No pre-processing
Resampling: Cross-Validated (5 …


r machine-learning random-forest r-caret

Fra*_* M.


12 votes · 3 answers · 10k views

How to speed up random forest training?

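I am trying to train several random forests (for regression) and make them compete, to see which feature selection and which parameters give the best model.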

However, training seems to take an enormous amount of time, and I wonder what I am doing wrong.

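The dataset I use for training (called train below) has 217k rows and 58 columns, of which only 21 are used as predictors in the random forest. They are all numeric or integer, except for a boolean one, which is of class character. The output y is numeric.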

I ran the code below four times, giving nb_trees the values 4, 100, 500 and 2000:

library("randomForest")
nb_trees <- #this changes with each test, see above
ptm <- proc.time()
fit <- randomForest(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 
    + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 …

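A common way to speed this up, suggested by the parallel-foreach and doparallel tags, is to grow the forest as several smaller forests in parallel and merge them afterwards. In R this is often written roughly as foreach(ntree = rep(500, 4), .combine = randomForest::combine, .packages = "randomForest") %dopar% randomForest(...). The only bookkeeping involved is splitting the total tree budget across workers. The Python sketch below shows just that step. The worker count of 4 is an assumption, and split_tree_budget is a hypothetical helper, not part of any library.

```python
def split_tree_budget(nb_trees, n_workers):
    """Split nb_trees into n_workers near-equal chunks that add up to nb_trees.

    Each chunk would be the ntree argument of one parallel randomForest() call.
    """
    if nb_trees < 0 or n_workers < 1:
        raise ValueError("need nb_trees >= 0 and n_workers >= 1")
    base, extra = divmod(nb_trees, n_workers)
    # The first `extra` workers grow one additional tree each.
    return [base + 1 if i < extra else base for i in range(n_workers)]


for nb_trees in (4, 100, 500, 2000):  # the values tried in the question
    print(nb_trees, split_tree_budget(nb_trees, 4))
```

Using divmod keeps the chunks within one tree of each other, so no worker finishes much later than the rest.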

parallel-processing r random-forest parallel-foreach doparallel

Fra*_* M.

2017-05-23

11 votes · 2 answers · 9855 views

Cut a sleep short on user input

I am trying to find a way to cut a time.sleep(600) short if the user presses a key, without resorting to some ugly hack like this:

import time

# key_was_pressed() stands for some non-blocking key check (not shown)
key_pressed = False
for i in range(600):
    key_pressed = key_was_pressed()
    if not key_pressed:
        time.sleep(1)
    else:
        break

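Here is a minimal sketch that uses only the standard library. threading.Event.wait(timeout) blocks until either the timeout expires or another thread calls set() on the event. That means it can replace both the time.sleep(600) and the one-second polling loop. The sketch does not settle how the key press is detected: wait_for_enter is a hypothetical stand-in for key_was_pressed() that assumes pressing Enter is acceptable.

```python
import threading


def interruptible_sleep(seconds, stop_event):
    """Sleep for up to `seconds`; return True if stop_event was set before then."""
    # Event.wait() returns True as soon as the event is set, or False on timeout.
    return stop_event.wait(timeout=seconds)


def wait_for_enter(stop_event):
    """Hypothetical key detector: set the event once the user presses Enter."""
    try:
        input()
    except EOFError:
        return  # stdin is not interactive, so never interrupt the sleep
    stop_event.set()


# Quick demo: a timer stands in for the key press and fires after 0.1 s.
stop = threading.Event()
threading.Timer(0.1, stop.set).start()
print(interruptible_sleep(5, stop))  # True: woke up early instead of after 5 s
```

In the real program, start threading.Thread(target=wait_for_enter, args=(stop,), daemon=True) before calling interruptible_sleep(600, stop). Setting daemon=True keeps a still-blocked input() from holding the interpreter open at exit. Reacting to a single key without Enter needs platform-specific code (for example msvcrt on Windows), which this sketch does not cover.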

python

Fra*_* M.


11 votes · 1 answer · 266 views

Convert a string into an f-string

How do I convert a classic string into an f-string?

variable = 42
user_input = "The answer is {variable}"
print(user_input)


The answer is {variable}

f_user_input = # Here the operation to go from a string to an f-string
print(f_user_input)


The answer is 42
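Strictly speaking, a runtime string cannot become an f-string. F-strings are a literal syntax: their {...} fields are parsed from the source code itself, so a string created at runtime (here, user input) is never treated as one. For simple {name} placeholders like this one, however, str.format_map (or str.format) gives the same result. Here is a minimal sketch:

```python
variable = 42
user_input = "The answer is {variable}"

# format_map looks each {name} field up in the mapping it is given.
# Passing an explicit dict (rather than locals()) limits what the template can see.
f_user_input = user_input.format_map({"variable": variable})
print(f_user_input)  # The answer is 42
```

Unlike a real f-string, a field such as {variable + 1} is not evaluated. format_map treats "variable + 1" as a key and raises KeyError. Only names, plus attribute and index access, are supported.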

python string-interpolation python-3.6

Fra*_* M.

2019-09-19

9 votes · 4 answers · 4069 views

Is a Pandas DataFrame (Python) closer to R's data.frame or to data.table?

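To understand my question, I should first point out that R data.tables are not just R data.frames with syntactic sugar. There are important behavioral differences: column assignment and modification by reference in data.tables avoid copying the whole object in memory, as happens with data.frames (see the example in this Quora answer).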

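Many times I have found that the speed and memory differences resulting from data.table's behavior are a key factor. They make it possible to work with some large datasets that would be impossible to handle with data.frame's behavior.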

So what I would like to know is: in Python, how do Pandas DataFrames behave in this respect?

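Bonus question: suppose Pandas DataFrames are closer to R data.frames than to R data.tables, and share the same downside (a full copy of the object when assigning or modifying a column). Is there then a Python equivalent of R's data.table package?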

Edit, as requested in the comments: code examples.

R data.frame:

# renaming a column
colnames(mydataframe)[1] <- "new_column_name"


R data.table:

# renaming a column
library(data.table)
setnames(mydatatable, 'old_column_name', 'new_column_name')


In pandas:

mydataframe.rename(columns = {'old_column_name': 'new_column_name'}, inplace=True)

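The copy-versus-reference distinction the question draws can be shown without pandas or R. The plain-Python toy below is a dict mapping column names to lists; it is an illustration only, not how pandas or data.table store data. Renaming by reference just moves the name, so afterwards the column is still the very same list object. A rename that first copies the whole table, which is the data.frame behavior described above, ends up with a duplicate of every column.

```python
import copy

# Toy column store: column name -> list of values (illustration only).
table = {"old_column_name": [1, 2, 3], "other": ["a", "b", "c"]}


def rename_by_reference(cols, old, new):
    """Rename in place: the column's list object is moved, not copied."""
    cols[new] = cols.pop(old)


def rename_by_copy(cols, old, new):
    """Return a renamed deep copy, duplicating every column's data."""
    out = copy.deepcopy(cols)
    out[new] = out.pop(old)
    return out


original_column = table["old_column_name"]
copied = rename_by_copy(table, "old_column_name", "new_column_name")
rename_by_reference(table, "old_column_name", "new_column_name")

print(table["new_column_name"] is original_column)   # True: same object, no copy
print(copied["new_column_name"] is original_column)  # False: the data was duplicated
```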

python r dataframe pandas data.table

Fra*_* M.

2017-12-15

9 votes · 1 answer · 899 views

Temporary table in a PostgreSQL function

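I can't find a clear explanation of the syntax for creating (and using) a table that is only used for a function's internal computations. Could someone give me a syntax example?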

Based on what I found, I tried this (with and without the @ before temp_table):

CREATE FUNCTION test.myfunction()
RETURNS SETOF test.out_table
AS $$

DECLARE @temp_table TABLE
( 
        id int,
        value text
 )
BEGIN
 INSERT INTO @temp_table 
        SELECT id, value
        FROM test.another_table;

 INSERT INTO test.out_table
        SELECT id, value
        FROM @temp_table;
RETURN END
$$ LANGUAGE SQL;


I get:

ERROR: syntax error at or near "DECLARE" LINE 5: DECLARE @temp_table TABLE

I also tried the CREATE TABLE approach suggested here, like this:

CREATE FUNCTION test.myfunction()
RETURNS SETOF test.out_table
AS $$

    CREATE TABLE temp_table AS
        SELECT id, value
        FROM test.another_table;

    INSERT INTO test.out_table
        SELECT id, value
        FROM temp_table;

$$ LANGUAGE …

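DECLARE @temp_table TABLE (...) is SQL Server (T-SQL) syntax, which PostgreSQL does not accept. The second attempt has a different problem. PostgreSQL parses the whole body of a LANGUAGE SQL function before executing any of it, so a table created in the body is not yet visible to the later INSERT. PL/pgSQL (LANGUAGE plpgsql) prepares each statement when it first runs, which is why it is the usual choice here. The pattern itself fills an intermediate TEMP table from another_table and then copies it into out_table. The sketch below illustrates it with Python's built-in sqlite3 module, so it is SQLite rather than PostgreSQL, and the tables are in-memory stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In-memory stand-ins for test.another_table and test.out_table.
cur.execute("CREATE TABLE another_table (id INTEGER, value TEXT)")
cur.execute("CREATE TABLE out_table (id INTEGER, value TEXT)")
cur.executemany("INSERT INTO another_table VALUES (?, ?)", [(1, "a"), (2, "b")])


def myfunction(cur):
    """Copy another_table into out_table through an intermediate TEMP table."""
    # A TEMP table is private to this connection and disappears when it closes;
    # dropping it explicitly also lets the function run again on the same connection.
    cur.execute("CREATE TEMP TABLE temp_table AS SELECT id, value FROM another_table")
    cur.execute("INSERT INTO out_table SELECT id, value FROM temp_table")
    cur.execute("DROP TABLE temp_table")
    return cur.execute("SELECT id, value FROM out_table ORDER BY id").fetchall()


rows = myfunction(cur)
print(rows)  # [(1, 'a'), (2, 'b')]
```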

sql postgresql pgadmin

Fra*_* M.

2017-05-23

8 votes · 1 answer · 20k views

SQL left outer join on multiple columns

According to this SQL join cheat sheet, a left outer join on one column looks like this:

SELECT *
  FROM a
  LEFT JOIN b 
    ON a.foo = b.foo
  WHERE b.foo IS NULL


I would like to know what a join on several columns looks like: should the WHERE clause use OR or AND?

SELECT *
  FROM a
  LEFT JOIN b 
    ON  a.foo = b.foo
    AND a.bar = b.bar
    AND a.ter = b.ter
WHERE b.foo IS NULL 
  OR  b.bar IS NULL 
  OR  b.ter IS NULL


or

SELECT *
  FROM a
  LEFT JOIN b 
    ON  a.foo = b.foo
    AND a.bar = b.bar
    AND a.ter = b.ter
WHERE b.foo IS NULL …

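For the join columns, the choice does not change the result. Take a row of a that has a match: each of b.foo, b.bar and b.ter then equals a value from a, and an equality with NULL is never true, so none of them is NULL. Take a row with no match: every column of b is NULL. So the OR version, the AND version and the single b.foo IS NULL test keep exactly the same rows. The small check below uses Python's built-in sqlite3 (SQLite rather than Vertica). The sample rows are made up and include one b row with a NULL ter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (foo INTEGER, bar INTEGER, ter INTEGER)")
conn.execute("CREATE TABLE b (foo INTEGER, bar INTEGER, ter INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 1), (2, 2, 2), (3, 3, 3)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 2, 2), (3, 3, None)])  # None becomes NULL

BASE = """
SELECT a.foo, a.bar, a.ter
  FROM a
  LEFT JOIN b
    ON  a.foo = b.foo
    AND a.bar = b.bar
    AND a.ter = b.ter
 WHERE {cond}
 ORDER BY a.foo, a.bar, a.ter
"""

filters = {
    "single": "b.foo IS NULL",
    "or": "b.foo IS NULL OR b.bar IS NULL OR b.ter IS NULL",
    "and": "b.foo IS NULL AND b.bar IS NULL AND b.ter IS NULL",
}

# Rows of a with no match in b, computed with each WHERE variant.
results = {name: conn.execute(BASE.format(cond=cond)).fetchall()
           for name, cond in filters.items()}
for name, rows in results.items():
    print(name, rows)
```

All three variants return the same rows. The (3, 3, 3) row counts as unmatched because 3 = NULL is not true.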

sql join vertica

Fra*_* M.

2016-10-13

8 votes · 1 answer · 30k views

Missing result fields in a GET request to the Facebook Graph API

I am trying to get the number of likes, shares and comments that a given URL has received on Facebook.

As far as I understand this documentation, the following URL should give me what I want.

https://graph.facebook.com/v2.4?id=http://stackoverflow.com&fields=og_object,share&access_token=MY_ACCESS_TOKEN


It gives me the following:

{
"og_object": {
   "id": "10150180465825637",
   "description": "Q&A for professional and enthusiast programmers",
   "title": "Stack Overflow",
   "type": "website",
   "updated_time": "2015-08-02T04:03:47+0000",
   "url": "http://stackoverflow.com/"
},
"share": {
   "comment_count": 4,
   "share_count": 32567
},
"id": "http://stackoverflow.com"
}


This includes comment_count = 4 and share_count = 32567.

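However, according to the documentation linked above, the number of likes should appear in "og_object". There should be an engagement field inside it with two elements: count (the number of likes) and social_sentence (a social sentence such as "You and 31,608,561 others like this.").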

Clearly, neither the engagement nor the count element is present. How can I make them appear?

Note: in the first URL, I tried replacing &fields=og_object,share with each of the following:

&fields=og_object.engagement,share
&fields=og_object.engagement.count,share
&fields=og_object,engagement,share
&fields=engagement,share
&fields=engagement.count,share
&fields=engagement,count,share
&fields=count,share

(NB: I also tried putting `share` first in …

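The question's request passes another URL (http://stackoverflow.com) as the id value. When testing many field variants, it can help to build each request with urllib.parse.urlencode. It percent-encodes every value, so characters such as :, /, ? or & inside the id cannot be mistaken for query syntax. The minimal sketch below only builds the URLs for the variants listed above and sends no request. MY_ACCESS_TOKEN is the placeholder from the question, and the sketch assumes the API decodes %2C back to a comma, as query-string parsers normally do.

```python
from urllib.parse import urlencode

BASE_URL = "https://graph.facebook.com/v2.4"

# Field variants tried in the question (duplicates removed).
FIELD_VARIANTS = [
    "og_object,share",
    "og_object.engagement,share",
    "og_object.engagement.count,share",
    "og_object,engagement,share",
    "engagement,share",
    "engagement.count,share",
    "engagement,count,share",
    "count,share",
]


def build_graph_url(target_url, fields, access_token):
    """Build the GET URL, percent-encoding each query value (including the id URL)."""
    query = urlencode({"id": target_url, "fields": fields, "access_token": access_token})
    return BASE_URL + "?" + query


for fields in FIELD_VARIANTS:
    print(build_graph_url("http://stackoverflow.com", fields, "MY_ACCESS_TOKEN"))
```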

facebook facebook-graph-api

Fra*_* M.


5 votes · 1 answer · 2397 views

Folium map not displayed in Spyder

The title says it all: I can't get Spyder to display a map with folium.

Here is what I have:

import folium
m = folium.Map(location=[45.5236, -122.6750])
m


No error (and no map), just this:

<folium.folium.Map at 0xd03fcf8>

m.render()  # No idea what .render() is supposed to do,
# but "render" sounds like it might display the map, so I tried it.
# But it prints nothing

m.render

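This workaround sketch assumes the cause is that Spyder's console prints the object's text representation instead of rendering the HTML map that folium produces. It writes the map to an HTML file with Map.save() and opens that file in the default web browser. The helper name show_in_browser is hypothetical. It relies only on the object having a .save(path) method, which folium.Map provides.

```python
import tempfile
import webbrowser
from pathlib import Path


def show_in_browser(folium_map, path=None, opener=webbrowser.open):
    """Save the map as a standalone HTML file, open it, and return the file path.

    `folium_map` only needs a .save(path) method (folium.Map has one).
    `opener` is injectable so the function can be tried without a browser.
    """
    if path is None:
        # delete=False keeps the file after the with-block closes it.
        with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
            path = tmp.name
    folium_map.save(path)
    opener(Path(path).resolve().as_uri())  # a file:// URI to the saved page
    return path
```

To use it in Spyder with the question's map: import folium, m = folium.Map(location=[45.5236, -122.6750]), then show_in_browser(m).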