I can't figure out what's going on - everything seems to work, but my app is not generating a file - even though it looks like it does. I am running it on Windows, in RStudio 0.98.125, and I launch it with the line: runApp(). Below is a very simple reproducible example:
shinyUI(pageWithSidebar(
  headerPanel("My App"),
  sidebarPanel(
    numericInput('NumRuns', 'Number of runs', value = 3, min = 3, max = 10, step = 1),
    actionButton(inputId = "goButton", "Run!"),
    textInput("downloadData", "Save My Data Frame:", value = "Data Frame 1"),
    downloadButton('downloadData', 'Save my file!')
  ),
  mainPanel(
    tabPanel("Some Text",
      h4(textOutput("caption2")),
      tableOutput("mydf"),
      value = 3))
))
shinyServer(function(input, output){
  # Creating files for download at the end
  myout <- reactive({
    if (input$goButton == 0) return(NULL)
    nrruns <- input$NumRuns
    mylist <- NULL
    for (i in 1:nrruns) {
      mylist[[i]] <- data.frame(a = rnorm(10), b = runif(10))
      names(mylist)[i] <- paste("dataframe", i, sep = "")
    }
    return(mylist)
  })
  output$mydf <- renderTable({
    if (input$goButton == 0) return(NULL)
    input$goButton
    isolate(
      myout()$dataframe1
    )
  })
  output$downloadData <- downloadHandler(
    filename = function() { paste(input$downloadData, " ", Sys.Date(), ".csv", sep = "") },
    content = …
I have a Spark data frame 'mydataframe' with many columns. I am trying to run kmeans on just two of them: lat and long (latitude and longitude), using them as plain values. I want to extract 7 clusters based on those two columns, and then attach the cluster assignment to my original data frame. I tried:
from numpy import array
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel
# Prepare a data frame with just 2 columns:
data = mydataframe.select('lat', 'long')
data_rdd = data.rdd # needs to be an RDD
data_rdd.cache()
# Build the model (cluster the data)
clusters = KMeans.train(data_rdd, 7, maxIterations=15, initializationMode="random")
But after a while I get an error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5191.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5191.0 (TID 260738, 10.19.211.69, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last)
I tried detaching and reconnecting to the cluster. Same result. What am I doing wrong?
Thank you very much!
machine-learning k-means pyspark apache-spark-ml apache-spark-mllib
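A frequent cause of a PythonException like this is that `data.rdd` yields `Row` objects, possibly containing nulls, rather than the numeric vectors `KMeans.train` expects, so the conversion crashes inside the Python workers. Below is a minimal plain-Python sketch of the per-row cleanup you would map over the RDD before training; the sample rows and the `to_point` helper are hypothetical, not from the original post:

```python
# Rows as they might come out of data.rdd, including a null coordinate:
rows = [
    {"lat": 40.71, "long": -74.00},
    {"lat": None,  "long": -73.94},   # a null that would crash float()
    {"lat": 34.05, "long": -118.24},
]

def to_point(row):
    """Convert one row to a (lat, long) float pair, or None if unusable."""
    if row["lat"] is None or row["long"] is None:
        return None
    return (float(row["lat"]), float(row["long"]))

# In Spark this would be data.rdd.map(to_point) followed by a filter,
# applied before KMeans.train is called:
points = [p for p in (to_point(r) for r in rows) if p is not None]
```

With an actual RDD the same logic would be a `map` that builds a numeric array per row after dropping nulls; the cluster assignments can then be computed with `clusters.predict` and joined back to the original data frame.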
A question about inconsistent Spark computations. Is this a thing? For example, I run exactly the same command twice, e.g.:
imp_sample.where(col("location").isNotNull()).count()
and each time I run it I get slightly different results (141,830, then 142,314)! Or this:
imp_sample.where(col("location").isNull()).count()
getting 2,587,013, then 2,586,943. How is that possible? Thanks!
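The usual explanation is that the lineage of `imp_sample` contains a nondeterministic step (such as a `sample()` call or a `rand()` column) and the DataFrame is not persisted, so every action recomputes the whole lineage with fresh randomness. A plain-Python sketch of the effect, with the lazy recomputation modeled as re-calling a sampling function (the function and sizes are hypothetical):

```python
import random

def build_sample(seed=None):
    """Model a lazy, nondeterministic lineage: re-running it re-samples."""
    rng = random.Random(seed)
    return [i for i in range(100_000) if rng.random() < 0.5]

# Unseeded and recomputed on every action: counts can differ between runs.
count_a = len(build_sample())
count_b = len(build_sample())

# Fixing the randomness (or, in Spark, persisting/checkpointing the
# DataFrame before counting) makes repeated counts agree:
count_c = len(build_sample(seed=42))
count_d = len(build_sample(seed=42))
```

In Spark the typical fix is to call `imp_sample.persist()` (followed by an action to materialize it), or to write the sampled data out and read it back, so that both `count()` calls see the same materialized rows.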
Below is my code. It may look a bit long, but it is actually a very simple app.
The user is supposed to upload a very small data frame (x.csv if you are in the US, x_Europe.csv if you are in Europe). Then the user should click a button to start the calculations. Finally, the user should be able to download the results of those calculations as a data frame.
My problem: after uploading the file, nothing happens when I click the 'do_it' action button. I can tell because nothing is printed to my console. Why? After all, my function 'main_calc' is supposed to be an eventReactive on input$do_it? And why do all the calculations in main_calc only start happening after the user tries to download the results?
Important: it matters to me that the 'data' function stays separate from main_calc.
Thank you very much!
First, generate one of the following 2 files in your working directory:
# generate file 'x.csv' to read in later in the app:
write.csv(data.frame(a = 1:4, b = 2:5), "x.csv", row.names = F) # US file
write.csv2(data.frame(a = 1:4, b = 2:5), "x_Europe.csv", row.names = F) # European file
Here is the code for the shiny app:
library(shiny)
ui <- fluidPage(
  # User should upload file x here:
  fileInput("file_x", label = h5("Upload file 'x.csv'!")),
  br(),
  actionButton("do_it", "Click Here First:"),
  br(),
  br(),
  textInput("user_filename", "Save your …
I am trying to select only the rows with no NA:
library(dplyr)
x = data.frame(a = c(NA, 2, 3, 4))
var_a <- "a"
# This works:
x %>% filter(!is.na(a))
# That works too:
var_a <- quo(a)
x %>% filter(!is.na(!!var_a))
# But this doesn't work:
var_a <- "a"
x %>% filter(!is.na(!!var_a))
What should I change in the last line to make it work? Given that I have to use var_a <- "a". Thank you very much!
Below is a description of the traditional workflow in git.
Is it possible to somehow write a script in R that makes git execute all of it? Would it be wise to do so? Thank you very much!
cd <path_to_local_repository>. Enter git add --all at the command line to stage the changes. Enter git commit -m '<commit_message>' at the command line to commit the changes to the local repository. Enter git push at the command line to push the changes to the remote repository (e.g., on Bitbucket).
Can anyone explain why, in the first loop, each element of my date vector is a Date, while in the second loop each element of my date vector is numeric? Thanks!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
I have estimated a logistic regression using a pipeline.
Here are my last few lines before fitting the logistic regression:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(featuresCol="lr_features", labelCol = "targetvar")
# create assembler to include encoded features
lr_assembler = VectorAssembler(
    inputCols = numericColumns +
        [categoricalCol + "ClassVec" for categoricalCol in categoricalColumns],
    outputCol = "lr_features")
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
# Model definition:
lr = LogisticRegression(featuresCol = "lr_features", labelCol = "targetvar")
# Pipeline definition:
lr_pipeline = Pipeline(stages = indexStages + encodeStages +[lr_assembler, lr])
# Fit the logistic regression model:
lrModel = lr_pipeline.fit(train_train)
Then I try to run a summary of the model. However, the line of code below:
trainingSummary …
Below is an example of "label position optimization" using ggplot2 and ggrepel:
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
"SaxRub", "TurMer", "TurPil", "TurPhi")
df <- data.frame(x = x, y = y, z = ShortSci)
library(ggplot2)
library(ggrepel)
ggplot(data = df, aes(x = x, y = y)) + theme_bw() +
geom_text_repel(aes(label = z),
box.padding = unit(0.45, "lines")) +
  geom_point(colour = "green", size = 3) …
I am trying to standardize (mean = 0, std = 1) one column ('age') in my data frame. Below is my code in Spark (Python):
from pyspark.ml.feature import StandardScaler
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
# Make my 'age' column an assembler type:
age_assembler = VectorAssembler(inputCols= ['age'], outputCol = "age_feature")
# Create a scaler that takes 'age_feature' as an input column:
scaler = StandardScaler(inputCol="age_feature", outputCol="age_scaled",
withStd=True, withMean=True)
# Creating a mini-pipeline for those 2 steps:
age_pipeline = Pipeline(stages=[age_assembler, …
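As a plain-Python cross-check of what `StandardScaler(withMean=True, withStd=True)` computes per column, here is the standardization done by hand. Note that Spark's `StandardScaler` divides by the corrected *sample* standard deviation (the N-1 denominator), which is why its output can differ slightly from tools that default to the population standard deviation. The sample ages below are made up for illustration:

```python
from statistics import mean, stdev  # stdev uses the N-1 (sample) denominator

ages = [23.0, 35.0, 41.0, 29.0, 52.0]

m = mean(ages)
s = stdev(ages)

# (x - mean) / sample_std, matching withMean=True, withStd=True:
age_scaled = [(a - m) / s for a in ages]
```

After the pipeline runs, the `age_scaled` vector column should have mean 0 and sample standard deviation 1, which the hand computation above reproduces.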