Posts by Jad*_*ins

Sparklyr ignores the line separator

I'm trying to read a ~2 GB .csv (about 5 million rows) in sparklyr:

bigcsvspark <- spark_read_csv(sc, "bigtxt", "path", 
                              delimiter = "!",
                              infer_schema = FALSE,
                              memory = TRUE,
                              overwrite = TRUE,
                              columns = list(
                                  SUPRESSED COLUMNS AS = 'character'))


and I get the following error:

Job aborted due to stage failure: Task 9 in stage 15.0 failed 4 times, most recent failure: Lost task 9.3 in stage 15.0 (TID 3963, 10.1.4.16):  com.univocity.parsers.common.TextParsingException: Length of parsed input (1000001) exceeds the maximum number of characters defined in your parser settings (1000000). Identified line separator characters in the parsed content. This …

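The error says univocity found line separator characters inside what it parsed as a single value, so one thing worth checking (an assumption, not a confirmed diagnosis) is which terminators the file actually contains: Windows `\r\n`, Unix `\n`, or old-Mac `\r`. An unbalanced quote character can produce the same symptom, because the parser keeps consuming lines into one field. The sketch below samples the first bytes of the file with plain Python; the helper names `count_terminators` and `sample_line_endings` are hypothetical.

```python
def count_terminators(data: bytes) -> dict:
    """Count CRLF, lone LF and lone CR line terminators in a byte string."""
    # Every "\r\n" contains exactly one "\r" and one "\n", so subtracting
    # the CRLF count leaves only the lone occurrences of each byte.
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def sample_line_endings(path: str, max_bytes: int = 10_000_000) -> dict:
    """Inspect only the first `max_bytes` bytes, so a 2 GB file stays cheap."""
    # Binary mode: text mode would translate line endings and hide the answer.
    with open(path, "rb") as fh:
        return count_terminators(fh.read(max_bytes))
```

For example, `sample_line_endings("path/to/bigtxt.csv")` (hypothetical path) returning mostly `cr` would suggest the reader never sees the line breaks it expects, which would be consistent with a single value growing past the 1,000,000-character limit.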

csv r sparklyr

Jad*_*ins

lucky-day

Score: 6, answers: 1, views: 210

Pandas/Python - Group data by time period

I have some financial data and want to get only the last trade in each time period (hour, day, month, ...).

Example:

>>df
      time  price_BRL     qt              time_dt
1312001297      23.49   1.00  2011-07-30 04:48:17
1312049148      23.40   1.00  2011-07-30 18:05:48
1312121523      23.49   2.00  2011-07-31 14:12:03
1312121523      23.50   6.50  2011-07-31 14:12:03
1312177622      23.40   2.00  2011-08-01 05:47:02
1312206416      23.25   1.00  2011-08-01 13:46:56
1312637929      18.95   1.50  2011-08-06 13:38:49
1312637929      18.95   4.00  2011-08-06 13:38:49
1312817114       0.80   0.01  2011-08-08 15:25:14
1312818289       0.10   0.01  2011-08-08 15:44:49
1312819795       6.00   0.09  2011-08-08 16:09:55
1312847064      16.00   0.86  2011-08-08 23:44:24
1312849282      16.00   6.14  2011-08-09 00:21:22
1312898146      19.90   1.00  2011-08-09 13:55:46
1312915666       6.00   0.01 …

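A minimal sketch of one way to do this, assuming `time_dt` is a datetime64 column (here rebuilt from the unix `time` column of the first six example rows). The helper name `last_trade_per_period` is hypothetical. It labels each trade with `Series.dt.to_period` and keeps the last row of each label with `groupby(...).tail(1)`; `resample(...).last()` is another option, but it also emits rows for empty periods.

```python
import pandas as pd

# First six rows of the question's example: unix seconds, price, quantity
rows = [
    (1312001297, 23.49, 1.00),
    (1312049148, 23.40, 1.00),
    (1312121523, 23.49, 2.00),
    (1312121523, 23.50, 6.50),
    (1312177622, 23.40, 2.00),
    (1312206416, 23.25, 1.00),
]
df = pd.DataFrame(rows, columns=["time", "price_BRL", "qt"])
# Reproduces the question's time_dt values (UTC, timezone-naive)
df["time_dt"] = pd.to_datetime(df["time"], unit="s")


def last_trade_per_period(frame: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Return the last row (trade) of each calendar period given by `freq`."""
    # Label every trade with its period, e.g. 2011-07-30 for freq="D"
    labelled = frame.assign(period=frame["time_dt"].dt.to_period(freq))
    # Stable sort: trades sharing a timestamp keep their input order,
    # so the row listed last in the input counts as the last trade
    labelled = labelled.sort_values("time_dt", kind="mergesort")
    # tail(1) keeps the whole last row of each group, unlike .last(),
    # which picks the last non-null value column by column
    return labelled.groupby("period").tail(1).set_index("period")


print(last_trade_per_period(df, "D"))
print(last_trade_per_period(df, "M"))
```

Passing `"D"` groups by day and `"M"` by month; the same function works for other period aliases supported by the installed pandas version.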

python pandas

Jad*_*ins

2017-01-15

Score: 5, answers: 1, views: 201

How do I rewrite an expression with a repeated variable in point-free style?

How can I rewrite the following expression in point-free style?

p x y = x*x + y


Using lambda calculus, I did the following:

p = \x -> \y -> (+) ((*) x x) y
  = \x -> (+) ((*) x x) -- this is where my problem starts
  = \x -> ((+) . ((*) x )) x
  ... ?


haskell pointfree tacit-programming

Jad*_*ins

lucky-day

Score: 3, answers: 3, views: 145

Scala/Spark fails to match function

I'm trying to run the following command:

df = df.withColumn("DATATmp", to_date($"DATA", "yyyyMMdd"))


and I get this error:

<console>:34: error: too many arguments for method to_date: (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column


How can I specify the exact function to import? Is there another way to avoid this error?

Edit: Spark version 2.1

scala apache-spark

Jad*_*ins

2018-03-04

Score: 2, answers: 1, views: 354

Convert [Int] to [Double] from input

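I have a list of the constants and a list of the degrees of a polynomial equation, and I want to return a list of their combinations so that I can apply a value to each one and then sum the results. But these lists are Int, and my function needs lists of Double.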

poly :: [Double] -> [Double] -> [Double -> Double]
poly a b =
  let f x y = (*x) . (**y)
  in uncurry f <$> zip a b

-- does not work
poly ([1,2,3]:: [Double]) ([1,2,3]:: [Double])


How do I convert a list of Int to a list of Double?

haskell

Jad*_*ins

2018-06-17

Score: 1, answers: 1, views: 100