I am trying to read a ~2 GB (about 5 million rows) .csv file in sparklyr:
bigcsvspark <- spark_read_csv(sc, "bigtxt", "path",
delimiter = "!",
infer_schema = FALSE,
memory = TRUE,
overwrite = TRUE,
columns = list(
SUPRESSED COLUMNS AS = 'character'))
and I get the following error:
Job aborted due to stage failure: Task 9 in stage 15.0 failed 4 times, most recent failure: Lost task 9.3 in stage 15.0 (TID 3963,
10.1.4.16): com.univocity.parsers.common.TextParsingException: Length of parsed input (1000001) exceeds the maximum number of characters defined in your parser settings (1000000). Identified line separator characters in the parsed content. This …
I have some financial data and want to keep only the last trade of each time period (hour, day, month, ...).
Example:
>>df
time price_BRL qt time_dt
1312001297 23.49 1.00 2011-07-30 04:48:17
1312049148 23.40 1.00 2011-07-30 18:05:48
1312121523 23.49 2.00 2011-07-31 14:12:03
1312121523 23.50 6.50 2011-07-31 14:12:03
1312177622 23.40 2.00 2011-08-01 05:47:02
1312206416 23.25 1.00 2011-08-01 13:46:56
1312637929 18.95 1.50 2011-08-06 13:38:49
1312637929 18.95 4.00 2011-08-06 13:38:49
1312817114 0.80 0.01 2011-08-08 15:25:14
1312818289 0.10 0.01 2011-08-08 15:44:49
1312819795 6.00 0.09 2011-08-08 16:09:55
1312847064 16.00 0.86 2011-08-08 23:44:24
1312849282 16.00 6.14 2011-08-09 00:21:22
1312898146 19.90 1.00 2011-08-09 13:55:46
1312915666 6.00 0.01 …
How can I rewrite the following expression in point-free style?
p x y = x*x + y
Using lambda calculus, I got this far:
p = \x -> \y -> (+) ((*) x x) y
= \x -> (+) ((*) x x) -- my problem starts here
= \x -> ((+) . ((*) x )) x
... ?
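One possible way to finish the derivation, offered as a sketch rather than the only answer: the remaining obstacle is that `x` is used twice, and the function instances of `join` (where `join f x = f x x`) and `<*>` (where `(f <*> g) x = f x (g x)`) both duplicate an argument. The names `pJoin` and `pAp` below are illustrative and not part of the question.

```haskell
import Control.Monad (join)

-- Original pointful definition from the question.
p :: Num a => a -> a -> a
p x y = x * x + y

-- For functions, join f x = f x x, so join (*) squares its argument.
-- Composing with (+) then yields \x -> (+) (x * x).
pJoin :: Num a => a -> a -> a
pJoin = (+) . join (*)

-- For functions, (f <*> g) x = f x (g x), so ((*) <*> id) x = x * x.
pAp :: Num a => a -> a -> a
pAp = (+) . ((*) <*> id)

-- Print the three versions side by side on a few small inputs.
main :: IO ()
main = print [ (p x y, pJoin x y, pAp x y) | x <- [0 .. 3 :: Int], y <- [0 .. 2] ]
```

Each tuple printed by `main` should have three equal components if the point-free versions agree with `p`.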
I am trying to run the following command:
df = df.withColumn("DATATmp", to_date($"DATA", "yyyyMMdd"))
and I get this error:
<console>:34: error: too many arguments for method to_date: (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
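How can I specify exactly which function to import? Is there another way to avoid this error?
Edit: Spark version 2.1
I have a list of constants and a list of degrees for a polynomial, and I want to return a list of functions that combines them, so that I can apply a value to each one and then sum the results. However, my lists contain Int values, and my function requires lists of Double.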
poly :: [Double] -> [Double] -> [Double -> Double]
poly a b =
let f x y = (*x) . (**y)
in uncurry f <$> zip a b
-- does not work
poly ([1,2,3]:: [Double]) ([1,2,3]:: [Double])
How can I convert a list of Int to a list of Double?
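A minimal sketch of one common approach, assuming the inputs really are `[Int]`: `fromIntegral` converts any integral value to another numeric type, so mapping it over the list gives a `[Double]`. The helpers `toDoubles` and `evalPoly` are hypothetical names introduced here for illustration; `poly` is copied from the question.

```haskell
-- poly as given in the question: one function per (constant, degree) pair.
poly :: [Double] -> [Double] -> [Double -> Double]
poly a b =
  let f x y = (* x) . (** y)
  in uncurry f <$> zip a b

-- Hypothetical helper: convert each Int to a Double with fromIntegral.
toDoubles :: [Int] -> [Double]
toDoubles = map fromIntegral

-- Hypothetical helper: convert both Int lists, apply every term to v, then sum.
evalPoly :: [Int] -> [Int] -> Double -> Double
evalPoly cs ds v = sum (map ($ v) (poly (toDoubles cs) (toDoubles ds)))

main :: IO ()
main = print (evalPoly [1, 2, 3] [1, 2, 3] 2.0)
```

Here `evalPoly [1, 2, 3] [1, 2, 3] 2.0` evaluates 1·2¹ + 2·2² + 3·2³. Converting at the call site keeps `poly` itself unchanged.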