I am trying to run unit tests for a Spark job on Windows 7 64-bit. I have:
HADOOP_HOME=D:/winutils
winutils path= D:/winutils/bin/winutils.exe
I ran the following commands:
winutils ls \tmp\hive
winutils chmod -R 777 \tmp\hive
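(Aside: if the HADOOP_HOME environment variable is not visible to the JVM that runs the tests, a common fallback is to set the hadoop.home.dir system property before any Hadoop class loads. A minimal JUnit sketch, reusing the path above; the class name is illustrative:)

import org.junit.BeforeClass;

public class SparkJobTestSetup {
    @BeforeClass
    public static void pointAtWinutils() {
        // Hadoop's Shell class checks this system property before the
        // HADOOP_HOME environment variable when locating winutils.exe.
        System.setProperty("hadoop.home.dir", "D:/winutils");
    }
}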
However, when I run my tests, I get the following error:
Running com.dnb.trade.ui.ingest.spark.utils.ExperiencesUtilTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.132 sec
17/01/24 15:37:53 INFO Remoting: Remoting shut down
17/01/24 15:37:53 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\415387\AppData\Local\Temp\spark-b1672cf6-989f-4890-93a0-c945ff147554
java.io.IOException: Failed to delete: C:\Users\415387\AppData\Local\Temp\spark-b1672cf6-989f-4890-93a0-c945ff147554
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:929)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at .....
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=786m; support was removed in 8.0
Caused by: java.lang.RuntimeException: java.io.IOException: Access …
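(For reference: this looks like the known Spark-on-Windows behavior where the shutdown hook cannot delete the Spark temp directory because a file handle is still held, typically via winutils/Hive; the tests themselves still pass. A common mitigation is to silence just that logger. A minimal JUnit sketch, assuming log4j 1.x on the test classpath as shipped with stock Spark; the temp directories are still left behind and need occasional manual cleanup:)

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.junit.BeforeClass;

public class SuppressShutdownNoise {
    @BeforeClass
    public static void muteShutdownHookLogger() {
        // Hides only the ERROR logged by ShutdownHookManager; it does not
        // fix the failed deletion under %TEMP%.
        Logger.getLogger("org.apache.spark.util.ShutdownHookManager")
              .setLevel(Level.OFF);
    }
}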
Which is faster: Spark SQL with a WHERE clause, or applying filter on the DataFrame after the Spark SQL query?

For example:

select col1, col2 from tab1 where col1 = val;

or

DataFrame df = sqlContext.sql("select col1, col2 from tab1");
df.filter("col1 = val");
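(One way to check empirically rather than guess: both forms go through the Catalyst optimizer, so printing the query plans shows whether they end up identical. A runnable sketch against the Spark 1.x Java API, using a hypothetical two-column bean to stand in for tab1:)

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class WhereVsFilter {
    // Hypothetical bean standing in for the real tab1 schema.
    public static class Rec implements Serializable {
        private String col1;
        private int col2;
        public String getCol1() { return col1; }
        public void setCol1(String v) { col1 = v; }
        public int getCol2() { return col2; }
        public void setCol2(int v) { col2 = v; }
    }

    private static Rec rec(String c1, int c2) {
        Rec r = new Rec();
        r.setCol1(c1);
        r.setCol2(c2);
        return r;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("where-vs-filter").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        sqlContext.createDataFrame(
                jsc.parallelize(Arrays.asList(rec("val", 1), rec("other", 2))),
                Rec.class).registerTempTable("tab1");

        DataFrame viaWhere  = sqlContext.sql("select col1, col2 from tab1 where col1 = 'val'");
        DataFrame viaFilter = sqlContext.sql("select col1, col2 from tab1").filter("col1 = 'val'");

        // If the optimized/physical plans printed here match, neither
        // form should be faster than the other.
        viaWhere.explain(true);
        viaFilter.explain(true);

        jsc.stop();
    }
}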