I am getting the following error while starting the namenode for the latest hadoop-2.2 release. I did not find a winutils exe file in the Hadoop bin folder. I tried the following commands:
$ bin/hdfs namenode -format
$ sbin/yarn-daemon.sh start resourcemanager
ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:863)
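The null in "null\bin\winutils.exe" is the clue: Hadoop's Shell.java builds that path from the hadoop.home.dir JVM property, falling back to the HADOOP_HOME environment variable, and here neither is set. As a minimal sketch, assuming a hypothetical install at C:\hadoop-2.2.0 with winutils.exe placed into its bin folder by hand (the Apache tarball does not ship it), one can set and verify the environment before launching anything Hadoop-based:

import os
from pathlib import Path

# Hypothetical location -- substitute your actual Hadoop directory.
hadoop_home = Path(r"C:\hadoop-2.2.0")

# Shell.java resolves winutils.exe as <home>\bin\winutils.exe; the "null" in
# the error means neither hadoop.home.dir nor HADOOP_HOME was available.
os.environ["HADOOP_HOME"] = str(hadoop_home)
os.environ["PATH"] = str(hadoop_home / "bin") + os.pathsep + os.environ["PATH"]

print("winutils.exe present:", (hadoop_home / "bin" / "winutils.exe").exists())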
I am working in PySpark from a Jupyter notebook (Python 2.7) on Windows 7. I have a pyspark.rdd.PipelinedRDD named idSums. When I try to execute idSums.saveAsTextFile("Output"), I get the following error:
Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001
It seems to me that there should not be any problem with the RDD object itself, because I can execute other actions without error; for example, idSums.collect() produces the correct output.
Furthermore, the Output directory is created (with all subdirectories) and the part-00001 file is created, but it is 0 bytes.
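The "(null) entry in command string: null chmod" points at the same winutils resolution failure as above: when writing to the local filesystem on Windows, Spark's Hadoop layer shells out to %HADOOP_HOME%\bin\winutils.exe to chmod the output files, which is why collect() (no filesystem writes) succeeds while saveAsTextFile() leaves only empty part files. A minimal sketch of the usual workaround, assuming a hypothetical C:\hadoop whose bin folder contains winutils.exe; the environment must be patched before the JVM-backed SparkContext starts:

import os

# Hypothetical path -- point HADOOP_HOME at the folder whose bin\ holds winutils.exe.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

from pyspark import SparkContext  # import only after the environment is set

sc = SparkContext("local[2]", "saveAsTextFile-check")
idSums = sc.parallelize(range(10)).map(lambda x: (x % 2, x)).reduceByKey(lambda a, b: a + b)
idSums.saveAsTextFile("Output")  # should now write non-empty part files
sc.stop()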
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
Hi, I am executing the following Spark code in Eclipse on CDH 5.8 and getting a RuntimeException:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

public class HiveConnector {
    public static void main(String[] args) {
        final SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("HiveConnector");
        final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new HiveContext(sparkContext); // Hive-aware SQLContext (Spark 1.x)
        DataFrame df = sqlContext.sql("SELECT * FROM test_hive_table1");
        //df.show();
        df.count();
    }
}
According to the exception, /tmp/hive on HDFS should be writable, but we are executing the Spark job in local mode. That means it is the /tmp/hive directory on the local (Linux) filesystem, not on HDFS, that lacks write permission.
So I executed the following command to grant the permission:
$ sudo chmod -R 777 /tmp/hive
Now it works for me.
If you face the same problem while executing a Spark job in cluster mode, you should configure the following properties in the hive-site.xml file in the Hive conf folder and then restart the Hive server:
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.scratch.dir.permission</name>
  <value>777</value>
  <description>The permission for the user-specific scratch directories that …

I am a Spark noob, using Windows 10 and trying to get Spark working. I have set the environment variables correctly, and I also have winutils. When I go into spark/bin and type spark-shell, it runs Spark, but I get the errors below.
Also, it does not display the Spark context or Spark session. I followed this video step by step.
C:\Users\Akshay\Downloads\spark\bin>spark-shell
17/06/19 23:45:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/19 23:45:19 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/Akshay/Downloads/spark/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/Akshay/Downloads/spark/jars/datanucleus-api-jdo-3.2.6.jar."
17/06/19 23:45:20 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. …

I am using RStudio.
After creating a session, if I try to create a data frame from R data, I get an error.
Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7")
Sys.setenv(HADOOP_HOME = "E:/winutils")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Sys.setenv('SPARKR_SUBMIT_ARGS'='"sparkr-shell"')
library(SparkR)
sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="C:/Temp"))
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
df <- createDataFrame(localDF)
The error:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
at org.a
>
TIA.
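For what it's worth, the trace dies inside IsolatedClientLoader.createClient, i.e. while SparkR is standing up the Hive metastore client, and on Windows that is commonly the same unwritable \tmp\hive scratch directory from the earlier questions. A hedged sketch of the frequently suggested workaround (in Python, for consistency with the sketches above), run once before starting the SparkR session; the winutils.exe location is an assumption based on the HADOOP_HOME = E:/winutils set in the code:

import subprocess

# Assumption: winutils.exe sits under the HADOOP_HOME used in the question.
# winutils' built-in chmod makes the local Hive scratch dir world-writable.
subprocess.check_call([r"E:\winutils\bin\winutils.exe", "chmod", "777", r"C:\tmp\hive"])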