Related troubleshooting questions (0)

Failed to locate the winutils binary in the hadoop binary path

I'm getting the following error while starting the namenode for the latest hadoop-2.2 release. I couldn't find a winutils exe file in the hadoop bin folder. I tried the commands below:

$ bin/hdfs namenode -format
$ sbin/yarn-daemon.sh start resourcemanager

ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:863)
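The "null\bin\winutils.exe" part of the message is the giveaway: Hadoop's Shell.java builds the winutils path from HADOOP_HOME (or the hadoop.home.dir system property), and "null" means neither was set. As a rough diagnostic sketch, with C:\hadoop used purely as an example path, the expected layout can be checked from Python:

import os

# Hadoop resolves winutils as %HADOOP_HOME%\bin\winutils.exe; the "null" prefix
# in the error means HADOOP_HOME / hadoop.home.dir was never set.
hadoop_home = os.environ.get("HADOOP_HOME")     # e.g. r"C:\hadoop" (example only)
print("HADOOP_HOME =", hadoop_home)
if hadoop_home:
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    print(winutils, "exists:", os.path.exists(winutils))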

hadoop

104 votes · 7 answers · 170k views

(null) entry in command string exception in saveAsTextFile() on PySpark

I'm working in PySpark from a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When I try to execute idSums.saveAsTextFile("Output"), I get the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

It seems to me there shouldn't be any problem with the RDD object itself, since I can run other actions without errors; for example, idSums.collect() produces the correct output.

Also, the Output directory is created (with all subdirectories) and the part-00001 file is created, but it is 0 bytes.
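The "(null) entry in command string: null chmod" is the same missing-winutils symptom: on Windows, Hadoop's local file code shells out through winutils.exe, and with no HADOOP_HOME the command string begins with null. A minimal sketch of the usual workaround, assuming winutils.exe has been placed under an example folder C:\hadoop\bin (the path, app name, and output directory below are placeholders), is to set the environment before the SparkContext is created:

import os

# Example path only: any folder whose bin\ subfolder contains winutils.exe.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] += os.pathsep + r"C:\hadoop\bin"

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setMaster("local[*]").setAppName("save-test"))
rdd = sc.parallelize([("a", 1), ("b", 2)])
rdd.saveAsTextFile("Output2")   # write to a fresh directory; part files should no longer be 0 bytes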

apache-spark pyspark jupyter-notebook

13 votes · 1 answer · 10k views

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------ (on Linux)

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------

Hi, I'm running the following Spark code in Eclipse on CDH 5.8 and getting a RuntimeException:

public static void main(String[] args) {
    final SparkConf sparkConf = new SparkConf().setMaster("local").setAppName("HiveConnector");
    final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
    SQLContext sqlContext = new HiveContext(sparkContext);

    DataFrame df = sqlContext.sql("SELECT * FROM test_hive_table1");
    //df.show();
    df.count();
 }

According to the exception, /tmp/hive on HDFS should be writable, but we are running the Spark job in local mode. That means it is the /tmp/hive directory on the local (Linux) filesystem that lacks write permission, not the one on HDFS.
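A quick way to confirm that the local directory (and not HDFS) is the one at fault is to inspect its mode directly; a small Python check, assuming Python is available on the machine running the job in local mode:

import os
import stat

# In local mode the Hive scratch dir lives on the local Linux filesystem.
mode = stat.S_IMODE(os.stat("/tmp/hive").st_mode)
print(oct(mode))                          # e.g. 0o700 before the fix, 0o777 after
print(os.access("/tmp/hive", os.W_OK))    # True only if the current user can write here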

So I ran the following command to grant the permission:

$ sudo chmod -R 777 /tmp/hive

Now it works for me.

If you hit the same issue while running a Spark job in cluster mode, you should configure the following properties in the hive-site.xml file in the Hive conf folder and then restart the Hive server:

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.scratch.dir.permission</name>
    <value>777</value>
    <description>The permission for the user-specific scratch directories that …

hive hiveql apache-spark apache-spark-sql spark-dataframe

6 votes · 1 answer · 3991 views

Why does spark-shell fail with "The root scratch dir: /tmp/hive on HDFS should be writable"?

I'm a Spark newbie trying to get Spark working on Windows 10. I've set the environment variables correctly, and I also have winutils. When I go into spark/bin and type spark-shell, it runs Spark but gives the following error.

Also, it doesn't show the Spark context or the Spark session. I followed this video step by step.

C:\Users\Akshay\Downloads\spark\bin>spark-shell
    17/06/19 23:45:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/06/19 23:45:19 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/Akshay/Downloads/spark/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/Akshay/Downloads/spark/jars/datanucleus-api-jdo-3.2.6.jar."
    17/06/19 23:45:20 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. …
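On Windows, the commonly suggested remedy for this message is to make the local \tmp\hive scratch directory world-writable with winutils.exe. A hedged sketch via Python's subprocess module (the winutils location C:\hadoop\bin and the C:\tmp\hive path are assumptions; adjust them to where winutils lives and to the drive spark-shell is started from):

import subprocess

WINUTILS = r"C:\hadoop\bin\winutils.exe"    # assumed location; point this at your winutils.exe

# Grant world-writable permissions on the scratch dir, then list it to verify the new mode.
subprocess.run([WINUTILS, "chmod", "-R", "777", r"C:\tmp\hive"], check=True)
subprocess.run([WINUTILS, "ls", r"C:\tmp\hive"], check=True)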

apache-spark apache-spark-sql windows-10

4 votes · 1 answer · 6991 views

SparkR from RStudio - gives Error in invokeJava(isStatic = TRUE, className, methodName, ...):

I'm using RStudio.

After creating the session, when I try to create a DataFrame from R data, I get an error.

Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7")
Sys.setenv(HADOOP_HOME = "E:/winutils")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Sys.setenv('SPARKR_SUBMIT_ARGS'='"sparkr-shell"')

library(SparkR)

sparkR.session(sparkConfig = list(spark.sql.warehouse.dir="C:/Temp"))

localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
df <- createDataFrame(localDF)

Error:

Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
    at org.a
>

TIA.


r hiveql apache-spark sparkr apache-spark-mllib

1 vote · 1 answer · 2504 views