Posts by Sep*_*ixx

PyArrow 0.16.0: fs.HadoopFileSystem throws "HDFS connection failed"

I am currently migrating from the old (deprecated) Arrow filesystem interface:

http://arrow.apache.org/docs/python/filesystems_deprecated.html

to the new filesystem interface:

http://arrow.apache.org/docs/python/filesystems.html

I am trying to connect to HDFS with fs.HadoopFileSystem as follows:

from pyarrow import fs
import os
os.environ['HADOOP_HOME'] = '/usr/hdp/current/hadoop-client'
os.environ['JAVA_HOME'] = '/opt/jdk8'
os.environ['ARROW_LIBHDFS_DIR'] = '/usr/lib/ams-hbase/lib/hadoop-native'

fs.HadoopFileSystem("hdfs://namenode:8020?user=hdfsuser")
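Because libhdfs is loaded through JNI, the connection can only succeed if these environment variables point at real locations and the JVM can find the Hadoop jars. A minimal diagnostic sketch, assuming the variable names from the Arrow HDFS documentation (HADOOP_HOME, JAVA_HOME, the optional ARROW_LIBHDFS_DIR, and CLASSPATH) and a Linux library name of libhdfs.so; `check_hdfs_env` is a hypothetical helper, not part of pyarrow:

```python
import os


def check_hdfs_env(env=None):
    """Return a list of human-readable problems with the libhdfs environment.

    Hypothetical helper: it only inspects environment variables and the local
    filesystem; it does not import pyarrow or contact the NameNode.
    """
    env = os.environ if env is None else env
    problems = []

    # HADOOP_HOME and JAVA_HOME should both name existing directories.
    for var in ("HADOOP_HOME", "JAVA_HOME"):
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{var}={path} is not a directory")

    # ARROW_LIBHDFS_DIR is optional; if set, it should contain libhdfs.so
    # (assumed Linux file name; macOS would use libhdfs.dylib).
    libdir = env.get("ARROW_LIBHDFS_DIR")
    if libdir and not os.path.isfile(os.path.join(libdir, "libhdfs.so")):
        problems.append(f"libhdfs.so not found in ARROW_LIBHDFS_DIR={libdir}")

    # The JVM that libhdfs starts needs the Hadoop jars on CLASSPATH.
    if not env.get("CLASSPATH"):
        problems.append("CLASSPATH is not set")

    return problems
```

Called right after the os.environ assignments in the setup above, it would report "CLASSPATH is not set", since that variable is never assigned there.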

I have tried different URI combinations, and I have also replaced the URI with fs.HdfsOptions:

connection_tuple = ("namenode", 8020)
fs.HadoopFileSystem(fs.HdfsOptions(connection_tuple, user="hdfsuser"))
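Both attempts carry the same three pieces of information: host, port and user. The sketch below uses only urllib.parse to show how the URI string decomposes into the (host, port) tuple and user passed to fs.HdfsOptions; `split_hdfs_uri` is a hypothetical helper (pyarrow does its own URI parsing), and the default port 8020 is an assumption taken from the question:

```python
from urllib.parse import parse_qs, urlparse


def split_hdfs_uri(uri, default_port=8020):
    """Split an hdfs:// URI into ((host, port), user).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    # urlparse exposes the host and port of the netloc separately.
    host = parsed.hostname
    port = parsed.port or default_port
    # Query parameters such as ?user=... correspond to keyword options.
    user = parse_qs(parsed.query).get("user", [None])[0]
    return (host, port), user


connection_tuple, user = split_hdfs_uri("hdfs://namenode:8020?user=hdfsuser")
print(connection_tuple, user)  # ('namenode', 8020) hdfsuser
```

The result matches the arguments of the second attempt, which suggests the identical failures come from something that happens before host or user are used at all.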

All of the above give me the same error:

Environment variable CLASSPATH not set!
getJNIEnv: getGlobalJNIEnv failed
Environment variable CLASSPATH not set!
getJNIEnv: getGlobalJNIEnv failed
/arrow/cpp/src/arrow/filesystem/hdfs.cc:56: Failed to disconnect hdfs client: IOError: HDFS hdfsFS::Disconnect failed, errno: 255 (Unknown error 255) Please check that you are connecting to the correct HDFS …
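The first message, "Environment variable CLASSPATH not set!", comes from libhdfs: it starts a JVM through JNI, and that JVM needs the Hadoop jars on CLASSPATH. The Arrow HDFS documentation suggests filling it from `$HADOOP_HOME/bin/hdfs classpath --glob`; `--glob` expands wildcard entries, which a JVM created through JNI does not do by itself. A minimal sketch, assuming HADOOP_HOME is set as in the code above, that bin/hdfs exists under it, and that the installed Hadoop CLI supports `--glob`; `set_hadoop_classpath` is a hypothetical helper that must run before the first fs.HadoopFileSystem call, because the JVM reads CLASSPATH only when it is created:

```python
import os
import subprocess


def set_hadoop_classpath(hadoop_home=None, overwrite=False):
    """Set CLASSPATH from `hdfs classpath --glob` and return its value.

    Hypothetical helper: it shells out to the Hadoop CLI and only changes
    os.environ of the current process (and any children it spawns).
    """
    # Keep an existing CLASSPATH unless the caller asks to replace it.
    if os.environ.get("CLASSPATH") and not overwrite:
        return os.environ["CLASSPATH"]

    hadoop_home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        raise RuntimeError("HADOOP_HOME must be set to locate bin/hdfs")

    hdfs_bin = os.path.join(hadoop_home, "bin", "hdfs")
    # --glob expands entries like lib/*.jar into explicit jar paths.
    output = subprocess.check_output([hdfs_bin, "classpath", "--glob"])
    classpath = output.decode().strip()
    os.environ["CLASSPATH"] = classpath
    return classpath
```

In the question's setup it would be called right after the os.environ assignments and before fs.HadoopFileSystem(...); os.environ writes go through putenv, so libhdfs sees the new value as long as no JVM has been started yet.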

connection hadoop hdfs pyarrow
