I am trying to connect to HDFS through PyArrow, but it does not work because the libhdfs library cannot be loaded.
libhdfs.so is present both in $HADOOP_HOME/lib/native and in $ARROW_LIBHDFS_DIR.
print(os.environ['ARROW_LIBHDFS_DIR'])
fs = hdfs.connect()
bash-3.2$ ls $ARROW_LIBHDFS_DIR
examples libhadoop.so.1.0.0 libhdfs.a libnativetask.a
libhadoop.a libhadooppipes.a libhdfs.so libnativetask.so
libhadoop.so libhadooputils.a libhdfs.so.0.0.0 libnativetask.so.1.0.0
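Before calling hdfs.connect(), it can help to confirm that libhdfs.so is really visible from the directories PyArrow checks. A minimal sketch; the helper name find_libhdfs is hypothetical, not part of pyarrow:

```python
import os

def find_libhdfs(candidates=None):
    """Return the first directory that actually contains libhdfs.so,
    looking at ARROW_LIBHDFS_DIR and $HADOOP_HOME/lib/native by default."""
    if candidates is None:
        candidates = [
            os.environ.get("ARROW_LIBHDFS_DIR", ""),
            os.path.join(os.environ.get("HADOOP_HOME", ""), "lib", "native"),
        ]
    for d in candidates:
        if d and os.path.isfile(os.path.join(d, "libhdfs.so")):
            return d
    return None

# Point PyArrow at the right directory before connecting:
libdir = find_libhdfs()
if libdir:
    os.environ["ARROW_LIBHDFS_DIR"] = libdir
```

If find_libhdfs() returns None, the problem is in the environment rather than in PyArrow; note also that libhdfs needs a working JVM (JAVA_HOME) and the Hadoop CLASSPATH at load time, so a visible .so file alone is not always enough.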
The error I get:
Traceback (most recent call last):
File "wine-pred-ml.py", line 31, in <module>
fs = hdfs.connect()
File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 183, in connect
extra_conf=extra_conf)
File "/Users/PVZP/Library/Python/2.7/lib/python/site-packages/pyarrow/hdfs.py", line 37, in __init__
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable …

I have a Hadoop cluster (HDP 2.1). Everything had been working fine for a long time, but suddenly jobs started returning the following recurring error:
16/10/13 16:21:11 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:12 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:12 INFO impl.TimelineClientImpl: Timeline service address: http://dev-fiwr-bignode-12.hi.inet:8188/ws/v1/timeline/
16/10/13 16:21:13 INFO client.RMProxy: Connecting to ResourceManager at dev-fiwr-bignode-12.hi.inet/10.95.76.79:8050
16/10/13 16:21:13 INFO input.FileInputFormat: Total input paths to process : 2
16/10/13 16:21:13 INFO mapreduce.JobSubmitter: number of splits:2
16/10/13 16:21:13 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
16/10/13 16:21:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476366871137_0003
16/10/13 16:21:14 INFO impl.YarnClientImpl: …

I have developed a custom authenticator provider, and on the authentication side everything works fine: HiveServer2 starts well, and authenticated connections are properly validated. Even simple Hive queries work, such as show tables.

The problem comes when I try to execute a query from a remote Hive client. Since I connect passing my credentials (user + password... well, not really a password, it is a token, but that is not relevant here), and the Hive configuration is ready for impersonation (see below), I expected HiveServer2 to execute the query as my user. However, it runs it as the hive user, which has no permissions at all on my HDFS user space.

For instance, if I create a table:
> create external table mytable (name string, job string, age string) row format delimited fields terminated by ',' location '/user/frb/testdir'
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hive, access=EXECUTE, inode="/user/frb":frb:frb:drwxr-----
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5519)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3517)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:785)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at …

This is my first time posting here, so I apologize if I do not post properly, and sorry for my bad English.

I am trying to configure Apache Flume with an Elasticsearch sink. Everything is fine and it seems to run correctly, but there are two warnings when I start the agent; here they are:
2015-11-16 09:11:22,122 (lifecycleSupervisor-1-3) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@ce359aa counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoSuchMethodError: org.elasticsearch.common.transport.InetSocketTransportAddress.<init>(Ljava/lang/String;I)V
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.configureHostnames(ElasticSearchTransportClient.java:143)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.<init>(ElasticSearchTransportClient.java:77)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory.getClient(ElasticSearchClientFactory.java:48)
at org.apache.flume.sink.elasticsearch.ElasticSearchSink.start(ElasticSearchSink.java:357)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2015-11-16 09:11:22,137 (lifecycleSupervisor-1-3) [WARN - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:260)] Component SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@ce359aa counterGroup:{ name:null counters:{} } } stopped, since it could not be successfully started due to missing dependencies …

I have created a custom implementation of the PasswdAuthenticationProvider interface based on OAuth2. I think the code is not relevant to the problem I am facing; in any case, it can be found here.

I have configured hive-site.xml with the following properties:
<property>
<name>hive.server2.authentication</name>
<value>CUSTOM</value>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>com.telefonica.iot.cosmos.hive.authprovider.OAuth2AuthenticationProviderImpl</value>
</property>
Then I restarted the Hive service and successfully connected with a JDBC-based remote client. This is an example of a successful run, as found in /var/log/hive/hiveserver2.log:
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(66)) - Setting max total connections (500)
2016-02-01 11:52:44,515 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (HttpClientFactory.java:<init>(67)) - Setting default max connections per route (100)
2016-02-01 11:52:44,799 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(65)) - Doing request: GET https://account.lab.fiware.org/user?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx HTTP/1.1
2016-02-01 11:52:44,800 INFO [pool-5-thread-5]: authprovider.HttpClientFactory (OAuth2AuthenticationProviderImpl.java:Authenticate(76)) - Response received: {"organizations": [], "displayName": "frb", "roles": [{"name": "provider", "id": "106"}], "app_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "email": "frb@tid.es", …

When looping over files with Luigi, I do not want to be forced to save an empty file just to signal that the task has finished, and then have the next task check whether the txt file has any lines, and so on.

How can I make a task show that it succeeded (i.e. that the run method worked as expected) without outputting a file? Am I missing something here?
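One possibility is to override the task's complete() method, which is what Luigi actually consults to decide whether a task is done; the empty output file is only the default way of making complete() return True. The sketch below uses a plain-Python stand-in for luigi.Task (method names match Luigi's API) so it stays self-contained; with the real library you would subclass luigi.Task and have complete() check some external condition (a database row, an HDFS path, ...) instead of an in-memory flag:

```python
class Task:
    """Minimal stand-in for luigi.Task, just for this sketch."""
    def complete(self):
        raise NotImplementedError
    def run(self):
        raise NotImplementedError

class ProcessLines(Task):
    def __init__(self, lines):
        self.lines = lines
        self._done = False          # replaces the empty marker file

    def complete(self):
        # The scheduler calls this to decide whether run() is needed.
        return self._done

    def run(self):
        self.processed = [l.strip() for l in self.lines if l.strip()]
        self._done = True           # signal success without any output file

def build(task):
    """Mimic luigi.build(): run the task only if it is not complete."""
    if not task.complete():
        task.run()
    return task.complete()

t = ProcessLines(["a\n", "\n", "b\n"])
assert build(t) is True
assert t.processed == ["a", "b"]
```

The caveat with this pattern is idempotence: if the condition complete() checks does not survive a crash, Luigi will rerun the task, so the flag should live somewhere durable in real pipelines.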
I recently noticed that access to Cosmos's WebHDFS in FIWARE Lab is now protected with OAuth2. I know I must add an OAuth2 token to the requests in order to keep using WebHDFS, but how?

Without the token, the API always returns:
$ curl -X GET "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/gtorodelvalle?op=liststatus&user.name=gtorodelvalle"
Auth-token not found in request header
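A sketch of attaching the token with urllib; X-Auth-Token is the header name commonly used by FIWARE services, but treat that as an assumption to verify against the Cosmos documentation:

```python
import urllib.request

def webhdfs_request(path, token, op="LISTSTATUS", user="gtorodelvalle",
                    base="http://cosmos.lab.fi-ware.org:14000/webhdfs/v1"):
    """Build a WebHDFS GET request carrying the OAuth2 token in a header."""
    url = "%s%s?op=%s&user.name=%s" % (base, path, op, user)
    req = urllib.request.Request(url)
    req.add_header("X-Auth-Token", token)   # assumed header name
    return req  # send with urllib.request.urlopen(req)

req = webhdfs_request("/user/gtorodelvalle", "xxxxxxxxxxxxxx")
```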
I have deployed an Orion instance in FILAB and configured the Cygnus injector so that information is stored in Cosmos.

But... let us imagine a scenario in which the number of entities increases dramatically. In that hypothetical case, a single instance of the Orion GE would not be enough, so it would be necessary to deploy more instances.

What would the scaling procedure be? Taking into account that the maximum quota is:

VM instances: 5
VCPUs: 10
Hard disk: 100 GB
Memory: 10240 MB
Public IPs: 1

I know the quota can be changed, but what are the free-account limits?

What is the hard disk limit in the Cosmos head node? (theoretically a 5 GB quota)

Is it possible to deploy more Orion Context Broker instances behind a single public IP, or is it necessary to request several public IPs? How?

To sum up, I am asking for information about the scaling procedure for the proposed scenario and about the free-account limits (maximum possible quotas).

Thank you in advance. Kind regards.

Ramón
I am trying to execute NLTK in a Hadoop environment. The following is the command I used for the execution:

bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.4.jar -input /user/NLTK/input/ -output /user/NLTK/output1/ -file /home/hduser/softwares/NLTK/unsupervised_sentiment-master.zip -mapper /home/hduser/softwares/NLTK/unsupervised_sentiment-master/sentiment.py

unsupervised_sentiment-master.zip --- contains all the dependent files required by sentiment.py

I am getting:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Any help will be appreciated!
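Exit code 2 from PipeMapRed usually means the mapper process itself failed to start or crashed, so it is worth testing the mapper locally (cat somefile | python sentiment.py) before submitting the job. A minimal self-contained streaming-mapper skeleton that can be tested this way; the word-count logic is a hypothetical stand-in for the real scoring in sentiment.py:

```python
#!/usr/bin/env python
# Hadoop Streaming mapper skeleton: read lines from stdin,
# write key<TAB>value pairs to stdout.
import sys

def map_line(line):
    # Hypothetical stand-in for the real per-line work in sentiment.py.
    return [(word.lower(), 1) for word in line.strip().split()]

def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        for key, value in map_line(line):
            stdout.write("%s\t%d\n" % (key, value))

if __name__ == "__main__":
    main()
```

Also make sure the script is executable (chmod +x sentiment.py) and has the shebang line, and that the zip's contents are actually importable on the task nodes; shipping the archive with -archives (which unpacks it) rather than -file may be needed if the mapper imports modules from inside it.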
I need to create a resource view in CKAN 2.5, but all the API documentation says is:
ckan.logic.action.create.resource_view_create(context, data_dict)

Create a new resource view.

Parameters:
resource_id (string) – id of the resource
title (string) – the title of the view
description (string) – a description of the view (optional)
view_type (string) – type of view
config (JSON string) – options necessary to recreate a view state (optional)

Returns:
the newly created resource view

Return type:
dictionary
There is no mention of what is available for view_type, nor of how to create the JSON required for the payload. Eventually I was pointed to http://docs.ckan.org/en/latest/maintaining/data-viewer.html, and from there I could figure out that the view types are recline_view, recline_grid_view, and so on.

I tried to create a recline_view view but, as said, a JSON payload seems to be necessary:
$ curl -s -S -H "Authorization: my-api-key" "http://demo.ckan.org/api/3/action/resource_view_create?resource_id=eaf95b46-3a9f-4cbc-87cf-a6364e9581b1&title=view_test&view_type=recline_view"
"Bad request - JSON Error: No request body data"
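The "No request body data" message suggests the action expects its parameters as a JSON POST body, not as query-string parameters. A sketch with urllib, using the values from the question (not verified against a live CKAN 2.5 instance):

```python
import json
import urllib.request

def resource_view_create(api_key, resource_id, title, view_type,
                         base="http://demo.ckan.org/api/3/action"):
    """Build a POST to resource_view_create with the parameters
    in a JSON body instead of the query string."""
    payload = json.dumps({
        "resource_id": resource_id,
        "title": title,
        "view_type": view_type,  # e.g. "recline_view"
    }).encode("utf-8")
    req = urllib.request.Request(base + "/resource_view_create", data=payload)
    req.add_header("Authorization", api_key)
    req.add_header("Content-Type", "application/json")
    return req  # send with urllib.request.urlopen(req)
```

The curl equivalent would move the parameters into -d '{"resource_id": "...", "title": "view_test", "view_type": "recline_view"}' with a Content-Type: application/json header.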