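I am working on a Scala-based Apache Spark implementation that loads data from a remote location into HDFS and then ingests the data from HDFS into Hive tables.
With my first Spark job, I have loaded the data/files into a location in HDFS -
the hdfs://sandbox.hortonworks.com:8020/data/analytics/raw/ folder.
Let us consider that, after ingesting the CT_Click_Basic.csv and CT_Click_Basic1.csv.gz files, I have the following files in HDFS [the name of each file at the shared location becomes a folder name here, and its content is present in part-xxxxx files]:
[root@sandbox ~]# hdfs dfs -ls /data/analytics/raw/*/
Found 3 items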
-rw-r--r-- 3 chauhan.bhupesh hdfs 0 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/_SUCCESS
-rw-r--r-- 3 chauhan.bhupesh hdfs 8383 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/part-00000
-rw-r--r-- 3 chauhan.bhupesh hdfs 8395 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/part-00001
Found 2 items
-rw-r--r-- 3 chauhan.bhupesh hdfs 0 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic1.csv.gz/_SUCCESS
-rw-r--r-- 3 chauhan.bhupesh hdfs 16588 2017-07-27 15:02 …
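The layout described above (one folder per source file, an empty _SUCCESS marker, and the data in part-xxxxx files) can be checked from the listing itself. A minimal sketch, assuming the standard `hdfs dfs -ls` column order; the `listing` and `part_bytes` variable names are made up for illustration, and the sample lines are copied from the CT_Click_Basic.csv folder above:

```shell
# Sample lines copied from the listing above; on a real cluster this text
# would come from: hdfs dfs -ls /data/analytics/raw/CT_Click_Basic.csv/
listing='-rw-r--r-- 3 chauhan.bhupesh hdfs 0 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/_SUCCESS
-rw-r--r-- 3 chauhan.bhupesh hdfs 8383 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/part-00000
-rw-r--r-- 3 chauhan.bhupesh hdfs 8395 2017-07-27 15:02 /data/analytics/raw/CT_Click_Basic.csv/part-00001'

# Column 5 is the file size in bytes and the last column is the path.
# Only part-NNNNN files carry data, so _SUCCESS is skipped.
part_bytes=$(printf '%s\n' "$listing" |
  awk '$NF ~ /\/part-[0-9]+$/ { sum += $5 } END { print sum }')

echo "Bytes stored in part files: ${part_bytes}"
```

Any later job that reads this dataset would point at the folder (or at part-* inside it) rather than at a single file named CT_Click_Basic.csv.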
hadoop hadoop-partitioning apache-spark hadoop2 apache-spark-sql
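I am new to Kafka and Hadoop technologies. I was trying to install and run my first single-node, single-broker cluster on an AWS EC2 VM instance, and I have completed:
1) Java installation
2) ~/.bashrc and ~/.bash_profile files updated with Java-related entries
3) I am able to run the built-in ZooKeeper instance, but
4) when I try to start the Kafka broker, it throws the following error message: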
$ bin/kafka-server-start.sh config/server.properties
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0130000, 986513408, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 986513408 bytes for committing reserved memory.
An error report file with more information is saved as:
/usr/local/kafka/hs_err_pid2549.log
I am not sure what I am doing wrong. This AWS EC2 VM instance is a newly created Ubuntu t2.micro instance with an 8 GB General Purpose SSD volume.
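One possible reading of the error message: the JVM tried to commit 986513408 bytes, while a t2.micro instance has only 1 GiB of RAM in total, and kafka-server-start.sh asks for a 1 GB heap unless KAFKA_HEAP_OPTS is already set. A minimal sketch, assuming that is the cause; the 256M/128M values are illustrative guesses, not tuned settings:

```shell
# Size of the failed allocation, copied from the error message.
requested_bytes=986513408

# Convert to MiB (integer division) to compare with the instance's RAM.
requested_mib=$(( requested_bytes / 1024 / 1024 ))
echo "JVM tried to commit about ${requested_mib} MiB"

# Assumption: a smaller heap is acceptable for a first single-broker test.
# kafka-server-start.sh keeps this value instead of its own default.
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"

# Then start the broker from the same shell, as in the question:
#   bin/kafka-server-start.sh config/server.properties
```

If the broker still fails to start, checking the free memory on the instance or moving to a larger instance type would be the next things to try.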
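I am new to big data technologies / the Hadoop ecosystem.
As one of my tasks, I am trying to install and run Hue on my single-node Hadoop cluster [Apache distribution, Hadoop 2.6.0].
I installed Hue by following the instructions provided on many websites: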
sudo make install
Traceback (most recent call last):
  File "/usr/local/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 1198, in communicate
    req.respond()
  File "/usr/local/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 568, in respond
    self._respond()
  File "/usr/local/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 580, in _respond
    response = self.wsgi_app(self.environ, self.start_response)
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/wsgi.py", line 206, in __call__
    response = self.get_response(request)
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 194, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 236, in handle_uncaught_exception
    return callback(request, **param_dict)
  File "/usr/local/hue/desktop/core/src/desktop/views.py", line 304, in serve_500_error
    return render("500.mako", request, {'traceback': traceback.extract_tb(exc_info[2])})
  File "/usr/local/hue/desktop/core/src/desktop/lib/django_util.py", line 225, in render
    **kwargs)
  File "/usr/local/hue/desktop/core/src/desktop/lib/django_util.py", line 146, in _render_to_response
    return django_mako.render_to_response(template, *args, **kwargs)
  File "/usr/local/hue/desktop/core/src/desktop/lib/django_mako.py", line 125, in render_to_response
    return HttpResponse(render_to_string(template_name, data_dictionary), **kwargs)
  File "/usr/local/hue/desktop/core/src/desktop/lib/django_mako.py", line 114, in render_to_string_normal
    result = template.render(**data_dict)
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Mako-0.8.1-py2.6.egg/mako/template.py", line 443, in render
    return runtime._render(self, self.callable_, args, data)
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Mako-0.8.1-py2.6.egg/mako/runtime.py", line 786, in _render
    **_kwargs_for_callable(callable_, data))
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Mako-0.8.1-py2.6.egg/mako/runtime.py", line 818, in _render_context
    _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
  File "/usr/local/hue/build/env/lib/python2.6/site-packages/Mako-0.8.1-py2.6.egg/mako/runtime.py", line 844, in _exec_template
    callable_(context, *args, **kwargs)
  File "/tmp/tmpjqe8jG/desktop/500.mako.py", line 103, in render_body …
hadoop ×3
apache-kafka ×1
apache-spark ×1
django ×1
hadoop2 ×1
hue ×1
python-huey ×1
ubuntu-14.04 ×1