Posts by use*_*und

Spark running on a YARN cluster: exitCode = 13

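I'm a Spark/YARN newbie. When I submit a Spark job to a YARN cluster, it fails with exitCode = 13. When the same Spark job runs in local mode, everything works fine.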

The command I used:

/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode cluster --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar


Spark error log:

16/04/12 17:59:30 INFO Client:
         client token: N/A
         diagnostics: Application application_1459460037715_23007 failed 2 times due to AM Container for appattempt_1459460037715_23007_000002 exited with  exitCode: 13
For more detailed output, check application tracking page:http://b-r06f2-prod.phx2.cpe.net:8088/cluster/app/application_1459460037715_23007Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e40_1459460037715_23007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487) …

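For reference, the YARN resources this `spark-submit` command asks for can be worked out from its flags. This is a sketch that assumes Spark's default YARN memory overhead of max(10% of the heap, 384 MB) per container, and it ignores YARN's rounding of each request up to its minimum allocation size, so the real footprint would be somewhat larger.

```python
# Worked arithmetic for the spark-submit flags in the post.
# Assumption: Spark's default YARN overhead of max(10% of heap, 384 MB)
# is added to each container; YARN's rounding to its minimum allocation
# size is not modelled here.
OVERHEAD_FACTOR = 0.10
MIN_OVERHEAD_MB = 384


def container_mb(heap_mb: int) -> int:
    """Heap plus the default memory overhead, truncated to whole MB."""
    return heap_mb + max(int(OVERHEAD_FACTOR * heap_mb), MIN_OVERHEAD_MB)


NUM_EXECUTORS = 40   # --num-executors 40
EXECUTOR_CORES = 4   # --executor-cores 4

executor_mb = container_mb(22 * 1024)  # --executor-memory 22g
driver_mb = container_mb(17 * 1024)    # --driver-memory 17g; in cluster mode the driver runs inside the AM container

total_mb = NUM_EXECUTORS * executor_mb + driver_mb
total_executor_cores = NUM_EXECUTORS * EXECUTOR_CORES

print(f"per executor container: {executor_mb} MB")                 # 24780 MB
print(f"driver/AM container:    {driver_mb} MB")                   # 19148 MB
print(f"total: {total_mb} MB, {total_executor_cores} executor cores")  # 1010348 MB, 160 cores
```

In other words, the job requests roughly 1 TB of container memory and 160 executor cores, plus the AM.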

hadoop scala hadoop-yarn apache-spark

use*_*und

2016-04-13

Score: 9 · Answers: 2 · Views: 20k

How to optimize Spark SQL to run in parallel

I'm a Spark newbie and have a simple Spark application that uses Spark SQL/hiveContext to:

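1. Select data from a Hive table (1 billion rows).
2. Do some filtering and aggregation, including a row_number over a window function to select the first row, group by, count(), max(), etc.
3. Write the result to HBase (hundreds of millions of rows).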

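I submit the job to run on a YARN cluster (100 executors). It is slow, and when I look at the DAG visualization in the Spark UI, only the Hive table scan tasks seem to run in parallel; the rest, steps #2 and #3 above, run in only one instance. Shouldn't it be possible to parallelize them better?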
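Step #2 shuffles data: the window's `partition by` and the `group by` send every row with the same key to the same task, and the number of post-shuffle tasks comes from `spark.sql.shuffle.partitions` (200 by default). A minimal sketch of that hash partitioning, using `zlib.crc32` as a deterministic stand-in for Spark's own Murmur3-based hash, shows why many distinct keys spread across tasks while a single dominant key collapses into one:

```python
import zlib
from collections import Counter

# Spark SQL's default value of spark.sql.shuffle.partitions.
SHUFFLE_PARTITIONS = 200


def partition_for(key: tuple, num_partitions: int = SHUFFLE_PARTITIONS) -> int:
    """Map a partition/group key to a shuffle partition (task) index.

    zlib.crc32 is a deterministic stand-in for Spark's Murmur3-based hash;
    it is always non-negative, so a plain modulo gives a valid index.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def rows_per_task(keys, num_partitions: int = SHUFFLE_PARTITIONS) -> Counter:
    """Count how many rows each post-shuffle task would receive."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Many distinct keys: the work spreads over many tasks.
spread = rows_per_task((user_id,) for user_id in range(10_000))
# One key shared by every row: everything lands in a single task.
skewed = rows_per_task(("same-user",) for _ in range(10_000))
print(len(spread), len(skewed))
```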

The application looks like this:

Step 1:

val input = hiveContext
  .sql(
     SELECT   
           user_id  
           , address  
           , age  
           , phone_number  
           , first_name  
           , last_name  
           , server_ts   
       FROM  
       (     
           SELECT  
               user_id  
               , address  
               , age  
               , phone_number  
               , first_name  
               , last_name  
               , server_ts   
               , row_number() over 
                (partition by user_id, address,  phone_number, first_name, last_name  order by user_id, address, phone_number, first_name, last_name,  server_ts desc, age) AS rn  
           FROM  
           (  
               SELECT  
                   user_id  
                   , address  
                   , age  
                   , phone_number …
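The `row_number() over (partition by ... order by ...)` clause above, combined with keeping the first row as the post describes, leaves one row per key. A plain-Python sketch of those semantics, assuming rows are dicts and `server_ts` is numeric (e.g. epoch seconds). The partition columns repeated in the query's `order by` are dropped here because they are constant within a partition:

```python
from itertools import groupby

# PARTITION BY columns from the post's window clause.
PARTITION_COLS = ("user_id", "address", "phone_number", "first_name", "last_name")


def partition_key(row: dict) -> tuple:
    return tuple(row[c] for c in PARTITION_COLS)


def add_row_number(rows):
    """Attach rn as in row_number() OVER (PARTITION BY ... ORDER BY server_ts DESC, age).

    Assumes server_ts is numeric, so negating it gives descending order.
    """
    ordered = sorted(rows, key=lambda r: (partition_key(r), -r["server_ts"], r["age"]))
    numbered = []
    # Sorting by the partition key first makes each partition contiguous for groupby.
    for _, group in groupby(ordered, key=partition_key):
        for rn, row in enumerate(group, start=1):
            numbered.append({**row, "rn": rn})
    return numbered


def first_row_per_key(rows):
    """Keep only rn = 1: the latest row per key (smallest age on ties)."""
    return [r for r in add_row_number(rows) if r["rn"] == 1]
```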


sql parallel-processing hadoop-yarn apache-spark apache-spark-sql

use*_*und

2019-01-08

Score: 6 · Answers: 1 · Views: 5291

High GPU memory usage but low Volatile GPU-Util

Keras and DL newbie here. I want to build a model that trains on sequential text data for classification. The data looks like this:

id,text,label
1,tom.hasLunch,0
2,jerry.drinkWater,1
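A minimal sketch of reading this format and turning each `text` into a fixed-length list of word indices. Splitting on `.` is a hypothetical tokenization (the post does not say how `tom.hasLunch` is split), index 0 is reserved for padding as Keras' `Tokenizer` also does, and the 1000-word cut-off mirrors the post's choice to keep only the first 1000 words per row/ID:

```python
import csv
import io

MAX_WORDS = 1000  # the post keeps only the first 1000 words per row/ID
PAD = 0           # index 0 reserved for padding; unknown words also map to it (assumption)


def read_rows(text: str):
    """Parse id,text,label CSV into (id, text, label) tuples."""
    return [(int(r["id"]), r["text"], int(r["label"]))
            for r in csv.DictReader(io.StringIO(text))]


def tokenize(text: str):
    # Hypothetical: split on '.', so "tom.hasLunch" -> ["tom", "hasLunch"].
    return text.split(".")


def build_vocab(texts):
    """Assign indices 1, 2, ... to words in first-seen order."""
    vocab = {}
    for t in texts:
        for word in tokenize(t):
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(text: str, vocab: dict, max_words: int = MAX_WORDS):
    """Keep the first max_words indices, then right-pad with PAD."""
    ids = [vocab.get(w, PAD) for w in tokenize(text)][:max_words]
    return ids + [PAD] * (max_words - len(ids))


sample = "id,text,label\n1,tom.hasLunch,0\n2,jerry.drinkWater,1\n"
rows = read_rows(sample)
vocab = build_vocab(text for _, text, _ in rows)
print(encode("jerry.hasLunch", vocab, max_words=4))  # [3, 2, 0, 0]
```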

I built it with Python 3.5 and Keras 2 (TensorFlow as the backend). The model is summarized below:

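1. The first (input) layer is a word2vec embedding built from scratch, with 4332 words.
2. The second layer is a simple LSTM layer with parameters (dense_dim=100, kernel_initializer='he_normal', dropout=0.15, recurrent_dropout=0.15, implementation=2).
3. The third layer is a dropout layer: Dropout(0.3).
4. The output layer.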
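To put the claim that the parameter count is small into numbers, here is a sketch of the Keras parameter counts for this stack. It assumes an embedding dimension of 100 (the post reduces it from 300 to 100), an LSTM with 100 units (reading `dense_dim=100` as the unit count), an embedding table of exactly 4332 rows, and a hypothetical single-unit sigmoid output for the 0/1 labels; Dropout layers have no weights:

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    # One embed_dim-long vector per vocabulary entry, no bias.
    return vocab_size * embed_dim


def lstm_params(input_dim: int, units: int) -> int:
    # Keras LSTM: 4 gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    return input_dim * units + units


VOCAB_SIZE = 4332  # words in the embedding (from the post)
EMBED_DIM = 100    # reduced from 300 to 100 (from the post)
LSTM_UNITS = 100   # assumed to be what dense_dim=100 sets

counts = {
    "embedding": embedding_params(VOCAB_SIZE, EMBED_DIM),  # 433200
    "lstm": lstm_params(EMBED_DIM, LSTM_UNITS),            # 80400
    "dropout": 0,                                          # Dropout has no weights
    "output": dense_params(LSTM_UNITS, 1),                 # 101, hypothetical Dense(1, sigmoid)
}
print(sum(counts.values()))  # 513701
```

On these assumptions the embedding table holds about 84% of the weights.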

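The training data is about 30 GB. The number of parameters is not large, because I reduced the embedding dimension of the features from 300 to 100, and I only take the first 1000 words of each row/ID. After running it on an AWS EC2 p2.8xlarge instance, I found: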

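1. Low Volatile GPU-Util but high GPU memory usage. GPU-Util is usually around 30% and never above 50%. I'd like to make better use of the GPU to speed up training; one epoch currently takes about 6-7 hours.
2. CPU and memory usage are also very low, considering how powerful the instance is. It looks like only the python3 process is running; htop does show multiple threads, but CPU utilization is still very low.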