Concurrent job execution in Spark

I used input data in the following format:

0
1
2
3
4
5
…
14

Input Location: hdfs://localhost:9000/Input/datasource

I used the following code snippet to save the RDD as text files from multiple threads:

package org.apache.spark.examples;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.avro.ipc.specific.Person;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

class RunnableDemo implements Runnable
{

    private Thread t;
    private String threadName;
    private String path;
    private JavaRDD<String> javaRDD; …
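
The listing above is cut off, so the following is only a rough sketch of the general pattern it appears to be aiming for, not the asker's actual code. Spark allows several jobs to run at the same time inside one application when they are submitted from separate threads, and an `ExecutorService` is one common way to do that. To keep the sketch runnable without a cluster, the blocking Spark action is replaced by a hypothetical stand-in method `saveStandIn`; in a real job that line would be something like `javaRDD.saveAsTextFile(path)`. The class name, the method names, and the output paths are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ConcurrentSaveSketch {

    // Hypothetical stand-in for a blocking Spark action such as
    // javaRDD.saveAsTextFile(path). Here it only echoes the path back.
    static String saveStandIn(String path) {
        return path;
    }

    // Submits one task per output path to a fixed-size pool, then waits
    // for every task and returns the results in submission order.
    static List<String> runConcurrently(List<String> paths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<String> task = () -> saveStandIn(path);
                futures.add(pool.submit(task));
            }
            List<String> finished = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() blocks until the task ends and rethrows a task
                // failure wrapped in an ExecutionException.
                finished.add(future.get());
            }
            return finished;
        } finally {
            // Stop accepting new tasks and give running ones time to finish.
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical output locations, one per concurrent save.
        List<String> paths = Arrays.asList("out/part-a", "out/part-b", "out/part-c");
        System.out.println(runConcurrently(paths, 2));
        // prints [out/part-a, out/part-b, out/part-c]
    }
}
```

Collecting the `Future` objects and calling `get()` on each means a failed save surfaces as an exception instead of disappearing inside a worker thread, which is harder to guarantee when raw `Thread` objects are started by hand. Whether the jobs actually overlap on the cluster still depends on available executors and on the scheduler configuration.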

java multithreading hadoop-yarn apache-spark

Votes: 5 | Answers: 1 | Views: 6693