I have a Spark job written in Scala, which I run with:
spark-shell -i <file-name>
I need to pass command-line arguments to the job. Right now I invoke the script through a Linux task, where I do:
export INPUT_DATE=2015/04/27
and access that value through an environment-variable lookup:
System.getenv("INPUT_DATE")
Is there a better way to handle command-line arguments in spark-shell?
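One alternative (a sketch, not taken from the original job) is to pass values as JVM system properties at launch time, e.g. spark-shell --driver-java-options "-Dinput.date=2015/04/27" -i <file-name>, and resolve them in the script with an environment-variable fallback. The property name input.date is illustrative:

```java
// Sketch: resolve a job parameter from a -D system property first,
// then from an environment variable, then from a default. The names
// input.date / INPUT_DATE mirror the pattern in the question but are
// otherwise illustrative.
public class ParamResolver {
    public static String resolveInputDate(String defaultDate) {
        String fromProp = System.getProperty("input.date");
        if (fromProp != null) {
            return fromProp;
        }
        String fromEnv = System.getenv("INPUT_DATE");
        return fromEnv != null ? fromEnv : defaultDate;
    }
}
```

This keeps the existing cron setup working (the env var still wins when no property is given) while letting ad-hoc runs override the value from the command line.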
I have a MapReduce job that I am trying to migrate to PySpark. Is there a way to define the names of the output files, rather than getting part-xxxxx?
In MR, I was using the org.apache.hadoop.mapred.lib.MultipleTextOutputFormat class to achieve this.
PS: I did try the saveAsTextFile() method. For example:
lines = sc.textFile(filesToProcessStr)
counts = lines.flatMap(lambda x: re.split('[\s&]', x.strip())) \
              .saveAsTextFile("/user/itsjeevs/mymr-output")
This creates the same part-0000 files:
[13:46:25] [spark] $ hadoop fs -ls /user/itsjeevs/mymr-output/
Found 3 items
-rw-r----- 2 itsjeevs itsjeevs 0 2014-08-13 13:46 /user/itsjeevs/mymr-output/_SUCCESS
-rw-r--r-- 2 itsjeevs itsjeevs 101819636 2014-08-13 13:46 /user/itsjeevs/mymr-output/part-00000
-rw-r--r-- 2 itsjeevs itsjeevs 17682682 2014-08-13 13:46 /user/itsjeevs/mymr-output/part-00001
EDIT:
I recently came across an article that should make life much easier for Spark users.
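For reference, Spark's Scala/Java API can reproduce the MultipleTextOutputFormat behaviour through saveAsHadoopFile, but PySpark has no direct equivalent hook, so a common workaround is to rename the part-xxxxx files after the job finishes. Below is a minimal sketch using java.nio.file as a local-filesystem stand-in; on HDFS the same loop would use `hadoop fs -mv` or the Hadoop FileSystem API, and the target prefix is illustrative:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch: rename part-xxxxx files in an output directory to a custom
// prefix, leaving markers such as _SUCCESS untouched. Local-filesystem
// stand-in for the post-job rename you would run against HDFS.
public class RenameParts {
    public static void renameParts(Path outputDir, String prefix) throws IOException {
        try (Stream<Path> files = Files.list(outputDir)) {
            files.filter(p -> p.getFileName().toString().startsWith("part-"))
                 .sorted()
                 .forEach(p -> {
                     // part-00000 -> <prefix>-00000
                     String suffix = p.getFileName().toString().substring("part-".length());
                     try {
                         Files.move(p, p.resolveSibling(prefix + "-" + suffix));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}
```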
I am new to Spring Boot. Here is the problem I am trying to solve: I have an application.yml file with the following properties:
kinesis:
  streaming:
    client:
      featuretoggle:
        kinesisSenderFeature: true
I tried to access the value of kinesisSenderFeature with the following code:
@Value("${kinesis.streaming.client.featuretoggle.kinesisSenderFeature}")
private boolean featureToggle;
as well as
@Value("${kinesis.streaming.client.featuretoggle.kinesisSenderFeature}")
private Boolean featureToggle;
The PropertySourcesPlaceholderConfigurer bean is defined as:
@Bean
@Primary
public static PropertySourcesPlaceholderConfigurer propertySourcesPlaceholderConfigurer() {
    PropertySourcesPlaceholderConfigurer propertySourcesPlaceholderConfigurer = new PropertySourcesPlaceholderConfigurer();
    YamlPropertiesFactoryBean yaml = new YamlPropertiesFactoryBean();
    yaml.setResources(new ClassPathResource("application.yml"));
    propertySourcesPlaceholderConfigurer.setProperties(yaml.getObject());
    return propertySourcesPlaceholderConfigurer;
}
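For context on the conversion failure reported in this question: one common cause is the placeholder never being resolved (for instance, when a hand-rolled PropertySourcesPlaceholderConfigurer shadows Spring Boot's own application.yml loading), so the literal "${...}" string reaches the boolean field and Spring's strict String-to-Boolean conversion rejects it. A rough approximation of that conversion (my sketch, not Spring's actual source):

```java
// Rough approximation of the strict String -> Boolean conversion Spring
// applies when injecting into a boolean field. Anything outside this
// small token set -- including an unresolved "${...}" placeholder --
// fails with "Invalid boolean value".
public class StrictBoolean {
    public static boolean parse(String raw) {
        String v = raw.trim().toLowerCase();
        switch (v) {
            case "true": case "on": case "yes": case "1":
                return true;
            case "false": case "off": case "no": case "0":
                return false;
            default:
                throw new IllegalArgumentException("Invalid boolean value [" + raw + "]");
        }
    }
}
```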
When I try to build, the ApplicationContext fails to load with the following error:
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'rabbitMessageConsumer': Unsatisfied dependency expressed through field 'featureToggle'; nested exception is org.springframework.beans.TypeMismatchException: Failed to convert value of type 'java.lang.String' to required type 'java.lang.Boolean'; nested exception is java.lang.IllegalArgumentException: Invalid boolean value …

I am unable to read Avro files using the spark-avro library. These are the steps I took:
I launched the shell with spark-shell --jars avro/spark-avro_2.10-0.1.jar and executed the commands given in the project's git README:
import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val episodes = sqlContext.avroFile("episodes.avro")
The sqlContext.avroFile("episodes.avro") call fails with the following error:
scala> val episodes = sqlContext.avroFile("episodes.avro")
java.lang.IncompatibleClassChangeError: class com.databricks.spark.avro.AvroRelation has interface org.apache.spark.sql.sources.TableScan as super class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
Run Code Online (Sandbox Code Playgroud)