I am using the XStream utility to convert a POJO to XML.
But when I generate the XML and try to read it back for further processing, it throws an error:
Exception caused by : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
To solve this I searched around and found that the XML is missing its header declaration:
<?xml version="1.0" encoding="UTF-8"?>
How do I add the above header when converting a Java object to an XML file?
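As far as I know, XStream's toXML does not emit the XML declaration by default, so one common workaround is to write the prolog to the output writer yourself before marshalling. A minimal sketch (the POJO instance myPojo and the output path are placeholders):

import com.thoughtworks.xstream.XStream
import java.io.{FileOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

val xstream = new XStream()
val writer = new OutputStreamWriter(new FileOutputStream("pojo.xml"), StandardCharsets.UTF_8)
try {
  // XStream omits the prolog by default, so emit it manually first
  writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
  // then marshal the object into the same writer; myPojo is hypothetical
  xstream.toXML(myPojo, writer)
} finally writer.close()

With the prolog in place, a StAX-based reader should no longer fail with "Unexpected EOF in prolog" on a well-formed file.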
I am trying to filter the columns of a DataFrame read from Oracle, as shown below:
import org.apache.spark.sql.functions.{col, lit, when}
val df0 = df_org.filter(col("fiscal_year").isNotNull())
When I do this, I get the following error:
java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
at scala.util.Try.getOrElse(Try.scala:79)
at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
at org.apache.spark.sql.functions$.typedLit(functions.scala:113)
at org.apache.spark.sql.functions$.lit(functions.scala:96)
at org.apache.spark.sql.Column.apply(Column.scala:212)
at com.snp.processors.BenchmarkModelValsProcessor2.process(BenchmarkModelValsProcessor2.scala:80)
at com.snp.utils.Utils$$anonfun$getAllDefinedProcessors$1.apply(Utils.scala:30)
at com.snp.utils.Utils$$anonfun$getAllDefinedProcessors$1.apply(Utils.scala:30)
at com.sp.MigrationDriver$$anonfun$main$6$$anonfun$apply$2.apply(MigrationDriver.scala:140)
at com.sp.MigrationDriver$$anonfun$main$6$$anonfun$apply$2.apply(MigrationDriver.scala:140)
at scala.Option.map(Option.scala:146)
at com.sp.MigrationDriver$$anonfun$main$6.apply(MigrationDriver.scala:138)
at com.sp.MigrationDriver$$anonfun$main$6.apply(MigrationDriver.scala:135)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.MapLike$DefaultKeySet.foreach(MapLike.scala:174)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at com.sp.MigrationDriver$.main(MigrationDriver.scala:135)
at com.sp.MigrationDriver.main(MigrationDriver.scala)
Any idea what I am doing wrong here and how to fix it?
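The stack trace gives the cause away: Column.apply calls lit, which fails on a BoxedUnit. In the Scala API, isNotNull is a parameterless method on Column, so the trailing () is parsed as Column.apply(()), i.e. an attempt to build a literal from the Unit value. Dropping the parentheses should fix it:

import org.apache.spark.sql.functions.col

// isNotNull is parameterless in the Scala API; no trailing ()
val df0 = df_org.filter(col("fiscal_year").isNotNull)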
I need to insert and fetch data from a Cassandra table on AWS. My data producer is a Spring Boot application on Java 8.
So which one should I use for a stable and efficient project? The options I have (I am guessing here) are: 1. Spring Data JPA; 2. DataStax's cassandra-driver-core.
I am using spark-sql 2.4.1, spark-cassandra-connector_2.11-2.4.1.jar and Java 8. When I try to fetch data from a table, I run into:
java.io.IOException: Failed to write statements to keyspace1.model_vals. The latest exception was
An unexpected error occurred server side on cassandra-node1: com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
So how do I establish a region/DC-aware connection to the Cassandra DB from my Spark code?
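On the Spark side, the spark-cassandra-connector 2.x exposes the local data center through the spark.cassandra.connection.local_dc property, which makes the underlying driver prefer replicas in that DC. A minimal sketch, assuming the DC is named southeast-1 as in the YML below:

import org.apache.spark.sql.SparkSession

// Build a session whose Cassandra connections are DC-aware
val spark = SparkSession.builder()
  .appName("dc-aware-cassandra")
  .config("spark.cassandra.connection.host", "cassandra-node1,cassandra-node2")
  .config("spark.cassandra.connection.local_dc", "southeast-1") // DC name assumed from the config below
  .getOrCreate()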
YML
Existing
spring:
  data:
    cassandra:
      keyspace-name: raproduct
      contact-points:
        - cassandra-node1
        - cassandra-node2
      port: 9042
Changed to
spring:
  data:
    cassandra:
      connection:
        local_dc: southeast-1
      keyspace-name: raproduct
      contact-points:
        - cassandra-node1
        - cassandra-node2
      port: 9042
Question
But the changed "local_dc" is not reflected or applied. How can I do this in spring-data?
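One likely reason: spring-data-cassandra does not read an arbitrary connection.local_dc key from the YML. With the DataStax 3.x driver, the local DC is supplied through a DC-aware load-balancing policy on the Cluster, which you would wire up in your own configuration. A sketch of what the underlying driver expects (the DC name is taken from the YML above; hosts and port mirror the question):

import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy

// Build a Cluster whose load-balancing policy prefers the local DC
val cluster = Cluster.builder()
  .addContactPoints("cassandra-node1", "cassandra-node2")
  .withPort(9042)
  .withLoadBalancingPolicy(
    DCAwareRoundRobinPolicy.builder().withLocalDc("southeast-1").build())
  .build()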
I have a scenario in Spark with Scala where I need to convert an RDD[List[String]] to an RDD[String].
How can I do this?
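Flattening nested collections in an RDD is exactly what flatMap does; a minimal self-contained sketch:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flatten").master("local[*]").getOrCreate()

val lists = spark.sparkContext.parallelize(Seq(List("a", "b"), List("c"))) // RDD[List[String]]
val flat = lists.flatMap(identity)                                          // RDD[String]
flat.collect().foreach(println)                                             // prints a, b, c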
@eric, may I know why this question is off-topic?
I am using spark-sql-2.4.1v with hadoop-2.6.5.jar. I need to save the data to HDFS first and move it to Cassandra afterwards, so I am trying to save the data to HDFS as shown below:
String hdfsPath = "/user/order_items/";

cleanedDs.createOrReplaceTempView("source_tab");

// run one aggregation query per item, in parallel, all writing to the same path
givenItemList.parallelStream().forEach(item -> {
    String query = "select year, avg(" + item + ") as mean from source_tab group by year";
    Dataset<Row> resultDs = sparkSession.sql(query);
    saveDsToHdfs(hdfsPath, resultDs);
});

public static void saveDsToHdfs(String parquetFile, Dataset<Row> df) {
    df.write()
      .format("parquet")
      .mode("append")
      .save(parquetFile);
    logger.info("Saved parquet file: " + parquetFile + " successfully");
}
When I run my job on the cluster, it fails with this error:
java.io.IOException: Failed to rename FileStatus{path=hdfs:/user/order_items/_temporary/0/_temporary/attempt_20180626192453_0003_m_000007_59/part-00007.parquet; isDirectory=false; length=952309; replication=1; blocksize=67108864; modification_time=1530041098000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to hdfs:/user/order_items/part-00007.parquet …
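The rename failure is consistent with several jobs writing to the same output directory at once: each parallelStream task commits its output through the shared /user/order_items/_temporary directory, and the commits race with each other. One sketch of a workaround, in Scala, is to give every item its own subdirectory so no two writers share a _temporary directory (sparkSession and the item list mirror the names in the question; the per-item path layout is an assumption):

// placeholder items; in the question this list comes from elsewhere
val givenItemList = Seq("item1", "item2")

givenItemList.foreach { item =>
  val resultDs = sparkSession.sql(
    s"select year, avg($item) as mean from source_tab group by year")
  // one directory per item avoids the shared _temporary commit path
  resultDs.write.format("parquet").mode("append").save(s"/user/order_items/$item")
}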
I am using spark-sql 2.3.1, and I set

spark.sql.shuffle.partitions=40
in my code, and then repartition:
val partitioned_df = vals_df.repartition(col("model_id"),col("fiscal_year"),col("fiscal_quarter"))
When I check the partition count with
println(" Number of partitions : " + partitioned_df.rdd.getNumPartitions)
it prints 40, but after repartitioning the count should ideally be around 400. Why isn't the repartition taking effect here? What am I doing wrong, and how do I fix it?
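This is expected behavior: repartition with only column arguments hashes the rows into spark.sql.shuffle.partitions partitions, which is set to 40 here. To get a specific count, use the overload that takes an explicit number (400 below mirrors the count mentioned in the question):

import org.apache.spark.sql.functions.col

// The overload with an explicit count overrides spark.sql.shuffle.partitions
val partitioned_df = vals_df.repartition(400, col("model_id"), col("fiscal_year"), col("fiscal_quarter"))
println("Number of partitions: " + partitioned_df.rdd.getNumPartitions) // 400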