Posts by Dee*_*mar

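Kafka Connect HDFS sink connector fails even though the JSON data contains schema and payload fields

I am trying to move JSON data from Kafka to HDFS using the Kafka Connect HDFS sink connector.

Even though the JSON data in Kafka has both a schema and a payload, the Kafka Connect task fails with this error: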

org.apache.kafka.connect.errors.DataException: JsonConverter with schemas.enable requires \"schema\" and \"payload\" fields and may not contain additional fields.


Data in Kafka:

./bin/kafka-console-consumer --topic test_hdfs_json_schema_payload_1 --zookeeper localhost:2181 --from-beginning

{"schema": {"type": "struct","fields": [{"type": "string","optional": false,"field": "Name"}, {"type": "string","optional": false,"field": "company"}],"optional": false,"name": "Person"},"payload": {"Name": "deepak","company": "BT"}}
{"schema": {"type": "struct","fields": [{"type": "string","optional": false,"field": "Name"}, {"type": "string","optional": false,"field": "company"}],"optional": false,"name": "Person"},"payload": {"Name": "sufi","company": "BT"}}
{"schema": {"type": "struct","fields": [{"type": "string","optional": false,"field": "Name"}, {"type": "string","optional": …
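
The error message says each record must be a JSON object with exactly the top-level fields "schema" and "payload". A minimal sketch for checking a raw message string against that rule is shown below. The helper name `is_connect_envelope` is hypothetical and not part of Kafka Connect; it only mirrors the wording of the error, not the converter's actual implementation. If the key converter is also a JsonConverter with schemas enabled, it may be worth running the same check on message keys, since the console consumer output above shows only values.

```python
import json


def is_connect_envelope(raw):
    """Return True if raw parses to a JSON object whose top-level
    keys are exactly "schema" and "payload" (hypothetical helper)."""
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):
        # None (e.g. a null key) or text that is not valid JSON
        return False
    return isinstance(obj, dict) and set(obj) == {"schema", "payload"}


# First record from the topic dump above
sample = (
    '{"schema": {"type": "struct","fields": ['
    '{"type": "string","optional": false,"field": "Name"}, '
    '{"type": "string","optional": false,"field": "company"}],'
    '"optional": false,"name": "Person"},'
    '"payload": {"Name": "deepak","company": "BT"}}'
)

print(is_connect_envelope(sample))                # value in envelope form
print(is_connect_envelope('{"Name": "deepak"}'))  # plain JSON, no envelope
print(is_connect_envelope(None))                  # e.g. a null message key
```

The sample value passes the check, so under these assumptions the values themselves look well-formed and the failure may come from another part of the record or the converter configuration.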


hdfs apache-kafka apache-kafka-connect

Dee*_*mar

lucky-day

6
votes

1
answer

5280
views

PCA output in Spark does not match scikit-learn

I am trying PCA (principal component analysis) in Spark ML.

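from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

# Reuse the active session if one exists (e.g. in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

data = [(Vectors.dense([1.0, 1.0]),),
        (Vectors.dense([1.0, 2.0]),),
        (Vectors.dense([4.0, 4.0]),),
        (Vectors.dense([5.0, 4.0]),)]

df = spark.createDataFrame(data, ["features"])
# Keep only the first principal component
pca = PCA(k=1, inputCol="features", outputCol="pcaFeatures")
model = pca.fit(df)
transformed_feature = model.transform(df)
transformed_feature.show()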


Output:

+---------+--------------------+
| features|         pcaFeatures|
+---------+--------------------+
|[1.0,1.0]|[-1.3949716649258...|
|[1.0,2.0]|[-1.976209858644928]|
|[4.0,4.0]|[-5.579886659703326]|
|[5.0,4.0]|[-6.393620130910061]|
+---------+--------------------+


When I run PCA on the same data with scikit-learn, as shown below, I get different results:

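import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0]])
# Keep only the first principal component
pca = PCA(n_components=1)
pca.fit(X)
X_transformed = pca.transform(X)
# Print each input row next to its projection
for x, y in zip(X, X_transformed):
    print(x, y)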


Output:

[ 1.  1.] [-2.44120041]
[ 1.  2.] [-1.85996222]
[ 4.  4.] [ 1.74371458]
[ …
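
The two outputs above are numerically consistent with one reading, offered here as a sketch rather than a confirmed diagnosis: both libraries find the same principal direction (up to sign), but the Spark values match a projection of the raw data while the scikit-learn values match a projection of the mean-centered data. The NumPy-only code below reproduces both sets of numbers from the covariance eigenvector. The variable names and the sign convention (first coordinate of the component made positive) are choices made for this illustration, not taken from either library.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0]])

# Principal direction = eigenvector of the covariance matrix
# with the largest eigenvalue
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc = eigvecs[:, np.argmax(eigvals)]

# The sign of an eigenvector is arbitrary; pick one for repeatability
if pc[0] < 0:
    pc = -pc

uncentered = X @ pc         # projection of the raw data
centered = (X - mean) @ pc  # projection after subtracting the mean
offset = mean @ pc          # constant gap between the two projections

print("uncentered:", uncentered)
print("centered:  ", centered)
print("offset:    ", offset)
```

With this sign choice, the uncentered values equal the Spark column with the sign flipped, and the first three centered values equal the scikit-learn rows shown above. If this reading holds, subtracting the column means from the features before the Spark transform (and allowing for a sign flip) would be expected to bring the two results into agreement.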


python pca apache-spark pyspark apache-spark-ml

Dee*_*mar

2017-12-18

6
votes

1
answer

700
views

Tag statistics

apache-kafka ×1

apache-kafka-connect ×1

apache-spark ×1

apache-spark-ml ×1

hdfs ×1

pca ×1

pyspark ×1

python ×1
