When I run mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies in my project, I see the following error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (default-cli) on project beam-sdks-java-core: Some problems were encountered while processing the POMs:
[ERROR] [ERROR] Unknown packaging: bundle @ line 6, column 16: 1 problem was encountered while building the effective model for org.xerial.snappy:snappy-java:1.1.4
[ERROR] [ERROR] Unknown packaging: bundle @ line 6, column 16
Looking at Snappy's POM file, it looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
<packaging>bundle</packaging>
<description>snappy-java: A fast compression/decompression library</description>
<version>1.1.4</version>
<name>snappy-java</name>
....
Specifically, this <packaging>bundle</packaging> …
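For context, a workaround often suggested for this error (an assumption here, not verified against this particular project) is to register the Felix maven-bundle-plugin as a build extension in your own pom.xml, which teaches Maven the bundle packaging type so dependency POMs that use it can be parsed:

```xml
<build>
  <plugins>
    <!-- <extensions>true</extensions> registers the "bundle" packaging type;
         the version shown is illustrative. -->
    <plugin>
      <groupId>org.apache.felix</groupId>
      <artifactId>maven-bundle-plugin</artifactId>
      <version>3.5.1</version>
      <extensions>true</extensions>
    </plugin>
  </plugins>
</build>
```

Pinning maven-dependency-plugin to 3.1.0 instead of 3.1.1 has also been reported to sidestep the problem.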
I have a Dataflow job that is making no progress - or very slow progress - and I don't know why. How do I start investigating why the job is slow or stuck?
I can't stage a Cloud Dataflow template with Python 3.7. It fails on a parameterized argument with apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible
Staging the template with Python 2.7 works fine.
I've tried running Dataflow jobs with 3.7 and they work fine; only template staging is broken. Is Python 3.7 still unsupported for Dataflow templates, or has the staging syntax changed in Python 3?
Here is the pipeline code:
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


class WordcountOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--input',
            default='gs://dataflow-samples/shakespeare/kinglear.txt',
            help='Path of the file to read from',
            dest="input")


def main(argv=None):
    options = PipelineOptions(flags=argv)
    setup_options = options.view_as(SetupOptions)
    wordcount_options = options.view_as(WordcountOptions)
    with beam.Pipeline(options=setup_options) as p:
        lines = p | 'read' >> ReadFromText(wordcount_options.input)


if __name__ == '__main__':
    main()
Here is the full repository with the staging script: https://github.com/firemuzzy/dataflow-templates-bug-python3
There was a similar question asked before, but I'm not sure how relevant it is, since it was about Python 2.7, whereas my template stages fine under 2.7 but fails under 3.7 …
I'm running a streaming Apache Beam pipeline in Google Dataflow. It reads from Kafka and streams inserts into BigQuery.
But at the BigQuery streaming-insert step it throws a very large number of warnings -
java.lang.RuntimeException: ManagedChannel allocation site
at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init> (ManagedChannelOrphanWrapper.java:93)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:53)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:44)
at io.grpc.internal.ManagedChannelImplBuilder.build (ManagedChannelImplBuilder.java:612)
at io.grpc.internal.AbstractManagedChannelImplBuilder.build (AbstractManagedChannelImplBuilder.java:261)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel (InstantiatingGrpcChannelProvider.java:340)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.access$1600 (InstantiatingGrpcChannelProvider.java:73)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider$1.createSingleChannel (InstantiatingGrpcChannelProvider.java:214)
at com.google.api.gax.grpc.ChannelPool.create (ChannelPool.java:72)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel (InstantiatingGrpcChannelProvider.java:221)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel (InstantiatingGrpcChannelProvider.java:204)
at com.google.api.gax.rpc.ClientContext.create (ClientContext.java:169)
at com.google.cloud.bigquery.storage.v1beta2.stub.GrpcBigQueryWriteStub.create (GrpcBigQueryWriteStub.java:138)
at com.google.cloud.bigquery.storage.v1beta2.stub.BigQueryWriteStubSettings.createStub (BigQueryWriteStubSettings.java:145)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.<init> (BigQueryWriteClient.java:128)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.create (BigQueryWriteClient.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.newBigQueryWriteClient (BigQueryServicesImpl.java:1255)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.access$800 (BigQueryServicesImpl.java:135)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:521)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:449)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.getDatasetService (BigQueryServicesImpl.java:169) …

I want to find the female employees from two different JSON files, select only the fields we are interested in, and write the output to another JSON.
Also, I'm trying to implement this on Google Cloud Platform using Dataflow. Can someone provide sample Java code that achieves this result?
Employee JSON
{"emp_id":"OrgEmp#1","emp_name":"Adam","emp_dept":"OrgDept#1","emp_country":"USA","emp_gender":"female","emp_birth_year":"1980","emp_salary":"$100000"}
{"emp_id":"OrgEmp#1","emp_name":"Scott","emp_dept":"OrgDept#3","emp_country":"USA","emp_gender":"male","emp_birth_year":"1985","emp_salary":"$105000"}
Department JSON
{"dept_id":"OrgDept#1","dept_name":"Account","dept_start_year":"1950"}
{"dept_id":"OrgDept#2","dept_name":"IT","dept_start_year":"1990"}
{"dept_id":"OrgDept#3","dept_name":"HR","dept_start_year":"1950"}
The expected output JSON file should look like this:
{"emp_id":"OrgEmp#1","emp_name":"Adam","dept_name":"Account","emp_salary":"$100000"}
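The question asks for Java, but the join logic itself is runner-independent. A minimal plain-Python sketch of the same keyed join - filter female employees, look up each one's department by dept_id, then project the fields of interest - using the sample records above (no Beam dependency, so it runs standalone):

```python
import json

employees = [
    {"emp_id": "OrgEmp#1", "emp_name": "Adam", "emp_dept": "OrgDept#1",
     "emp_gender": "female", "emp_salary": "$100000"},
    {"emp_id": "OrgEmp#1", "emp_name": "Scott", "emp_dept": "OrgDept#3",
     "emp_gender": "male", "emp_salary": "$105000"},
]
departments = [
    {"dept_id": "OrgDept#1", "dept_name": "Account"},
    {"dept_id": "OrgDept#3", "dept_name": "HR"},
]

# Key departments by dept_id (in Beam this would be a side input or CoGroupByKey).
dept_by_id = {d["dept_id"]: d["dept_name"] for d in departments}

# Filter female employees and project only the fields of interest.
output = [
    {"emp_id": e["emp_id"], "emp_name": e["emp_name"],
     "dept_name": dept_by_id[e["emp_dept"]], "emp_salary": e["emp_salary"]}
    for e in employees if e["emp_gender"] == "female"
]

lines = [json.dumps(rec) for rec in output]
```

In Beam Java the same shape is two PCollections keyed by the department id and joined via CoGroupByKey, or a side input if the department table is small enough to broadcast.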
I'm trying to plot a histogram. The data for this histogram comes from a dictionary containing a list of frequencies, and all I need is to plot:
a histogram, or
a bar chart of each element's value (a histogram can be derived from this :))
Here is a sample of what the dictionary looks like:
{0: 282, 1: 152, 2: 131, 3: 122, 4: 108, 5: 101, 6: 106, 7: 91, 8: 96, 9: 92,
...
1147: 1, 1157: 1, 1186: 1, 1217: 1, 1236: 1, 1251: 1, 1255: 1, 1291: 1, 1372: 1, 1402: 1}
Thanks a lot.
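For what it's worth, one way to sketch this with matplotlib (assuming the dict maps value → count, as in the sample) is a bar chart with the sorted keys on the x-axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# A small slice of the dictionary from the question.
freq = {0: 282, 1: 152, 2: 131, 3: 122, 1372: 1, 1402: 1}

xs = sorted(freq)            # the values, in ascending order
ys = [freq[x] for x in xs]   # their frequencies

plt.bar(xs, ys, width=1.0)
plt.xlabel("value")
plt.ylabel("frequency")
plt.savefig("histogram.png")
```

Since the counts are already tallied, a bar chart is the natural fit; plt.hist would re-bin the raw observations, which this dict no longer contains.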
I recently upgraded to Emacs 24.4, and whenever I try to execute a command in term (e.g. C-x C-f to open a file), it says C-x C-f is undefined.
How do I enable commands to run in term mode?
I'm writing a streaming Dataflow pipeline. In one of the transforms, a DoFn, I want to access an external service - in this case, Datastore.
Is there a best practice for this kind of initialization step? I don't want to create a Datastore connection object for every processElement call.
java google-cloud-datastore google-cloud-dataflow apache-beam
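The usual answer to this pattern is Beam's DoFn lifecycle: in Java, annotate a method with @Setup; in Python, override DoFn.setup(). Either runs once per DoFn instance on the worker, so the connection is shared across many process calls. A minimal runnable sketch in plain Python (the Datastore client is a stand-in stub, and the beam.DoFn base class is omitted so this runs without Beam installed):

```python
class FakeDatastoreClient:
    """Stand-in for a real Datastore client so the sketch is runnable."""
    instances = 0

    def __init__(self):
        FakeDatastoreClient.instances += 1
        self.puts = []

    def put(self, entity):
        self.puts.append(entity)


class WriteFn:  # in real Beam code: class WriteFn(beam.DoFn)
    def __init__(self):
        # Don't build clients here: the DoFn is serialized and shipped
        # to workers, and most clients don't survive pickling.
        self._client = None

    def setup(self):
        # Beam calls setup() once per DoFn instance per worker
        # (Java: a method annotated with @Setup).
        self._client = FakeDatastoreClient()

    def process(self, element):
        self._client.put(element)


# Simulate the runner: one setup(), then many process() calls.
fn = WriteFn()
fn.setup()
for e in [1, 2, 3]:
    fn.process(e)
```

If per-bundle rather than per-instance scope fits better (e.g. flushing batches), start_bundle/finish_bundle (Java: @StartBundle/@FinishBundle) are the corresponding hooks.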
I have a PCollection and I want to use ParDo to filter out some of its elements.
Is there an example of this somewhere?
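A DoFn filters simply by not emitting the elements it wants to drop. The sketch below mimics that outside Beam (plain Python, no runner); in real pipeline code the class would extend beam.DoFn and be applied with beam.ParDo(...), and beam.Filter(lambda x: ...) is an even shorter route:

```python
class KeepEvenFn:  # in real Beam code: class KeepEvenFn(beam.DoFn)
    def process(self, element):
        # Yield the element to keep it; yield nothing to drop it.
        if element % 2 == 0:
            yield element


# ParDo is essentially a flat-map over process():
fn = KeepEvenFn()
kept = [out for e in [1, 2, 3, 4, 5] for out in fn.process(e)]
```

Because process() may emit zero, one, or many outputs per input, dropping an element is just the zero-output case.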
I'm trying to run a very simple program in Apache Beam to see how it works.
import apache_beam as beam
class Split(beam.DoFn):
    def process(self, element):
        return element

with beam.Pipeline() as p:
    rows = (p | beam.io.ReadAllFromText("input.csv")
              | beam.ParDo(Split()))
When I run this program, I get the following error:
.... some more stack....
File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/util.py", line 565, in expand
windowing_saved = pcoll.windowing
File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
self.producer.inputs)
File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
return inputs[0].windowing
File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
self.producer.inputs)
File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
return inputs[0].windowing
AttributeError: 'PBegin' object has no attribute 'windowing'
Any idea what's wrong here?
Thanks