Posts by Pab*_*blo

Maven: `Unknown packaging: bundle` error from a dependency packaged as a bundle

Running mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies in my project, I see the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (default-cli) on project beam-sdks-java-core: Some problems were encountered while processing the POMs:
[ERROR] [ERROR] Unknown packaging: bundle @ line 6, column 16: 1 problem was encountered while building the effective model for org.xerial.snappy:snappy-java:1.1.4
[ERROR] [ERROR] Unknown packaging: bundle @ line 6, column 16

Looking at Snappy's POM file, it looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <packaging>bundle</packaging>
    <description>snappy-java: A fast compression/decompression library</description>
    <version>1.1.4</version>
    <name>snappy-java</name>
    ....

Specifically, this <packaging>bundle</packaging> …
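The bundle packaging type is not built into Maven; it is contributed by the Felix maven-bundle-plugin, which is why building the effective model for snappy-java fails here. A commonly suggested workaround, sketched under the assumption that you can add a build extension to your own pom.xml (the plugin version shown is an assumption), is to register the plugin with extensions enabled:

```xml
<build>
  <plugins>
    <!-- Registers the "bundle" packaging type so POMs that use it resolve. -->
    <plugin>
      <groupId>org.apache.felix</groupId>
      <artifactId>maven-bundle-plugin</artifactId>
      <version>4.2.1</version>
      <extensions>true</extensions>
    </plugin>
  </plugins>
</build>
```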

java maven

6 votes · 1 answer · 5589 views

How do I debug why my Dataflow job is stuck?

I have a Dataflow job that is making no progress, or only very slow progress, and I don't know why. How do I start investigating why the job is slow or stuck?

google-cloud-dataflow apache-beam

5 votes · 1 answer · 1434 views

RuntimeValueProviderError when creating a Google Cloud Dataflow template with Apache Beam Python

I can't stage a Cloud Dataflow template with Python 3.7. It fails on a parameterized argument with apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible

Staging templates with Python 2.7 works fine.

I've tried running Dataflow jobs with 3.7 and they work fine; only template staging is broken. Is Python 3.7 still unsupported for Dataflow templates, or has the staging syntax changed in Python 3?

Here is the pipeline code:

import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


class WordcountOptions(PipelineOptions):
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_value_provider_argument(
      '--input',
      default='gs://dataflow-samples/shakespeare/kinglear.txt',
      help='Path of the file to read from',
      dest="input")

def main(argv=None):
  options = PipelineOptions(flags=argv)
  setup_options = options.view_as(SetupOptions)

  wordcount_options = options.view_as(WordcountOptions)

  with beam.Pipeline(options=setup_options) as p:
    lines = p | 'read' >> ReadFromText(wordcount_options.input)

if __name__ == '__main__':
  main()

Here is the full repository with the staging script: https://github.com/firemuzzy/dataflow-templates-bug-python3

There was a similar question before, but I'm not sure how it's related, since it was done in Python 2.7, whereas my template stages fine under 2.7 but fails under 3.7 …

python python-3.x google-cloud-dataflow apache-beam

5 votes · 1 answer · 1036 views

Apache Beam - BigQuery streaming insert throws RuntimeException: ManagedChannel allocation site

I'm running a streaming Apache Beam pipeline in Google Dataflow. It reads from Kafka and streams the inserts into BigQuery.

But at the BigQuery streaming-insert step, it throws a large number of warnings:

    java.lang.RuntimeException: ManagedChannel allocation site
at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init> (ManagedChannelOrphanWrapper.java:93)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:53)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:44)
at io.grpc.internal.ManagedChannelImplBuilder.build (ManagedChannelImplBuilder.java:612)
at io.grpc.internal.AbstractManagedChannelImplBuilder.build (AbstractManagedChannelImplBuilder.java:261)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel (InstantiatingGrpcChannelProvider.java:340)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.access$1600 (InstantiatingGrpcChannelProvider.java:73)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider$1.createSingleChannel (InstantiatingGrpcChannelProvider.java:214)
at com.google.api.gax.grpc.ChannelPool.create (ChannelPool.java:72)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel (InstantiatingGrpcChannelProvider.java:221)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel (InstantiatingGrpcChannelProvider.java:204)
at com.google.api.gax.rpc.ClientContext.create (ClientContext.java:169)
at com.google.cloud.bigquery.storage.v1beta2.stub.GrpcBigQueryWriteStub.create (GrpcBigQueryWriteStub.java:138)
at com.google.cloud.bigquery.storage.v1beta2.stub.BigQueryWriteStubSettings.createStub (BigQueryWriteStubSettings.java:145)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.<init> (BigQueryWriteClient.java:128)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.create (BigQueryWriteClient.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.newBigQueryWriteClient (BigQueryServicesImpl.java:1255)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.access$800 (BigQueryServicesImpl.java:135)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:521)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:449)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.getDatasetService (BigQueryServicesImpl.java:169) …

google-bigquery google-cloud-dataflow apache-beam

5 votes · 1 answer · 516 views

Joining two JSONs in Google Cloud Platform using Dataflow

I want to find the female employees from two different JSON files, select only the fields we are interested in, and write the output to another JSON.

Also, I'm trying to implement this on Google Cloud Platform using Dataflow. Could someone provide sample Java code to achieve this result?

Employee JSON

{"emp_id":"OrgEmp#1","emp_name":"Adam","emp_dept":"OrgDept#1","emp_country":"USA","emp_gender":"female","emp_birth_year":"1980","emp_salary":"$100000"}
{"emp_id":"OrgEmp#1","emp_name":"Scott","emp_dept":"OrgDept#3","emp_country":"USA","emp_gender":"male","emp_birth_year":"1985","emp_salary":"$105000"}

Department JSON

{"dept_id":"OrgDept#1","dept_name":"Account","dept_start_year":"1950"}
{"dept_id":"OrgDept#2","dept_name":"IT","dept_start_year":"1990"}
{"dept_id":"OrgDept#3","dept_name":"HR","dept_start_year":"1950"}

The expected output JSON file should look like this:

{"emp_id":"OrgEmp#1","emp_name":"Adam","dept_name":"Account","emp_salary":"$100000"}
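The join itself is a CoGroupByKey over the department id. The per-record logic is sketched below in plain Python (the question asks for Java, but the record handling is the same); in Dataflow each step would become a transform, and the field names are taken from the sample data above:

```python
import json

# Abbreviated copies of the sample records above (some fields omitted).
employees = [
    {"emp_id": "OrgEmp#1", "emp_name": "Adam", "emp_dept": "OrgDept#1",
     "emp_gender": "female", "emp_salary": "$100000"},
    {"emp_id": "OrgEmp#1", "emp_name": "Scott", "emp_dept": "OrgDept#3",
     "emp_gender": "male", "emp_salary": "$105000"},
]
departments = [
    {"dept_id": "OrgDept#1", "dept_name": "Account"},
    {"dept_id": "OrgDept#3", "dept_name": "HR"},
]

# Key the department side by its id (what CoGroupByKey would do).
dept_by_id = {d["dept_id"]: d["dept_name"] for d in departments}

def join_female_employees(employees, dept_by_id):
    """Keep female employees, attach dept_name, project the wanted fields."""
    out = []
    for e in employees:
        if e["emp_gender"] != "female":
            continue
        out.append({
            "emp_id": e["emp_id"],
            "emp_name": e["emp_name"],
            "dept_name": dept_by_id.get(e["emp_dept"]),
            "emp_salary": e["emp_salary"],
        })
    return out

result = join_female_employees(employees, dept_by_id)
print(json.dumps(result[0]))
```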

json google-cloud-dataflow apache-beam

4 votes · 1 answer · 1512 views

Plotting a histogram from dictionary values in Python

I'm trying to plot a histogram. The data for it comes from a dictionary holding frequencies, and all I need is to plot:

  • a histogram, or

  • a bar chart of each element's value (a histogram can then be derived from it :))

Here is an example of what the dictionary looks like:

{0: 282, 1: 152, 2: 131, 3: 122, 4: 108, 5: 101, 6: 106, 7: 91, 8: 96, 9: 92,
...
1147: 1, 1157: 1, 1186: 1, 1217: 1, 1236: 1, 1251: 1, 1255: 1, 1291: 1, 1372: 1, 1402: 1}

Many thanks.
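Since the dictionary already maps each value to its frequency, the counting is done; a bar chart only needs the keys and values in a stable order. A minimal sketch, with the matplotlib call kept in a comment (matplotlib is assumed to be installed and is not exercised here):

```python
freqs = {0: 282, 1: 152, 2: 131, 3: 122, 1402: 1}  # abbreviated sample

# Sort the keys so the bars appear in order; the dict values are already
# the bar heights, so no further counting is needed.
xs = sorted(freqs)
ys = [freqs[x] for x in xs]

# With matplotlib (assumed installed), the plot is then:
#   import matplotlib.pyplot as plt
#   plt.bar(xs, ys, width=1.0)
#   plt.xlabel("value"); plt.ylabel("frequency")
#   plt.show()
```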

python plot dictionary

3 votes · 1 answer · 3149 views

Emacs term mode doesn't recognize commands

I recently upgraded to Emacs 24.4, and whenever I try to execute a command in term (for example, C-x C-f to open a file), it says C-x C-f is undefined.

How can I get commands to work in term mode?

emacs

3 votes · 1 answer · 366 views

Initializing external service connections in Beam

I'm writing a streaming Dataflow pipeline. In one of the transforms, a DoFn, I want to access an external service (in this case, Datastore).

Is there a best practice for this kind of initialization step? I don't want to create a Datastore connection object on every processElement call.
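In Beam, a DoFn has lifecycle methods (setup/teardown per instance, start_bundle/finish_bundle per bundle) that exist for exactly this: create the connection once and reuse it across processElement calls. The shape of that pattern is sketched in plain Python with a stand-in client so it runs standalone (the class and client names here are hypothetical):

```python
class DatastoreLookupFn:  # in Beam, this would subclass beam.DoFn
    """Create the expensive client once per DoFn instance, not per element."""

    def __init__(self, make_client):
        self._make_client = make_client  # factory injected so the sketch is testable
        self._client = None

    def setup(self):
        # Beam calls setup() once when the DoFn instance is initialized on a
        # worker; teardown() is the matching place to release the connection.
        self._client = self._make_client()

    def process(self, element):
        yield self._client.lookup(element)


class FakeClient:
    """Stand-in for a real Datastore client (hypothetical)."""
    created = 0

    def __init__(self):
        FakeClient.created += 1

    def lookup(self, key):
        return ("found", key)


fn = DatastoreLookupFn(FakeClient)
fn.setup()  # the runner calls this once, before any elements arrive
results = [r for e in ["k1", "k2", "k3"] for r in fn.process(e)]
```

Note that only one client is created even though three elements are processed, which is the point of the lifecycle hooks.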

java google-cloud-datastore google-cloud-dataflow apache-beam

3 votes · 1 answer · 1265 views

How to filter elements of a PCollection with ParDo using the Apache Beam Python SDK

I have a PCollection, and I want to use ParDo to filter out some of its elements.

Is there an example of how to do this?
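In the Python SDK, a DoFn's process method is a generator, so filtering with ParDo just means yielding only the elements that pass a predicate; beam.Filter is the shorthand for the same thing. The generator logic below runs standalone, with the assumed Beam wiring shown in comments:

```python
def keep_even(element):
    """process()-style body of a filtering DoFn: emit only what passes."""
    if element % 2 == 0:
        yield element

# With Beam (assumed available), this becomes either of:
#   pcoll | beam.ParDo(KeepEvenFn())            # a DoFn whose process() is above
#   pcoll | beam.Filter(lambda x: x % 2 == 0)   # equivalent shorthand

filtered = [out for e in [1, 2, 3, 4, 5, 6] for out in keep_even(e)]
```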

google-cloud-dataflow apache-beam

3 votes · 1 answer · 2617 views

Pipeline fails when adding the ReadAllFromText transform

I'm trying to run a very simple program in Apache Beam to try out how it works.

import apache_beam as beam


class Split(beam.DoFn):
    def process(self, element):
        return element


with beam.Pipeline() as p:
    rows = (p | beam.io.ReadAllFromText(
        "input.csv") | beam.ParDo(Split()))

When I run this program, I get the following error:

.... some more stack....
 File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/util.py", line 565, in expand
    windowing_saved = pcoll.windowing
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
    self.producer.inputs)
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
    return inputs[0].windowing
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
    self.producer.inputs)
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
    return inputs[0].windowing
AttributeError: 'PBegin' object has no attribute 'windowing'

Any idea what's going wrong here?

Thanks
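The traceback points at the cause: ReadAllFromText expands a PCollection of file patterns, so it cannot be applied directly at the pipeline root (a PBegin); ReadFromText is the root-level transform. The corrected wiring is sketched in comments (apache_beam assumed), and the Split logic, which should return an iterable of outputs rather than the raw element, runs standalone below (the comma-separated schema is an assumption):

```python
# Corrected wiring, sketched (apache_beam assumed):
#   rows = (p
#           | beam.io.ReadFromText("input.csv")   # root-level read
#           | beam.ParDo(Split()))
# ReadAllFromText instead follows an upstream PCollection of patterns, e.g.
#   p | beam.Create(["input.csv"]) | beam.io.ReadAllFromText()

def split_row(line):
    # process()-style body for Split: one list of fields per input line.
    return [line.split(",")]

rows = [out for line in ["a,b", "c,d"] for out in split_row(line)]
```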

apache-beam apache-beam-io

3 votes · 1 answer · 1396 views