Tag: flume-twitter

Avro text files generated by the Flume Twitter agent cannot be read in Java

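I cannot read or parse the files created by streaming Twitter data with the Flume Twitter agent, either with Java or with Avro Tools. My requirement is to convert the Avro format to JSON.

With either approach, I get this exception: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40

I am using a vanilla Hadoop configuration on a pseudo-node cluster, and the Hadoop version is 2.7.1.

The Flume version is 1.6.0.

The Flume configuration file for the Twitter agent and the Java code that parses the Avro file are below: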

TwitterAgent.sources=Twitter
TwitterAgent.channels=MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel

TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxxxxx

TwitterAgent.sources.Twitter.keywords=Modi,PMO,Narendra Modi,BJP

TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/ashish/Twitter_Data
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10
TwitterAgent.sinks.HDFS.hdfs.rollInterval=30
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class AvroReader {

    public static void main(String[] args) throws IOException {
        Path path = new Path("hdfs://localhost:9000/user/ashish/Twitter_Data/FlumeData.1449656815028"); …
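
One possible diagnostic for the "Block size invalid" error, offered as an assumption rather than a confirmed cause: if several self-contained Avro containers were written back to back into one file, a reader would meet a second header where it expects a data block. The sketch below counts occurrences of the Avro container magic bytes (`Obj` followed by 0x01) in a local copy of one FlumeData file. The class name `AvroMagicScan` and the local-copy workflow are hypothetical and do not come from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AvroMagicScan {
    // An Avro object container file starts with the four bytes 'O', 'b', 'j', 0x01.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Counts how many times the Avro magic bytes occur in the given data. */
    static int countMagic(byte[] data) {
        int count = 0;
        for (int i = 0; i + MAGIC.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[i + j] != MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java AvroMagicScan <local-copy-of-FlumeData-file>");
            return;
        }
        // Assumption: the file was first copied from HDFS to the local disk.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Avro headers found: " + countMagic(data));
    }
}
```

A count above 1 hints at concatenated containers, although the same four bytes can also occur by chance inside record data, so the result is only a heuristic.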

java flume avro flume-ng flume-twitter

7 votes, 1 answer, 676 views

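Flume does not process keywords from the Twitter source with Hadoop 2.5 cdh5.3

I am trying to process some Twitter keywords with MemChannel and HDFS. However, after flume-ng shows the HDFS started status on the console, no further progress is shown.

Here are the contents of the /etc/flume-ns/conf/flume-env.sh file.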

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy …

flume-ng flume-twitter

5 votes, 1 answer, 3444 views

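Expected timestamp in the Flume event headers, but it was null

I am using Flume with the configuration below to push a Twitter feed to HDFS, but I get the error "Expected timestamp in the Flume event headers, but it was null".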

twitter.conf

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken =  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

Command used to run the agent:

$ flume-ng agent --conf-file twitter.conf --name TwitterAgent

where twitter.conf is the name of my configuration file.

But I get the following error:

java.lang.NullPointerException: …
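
The `hdfs.path` above contains the time escapes `%Y/%m/%d/%H`, which have to be filled in from a timestamp. The sketch below is a simplified simulation of that dependency, not Flume's own code: the class `TimestampHeaderSketch`, the method `resolvePath`, the `IllegalStateException`, and the fixed UTC time zone are all assumptions made for illustration. It expands the escapes from a `timestamp` header (epoch milliseconds) and fails when the header is absent. The Flume User Guide documents the `hdfs.useLocalTimeStamp` sink property and the `timestamp` interceptor as ways to supply this value.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class TimestampHeaderSketch {

    /** Expands %Y, %m, %d and %H using the "timestamp" header (epoch millis). */
    static String resolvePath(String pattern, Map<String, String> headers) {
        String ts = headers.get("timestamp");
        if (ts == null) {
            // Mirrors the situation in the question: nothing to expand the escapes with.
            throw new IllegalStateException("timestamp header is missing for " + pattern);
        }
        Date when = new Date(Long.parseLong(ts));
        return pattern
                .replace("%Y", format("yyyy", when))
                .replace("%m", format("MM", when))
                .replace("%d", format("dd", when))
                .replace("%H", format("HH", when));
    }

    // Formats in UTC so the output does not depend on the machine's time zone.
    private static String format(String datePattern, Date when) {
        SimpleDateFormat f = new SimpleDateFormat(datePattern);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(when);
    }

    public static void main(String[] args) {
        String pattern = "/user/farooque/bigdata/tweets/%Y/%m/%d/%H/";
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", String.valueOf(System.currentTimeMillis()));
        System.out.println(resolvePath(pattern, headers));

        headers.clear(); // an event without a timestamp header
        try {
            resolvePath(pattern, headers);
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

In this simulation the first call prints a dated directory and the second fails, which is the same shape of failure the question describes when no component sets the header.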

flume flume-ng flume-twitter

3 votes, 1 answer, 4990 views

Tag statistics

flume-ng ×3

flume-twitter ×3

flume ×2

avro ×1

java ×1