无法使用Flume twitter代理读取和解析通过流式传输Twitter数据创建的文件,既不使用Java也不使用Avro Tools.我的要求是将avro格式转换为JSON格式.
当使用任何一种方法时,我得到例外: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
我在伪节点集群中使用Hadoop vanilla配置,而hadoop版本是2.7.1
Flume版本是1.6.0
twitter代理的flume配置文件和解析avro文件的java代码如下:
TwitterAgent.sources=Twitter
TwitterAgent.channels=MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords=Modi,PMO,Narendra Modi,BJP
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/ashish/Twitter_Data
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10
TwitterAgent.sinks.HDFS.hdfs.rollInterval=30
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
Run Code Online (Sandbox Code Playgroud)
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
public class AvroReader {
    public static void main(String[] args) throws IOException {
        Path path = new Path("hdfs://localhost:9000/user/ashish/Twitter_Data/FlumeData.1449656815028"); …Run Code Online (Sandbox Code Playgroud) 我正在尝试用MemChannel和处理一些推特关键字HDFS.但是在控制台上flume-ng显示HDFS started状态后没有显示进一步的进展.
这是/etc/flume-ns/conf/flume-env.sh文件内容.
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy …Run Code Online (Sandbox Code Playgroud) 我使用下面的配置细节使用Flume将Twitter提要推送到HDFS,但是在Flume事件头中获得了预期的时间戳,但它是null
twitter.conf
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken =  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
Run Code Online (Sandbox Code Playgroud)
运行命令
$ flume-ng agent --conf-file twitter.conf --name TwitterAgent
Run Code Online (Sandbox Code Playgroud)
twitter.conf我的配置文件名在哪里  
但是将错误视为:
java.lang.NullPointerException: …Run Code Online (Sandbox Code Playgroud)