I want my JSON to look like this:
{
  "information": [{
    "timestamp": "xxxx",
    "feature": "xxxx",
    "ean": 1234,
    "data": "xxxx"
  }, {
    "timestamp": "yyy",
    "feature": "yyy",
    "ean": 12345,
    "data": "yyy"
  }]
}
My code so far:
import java.util.List;

public class ValueData {
    private List<ValueItems> information;

    public ValueData() {
    }

    public List<ValueItems> getInformation() {
        return information;
    }

    public void setInformation(List<ValueItems> information) {
        this.information = information;
    }

    @Override
    public String toString() {
        return String.format("{information:%s}", information);
    }
}
and
public class ValueItems {
private String timestamp;
private String feature;
private int ean;
private String data; …
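Since the class above is cut off, the following is only a minimal sketch of how the target JSON shape could be produced with the standard library alone. `ItemSketch` and `JsonSketch` are hypothetical stand-ins (not the asker's classes), and in practice a JSON library would usually handle the serialization. Note that the `toString()` shown above would not yield valid JSON, because the key and the string values are not quoted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ValueItems (the original class is truncated).
class ItemSketch {
    final String timestamp;
    final String feature;
    final int ean;
    final String data;

    ItemSketch(String timestamp, String feature, int ean, String data) {
        this.timestamp = timestamp;
        this.feature = feature;
        this.ean = ean;
        this.data = data;
    }

    // Serialize one item; strings are quoted, the int is written as a bare number.
    String toJson() {
        return "{\"timestamp\":" + JsonSketch.quote(timestamp)
                + ",\"feature\":" + JsonSketch.quote(feature)
                + ",\"ean\":" + ean
                + ",\"data\":" + JsonSketch.quote(data) + "}";
    }
}

class JsonSketch {
    // Wrap a string in double quotes, escaping quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Build {"information":[ ... ]} from a list of items.
    static String toJson(List<ItemSketch> information) {
        StringBuilder sb = new StringBuilder("{\"information\":[");
        for (int i = 0; i < information.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(information.get(i).toJson());
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<ItemSketch> items = new ArrayList<>();
        items.add(new ItemSketch("xxxx", "xxxx", 1234, "xxxx"));
        items.add(new ItemSketch("yyy", "yyy", 12345, "yyy"));
        System.out.println(toJson(items));
    }
}
```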
The log file looks like this:
Time stamp,activity,-,User,-,id,-,data
2013-01-08T16:21:35.561+0100,reminder,-,User1234,-,131235467,-,-
2013-01-02T15:57:24.024+0100,order,-,User1234,-,-,-,{items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
2013-01-08T16:21:35.561+0100,login,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,reminder,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,order,-,User45687,-,-,-,{items:[{"prd":"1315467","count": 5, "amount": 11.6},{"prd": "133545", "count": 1, "amount": 55.99}], oid: 5556}
...
...
Edit
A concrete example from this log: User1234 received a reminder with id = 131235467 and afterwards placed an order with the following data:
{items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
In this case the id of the reminder and the prd in the data are the same, so I want to sum up count * amount (here 5 * 11.6 = 58) and output it like this:
User 1234 Prdsum: 58
User45687 also placed an order, but he did not receive …
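The matching rule described in the example (sum count * amount only for ordered products whose prd equals an id the same user was reminded about) can be sketched in plain Java, outside of any MapReduce job. This is a rough sketch under stated assumptions: the class and method names are hypothetical, lines are processed in file order, the first seven fields never contain a comma, and the items in the data field always follow the prd/count/amount layout shown in the sample.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class PrdSumSketch {
    // Matches one item such as {"prd":"131235467","count": 5, "amount": 11.6}
    private static final Pattern ITEM = Pattern.compile(
            "\\{\\s*\"prd\"\\s*:\\s*\"(\\d+)\"\\s*,\\s*\"count\"\\s*:\\s*(\\d+)"
            + "\\s*,\\s*\"amount\"\\s*:\\s*([0-9.]+)\\s*\\}");

    static Map<String, Double> sumPerUser(List<String> lines) {
        Map<String, Set<String>> remindedIds = new HashMap<>();
        Map<String, Double> sums = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit 8 keeps the commas inside the data field intact.
            String[] f = line.split(",", 8);
            if (f.length < 8) {
                continue; // skip malformed or placeholder lines
            }
            String activity = f[1];
            String user = f[3];
            String id = f[5];
            String data = f[7];
            if ("reminder".equals(activity)) {
                remindedIds.computeIfAbsent(user, u -> new HashSet<>()).add(id);
            } else if ("order".equals(activity)) {
                Set<String> ids = remindedIds.getOrDefault(user, Collections.<String>emptySet());
                Matcher m = ITEM.matcher(data);
                while (m.find()) {
                    // Only count items the user was reminded about before.
                    if (ids.contains(m.group(1))) {
                        double value = Integer.parseInt(m.group(2)) * Double.parseDouble(m.group(3));
                        sums.merge(user, value, Double::sum);
                    }
                }
            }
        }
        return sums;
    }
}
```

In a MapReduce job, the same logic would most likely sit in a reducer keyed by user, with the mapper emitting the user as key and the rest of the line as value; that wiring is not shown here.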
Mapper/Reducer 1 --> (key,value)
/ | \
/ | \
Mapper/Reducer 2 | Mapper/Reducer 4
-> (oKey,oValue) | -> (xKey, xValue)
|
|
Mapper/Reducer 3
-> (aKey, aValue)
I have a log file that I aggregate with MR1. Mapper2, Mapper3 and Mapper4 take the output of MR1 as their input. The jobs are chained.
MR1 output:
User {infos of user:[{data here},{more data},{etc}]}
..
MR2 output:
timestamp idCount
..
MR3 output:
timestamp loginCount
..
MR4 output:
timestamp someCount
..
I want to combine the outputs of MR2-4 into one final output:
timestamp idCount loginCount someCount
..
..
..
Is there a way to do this without Pig or Hive? I am using Java.
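One common way to merge such outputs in plain Java is a join on the shared key, here the timestamp. The sketch below shows only the merge step on in-memory lines; `JoinSketch` is a hypothetical name, the inputs are assumed to be tab-separated "timestamp count" lines, and filling a missing count with "0" is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating a join of several key/value outputs by key.
class JoinSketch {
    // sources.get(i) holds the "timestamp<TAB>count" lines of the i-th job output.
    static List<String> join(List<List<String>> sources) {
        // TreeMap keeps the timestamps sorted, similar to sorted reducer keys.
        Map<String, String[]> byTimestamp = new TreeMap<>();
        for (int i = 0; i < sources.size(); i++) {
            for (String line : sources.get(i)) {
                String[] kv = line.split("\t", 2);
                if (kv.length < 2) {
                    continue; // skip lines without a key/value pair
                }
                String[] row = byTimestamp.computeIfAbsent(kv[0], k -> {
                    String[] r = new String[sources.size()];
                    Arrays.fill(r, "0"); // assumed default for a missing count
                    return r;
                });
                row[i] = kv[1]; // column position identifies the source job
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : byTimestamp.entrySet()) {
            out.add(e.getKey() + "\t" + String.join("\t", e.getValue()));
        }
        return out;
    }
}
```

Inside Hadoop, the same idea is usually expressed as a reduce-side join: a further job reads all three outputs (for example via `MultipleInputs`), each mapper tags its value with its source, and the reducer assembles one line per timestamp. That job setup is left out of the sketch.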
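I want to change the output separator to ; instead of a tab. I have already tried the approach from "Hadoop: key and value are tab separated in the output file. How to do it semicolon-separated?", but my output is still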
key (tab) value
I am using the Cloudera Demo (CDH 4.1.3). Here is my code:
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: Driver <in> <out>");
    System.exit(2);
}
conf.set("mapreduce.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
Job job= new Job(getConf());
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
job.waitForCompletion(true) ? 0 : 1;
I want
key;value
as my output.
The .txt file looks like this:
2013-04-10;248179;5431;5375.30€;1.49
..
..
..
I need a .csv file with a header row:
Date Visit Login Euro Rate
2013-04-10 248179 5431 5375.30€ 1.49
.. .. .. .. ..
.. .. .. .. ..
Is there a way to get this result with Bash?