fit*_*ida 1 csv apache apache-nifi
I have a CSV file with the following structure:
ERP,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ,85708,267-3352,,ALLENTON,MI,48002,810,710-0470,369-98-6555,462-11-4610,1953-05-00,F,
MARKETING,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ,85708,267-3352,,ALLENTON,MI,48002,810,710-0470,369-98-6555,462-11-4610,1953-05-00,F,
As you can see, there is no header. For your information, the first column identifies the sector the data comes from.
Depending on the first column's value (for example MARKETING or ERP), I have to send the matching rows to a different output directory.
For example, all rows with ERP go to /output/ERP/ and all rows with MARKETING go to /output/marketing/.
I have an idea of how to do it, but my problem is with the RouteOnAttribute processor I am using: I don't know how to refer to the first column and test its value (ERP or MARKETING) so that each row can then be sent to the correct output directory.
Here is my schema.
Thanks.
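Use the PartitionRecord processor in this case.
Configure the processor with Record Reader/Writer controller services. Even without a header, you can use col1, col2, ... etc. in the Avro schema.
The PartitionRecord processor then adds the partition field as an attribute holding its value; by using this attribute value we can dynamically store the files in their respective directories.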
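As a rough illustration of the headerless schema idea, the sketch below generates an Avro record schema with positional field names col1..colN. The record name `csv_row`, the helper name `build_positional_schema`, and the choice of nullable strings for every column are assumptions made for this example, not something the answer specifies.

```python
import json


def build_positional_schema(num_columns, record_name="csv_row"):
    """Build an Avro record schema with fields col1..colN, all nullable strings.

    record_name is a hypothetical name chosen for this sketch.
    """
    fields = [
        {"name": f"col{i}", "type": ["null", "string"]}
        for i in range(1, num_columns + 1)
    ]
    return {"type": "record", "name": record_name, "fields": fields}


# The sample rows in the question split into 19 comma-separated values
# (the trailing comma yields an empty last value).
schema = build_positional_schema(19)
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed JSON is the kind of text that could be pasted into a schema property of a record reader; with such a schema, the first column would be addressed as col1.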
Flow:
1.GetFile
2.PartitionRecord
3.PutFile //configure directory as /output/${<keep_partition_field_name_here>}
Refer to this link for how to configure the PartitionRecord processor.
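To make the effect of this flow concrete, here is a plain-Python sketch (not NiFi code) of what partitioning on the first column amounts to: rows are grouped by that value, and the value is then used to build the target directory. The function names, the shortened sample rows, and the extra third row are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def partition_by_first_column(csv_text):
    """Group headerless CSV rows by the value of their first column."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            groups[row[0]].append(row)
    return dict(groups)


def output_directory(partition_value, base="/output"):
    """Mimic a PutFile directory of the form /output/${partition_attribute}."""
    return f"{base}/{partition_value}"


# Shortened, partly invented sample data (hypothetical).
sample = (
    "ERP,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ\n"
    "MARKETING,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ\n"
    "ERP,A,SMITH,1 MAIN ST.,TUCSON,AZ\n"
)

partitions = partition_by_first_column(sample)
for value, rows in partitions.items():
    print(output_directory(value), len(rows))
```

In this sketch the directory name follows the column value verbatim, so MARKETING rows land in /output/MARKETING rather than /output/marketing.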
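(or)
Old approach:
Use the RouteText processor instead of the SplitText + RouteOnAttribute processors.
Configure the RouteText processor, then connect the ERP/MARKETING relationships to the PutFile processor and use the RouteText.Route attribute value to save the files dynamically into the directories.
Flow: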
1.GetFile
2.RouteText
3.PutFile //configure directory as /output/${RouteText.Route}/
You can also use the "Grouping Regular Expression" property value to create the partitions.
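The idea of grouping lines by a regular expression can be sketched in plain Python as follows. The pattern `^([^,]+),` (capture everything before the first comma) is an assumption for this example, not a value taken from the answer, and the sample lines are shortened.

```python
import re
from collections import defaultdict

# Hypothetical pattern: the capture group is the first comma-separated value.
FIRST_COLUMN = re.compile(r"^([^,]+),")


def group_lines(text, pattern=FIRST_COLUMN):
    """Group lines by the first capture group of the pattern."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = pattern.match(line)
        if match:  # lines without a match are dropped in this sketch
            groups[match.group(1)].append(line)
    return dict(groups)


# Shortened sample lines (hypothetical).
lines = (
    "ERP,J,JACKSON,TUCSON,AZ\n"
    "MARKETING,J,JACKSON,TUCSON,AZ\n"
    "ERP,J,JACKSON,ALLENTON,MI\n"
)

grouped = group_lines(lines)
for key, members in grouped.items():
    print(key, len(members))
```

Each group key plays the role of a partition name, so new sector values would get their own group without any change to the pattern.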
Note:
Using the PartitionRecord processor will be more efficient than using the RouteText processor.