Rob*_*bin 11 java csv large-files opencsv
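I am trying to read large CSV and TSV (tab-separated) files with about 1,000,000 rows or more. I tried to read a TSV file containing ~2,500,000 lines with opencsv, but it throws a java.lang.NullPointerException. It works with smaller TSV files of ~250,000 lines. So I was wondering whether there are any other libraries that support reading huge CSV and TSV files. Do you have any ideas?
For everybody who is interested in my code (I shortened it, so the try-catch is obviously invalid):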
InputStreamReader in = null;
CSVReader reader = null;
try {
    in = this.replaceBackSlashes();
    reader = new CSVReader(in, this.seperator, '\"', this.offset);
    ret = reader.readAll();
} finally {
    try {
        reader.close();
    }
}
EDIT: This is the method where I construct the InputStreamReader:
private InputStreamReader replaceBackSlashes() throws Exception {
    FileInputStream fis = null;
    Scanner in = null;
    try {
        fis = new FileInputStream(this.csvFile);
        in = new Scanner(fis, this.encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.hasNext()) {
            String nextLine = in.nextLine().replace("\\", "/");
            // nextLine = nextLine.replaceAll(" ", "");
            nextLine = nextLine.replaceAll("'", "");
            out.write(nextLine.getBytes());
            out.write("\n".getBytes());
        }
        return new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
    } catch (Exception e) {
        in.close();
        fis.close();
        this.logger.error("Problem at replaceBackSlashes", e);
    }
    throw new Exception();
}
Jer*_*kes 13
Do not use a CSV parser to parse TSV input. It will break if the TSV has fields that contain a quote character, for example.
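To illustrate the quote problem, here is a minimal sketch using only the JDK. The class `TsvQuoteDemo` and both of its methods are hypothetical names made up for this example. `splitQuoted` is a deliberately simplified quote-aware splitter and is not the algorithm of opencsv or any other real library; it only shows the general failure mode when a lone quote character appears in a tab-separated field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; neither method is taken from opencsv or uniVocity-parsers.
final class TsvQuoteDemo {

    // Plain TSV split: every tab separates fields and quote characters are ordinary data.
    // The -1 limit keeps trailing empty fields.
    static String[] splitTsv(String line) {
        return line.split("\t", -1);
    }

    // Simplified CSV-style split: a quote character toggles "inside quotes" mode,
    // and separators seen while inside quotes do not end the field.
    static List<String> splitQuoted(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The second field is a size in inches, so it contains a lone quote character.
        String line = "floppy\t3.5\"\t1.44 MB";
        System.out.println(splitTsv(line).length);               // prints 3
        System.out.println(splitQuoted(line, '\t', '"').size()); // prints 2
    }
}
```

With the quote treated as data, the line yields three fields. With the quote treated as a delimiter, the tab after it is swallowed and the last two fields merge into one.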
uniVocity-parsers comes with a TSV parser. You can parse a billion rows without problems.
Example of parsing a TSV input:
TsvParserSettings settings = new TsvParserSettings();
TsvParser parser = new TsvParser(settings);
// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new FileReader(yourFile));
If your input is too big to be kept in memory, do this:
TsvParserSettings settings = new TsvParserSettings();
// all rows parsed from your input will be sent to this processor
ObjectRowProcessor rowProcessor = new ObjectRowProcessor() {
    @Override
    public void rowProcessed(Object[] row, ParsingContext context) {
        // here is the row. Let's just print it.
        System.out.println(Arrays.toString(row));
    }
};
// the ObjectRowProcessor supports conversions from String to whatever you need:
// converts values in columns 2 and 5 to BigDecimal
rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(2, 5);
// converts the values in columns "Description" and "Model". Applies trim and to lowercase to the values in these columns.
rowProcessor.convertFields(Conversions.trim(), Conversions.toLowerCase()).set("Description", "Model");
// configures to use the RowProcessor
settings.setRowProcessor(rowProcessor);
TsvParser parser = new TsvParser(settings);
// parses everything. All rows will be pumped into your RowProcessor.
parser.parse(new FileReader(yourFile));
Disclosure: I am the author of this library. It is open source and free (Apache V2.0 license).
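The same row-at-a-time idea can be sketched with only the JDK, for cases where adding a library is not an option. This is a rough sketch under assumptions, not a replacement for a real TSV parser: the class `StreamingTsvSketch` and the method `forEachRow` are hypothetical names, rows are split on plain tabs with no escape handling, and the clean-up step simply mirrors the replacements in the question's `replaceBackSlashes()`. Unlike that method, nothing is buffered in a `ByteArrayOutputStream`, so memory use does not grow with file size. The try-with-resources block also avoids calling `close()` on a reference that may still be null, which in the question's `finally` block would itself raise a NullPointerException and hide the original failure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of uniVocity-parsers or opencsv.
final class StreamingTsvSketch {

    // Reads the input one line at a time, applies the clean-up from the question
    // (backslashes become slashes, single quotes are removed), splits on tabs,
    // and hands each row to the callback. Returns the number of rows seen.
    static long forEachRow(Reader source, Consumer<String[]> rowHandler) {
        long rows = 0;
        // try-with-resources closes the reader even when reading fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String cleaned = line.replace("\\", "/").replace("'", "");
                rowHandler.accept(cleaned.split("\t", -1));
                rows++;
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers see the real cause.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the real file here; for a file on disk,
        // Files.newBufferedReader(path, charset) could be passed instead.
        Reader sample = new StringReader("id\tpath\n1\tC:\\data\\it's.tsv\n");
        long count = forEachRow(sample, row -> System.out.println(String.join(" | ", row)));
        System.out.println(count + " rows");
    }
}
```

Only one line is held in memory at a time, so the row handler decides what to keep, for example by writing each row to a database or aggregating counters.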
I have not tried it, but I looked into superCSV a while ago.
http://sourceforge.net/projects/supercsv/
http://supercsv.sourceforge.net/
Check whether it works for you with 2.5 million lines.