Reading a CSV file with a variable number of columns using CsvBeanReader

Question

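So I'm parsing a .csv file. I took the advice of another thread somewhere on StackOverflow and downloaded SuperCSV. I've finally got almost everything working, but now I've run into an error that seems hard to fix.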

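The problem occurs because the last two columns of data may or may not be populated. Here's a sample of the .csv file, where the first line is missing the last column and the second line is complete: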

2012:07:25,11:48:20,922,"uLog.exe","",Key Down,1246,341,-1.00,-1.00,1.00,Shift
2012:07:25,11:48:21,094,"uLog.exe","",Key Down,1246,341,-1.00,-1.00,1.00,b,Shift

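From my understanding of the Super CSV Javadoc, there is no way to populate a Java bean with CsvBeanReader if the number of columns varies. This seems really dumb, because I feel these missing columns should be allowed to be null or some other default value when the bean is initialized.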

For reference, here is the complete code for my parser:

public class ULogParser {

String uLogFileLocation;
String screenRecorderFileLocation;

private static final CellProcessor[] cellProcessor = new CellProcessor[] {
    new ParseDate("yyyy:MM:dd"),
    new ParseDate("HH:mm:ss"),
    new ParseDate("SSS"),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
    new ParseInt(),
    new ParseInt(),
    new ParseDouble(),
    new ParseDouble(),
    new ParseDouble(),
    new StrMinMax(0, 100),
    new StrMinMax(0, 100),
};

public String[] header = {"Date", "Time", "Msec", "Application", "Window", "Message", "X", "Y", "RelDist", "TotalDist", "Rate", "Extra1", "Extra2"}; 

public ULogParser(String uLogFileLocation, String screenRecorderFileLocation)
{
    this.uLogFileLocation = uLogFileLocation;
    this.screenRecorderFileLocation = screenRecorderFileLocation;
}

public void parse()
{
    try {
        ICsvBeanReader reader = new CsvBeanReader(new BufferedReader(new FileReader(uLogFileLocation)), CsvPreference.STANDARD_PREFERENCE);
        reader.getCSVHeader(false); //parse past the header
        Entry entry;
        entry = reader.read(Entry.class, header, cellProcessor);
        System.out.println(entry.getApplication()); // the field is private, so go through the getter
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

public void sendToDB()
{
    Query query = new Query();
}
}

And the code for the Entry class:

public class Entry
{
private Date Date;
private Date Time;
private Date Msec;
private String Application;
private String Window;
private String Message;
private int X;
private int Y;
private double RelDist;
private double TotalDist;
private double Rate;
private String Extra1;
private String Extra2;

public Date getDate() { return Date; }
public Date getTime() { return Time; }
public Date getMsec() { return Msec; }
public String getApplication() { return Application; }
public String getWindow() { return Window; }
public String getMessage() { return Message; }
public int getX() { return X; }
public int getY() { return Y; }
public double getRelDist() { return RelDist; }
public double getTotalDist() { return TotalDist; }
public double getRate() { return Rate; }
public String getExtra1() { return Extra1; }
public String getExtra2() { return Extra2; }

public void setDate(Date Date) { this.Date = Date; }
public void setTime(Date Time) { this.Time = Time; }
public void setMsec(Date Msec) { this.Msec = Msec; }
public void setApplication(String Application) { this.Application = Application; }
public void setWindow(String Window) { this.Window = Window; }
public void setMessage(String Message) { this.Message = Message; }
public void setX(int X) { this.X = X; }
public void setY(int Y) { this.Y = Y; }
public void setRelDist(double RelDist) { this.RelDist = RelDist; }
public void setTotalDist(double TotalDist) { this.TotalDist = TotalDist; }
public void setRate(double Rate) { this.Rate = Rate; }
public void setExtra1(String Extra1) { this.Extra1 = Extra1; }
public void setExtra2(String Extra2) { this.Extra2 = Extra2; }

public Entry(){}
}

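The exception I'm receiving (note that this is a different line from the example above, one missing both of the last two columns):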

Exception in thread "main" The value array (size 12)  must match the processors array (size 13): You are probably reading a CSV line with a different number of columns than the number of cellprocessors specified context: Line: 2 Column: 0 Raw line:
[2012:07:25, 11:48:05, 740, uLog.exe,  , Logging started, -1, -1, -1.00, -1.00, -1.00, ]
 offending processor: null
    at org.supercsv.util.Util.processStringList(Unknown Source)
    at org.supercsv.io.CsvBeanReader.read(Unknown Source)
    at processing.ULogParser.parse(ULogParser.java:59)
    at ui.ParseImplicitData.main(ParseImplicitData.java:15)

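Yes, writing all those getters and setters was a pain in the ass. Also, I apologize: I probably don't follow perfect conventions in my use of SuperCSV (such as which CellProcessor to use if you just want the unmodified string), but you get the idea. This code is also obviously incomplete. For now, I'm just trying to successfully retrieve one line of data.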

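At this point, I'm wondering whether CsvBeanReader can be used for my purposes at all. If not, I'm a bit disappointed, because CsvListReader (I'd post a hyperlink, but StackOverflow won't let me, which is also dumb) is about as easy as not using the API at all and just using Scanner.next().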

Any help would be appreciated. Thanks in advance!

Answer 1

Jam*_*ett 4

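Edit: Update for Super CSV 2.0.0-beta-1

Please note that the API has changed in Super CSV 2.0.0-beta-1 (the code example below is based on 1.52). The getCSVHeader() method on all readers is now getHeader() (to be consistent with writeHeader on the writers).

Also, SuperCSVException has been renamed to SuperCsvException.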

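Edit: Update for Super CSV 2.1.0

Since version 2.1.0, it is possible to execute the cell processors after reading a line of CSV, using the new executeProcessors() methods. For more information, see the example on the project website. Please note that this is only relevant for CsvListReader, as it is the only reader that allows a variable number of columns.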
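The idea behind that update (read the raw row first, then decide which processors to run based on how many columns it has) can be sketched in plain Java. This is not the Super CSV API: the Function objects only stand in for cell processors, the name/age/city layout is made up for illustration, and the naive split assumes no field contains a quoted comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class RowProcessingSketch {

    // Hypothetical stand-ins for cell processors: one conversion per column.
    static final Function<String, Object> TEXT = s -> s;
    static final Function<String, Object> INT = Integer::valueOf;

    // Assumed layouts: name,age,city or (age missing) name,city.
    static final List<Function<String, Object>> WITH_AGE = Arrays.asList(TEXT, INT, TEXT);
    static final List<Function<String, Object>> WITHOUT_AGE = Arrays.asList(TEXT, TEXT);

    static List<Object> process(String line) {
        // Step 1: read the raw values without any processing.
        String[] raw = line.split(",", -1);

        // Step 2: choose the processors that match the column count.
        final List<Function<String, Object>> processors;
        if (raw.length == WITH_AGE.size()) {
            processors = WITH_AGE;
        } else if (raw.length == WITHOUT_AGE.size()) {
            processors = WITHOUT_AGE;
        } else {
            throw new IllegalArgumentException("unexpected number of columns: " + raw.length);
        }

        // Step 3: apply each processor to its column.
        List<Object> processed = new ArrayList<>();
        for (int i = 0; i < raw.length; i++) {
            processed.add(processors.get(i).apply(raw[i]));
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process("John,New York"));
        System.out.println(process("Sally,38,London"));
    }
}
```

As with the two-reader approach, this only works when the column count alone tells you which column is missing.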

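You're right, CsvBeanReader doesn't support CSV files with a variable number of columns. According to most CSV specifications (including RFC 4180), every line must have the same number of columns.

For this reason (speaking as a Super CSV developer) I'm reluctant to add this functionality to Super CSV. If you can think of an elegant way to add it, feel free to make a suggestion on the project's SourceForge site. It would probably mean a new reader that extends CsvBeanReader: it would have to split reading and mapping/processing into two separate methods (because you can't do any processing or mapping to the bean's fields unless you know how many columns there are).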

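Simple solution

The simple solution (if you have control over the CSV file you're working with) is to add another blank column when writing the CSV file (the first line in your example would have a comma at the end, indicating that the last column is empty). That way your CSV file will be valid (every line will have the same number of columns) and you'll be able to use CsvBeanReader as you're already doing.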
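If the program writing the file can't be changed, the same padding could be applied as a pre-processing step. Here is a minimal sketch in plain Java (no Super CSV calls). The class and method names are made up for illustration, and it assumes that each record sits on a single line and that only trailing columns are ever missing.

```java
class CsvRowPadder {

    // Counts the columns in one CSV line, ignoring commas inside double quotes.
    static int countColumns(String line) {
        int columns = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }

    // Appends one comma per missing trailing column.
    static String pad(String line, int expectedColumns) {
        int missing = expectedColumns - countColumns(line);
        if (missing < 0) {
            throw new IllegalArgumentException("too many columns: " + line);
        }
        StringBuilder padded = new StringBuilder(line);
        for (int i = 0; i < missing; i++) {
            padded.append(',');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Three columns (the comma inside the quotes does not count), padded to five.
        System.out.println(pad("a,\"b,c\",d", 5));
    }
}
```

The padded lines could then be written to a temporary file or handed to the reader through a StringReader, so every row reaches CsvBeanReader with the same number of columns.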

If that's not possible, then all is not lost!

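Fancy solution

As you've probably realized, CsvBeanReader uses the name mapping to associate each column in the CSV file with a field in your bean, and the CellProcessor array to process each column. In other words, you have to know how many columns there are (and what they represent) if you want to use it.

CsvListReader, on the other hand, is very primitive and can read lines of differing lengths (because it doesn't need to process or map them).

So you can combine all the features of CsvBeanReader with those of CsvListReader (as shown in the example below) by reading the file with both readers in parallel: CsvListReader figures out how many columns there are, and CsvBeanReader does the processing/mapping.

Note that this assumes that only the birthDate column might be absent (i.e. it won't work if you can't tell which column is missing).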

package example;

import java.io.StringReader;
import java.util.Date;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.exception.SuperCSVException;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class VariableColumns {

    private static final String INPUT = "name,birthDate,city\n"
        + "John,New York\n" 
        + "Sally,22/03/1974,London\n" 
        + "Jim,Sydney";

    // cell processors
    private static final CellProcessor[] NORMAL_PROCESSORS = 
    new CellProcessor[] {null, new ParseDate("dd/MM/yyyy"), null };
    private static final CellProcessor[] NO_BIRTHDATE_PROCESSORS = 
    new CellProcessor[] {null, null };

    // name mappings
    private static final String[] NORMAL_HEADER = 
    new String[] { "name", "birthDate", "city" };
    private static final String[] NO_BIRTHDATE_HEADER = 
    new String[] { "name", "city" };

    public static void main(String[] args) {

        // using bean reader and list reader together (to read the same file)
        final ICsvBeanReader beanReader = new CsvBeanReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);
        final ICsvListReader listReader = new CsvListReader(new StringReader(
                INPUT), CsvPreference.STANDARD_PREFERENCE);

        try {
            // skip over header
            beanReader.getCSVHeader(true);
            listReader.getCSVHeader(true);

            while (listReader.read() != null) {

                final String[] nameMapping;
                final CellProcessor[] processors;

                if (listReader.length() == NORMAL_HEADER.length) {
                    // all columns present - use normal header/processors
                    nameMapping = NORMAL_HEADER;
                    processors = NORMAL_PROCESSORS;

                } else if (listReader.length() == NO_BIRTHDATE_HEADER.length) {
                    // one less column - birth date must be missing
                    nameMapping = NO_BIRTHDATE_HEADER;
                    processors = NO_BIRTHDATE_PROCESSORS;

                } else {
                    throw new SuperCSVException(
                            "unexpected number of columns: "
                                    + listReader.length());
                }

                // can now use CsvBeanReader safely 
                // (we know how many columns there are)
                Person person = beanReader.read(Person.class, nameMapping,
                        processors);

                System.out.println(String.format(
                        "Person: name=%s, birthDate=%s, city=%s",
                        person.getName(), person.getBirthDate(),
                        person.getCity()));

            }
        } catch (Exception e) {
            // handle exceptions here
            e.printStackTrace();
        } finally {
            // close readers here
        }
    }

    public static class Person {

        private String name;
        private Date birthDate;
        private String city;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public Date getBirthDate() {
            return birthDate;
        }

        public void setBirthDate(Date birthDate) {
            this.birthDate = birthDate;
        }

        public String getCity() {
            return city;
        }

        public void setCity(String city) {
            this.city = city;
        }
    }

}

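I hope this helps.

Oh, and is there any reason why the fields in your Entry class don't follow the normal naming convention (camelCase)? If you update your header array to use camelCase, then your fields can be camelCase as well.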
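To make that concrete, here is a trimmed-down sketch of a camelCase Entry with a matching header array. The setterName helper assumes the usual JavaBean convention ("set" plus the property name with its first letter capitalized). That is an assumption about how a name mapping resolves to setters, not something confirmed in this thread, and only three of the thirteen fields are shown.

```java
import java.lang.reflect.Method;
import java.util.Date;

class CamelCaseEntrySketch {

    // Trimmed-down version of the question's Entry bean, renamed to camelCase.
    static class Entry {
        private Date date;
        private String application;
        private int x;

        public Date getDate() { return date; }
        public void setDate(Date date) { this.date = date; }
        public String getApplication() { return application; }
        public void setApplication(String application) { this.application = application; }
        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
    }

    // Name mapping written in camelCase to match the fields above.
    static final String[] HEADER = { "date", "application", "x" };

    // Assumed JavaBean convention: "application" -> "setApplication".
    static String setterName(String property) {
        return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Checks whether the bean exposes a public one-argument setter for the property.
    static boolean hasSetter(Class<?> beanClass, String property) {
        String wanted = setterName(property);
        for (Method method : beanClass.getMethods()) {
            if (method.getName().equals(wanted) && method.getParameterCount() == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String name : HEADER) {
            System.out.println(name + " -> " + setterName(name) + ": " + hasSetter(Entry.class, name));
        }
    }
}
```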
