JavaScript code to parse CSV data

Pie*_*ois 202 javascript csv

Does anybody know where I could find some JavaScript code to parse CSV data?

Kir*_*tan 247

You can use the CSVToArray() function mentioned in this blog entry.

<script type="text/javascript">
    // ref: http://stackoverflow.com/a/1293163/2343
    // This will parse a delimited string into an array of
    // arrays. The default delimiter is the comma, but this
    // can be overridden in the second argument.
    function CSVToArray( strData, strDelimiter ){
        // Check to see if the delimiter is defined. If not,
        // then default to comma.
        strDelimiter = (strDelimiter || ",");

        // Create a regular expression to parse the CSV values.
        var objPattern = new RegExp(
            (
                // Delimiters.
                "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +

                // Quoted fields.
                "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +

                // Standard fields.
                "([^\"\\" + strDelimiter + "\\r\\n]*))"
            ),
            "gi"
            );


        // Create an array to hold our data. Give the array
        // a default empty first row.
        var arrData = [[]];

        // Create an array to hold our individual pattern
        // matching groups.
        var arrMatches = null;


        // Keep looping over the regular expression matches
        // until we can no longer find a match.
        while (arrMatches = objPattern.exec( strData )){

            // Get the delimiter that was found.
            var strMatchedDelimiter = arrMatches[ 1 ];

            // Check to see if the given delimiter has a length
            // (is not the start of string) and if it matches
            // field delimiter. If it does not, then we know
            // that this delimiter is a row delimiter.
            if (
                strMatchedDelimiter.length &&
                strMatchedDelimiter !== strDelimiter
                ){

                // Since we have reached a new row of data,
                // add an empty row to our data array.
                arrData.push( [] );

            }

            var strMatchedValue;

            // Now that we have our delimiter out of the way,
            // let's check to see which kind of value we
            // captured (quoted or unquoted).
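            // Compare against undefined rather than testing truthiness, so that
            // an empty quoted value ("") is still treated as quoted and yields "".
            if (arrMatches[ 2 ] !== undefined){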

                // We found a quoted value. When we capture
                // this value, unescape any double quotes.
                strMatchedValue = arrMatches[ 2 ].replace(
                    new RegExp( "\"\"", "g" ),
                    "\""
                    );

            } else {

                // We found a non-quoted value.
                strMatchedValue = arrMatches[ 3 ];

            }


            // Now that we have our value string, let's add
            // it to the data array.
            arrData[ arrData.length - 1 ].push( strMatchedValue );
        }

        // Return the parsed data.
        return( arrData );
    }

</script>

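  • There's a bug in the regex: `"([^\"\\"` should be `"([^\\"`. Otherwise a double quote anywhere in an unquoted value will end it prematurely. Found this the hard way... (6 upvotes)
  • It gives `undefined` for empty fields that are quoted. For example, `CSVToArray("4,,6")` gives me `[["4","","6"]]`, but `CSVToArray("4,\"\",6")` gives me `[["4",undefined,"6"]]`. (4 upvotes)
  • I've had issues with this in Firefox, where the script became unresponsive. It only seemed to affect a few users, so I couldn't find the cause. (3 upvotes)
  • Borrowed from @JoshMc (thanks!) and added header capability and more robust character escaping. See https://gist.github.com/plbowers/7560ae793613ee839151624182133159 (3 upvotes)
  • This handles embedded commas, quotes, and line breaks, e.g.: var csv = 'id, value\n1, James\n02,"Jimmy Smith, Esq."\n003,"James ""Jimmy"" Smith, III"\n0004,"James\nSmith\nWuz Here"' var array = CSVToArray(csv, ","); (2 upvotes)
  • For anyone looking for a reduced version of the above method, with the above regex fix applied: https://gist.github.com/Jezternz/c8e9fafc2c114e079829974e3764db75 (2 upvotes)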
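
The empty-quoted-field report in the comments above can be reproduced in isolation. The sketch below uses a cut-down version of the answer's field pattern (comma delimiter only, one field at a time); `fieldPattern` and `readField` are hypothetical names, not part of the answer. It shows that a truthiness test on the quoted capture group cannot tell an empty quoted field from an unquoted one, whereas a comparison with `undefined` can:

```javascript
// Cut-down field pattern: group 1 = quoted content, group 2 = unquoted content.
var fieldPattern = /^(?:"([^"]*(?:""[^"]*)*)"|([^",\r\n]*))$/;

// Hypothetical helper: returns the value of a single field.
function readField(field, strict) {
    var m = fieldPattern.exec(field);
    if (m === null) { return undefined; }
    // strict: check whether the quoted group took part in the match;
    // non-strict: check truthiness, which fails for "" because '' is falsy.
    var isQuoted = strict ? m[1] !== undefined : Boolean(m[1]);
    return isQuoted ? m[1].replace(/""/g, '"') : m[2];
}

console.log(readField('""', false));       // undefined
console.log(readField('""', true));        // (an empty string)
console.log(readField('"a ""b"""', true)); // a "b"
```

Whether the unmatched group is reported as `undefined` depends on the JavaScript engine following the standard here; current engines do.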

Eva*_*ice 145

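I think I can sufficiently beat Kirtan's answer.

Enter jQuery-CSV

It's a jQuery plugin designed to work as an end-to-end solution for parsing CSV into JavaScript data. It handles every edge case presented in RFC 4180, as well as some cases the specification misses that pop up in Excel/Google Spreadsheet exports (mostly involving null values).

Example:

track,artist,album,year
Dangerous,'Busta Rhymes','When Disaster Strikes',1997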

// calling this
music = $.csv.toArrays(csv)

// outputs...
[
  ["track","artist","album","year"],
  ["Dangerous","Busta Rhymes","When Disaster Strikes","1997"]
]

console.log(music[1][2]) // outputs: 'When Disaster Strikes'

Update:

Oh yeah, I should also mention that it's completely configurable.

music = $.csv.toArrays(csv, {
  delimiter:"'", // sets a custom value delimiter character
  separator:';', // sets a custom field separator character
});

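Update 2:

It now works with jQuery on Node.js too, so you can use the same library for either client-side or server-side parsing.

Update 3:

Since the Google Code shutdown, jquery-csv has been migrated to GitHub.

Disclaimer: I am also the author of jQuery-CSV.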

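  • Why is it jQuery csv? Why does it depend on jQuery? I've had a quick scan through the source... it doesn't look like you're using jQuery. (28 upvotes)
  • @paulslater19 The plugin doesn't depend on jQuery. Rather, it follows the common jQuery development guidelines. All of the included methods are static and reside under their own namespace (i.e. $.csv). To use them without jQuery, simply create a global $ object that the plugin will bind to during initialization. (17 upvotes)
  • Does `csv` in the solution code refer to the `.csv filename`? I'm interested in a good JS/jQuery tool to parse a CSV file. (2 upvotes)
  • Given that it does not depend on jQuery, it would be better to remove the global "$" dependency and let users pass any object reference they want, perhaps defaulting to jQuery if it is available. There are other libraries that use "$", and development teams might be using them with minimal proxies of those libraries. (2 upvotes)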

And*_*ner 39

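I have an implementation as part of a spreadsheet project.

This code is not yet tested thoroughly, but anyone is welcome to use it.

As some of the answers noted, though, your implementation can be much simpler if you actually have a DSV or TSV file, as those formats disallow the record and field separators inside values. CSV, on the other hand, can have commas and newlines inside a field, which breaks most regex and split-based approaches.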
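
To make that contrast concrete, here is a minimal sketch. The function name `parseTSV` is made up for illustration, and the code assumes the data really contains no tabs or line breaks inside values. A split-based parser is enough for such a file, while the same trick applied to CSV cuts a quoted field in two:

```javascript
// Hypothetical helper: split-based parsing is safe only when values can never
// contain the field separator (tab) or the record separator (line break).
function parseTSV(text) {
    return text
        .split(/\r\n|\n|\r/)                              // one record per line
        .filter(function (line) { return line !== ''; })  // skip blank lines
        .map(function (line) { return line.split('\t'); });
}

console.log(parseTSV('id\tname\r\n1\tJimmy Smith, Esq.\r\n'));
// two rows: ['id', 'name'] and ['1', 'Jimmy Smith, Esq.']

// The same approach on CSV breaks as soon as a quoted field contains a comma:
console.log('1,"Jimmy Smith, Esq."'.split(','));
// three pieces: '1', '"Jimmy Smith' and ' Esq."'
```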

var CSV = {
parse: function(csv, reviver) {
    reviver = reviver || function(r, c, v) { return v; };
    var chars = csv.split(''), c = 0, cc = chars.length, start, end, table = [], row;
    while (c < cc) {
        table.push(row = []);
        while (c < cc && '\r' !== chars[c] && '\n' !== chars[c]) {
            start = end = c;
            if ('"' === chars[c]){
                start = end = ++c;
                while (c < cc) {
                    if ('"' === chars[c]) {
                        if ('"' !== chars[c+1]) { break; }
                        else { chars[++c] = ''; } // unescape ""
                    }
                    end = ++c;
                }
                if ('"' === chars[c]) { ++c; }
                while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) { ++c; }
            } else {
                while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) { end = ++c; }
            }
            row.push(reviver(table.length-1, row.length, chars.slice(start, end).join('')));
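            if (',' === chars[c]) {
                ++c;
                // a comma directly before a line break or the end of input is followed by one more, empty field
                if (c >= cc || '\r' === chars[c] || '\n' === chars[c]) { row.push(reviver(table.length-1, row.length, '')); }
            }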
        }
        if ('\r' === chars[c]) { ++c; }
        if ('\n' === chars[c]) { ++c; }
    }
    return table;
},

stringify: function(table, replacer) {
    replacer = replacer || function(r, c, v) { return v; };
    var csv = '', c, cc, r, rr = table.length, cell;
    for (r = 0; r < rr; ++r) {
        if (r) { csv += '\r\n'; }
        for (c = 0, cc = table[r].length; c < cc; ++c) {
            if (c) { csv += ','; }
            cell = replacer(r, c, table[r][c]);
            if (/[,\r\n"]/.test(cell)) { cell = '"' + cell.replace(/"/g, '""') + '"'; }
            csv += (cell || 0 === cell) ? cell : '';
        }
    }
    return csv;
}
};

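  • This is one of my favorite answers. It's a real parser, implemented in not a lot of code. (7 upvotes)
  • If a comma is placed at the end of a line, an empty cell should follow it. This code just skips to the next line, resulting in an `undefined` cell. For example, `console.log(CSV.parse("first,last,age\r\njohn,doe,"));` (2 upvotes)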

Tre*_*xon 28

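Here's an extremely simple CSV parser that handles quoted fields containing commas, newlines, and escaped double quotation marks. There's no splitting and no regular expression. It scans the input string 1-2 characters at a time and builds an array.

Test it at http://jsfiddle.net/vHKYH/.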

function parseCSV(str) {
    var arr = [];
    var quote = false;  // true means we're inside a quoted field

    // iterate over each character, keep track of current row and column (of the returned array)
    for (var row = 0, col = 0, c = 0; c < str.length; c++) {
        var cc = str[c], nc = str[c+1];        // current character, next character
        arr[row] = arr[row] || [];             // create a new row if necessary
        arr[row][col] = arr[row][col] || '';   // create a new column (start with empty string) if necessary

        // If the current character is a quotation mark, and we're inside a
        // quoted field, and the next character is also a quotation mark,
        // add a quotation mark to the current column and skip the next character
        if (cc == '"' && quote && nc == '"') { arr[row][col] += cc; ++c; continue; }  

        // If it's just one quotation mark, begin/end quoted field
        if (cc == '"') { quote = !quote; continue; }

        // If it's a comma and we're not in a quoted field, move on to the next column
        if (cc == ',' && !quote) { ++col; continue; }

        // If it's a newline (CRLF) and we're not in a quoted field, skip the next character
        // and move on to the next row and move to column 0 of that new row
        if (cc == '\r' && nc == '\n' && !quote) { ++row; col = 0; ++c; continue; }

        // If it's a newline (LF or CR) and we're not in a quoted field,
        // move on to the next row and move to column 0 of that new row
        if (cc == '\n' && !quote) { ++row; col = 0; continue; }
        if (cc == '\r' && !quote) { ++row; col = 0; continue; }

        // Otherwise, append the current character to the current column
        arr[row][col] += cc;
    }
    return arr;
}

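  • This worked for me too. I had to make one modification, though, to allow proper handling of line feeds: `if (cc == '\r' && nc == '\n' && !quote) { ++row; col = 0; ++c; continue; } if (cc == '\n' && !quote) { ++row; col = 0; continue; }` (3 upvotes)
  • This seems cleaner and more straightforward. I had to parse a 4 MB file, and the other answers crashed on me in IE8, but this one managed it. (2 upvotes)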

Tre*_*xon 14

Here's my PEG(.js) grammar that seems to do OK with RFC 4180 (i.e. it handles the examples at http://en.wikipedia.org/wiki/Comma-separated_values):

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Try it out at http://jsfiddle.net/knvzk/10 or http://pegjs.majda.cz/online. Download the generated parser at https://gist.github.com/3362830.

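  • Well played. +1 for turning me on to PEG. I love parser generators. "Why program by hand in five days what you can spend five years of your life automating?" - Terence Parr, ANTLR (6 upvotes)
  • PEG? Isn't building an AST a little memory-heavy for a Type III grammar? Can it handle fields that contain newline characters? That is the most difficult case to cover in a "regular grammar" parser. Either way, +1 for a novel approach. (2 upvotes)
  • Very nice... with that alone, it's better than 95% of all the implementations I have seen. If you want to check for full RFC compliance, take a look at the tests here (http://jquery-csv.googlecode.com/git/test/test.html). (2 upvotes)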

dt1*_*192 14

csvToArray v1.3

A compact (645 bytes) but compliant function to convert a CSV string into a 2D array, conforming to the RFC4180 standard.

https://code.google.com/archive/p/csv-to-array/downloads

Common usage: jQuery

 $.ajax({
        url: "test.csv",
        dataType: 'text',
        cache: false
 }).done(function(csvAsString){
        csvAsArray=csvAsString.csvToArray();
 });

Common usage: JavaScript

csvAsArray = csvAsString.csvToArray();

Override the field separator

csvAsArray = csvAsString.csvToArray("|");

Override the record separator

csvAsArray = csvAsString.csvToArray("", "#");

Override skip header

csvAsArray = csvAsString.csvToArray("", "", 1);

Override all

csvAsArray = csvAsString.csvToArray("|", "#", 1);


Ste*_*uan 8

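Here's another solution. This uses:

  • a coarse global regular expression for splitting the CSV string (the pieces still include surrounding quotes and trailing commas)
  • a fine-grained regular expression for cleaning up the surrounding quotes and trailing commas
  • type correction that differentiates strings, numbers, boolean values, and null values

For the following input string: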

"This is\, a value",Hello,4,-123,3.1415,'This is also\, possible',true,

The code outputs:

[
  "This is, a value",
  "Hello",
  4,
  -123,
  3.1415,
  "This is also, possible",
  true,
  null
]

Here's my implementation of parseCSVLine() in a runnable code snippet:
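
As a stand-in for that snippet, the following is a minimal sketch of the approach described above, not the answer's original code. Only the name `parseCSVLine` and the sample input and output come from the answer; the regular expressions, the appended comma, and the exact type rules are assumptions made for illustration.

```javascript
// Minimal sketch, NOT the answer's original code. Only the name parseCSVLine
// and the sample input/output come from the answer; the rest is assumed.
function parseCSVLine(text) {
    // Coarse pass: append a comma so every field, including the last one,
    // ends with a comma, then grab each field (quoted or bare) with it.
    var chunks = (text + ',').match(
        /\s*(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[^,]*)\s*,/g
    );

    return chunks.map(function (chunk) {
        // Fine pass: drop the trailing comma and the surrounding whitespace.
        var value = chunk.slice(0, -1).trim();

        // Quoted text: strip the quotes and resolve backslash escapes.
        var quoted = /^(["'])([\s\S]*)\1$/.exec(value);
        if (quoted) { return quoted[2].replace(/\\([\s\S])/g, '$1'); }

        // Type correction for bare values.
        if (value === '') { return null; }
        if (value === 'true' || value === 'false') { return value === 'true'; }
        if (/^[+-]?(?:\d+\.?\d*|\.\d+)$/.test(value)) { return Number(value); }
        return value;
    });
}

// String.raw keeps the backslashes of the sample input intact.
var sample = String.raw`"This is\, a value",Hello,4,-123,3.1415,'This is also\, possible',true,`;
console.log(JSON.stringify(parseCSVLine(sample)));
// ["This is, a value","Hello",4,-123,3.1415,"This is also, possible",true,null]
```

Appending a comma before matching is a design choice of this sketch: it avoids the extra empty match that a pattern ending in `(,|$)` produces at the end of the string, and it makes a trailing comma in the input yield the final `null`.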


  • This is really neat! Is this part of an npm package? (2 upvotes)

小智 5

Here is my simple JavaScript code:

let a = 'one,two,"three, but with a comma",four,"five, with ""quotes"" in it.."'
console.log(splitQuotes(a))

function splitQuotes(line) {
  if(line.indexOf('"') < 0) 
    return line.split(',')

  let result = [], cell = '', quote = false;
  for(let i = 0; i < line.length; i++) {
    const char = line[i]
    if(quote && char == '"' && line[i+1] == '"') {
      // a doubled quote inside a quoted cell is an escaped quote
      cell += char
      i++
    } else if(char == '"') {
      quote = !quote;
    } else if(!quote && char == ',') {
      result.push(cell)
      cell = ''
    } else {
      cell += char
    }
  }
  // push the last cell even when it is empty (e.g. after a trailing comma)
  result.push(cell)
  return result
}


Viewed:

306951 times

Last activity:

7 years, 2 months ago