Reading from a file containing accented characters in Node.js

mko*_*yak 1 utf-8 node.js maxmind

So I'm parsing a large CSV file and pushing the results into mongo.

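The file is MaxMind's city database. It has all sorts of interesting UTF-8 characters, and I am still getting (?) symbols in some city names. This is how I read the file: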

(using the csv node module)

    csv().from.stream(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv'), {
        flags: 'r',
        encoding: 'utf8'
    })).on('record', function(row,index){
        // ... uninteresting code to add it to mongodb
    });

What could I be doing wrong here?

I get things like this in mongo: Ch�teauguay, Canada

EDIT

I tried using a different library to read the file:

    lazy(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv'), {
        flags: 'r',
        encoding: 'utf8',
        autoClose: true
      }))
        .lines
        .map(String)
        .skip(1) // skips the two lines that are iptables header
        .map(function (line) {
          console.log(line);
        });

It produces the same bad results:

    154252,"PA","03","Capellan�a","",8.3000,-80.5500,,
    154220,"AR","01","Villa Espa�a","",-34.7667,-58.2000,,

mko*_*yak 6

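It turns out MaxMind encodes their data in latin1. Read as UTF-8, bytes such as 0xE2 (â) or 0xF1 (ñ) are not valid sequences on their own, so each one is replaced with U+FFFD, which is the � that ends up in mongo.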

This works:

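    var fs = require('fs');
    var path = require('path');
    var lazy = require('lazy');
    var iconv = require('iconv-lite');

    // No `encoding` option here, so the stream hands over raw bytes
    lazy(fs.createReadStream(path.join(__dirname, 'datafiles', 'cities.csv')))
      .lines
      .map(function (byteArray) {
        // decode each line's bytes as latin1 instead of utf8
        return iconv.decode(byteArray, 'latin1');
      })
      .skip(1) // skips the two lines that are iptables header
      .map(function (line) {
        // WORKS: `line` now has the accented characters intact
      });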
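
To see why decoding as latin1 makes the difference, here is a minimal sketch that uses only Node's built-in `Buffer`, without iconv-lite or the real file. The byte values are written by hand for illustration and assume the file stores â as the single byte 0xE2, as latin1 does; the variable names are made up for this example.

```javascript
// Bytes of "Château" as a latin1-encoded file would store them
// (â is the single byte 0xE2).
const latin1Bytes = Buffer.from([0x43, 0x68, 0xe2, 0x74, 0x65, 0x61, 0x75]);

// Decoded as UTF-8: 0xE2 announces a multi-byte sequence, but the next
// byte (0x74, "t") is not a continuation byte, so the decoder substitutes
// the replacement character U+FFFD.
const asUtf8 = latin1Bytes.toString('utf8');

// Decoded as latin1: every byte maps to the code point with the same value.
const asLatin1 = latin1Bytes.toString('latin1');

console.log(asUtf8);
console.log(asLatin1);
```

Once the stream has already been decoded with `encoding: 'utf8'`, the original byte is gone, which is presumably why the fix has to work on the raw bytes rather than on the strings produced by the earlier attempts.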