Ruby invalid byte sequence in UTF-8

Question

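I have the following code, which gives me an invalid byte sequence error pointing to the scan call in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when the (.*) between the h1 tag and the closing > is not there.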

#!/usr/bin/env ruby

class NewsParser

  def initialize
      Dir.glob("./**/index.htm") do |file|
        @file = IO.read file 
        parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
        self.write(parsed)
      end
  end

  def write output
    @contents = output
    open('output.txt', 'a') do |f| 
      f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n" 
    end
  end

end

p = NewsParser.new

Edit: here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)

Solved: using a combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and a # encoding: UTF-8 magic comment at the top of the script solved the problem.

Thanks!

Answer 1

red*_*gem 36

Using a combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and a # encoding: UTF-8 magic comment solved the problem.
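For anyone wanting to see why this works, here is a minimal sketch. It assumes the index.htm files are really ISO-8859-1 (Latin-1), which is what the fix implies but the question never confirms. The sample string and the variable names raw, mislabelled and fixed are made up for illustration. IO.read only labels the bytes with the default external encoding without checking them, so the failure surfaces later when scan runs the regex.

```ruby
# encoding: UTF-8

# Hypothetical sample: "café" stored as ISO-8859-1, where é is the single
# byte 0xE9. That byte on its own is not a valid UTF-8 sequence.
raw = "<h1 class=\"t\">caf\xE9</h1>".b

# Roughly what IO.read hands back when the default external encoding is
# UTF-8: the bytes are labelled UTF-8 but never validated.
mislabelled = raw.dup.force_encoding("UTF-8")
puts mislabelled.valid_encoding?   # false

begin
  mislabelled.scan(/<h1(.*)>(.*?)<\/h1>/im)
rescue ArgumentError => e
  # Regex matching refuses strings whose bytes are invalid in their encoding.
  puts "scan failed: #{e.message}"
end

# The fix from this answer: relabel the bytes as ISO-8859-1 (no bytes
# change), then transcode them to real UTF-8.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("utf-8", replace: nil)
puts fixed.valid_encoding?         # true
p fixed.scan(/<h1(.*)>(.*?)<\/h1>/im)
```

Since every byte value is a defined ISO-8859-1 character, the replace: nil option should never come into play here. If the files turn out to be in some other encoding (Windows-1252 is a common candidate), the conversion will still succeed but may produce the wrong characters, so it is worth checking the output by eye.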

Archived:	13 years, 7 months ago
Viewed:	21098 times
Last active:	11 years, 3 months ago