Remove the BOM from an imported .csv file

Int*_*ive 7 php csv import fgetcsv

I want to remove the BOM from my imported file, but it doesn't seem to work.

I tried `preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file);` and a `str_replace`.

I hope someone can see what I'm doing wrong.

$filepath = get_bloginfo('template_directory')."/testing.csv";
setlocale(LC_ALL, 'nl_NL');
ini_set('auto_detect_line_endings',TRUE);
$file = fopen($filepath, "r") or die("Error opening file");
$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    if($i == 0) {
        $c = 0;
        foreach($line as $col) {
            $cols[$c] = utf8_encode($col);
            $c++;
        }
    } else if($i > 0) {
        $c = 0;
        foreach($line as $col) {
            $data[$i][$cols[$c]] = utf8_encode($col);
            $c++;
        }
    }
    $i++;
}

----------- Solved version:

setlocale(LC_ALL, 'nl_NL');
ini_set('auto_detect_line_endings',TRUE);
require_once(ABSPATH.'wp-admin/includes/file.php' );

$path = get_home_path();        
$filepath = $path .'wp-content/themes/pon/testing.csv';
$content = file_get_contents($filepath); 
file_put_contents($filepath, str_replace("\xEF\xBB\xBF",'', $content));

// file_put_contents() closes the file automatically, so it can be reopened for reading
$file = fopen($filepath, "r") or die("Error opening file"); 

$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    if($i == 0) {
        $c = 0;
        foreach($line as $col) {
            $cols[$c] = $col;
            $c++;
        }
    } else if($i > 0) {
        $c = 0;
        foreach($line as $col) {
            $data[$i][$cols[$c]] = $col;
            $c++;
        }
    }
    $i++;
}

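I found that this removes the BOM and adjusts the file by overwriting it with the new data. The problem is that the rest of my script no longer works, and I can't see why. It is a new .csv file.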
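For comparison, here is a minimal sketch of the same header-plus-rows parsing that leaves the file on disk untouched. It assumes a `;`-separated file whose first row is the header and whose data rows have as many fields as the header; the function name `readCsvAssoc()` is hypothetical. The idea is that a UTF-8 BOM only ever ends up glued to the first column name, so only that one cell needs cleaning.

```php
<?php
// Sketch: read a ';'-separated CSV with a header row into associative arrays.
// Instead of rewriting the file, strip a UTF-8 BOM from the first header cell.
// readCsvAssoc() is a hypothetical helper name.
function readCsvAssoc(string $path, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Error opening file: $path");
    }

    $header = fgetcsv($handle, 0, $delimiter, '"', '\\');
    if ($header === false) {
        fclose($handle);
        return []; // empty file
    }
    // If the file starts with a BOM, it is part of the first column name.
    $header[0] = preg_replace('/^\xEF\xBB\xBF/', '', $header[0]);

    $rows = [];
    while (($line = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($line === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        // Assumes each data row has exactly as many fields as the header.
        $rows[] = array_combine($header, $line);
    }
    fclose($handle);
    return $rows;
}
```

Because the BOM is removed in memory, the original file keeps its encoding marker for any other program that reads it.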

Tom*_*asz 11

Try this:

function removeBomUtf8($s) {
    if (substr($s, 0, 3) == chr(hexdec('EF')).chr(hexdec('BB')).chr(hexdec('BF'))) {
        return substr($s, 3);
    } else {
        return $s;
    }
}

  • To remove a UTF-16 little-endian BOM, test for `(substr($s, 0, 2) == chr(0xFF).chr(0xFE))` (2 upvotes)
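Combining the answer with the comment, a sketch of a function that strips a UTF-8, UTF-16LE or UTF-16BE byte order mark could look like this. `removeBom()` is a hypothetical name. Note that it only removes the marker: UTF-16 text stays UTF-16 and still needs converting before it is usable as UTF-8, and UTF-32 BOMs are not handled.

```php
<?php
// Sketch: strip a UTF-8, UTF-16LE or UTF-16BE BOM from the start of a string.
// Only the marker is removed; UTF-16 data is NOT converted to UTF-8 here.
// UTF-32 BOMs are not handled. removeBom() is a hypothetical name.
function removeBom(string $s): string
{
    foreach (["\xEF\xBB\xBF", "\xFF\xFE", "\xFE\xFF"] as $bom) {
        // strncmp() compares only the first strlen($bom) bytes.
        if (strncmp($s, $bom, strlen($bom)) === 0) {
            return substr($s, strlen($bom));
        }
    }
    return $s;
}
```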

And*_*eyP 7

The correct way is to skip the BOM if it is present in the file (https://www.php.net/manual/en/function.fgetcsv.php#122696):

ini_set('auto_detect_line_endings',TRUE);
$file = fopen($filepath, "r") or die("Error opening file");
if (fgets($file, 4) !== "\xef\xbb\xbf") { // Skip BOM if present
    rewind($file); // Or rewind pointer to start of file
}

$i = 0;
while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
    ...
}

  • `auto_detect_line_endings` is deprecated as of PHP 8.1 and will be removed in PHP 9.0. https://php.watch/versions/8.1/auto_detect_line_endings-ini-deprecated (3 upvotes)
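The skip-if-present check can be wrapped in a small helper and tried against an in-memory stream. This is a sketch: `skipUtf8Bom()` is a hypothetical name, it assumes a seekable stream (so `rewind()` works), and it leaves out `auto_detect_line_endings` because of the deprecation mentioned in the comment. `fgets($handle, 4)` reads at most 3 bytes, exactly the length of a UTF-8 BOM.

```php
<?php
// Sketch: position $handle just past a UTF-8 BOM, or at byte 0 if there is none.
// Assumes a seekable stream. skipUtf8Bom() is a hypothetical helper name.
function skipUtf8Bom($handle): void
{
    // fgets() with length 4 returns at most 3 bytes (length - 1).
    if (fgets($handle, 4) !== "\xEF\xBB\xBF") {
        rewind($handle); // no BOM: go back to the first byte
    }
}

// Usage with php://memory instead of a real file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "\xEF\xBB\xBFid;name\n1;Tom\n");
rewind($handle);
skipUtf8Bom($handle);
$header = fgetcsv($handle, 0, ';', '"', '\\'); // ['id', 'name'], no BOM in 'id'
fclose($handle);
```

Checking first and rewinding otherwise keeps the helper safe for files that were saved without a BOM.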

Anonymous 6

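Isn't the BOM a clue to how you should re-encode the input into what your script/application/database needs? Simply removing it doesn't help.

This is how I force a string (read from a file with `file_get_contents()`) to be encoded as UTF-8 and remove the BOM: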

switch (true) {
    // Check the 4-byte UTF-32 BOMs first: the UTF-32LE BOM starts with the UTF-16LE BOM.
    case (substr($string, 0, 4) == "\x00\x00\xfe\xff"):
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32BE");
        break;
    case (substr($string, 0, 4) == "\xff\xfe\x00\x00"):
        $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32LE");
        break;
    case (substr($string, 0, 3) == "\xef\xbb\xbf"):
        $string = substr($string, 3);
        break;
    case (substr($string, 0, 2) == "\xfe\xff"):
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16BE");
        break;
    case (substr($string, 0, 2) == "\xff\xfe"):
        $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16LE");
        break;
    default:
        // No BOM: guess the encoding, and leave the string as-is if detection fails.
        $encoding = mb_detect_encoding($string, mb_detect_order(), true);
        if ($encoding !== false) {
            $string = iconv($encoding, "UTF-8", $string);
        }
}
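The same BOM-to-encoding mapping can also be written as a lookup table, which makes the required ordering explicit: longer signatures must be tried before shorter ones, because the UTF-32LE BOM (`FF FE 00 00`) begins with the UTF-16LE BOM (`FF FE`). This is a sketch; `detectBom()` and `toUtf8()` are hypothetical names, the conversion relies on the mbstring extension as the answer above does, and a string without a BOM is assumed to already be UTF-8.

```php
<?php
// Sketch: table-driven BOM detection, longest signatures first.
// Returns [encoding, bomLength] or null. detectBom() is a hypothetical name.
function detectBom(string $s): ?array
{
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE', // must precede UTF-16LE
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $encoding) {
        if (strncmp($s, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return null;
}

// Sketch: drop the BOM and convert to UTF-8 (requires mbstring for non-UTF-8 input).
// toUtf8() is a hypothetical name; input without a BOM is assumed to be UTF-8.
function toUtf8(string $s): string
{
    $bom = detectBom($s);
    if ($bom === null) {
        return $s;
    }
    [$encoding, $length] = $bom;
    $body = substr($s, $length);
    return $encoding === 'UTF-8' ? $body : mb_convert_encoding($body, 'UTF-8', $encoding);
}
```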


Anonymous 5

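If the character-encoding functions don't work for you (as was the case for me in some situations) and you know your file always starts with a BOM, you can simply use `fseek()` to skip the first 3 bytes, which is the length of a UTF-8 BOM.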

$fp = fopen("testing.csv", "r");
fseek($fp, 3);

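You also shouldn't use `explode()` to split your CSV lines and columns: if a column contains the character you split on, you will get incorrect results. Use this instead: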

// fgetcsv() returns false at end of file, so test its result rather than feof()
while (($arrayLine = fgetcsv($fp, 0, ";", '"')) !== false) {
    ...
}

  • If you can't be sure whether there is a BOM, it's better to check for it and rewind if it's missing: `if (fread($handle, 3) !== chr(0xEF).chr(0xBB).chr(0xBF)) { rewind($handle); }` instead of using `fseek` (5 upvotes)
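To illustrate the point about `explode()`: a quoted field may legitimately contain the delimiter. The sample line below is made up for the example. `str_getcsv()` parses a single CSV line using the same delimiter/enclosure rules that `fgetcsv()` applies to lines read from a file, so it keeps the quoted field intact.

```php
<?php
// Illustration: explode() ignores CSV quoting, str_getcsv() respects it.
$line = '1;"Smith; John";Amsterdam';

$naive  = explode(';', $line);                  // ['1', '"Smith', ' John"', 'Amsterdam']
$parsed = str_getcsv($line, ';', '"', '\\');    // ['1', 'Smith; John', 'Amsterdam']
```

The naive split produces four fields with stray quote characters, while the CSV parser returns the three fields the line actually contains.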