Java file encoding conversion

Ash*_*ish 4 java encoding file

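I need to change the encoding of files from ANSI (windows-1252) to UTF-8. I wrote the program below to do this in Java. The program converts the characters to UTF-8, but when I open the file in Notepad++, the encoding is shown as "ANSI as UTF-8". This causes an error when I import the file into an Access database. I need a file that is plain UTF-8 only. There is also a requirement to convert the files without opening them in any editor.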

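import java.io.*;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ConvertFromAnsiToUtf8 {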

private static final char BYTE_ORDER_MARK = '\uFEFF';
private static final String ANSI_CODE = "windows-1252";
private static final String UTF_CODE = "UTF8";
private static final Charset ANSI_CHARSET = Charset.forName(ANSI_CODE);

public static void main(String[] args) {

    List<File> fileList;
    File inputFolder = new File(args[0]);
    if (!inputFolder.isDirectory()) {
        return;
    }
    File parentDir = new File(inputFolder.getParent() + "\\"
                    + inputFolder.getName() + "_converted");

    if (parentDir.exists()) {
        return;
    }
    if (parentDir.mkdir()) {

    } else {
        return;
    }

    fileList = new ArrayList<File>();
    for (final File fileEntry : inputFolder.listFiles()) {
        fileList.add(fileEntry);
    }

    InputStream in;

    Reader reader = null;
    Writer writer = null;
    try {
        for (File file : fileList) {
            in = new FileInputStream(file.getAbsoluteFile());
            reader = new InputStreamReader(in, ANSI_CHARSET);

            OutputStream out = new FileOutputStream(
                            parentDir.getAbsoluteFile() + "\\"
                                            + file.getName());
            writer = new OutputStreamWriter(out, UTF_CODE);
            writer.write(BYTE_ORDER_MARK);
            char[] buffer = new char[10];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                System.out.println(read);
                writer.write(buffer, 0, read);
            }
            // Close per file so every output is flushed, not just the last one.
            reader.close();
            writer.close();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}

Any pointers would be helpful.

Thanks, Ashish

McD*_*ell 5

The posted code correctly transcodes from windows-1252 to UTF-8.
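To see what "transcoding" means at the byte level, here is a minimal in-memory sketch. The class name `TranscodeDemo` and the helper `toUtf8` are made up for illustration and are not part of the posted program; the sample bytes are chosen only because they differ between the two encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Decode the bytes as windows-1252, then re-encode the text as UTF-8.
    static byte[] toUtf8(byte[] ansiBytes) {
        String text = new String(ansiBytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In windows-1252, 0xE9 is 'é' and 0x80 is the euro sign.
        byte[] ansi = {(byte) 0xE9, (byte) 0x80};
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8(ansi)) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints "C3 A9 E2 82 AC"
    }
}
```

Two single-byte windows-1252 characters become five UTF-8 bytes, which is the same conversion the Reader/Writer pair performs on a stream.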

The Notepad++ message is confusing because "ANSI as UTF-8" has no obvious meaning; it appears to be an open defect in Notepad++. I believe Notepad++ means UTF-8 without a BOM (see the Encoding menu).

Microsoft Access, being a Windows program, probably expects UTF-8 files to start with a byte order mark (BOM).

You can inject a BOM into the document by writing the code point U+FEFF at the start of the file:

import java.io.*;
import java.nio.charset.*;

public class Ansi1252ToUtf8 {
  private static final char BYTE_ORDER_MARK = '\uFEFF';

  public static void main(String[] args) throws IOException {
    Charset windows1252 = Charset.forName("windows-1252");
    try (InputStream in = new FileInputStream(args[0]);
        Reader reader = new InputStreamReader(in, windows1252);
        OutputStream out = new FileOutputStream(args[1]);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
      writer.write(BYTE_ORDER_MARK);
      char[] buffer = new char[1024];
      int read;
      while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
      }
    }
  }
}
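If you want to confirm the result without opening the file in an editor, one option is to check the first three bytes: U+FEFF encoded as UTF-8 is EF BB BF. The following is only a sketch; `BomCheck` and `hasUtf8Bom` are hypothetical names, and it assumes the path of the converted file is passed as the first argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    // Returns true if the file starts with EF BB BF (the UTF-8 form of U+FEFF).
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // read() returns 0..255 per byte, or -1 at end of file.
            return in.read() == 0xEF && in.read() == 0xBB && in.read() == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + (hasUtf8Bom(file) ? ": UTF-8 BOM found" : ": no UTF-8 BOM"));
    }
}
```

A file that passes this check is the kind Notepad++ would typically report as UTF-8 with BOM; whether Access then accepts it is something to verify in your own setup.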