Related questions (0)

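An effective way to find any file's encoding

Yes, this is a very common question, but the topic is vague to me because I don't know much about it.

But I would like a very precise way to find a file's encoding, as precise as Notepad++.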

c# encoding

Fáb*_*nes

2017 04-16

93
votes

7
answers

120k
views

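What is the XML BOM and how do I detect it?

What exactly is the BOM in an ANSI XML document, and should it be removed? Should an XML document be UTF-8? Can anyone show me a Java method that can detect the BOM? The BOM consists of the bytes EF BB BF.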
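As a rough illustration of the detection asked about above: EF BB BF is the UTF-8 encoding of U+FEFF, so checking for it amounts to comparing the first three bytes of the stream. The sketch below is only one possible approach; the class and method names (`Utf8BomSkipper`, `hasUtf8Bom`, `skipUtf8Bom`) are hypothetical and not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class Utf8BomSkipper {
    // The UTF-8 encoding of U+FEFF, i.e. the three bytes EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /** Returns true if the array starts with EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) return false;
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if ((data[i] & 0xFF) != UTF8_BOM[i]) return false;
        }
        return true;
    }

    /** Wraps a stream so that a leading UTF-8 BOM, if present, is skipped. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean bom = n == 3 && hasUtf8Bom(head);
        // No BOM: push the bytes back so the caller sees the stream unchanged
        if (!bom && n > 0) pushback.unread(head, 0, n);
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(hasUtf8Bom(xml));
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(xml));
        System.out.println((char) in.read());
    }
}
```

Passing the returned stream to an XML parser or an `InputStreamReader` built with `StandardCharsets.UTF_8` would then start at the first real character of the document.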

java xml

dja*_*fan

lucky-day

23
votes

2
answers

40k
views

Greek string does not match the regex when read from the keyboard

public static void main(String[] args) throws IOException {
   String str1 = "ΔΞ123456";
   System.out.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")); //ΔΞ123456-true

   BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
   String str2 = br.readLine(); //ΔΞ123456 same as str1.
   System.out.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")); //?”??123456-false

   System.out.println(str1.equals(str2)); //false
}

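When read from the keyboard, the same String does not match the regex.
What causes this, and how can we fix it?
Thanks in advance.

Edit: I used System.console() for both input and output.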

public static void main(String[] args) throws IOException {
        PrintWriter pr = System.console().writer();

        String str1 = "ΔΞ123456";
        pr.println(str1+"-"+str1.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str1.length());

        String str2 = System.console().readLine();
        pr.println(str2+"-"+str2.matches("^\\p{InGreek}{2}\\d{6}")+"-"+str2.length());

        pr.println("str1.equals(str2)="+str1.equals(str2));
}

Output:

ΔΞ123456-true-8
ΔΞ123456
ΔΞ123456-true-8
str1.equals(str2)=true
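The difference between the two snippets is consistent with a charset mismatch: `new InputStreamReader(System.in)` decodes with the platform default charset, which need not match what the terminal sends. The sketch below reproduces that effect without a keyboard, under the assumption (not stated in the question) that the terminal sends UTF-8 while the reader decodes as ISO-8859-1. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GreekDecodingDemo {
    static final String PATTERN = "^\\p{InGreek}{2}\\d{6}";

    /** Decodes the raw bytes with the given charset and applies the question's regex. */
    static boolean matchesAfterDecoding(byte[] rawInput, Charset charset) {
        return new String(rawInput, charset).matches(PATTERN);
    }

    public static void main(String[] args) {
        // Delta, Xi, then 123456, written with Unicode escapes so the
        // source file's own encoding cannot interfere
        String expected = "\u0394\u039E123456";
        // Assumption: the terminal delivers the typed text as UTF-8 bytes
        byte[] typed = expected.getBytes(StandardCharsets.UTF_8);

        String right = new String(typed, StandardCharsets.UTF_8);
        String wrong = new String(typed, StandardCharsets.ISO_8859_1);

        // prints "true length=8"
        System.out.println(matchesAfterDecoding(typed, StandardCharsets.UTF_8)
                + " length=" + right.length());
        // prints "false length=10": each Greek letter became two Latin-1 characters
        System.out.println(matchesAfterDecoding(typed, StandardCharsets.ISO_8859_1)
                + " length=" + wrong.length());
    }
}
```

If this is indeed the cause, passing the terminal's actual charset to `InputStreamReader` explicitly would be one way to make the first snippet behave like the `System.console()` version.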

java regex

ath*_*spk

2011 01-03

11
votes

2
answers

1619
views

Java: How do I get the encoding from an InputStream?

I want to get the encoding from a stream.

First approach: use InputStreamReader.

But it always returns the OS encoding.

InputStreamReader reader = new InputStreamReader(new FileInputStream("aa.rar"));
System.out.println(reader.getEncoding());

Output: GBK

Second approach: use UniversalDetector.

But it always returns null.

    FileInputStream input = new FileInputStream("aa.rar");

    UniversalDetector detector = new UniversalDetector(null);
    byte[] buf = new byte[4096];

    int nread;
    while ((nread = input.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, nread);
    }

    // (3)
    detector.dataEnd();

    // (4)
    String encoding = detector.getDetectedCharset();

    if (encoding != null) {
        System.out.println("Detected encoding = " + encoding);
    } else {
        System.out.println("No encoding detected.");
    }

    // (5)
    detector.reset();

Output: null

How can I do this correctly? :(
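A short sketch of why the first approach cannot work: `InputStreamReader.getEncoding()` reports the charset the reader was constructed with (the platform default when none is given) and never inspects the bytes. The helper name `configuredCharset` is hypothetical. Separately, a `.rar` archive is compressed binary data rather than text, so a detector finding no text encoding in it would not be surprising.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderEncodingDemo {
    /** Returns the charset a reader was constructed with, resolved from getEncoding(). */
    static Charset configuredCharset(byte[] data, Charset requested) {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(data), requested);
        // getEncoding() may return a historical name such as "UTF8";
        // Charset.forName accepts such names as aliases
        return Charset.forName(reader.getEncoding());
    }

    public static void main(String[] args) {
        // 0xF4 is o-circumflex in ISO-8859-1 and is not valid UTF-8 on its own
        byte[] latin1Bytes = {(byte) 0xF4};
        // The answer follows the constructor argument, not the data
        System.out.println(configuredCharset(latin1Bytes, StandardCharsets.UTF_8));
        System.out.println(configuredCharset(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```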

java io encoding

you*_*ang

2011 11-29

10
votes

1
answers

10k
views

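Refactoring automatic detection of a file's encoding

I need to detect a file's encoding. This code works but is a bit long. How can I refactor this logic? Maybe there is another approach that would serve this purpose?

Code: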

class CharsetDetector implements Checker {

    Charset detectCharset(File currentFile, String[] charsets) {
        Charset charset = null;

        for (String charsetName : charsets) {
            charset = detectCharset(currentFile, Charset.forName(charsetName));
            if (charset != null) {
                break;
            }
        }

        return charset;
    }

    private Charset detectCharset(File currentFile, Charset charset) {
        try {
            BufferedInputStream input = new BufferedInputStream(
                    new FileInputStream(currentFile));

            CharsetDecoder decoder = charset.newDecoder();
            decoder.reset();

            byte[] buffer = new byte[512];
            boolean identified = false;
            while ((input.read(buffer) != -1) && (!identified)) {
                identified = identify(buffer, decoder);
            } …
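The code above is cut off, so the following is not a reconstruction of it. It is a separate, shorter sketch of the same general idea (try each candidate charset and keep the first one that decodes without errors), assuming the whole file fits in memory. `StrictCharsetGuesser`, `decodesCleanly` and `firstMatching` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictCharsetGuesser {
    /** True if every byte sequence in data is valid and mappable in the given charset. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first candidate that decodes the data without errors, or null if none does. */
    static Charset firstMatching(byte[] data, String... candidateNames) {
        for (String name : candidateNames) {
            Charset charset = Charset.forName(name);
            if (decodesCleanly(data, charset)) return charset;
        }
        return null;
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute in ISO-8859-1; a lone 0xE9 is malformed UTF-8
        byte[] data = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(firstMatching(data, "UTF-8", "ISO-8859-1")); // prints ISO-8859-1
    }
}
```

For a file, the bytes could come from `Files.readAllBytes(path)`. The order of candidates matters: ISO-8859-1 accepts every byte sequence, so it only makes sense as the last fallback, and a clean decode shows that the bytes are valid in that charset, not that the text was actually written in it.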

java refactoring encoding

naz*_*art

2013 03-11

8
votes

1
answers

5632
views

Is there a way to check the charset encoding of a .txt file in Java?

Is there a way in Java to check whether a text file (.txt) is encoded in Unicode or UTF-8?
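One check that can be done with the standard library alone is whether a file is well-formed UTF-8: `Files.newBufferedReader` decodes strictly and throws a `CharacterCodingException` on invalid input. The sketch below streams through the file rather than loading it whole; `Utf8FileCheck` and `isValidUtf8` are hypothetical names, and the temporary file exists only for the demonstration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8FileCheck {
    /** Streams through the file; returns false as soon as a byte sequence is not valid UTF-8. */
    static boolean isValidUtf8(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            while (reader.read(buffer) != -1) {
                // nothing to do: only a decoding failure is of interest
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-check", ".txt");
        try {
            // C3 B4 is the UTF-8 encoding of o-circumflex
            Files.write(file, new byte[] {'o', 'k', (byte) 0xC3, (byte) 0xB4});
            System.out.println(isValidUtf8(file)); // prints true
            // 0xFF never occurs in well-formed UTF-8
            Files.write(file, new byte[] {'o', (byte) 0xFF, 'k'});
            System.out.println(isValidUtf8(file)); // prints false
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A pure-ASCII file also passes, since ASCII is a subset of UTF-8. If "Unicode" in the question means UTF-16, which is an assumption, a leading FE FF or FF FE byte-order mark would be the usual hint for that case.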

java unicode utf-8 character-encoding file-encodings

Zoo*_*key

2012 06-14

7
votes

1
answers

30k
views

Java App: unable to read an iso-8859-1 encoded file correctly

I have a file encoded as iso-8859-1 that contains characters such as ô.

I am reading this file with Java code such as:

File in = new File("myfile.csv");
InputStream fr = new FileInputStream(in);
byte[] buffer = new byte[4096];
while (true) {
    int byteCount = fr.read(buffer, 0, buffer.length);
    if (byteCount <= 0) {
        break;
    }

    String s = new String(buffer, 0, byteCount,"ISO-8859-1");
    System.out.println(s);
}

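However, the ô character always comes out garbled, usually printed as ?.

I have read around this topic (and learned a little along the way), e.g.

but I still cannot get this to work.

Interestingly, this works on my local PC (XP) but not on my Linux box.

I have checked that my JDK supports the required charsets (they are standard, so this is no surprise) using: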

System.out.println(java.nio.charset.Charset.availableCharsets());
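Since the reading code already decodes explicitly as ISO-8859-1, one plausible remaining cause (an assumption, the question does not confirm it) is the output side: `System.out` encodes with the platform default charset, which can differ between the XP machine and the Linux box. The sketch below fixes the output encoding explicitly, assuming the terminal expects UTF-8. `Latin1ToUtf8Printer` and `copy` are hypothetical names, and the byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class Latin1ToUtf8Printer {
    /** Decodes the input as ISO-8859-1 and prints it to out, which does the re-encoding. */
    static void copy(InputStream in, PrintStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            out.print(new String(buffer, 0, n));
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file's bytes: c, o-circumflex (0xF4), t, e
        byte[] fileBytes = {'c', (byte) 0xF4, 't', 'e'};
        // Assumption: the terminal displays UTF-8
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        copy(new ByteArrayInputStream(fileBytes), utf8Out);
        utf8Out.println();
    }
}
```

With a real file, `new FileInputStream("myfile.csv")` would replace the `ByteArrayInputStream`.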

java encoding iso-8859-1 character-encoding

Joe*_*oel

2015 07-27

6
votes

3
answers

30k
views

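Reading any text file with a strange encoding?

I have a text file with the unusual encoding "UCS-2 Little Endian", and I want to read its contents with Java.

(Screenshot: the text file opened in Notepad++)

As you can see in the screenshot above, the file's contents look fine in Notepad++, but when I read it with this code, only garbage is printed to the console: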

String textFilePath = "c:\\strange_file_encoding.txt";
BufferedReader reader = new BufferedReader( new InputStreamReader( new FileInputStream( textFilePath ), "UTF8" ) );
String line = "";

while ( ( line = reader.readLine() ) != null ) {
    System.out.println( line );  // Prints garbage characters
}

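The point is that the user selects the file to read, so it could be in any encoding; since I cannot detect the file's encoding, I decode it as "UTF8", but as the example above shows, it is not read correctly.

Is there a way to read such strange files correctly? Or at least to detect when my code fails to read one properly?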
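For the specific file described above, a minimal sketch follows, assuming that "UCS-2 Little Endian" means UTF-16 text that starts with the FF FE byte-order mark. Java's `UTF-16` charset reads the BOM to pick the byte order and drops it from the decoded text. `Utf16BomReader` and `firstLine` are hypothetical names, and the byte array stands in for the file's contents.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class Utf16BomReader {
    /** Reads the first line, letting the UTF-16 decoder pick the byte order from the BOM. */
    static String firstLine(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_16))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // FF FE is the little-endian BOM, followed by "hi" as little-endian code units
        byte[] ucs2le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(firstLine(new ByteArrayInputStream(ucs2le))); // prints hi
    }
}
```

Without a BOM the `UTF-16` charset falls back to big-endian, so a BOM-less little-endian file would need `StandardCharsets.UTF_16LE` instead. This covers only the one encoding; it does not solve the general case of a user picking a file in an arbitrary encoding.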

java text-files fileinputstream bufferedreader

Bra*_*rad

2016 05-19

6
votes

1
answers

10k
views

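How to determine a text encoding

I know that UTF files have a BOM for determining the encoding, but for other encodings I have no idea how to guess the encoding.

I am a new Java programmer. I have written code that guesses UTF encodings from the UTF BOM, but I am stuck on other encodings. How can I guess them?

Can anyone help me? Thanks in advance.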
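The asker's own BOM code is not shown, so the following is only a sketch of what the BOM-based part might look like, with hypothetical names (`BomSniffer`, `charsetFromBom`). It maps the well-known byte-order marks to charset names and returns null when none is present, which is the point at which other heuristics would have to take over.

```java
final class BomSniffer {
    /** Returns the charset name implied by a leading BOM, or null if none is recognised. */
    static String charsetFromBom(byte[] head) {
        // UTF-32 is tested before UTF-16: FF FE 00 00 also starts with the UTF-16LE BOM
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromBom(new byte[] {(byte) 0xFF, (byte) 0xFE, 'a', 0})); // UTF-16LE
        System.out.println(charsetFromBom(new byte[] {'a', 'b', 'c'}));                    // null
    }
}
```

One known ambiguity: FF FE 00 00 could also be UTF-16LE text whose first character is U+0000; the sketch treats it as UTF-32LE. For files without a BOM there is no definitive answer, only statistical guessing or trial decoding.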

java utf

par*_*uma

2012 03-03

5
votes

1
answers

571
views

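How to identify the different encodings of a file that has no BOM and starts with non-ASCII characters?

I ran into a problem when trying to identify the encoding of files without a BOM, especially when the file starts with non-ASCII characters.

I found two threads on how to identify a file's encoding,

So far, I have created a class to identify a file's different encodings (e.g. UTF-8, UTF-16, UTF-32, UTF-16 without BOM, etc.), as shown below,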

public class UnicodeReader extends Reader {
    private static final int BOM_SIZE = 4;
    private final InputStreamReader reader;

    /**
     * Construct UnicodeReader
     * @param in Input stream.
     * @param defaultEncoding Default encoding to be used if BOM is not found,
     * or <code>null</code> to use system default encoding.
     * @throws IOException If an I/O error occurs.
     */
    public UnicodeReader(InputStream in, String defaultEncoding) throws IOException {
        byte bom[] = new byte[BOM_SIZE];
        String encoding;
        int unread;
        PushbackInputStream pushbackStream …

java unicode encoding byte-order-mark non-ascii-characters

eag*_*les

2017 05-23

5
votes

1
answers

2478
views