Tag: character-encoding

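How do I correct the character encoding of a file?

I have an ANSI-encoded text file that should not have been encoded as ANSI, because it contains accented characters that ANSI does not support. I would rather work with UTF-8.

Can the data be decoded correctly, or is it lost in transcoding?

What tools could I use?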

Here is a sample of what I have:

Ã§ Ã©

I can tell from context (cafÃ© should be café) that these should be these two characters:

ç é
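If, as the sample suggests, the file really holds UTF-8 bytes that were displayed as ANSI (Windows-1252), the data is not lost: turning each garbled character back into its single byte recovers the original UTF-8. A minimal Java sketch, assuming exactly one round of damage and using ISO-8859-1 as a stand-in for Windows-1252 (the two agree on every byte in the sample and differ only in 0x80 to 0x9F); the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class FixMojibake {
    // Undo one round of "UTF-8 bytes were decoded as Latin-1" damage:
    // turn each char back into the byte it came from, then decode those bytes as UTF-8.
    // Assumption: the text only contains chars U+0000..U+00FF; anything else
    // cannot be mapped back and getBytes() replaces it with '?'.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The mojibake samples, written with escapes so the source file's own encoding does not matter.
        System.out.println(repair("caf\u00C3\u00A9"));
        System.out.println(repair("\u00C3\u00A7 \u00C3\u00A9"));
    }
}
```

For a whole file the same reasoning suggests that, if the mojibake was saved back as ANSI, the bytes on disk are exactly the original UTF-8, so opening the file as UTF-8 may be all that is needed.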

encoding utf-8 character-encoding codepages text-files

46 votes · 4 answers · 170k views

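How do I convert these strange characters? (Ã, Ã, Ã, ù, Ã)

My page often shows things like Ã, Ã, Ã, Ã, Ã in place of normal characters.

I use utf8 for the page header and for the MySQL encoding. How does this happen?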
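The usual mechanism is that the text is UTF-8 at one point in the chain but is interpreted as Latin-1/Windows-1252 at another (a MySQL connection character set that does not match the stored data is one common source, though that is an assumption about this setup). Every character from U+00C0 to U+00FF encodes in UTF-8 as the byte 0xC3 followed by a second byte, and 0xC3 displays as Ã in Latin-1, which is why all the garbage starts with Ã. When the second byte is 0xA0 (a no-break space) or, in ISO-8859-1, a C1 control code in 0x80 to 0x9F, only the Ã is visible. A Java sketch of that mechanism, with illustrative names:

```java
import java.nio.charset.StandardCharsets;

public class HowMojibakeHappens {
    // Simulates the bug: text is stored/sent as UTF-8, but the reader
    // decodes the bytes as a single-byte code page (ISO-8859-1 here).
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Shows each UTF-8 byte in hex, e.g. "C3 AB".
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // e-diaeresis, a-grave, i-grave, u-grave, A-grave (escaped to keep the source ASCII)
        String[] samples = {"\u00EB", "\u00E0", "\u00EC", "\u00F9", "\u00C0"};
        for (String s : samples) {
            System.out.println(hex(s) + " -> " + misdecode(s));
        }
    }
}
```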

php mysql character-encoding utf8-decode mojibake

46 votes · 4 answers · 240k views

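What is the range of Unicode printable characters?

Can anyone tell me what the range of Unicode printable characters is? [For example, the ASCII printable character range is \u0020 - \u007E.]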
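Unicode does not define a single contiguous printable range the way ASCII does: printable characters are scattered across the code space, and more are assigned with each Unicode version. One common heuristic, sketched here in Java, classifies each code point by its general category and rejects the unassigned, control, format, surrogate, private-use and line/paragraph separator categories. Which categories count as "printable" is a judgment call of this sketch, not a Unicode definition; under it, DEL (U+007F) is a control character, as are tab and newline.

```java
public class Printable {
    // Heuristic, not a Unicode definition: a code point counts as printable
    // unless its general category is one of the "invisible" ones below.
    static boolean isPrintable(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            return false;
        }
        switch (Character.getType(cp)) {
            case Character.UNASSIGNED:          // Cn
            case Character.CONTROL:             // Cc, e.g. U+0000..U+001F and U+007F
            case Character.FORMAT:              // Cf, e.g. zero-width joiner
            case Character.SURROGATE:           // Cs, halves of UTF-16 pairs
            case Character.PRIVATE_USE:         // Co
            case Character.LINE_SEPARATOR:      // Zl
            case Character.PARAGRAPH_SEPARATOR: // Zp
                return false;
            default:
                return true;
        }
    }

    // Counts printable code points in the inclusive range [from, to].
    static int countPrintable(int from, int to) {
        int n = 0;
        for (int cp = from; cp <= to; cp++) {
            if (isPrintable(cp)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("printable in U+0000..U+007F: " + countPrintable(0x00, 0x7F));
        System.out.println("printable in whole code space: " + countPrintable(0, Character.MAX_CODE_POINT));
    }
}
```

The second count depends on the Unicode version bundled with the JDK, since unassigned code points become assigned over time.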

unicode character-encoding unicode-string

45 votes · 5 answers · 30k views

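What is the character encoding of String in Java?

I'm confused about string encoding in Java. I have a few questions; please help me if you know the answers:

1) What is the native encoding of Java strings in memory? When I write String a = "Hello", in which format is it stored? Since Java is machine-independent, I don't think the system does the encoding.

2) I read on the web that "UTF-16" is the default encoding, but I'm confused, because when I write, say, int a = 'c', I get the number of the character in the ASCII table. So are ASCII and UTF-16 the same?

3) I'm also not sure what the storage of strings in memory depends on: the operating system, or the language?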
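At the API level a Java String is a sequence of UTF-16 code units (char values), regardless of the operating system. Since Java 9 the JVM may store strings that contain only Latin-1 characters compactly, one byte per char, but that is an internal detail and the API still behaves as UTF-16. An encoding only comes into play when converting to or from bytes, and there you choose the charset. 'c' gives 99 because UTF-16 (like UTF-8) assigns U+0000 to U+007F the same numbers as ASCII, not because the two encodings are identical. A small demo:

```java
import java.nio.charset.StandardCharsets;

public class StringEncodingDemo {
    // 'h', e-acute (U+00E9) and U+1F600 (stored as the surrogate pair D83D DE00)
    static final String SAMPLE = "h\u00E9\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts UTF-16 code units, not characters: 1 + 1 + 2 = 4
        System.out.println("UTF-16 code units: " + SAMPLE.length());
        // codePointCount() counts Unicode code points: 3
        System.out.println("code points:       " + SAMPLE.codePointCount(0, SAMPLE.length()));

        // A char literal is a UTF-16 code unit; for ASCII characters it equals the ASCII code (99).
        int c = 'c';
        System.out.println("'c' as int:        " + c);

        // Bytes only exist once an encoding is chosen explicitly: 1 + 2 + 4 = 7 and 4 * 2 = 8.
        System.out.println("UTF-8 bytes:       " + SAMPLE.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("UTF-16BE bytes:    " + SAMPLE.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```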

java string character-encoding

45 votes · 3 answers · 60k views

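Jersey web service JSON UTF-8 encoding

I made a small REST web service using Jersey 1.11. When I call the URL that returns JSON, non-English characters come out with the wrong character encoding. The corresponding URL for XML ("test.xml") declares utf-8 in its opening xml tag.

How can I make the URL "test.json" return a UTF-8 encoded response?

Here is the code for the service: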

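import java.util.List;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Stateless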
@Path("/")
public class RestTest {   
    @EJB
    private MyDao myDao;

    @Path("test.xml/")
    @GET
    @Produces(MediaType.APPLICATION_XML )
    public List<Profile> getProfiles() {    
        return myDao.getProfilesForWeb();
    }

    @Path("test.json/")
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<Profile> getProfilesAsJson() {
        return myDao.getProfilesForWeb();
    }
}

Here is the POJO the service uses:

package se.kc.mimee.profile.model;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Profile {
    public int id;
    public String name;

    public Profile(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public Profile() {}

}
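The XML response states its encoding inside the document (the encoding attribute of the XML declaration), but JSON has no such declaration, so a client that is not told the charset in the Content-Type header may fall back to some other default. A commonly suggested JAX-RS fix is to put the charset in the media type, e.g. @Produces(MediaType.APPLICATION_JSON + ";charset=utf-8"); whether Jersey 1.11 then also writes the body in that charset is worth verifying. The JDK-only sketch below shows the client side of the problem: a hypothetical charsetOf helper that reads the charset parameter and otherwise uses a fallback (ISO-8859-1 here is only an example of a wrong default).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: pull the charset parameter out of a Content-Type
    // header value, or return the fallback when none is declared.
    // Charset.forName throws an unchecked exception for unknown names.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // A JSON body with A-ring (U+00C5), encoded as UTF-8 by the server.
        byte[] body = "{\"name\":\"\u00C5sa\"}".getBytes(StandardCharsets.UTF_8);
        // No charset declared: a client falling back to ISO-8859-1 garbles the name.
        Charset bare = charsetOf("application/json", StandardCharsets.ISO_8859_1);
        // Charset declared: the client decodes the bytes as intended.
        Charset declared = charsetOf("application/json; charset=utf-8", StandardCharsets.ISO_8859_1);
        System.out.println(new String(body, bare));
        System.out.println(new String(body, declared));
    }
}
```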

java jersey character-encoding

45 votes · 4 answers · 80k views

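UTF-8 charset doesn't work with javax mail

I use the JavaMail API to send emails. I use a contact form to send the input, which must be sent to a specific email address.

Sending the email is no problem, but I'm Danish, so I need the three Danish characters 'æ', 'ø' and 'å' in both the subject and the email text.

I saw that I could use the UTF-8 character encoding to provide these characters, but when my mail is sent I only see some strange letters ('ã|', 'ã¸' and 'ã¥') instead of the Danish letters 'æ', 'ø' and 'å'.

My method for sending the email looks like this: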

public void sendEmail(String name, String fromEmail, String subject, String message) throws AddressException, MessagingException, UnsupportedEncodingException, SendFailedException
{
    //Set Mail properties
    Properties props = System.getProperties();
    props.setProperty("mail.smtp.starttls.enable", "true");
    props.setProperty("mail.smtp.host", "smtp.gmail.com");
    props.setProperty("mail.smtp.socketFactory.port", "465");
    props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
    props.setProperty("mail.smtp.auth", "true");
    props.setProperty("mail.smtp.port", "465");
    Session session = Session.getDefaultInstance(props, new javax.mail.Authenticator() {
        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication("my_username", "my_password");
        }
    });

    //Create the email with variable input
    MimeMessage mimeMessage = new MimeMessage(session);
    mimeMessage.setHeader("Content-Type", "text/plain; charset=UTF-8");
    mimeMessage.setFrom(new …
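The reported letters look like the UTF-8 bytes of æ, ø and å (C3 A6, C3 B8, C3 A5) displayed as Latin-1. Setting a Content-Type header only describes the body; the Subject is a separate header, and non-ASCII header text has to be sent as an RFC 2047 encoded-word. JavaMail's MimeMessage has charset-taking overloads for both, setSubject(subject, "UTF-8") and setText(message, "UTF-8"), which is what is commonly recommended here. For illustration only, a JDK-only sketch of what a UTF-8 "B" encoded subject looks like on the wire (the class name is hypothetical, and it skips the splitting into 75-character words that RFC 2047 requires for long text):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWord {
    // Builds an RFC 2047 "B" encoded-word: =?charset?B?base64(bytes)?=
    // Simplification: no splitting into multiple words, although RFC 2047
    // limits a single encoded-word to 75 characters.
    static String encode(String text) {
        String b64 = Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + b64 + "?=";
    }

    public static void main(String[] args) {
        // ae-ligature, o-slash, a-ring (escaped to keep the source ASCII)
        System.out.println("Subject: " + encode("\u00E6\u00F8\u00E5")); // Subject: =?UTF-8?B?w6bDuMOl?=
    }
}
```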

java jakarta-mail utf-8 character-encoding

45 votes · 3 answers · 60k views

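Character encoding detection algorithm

I'm looking for a way to detect the character set of a document. I've been reading the Mozilla character set detection implementation here:

Universal Charset Detection

I also found a Java implementation called jCharDet:

JCharDet

Both of these are based on research carried out using a set of static data. What I'd like to know is whether anyone has successfully used any other implementation, and if so, which one? Do you have your own approach? If so, what algorithm did you use to detect the character set?

Any help would be appreciated. To clarify: I'm not looking for a list of existing approaches found via Google, nor for a link to the Joel Spolsky article :)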

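Update: I did a lot of research on this and eventually found a framework called cpdetector, which uses a pluggable approach to character detection, see:

CPDetector

It provides BOM, chardet (Mozilla approach) and ASCII detection plugins, and it is also very easy to write your own. There is also another framework that provides much better character detection than the Mozilla approach/jchardet etc.:

ICU4J

It is very easy to write your own cpdetector plugin that uses this framework to provide a more accurate character encoding detection algorithm. It works better than the Mozilla approach.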
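For illustration, the cheapest layers of such a pluggable chain (a BOM check, then pure 7-bit ASCII, then strict UTF-8 validation) can be sketched with only the JDK. This is not cpdetector's API; the class and method names are hypothetical, and a real detector would put a statistical stage (such as ICU4J's) where this sketch returns a fixed fallback. The sketch also ignores UTF-32 BOMs, whose little-endian form starts with the same FF FE bytes as UTF-16LE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.*;

public class SimpleDetector {
    static Charset detect(byte[] data, Charset fallback) {
        // 1. A byte order mark is unambiguous when present.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. Pure 7-bit data is ASCII (and therefore also valid UTF-8).
        boolean ascii = true;
        for (byte b : data) {
            if (b < 0) { ascii = false; break; }
        }
        if (ascii) return StandardCharsets.US_ASCII;

        // 3. Non-ASCII data that decodes strictly as UTF-8 is very likely UTF-8.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 4. Otherwise assume a legacy single-byte code page chosen by the caller.
            return fallback;
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```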

java character-encoding

44 votes · 2 answers · 30k views

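How to determine the encoding table of a text file

I have .txt and .java files, and I don't know how to determine the encoding table of the files (Unicode, UTF-8, ISO-8859, ...). Does any program exist that determines a file's encoding or lets me see it?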
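A plain text file does not record its encoding (apart from an optional BOM), so any program can only guess. What can be decided with certainty is which encodings are ruled out, because the bytes are not valid in them. A JDK-only sketch with an assumed candidate list and hypothetical names; note that ISO-8859-1 accepts every possible byte sequence, so it is never ruled out, which is exactly why detection stays a guess:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PlausibleCharsets {
    // Returns the candidates that can decode the bytes without any error.
    static List<Charset> plausible(byte[] data, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // The bytes are invalid in this charset, so it is ruled out.
            }
        }
        return ok;
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        List<Charset> candidates = List.of(
                StandardCharsets.US_ASCII, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("Not ruled out: " + plausible(data, candidates));
    }
}
```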

unicode encoding text character-encoding

44 votes · 3 answers · 40k views

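UTF-8 output from PowerShell

I'm trying to invoke PowerShell.exe with a command string via Process.Start with redirected I/O, and get the output back as UTF-8. But I can't seem to get this to work.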

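What I've tried:

  • Passing the command to run via the -Command parameter
  • Writing the PowerShell script to disk as a file with UTF-8 encoding
  • Writing the PowerShell script to disk as a file with UTF-8 with BOM encoding
  • Writing the PowerShell script to disk as a file with UTF-16 encoding
  • Setting Console.OutputEncoding both in my console application and in the PowerShell script
  • Setting $OutputEncoding in PowerShell
  • Setting Process.StartInfo.StandardOutputEncoding
  • Doing all of the above with Encoding.Unicode instead of Encoding.UTF8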

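In every case, when I inspect the bytes I get back, they differ from those of the original string. I'd really like an explanation of why this doesn't work.

Here is my code: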

static void Main(string[] args)
{
    DumpBytes("Héllo");

    ExecuteCommand("PowerShell.exe", "-Command \"$OutputEncoding = [System.Text.Encoding]::UTF8 ; Write-Output 'Héllo';\"",
        Environment.CurrentDirectory, DumpBytes, DumpBytes);

    Console.ReadLine();
}

static void DumpBytes(string text)
{
    Console.Write(text + " " + string.Join(",", Encoding.UTF8.GetBytes(text).Select(b => b.ToString("X"))));
    Console.WriteLine();
}

static int ExecuteCommand(string executable, string arguments, string workingDirectory, Action<string> output, Action<string> error)
{
    try
    {
        using …
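Whatever PowerShell does internally, the bytes on a redirected pipe are only meaningful if the writer and the reader agree on one encoding; with Process.Start that agreement is between the child's actual output encoding and StandardOutputEncoding. The same idea in Java, as a hedged analogue rather than a diagnosis of PowerShell: the charset passed to InputStreamReader plays the role of StandardOutputEncoding. readAll and runAndRead are hypothetical helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PipeDecoding {
    // Decodes a whole stream with an explicitly chosen charset
    // (the counterpart of setting StandardOutputEncoding in .NET).
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Runs a child process and decodes its stdout with the given charset.
    // This only gives correct text if the child really writes in that charset.
    static String runAndRead(Charset cs, String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        String out = readAll(p.getInputStream(), cs);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) {
        // Simulate a pipe carrying "Hello" with an e-acute, written as UTF-8.
        byte[] pipe = "H\u00E9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(pipe), StandardCharsets.UTF_8));      // matching charset
        System.out.println(readAll(new ByteArrayInputStream(pipe), StandardCharsets.ISO_8859_1)); // mismatched charset
    }
}
```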

powershell encoding utf-8 character-encoding io-redirection

44 votes · 2 answers · 100k views

How to detect invalid UTF-8 unicode/binary in a text file

I need to detect corrupted text files that contain invalid (non-ASCII) UTF-8, Unicode or binary characters.

�>t�ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½w�ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½o��������ï¿ï¿½_��������������������o����������������������￿����ß����������ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½~�ï¿ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½}���������}w��׿��������������������������������������ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½~������������������������������������_������������������������������������������������������������������������������^����ï¿ï¿½s�����������������������������?�������������ï¿ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½w�������������ï¿ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½}����������ï¿ï¿½ï¿½ï¿½ï¿½y����������������ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½o�������������������������}��

What I've tried:

iconv -f utf-8 -t utf-8 -c file.csv 

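This converts the file from UTF-8 to UTF-8, with -c used to skip invalid UTF-8 characters. However, in the end these illegal characters are still printed. Are there any other solutions, in bash on Linux or in other languages?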
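In the sample, the repeated � is U+FFFD, the Unicode replacement character, and ï¿½ is its UTF-8 encoding (EF BF BD) displayed as Latin-1. U+FFFD is itself valid UTF-8, which would explain why iconv -c keeps it: -c only drops sequences that are invalid. A JDK-only sketch that reports the line numbers containing either invalid UTF-8 or U+FFFD; the class name and the decision to flag U+FFFD are assumptions for illustration. Splitting on the byte 0x0A is safe because that byte never occurs inside a multi-byte UTF-8 sequence.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineCheck {
    // Returns 1-based numbers of lines that are not valid UTF-8 or that contain U+FFFD.
    static List<Integer> badLines(byte[] data) {
        List<Integer> bad = new ArrayList<>();
        int lineNo = 1;
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                if (!isCleanUtf8(data, start, i - start)) bad.add(lineNo);
                lineNo++;
                start = i + 1;
            }
        }
        return bad;
    }

    // Strictly decodes one line; any malformed byte sequence makes it "not clean".
    static boolean isCleanUtf8(byte[] data, int off, int len) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer chars = dec.decode(ByteBuffer.wrap(data, off, len));
            // Valid UTF-8, but still flag the replacement character left by an earlier bad conversion.
            return chars.toString().indexOf('\uFFFD') < 0;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        for (int n : badLines(data)) {
            System.out.println("line " + n);
        }
    }
}
```

On Java 11 or later it can be run directly from source, e.g. java Utf8LineCheck.java file.csv.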

linux bash utf-8 character-encoding

43 votes · 5 answers · 40k views