How to replace Unicode characters in Notepad++

Xsi*_*Sec 3 find-and-replace

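I have an .xlf file (contents pasted below).

I would like to know how to search for the Unicode character "xE5" and replace it with "æ". I thought I could search for ^0145 (= xE5) and replace it with "æ", but that does not work.

If this is not possible, I can use another text editor (for example UltraEdit).

Here is the text pasted from the file: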

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd">
  <file xmlns:bind="http://bind.sorona.se" original="CTO12623_1_en-GB-da.xml" source-language="en" datatype="xml" date="2015-11-11T15:35:51Z" target-language="da" product-name="Anders_LP8504_151111" bind:file-id="78452" bind:file-hash="85075c54359fa47b087d6c67ec967f43">
    <header>
      <tool tool-name="Sorona TMS" tool-id="bind" tool-version="3.1.5" tool-company="Sorona Innovation" />
      <count-group name="word-count">
        <count count-type="total" unit="word">2743</count>
      </count-group>
    </header>
    <body>
      <trans-unit id="e1ca41ef868a74944745b8cd1dfa59e7" translate="yes" approved="no" restype="string" resname="p">
        <source>The trench compactor LP 8504 is a radio controlled trench compactor. It has a robust design and is suitable for compaction of medium to deep layers of cohesive and granular soils on limited areas such as trenches, construction back-fills and on roads. No other use is permitted.</source><seg-source><mrk mtype="seg" mid="1">The trench compactor LP 8504 is a radio controlled trench compactor. It has a robust design and is suitable for compaction of medium to deep layers of cohesive and granular soils on limited areas such as trenches, construction back-fills and on roads. No other use is permitted.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="1">Vibrationstromlen LP 8504 er radiostyret. Den har et robust design og er beregnet til komprimering af middel til dybe lag af sammenh?ende og granuleret jord p?egr?ede omr?r s?om grr, anl?opfyldninger og p?eje. Den m?kke anvendes til andre form?</mrk></target>
      </trans-unit>
      <trans-unit id="3b3dbf229f5f1f06ab9427d689c9740b" translate="yes" approved="no" restype="string" resname="p">
        <source>The LP trench compactor must only be used in well-ventilated areas, as is the case for all combustion engine machines.</source><seg-source><mrk mtype="seg" mid="2">The LP trench compactor must only be used in well-ventilated areas, as is the case for all combustion engine machines.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="2">LP vibrationstromlen m?ige som alle andre maskiner med forbr?ingsmotorer kun bruges i godt ventilerede omr?r.</mrk></target>
      </trans-unit>
      <trans-unit id="3ceced74b90bcbc582c1857395a8abf1" translate="yes" approved="no" restype="string" resname="p">
        <source>The LP trench compactor must not be towed behind vehicles.</source><seg-source><mrk mtype="seg" mid="3">The LP trench compactor must not be towed behind vehicles.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="3">LP vibrationstromlen m?kke sl?s efter biler.</mrk></target>
      </trans-unit>
      <trans-unit id="c1ff7c8ab3ea4123fc2d5fb6a105d98b" translate="yes" approved="no" restype="string" resname="p">
        <source>Handbrake</source><seg-source><mrk mtype="seg" mid="4">Handbrake</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="4">H?bremse</mrk></target>
      </trans-unit>
    </body>
  </file>
</xliff>

I have also attached the xlf file; here is a link:
Here is the link to download the xlf

Any suggestions?

Dav*_*ill 5

I would like to know how to search for the Unicode character "xE5" and replace it with "æ"

Note that æ is actually Unicode 00E6, not 00E5 (00E5 is å).
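The code points are easy to check outside the editor. The following is a minimal Python sketch, not part of the original answer; the helper name `describe` is made up for illustration. It prints the code point and official Unicode name of a character:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three extra letters of the Danish alphabet, written as escapes so the
# snippet does not depend on the encoding of the source file itself.
for ch in ("\u00e6", "\u00f8", "\u00e5"):
    print(describe(ch))
# U+00E6 LATIN SMALL LETTER AE
# U+00F8 LATIN SMALL LETTER O WITH STROKE
# U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
```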

Search and replace is not the right way to get the characters to display correctly.

<?xml version="1.0" encoding="utf-8"?>

The declaration above says the encoding is utf-8, but the file is actually encoded as ANSI.
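To illustrate why this mismatch garbles the text, here is a small Python sketch. It assumes that "ANSI" means Windows-1252 (the usual Western European code page); the sample word and the helper name `decode_lenient` are illustrative only. A single ANSI byte such as 0xE5 is not a complete UTF-8 sequence, so a UTF-8 reader cannot show it:

```python
# Assumption: "ANSI" is Windows-1252 (cp1252); other code pages behave similarly.
text = "p\u00e5"                    # a short Danish word containing U+00E5
ansi_bytes = text.encode("cp1252")  # b'p\xe5': one byte per character
utf8_bytes = text.encode("utf-8")   # b'p\xc3\xa5': U+00E5 takes two bytes

def decode_lenient(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for anything invalid in that encoding."""
    return data.decode(encoding, errors="replace")

as_utf8 = decode_lenient(ansi_bytes, "utf-8")   # 'p\ufffd': the letter is lost
as_ansi = decode_lenient(ansi_bytes, "cp1252")  # 'p\u00e5': the letter survives

print(repr(as_utf8), repr(as_ansi))
```

Different programs render the undecodable byte differently (a replacement character, a question mark, or a hex escape), but the cause is the same.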

You need to convert the file to UTF-8 properly, as follows:

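  1. Open Testfile.xlf

  2. The file opens with the non-ASCII characters displayed incorrectly.

  3. Menu > Encoding > Select Encode in ANSI

  4. The non-ASCII characters are now displayed correctly.

  5. Select all of the file contents (Ctrl+A)

  6. Menu > Encoding > Select Convert to UTF-8

  7. Save the file (Ctrl+S)

  8. Close and reopen the file.

  9. The file is now correctly encoded as UTF-8 and the Unicode characters are displayed correctly.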
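For several files, the same conversion can be scripted. The following Python sketch is not from the original answer: the function name `convert_to_utf8` is hypothetical, and the default of Windows-1252 as the source encoding is an assumption that should be checked against the actual files. The demo runs on a throwaway file rather than the real Testfile.xlf:

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file from source_encoding to UTF-8 (without BOM).

    Works on raw bytes so that line endings are left untouched.
    """
    raw = src.read_bytes()
    # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    text = raw.decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))

# Demo with made-up content in a temporary directory.
sample = "<target>\u00e6 \u00f8 \u00e5</target>"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Testfile.xlf"
    dst = Path(tmp) / "TestfileConverted.xlf"
    src.write_bytes(sample.encode("cp1252"))  # simulate the ANSI file
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

Writing to a separate output file keeps the original intact in case the guessed source encoding turns out to be wrong.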


How can you tell that the file is actually ANSI?

The Cygwin file utility shows this (before and after the conversion):

DavidPostill@Hal /f/test
$ file -i Testfile*.xlf
Testfile.xlf:          application/xml; charset=iso-8859-1
TestfileConverted.xlf: application/xml; charset=utf-8
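Without Cygwin, a rough version of this check can be written in Python. This is only a sketch under stated assumptions: it is not the algorithm the `file` utility uses, the name `guess_charset` is made up, and it cannot tell ISO-8859-1 from Windows-1252 or any other 8-bit encoding. It only separates pure ASCII, valid UTF-8, and everything else:

```python
def guess_charset(data: bytes) -> str:
    """Very rough guess: 'us-ascii', 'utf-8', or 'unknown-8bit'."""
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some legacy single-byte (or other) encoding.
        return "unknown-8bit"

print(guess_charset(b"plain text"))               # us-ascii
print(guess_charset("p\u00e5".encode("utf-8")))   # utf-8
print(guess_charset("p\u00e5".encode("cp1252")))  # unknown-8bit
```

To check a real file, pass the result of `Path(name).read_bytes()` from `pathlib`. A file that declares `encoding="utf-8"` but is reported as `unknown-8bit` has the same mismatch as the one described above.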