Concisely converting the characters of a word into a list of their ASCII codes in Raku

Question

I'm trying to convert the word wall into its list of ASCII codes, (119, 97, 108, 108), like this:

my @ascii="abcdefghijklmnopqrstuvwxyz";

my @tmp;
map { push @tmp, $_.ord if $_.ord == @ascii.comb.any.ord }, "wall".comb;
say @tmp;


Is there a way to use @tmp without declaring it on a separate line?
Is there a way to produce the list of ASCII codes in one line instead of three? If so, how?

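Note that I have to use the @ascii variable, i.e. I can't use the consecutively increasing ASCII sequence (97, 98, 99 ... 122), because I also plan to use this code for non-ASCII languages.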

Answer 1

use*_*601 (score: 10)

There are a few things we can do here to make this work.

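First, let's deal with the @ascii variable. The @ sigil marks a positional variable, but you assigned a single string to it. That creates a one-element array ['abc...'], which causes problems. Depending on how general you need this to be, I'd suggest creating the array directly: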

# Pick one of these; they are alternatives, not meant to run together:
my @ascii = <a b c d e f g h i j k l m n o p q r s t u v w x y z>;
my @ascii = 'a' .. 'z';
my @ascii = 'abcdefghijklmnopqrstuvwxyz'.comb;
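
One way to see the one-element problem is simply to count elements. A small sketch, where @one and @chars are illustrative names not taken from the question:

```raku
# The question's version: one string assigned to an array
my @one = "abcdefghijklmnopqrstuvwxyz";
say @one.elems;      # 1

# Splitting the string into characters first
my @chars = "abcdefghijklmnopqrstuvwxyz".comb;
say @chars.elems;    # 26
say @chars[^3];      # (a b c)
```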


Alternatively, go straight to the any part:

# Again, pick one of these:
my $ascii-char = any <a b c d e f g h i j k l m n o p q r s t u v w x y z>;
my $ascii-char = any 'a' .. 'z';
my $ascii-char = 'abcdefghijklmnopqrstuvwxyz'.comb.any;


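Here I've used the $ sigil, because an any junction stands for any single one of its values, so a scalar does the job just as well (and it makes life easier). Personally I'd call it $ascii, but I've used a different name so the later examples are easier to tell apart.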
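
Comparing a string with a junction autothreads, so the result is itself a junction; inside an if it collapses to a Bool automatically, and so does the same thing explicitly. A minimal sketch using the $ascii-char junction from above:

```raku
my $ascii-char = 'abcdefghijklmnopqrstuvwxyz'.comb.any;

# 'w' eq $ascii-char is itself a junction; so collapses it to a Bool
say so 'w' eq $ascii-char;    # True
say so 'é' eq $ascii-char;    # False
```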

Now we can deal with the map. Based on the two versions of ascii above, your map block can be rewritten as either of these:

{ push @tmp, $_.ord if $_ eq @ascii.any  }
{ push @tmp, $_.ord if $_ eq $ascii-char }
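
Put together with a pre-declared @tmp, the eq variant runs as below. This sketch swaps map for a for loop (a substitution of mine, not the answer's code), since the block runs only for the side effect of push:

```raku
my @ascii = 'a' .. 'z';
my @tmp;

# A for loop instead of map: the block runs only for its push side effect
for "wall".comb {
    push @tmp, $_.ord if $_ eq @ascii.any;
}
say @tmp;    # [119 97 108 108]
```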


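Note that if you prefer to compare with ==, you can store numeric values when you first build ascii and keep using $_.ord. Also, personally, I like to name the map variable, for example: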

{ push @tmp, $^char.ord if $^char eq @ascii.any  }
{ push @tmp, $^char.ord if $^char eq $ascii-char }


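where a placeholder such as $^foo stands in for $_ (if you use more than one placeholder, they are bound to the block's arguments in alphabetical order of their names: the first name gets the first argument, the second name the second, and so on).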
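
To make the ordering rule concrete, here is a tiny block with two placeholders; the name &show is just for illustration:

```raku
# Placeholders are bound in alphabetical order of their names,
# not in the order they appear inside the block
my &show = { $^b ~ '-' ~ $^a };
say show(1, 2);    # 2-1   ($^a got 1, $^b got 2)
```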

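But let's get to the more interesting question: how can we do all of this without declaring @tmp up front? The answer is to let map build the list and assign its result to the array. You might expect that to be tricky for characters that aren't in the list, but an if whose condition is false returns Empty (an empty Slip, which vanishes from the surrounding list), and that makes it very simple: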

my @tmp = map { $^char.ord if $^char eq $ascii-char }, "wall".comb;
my @tmp = map { $^char.ord if $^char eq @ascii.any  }, "wall".comb;


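With "wall", every character is in the list, so map produces 119, 97, 108, 108, and that is exactly what @tmp gets. If a character is not in the list, say the â in "wâll", map produces 119, Empty, 108, 108, which automatically flattens to 119, 108, 108, so @tmp is set to just 119, 108, 108.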
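
A runnable version of that one-liner, using the hypothetical input "wâll" so that one character actually gets dropped:

```raku
my @ascii = 'a' .. 'z';

# "... if ..." yields Empty when the test fails, so 'â' leaves no element behind
my @tmp = map { $^char.ord if $^char eq @ascii.any }, "wâll".comb;
say @tmp;    # [119 108 108]
```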

Answer 2

Bra*_*ert (score: 8)

Yes, there is a simpler way.

"wall".ords.grep('az'.ords.minmax);


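Of course, this relies on a through z forming one contiguous sequence, because minmax creates a Range object from the smallest and largest values in the list.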
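
To see what minmax builds, this sketch prints the Range before using it as the grep matcher (grep smartmatches each code against it):

```raku
my $range = 'az'.ords.minmax;
say $range;                      # 97..122
say "wall".ords.grep($range);    # (119 97 108 108)
```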

If the characters you want don't form one unbroken sequence, you can combine several ranges with a junction.

"wall".ords.grep( 'az'.ords.minmax | 'AZ'.ords.minmax );

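A quick check of the junction version on mixed-case input; the test string "WaLL!" is made up for illustration, and the ! is dropped because 33 lies in neither range:

```raku
# Either range may match: 65..90 (A-Z) or 97..122 (a-z)
my $letters = 'az'.ords.minmax | 'AZ'.ords.minmax;
say "WaLL!".ords.grep($letters);    # (87 97 76 76)
```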

But you said you want to match other languages as well. That screams regular expressions to me.

"wall".comb.grep( /^ <:Ll> & <:ascii> $/ ).map( *.ord )


This matches lowercase letters that are also in ASCII.

Actually, we can make it even simpler: comb can take a regex that determines which characters it takes from the input.

"wall".comb( / <:Ll> & <:ascii> / ).map( *.ord )
# (119, 97, 108, 108)

"???????".comb( / <:Ll> & <:Greek> / ).map( *.ord )
# (945, 946, 947, 948, 949)
# The two remaining characters are not included, as they are not lowercase
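
Since the question has to use its own @ascii list so that other alphabets can be swapped in, another option (not part of the answer above) is to interpolate the array straight into the regex: an array inside a regex matches any one of its elements as a literal string. A sketch, with @alphabet and @greek as illustrative names:

```raku
# An interpolated array in a regex matches any one of its elements literally
my @alphabet = 'abcdefghijklmnopqrstuvwxyz'.comb;
say "wall".comb(/@alphabet/).map(*.ord);    # (119 97 108 108)

# The same code works for a non-ASCII alphabet
my @greek = 'αβγ'.comb;
say "αxβ".comb(/@greek/).map(*.ord);        # (945 946)
```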


Note that the above only works for ASCII if there are no combining accents.

"de\c[COMBINING ACUTE ACCENT]f".comb( / <:Ll> & <:ascii> / )
# ("d", "f")


The combining acute accent combines with the e into LATIN SMALL LETTER E WITH ACUTE (é). That combined character is not in ASCII, so it is skipped.
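
The composition is visible directly on the string: Raku stores e plus the accent as the single precomposed codepoint U+00E9 (233). A small sketch:

```raku
my $s = "de\c[COMBINING ACUTE ACCENT]f";
say $s.chars;    # 3              (graphemes: d, é, f)
say $s.ords;     # (100 233 102)
```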

It gets even stranger when the combination has no precomposed character.

"f\c[COMBINING ACUTE ACCENT]".comb( / <:Ll> & <:ascii> / )
# ("f́",)


That's because f is lowercase and ASCII, yet the combining codepoint comes along with it as part of the same grapheme.
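
By contrast, f plus the accent has no precomposed form, so it stays one grapheme made of two codepoints (769 is U+0301). A small sketch:

```raku
my $f = "f\c[COMBINING ACUTE ACCENT]";
say $f.chars;    # 1          (a single grapheme)
say $f.ords;     # (102 769)  (f and U+0301 both remain)
```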

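Basically, if your data has (or may have) combining accents and that could break things, you're better off processing it while it's still in binary form, as codepoints rather than a Str: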

# $buf is assumed to hold codepoints as integers
$buf.grep: {
    .uniprop() eq 'Ll'                      # lowercase letter
    && .uniprop('Block') eq 'Basic Latin'   # ASCII
}
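
$buf is not defined in the answer. As a runnable stand-in, this sketch assumes a plain list of integer codepoints (taken here from a string with .ords) and applies the same test; the â (226) is lowercase but lies in the Latin-1 Supplement block, so it is dropped:

```raku
# Stand-in for $buf: integer codepoints (an assumption; the answer does not define $buf)
my @codes = "wâll".ords;    # (119 226 108 108)

my @kept = @codes.grep: {
    .uniprop eq 'Ll'                         # lowercase letter
    && .uniprop('Block') eq 'Basic Latin'    # ASCII block (U+0000..U+007F)
};
say @kept;    # [119 108 108]
```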


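The above also works on a plain string, because .uniprop works on either an integer representing a codepoint or an actual character.

"wall".comb.grep: {
    .uniprop() eq 'Ll'                      # lowercase letter
    && .uniprop('Block') eq 'Basic Latin'   # ASCII
}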


Note again that this has the same problem with combining codepoints, because it works on a string.

You may also want to use .uniprop('Script') instead of .uniprop('Block'), depending on what you're trying to do.
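
The difference shows up with letters outside Basic Latin: é sits in a different block from a but belongs to the same script. A minimal sketch that prints both properties for a few characters:

```raku
# Block is a fixed codepoint range; Script is the writing system a character belongs to
for <a é α> -> $c {
    say $c, ': Block=', $c.uniprop('Block'), ', Script=', $c.uniprop('Script');
}
```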

Asked: 5 years, 1 month ago
Viewed: 315 times
Last activity: 4 years, 5 months ago