dax*_*xim 6 unicode perl perl6 text-segmentation raku
Works:
#!/usr/bin/env python3
from uniseg.graphemecluster import grapheme_clusters

def albanian_digraph_dh(s, breakables):
    # Suppress the break between 'd' and 'h' so that "dh" stays one cluster.
    for i, breakable in enumerate(breakables):
        if s.endswith('d', 0, i) and s.startswith('h', i):
            yield 0
        else:
            yield breakable

print(list(grapheme_clusters('dhelpëror', albanian_digraph_dh)))
#['dh', 'e', 'l', 'p', 'ë', 'r', 'o', 'r']
Needs improvement/customization:
perl -C -Mutf8 -mUnicode::GCString -E'
say join " ", Unicode::GCString
->new("dhelpëror")->as_array
'
#d h e l p ë r o r
perl6 -e'"dhelpëror".comb.say'
#(d h e l p ë r o r)
Note: writing my own segmentation (which is almost guaranteed not to implement UAX #29 correctly) counts as side-stepping the problem.
D:\>perl6 -e "'dhelpëror'.comb(/dh|./).say"
(dh e l p ë r o r)
You can do the same in old Perl.
print join ' ', 'dhelpëror' =~ /dh|./g
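For comparison, the same alternation trick can be sketched in Python with only the standard library. This is a rough port, not the uniseg tailoring from the question: `split_with_dh` is a hypothetical helper name, and `.` in `re` matches a single code point rather than a UAX #29 grapheme cluster, so the sketch normalizes to NFC first so that a decomposed `ë` becomes one code point. Sequences with no precomposed form would still be split.

```python
import re
import unicodedata

# Alternation order matters: try the digraph first, then any single code point.
DIGRAPH_OR_CHAR = re.compile(r"dh|.", re.DOTALL)

def split_with_dh(text):
    """Split text into 'dh' digraphs and single code points (hypothetical helper)."""
    # NFC composes e + U+0308 into one 'ë' where a precomposed form exists.
    # Assumption: this is good enough for the example word; it is NOT
    # full UAX #29 grapheme cluster segmentation.
    normalized = unicodedata.normalize("NFC", text)
    return DIGRAPH_OR_CHAR.findall(normalized)

print(split_with_dh("dhelp\u00ebror"))
```

Like the Raku and Perl one-liners, this treats every "dh" as a digraph, so it shares their limits as a general segmentation approach.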