Hut8 · tags: language-agnostic, algorithm, parsing
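I have a bunch of people's names. They are all "Western" names, and I only need American conventions/abbreviations (e.g., Mr. rather than Sr. for señor). Unfortunately, the people I'm sending things to didn't enter their own names, so I can't ask them what they want to be called. I know each person's gender and full name, but nothing more specific has been parsed out.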
Some examples:
I'd like to be able to parse out the parts of each name:
```ruby
name = Name.new("John Smith Jr.")
name.first_name # => John
name.greeting   # => Mr. Smith
```
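If I'm looking for a "greeting" (probably not the best term), what I want for examples 1-4 is "Mr. Smith". For example 5 I'd want "Dr. Smith", but I'd settle for "Mr. Smith".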
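The "greeting" rule can be stated as: use a title found in the name (such as Dr.) when there is one, otherwise fall back to an honorific chosen from the known gender. Below is a minimal sketch of that rule, assuming the surname and any title have already been extracted and that gender is stored as `:male`/`:female`. The method `greeting_for` and the `TITLE_FORMS` table are hypothetical names, not part of any gem.

```ruby
# Hypothetical mapping from a normalized title to its US-style abbreviation.
TITLE_FORMS = {
  "mr" => "Mr.", "mrs" => "Mrs.", "miss" => "Miss", "ms" => "Ms.",
  "rev" => "Rev.", "dr" => "Dr.", "prof" => "Prof."
}.freeze

# Builds a greeting such as "Mr. Smith" or "Dr. Smith".
# Assumes gender is :male or :female, and title is the title found in the
# full name (e.g. "Dr." or "dr"), or nil when the name had no title.
def greeting_for(surname, gender, title = nil)
  key = title.to_s.downcase.delete(".")
  honorific = TITLE_FORMS.fetch(key) do
    # No recognized title: fall back to a gender-based honorific.
    gender == :female ? "Ms." : "Mr."
  end
  "#{honorific} #{surname}"
end

puts greeting_for("Smith", :male)         # prints "Mr. Smith"
puts greeting_for("Smith", :male, "Dr.")  # prints "Dr. Smith"
puts greeting_for("Smith", :female)       # prints "Ms. Smith"
```

Defaulting to "Ms." for women when no title is known is an assumption (a common US choice); swap in "Mrs." or "Miss" if your data calls for it.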
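A Ruby gem would be ideal. I was inspired to ask by Chronic, a Ruby gem that handles times in a very human way: I can tell it "last Tuesday" and it comes up with something sensible. An algorithm that's good enough for most people is fine; I don't need to handle every corner case.

I'm trying to work around some of the issues raised in "Falsehoods Programmers Believe About Names".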
Since you're limited to Western-style names, I think a few rules will get you most of the way:
1. Delete any of the following words: { mr mrs miss ms rev dr prof }, and any more you can think of. Using a table of title "scores" (e.g. [mr=1, mrs=1, rev=2, dr=3, prof=4]; order them however you want), record the highest-scoring title that was deleted.
2. Delete any words that are { jr phd } or are Roman numerals of value roughly 50 or less (/^[XVI]+$/ is probably a good enough regex; anchor it so it only matches whole words, otherwise a name like "Xavier" would match).

It will never be possible to guarantee that a name like "John Baxter Smith" is parsed correctly, since not all double-barrelled surnames use hyphens. Is "Baxter Smith" the surname? Or is "Baxter" a middle name? I think it's safe to assume that middle names are relatively more common than double-barrelled-but-unhyphenated surnames, so it's better to default to reporting the last word as the surname. You might also want to compile a list of common double-barrelled surnames and check against it, however.
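A rough Ruby sketch of these rules follows. It is an illustration, not a published gem: `parse_name`, the `TITLE_SCORES` values for miss/ms, and the two-entry `DOUBLE_BARRELLED` list are hypothetical. Titles and suffixes are compared case-insensitively with trailing periods or commas stripped, and the Roman-numeral regex is anchored with `\A` and `\z` so it matches whole words only.

```ruby
# Title scores from the example table; miss/ms = 1 is an assumption.
TITLE_SCORES = { "mr" => 1, "mrs" => 1, "miss" => 1, "ms" => 1,
                 "rev" => 2, "dr" => 3, "prof" => 4 }.freeze
# Suffixes to drop (extend as needed).
SUFFIXES = %w[jr phd].freeze
# Roman numerals such as II, III, IV; anchored so only whole words match.
ROMAN_NUMERAL = /\A[XVI]+\z/
# Hypothetical, deliberately tiny list of unhyphenated double-barrelled surnames.
DOUBLE_BARRELLED = ["baxter smith", "lloyd webber"].freeze

# Lowercases a word and strips punctuation so "Dr." and "dr" compare equal.
def normalize(word)
  word.downcase.delete(".,")
end

def parse_name(full_name)
  words = full_name.split

  # Rule 1: drop titles, remembering the highest-scoring one.
  titles = words.select { |w| TITLE_SCORES.key?(normalize(w)) }
  best_title = titles.max_by { |w| TITLE_SCORES[normalize(w)] }
  words -= titles

  # Rule 2: drop suffixes and Roman numerals (numerals checked case-sensitively).
  words = words.reject do |w|
    SUFFIXES.include?(normalize(w)) || ROMAN_NUMERAL.match?(w.delete(".,"))
  end
  return nil if words.empty?

  # Default to the last word as the surname, unless the last two words
  # form a known double-barrelled surname.
  surname_size = 1
  if words.size >= 3 && DOUBLE_BARRELLED.include?(words.last(2).join(" ").downcase)
    surname_size = 2
  end

  {
    first_name: words.first,
    middle_names: words[1...-surname_size],
    surname: words.last(surname_size).join(" "),
    title: best_title && normalize(best_title)
  }
end

name = parse_name("Dr. John Smith III")
puts name[:first_name] # prints "John"
puts name[:surname]    # prints "Smith"
puts name[:title]      # prints "dr"
puts parse_name("John Baxter Smith")[:surname] # prints "Baxter Smith"
```

One known weakness: a single-letter middle initial such as "V." or "I." also looks like a Roman numeral, so this sketch drops it. That is usually harmless for a greeting, but worth knowing if you also need middle names.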