mongodb $regex 中排序规则的使用

Fre*_*ken 6 mongodb mongodb-query

由于 v3.4 排序规则可用于查找操作,尤其是在关注变音符号的匹配时。虽然具有确定值($eq opeartor 或相应构造)的查找查询将匹配字母和对应的变音符号,但如果使用 $regex 来实现对部分搜索字符串(“LIKE”)的匹配,则情况并非如此)。

是否可以让 $regex 查询以与 $eq 查询相同的方式使用排序规则?

考虑示例集合 testcoll:

{ "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" }, 
{ "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antwërpen" }
Run Code Online (Sandbox Code Playgroud)

此查询将找到两条记录

db.testcoll.find({city: 'antwerpen'}).collation({"locale" : "en_US", "strength" : 1});
Run Code Online (Sandbox Code Playgroud)

使用正则表达式的相同查询不会(仅查找带有“Antwerpen”的记录)

db.testcoll.find({city: /antwe/i}).collation({"locale" : "en_US", "strength" : 1});
Run Code Online (Sandbox Code Playgroud)

Rap*_*res 6

我今天遇到了同样的问题,我疯狂地搜索互联网试图找到解决方案。没有找到。所以我想出了我的解决方案,一个对我有用的小弗兰肯斯坦。

\n

我创建了一个函数,它从字符串中删除所有特殊字符,然后将所有可能特殊的字符替换为可能特殊的等效正则表达式。最后,我只是添加一个"i"选项来覆盖数据库中的大写字符串。

\n
export const convertStringToRegexp = (text: string) => {\n  let regexp = '';\n  const textNormalized = text\n    .normalize('NFD')\n    .replace(/[\\u0300-\\u036f]/g, '') // remove all accents\n    .replace(/[|\\\\{}()[\\]^$+*?.]/g, '\\\\$&') // remove all regexp reserved char\n    .toLowerCase();\n\n  regexp = textNormalized\n    .replace(/a/g, '[a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3]')\n    .replace(/e/g, '[e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa]')\n    .replace(/i/g, '[i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae]')\n    .replace(/o/g, '[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]')\n    .replace(/u/g, '[u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb]')\n    .replace(/c/g, '[c,\xc3\xa7]')\n    .replace(/n/g, '[n,\xc3\xb1]')\n    .replace(/[\xc2\xaa\xc2\xba\xc2\xb0]/g, '[\xc2\xaa\xc2\xba\xc2\xb0]');\n  return new RegExp(regexp, 'i'); // "i" -> ignore case\n};\n
Run Code Online (Sandbox Code Playgroud)\n

在我的find()方法中,我只是将这个函数与$regex选项一起使用,如下所示:

\n
db.testcoll.find({city: {$regex: convertStringToRegexp('twerp')} })\n\n/*\nOutput:\n[\n  { "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" }, \n  { "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antw\xc3\xabrpen" }\n]\n*/\n
Run Code Online (Sandbox Code Playgroud)\n

我还创建了一个.spec.ts文件(使用 Chai)来测试此功能。当然,你可以适应 Jest。

\n
\ndescribe('ConvertStringToRegexp', () => {\n  it('should convert all "a" to regexp', () => {\n    expect(convertStringToRegexp('TA\xc3\x81da\xc3\xa1h!')).to.deep.equal(\n      /t[a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3]d[a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3]h!/i\n    );\n  });\n  it('should convert all "e" to regexp', () => {\n    expect(convertStringToRegexp('ME\xc3\x89e\xc3\xa9h!')).to.deep.equal(\n      /m[e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa][e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa][e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa][e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa]h!/i\n    );\n  });\n  it('should convert all "i" to regexp', () => {\n    expect(convertStringToRegexp('V\xc3\x8dIiish\xc3\xad!')).to.deep.equal(\n      /v[i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae][i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae][i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae][i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae]sh[i,\xc3\xad,\xc3\xaf,\xc3\xac,\xc3\xae]!/i\n    );\n  });\n  it('should convert all "o" to regexp', () => {\n    expect(convertStringToRegexp('\xc3\x93Oo\xc3\xb3hhhh!!!!')).to.deep.equal(\n      /[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4][o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4][o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4][o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]hhhh!!!!/i\n    );\n  });\n  it('should convert all "u" to regexp', () => {\n    expect(convertStringToRegexp('\xc3\x9aUhuu\xc3\xball!')).to.deep.equal(\n      /[u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb][u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb]h[u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb][u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb][u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb]ll!/i\n    );\n  });\n  it('should convert all "c" to regexp', () => {\n    expect(convertStringToRegexp('Cacacacaca')).to.deep.equal(\n      /[c,\xc3\xa7][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][c,\xc3\xa7][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][c,\xc3\xa7][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][c,\xc3\xa7][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3][c,\xc3\xa7][a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3]/i\n    );\n  });\n  it('should remove all special characters', () => {\n    expect(\n      convertStringToRegexp('hello 123 \xc2\xb0\xc2\xba\xc2\xb6\xc2\xa7\xe2\x88\x9e\xc2\xa2\xc2\xa3\xe2\x84\xa2\xc2\xb7\xc2\xaa\xe2\x80\xa2*!@#$%^WORLD?.')\n    ).to.deep.equal(\n      /h[e,\xc3\xa9,\xc3\xab,\xc3\xa8,\xc3\xaa]ll[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4] 123 [\xc2\xaa\xc2\xba\xc2\xb0][\xc2\xaa\xc2\xba\xc2\xb0]\xc2\xb6\xc2\xa7\xe2\x88\x9e\xc2\xa2\xc2\xa3\xe2\x84\xa2\xc2\xb7[\xc2\xaa\xc2\xba\xc2\xb0]\xe2\x80\xa2\\*!@#\\$%\\^w[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]rld\\?\\./i\n    );\n  });\n  it('should accept all regexp reserved characters', () => {\n    expect(\n      convertStringToRegexp('Ol\xc3\xa1 [-[]{}()*+?.,\\\\o/^$|#s] Mundo! ')\n    ).to.deep.equal(\n      /* eslint-disable @typescript-eslint/no-explicit-any */\n      /[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]l[a,\xc3\xa1,\xc3\xa0,\xc3\xa4,\xc3\xa2,\xc3\xa3] \\[-\\[\\]\\{\\}\\(\\)\\*\\+\\?\\.,\\\\[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]\\/\\^\\$\\|#s\\] m[u,\xc3\xbc,\xc3\xba,\xc3\xb9,\xc3\xbb][n,\xc3\xb1]d[o,\xc3\xb3,\xc3\xb6,\xc3\xb2,\xc3\xb5,\xc3\xb4]! /i\n    );\n  });\n});\n\n
Run Code Online (Sandbox Code Playgroud)\n


小智 4

文档

不区分大小写的正则表达式查询通常无法有效地使用索引。$regex 实现不支持排序规则,并且无法使用不区分大小写的索引。