Creating a custom analyzer with the asciifolding filter in Spring Data Elasticsearch

ene*_*ral 1 spring elasticsearch spring-data spring-boot spring-data-elasticsearch

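I want to retrieve the same object when searching by name with either cozum or çözüm, after a record has been saved with the name çözüm. I have searched for this, and the asciifolding filter was suggested. How can I implement this using Spring Data Elasticsearch?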

    @Document(indexName = "erp")
    public class Company {

        @Id
        private String id;

        private String name;

        private String description;

        @Field(type = FieldType.Nested, includeInParent = true)
        private List<Employee> employees;

        // getters, setter
    }

P.J*_*sch 8

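You need to create an asciifolding analyzer (see the Elasticsearch documentation) and add it to the index settings of your index.

Then you can reference this analyzer in the @Field annotation of the name property.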

Edit: complete example

First, the file for the index settings, which I named erp-company.json and saved under src/main/resources:

    {
      "analysis": {
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "char_filter": [
              "html_strip"
            ],
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
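To get a feel for what this analyzer chain does to a piece of text, here is a minimal plain-Java sketch. It is not Elasticsearch's implementation: the class and method names (AnalyzerSketch, fold, analyze) are made up for illustration, the tag stripping and word splitting are crude stand-ins for html_strip and the standard tokenizer, and the folding step only approximates asciifolding by removing combining marks after Unicode decomposition. The real filter also maps characters this approach leaves alone, for example ø to o and ß to ss.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough stand-in for the custom_analyzer chain; NOT Elasticsearch's implementation.
final class AnalyzerSketch {

    // Approximates "asciifolding": decompose accented letters (NFD),
    // then drop the combining marks that carried the accents.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Approximates html_strip -> standard tokenizer -> lowercase -> asciifolding.
    static List<String> analyze(String text) {
        // Crude html_strip: replace anything that looks like a tag with a space.
        String withoutTags = text.replaceAll("<[^>]*>", " ");
        List<String> tokens = new ArrayList<>();
        // Crude word splitting: break on anything that is not a letter, mark or digit.
        for (String raw : withoutTags.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(fold(raw.toLowerCase(Locale.ROOT)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The literals use Unicode escapes so the source file stays pure ASCII.
        // First input: "Renee et Francois" written with e-acute and c-cedilla.
        System.out.println(analyze("Ren\u00e9e et Fran\u00e7ois")); // prints [renee, et, francois]
        // Second input: the Turkish word from the question, with cedilla and umlauts.
        System.out.println(analyze("\u00e7\u00f6z\u00fcm"));        // prints [cozum]
    }
}
```

The point of the sketch is only the order of the steps: text is cleaned, split into tokens, lowercased, and then folded, so the index ends up holding accent-free lowercase terms.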

Then you need to reference this file and the analyzer in the entity class, here named Company:

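    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.annotations.Field;
    import org.springframework.data.elasticsearch.annotations.FieldType;
    import org.springframework.data.elasticsearch.annotations.Setting;

    @Document(indexName = "erp")
    @Setting(settingPath = "/erp-company.json") // index settings that define custom_analyzer
    public class Company {

        @Id
        private String id;

        // analyzed with lowercase + asciifolding
        @Field(type = FieldType.Text, analyzer = "custom_analyzer")
        private String name;

        @Field(type = FieldType.Text, analyzer = "custom_analyzer")
        private String description;

        // getters, setter
    }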

Using this CompanyController:

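    import org.springframework.data.elasticsearch.core.SearchHits;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RequestMapping("/company")
    public class CompanyController {

        // CompanyRepository is not shown in this answer; searchByName is
        // presumably a query method declared on it.
        private final CompanyRepository repository;

        public CompanyController(CompanyRepository repository) {
            this.repository = repository;
        }

        // stores the posted company in the index
        @PostMapping
        public Company put(@RequestBody Company company) {
            return repository.save(company);
        }

        // searches companies by name
        @GetMapping("/{name}")
        public SearchHits<Company> get(@PathVariable String name) {
            return repository.searchByName(name);
        }
    }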

Save some data containing diacritics (using httpie):

    http POST :8080/company id=1 name="Renée et François"

Search without diacritics:

    http  GET :8080/company/francois

    HTTP/1.1 200
    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Connection: keep-alive
    Content-Type: application/json
    Date: Wed, 09 Sep 2020 17:56:16 GMT
    Expires: 0
    Keep-Alive: timeout=60
    Pragma: no-cache
    Transfer-Encoding: chunked
    X-Content-Type-Options: nosniff
    X-Frame-Options: DENY
    X-XSS-Protection: 1; mode=block

    {
        "aggregations": null,
        "empty": false,
        "maxScore": 0.2876821,
        "scrollId": null,
        "searchHits": [
            {
                "content": {
                    "description": null,
                    "id": "1",
                    "name": "Renée et François"
                },
                "highlightFields": {},
                "id": "1",
                "index": "erp",
                "innerHits": {},
                "nestedMetaData": null,
                "score": 0.2876821,
                "sortValues": []
            }
        ],
        "totalHits": 1,
        "totalHitsRelation": "EQUAL_TO"
    }
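The hit above is plausible because, for a text field that defines an analyzer and no separate search analyzer, full-text queries normally run the query text through the same analyzer as the indexed text, so both sides end up as the same accent-free lowercase terms. The following plain-Java sketch illustrates that idea only. MatchSketch, terms and matches are hypothetical names, the normalization is a simplified approximation of lowercase plus asciifolding, and the "any shared term" rule is a simplification of how Elasticsearch actually evaluates and scores a query.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only: the same normalization is applied at index time and at search time.
final class MatchSketch {

    // Splits on whitespace, lowercases, and strips accents (approximate folding).
    static Set<String> terms(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .map(t -> Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}+", ""))
                .collect(Collectors.toSet());
    }

    // Simplified matching rule: a stored name matches if it shares
    // at least one normalized term with the query.
    static boolean matches(String storedName, String query) {
        Set<String> indexed = terms(storedName);
        return terms(query).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        // Stored value: "Renee et Francois" with e-acute and c-cedilla (ASCII-safe escapes).
        String stored = "Ren\u00e9e et Fran\u00e7ois";
        System.out.println(matches(stored, "francois"));      // prints true
        System.out.println(matches(stored, "Fran\u00e7ois")); // prints true
        System.out.println(matches(stored, "frank"));         // prints false
    }
}
```

Under these assumptions, searching with or without the cedilla finds the same record, which is the behaviour asked for in the question.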

The index information returned by Elasticsearch for the index:

    {
        "erp": {
            "aliases": {},
            "mappings": {
                "properties": {
                    "_class": {
                        "fields": {
                            "keyword": {
                                "ignore_above": 256,
                                "type": "keyword"
                            }
                        },
                        "type": "text"
                    },
                    "description": {
                        "analyzer": "custom_analyzer",
                        "type": "text"
                    },
                    "id": {
                        "fields": {
                            "keyword": {
                                "ignore_above": 256,
                                "type": "keyword"
                            }
                        },
                        "type": "text"
                    },
                    "name": {
                        "analyzer": "custom_analyzer",
                        "type": "text"
                    }
                }
            },
            "settings": {
                "index": {
                    "analysis": {
                        "analyzer": {
                            "custom_analyzer": {
                                "char_filter": [
                                    "html_strip"
                                ],
                                "filter": [
                                    "lowercase",
                                    "asciifolding"
                                ],
                                "tokenizer": "standard",
                                "type": "custom"
                            }
                        }
                    },
                    "creation_date": "1599673911503",
                    "number_of_replicas": "1",
                    "number_of_shards": "1",
                    "provided_name": "erp",
                    "uuid": "lRwcKcPUQxKKGuNJ6G30uA",
                    "version": {
                        "created": "7090099"
                    }
                }
            }
        }
    }