1. term and terms queries
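In ES, a term query is an exact match. The search value is not analyzed (no tokenization, no lowercasing); it is compared as-is against the terms stored in the inverted index.
For example, suppose the index stores these two documents: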
{
  "title": "love China",
  "content": "people very love China",
  "tags": ["China", "love"]
}
{
  "title": "love HuBei",
  "content": "people very love HuBei",
  "tags": ["HuBei", "love"]
}
1.1. term query
A term query:
{
  "query": {
    "term": {
      "title": "love"
    }
  }
}
The response looks like this. Both stored documents are found:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei", "love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China", "love"]
        }
      }
    ]
  }
}
Every document whose title contains the token love is returned. But we only want an exact match on love China. Let's see whether the following query finds it:
{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}
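It returns nothing. term does not analyze its input, so it looks for a single indexed token equal to the whole string "love China". The title field was analyzed at index time into the separate tokens love and china, so no such token exists. In practice, a term query on an analyzed text field can only match one token at a time.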
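The following minimal Python sketch makes this concrete with a term lookup against an inverted index. It is not how Elasticsearch is implemented. The `analyze` helper is a hypothetical stand-in for the standard analyzer: it splits the text into runs of ASCII letters and digits and lowercases them. `build_index` and `term_query` are illustrative names. The key point is that the index is built from analyzed tokens, while `term_query` uses the search value untouched.

```python
import re
from collections import defaultdict

# The two sample documents, keyed by the _id values seen in the responses.
DOCS = {
    "7": {"title": "love China"},
    "8": {"title": "love HuBei"},
}

def analyze(text):
    """Hypothetical approximation of the standard analyzer: alphanumeric runs, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs, field):
    """Inverted index: token -> set of ids of documents containing that token."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return index

def term_query(index, value):
    """term semantics: look the value up exactly as given, with no analysis."""
    return set(index.get(value, set()))

index = build_index(DOCS, "title")
print(sorted(term_query(index, "love")))        # ['7', '8']
print(sorted(term_query(index, "love China")))  # []
```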
1.2. terms query
What if we want to match several terms? Use terms:
{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}
The result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei", "love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China", "love"]
        }
      }
    ]
  }
}
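Both documents come back. Why? The values in the terms array are combined with OR, so a document matches if it contains any one of them. (Here "love" alone matches both documents. The capitalized "China" matches nothing, for reasons explained below.) To require both terms at the same time, use a bool query with must: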
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}
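The difference between terms and bool/must can be sketched the same way. terms takes the union of the matching document sets, and must takes their intersection. This is again a simplified model, not ES code, with a hypothetical regex-based `analyze` standing in for the standard analyzer. In both cases the capitalized "China" contributes nothing, which is the puzzle the next paragraph addresses.

```python
import re
from collections import defaultdict

DOCS = {"7": "love China", "8": "love HuBei"}  # _id -> title

def analyze(text):
    # Hypothetical approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

index = defaultdict(set)  # token -> ids of documents whose title contains it
for doc_id, title in DOCS.items():
    for token in analyze(title):
        index[token].add(doc_id)

def terms_query(values):
    """terms: a document matches if ANY value equals one of its tokens (union)."""
    result = set()
    for v in values:
        result |= index.get(v, set())
    return result

def bool_must_terms(values):
    """bool.must of term clauses: a document must match EVERY value (intersection)."""
    result = set(DOCS)
    for v in values:
        result &= index.get(v, set())
    return result

print(sorted(terms_query(["love", "China"])))      # ['7', '8']
print(sorted(bool_must_terms(["love", "china"])))  # ['7']
print(sorted(bool_must_terms(["love", "China"])))  # []
```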
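Note that the query above uses lowercase china. With uppercase China, the search returns nothing. Why? The title field was analyzed when the documents were stored, here by the default analyzer, which splits the text into tokens and lowercases them. Let's look at how the text is analyzed: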
1.3. Analyzer
GET test/_analyze
{
  "text": "love China"
}
The result is:
{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
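The analyzer produces two tokens, love and china. A term query must match one of these tokens exactly, without transforming the search value in any way. That is why a query for China (capital C) fails.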
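Here is a rough Python imitation of the _analyze output above. It assumes that splitting on runs of ASCII letters and digits and lowercasing is close enough to the standard analyzer for this input. The real tokenizer follows Unicode word-break rules and also emits other types such as <NUM>. The sketch reproduces the tokens, offsets and positions shown, and it shows that China never equals a stored token.

```python
import re

def analyze(text):
    """Hypothetical approximation of the standard analyzer, returning _analyze-style dicts."""
    tokens = []
    for position, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": m.group().lower(),  # tokens are lowercased
            "start_offset": m.start(),   # offsets point into the original text
            "end_offset": m.end(),
            "type": "<ALPHANUM>",        # simplification: every token gets this type
            "position": position,
        })
    return tokens

tokens = analyze("love China")
for t in tokens:
    print(t)

stored = {t["token"] for t in tokens}
print("China" in stored, "china" in stored)  # False True
```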
2. match and match_phrase queries
2.1. How the match query works
The match query is the core query for full-text search. For example:
GET /music/children/_search
{
  "query": {
    "match": {
      "name": "wake"
    }
  }
}
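ES executes it in these steps:
[1] - Check the field type: name is a text field, which is analyzed, so the query string must be analyzed as well.
[2] - Analyze the query string: "wake" is passed through the analyzer. Because it yields a single token, match ends up running a single low-level term query.
[3] - Find matching documents: look up wake in the inverted index and collect the documents that contain it.
[4] - Score each document: compute a relevance score from term frequency (TF), inverse document frequency (IDF) and field-length normalization. Since ES 5.0, the default similarity combining these factors is BM25.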
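Step [4] can be illustrated with the per-term BM25 formula. This is a simplified sketch with several assumptions:

- It uses Elasticsearch's default parameters, k1 = 1.2 and b = 0.75.
- It uses the form found in Lucene 7, the ES 6.x era that the _type-style URLs in this article suggest.
- It ignores Lucene's lossy encoding of field length and any boosts.

Newer Lucene versions drop the constant (k1 + 1) factor, which does not change the ranking. A multi-term match query adds up these per-term scores for the tokens a document matches.

```python
import math

K1 = 1.2  # Elasticsearch's default BM25 k1 (term-frequency saturation)
B = 0.75  # Elasticsearch's default BM25 b (strength of length normalization)

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, docs_with_term):
    """Simplified BM25 score of one query term in one document field."""
    # IDF: the fewer documents contain the term, the more it is worth.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalization: fields longer than average are penalized.
    norm = 1 - B + B * doc_len / avg_doc_len
    # TF: more occurrences help, with diminishing returns.
    tf_part = tf * (K1 + 1) / (tf + K1 * norm)
    return idf * tf_part

print(round(bm25_term_score(tf=1, doc_len=4, avg_doc_len=4, doc_count=1, docs_with_term=1), 7))
```

With one occurrence in an average-length field, and one document that contains the term out of one document in total, the TF part is exactly 1. The score is then ln(4/3) ≈ 0.2876821, which happens to equal the _score in the response below. The shard statistics behind that response are not shown, so treat this as an illustration only.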
The result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "id": "a810fad4-54cb-59a1-9b7a-82adb46fa58d",
          "author": "John Smith",
          "name": "wake me, shark me",
          "content": "don't let me sleep too late, gonna get up brightly early in the morning",
          "language": "english",
          "tags": "enlighten",
          "length": 55,
          "isRelease": true,
          "releaseDate": "2019-12-21"
        }
      }
    ]
  }
}
Because of the limited sample data, only one document matches.
More examples:
[1] - Find documents whose name contains "you" or "sunshine":
GET /music/children/_search
{
  "query": {
    "match": {
      "name": "you sunshine"
    }
  }
}
[2] - Find documents whose name contains both "you" and "sunshine":
GET /music/children/_search
{
  "query": {
    "match": {
      "name": {
        "query": "you sunshine",
        "operator": "and" // require every token (AND instead of the default OR)
      }
    }
  }
}
[3] - Find documents whose name contains at least 3 of the 4 keywords "you", "my", "sunshine" and "teeth":
GET /music/children/_search
{
  "query": {
    "match": {
      "name": {
        "query": "you my sunshine teeth",
        "minimum_should_match": "75%" // at least 75% of the 4 tokens, i.e. 3, must match
      }
    }
  }
}
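The three variants above can be modelled as a yes/no decision per document. This minimal sketch makes the following assumptions:

- `analyze` is a rough regex stand-in for the standard analyzer.
- Only plain positive percentages are handled for minimum_should_match, rounded down.
- Scoring is ignored.
- The song names are hypothetical examples, not data from the music index.

```python
import re

def analyze(text):
    # Hypothetical approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match(field_value, query, operator="or", minimum_should_match=None):
    """Does field_value match a match query? (Scoring is ignored.)"""
    query_terms = analyze(query)             # one clause per query token
    if not query_terms:
        return False                         # empty query: no match in this sketch
    field_terms = set(analyze(field_value))
    hits = sum(1 for t in query_terms if t in field_terms)
    if operator == "and":
        return hits == len(query_terms)      # every token is required
    required = 1                             # "or": any single token is enough
    if minimum_should_match is not None:
        # Sketch only: positive percentages such as "75%", rounded down.
        pct = int(minimum_should_match.rstrip("%"))
        required = max(1, len(query_terms) * pct // 100)
    return hits >= required

# Hypothetical song names, used only for illustration.
NAMES = ["you are my sunshine", "sunshine day", "you raise me up"]

print([n for n in NAMES if match(n, "you sunshine")])
print([n for n in NAMES if match(n, "you sunshine", operator="and")])
print([n for n in NAMES if match(n, "you my sunshine teeth", minimum_should_match="75%")])
```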
2.2. match query
First, match on love China:
GET test/doc/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}
The response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China", "love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei", "love"]
        }
      }
    ]
  }
}
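Both documents come back again. Why? match first analyzes the search text and then matches the resulting tokens. The titles of the two documents were indexed as the tokens love and china (document 7) and love and hubei (document 8). The query love China is analyzed into love and china, and these tokens are combined with OR, so a document matches if it contains any of them.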
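Here is a small sketch of this OR behaviour on the two sample documents. It counts how many query tokens each document matched, using the same hypothetical regex-based `analyze` as an approximation of the standard analyzer. Document 7 matches two tokens and document 8 matches one. In the response above, document 7's _score (1.3862944) is exactly twice document 8's (0.6931472). That is consistent with the per-token scores being added up, although this sketch does not compute scores itself.

```python
import re
from collections import defaultdict

DOCS = {"7": "love China", "8": "love HuBei"}  # _id -> title

def analyze(text):
    # Hypothetical approximation of the standard analyzer: alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

index = defaultdict(set)  # token -> ids of documents whose title contains it
for doc_id, title in DOCS.items():
    for token in analyze(title):
        index[token].add(doc_id)

def match_or(query):
    """Return {doc_id: number of query tokens matched} for documents matching any token."""
    matched = defaultdict(int)
    for token in analyze(query):
        for doc_id in index.get(token, ()):
            matched[doc_id] += 1
    return dict(matched)

print(sorted(match_or("love China").items()))  # [('7', 2), ('8', 1)]
```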
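What if we want love and China to match together? Setting "operator": "and" (see 2.1) requires both tokens. match_phrase goes further and also requires them to be adjacent and in order.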
2.3. match_phrase query
match_phrase performs a phrase search. All of the analyzed tokens must appear in the document, in the same order and at adjacent positions.
GET test/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}
The response is below. Since match_phrase analyzes its input, "love China" would return the same result.
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China", "love"]
        }
      }
    ]
  }
}
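Position matching can be sketched as follows. The sketch assumes slop 0 (the default) and a regex-based stand-in for the standard analyzer, in which a token's position is simply its index in the token list. Real analyzers can leave position gaps, for example after removing stop words. Because both the field and the phrase go through the same analysis, "love China" and "love china" behave identically.

```python
import re

def analyze(text):
    # Hypothetical approximation of the standard analyzer; list index = token position.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def match_phrase(field_value, phrase):
    """True if the phrase tokens occur in order at consecutive positions (slop 0)."""
    field = analyze(field_value)
    query = analyze(phrase)
    if not query:
        return False  # an empty phrase matches nothing in this sketch
    n = len(query)
    # Slide a window of n tokens over the field and compare it with the phrase.
    return any(field[i:i + n] == query for i in range(len(field) - n + 1))

titles = {"7": "love China", "8": "love HuBei"}
print([doc_id for doc_id, title in titles.items() if match_phrase(title, "love china")])  # ['7']
```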
2.4. minimum_should_match in the match query
minimum_should_match: when operator is set to or, this parameter controls the minimum number of analyzed query tokens that must match:
{
  "query": {
    "match": {
      "field_name": {
        "query": "query text",
        "operator": "or",
        "minimum_should_match": "70%"
      }
    }
  }
}
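The value can also be negative. With 4 query terms, -25% and 75% mean the same thing: at least 3 must match. With 5 terms, however, -25% means at least 4 must match, while 75% means at least 3. This is because percentages are rounded down. For -25%, 25% of 5 is 1.25, rounded down to 1, so only 1 term may be missing. For 75%, 75% of 5 is 3.75, rounded down to 3 required terms.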
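The rounding rule can be checked with a short Python sketch. It follows the behaviour described above and in the Elasticsearch minimum_should_match documentation:

- Percentages are rounded down (truncated toward zero).
- Negative values count how many clauses may be missing.

It is an illustration, not Elasticsearch's code. Conditional specs such as "3<90%" are not handled.

```python
import math

def min_should_match(optional_clauses, spec):
    """Required number of matching should-clauses for a simple minimum_should_match spec.

    Handles plain integers ("3", "-2") and percentages ("75%", "-25%").
    """
    spec = spec.strip()
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero (-1.25 -> -1, 3.75 -> 3).
        calc = math.trunc(optional_clauses * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value is the number of clauses that may be missing.
    result = optional_clauses + calc if calc < 0 else calc
    return max(0, result)

for spec in ("75%", "-25%"):
    print(spec, min_should_match(4, spec), min_should_match(5, spec))
# 75% 3 3
# -25% 3 4
```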