ES - term and match queries

Author: AIHGF
Published: March 15, 2021
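Categories: Operating Systems, Data Analysis

1. term and terms queries

The ES term query performs an exact match: the search value is not analyzed (tokenized, lowercased, and so on) before it is looked up.

For example, suppose the following documents are stored: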

{
    "title": "love China",
    "content": "people very love China",
    "tags": ["China", "love"]
}
{
    "title": "love HuBei",
    "content": "people very love HuBei",
    "tags": ["HuBei", "love"]
}

1.1. term query

A term query:

{
  "query": {
    "term": {
      "title": "love"
    }
  }
}

The response shows that both stored documents match:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

Every document whose title contains the token love is returned. To match only love China exactly, try the following:

{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}

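This returns no data. term is an exact match: because the query does not analyze its input, it looks for the single token love China, which does not exist in the index. A term query can therefore only match one token at a time.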
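
The behaviour above can be reproduced with a small model of an inverted index. This is a minimal Python sketch, not Elasticsearch code: analyze, build_index and term_query are hypothetical helpers, and analyze only approximates the default analyzer by splitting on whitespace and lowercasing.

```python
from collections import defaultdict

# Sample documents from this article, keyed by the _id values shown in the responses.
DOCS = {
    "7": {"title": "love China"},
    "8": {"title": "love HuBei"},
}

def analyze(text):
    """Rough stand-in for the default analyzer: split on whitespace, lowercase."""
    return [token.lower() for token in text.split()]

def build_index(docs, field):
    """Map each indexed token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return index

def term_query(index, value):
    """Look the value up exactly as given; the query text is NOT analyzed."""
    return set(index.get(value, set()))

title_index = build_index(DOCS, "title")
print(sorted(title_index))                            # tokens stored for the title field
print(sorted(term_query(title_index, "love")))        # both documents
print(sorted(term_query(title_index, "love China")))  # no such token in the index
```

The index only ever holds single tokens, which is why a multi-word value passed to term finds nothing.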

1.2. terms query

To match several terms at once, use terms:

{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}

The result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

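Both documents are returned. Why? The values inside the terms array are combined with OR: a document matches if it contains any one of them (here both documents match through love). To require both terms at the same time, combine term clauses with bool and must: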

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}
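
The difference between terms and bool/must comes down to union versus intersection of posting lists. A minimal sketch in Python, assuming a hypothetical TITLE_POSTINGS table that mirrors what the title field would hold after analysis; terms_query and bool_must_terms are illustrative names, not Elasticsearch APIs.

```python
# Hypothetical posting lists for the title field after analysis:
# token -> ids of the documents whose title contains it.
TITLE_POSTINGS = {
    "love": {"7", "8"},
    "china": {"7"},
    "hubei": {"8"},
}

def terms_query(postings, values):
    """terms: a document matches if it contains ANY of the values (union)."""
    matched = set()
    for value in values:
        matched |= postings.get(value, set())
    return matched

def bool_must_terms(postings, values):
    """bool/must of term clauses: a document must contain ALL values (intersection)."""
    result = None
    for value in values:
        hits = postings.get(value, set())
        result = set(hits) if result is None else result & hits
    return result or set()

print(sorted(terms_query(TITLE_POSTINGS, ["love", "China"])))      # matched through "love" only
print(sorted(bool_must_terms(TITLE_POSTINGS, ["love", "china"])))  # only the "love China" document
print(sorted(bool_must_terms(TITLE_POSTINGS, ["love", "China"])))  # empty: "China" is not a stored token
```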

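Note that the bool query above uses lowercase china. Searching with the capitalized China returns nothing. Why? The title field was analyzed when it was stored, in this case by the default analyzer. The next section shows how the text is analyzed.

1.3. Analyzer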

GET test/_analyze
{
  "text" : "love China"
}

The result:

{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

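The text is analyzed into two tokens, love and china. A term query matches only these tokens exactly as stored, without changing the search value in any way, so a query for China fails.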
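
To make the token list concrete, here is a minimal Python sketch that produces the same token, offset and position fields for simple ASCII text. It is only an approximation of the default analyzer (a word regex plus lowercasing), and analyze is a hypothetical helper, not an Elasticsearch call.

```python
import re

def analyze(text):
    """Approximate the _analyze output: word tokens, lowercased, with offsets."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

tokens = analyze("love China")
stored = {t["token"] for t in tokens}
for t in tokens:
    print(t)
print("China" in stored)  # False: only the lowercased form is stored
print("china" in stored)  # True
```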

2. match and match_phrase queries

2.1. How the match query works

The match query is the core query syntax, and its main use case is full-text search. For example:

GET /music/children/_search
{
  "query": {
    "match": {
      "name": "wake"
    }
  }
}

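ES executes the query in these steps:

[1] - Check the field type: the name field used by match is of type text, an analyzed field, so the query string must be analyzed as well.

[2] - Analyze the query string: the query string "wake" is passed to the analyzer. Because it contains only one word, match ends up executing a single low-level term query.

[3] - Find matching documents: the term query looks up wake in the inverted index and retrieves the set of documents that contain it.

[4] - Score each document: the term query computes a relevance score for each document from TF, IDF and the length norm.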
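
The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not how Lucene is implemented: NAMES holds hypothetical song names (only "wake me, shark me" comes from this article), analyze is a simplified analyzer, and the score is a toy TF x IDF value divided by a length norm rather than the real similarity formula.

```python
import math
import re

# Hypothetical name values; only document "2" appears in this article.
NAMES = {
    "1": "good morning sunshine",
    "2": "wake me, shark me",
    "3": "you are my sunshine",
}

def analyze(text):
    """Simplified analyzer: lowercase, then extract word tokens."""
    return re.findall(r"\w+", text.lower())

def match_query(names, query):
    # Step 1 is assumed: name is an analyzed text field, so the query is analyzed too.
    query_terms = analyze(query)                                  # step 2
    analyzed = {doc_id: analyze(text) for doc_id, text in names.items()}
    scores = {}
    for term in query_terms:                                      # step 3: one lookup per token
        containing = [d for d, toks in analyzed.items() if term in toks]
        if not containing:
            continue
        idf = math.log(1 + len(names) / len(containing))
        for doc_id in containing:                                 # step 4: toy scoring
            doc_tokens = analyzed[doc_id]
            tf = doc_tokens.count(term)
            length_norm = math.sqrt(len(doc_tokens))
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf / length_norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(match_query(NAMES, "wake"))
```

With a single-word query the loop runs once, which corresponds to the single low-level term query described in step [2].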

Running the match query for wake returns:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "id": "a810fad4-54cb-59a1-9b7a-82adb46fa58d",
          "author": "John Smith",
          "name": "wake me, shark me",
          "content": "don't let me sleep too late, gonna get up brightly early in the morning",
          "language": "english",
          "tags": "enlighten",
          "length": 55,
          "isRelease": true,
          "releaseDate": "2019-12-21"
        }
      }
    ]
  }
}

Because of the sample data, only one document matches for now.

More examples:

[1] - Search for documents whose name contains "you" or "sunshine"

GET /music/children/_search
{
  "query": {
    "match": {
      "name": "you sunshine"
    }
  }
}

[2] - Search for documents whose name contains both "you" and "sunshine"

GET /music/children/_search
{
  "query": {
    "match": {
      "name": {
        "query": "you sunshine",
        "operator": "and" // use the and operator
      }
    }
  }
}

[3] - Search for documents that contain at least 3 of the 4 keywords "you", "my", "sunshine", "teeth"

GET /music/children/_search
{
  "query": {
    "match": {
      "name": {
        "query": "you my sunshine teeth",
        "minimum_should_match": "75%" // minimum share of the keywords that must match
      }
    }
  }
}
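
The effect of the operator parameter can be modelled as any versus all over the analyzed query tokens. A minimal Python sketch, assuming hypothetical song names in NAMES and a simplified analyze helper; match here is an illustrative function, not the Elasticsearch client API.

```python
import re

# Hypothetical song names used only for illustration.
NAMES = {
    "1": "you are my sunshine",
    "2": "brush your teeth",
    "3": "sunshine on my shoulders",
}

def analyze(text):
    """Simplified analyzer: lowercase, then extract word tokens."""
    return re.findall(r"\w+", text.lower())

def match(names, query, operator="or"):
    """Return ids whose name contains any (or) / all (and) of the query tokens."""
    query_terms = set(analyze(query))
    combine = all if operator == "and" else any
    return {
        doc_id
        for doc_id, name in names.items()
        if combine(term in analyze(name) for term in query_terms)
    }

print(sorted(match(NAMES, "you sunshine")))                  # default operator: or
print(sorted(match(NAMES, "you sunshine", operator="and")))  # both tokens required
```

Document "2" is not matched by either call because your and you are different tokens.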

2.2. match query

First, match against love China:

GET test/doc/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}

The response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      }
    ]
  }
}

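Both documents are returned. Why? A match query first analyzes the search text and then matches the resulting tokens. The titles of the two documents produce the tokens love, china and hubei; the search text love China is analyzed into love and china, and these are combined with OR, so a document matches if it contains either token.

What if love and China should both match? Use match_phrase.

2.3. match_phrase query

match_phrase is a phrase search: all tokens must appear in the document, adjacent to one another and in the same order as in the query.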

GET test/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}

The response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}
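
The adjacency requirement can be illustrated by checking that the analyzed phrase occurs as a consecutive run inside the analyzed title. This is a minimal Python sketch assuming no slop (tokens must be directly adjacent); analyze and match_phrase are hypothetical helpers, not Elasticsearch calls.

```python
import re

# Titles of the two sample documents from this article.
TITLES = {"7": "love China", "8": "love HuBei"}

def analyze(text):
    """Simplified analyzer: lowercase, then extract word tokens."""
    return re.findall(r"\w+", text.lower())

def match_phrase(titles, phrase):
    """Ids whose tokens contain the phrase tokens consecutively and in order."""
    wanted = analyze(phrase)
    hits = set()
    for doc_id, title in titles.items():
        tokens = analyze(title)
        # Slide a window of len(wanted) over the document tokens.
        for start in range(len(tokens) - len(wanted) + 1):
            if tokens[start:start + len(wanted)] == wanted:
                hits.add(doc_id)
                break
    return hits

print(sorted(match_phrase(TITLES, "love china")))  # only the "love China" document
print(sorted(match_phrase(TITLES, "china love")))  # same tokens, wrong order: no match
```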

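2.4. minimum_should_match in match queries

minimum_should_match: when the operator parameter is set to or, this parameter controls the minimum number of tokens that must match:

{
  "query": {
    "match": {
      "field_name": {
        "query": "query text",
        "operator": "or",
        "minimum_should_match": "70%"
      }
    }
  }
}

The value can be negative. With 4 terms, -25% and 75% mean the same thing: at least three terms must match. With 5 terms, however, -25% requires at least four matching terms, while 75% requires at least three, because the count computed from the percentage is rounded down in both cases.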
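
The arithmetic behind these numbers can be written out in a few lines of Python. This is a sketch of the rounding rule described above, with required_matches as a hypothetical helper name; it handles plain percentage values only, not the other forms the parameter accepts.

```python
def required_matches(total_terms, percent):
    """Translate a percentage minimum_should_match into a required term count.

    Positive: that percent of the terms, rounded down.
    Negative: that percent of the terms may be missing, rounded down
    before it is subtracted from the total.
    """
    if percent >= 0:
        return (total_terms * percent) // 100
    allowed_missing = (total_terms * -percent) // 100
    return total_terms - allowed_missing

for n in (4, 5):
    print(n, required_matches(n, 75), required_matches(n, -25))
```

For 5 terms, 75% gives 3.75, rounded down to 3, while -25% allows 1.25 missing terms, rounded down to 1, leaving 4 required.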


Last modification: March 21st, 2021 at 06:58 pm
