MongoDB - Regular Expression Queries with $regex

> MongoDB provides many efficient query tools, such as regular expressions.
>   [$regex](https://docs.mongodb.com/manual/reference/operator/query/regex/)

Regular expressions are used in queries to match string values.
The syntax takes the following forms:

```javascript
{ <field>: { $regex: /pattern/, $options: '<options>' } }
{ <field>: { $regex: 'pattern', $options: '<options>' } }
{ <field>: { $regex: /pattern/<options> } }
```

In MongoDB you can also specify the regular expression directly as a regular expression object (such as `/pattern/`):

```javascript
{ <field>: /pattern/<options> }
```

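Options for `$options`:
- i - Case-insensitive matching.
- m - For patterns that contain anchors, i.e. `^` (start) and `$` (end), match at the start or end of every line, including right after a `\n`. Without this option, the anchors match only at the start or end of the whole string.
- x - Ignore all unescaped whitespace in the pattern, and treat everything from an unescaped `#` to the next newline as a comment. Requires the `$regex` with `$options` syntax.
- s - Allow the dot character (`.`) to match every character, including newlines. Requires the `$regex` with `$options` syntax.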
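You can experiment with these options locally, without a server, by approximating them with flags from Python's `re` module. `re` is not the PCRE engine that MongoDB uses, but it should behave the same for simple patterns like the ones in this article. The table `OPTION_FLAGS` and the helper `to_re_flags` are hypothetical illustrations. They are not part of MongoDB or PyMongo.

```python
import re

# Hypothetical mapping from $options letters to Python re flags.
# Python's re is not PCRE, so this only approximates MongoDB for simple patterns.
OPTION_FLAGS = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "m": re.MULTILINE,   # ^ and $ also match at line boundaries
    "x": re.VERBOSE,     # ignore whitespace and #-comments in the pattern
    "s": re.DOTALL,      # . also matches a newline
}


def to_re_flags(options: str) -> re.RegexFlag:
    """Combine $options letters (e.g. "si") into a single re flag value."""
    flags = re.RegexFlag(0)
    for letter in options:
        if letter not in OPTION_FLAGS:
            raise ValueError(f"unsupported option: {letter!r}")
        flags |= OPTION_FLAGS[letter]
    return flags


print(to_re_flags("si") == re.DOTALL | re.IGNORECASE)  # True
```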

## 1. Examples

Assume the collection **products** contains the following documents:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
{ "_id" : 101, "sku" : "abc789", "description" : "First line\nSecond line" }
{ "_id" : 102, "sku" : "xyz456", "description" : "Many spaces before     line" }
{ "_id" : 103, "sku" : "xyz789", "description" : "Multiple\nline description" }
```

[1] - **LIKE-style match**
Match all documents whose **sku** ends with 789:

```javascript
db.products.find( { sku: { $regex: /789$/ } } )
```

This is similar to SQL `LIKE`:

```mysql
SELECT * FROM products
WHERE sku LIKE '%789';
```
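To make the correspondence concrete, here is a minimal sketch that translates a simple `LIKE` pattern into an anchored regular expression and applies it to the sample **sku** values. `like_to_regex` is a hypothetical helper. It handles only `%` and `_` and does not support `ESCAPE` clauses.

```python
import re


def like_to_regex(like: str) -> str:
    """Translate a simple SQL LIKE pattern into an anchored regex string.

    Hypothetical helper: handles only % (any run of characters) and
    _ (exactly one character). ESCAPE clauses are not supported, and
    `.` does not cross newlines, which is fine for single-line sku values.
    """
    parts = []
    for ch in like:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # keep other characters literal
    # LIKE must match the whole value, so anchor both ends.
    return "^" + "".join(parts) + "$"


skus = ["abc123", "abc789", "xyz456", "xyz789"]
pattern = like_to_regex("%789")
print(pattern)                                     # ^.*789$
print([s for s in skus if re.search(pattern, s)])  # ['abc789', 'xyz789']
```

The generated `^.*789$` selects the same values as `/789$/` above, because the leading `^.*` is redundant. This is why the MongoDB version of a trailing-`%` search only needs the `$` anchor.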

[2] - **Case-insensitive match**
Use the **i** option for case-insensitive matching, e.g. to find **sku** values that start with **ABC** in any letter case:

```javascript
db.products.find( { sku: { $regex: /^ABC/i } } )
```

Output:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
{ "_id" : 101, "sku" : "abc789", "description" : "First line\nSecond line" }
```
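The effect of **i** can be checked locally with Python's `re.IGNORECASE`, which should behave like the `i` option for this pattern. The sketch below assumes the four sample **sku** values, keyed by `_id`.

```python
import re

# sku values from the sample products collection, keyed by _id
skus = {100: "abc123", 101: "abc789", 102: "xyz456", 103: "xyz789"}

# /^ABC/ without i matches nothing; with IGNORECASE it matches the abc... skus.
strict = [i for i, s in skus.items() if re.search(r"^ABC", s)]
ignore_case = [i for i, s in skus.items() if re.search(r"^ABC", s, re.IGNORECASE)]
print(strict)       # []
print(ignore_case)  # [100, 101]
```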

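[3] - **Multiline match for lines starting with S**
Use the **m** option to match values in which any line starts with **S**, including a line that begins right after a newline (`\nS`):

```javascript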
db.products.find( { description: { $regex: /^S/, $options: 'm' } } )
```

Output:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
{ "_id" : 101, "sku" : "abc789", "description" : "First line\nSecond line" }
```

Without the **m** option, the output is:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
```

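If the `$regex` pattern contains no anchor, the pattern is matched against the string as a whole, so the **m** option has no effect. For example:

```javascript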
db.products.find( { description: { $regex: /S/ } } )
```

The matching output is:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
{ "_id" : 101, "sku" : "abc789", "description" : "First line\nSecond line" }
```
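The three cases above (with **m**, without **m**, and with no anchor) can be reproduced locally with Python's `re.MULTILINE`. This sketch assumes the sample **description** values. The helper `matching_ids` is hypothetical and only mimics the `find()` calls.

```python
import re

# description values from the sample products collection, keyed by _id
descriptions = {
    100: "Single line description.",
    101: "First line\nSecond line",
    102: "Many spaces before     line",
    103: "Multiple\nline description",
}


def matching_ids(pattern: str, flags: int = 0) -> list[int]:
    """Return the _id values whose description matches pattern anywhere."""
    return [i for i, d in descriptions.items() if re.search(pattern, d, flags)]


print(matching_ids(r"^S", re.MULTILINE))  # [100, 101]  like /^S/ with 'm'
print(matching_ids(r"^S"))                # [100]       ^ only at string start
print(matching_ids(r"S"))                 # [100, 101]  no anchor: whole string
```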

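[4] - **Matching with the dot character `.`**
Use the **s** option to let the dot `.` match every character, including newlines.
The following query finds documents whose `description` contains an `m` (in either case, because of the **i** option) followed somewhere later by the string `line`:

```javascript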
db.products.find( { description: { $regex: /m.*line/, $options: 'si' } } )
```

Output:

```javascript
{ "_id" : 102, "sku" : "xyz456", "description" : "Many spaces before     line" }
{ "_id" : 103, "sku" : "xyz789", "description" : "Multiple\nline description" }
```

Without the **s** option, only the following document matches:

```javascript
{ "_id" : 102, "sku" : "xyz456", "description" : "Many spaces before     line" }
```
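Python's `re.DOTALL` plays the role of the **s** option. In the sketch below, the sample descriptions are checked against `m.*line` with and without it. Document 103 only matches when `.*` may cross the newline after "Multiple".

```python
import re

# description values from the sample products collection, keyed by _id
descriptions = {
    100: "Single line description.",
    101: "First line\nSecond line",
    102: "Many spaces before     line",
    103: "Multiple\nline description",
}

pattern = r"m.*line"
# 'si' -> DOTALL | IGNORECASE; 'i' alone -> IGNORECASE
with_s = [i for i, d in descriptions.items()
          if re.search(pattern, d, re.DOTALL | re.IGNORECASE)]
without_s = [i for i, d in descriptions.items()
             if re.search(pattern, d, re.IGNORECASE)]
print(with_s)     # [102, 103]
print(without_s)  # [102]
```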

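[5] - **Ignoring whitespace in the pattern**
Use the **x** option to ignore whitespace and comments in the pattern. A `#` starts a comment that runs until the newline `\n`. For example:

```javascript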
var pattern = "abc #category code\n123 #item number"
db.products.find( { sku: { $regex: pattern, $options: "x" } } )
```

The matching output is:

```javascript
{ "_id" : 100, "sku" : "abc123", "description" : "Single line description." }
```
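Python's `re.VERBOSE` follows the same rules as the **x** option. Unescaped whitespace is dropped, and so is everything from `#` to the end of the line. The pattern above therefore reduces to `abc123`. This sketch assumes the sample **sku** values.

```python
import re

pattern = "abc #category code\n123 #item number"
skus = {100: "abc123", 101: "abc789", 102: "xyz456", 103: "xyz789"}

# With VERBOSE, the spaces, the newline and both #-comments are ignored,
# so the effective pattern is "abc123".
matches = [i for i, s in skus.items() if re.search(pattern, s, re.VERBOSE)]
print(matches)                                                  # [100]
print(re.fullmatch(pattern, "abc123", re.VERBOSE) is not None)  # True
```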

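[6] - **Regular expressions on array elements**
A regular expression can also be applied to an array field. A document matches if any element of the array matches, which is useful for implementing tags.
To find posts with a tag that starts with `run` (e.g. `run` or `runoob`, but not `ru`), assuming the `posts` documents store their tags in an array field `tags`:

```javascript
db.posts.find( { tags: { $regex: /^run/ } } )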
```
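The array semantics ("any element matches") can be sketched in Python as follows. The `posts` documents and their `tags` values are hypothetical sample data, and `regex_matches_array` is a hypothetical helper. The example also shows why the `^` anchor matters: without it, a tag such as `rerun` also matches.

```python
import re

# Hypothetical posts documents with an array field `tags`
posts = [
    {"_id": 1, "tags": ["ru", "python"]},
    {"_id": 2, "tags": ["run", "mongodb"]},
    {"_id": 3, "tags": ["runoob"]},
    {"_id": 4, "tags": ["rerun"]},
]


def regex_matches_array(pattern: str, values: list[str]) -> bool:
    """Mimic MongoDB: a regex on an array field matches if any element matches."""
    return any(re.search(pattern, v) for v in values)


print([p["_id"] for p in posts if regex_matches_array(r"^run", p["tags"])])  # [2, 3]
print([p["_id"] for p in posts if regex_matches_array(r"run", p["tags"])])   # [2, 3, 4]
```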

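## 2. Regular expressions with pymongo

```python
# -*- coding: utf-8 -*-
from pymongo import MongoClient

client = MongoClient('192.168.1.189')
db = client['db_test']
coll = db['db_coll']

# Find all documents whose pic_name starts with "abc".
datas = coll.find({"pic_name": {"$regex": "^abc"}})

# Fuzzy match: pic_name is exactly 'test.jpg' and username contains 'SomeUser',
# case-insensitively (the inline (?i) flag is equivalent to $options: 'i').
datas = coll.find({'$and': [{'pic_name': 'test.jpg'},
                            {'username': {'$regex': r'(?i)SomeUser'}}]})

# Fuzzy match: pic_num (a string field) starts with one or more digits.
datas = coll.find({'pic_num': {'$regex': r'^\d+'}})

# Return at most 5 documents, print each one, then print how many were returned.
count = 0
for data in coll.find().limit(5):
    print(data)
    count += 1
print(count)
```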
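PyMongo also accepts compiled Python patterns (`re.compile(...)`) as query values and encodes them as BSON regular expressions. When the search text comes from user input, escape it with `re.escape` so that characters such as `.` or `*` are matched literally. The sketch below only builds filter dicts, so it runs without a server. The filters would be passed to `coll.find(...)` as in the code above. `contains_filter` is a hypothetical helper.

```python
import re


def contains_filter(field: str, text: str, ignore_case: bool = True) -> dict:
    """Build a $regex filter for documents whose field contains text literally.

    Hypothetical helper: re.escape() neutralises regex metacharacters in text.
    """
    query = {"$regex": re.escape(text)}
    if ignore_case:
        query["$options"] = "i"
    return {field: query}


# Two equivalent ways to say "pic_name starts with abc" for coll.find(...):
by_string = {"pic_name": {"$regex": "^abc"}}
by_compiled = {"pic_name": re.compile("^abc")}  # PyMongo encodes this as a BSON regex

print(contains_filter("username", "some.user"))
# {'username': {'$regex': 'some\\.user', '$options': 'i'}}
```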

## 3. Documentation

[1] - [MongoDB Documentation](https://docs.mongodb.com/manual/reference/operator/query/regex/)
[2] - [PyMongo Tutorial](http://api.mongodb.com/python/current/tutorial.html#getting-a-database)

Last modification：November 3, 2019
