ElasticSearch Basic Operations

1 CRUD: Create, Read, Update, Delete
2 Data Types
3 Search Operations
- 3.1 URI Search
- 3.2 Request Body Search
4 References

1 CRUD: Create, Read, Update, Delete

1.1 Index CRUD

1) Create

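# Create an index named tehero_index
PUT /tehero_index?pretty
{
# Index settings
  "settings": {
    "index": {
      "number_of_shards": 1, # 1 primary shard (default is 5 in ES 6.x, 1 from ES 7.0)
      "number_of_replicas": 1 # 1 replica (default is 1)
    }
  },
# Mapping configuration
  "mappings": {
    "_doc": { # type name; setting it to _doc is strongly recommended
      "dynamic": false, # dynamic mapping setting (see the note below)
# Field properties
      "properties": {
        "id": {
          "type": "integer"  # field id, of type integer
        },
        "name": {
          "type": "text",
          "analyzer": "ik_max_word", # analyzer used at index time (IK analysis plugin)
          "search_analyzer": "ik_smart"  # analyzer used at search time (IK analysis plugin)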
        },
        "createAt": {
          "type": "date"
        }
      }
    }
  }
}

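Note: dynamic is the switch for dynamic mapping and takes three values: true (the default) adds new fields to the mapping automatically; false (recommended, and used above) ignores new fields, so no mapping is added for them, although they are still kept in _source; strict throws an exception when a document contains a new field.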

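# The response:
{
  "acknowledged": true, # whether the index was successfully created in the cluster
  "shards_acknowledged": true, # whether the required shard copies were started before the timeout
  "index": "tehero_index"
}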
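The three dynamic settings described in the note can be illustrated with a small Python toy model. It is only a sketch under simplifying assumptions: a mapping is a plain dict of field name to type, and the names index_document, guess_type and StrictMappingError are hypothetical. It is not Elasticsearch's implementation; real dynamic mapping also detects dates and adds a keyword sub-field to strings.

```python
class StrictMappingError(Exception):
    """Hypothetical stand-in for ES's strict_dynamic_mapping_exception."""


def guess_type(value):
    # Rough version of dynamic type detection: JSON booleans, integers and
    # floats become boolean, long and float; everything else is treated as text.
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_document(mapping, dynamic, source):
    """Return the mapping as it would look after indexing `source`.

    `source` itself is always kept unchanged as _source.
    """
    new_fields = [field for field in source if field not in mapping]
    if new_fields and dynamic == "strict":
        # strict: reject the whole document
        raise StrictMappingError(f"unknown fields: {new_fields}")
    updated = dict(mapping)
    if dynamic is True:
        # true: add a mapping for every new field
        for field in new_fields:
            updated[field] = guess_type(source[field])
    # false: the mapping is left alone; new fields exist only in _source
    return updated


mapping = {"id": "integer", "name": "text", "createAt": "date"}
doc = {"id": 1, "name": "Te Hero", "age": 18}

print(index_document(mapping, True, doc))   # "age": "long" is added
print(index_document(mapping, False, doc))  # mapping is unchanged
try:
    index_document(mapping, "strict", doc)
except StrictMappingError as err:
    print("rejected:", err)
```

With dynamic set to false, as in the index above, a field such as age can still be returned in _source but cannot be searched until it is added to the mapping explicitly.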

2) Read

GET /tehero_index  # index name; several indices, or all of them, can be retrieved in one request
e.g.: GET /*    GET /tehero_index,other_index

GET /_cat/indices?v  # list all indices

Result:

{
  "tehero_index": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "dynamic": "false",
        "properties": {
          "createAt": {
            "type": "date"
          },
          "id": {
            "type": "integer"
          },
          "name": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1589271136921",
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "uuid": "xueDIxeUQnGBQTms65wA6Q",
        "version": {
          "created": "6050499"
        },
        "provided_name": "tehero_index"
      }
    }
  }
}

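3) Update

ES provides a set of statements for modifying an index, including changing the number of replicas, adding fields, changing refresh_interval, changing the index analyzers (which requires closing the index first), and changing aliases.

Let's start with the commonly used syntax:

# Change the number of replicas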
PUT /tehero_index/_settings
{
    "index" : {
        "number_of_replicas" : 2
    }
}

# Change the refresh interval (default 1s)
PUT /tehero_index/_settings
{
    "index" : {
        "refresh_interval" : "2s"
    }
}

# Add a new field: age
PUT /tehero_index/_mapping/_doc 
{
  "properties": {
    "age": {
      "type": "integer"
    }
  }
}

After the updates, view the index configuration again:

GET /tehero_index
Result omitted: check that the changes took effect

4) Delete

# Delete the index
DELETE /tehero_index
# Check whether the index exists
HEAD tehero_index
Returns: 404 - Not Found

1.2 Document CRUD

1) Create

# Add a single document, setting its ES id to 1
PUT /tehero_index/_doc/1?pretty
{
  "name": "Te Hero"
}
# Add a single document and let ES generate the id
POST /tehero_index/_doc?pretty
{
  "name": "Te Hero2"
}

# Use the op_type parameter to force a specific operation (create: only create, never overwrite)
PUT tehero_index/_doc/1?op_type=create
{
     "name": "Te Hero3"
}
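Note: with op_type=create, if the id already exists, ES returns a "version_conflict_engine_exception" error.
op_type is useful in practice when synchronizing data; TeHero will explain it in detail in a later article on synchronizing data between a database and ES.

Query the data to see the result (the op_type=create request above failed because id 1 already existed, so there are two documents): GET /tehero_index/_doc/_search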

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Te Hero"
        }
      },
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "P7-FCHIBJxE1TMY0WNGN",
        "_score": 1,
        "_source": {
          "name": "Te Hero2"
        }
      }
    ]
  }
}
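The difference between a plain write and op_type=create can be sketched with a toy in-memory index in Python. ToyIndex and VersionConflictError are hypothetical names; the sketch only models the behaviour described above (a plain PUT creates or replaces the document, op_type=create fails if the id exists) and ignores everything else ES does, such as refreshes and sequence numbers.

```python
class VersionConflictError(Exception):
    """Hypothetical stand-in for version_conflict_engine_exception."""


class ToyIndex:
    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> _version, incremented on every write

    def put(self, doc_id, source, op_type="index"):
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            # create never overwrites an existing document
            raise VersionConflictError(f"[{doc_id}]: version conflict, document already exists")
        # index (the default) replaces the whole document; fields are not merged
        self.docs[doc_id] = dict(source)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return "updated" if exists else "created"


idx = ToyIndex()
print(idx.put("1", {"name": "Te Hero"}))         # created
print(idx.put("1", {"name": "Te Hero-update"}))  # updated
try:
    idx.put("1", {"name": "Te Hero3"}, op_type="create")
except VersionConflictError as err:
    print("conflict:", err)
print(idx.docs["1"])  # {'name': 'Te Hero-update'}
```

This is why a sync job can use op_type=create to insert only documents that are not yet in the index, without overwriting existing ones.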

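2) Update

# Update a single document by id
(Note: the update statement is identical to the create statement. Think of it as: if a document with this id exists, it is replaced as a whole; otherwise it is created.)
PUT /tehero_index/_doc/1?pretty
{
  "name": "Te Hero-update"
}

# Set name = "updated name" on every document matching the query id = 10
(With "conflicts": "proceed", version conflicts are counted instead of aborting _update_by_query.)
POST tehero_index/_update_by_query
{
  "conflicts": "proceed",
  "script": {
    "source": "ctx._source.name = params.name",
    "lang": "painless",
    "params":{
      "name":"updated name"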
    }
  },
  "query": {
    "term": {
      "id": "10"
    }
  }
}

Documents matched by a query can be updated in one request with the Update By Query API, as the second example above does.

3) Read

# 1. Get a single document by id
GET /tehero_index/_doc/1
Result:
{
  "_index": "tehero_index",
  "_type": "_doc",
  "_id": "1",
  "_version": 5,
  "found": true,
  "_source": {
    "name": "Te Hero-update",
    "age": 18
  }
}

# 2. Get all documents in the index
GET /tehero_index/_doc/_search
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "P7-FCHIBJxE1TMY0WNGN",
        "_score": 1,
        "_source": {
          "name": "Te Hero2"
        }
      },
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "_update",
        "_score": 1,
        "_source": {
          "name": "Te Hero3"
        }
      },
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Te Hero-update",
          "age": 18
        }
      }
    ]
  }
}

# 3. Conditional query (covered in detail in section 3)
GET /tehero_index/_doc/_search
{
  "query": {
    "match": {
      "name": "2"
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9808292,
    "hits": [
      {
        "_index": "tehero_index",
        "_type": "_doc",
        "_id": "P7-FCHIBJxE1TMY0WNGN",
        "_score": 0.9808292,
        "_source": {
          "name": "Te Hero2"
        }
      }
    ]
  }
}

4) Delete

# 1. Delete a single document by id
DELETE /tehero_index/_doc/1

# 2. Delete by query
POST tehero_index/_delete_by_query
{
  "query": { 
    "match": {
     "name": "2"
    }
  }
}

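1.3 Bulk operations: the Bulk API

The bulk API packs multiple operations into one request for batch processing.

This executes the operations efficiently and reduces the number of network round trips.

# Bulk operations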
POST _bulk
{ "index" : { "_index" : "tehero_test1", "_type" : "_doc", "_id" : "1" } }
{ "this_is_field1" : "this_is_index_value" }
{ "delete" : { "_index" : "tehero_test1", "_type" : "_doc", "_id" : "2" } }
{ "create" : { "_index" : "tehero_test1", "_type" : "_doc", "_id" : "3" } }
{ "this_is_field3" : "this_is_create_value" }
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "tehero_test1"} }
{ "doc" : {"this_is_field2" : "this_is_update_value"} }

# Query all documents
GET /tehero_test1/_doc/_search
Result:
{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "tehero_test1",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "this_is_field1": "this_is_index_value",
          "this_is_field2": "this_is_update_value"
        }
      },
      {
        "_index": "tehero_test1",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "this_is_field3": "this_is_create_value"
        }
      }
    ]
  }
}

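Note: what does the POST _bulk request above do?
1. If the index tehero_test1 does not exist, it is created. Then, if a document with id=1 exists, it is replaced; otherwise a document with id=1 is inserted.
2. Delete the document with id=2 (if it does not exist, this item reports not_found).
3. Insert the document with id=3; if it already exists, this item fails with an error (the other items still run).
4. Update the document with id=1 by merging this_is_field2 into its existing fields.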
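When a _bulk request is sent from a program rather than Kibana, the body has to be newline-delimited JSON: one action line, then a source line for index, create and update (but not for delete), and a final newline. A minimal Python sketch that builds the body of the example above; the helpers bulk_body and meta are hypothetical, and sending the body (for example with Content-Type: application/x-ndjson) is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a _bulk request body."""
    lines = []
    for action, meta, source in actions:
        # json.dumps without indent keeps every object on a single line,
        # which the bulk format requires
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action != "delete":  # delete has no source line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def meta(doc_id):
    return {"_index": "tehero_test1", "_type": "_doc", "_id": doc_id}


body = bulk_body([
    ("index", meta("1"), {"this_is_field1": "this_is_index_value"}),
    ("delete", meta("2"), None),
    ("create", meta("3"), {"this_is_field3": "this_is_create_value"}),
    ("update", meta("1"), {"doc": {"this_is_field2": "this_is_update_value"}}),
])
print(body)
```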

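2 Data Types

2.1 String types

There are two main string types, text and keyword; the key difference is whether the value is analyzed (split into terms).

A text value is analyzed, so a long value is not stored as a single term and cannot be found by an exact match on the whole string; a keyword value is stored unanalyzed as one term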

# Create the index
PUT /toherotest
{
  "mappings": {
    "_doc":{
      "properties" : {
        "field1" : { "type" : "text" }
      }
    }
  }
}
# Store a document ("中国我爱你" means "China, I love you")
POST /toherotest/_doc/1
{
  "field1":"中国我爱你"
}
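Why an exact match on the whole string fails for a text field can be shown with a toy Python model of an inverted index. The analyzer here is a rough, hypothetical approximation of the standard analyzer (lowercase, split words, emit every CJK ideograph as its own token); the real analyzer follows Unicode text segmentation rules, and an IK analyzer would produce word-level Chinese tokens instead.

```python
def toy_standard_analyzer(value):
    """Rough approximation of the standard analyzer (assumption, not ES code)."""
    tokens, word = [], ""
    for ch in value.lower():
        if "\u4e00" <= ch <= "\u9fff":  # CJK ideograph: always its own token
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        elif ch.isalnum():
            word += ch
        elif word:  # any other character ends the current word
            tokens.append(word)
            word = ""
    if word:
        tokens.append(word)
    return tokens


def indexed_terms(value, field_type):
    # text: the value is analyzed into many terms; keyword: one term, as-is
    if field_type == "text":
        return set(toy_standard_analyzer(value))
    return {value}


def term_query(terms, query):
    # term: the query string is NOT analyzed; it must equal one stored term
    return query in terms


text_terms = indexed_terms("中国我爱你", "text")
keyword_terms = indexed_terms("中国我爱你", "keyword")
print(sorted(text_terms))                      # five single-character terms
print(term_query(text_terms, "中国我爱你"))     # False
print(term_query(keyword_terms, "中国我爱你"))  # True
```

A match query, by contrast, would analyze "中国我爱你" the same way as the field and therefore find the document.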

2.2 date type

The mapping below accepts three date formats:

yyyy-MM-dd HH:mm:ss
yyyy-MM-dd
epoch_millis (milliseconds since the epoch)

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{ "date": "2015-01-01" }

PUT my_index/_doc/2
{ "date": "2015-01-01 12:10:30" }

PUT my_index/_doc/3
{ "date": 1420070400001 } 

GET my_index/_search
{
  "sort": { "date": "asc"} 
}

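Note: once a format is specified, ES rejects new data that does not match it with a mapper_parsing_exception. For example, the ISO 8601 value "2015-01-01T12:10:30Z" would be rejected by the mapping above.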
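The "||"-separated format list behaves like a series of alternatives tried in order. A Python sketch under that reading; the helper parse_es_date is hypothetical, it only handles the three formats used above (strings of digits for epoch_millis are not handled), and, like ES, it treats values without a time zone as UTC. Raising ValueError stands in for mapper_parsing_exception.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd"
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_es_date(value):
    """Parse a value against yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis."""
    if isinstance(value, int) and not isinstance(value, bool):
        return EPOCH + timedelta(milliseconds=value)  # epoch_millis
    for pattern in PATTERNS:
        try:
            return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next alternative
    raise ValueError(f"failed to parse date field [{value}]")


for v in ["2015-01-01", "2015-01-01 12:10:30", 1420070400001, "2015-01-01T12:10:30Z"]:
    try:
        print(v, "->", parse_es_date(v).isoformat())
    except ValueError as err:
        print(v, "-> rejected:", err)
```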

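2.3 Complex types

Array: all values in an array must have the same data type (there is no dedicated array type; any field can hold an array of values)

# Index a document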
POST /toherotest/_doc/2
{
  "field1":["这是","一个","数组"]
}

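object: the most common key-value structure, similar to a dictionary. An array of objects (List<object>) is flattened when indexed, so the objects cannot be queried independently of each other, and a filter that must match several fields of the same object does not work reliably.

# Add a field field3 of type object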
PUT toherotest/_mapping/_doc 
{
  "properties": {
    "field3": {
      "type": "object"
    }
  }
}
# Index documents
POST /toherotest/_doc/3
{
  "field3":[ { "name":"tohero1", "age":1 }, { "name":"tohero2", "age":2 } ]
}

POST /toherotest/_doc/4
{
  "field3": [ { "name":"tohero1", "age":2 }, { "name":"tohero2", "age":1 } ]
}

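# Note: a query for name = "tohero1" AND age = 1 returns two documents (id=3 and id=4), because each document's field3.name and field3.age values are stored as separate flat lists

nested: if you need to index an array of objects and keep each object independent, use the nested data type instead of object. With nested, each object in the array can be queried independently.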
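The difference between object and nested can be reproduced with a small Python model: for an object array, each sub-field is stored as one flat list per document, so the two conditions can be satisfied by two different objects; nested checks each object on its own. The functions are hypothetical illustrations, not ES internals. NESTED_QUERY shows the shape of a real nested query, assuming a new index in which field3 is mapped with "type": "nested" (an existing object field cannot be changed to nested in place).

```python
def flatten(objects):
    """Index an object array the way the object type does: one list per sub-field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_object(objects, name, age):
    # object: each condition is checked against the flattened lists independently
    flat = flatten(objects)
    return name in flat.get("name", []) and age in flat.get("age", [])


def matches_nested(objects, name, age):
    # nested: both conditions must hold for the same object
    return any(o.get("name") == name and o.get("age") == age for o in objects)


doc3 = [{"name": "tohero1", "age": 1}, {"name": "tohero2", "age": 2}]
doc4 = [{"name": "tohero1", "age": 2}, {"name": "tohero2", "age": 1}]
for doc_id, field3 in [("3", doc3), ("4", doc4)]:
    print(doc_id, matches_object(field3, "tohero1", 1), matches_nested(field3, "tohero1", 1))

# Request body for the same condition on a field3 mapped as nested
NESTED_QUERY = {
    "query": {
        "nested": {
            "path": "field3",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"field3.name": "tohero1"}},
                        {"match": {"field3.age": 1}},
                    ]
                }
            },
        }
    }
}
```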

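2.4 Geo types

There are two geo data types: geo_point (a coordinate) and geo_shape (a shape)

# If my_index from section 2.2 still exists, delete it first with DELETE my_index
PUT my_index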
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}

PUT my_index/_doc/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}

PUT my_index/_doc/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}

PUT my_index/_doc/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] # array format is [lon, lat], the reverse of the object and string forms
}

Distance query: find documents within 200 km of a given point

GET /my_index/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "200km",
                    "location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }
}
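To check by hand which points a geo_distance filter keeps, the great-circle distance can be approximated with the haversine formula. A Python sketch, assuming a spherical Earth with a mean radius of 6371.0088 km; ES's own arc calculation may give slightly different numbers, and the helper names are hypothetical. Note that the sketch takes (lat, lon) pairs, whereas the array form in the documents above is [lon, lat].

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption: spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(distance_km, origin, point):
    """Rough equivalent of the geo_distance filter for a single point."""
    return haversine_km(*origin, *point) <= distance_km


origin = (40, -70)           # the point in the query
doc_point = (41.12, -71.34)  # the location stored in the example documents
print(round(haversine_km(*origin, *doc_point), 1))  # roughly 168 km
print(within(200, origin, doc_point))               # True: inside 200km
```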

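3 Search Operations

There are two ways to search:
URI Search: the search parameters are passed as URI query parameters.
Request Body Search: the search request is sent as JSON in the request body.

Request Body Search is more flexible and supports the full search feature set, while URI Search is mainly used for quick tests because it is convenient.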

3.1 URI Search

GET /users/_search?q=username:wupx

URI Search uses the GET method. The q parameter holds the query, written in Query String Syntax as field:value pairs. The request above queries the username field and returns all documents that contain wupx.

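Common parameters:

df: default field to query; if not specified, all fields are queried
sort: sort by field name, e.g. sort=year:desc
from: offset of the first hit to return, default 0
size: number of hits to return, default 10
timeout: search timeout
fields: return only the specified fields, comma-separated (from ES 5.0 this parameter is named stored_fields and applies to stored fields only)
analyzer: analyzer used to analyze the query string
analyze_wildcard: whether wildcard and prefix queries are analyzed, default false
explain: include an explanation of how the score was computed for each hit
_source: whether to return the _source field (the original document); _source_includes and _source_excludes select parts of it
lenient: if true, format-based failures (such as text sent to a numeric field) are ignored, default false
default_operator: operator between multiple terms, AND or OR, default OR
search_type: dfs_query_then_fetch or query_then_fetch, default query_then_fetch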

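Field-specific query vs. generic query

Field-specific query: GET /movies/_search?q=2012&df=title

Equivalent field-specific query: GET /movies/_search?q=title:2012

A generic query specifies no field and searches all fields.

Generic query: GET /movies/_search?q=2012

Term Query vs. Phrase Query

Term Query: GET /movies/_search?q=title:(Beautiful Mind)

Beautiful Mind is equivalent to Beautiful OR Mind (with the default default_operator=OR)

Phrase Query: GET /movies/_search?q=title:"Beautiful Mind"

"Beautiful Mind" is equivalent to Beautiful AND Mind, and additionally the two terms must appear next to each other in that order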

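Other features of URI Search

Boolean operators: AND (&&), OR (||), NOT (!); the operators must be written in uppercase

Range queries with comparison operators: GET /movies/_search?q=year:>=1994

Wildcard queries (slow, not recommended), regular expressions, fuzzy matching and proximity queries are also supported

Simple to use and convenient for testing, but less powerful than Request Body Search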
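Kibana encodes URI Search requests for you, but when the URL is built in code, characters such as spaces, quotes, ':', '>' and '&' must be percent-encoded. A minimal Python sketch using only the standard library; the helper name uri_search_path is hypothetical.

```python
from urllib.parse import quote, urlencode


def uri_search_path(index, **params):
    """Build the path and query string of a URI Search request.

    quote_via=quote encodes spaces as %20; reserved characters such as
    ':', '"', '>', '=' and '&' are percent-encoded as well.
    """
    return f"/{index}/_search?" + urlencode(params, quote_via=quote)


print(uri_search_path("movies", q='title:"Beautiful Mind"'))
print(uri_search_path("movies", q="2012", df="title"))
print(uri_search_path("movies", q="year:>=1994", sort="year:desc", size=5))
```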

3.2 Request Body Search

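Both GET and POST can be used to query an index

Specify the index name, and use the _search endpoint to mark the request as a search request

# Simple example
POST /movies/_search
{
  "from":10,
  "size":20, # from and size together implement pagination
  "sort":[{"year":"desc"}], # sort preferably on numeric and date fields
  "_source":["title"], # return only the title field of _source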
  "query":{
    "match_all": {}
  }
}
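from and size map directly onto page numbers. A Python sketch that builds the request body above for a given 1-based page; page_body is a hypothetical helper. It also checks the default index.max_result_window limit of 10,000 on from + size, beyond which ES rejects the request and search_after or the scroll API is needed instead.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_body(page, size, query=None):
    """Request body for the 1-based `page` with `size` hits per page."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size  # number of hits to skip
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {
        "from": start,
        "size": size,
        "sort": [{"year": "desc"}],
        "_source": ["title"],
        "query": query if query is not None else {"match_all": {}},
    }


print(page_body(3, 20)["from"])  # 40: pages 1 and 2 cover hits 0-39
```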

Script fields: compute a new field for each hit with a script written in Painless, the ES scripting language

GET /movies/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['year'].value+'_hello'"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}

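match query: analyzes the query text into terms, matches them against the inverted index, and sorts the results by score. By default any term may match (operator or); with "operator": "and", all terms must match

POST /users/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wupx huxy",
        "operator": "and"
      }
    }
  }
}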
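The effect of operator can be shown with a toy Python model in which a field is reduced to a set of terms. The lowercase-and-whitespace analyze function is a simplifying assumption, and scoring is ignored; only whether a document matches is modelled.

```python
def analyze(text):
    # toy analyzer (assumption): lowercase and split on whitespace
    return text.lower().split()


def match(field_value, query, operator="or"):
    """Does a match query for `query` hit a field containing `field_value`?"""
    field_terms = set(analyze(field_value))
    found = [term in field_terms for term in analyze(query)]
    # or (the default): any query term is enough; and: every term is required
    return all(found) if operator == "and" else any(found)


print(match("wupx", "wupx huxy"))                          # True
print(match("wupx", "wupx huxy", operator="and"))          # False
print(match("huxy and wupx", "wupx huxy", operator="and"))  # True: order does not matter
```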

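match_phrase query: compared with a plain match, the terms must appear in order, and all of them must match

POST /movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "one love", # the terms must appear in this order
        "slop":1 # how many positions the terms may be apart (here, one extra word may sit between them)
      }
    }
  }
}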
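A simplified Python model of match_phrase with slop: the terms must appear in order, and slop is the total number of extra positions allowed between them. This is a sketch under stated assumptions (a toy whitespace analyzer and hypothetical function names); real slop is an edit distance over term positions that also allows reordered terms at a higher cost, which the sketch does not model.

```python
def analyze(text):
    # toy analyzer (assumption): lowercase and split on whitespace
    return text.lower().split()


def phrase_match(field_value, phrase, slop=0):
    """In-order phrase match with at most `slop` extra positions in total."""
    tokens = analyze(field_value)
    terms = analyze(phrase)
    if not terms:
        return False
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        pos, found = start, True
        for term in terms[1:]:
            try:
                # the earliest later occurrence keeps the span as short as possible
                pos = tokens.index(term, pos + 1)
            except ValueError:
                found = False
                break
        # extra positions = actual span minus the span of the exact phrase
        if found and (pos - start) - (len(terms) - 1) <= slop:
            return True
    return False


print(phrase_match("one love", "one love"))               # True
print(phrase_match("one true love", "one love"))          # False
print(phrase_match("one true love", "one love", slop=1))  # True
print(phrase_match("love one", "one love", slop=1))       # False
```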

term query: matches without analyzing the input; to pass several terms, use terms instead of term

POST /users/_search
{
  "query": {
    "terms": {
      "username": [
        "wupx",
        "huxy"
      ]
    }
  }
}

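query_string query: a query written in Query String syntax (the same syntax as the q parameter of URI Search). With AND, as below, all terms must match, but unlike match_phrase there is no ordering constraint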

POST users/_search
{
  "query": {
    "query_string": {
      "default_field": "username",
      "query": "wupx AND huxy"
    }
  }
}

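simple_query_string query: a simplified version of query_string that ignores invalid syntax instead of raising an error, but supports fewer features (for example, it uses +, | and - instead of AND, OR and NOT)

POST users/_search
{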
  "query": {
    "simple_query_string": {
      "query": "wu px",
      "fields": ["username"],
      "default_operator": "AND"
    }
  }
}

4 References

ElasticSearch Series 04: CRUD of Indices and Documents

ElasticSearch Series 03: ES Data Types

If You Still Can't Do Elasticsearch Search After Reading This, I'll Cry!

Personal notes

Digital Garden | 王半仙