Elasticsearch practical tutorial

Are slow SQL LIKE queries giving you a headache? Do you need to analyze massive amounts of unstructured data quickly? Elasticsearch is a powerful tool for both problems: a distributed, RESTful search and analytics engine built on Apache Lucene. It is well suited to full-text search, log analysis, real-time data insights, and similar scenarios.


1. Get started quickly: installation and startup

We use Docker for deployment, which is the fastest way to start a local development environment without having to deal with system dependencies.

1.1 Docker single node deployment

First pull the official stable version image (this article uses 8.12.0):

# Pull the Elasticsearch 8.12.0 image
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Then run the container (Note: Security features are disabled for local development and must be enabled for production environments):

# Start a single-node Elasticsearch container
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Wait about 30 seconds and check whether the startup is successful:

# Check that the node is responding
curl -X GET "localhost:9200/"

A JSON response containing cluster_name and version indicates success.
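
The readiness check above can also be scripted instead of waiting a fixed 30 seconds. The following is a minimal sketch using only the Python standard library; `describe_node` and `wait_for_node` are hypothetical helper names, and the URL and timeout defaults are assumptions that match the Docker command in this section.

```python
import json
import time
import urllib.request


def describe_node(raw_body: str) -> str:
    """Summarize the JSON returned by GET / as 'cluster_name (version)'."""
    info = json.loads(raw_body)
    return f'{info["cluster_name"]} ({info["version"]["number"]})'


def wait_for_node(url: str = "http://localhost:9200/", timeout_s: float = 60.0) -> str:
    """Poll the root endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return describe_node(resp.read().decode("utf-8"))
        except OSError:
            # Connection refused/reset or an HTTP error: the node is probably still starting.
            if time.monotonic() >= deadline:
                raise
            time.sleep(2)
```

Calling `print(wait_for_node())` once the container has been started should print the cluster name and version number as soon as the node responds.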

1.2 Docker Compose deployment

To make later management easier, use a docker-compose.yml file:

# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - es_data:/usr/share/elasticsearch/data

volumes:
  es_data:

Start command:

docker-compose up -d

2. Core concepts primer

A relational database analogy is a quick way to understand the core concepts of Elasticsearch:

| Elasticsearch concept | Description | MySQL analogy |
| --- | --- | --- |
| Index | Collection of documents | Database |
| Document | The basic unit of search (JSON format) | Row |
| Field | A property of a document | Column |
| Shard | A horizontal slice of an index, used for scaling out | Horizontal sharding |
| Replica | A copy of a shard, used for high availability | Backup table |

Note: The Type concept was deprecated in Elasticsearch 7.x and removed in 8.x, so you can ignore it.
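
To make the Shard concept a little more concrete, here is a toy sketch of how a document could be assigned to a primary shard. It is a simplification under stated assumptions: CRC32 stands in for the hash function Elasticsearch actually uses, the real formula is somewhat more involved, and `pick_shard` is a hypothetical helper rather than a client API.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (the document ID by default) to a primary shard number."""
    # Assumption: CRC32 is only a stand-in for Elasticsearch's own hash function.
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % number_of_primary_shards


# The same ID always lands on the same shard, so a later GET knows where to look.
shard_for_doc_1 = pick_shard("1", 3)
```

Because the shard number depends on the primary shard count, changing that count would send existing IDs to different shards, which is one reason the number of primary shards is chosen when the index is created.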


3. Index management: define data structure

An index is a container for documents. Before creating one, it is best to define a mapping (similar to a table schema in a database); fields you leave out are mapped dynamically when they first appear.

3.1 Create blog post index

We create an index for storing blog posts, containing common fields:

PUT /blog_posts
{
  "settings": {
    "number_of_shards": 1,    // one shard is enough for a single-node setup
    "number_of_replicas": 0   // no replicas on a single node (otherwise cluster health is yellow)
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",        // text is analyzed (tokenized) for full-text search
        "fields": {
          "keyword": { "type": "keyword" }  // keyword sub-field for exact matching and sorting
        }
      },
      "content": { "type": "text" },
      "author": { "type": "keyword" },
      "publish_date": { "type": "date" },
      "tags": { "type": "keyword" },
      "views": { "type": "integer" }
    }
  }
}

Common field types:
  • text: used for full-text search; an analyzer splits the value into terms.
  • keyword: used for exact matching, sorting, and aggregations; the value is not analyzed.
  • date/integer: date and numeric types, used for range queries and aggregations.
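
The difference between text and keyword is easier to see with a toy model of what gets indexed. The sketch below is only an approximation: `analyze_as_text` lowercases and splits on non-word characters, which is a rough stand-in for the default analyzer rather than its real implementation, and both function names are hypothetical.

```python
import re


def analyze_as_text(value: str) -> list[str]:
    """Rough stand-in for the default analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def analyze_as_keyword(value: str) -> list[str]:
    """A keyword field indexes the whole value as a single, unmodified term."""
    return [value]


title = "Elasticsearch Python Client"
text_terms = analyze_as_text(title)
keyword_terms = analyze_as_keyword(title)

# Searching for "python" can match the text field, because that term was indexed...
found_in_text = "python" in text_terms
# ...but not the keyword field, whose only term is the full original string.
found_in_keyword = "python" in keyword_terms
```

This is why the title field in the mapping above carries both a text type for searching and a keyword sub-field for exact matching and sorting.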

Check whether the index is created successfully:

GET /_cat/indices?v

4. Document operation: CRUD basics

Now let's create, read, update, and delete documents in the blog_posts index.

4.1 Create documents

You can specify the ID or let Elasticsearch automatically generate it:

# Index a document with an explicit ID
PUT /blog_posts/_doc/1
{
  "title": "Elasticsearch 入门指南",
  "content": "Elasticsearch 是一个基于 Lucene 的分布式搜索引擎,支持全文搜索、实时分析等功能。",
  "author": "张三",
  "publish_date": "2026-04-10",
  "tags": ["elasticsearch", "search", "tutorial"],
  "views": 100
}

# Index a document with an auto-generated ID
POST /blog_posts/_doc
{
  "title": "Python 与 Elasticsearch 集成",
  "content": "如何在 Python 项目中使用 Elasticsearch 客户端进行搜索开发。",
  "author": "李四",
  "publish_date": "2026-04-09",
  "tags": ["python", "elasticsearch"],
  "views": 200
}

4.2 Query documents

Get a single document based on ID:

GET /blog_posts/_doc/1

4.3 Update documents

Partially update a document (without resending the entire document):

POST /blog_posts/_update/1
{
  "doc": {
    "views": 120,  // raise the view count
    "last_updated": "2026-04-10T12:00:00"  // add a new field
  }
}
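
Conceptually, a partial update reads the stored document, merges the "doc" object into it, and indexes the result again. The sketch below models only the merge step in plain Python; `merge_partial_doc` is a hypothetical helper for illustration, not part of Elasticsearch or its client.

```python
def merge_partial_doc(source: dict, partial: dict) -> dict:
    """Return a copy of source with partial merged in; nested objects merge recursively."""
    merged = dict(source)
    for field, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(field), dict):
            merged[field] = merge_partial_doc(merged[field], value)
        else:
            merged[field] = value  # scalars and arrays replace the old value
    return merged


stored = {"title": "Elasticsearch 入门指南", "views": 100}
updated = merge_partial_doc(stored, {"views": 120, "last_updated": "2026-04-10T12:00:00"})
```

Fields that the partial document does not mention, such as title here, are left untouched.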

4.4 Delete documents

Delete documents based on ID:

DELETE /blog_posts/_doc/1

5. Search DSL: Core Features

Elasticsearch's Query DSL is its most powerful feature, using JSON to build complex search requests.

5.1 Basic queries

1. Search all documents

GET /blog_posts/_search
{
  "query": { "match_all": {} },
  "size": 10  // number of hits to return (10 is also the default)
}

2. Full text search (match)

Search for documents whose content field contains "elasticsearch":

GET /blog_posts/_search
{
  "query": {
    "match": { "content": "elasticsearch" }
  }
}

3. Multi-field search (multi_match)

Search across the title, content, and tags fields, with title weighted highest (^3):

GET /blog_posts/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch",
      "fields": ["title^3", "content", "tags^2"]
    }
  }
}

4. Exact match (term)

Find documents whose author is "张三" (Zhang San). A term query needs a keyword field; author is already mapped as keyword, so query it directly:

GET /blog_posts/_search
{
  "query": {
    "term": { "author": "张三" }
  }
}

5.2 Boolean query (bool)

Combine multiple conditions using must (AND, contributes to the score), filter (AND, does not affect the score), and should (OR):

GET /blog_posts/_search
{
  "query": {
    "bool": {
      "must": [  // conditions that must match and contribute to the score
        { "match": { "content": "elasticsearch" } }
      ],
      "filter": [  // conditions that must match but are not scored (faster and cacheable)
        { "term": { "author": "张三" } },
        { "range": { "views": { "gte": 100 } } }  // views >= 100
      ]
    }
  }
}

Tip: Prefer filter. Filter context does not compute relevance scores and its results can be cached, so for yes/no conditions it generally performs better than must.
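
When the filter conditions are optional, the bool query is usually assembled in code. The following is a minimal sketch of one way to do that; `build_post_query` is a hypothetical helper, and it assumes the author field is mapped as keyword, as in the blog_posts mapping from section 3.1.

```python
from typing import Optional


def build_post_query(text: str, author: Optional[str] = None,
                     min_views: Optional[int] = None) -> dict:
    """Build a bool query: scored full-text match plus optional unscored filters."""
    bool_query: dict = {"must": [{"match": {"content": text}}]}
    filters = []
    if author is not None:
        filters.append({"term": {"author": author}})  # exact match on a keyword field
    if min_views is not None:
        filters.append({"range": {"views": {"gte": min_views}}})
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


body = build_post_query("elasticsearch", author="张三", min_views=100)
```

The resulting dictionary has the same shape as the JSON request above, so it can be sent as the body of GET /blog_posts/_search.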

5.3 Sorting results

Sort by publish date in descending order, then by relevance score in descending order:

GET /blog_posts/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "publish_date": { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ]
}

6. Python integration: practical development

Use the official elasticsearch Python client to interact with the cluster.

6.1 Install client

# Pin the client to the 8.x series so it matches the 8.12.0 server
pip install "elasticsearch>=8,<9"

6.2 Basic connections and operations

First connect to the local cluster:

from elasticsearch import Elasticsearch
import json

# Connect to the local Elasticsearch node
es = Elasticsearch(hosts=["http://localhost:9200"])

# Verify the connection
if es.ping():
    print("✅ Connected to Elasticsearch")
else:
    print("❌ Connection failed")

Then write a few commonly used helper functions:

def index_post(post_id: str, post: dict) -> dict:
    """Create or overwrite a blog post with the given ID."""
    return es.index(index="blog_posts", id=post_id, document=post)

def get_post(post_id: str) -> dict | None:
    """Fetch a post by ID; return None if the lookup fails."""
    try:
        res = es.get(index="blog_posts", id=post_id)
        return res["_source"]
    except Exception as e:
        print(f"Failed to fetch post: {e}")
        return None

def search_posts(query: str, size: int = 10) -> dict:
    """Search posts across title, content, and tags, with highlighting."""
    query_body = {
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["title^3", "content", "tags^2"]
            }
        },
        "highlight": {
            "fields": {"title": {}, "content": {}}
        },
        "size": size
    }
    return es.search(index="blog_posts", body=query_body)

6.3 Test code

Add an article and perform a search:

# Index a sample post
sample_post = {
    "title": "Elasticsearch Python 客户端实战",
    "content": "本文介绍如何使用 Python 客户端操作 Elasticsearch,包括索引、搜索、聚合等功能。",
    "author": "王五",
    "publish_date": "2026-04-11",
    "tags": ["python", "elasticsearch", "client"],
    "views": 150
}
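index_post("3", sample_post)

# Refresh so the new document is searchable right away (by default this can take about a second)
es.indices.refresh(index="blog_posts")

# Run a search; res.body is the plain dict behind the client's response object
res = search_posts("Python Elasticsearch")
print(json.dumps(res.body, indent=2, ensure_ascii=False))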
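
The raw search response is verbose, so in practice you usually pull out just the hits. Below is a minimal sketch of that step; `summarize_hits` is a hypothetical helper, and `sample_response` is a hand-written dictionary shaped like a search response (not real output) so the example runs without a cluster.

```python
def summarize_hits(response: dict) -> list[dict]:
    """Flatten a search response into id, score, title, and highlight snippets."""
    rows = []
    for hit in response["hits"]["hits"]:
        highlight = hit.get("highlight", {})  # absent when no highlighted field matched
        rows.append({
            "id": hit["_id"],
            "score": hit["_score"],
            "title": hit["_source"]["title"],
            "snippets": highlight.get("title", []) + highlight.get("content", []),
        })
    return rows


# Hand-written sample with the structure of a search response (values are made up).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [{
            "_id": "3",
            "_score": 1.5,
            "_source": {"title": "Elasticsearch Python 客户端实战"},
            "highlight": {"title": ["<em>Elasticsearch</em> <em>Python</em> 客户端实战"]},
        }],
    }
}
rows = summarize_hits(sample_response)
```

With the client from section 6.2, the same function can be applied to a live result by passing the response's body, for example `summarize_hits(search_posts("Python Elasticsearch").body)`.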

7. Quick optimizations and best practices

Here are some optimization tips every beginner should know:

  1. Index in batches: use the bulk API to add many documents in one request and reduce network overhead;
  2. Limit returned fields: use _source filtering to fetch only the fields you need;
  3. Use index aliases: access indices through aliases to avoid downtime during reindexing;
  4. Avoid over-sharding: 1-3 shards are enough for small indices;
  5. Monitor cluster status: use GET /_cluster/health to check it (green = all shards assigned, yellow = some replica shards unassigned, red = some primary shards unassigned).
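
As an illustration of the first tip, the bulk API expects newline-delimited JSON: an action line followed by a source line for each document. The sketch below builds such a payload with the standard library only; `build_bulk_body` is a hypothetical helper and the two sample documents are made up for the example.

```python
import json


def build_bulk_body(index: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON payload for POST /_bulk: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


payload = build_bulk_body("blog_posts", {
    "10": {"title": "First post", "views": 1},
    "11": {"title": "Second post", "views": 2},
})
```

The payload would be sent to POST /_bulk with the Content-Type header set to application/x-ndjson. The Python client also ships a `bulk` helper in `elasticsearch.helpers` that takes care of this formatting for you.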

Summary

This article covers the core of Elasticsearch: quick installation, core concepts, index and document management, Query DSL search, and Python integration. With this material you can already add basic search functionality to your project.

Future updates will cover advanced topics such as aggregations, cluster deployment, and security configuration, so stay tuned!