
ES: Not Enough Shards

Symptoms

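A colleague reported that WeCom (企业微信) chat records could not be queried, and ES returned the exception shown below. Investigation showed that ES had run out of shards: the shard limit was 2000, and 1999 shards were already in use.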

image-20250228151329838
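The arithmetic behind the failure can be sketched as follows. This is a minimal illustration, not the check ES itself runs: it assumes a single data node (consistent with the health output later in this post) and a new index with one primary and one replica, which is an assumption about the index settings. The helper names `shard_headroom` and `can_create_index` are hypothetical.

```bash
#!/usr/bin/env bash
# shard_headroom LIMIT_PER_NODE DATA_NODES OPEN_SHARDS
# The cluster-wide limit is cluster.max_shards_per_node times the number
# of data nodes; print how many more shards the cluster will accept.
shard_headroom() {
  local limit_per_node=$1 data_nodes=$2 open_shards=$3
  echo $(( limit_per_node * data_nodes - open_shards ))
}

# can_create_index HEADROOM PRIMARIES REPLICAS
# Succeeds only if all shards of the new index fit into the headroom.
can_create_index() {
  local headroom=$1 primaries=$2 replicas=$3
  local needed=$(( primaries * (1 + replicas) ))
  [ "$needed" -le "$headroom" ]
}

# Numbers from this incident: limit 2000, one data node, 1999 shards open.
headroom=$(shard_headroom 2000 1 1999)
echo "headroom: $headroom"
if can_create_index "$headroom" 1 1; then
  echo "new index fits"
else
  echo "new index rejected: it needs more shards than the headroom allows"
fi
```

With only one shard of headroom left, any new index that needs two shards is refused, which would explain why new chat data could no longer be written or queried.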

I asked DeepSeek, which replied that a parameter specifying the shard limit should be added.

image-20250228151423004

However, after the configuration file was modified, the docker container could no longer start properly.

Resolution

Restore the configuration file

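Copy the configuration file from the container to the host and restore the elasticsearch.yml file. Because the container cannot start properly, the command may need to be run several times until it succeeds.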

bash
docker cp chatsync-elasticsearch:/usr/share/elasticsearch/config/elasticsearch.yml .
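Retrying by hand can be replaced with a small loop. The sketch below is a generic helper rather than anything Docker provides; the name `retry`, the attempt count, and the delay are all illustrative assumptions.

```bash
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Example usage (left commented out so the snippet can be sourced safely):
# retry 10 1 docker cp \
#   chatsync-elasticsearch:/usr/share/elasticsearch/config/elasticsearch.yml .
```

Since `docker cp` also works on stopped containers, stopping the container first with `docker stop chatsync-elasticsearch` may avoid the retries altogether; that variant was not tried in this incident.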

Set the following in the configuration file:

yaml
## Set the value directly; no `persistent:` wrapper is needed
cluster.max_shards_per_node: 3000

Copy the file back into the container and wait for the container to restart.

bash
docker cp elasticsearch.yml chatsync-elasticsearch:/usr/share/elasticsearch/config/elasticsearch.yml

Check the startup log

The startup log shows that a problem remains: the cluster recovers only to YELLOW.

json
{
    "type": "server",
    "timestamp": "2025-02-28T13:39:51,387+08:00",
    "level": "INFO",
    "component": "o.e.c.r.a.AllocationService",
    "cluster.name": "docker-cluster",
    "node.name": "elasticsearch",
    "message": "Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[gupaoedu-wxcp-msg-2023-08-21][0]]]).",
    "cluster.uuid": "9dflySuCQOCWvfZqolNj3Q",
    "node.id": "FNI3xNTLQj6yYGaq8LI9Sg"
}

Check the cluster health

Log in to Kibana: https://ke.gupaoedu.cn/csk/

bash
GET /_cluster/health?pretty
json
{
  "cluster_name": "docker-cluster",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 1012,
  "active_shards": 1012,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 991,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50.52421367948078
}
bash
GET _cat/allocation?v
bash
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
  1012       83.1gb   558.3gb      229gb    787.3gb           70 10.0.1.60 10.0.1.60 elasticsearch
   991                                                                               UNASSIGNED
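The two outputs above can be cross-checked with a little arithmetic. The sketch below uses only the numbers shown in the responses. Its interpretation is an assumption: on a single-node cluster a replica cannot be allocated to the same node as its primary, so the 991 unassigned shards are presumably replicas, which would explain the yellow status.

```bash
#!/usr/bin/env bash
# Values copied from the _cluster/health response above.
active=1012
unassigned=991

# Primaries and replicas of open indices both count toward the shard limit.
total=$(( active + unassigned ))

# Reproduce active_shards_percent_as_number, rounded to two decimals.
percent=$(awk -v a="$active" -v t="$total" 'BEGIN { printf "%.2f", 100 * a / t }')

echo "shards counted against the limit: $total"
echo "active shards percent: $percent"
```

The total is already above the old limit of 2000, so the limit still has to be raised for new indices to be created.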

Set the shard limit

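The cluster settings still show the old limit of 2000. A persistent cluster setting takes precedence over the value in elasticsearch.yml, so the 3000 set in the file had no effect.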

bash
GET /_cluster/settings
json
{
  "persistent": {
    "cluster": {
      "max_shards_per_node": "2000"
    }
  },
  "transient": {}
}

Changing it to 3000 through the cluster settings API resolves the problem.

bash
PUT /_cluster/settings
{
  "persistent" : {
    "cluster" : {
      "max_shards_per_node" : "3000"
    }
  },
  "transient" : { }
}
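The same change can be made from a shell when Kibana is not reachable. This is a minimal sketch under stated assumptions: the address in `ES_URL` is a placeholder for wherever the ES HTTP port is exposed, the cluster is assumed to need no authentication, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: ES is reachable here; adjust to the real host and port.
ES_URL="${ES_URL:-http://localhost:9200}"

# settings_body LIMIT
# Build the request body for a persistent cluster.max_shards_per_node.
settings_body() {
  printf '{"persistent":{"cluster.max_shards_per_node":%d}}' "$1"
}

# apply_limit LIMIT
# Send the new limit to the cluster settings API.
apply_limit() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(settings_body "$1")"
}

# show_settings
# Print the explicitly set cluster settings in flat key form.
show_settings() {
  curl -s "$ES_URL/_cluster/settings?flat_settings=true&pretty"
}

# Example usage (left commented out so nothing runs against a cluster by accident):
# apply_limit 3000 && show_settings
```

Because the persistent value overrides elasticsearch.yml, it is worth re-running the settings query after the change to confirm that 3000 is the value actually in effect.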

Reflections