Last updated: 2022-01-11 13:43:09
The Kingsoft Cloud Elasticsearch Service (KES) console provides real-time and historical monitoring data so that you can monitor cluster and node resources such as storage, CPU, and memory. You can use these metrics to assess the real-time running status of KES clusters and eliminate risks promptly to keep the clusters running stably.
Log in to the KES console.
Cluster status
Metric description
Monitoring metric | Description | Details |
---|---|---|
Service status | The status of the KES cluster, which can be Green, Yellow, or Red. Green: The cluster is healthy. Yellow: The cluster reports alarms because some replica shards are unassigned. Red: The cluster is abnormal because some primary shards are unassigned. | If the cluster status is Yellow, search results are still complete, but the high availability of the cluster is compromised and the risk of data loss increases. Investigate, locate, and fix the issue promptly to prevent data loss. If the cluster status is Red, some data has been lost and searches return only partial results; a write request routed to a lost shard returns an error. Locate and repair the abnormal shards promptly. See the health-check sketch after this table. |
Cluster query QPS | The total number of queries the cluster receives per second. | QPS depends on the number of primary shards of the queried index: one query against an index with five primary shards counts as 5 QPS. A sharp increase in QPS can drive up CPU usage, heap memory usage, or load_1m, degrading the processing capability of cluster nodes. See the sampling sketch after this table. |
Document write QPS | The total number of documents written per second. | A sharp increase in QPS can drive up CPU usage, heap memory usage, or load_1m, degrading the processing capability of cluster nodes. |
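The cluster status above can also be polled programmatically. The following is a minimal sketch that reads the Green/Yellow/Red status through the standard Elasticsearch cluster health API (GET /_cluster/health); the endpoint URL and credentials are placeholders to replace with your own KES cluster address.

```python
import requests

# Placeholder endpoint and credentials for a KES cluster; substitute your own.
ES_URL = "http://your-kes-endpoint:9200"
AUTH = ("elastic", "your-password")

def check_cluster_health():
    """Fetch cluster health and flag Yellow/Red status."""
    resp = requests.get(f"{ES_URL}/_cluster/health", auth=AUTH, timeout=10)
    resp.raise_for_status()
    health = resp.json()

    status = health["status"]  # "green", "yellow", or "red"
    print(f"cluster status: {status}")
    print(f"unassigned shards: {health['unassigned_shards']}")

    if status == "yellow":
        # Replica shards are unassigned: searches still return complete
        # results, but high availability is degraded.
        print("WARNING: some replica shards are unavailable")
    elif status == "red":
        # Primary shards are unassigned: some data is unavailable and
        # writes routed to lost shards will fail.
        print("CRITICAL: some primary shards are unavailable")

if __name__ == "__main__":
    check_cluster_health()
```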
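For the QPS metrics, the console reports the rates directly. As an illustration of how such rates can be derived, the sketch below samples the cumulative counters from the standard indices stats API (GET /_stats) twice and divides by the interval. Note that search.query_total counts shard-level query phases, which is why one request against an index with five primary shards contributes 5 to the counter. The endpoint and credentials are again placeholders.

```python
import time
import requests

ES_URL = "http://your-kes-endpoint:9200"   # placeholder endpoint
AUTH = ("elastic", "your-password")        # placeholder credentials

def sample_counters():
    """Read cumulative query and indexing counters across all indexes."""
    resp = requests.get(f"{ES_URL}/_stats", auth=AUTH, timeout=10)
    resp.raise_for_status()
    total = resp.json()["_all"]["total"]
    # query_total counts shard-level query phases, so a request that
    # fans out to five primary shards increments it by five.
    return total["search"]["query_total"], total["indexing"]["index_total"]

def estimate_qps(interval_s=60):
    """Estimate query and write QPS over a sampling interval."""
    q0, w0 = sample_counters()
    time.sleep(interval_s)
    q1, w1 = sample_counters()
    print(f"query QPS = {(q1 - q0) / interval_s:.1f}")
    print(f"write QPS = {(w1 - w0) / interval_s:.1f}")

if __name__ == "__main__":
    estimate_qps()
```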
Node status
Metric description
Monitoring metric | Description | Details |
---|---|---|
CPU usage (%) | The percentage of CPU usage on each node, sampled every 60 seconds. | High CPU usage degrades the processing capability of cluster nodes. If this metric stays high, upgrade the node specifications to improve the load capacity of the node. |
Disk usage (%) | The percentage of disk usage on each node, sampled every 60 seconds. | Keep the disk usage of each node below 85% to avoid affecting services. Delete unneeded indexes promptly. To expand capacity, increase the disk size of each node or add nodes to the cluster. |
Heap memory usage (%) | The percentage of JVM heap memory usage on each node, sampled every 60 seconds. | High heap memory usage affects the Elasticsearch service and triggers garbage collection (GC). Excessively high heap memory usage can cause out-of-memory (OOM) errors. |
load_1m | The average system load of each node over the last minute. | The value of this metric should stay below the number of CPU cores of the node. Take a single-core node as an example: a value less than 1 means no process is waiting; a value equal to 1 means the CPU is fully occupied and cannot take on more work; a value greater than 1 means processes are queuing for CPU resources. If the value is too high, we recommend that you reduce the cluster load or upgrade the node specifications. |
Total GC running duration | The accumulated GC duration within 60 seconds. | An excessively long GC duration indicates that the node is short of memory. We recommend that you increase the node memory, or add nodes to spread the load across the cluster. |
Rejected requests | The number of rejected write and query requests within 60 seconds. | When CPU, memory, or disk usage is too high, write and query rejections may increase. This typically means the current cluster configuration cannot keep up with the read and write workload. If the value is too high, we recommend that you upgrade the node specifications or add nodes to improve processing capability. See the node-stats sketch after this table. |
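All node metrics in this table can be cross-checked against the standard nodes stats API (GET /_nodes/stats). The sketch below is one possible reading of that response; the endpoint and credentials are placeholders, GC and rejection counters are cumulative since node start (the console reports 60-second deltas, so sample twice for a rate), and thread-pool names vary slightly across Elasticsearch versions (the write pool exists in 6.x and later).

```python
import requests

ES_URL = "http://your-kes-endpoint:9200"   # placeholder endpoint
AUTH = ("elastic", "your-password")        # placeholder credentials

def report_node_metrics():
    """Print the per-node metrics described in the table above."""
    resp = requests.get(f"{ES_URL}/_nodes/stats", auth=AUTH, timeout=10)
    resp.raise_for_status()
    for node in resp.json()["nodes"].values():
        name = node["name"]
        cpu = node["os"]["cpu"]["percent"]                 # CPU usage (%)
        load_1m = node["os"]["cpu"]["load_average"]["1m"]  # load_1m
        heap = node["jvm"]["mem"]["heap_used_percent"]     # heap usage (%)

        fs = node["fs"]["total"]                           # disk usage (%)
        disk = 100 * (1 - fs["available_in_bytes"] / fs["total_in_bytes"])

        # Cumulative GC time across young and old collectors, in ms;
        # sample twice and subtract to get a per-interval duration.
        gc_ms = sum(c["collection_time_in_millis"]
                    for c in node["jvm"]["gc"]["collectors"].values())

        # Cumulative rejected search/write requests since node start.
        pools = node["thread_pool"]
        rejected = {p: pools[p]["rejected"] for p in ("search", "write")
                    if p in pools}

        print(f"{name}: cpu={cpu}% heap={heap}% load_1m={load_1m} "
              f"disk={disk:.1f}% gc_total={gc_ms}ms rejected={rejected}")

if __name__ == "__main__":
    report_node_metrics()
```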