Last updated: 2022-01-11 13:43:17
This topic describes what to plan before you purchase and use Kingsoft Cloud Elasticsearch Service (KES): estimate the required storage capacity, the size and number of shards, and the cluster specifications, and review the recommendations on instance type selection.
The following table describes the major factors that affect the disk storage of KES.
|Factor|Description|
|---|---|
|Number of replicas|The default and recommended number of replicas is 1. In scenarios that can tolerate unexpected data loss, you can configure 0 replicas.|
|Indexes|In addition to the raw data, KES stores indexes and column store data, so the required storage space is usually about 10% larger than the source data. This estimate excludes fields such as _all.|
|Internal tasks|Segment merging, KES translogs, and other logs occupy about 20% of the disk space.|
|Resources reserved by the operating system|By default, the Linux operating system reserves 5% of the disk space for the root user. The reserved space is used to run critical processes, recover the system, and prevent disk fragmentation.|
|Safety threshold|Reserve 20% of the disk space as a safety margin.|
Based on the preceding factors, and with the default of one replica, the minimum disk space is about 3.6 times the source data size. The minimum disk space is calculated based on the following formula:
Disk space = Source data size × (1 + Number of replicas) × (1 + Index space) / (1 - Linux reserved space) / (1 - Internal task space) / (1 - Safety threshold) = Source data size × (1 + Number of replicas) × 1.1 / 0.95 / 0.8 / 0.8 ≈ Source data size × (1 + Number of replicas) × 1.8. With 1 replica, Disk space ≈ Source data size × 3.6.
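The formula above can be sketched as a small Python helper. The function name and parameter names are illustrative and not part of KES. The default values are the percentages from the table: 10% index overhead, 5% Linux reserved space, 20% for internal tasks, and a 20% safety threshold.

```python
def estimate_disk_space(source_size_gb, replicas=1, index_overhead=0.10,
                        linux_reserved=0.05, internal_tasks=0.20,
                        safety_threshold=0.20):
    """Return the minimum disk space in GB for a given source data size.

    Defaults mirror the factors listed in the table; adjust them if your
    workload differs (hypothetical helper, not a KES API).
    """
    copies = 1 + replicas  # one primary copy plus the replicas
    expansion = ((1 + index_overhead)
                 / (1 - linux_reserved)
                 / (1 - internal_tasks)
                 / (1 - safety_threshold))
    return source_size_gb * copies * expansion


if __name__ == "__main__":
    # 1,000 GB of source data with the default single replica
    print(f"{estimate_disk_space(1000):.0f} GB")
    # The same data with no replicas needs roughly half as much space
    print(f"{estimate_disk_space(1000, replicas=0):.0f} GB")
```

For 1,000 GB of source data and one replica, the helper returns a little over 3,600 GB, which matches the 3.6 multiplier.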
The size and number of shards greatly affect the stability and performance of a KES cluster. Each index in the KES cluster requires a proper shard plan. By default, five shards are planned for each index.
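As a minimal sketch, the shard plan of an index can be set explicitly when the index is created. The example assumes that KES accepts the standard Elasticsearch index settings `number_of_shards` and `number_of_replicas`. The helper name and the values are illustrative. In Elasticsearch, the number of primary shards is set at index creation, so it is worth planning before data is written.

```python
import json


def index_settings(shards=5, replicas=1):
    """Build a create-index request body with an explicit shard plan.

    Hypothetical helper: the body follows the standard Elasticsearch
    create-index format and would be sent as PUT /<index-name>.
    """
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the number of replicas cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    # Three primary shards, each with one replica
    print(json.dumps(index_settings(shards=3, replicas=1), indent=2))
```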
The computing resources of KES are mainly consumed by writes and queries. The complexity and proportions of writes and queries vary in different scenarios. Therefore, it is more difficult to estimate computing resources than to estimate storage resources. We recommend that you first estimate the amount of storage resources and then preliminarily select computing resources. You can determine whether the computing resources are sufficient during testing.
We recommend that you start with at least three nodes to avoid split-brain and to ensure that the cluster can tolerate node failures.
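The three-node recommendation follows from majority voting. The sketch below assumes that a cluster needs a strict majority of master-eligible nodes to elect a master, which is how Elasticsearch guards against split-brain. The function names are illustrative.

```python
def master_quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """Nodes that can fail while a majority can still be formed."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(nodes, master_quorum(nodes), tolerated_failures(nodes))
```

Under this assumption, a two-node cluster cannot lose any node without losing its majority, while a three-node cluster can lose one.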
After you preliminarily select the instance type, test it with real data. You can determine whether the instance type is appropriate by observing monitoring metrics such as the CPU usage, write performance, QPS, and the number of rejected writes or queries. In addition, we recommend that you configure alarms for these metrics so that you can identify resource shortages promptly in production.
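One way to check for rejected writes or queries during testing is to read the thread pool counters from the Elasticsearch node stats API (`GET /_nodes/stats/thread_pool`). The sketch below assumes that KES exposes this API. It works on an already parsed response, and the sample data is made up for illustration. Pool names depend on the Elasticsearch version (older versions use names such as `bulk` instead of `write`).

```python
def rejected_counts(node_stats, pools=("write", "search")):
    """Sum the 'rejected' counters per thread pool across all nodes.

    node_stats is the parsed JSON body of a node stats response.
    """
    totals = {pool: 0 for pool in pools}
    for node in node_stats.get("nodes", {}).values():
        thread_pool = node.get("thread_pool", {})
        for pool in pools:
            totals[pool] += thread_pool.get(pool, {}).get("rejected", 0)
    return totals


if __name__ == "__main__":
    # Illustrative sample, not real monitoring data
    sample = {
        "nodes": {
            "node-a": {"thread_pool": {"write": {"rejected": 3},
                                       "search": {"rejected": 0}}},
            "node-b": {"thread_pool": {"write": {"rejected": 2},
                                       "search": {"rejected": 1}}},
        }
    }
    print(rejected_counts(sample))
```

A counter that keeps growing under test load suggests that the selected computing resources are insufficient for that workload.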