Step 1: Estimate the specifications

Last updated: 2023-07-11 14:20:02

Estimate the storage capacity

The disk capacity that a Kingsoft Cloud Elasticsearch Service (KES) cluster needs depends on the following factors:

  • Number of replicas: The default and recommended number of replicas is 1. If your scenario can tolerate unexpected data loss, you can set the number of replicas to 0.
  • Index overheads: In addition to the raw data, KES stores indexes and columnar data. As a result, data stored in KES usually takes about 10% more space than the source data. This estimate does not account for fields such as _all.
  • Internal task overheads: Segment merging, KES translogs, and other logs occupy about 20% of the disk space.
  • Space reserved by the operating system: By default, Linux reserves 5% of the disk space for the root user. This space is used by critical processes, for system recovery, and to limit disk fragmentation.
  • Safety threshold: At least 20% of the disk space must be kept free.

Combining the preceding factors gives the following formula for the minimum disk space:

Disk space = Source data size × (1 + Number of replicas) × (1 + Index overheads) / (1 - Linux reserved space) / (1 - Internal task overheads) / (1 - Safety threshold)
           = Source data size × (1 + Number of replicas) × 1.1 / 0.95 / 0.8 / 0.8
           ≈ Source data size × (1 + Number of replicas) × 1.8

With the default of 1 replica, the minimum disk space is therefore about 3.6 times the source data size.
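The storage formula can be sketched as a small Python helper. The function name and parameter names are hypothetical (this is not part of any KES tool); the default percentages are the ones listed above, and `replicas` defaults to the recommended value of 1.

```python
def estimate_disk_space_gb(
    source_data_gb: float,
    replicas: int = 1,
    index_overhead: float = 0.10,
    linux_reserved: float = 0.05,
    internal_task_overhead: float = 0.20,
    safety_threshold: float = 0.20,
) -> float:
    """Estimate the minimum disk space (GB) needed for source_data_gb of data."""
    if source_data_gb < 0 or replicas < 0:
        raise ValueError("data size and replica count must be non-negative")
    # One primary copy plus the replicas, each inflated by index overheads.
    stored_gb = source_data_gb * (1 + replicas) * (1 + index_overhead)
    # Each reserved fraction shrinks the usable disk, so divide by the share that remains.
    return (
        stored_gb
        / (1 - linux_reserved)
        / (1 - internal_task_overhead)
        / (1 - safety_threshold)
    )


# 100 GB of source data with the default single replica, then with no replica.
print(round(estimate_disk_space_gb(100), 1))              # 361.8
print(round(estimate_disk_space_gb(100, replicas=0), 1))  # 180.9
```

The unrounded factor per data copy is about 1.81 (1.1 / 0.608), which is why 100 GB of source data with one replica needs slightly more than 360 GB of disk.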

Estimate the number of shards

The size and number of shards greatly affect the stability and performance of a KES cluster, so plan the shards of every index carefully. By default, each index is created with five shards.

  • Keep each shard no larger than 50 GB. We recommend a shard size of 10 to 50 GB.
  • Preferably, the total number of shards (primary shards plus replica shards) equals the number of nodes or is an integer multiple of it.
  • Too many shards make the cluster state hard to manage. We recommend at most 20 shards per 1 GB of memory on each instance in a KES cluster. For example, an instance with 10 GB of memory should hold no more than 200 shards.
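The three shard rules can be combined into a rough planning helper. This is a minimal sketch under stated assumptions: `plan_shards` and `max_shards_per_instance` are hypothetical names, `index_size_gb` is assumed to be the size of one copy of the index, and the helper simply adds primary shards until the total shard count is a multiple of the node count.

```python
import math
from typing import Tuple


def plan_shards(
    index_size_gb: float,
    nodes: int,
    replicas: int = 1,
    max_shard_gb: float = 50.0,
) -> Tuple[int, int]:
    """Return (primary_shards, total_shards) for one index."""
    if nodes < 1 or replicas < 0 or index_size_gb < 0:
        raise ValueError("invalid sizing input")
    # Enough primary shards that none exceeds max_shard_gb.
    primaries = max(1, math.ceil(index_size_gb / max_shard_gb))
    # Add primaries until primaries plus replicas is a multiple of the
    # node count, so shards can be spread evenly across nodes.
    while (primaries * (1 + replicas)) % nodes != 0:
        primaries += 1
    return primaries, primaries * (1 + replicas)


def max_shards_per_instance(memory_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on the shards one instance should hold (20 per GB of memory)."""
    return int(memory_gb * shards_per_gb)


# A 400 GB index on a 3-node cluster with 1 replica.
primaries, total = plan_shards(400, nodes=3)
print(primaries, total, round(400 / primaries, 1))  # 9 18 44.4
print(max_shards_per_instance(10))                  # 200
```

The per-instance limit applies to all shards an instance holds across every index, not just one index. For small indexes, rounding up to a multiple of the node count can push shards below the recommended 10 GB; in that case you have to weigh even distribution against shard size.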

Estimate the cluster specifications

The computing resources of KES are consumed mainly by writes and queries. Because the complexity and ratio of writes to queries vary across business scenarios, computing resources are harder to estimate than storage resources. We recommend that you estimate storage first, make a preliminary choice of computing resources, and then verify during testing whether they are sufficient.

We recommend a cluster of at least three nodes to avoid split-brain and to improve the fault tolerance of the KES nodes.
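The reason for three nodes can be illustrated with majority-quorum arithmetic. In open-source Elasticsearch, a master is elected only when a majority (quorum) of master-eligible nodes agree, which prevents two halves of a partitioned cluster from each electing their own master. The sketch below assumes KES follows the same rule and that every node is master-eligible; both are assumptions, and the function names are hypothetical.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of master-eligible nodes needed to elect a master."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """Nodes that can fail while the survivors still form a majority."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in range(1, 6):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Two nodes tolerate no node failure, while three nodes tolerate one, so three is the smallest cluster that keeps electing a master after losing a node. A fourth node adds capacity but not extra master-election fault tolerance.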

After you make a preliminary choice of instance type, test it with real business data. You can judge whether the instance type is appropriate by observing monitoring metrics such as CPU usage, write performance, QPS, and rejected writes or queries. We also recommend configuring alarms on these metrics so that resource shortages are detected as early as possible in production.

Suggestions on model selection

  • The Local SSD model is cost-effective and suits customers that need a small storage capacity and high performance.
  • The Elastic Block Storage (EBS) model suits customers that need a large storage capacity, stable performance, and high data availability, such as for processing audit logs and financial data. The EBS model costs more than the Local SSD model.
  • The Elastic Physical Compute (EPC) model suits customers with large-scale businesses that need maximum performance.