Load testing Shared Elasticsearch¶
One of the prerequisites for implementing Shared Elasticsearch (ES) in Grove was to confirm that it works for OpenCraft's use case, namely: "Can we run a large number of instances while limiting resource usage?"
Prepping the test¶
To carry out the load tests, we used both Locust and k6. Initially we relied on Locust, but we had trouble generating a significant load on Elasticsearch with it: Locust required multiple workers, and despite our efforts the results were unreliable. We therefore switched to k6 for better performance and consistency.
The Locust results are not included here for this reason.
Secondly, we needed some data to query, which the Shakespeare dataset provided. Importing the data is fairly simple, using the bulk import endpoint on any of the ES nodes.
curl -XPOST --insecure \
-u elastic:$ELASTIC_PASSWORD \
-H "Content-type: application/json" \
--data-binary @shakespeare.json \
"https://localhost:9200/shakespeare/_bulk?pretty"
Running the load test¶
The test consisted of 2000 users running a search query against our imported shakespeare index. Each test ran for 1 minute in total, and we compared the number of successful requests handled by our ES cluster vs the built-in ES deployment provided by Tutor.
Our DigitalOcean Kubernetes cluster consisted of three nodes, each with 4 CPU cores and 8 GB of memory.
$ kubectl exec -it -nelasticsearch ubuntu-shell -- \
k6 run /cluster.js --vus 2000 --duration 1m
// cluster.js — load test against the shared ES cluster (HTTPS + basic auth).
import http from "k6/http";
import encoding from "k6/encoding";

const username = "elastic";
const password = ""; // set this to the value of $ELASTIC_PASSWORD
const host = "https://elasticsearch-master.elasticsearch.svc.cluster.local:9200";

// The cluster uses a self-signed certificate, so skip TLS verification.
export const options = {
  insecureSkipTLSVerify: true,
};

// Per-request parameters: basic auth header for the elastic user.
const encodedCredentials = encoding.b64encode(`${username}:${password}`);
const params = {
  headers: {
    Authorization: `Basic ${encodedCredentials}`,
  },
};

export default function () {
  // Query a random window of the shakespeare index so requests differ.
  const maxRecords = 10000;
  const start = Math.floor(Math.random() * maxRecords);
  let size = Math.floor(Math.random() * 500);
  if (start + size > maxRecords) {
    size = maxRecords - start;
  }
  const url = `${host}/shakespeare/_search?from=${start}&size=${size}`;
  return http.get(url, params);
}
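For the comparison run against the default Tutor deployment, an equivalent invocation can be used with the script below. The namespace, pod, and script path in this sketch are illustrative; whichever pod runs k6 must be able to resolve the instance's elasticsearch service:
$ kubectl exec -it -n<instance-namespace> <k6-pod> -- \
    k6 run /tutor.js --vus 2000 --duration 1m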
// Load test against the default ES deployment provided by Tutor (plain HTTP, no auth).
import http from "k6/http";

const host = "http://elasticsearch:9200";

export default function () {
  // Same random-window query as the cluster script, minus TLS and auth.
  const maxRecords = 10000;
  const start = Math.floor(Math.random() * maxRecords);
  let size = Math.floor(Math.random() * 500);
  if (start + size > maxRecords) {
    size = maxRecords - start;
  }
  const url = `${host}/shakespeare/_search?from=${start}&size=${size}`;
  return http.get(url);
}
$ kubectl run ubuntu-shell -nelasticsearch --image ubuntu -- sleep 365d
$ kubectl exec -it -nelasticsearch ubuntu-shell -- bash
# apt update && \
apt install ca-certificates gpg -y && \
mkdir -p ~/.gnupg && \
gpg --no-default-keyring \
--keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 \
--recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69 && \
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] \
https://dl.k6.io/deb stable main" | \
tee /etc/apt/sources.list.d/k6.list &&
apt-get update && \
apt-get install -y k6
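Before the k6 run commands above will work, the scripts also need to be copied into the pod. Assuming they are saved locally as cluster.js (and similarly for the Tutor variant), something along these lines does the job, and k6 version confirms the install:
$ kubectl cp cluster.js elasticsearch/ubuntu-shell:/cluster.js
$ kubectl exec -it -nelasticsearch ubuntu-shell -- k6 version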
Results¶
Let's start with the results. The first block below is the shared ES cluster; the second is the default Tutor deployment.
data_received..................: 657 MB 10 MB/s
data_sent......................: 3.5 MB 55 kB/s
http_req_blocked...............: avg=745.88ms min=0s med=3.54µs max=57.67s p(90)=16.43µs p(95)=3.13s
http_req_connecting............: avg=39.85ms min=0s med=0s max=1.17s p(90)=80.71ms p(95)=107.98ms
http_req_duration..............: avg=4.13s min=0s med=3.18s max=58.5s p(90)=7.06s p(95)=8.86s
{ expected_response:true }...: avg=4.45s min=190.83ms med=3.38s max=58.5s p(90)=7.31s p(95)=9.1s
http_req_failed................: 9.37% ✓ 1095 ✗ 10584
http_req_receiving.............: avg=10.05ms min=0s med=673.28µs max=4.59s p(90)=3.4ms p(95)=6.49ms
http_req_sending...............: avg=34.87µs min=0s med=18.64µs max=31.36ms p(90)=52.34µs p(95)=77.01µs
http_req_tls_handshaking.......: avg=709.98ms min=0s med=0s max=57.29s p(90)=0s p(95)=2.72s
http_req_waiting...............: avg=4.12s min=0s med=3.18s max=58.5s p(90)=7.04s p(95)=8.86s
http_reqs......................: 11679 184.290917/s
iteration_duration.............: avg=10.37s min=203.28ms med=3.77s max=1m0s p(90)=45.75s p(95)=1m0s
iterations.....................: 11679 184.290917/s
vus............................: 22 min=22 max=2000
vus_max........................: 2000 min=2000 max=2000
data_received..................: 2.4 GB 40 MB/s
data_sent......................: 4.9 MB 81 kB/s
http_req_blocked...............: avg=18.31ms min=841ns med=1.84µs max=585.46ms p(90)=4.23µs p(95)=90.09µs
http_req_connecting............: avg=1.06ms min=0s med=0s max=166.73ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.94s min=197.98ms med=530.51ms max=59.99s p(90)=966.3ms p(95)=1.27s
{ expected_response:true }...: avg=691.35ms min=197.98ms med=525.98ms max=59.23s p(90)=909.15ms p(95)=991.66ms
http_req_failed................: 3.83% ✓ 1551 ✗ 38909
http_req_receiving.............: avg=1.15ms min=0s med=74.72µs max=1.5s p(90)=166.92µs p(95)=622.84µs
http_req_sending...............: avg=172.06µs min=4.56µs med=8.73µs max=64.83ms p(90)=22.6µs p(95)=157.4µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.94s min=197.92ms med=529.85ms max=59.99s p(90)=964.77ms p(95)=1.26s
http_reqs......................: 40460 668.607358/s
iteration_duration.............: avg=2.96s min=198.08ms med=531.05ms max=1m0s p(90)=966.76ms p(95)=1.39s
iterations.....................: 40460 668.607358/s
vus............................: 1000 min=1000 max=2000
vus_max........................: 2000 min=2000 max=2000
Of interest is the number of successful requests: roughly 39k (38,909) for the default deployment vs roughly 10.5k (10,584) for the cluster.
The default deployment offers significantly greater throughput than the clustered version. The primary reason for this discrepancy is the restrictive resource limits imposed by the Helm chart; SSL overhead and internal cluster communication further reduce the clustered version's throughput.
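One way to sanity-check that the chart's limits are the bottleneck is to watch pod resource usage while a test is running. Assuming the metrics server is installed and the pods follow the Elastic Helm chart's elasticsearch-master naming, something like this shows whether the ES pods are pinned at their CPU limit:
$ kubectl top pods -nelasticsearch
$ kubectl describe pod -nelasticsearch elasticsearch-master-0 | grep -A 3 Limits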
Improving the cluster's performance¶
Since the Shared Elasticsearch cluster needs to handle multiple instances, its performance needs to be at least on par with a single node.
In short, we changed the following settings from their defaults.
resources:
  limits:
    cpu: "2000m"
    memory: "4Gi"
threadpool:
  search:
    size: 5000
Both the CPU and memory limits are doubled; we recommend doubling them again if you run a larger instance. The search thread pool is also increased to 5000 from the default of 1000, which allows the cluster to handle more users at once (at the expense of memory).
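As a rough sketch of how to apply these overrides, assuming the cluster is deployed from the official elastic/elasticsearch Helm chart with a release named elasticsearch and the settings above saved in a values.yaml file (adjust the release name and namespace to your setup):
$ helm upgrade elasticsearch elastic/elasticsearch \
    -n elasticsearch \
    -f values.yaml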
The load test for the cluster now looks like this:
data_received..................: 1.4 GB 22 MB/s
data_sent......................: 6.4 MB 102 kB/s
http_req_blocked...............: avg=233.87ms min=0s med=3.7µs max=59.38s p(90)=7.09µs p(95)=153.51µs
http_req_connecting............: avg=26.13ms min=0s med=0s max=1.22s p(90)=0s p(95)=92.61ms
http_req_duration..............: avg=2.73s min=0s med=2.03s max=58.63s p(90)=4.37s p(95)=5.59s
{ expected_response:true }...: avg=2.81s min=14.13ms med=2.09s max=58.4s p(90)=4.41s p(95)=5.67s
http_req_failed................: 4.09% ✓ 923 ✗ 21618
http_req_receiving.............: avg=7.56ms min=0s med=860.68µs max=3.31s p(90)=3.94ms p(95)=6.32ms
http_req_sending...............: avg=31.44µs min=0s med=19.12µs max=4.05ms p(90)=47.34µs p(95)=80.27µs
http_req_tls_handshaking.......: avg=207.73ms min=0s med=0s max=58.89s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.73s min=0s med=2.02s max=58.63s p(90)=4.36s p(95)=5.59s
http_reqs......................: 22541 361.528155/s
iteration_duration.............: avg=5.38s min=14.45ms med=2.17s max=1m0s p(90)=6.61s p(95)=33.83s
iterations.....................: 22541 361.528155/s
vus............................: 28 min=28 max=2000
vus_max........................: 2000 min=2000 max=2000
Conclusion¶
The most surprising outcome of these tests is that simply adding more nodes to an ES cluster doesn't, by itself, increase throughput.
- Allowing the cluster to use more CPU had the greatest effect: successful requests roughly doubled (from 10,584 to 21,618).
- Running with only a single replica is risky, but not out of the question.
- A cluster with more than 3 nodes will likely be overkill.
References and extra reading¶
- https://logz.io/blog/elasticsearch-performance-tuning/
- https://www.instaclustr.com/blog/understanding-and-configuring-elasticsearch-node-types/
- https://blog.opstree.com/2019/10/01/tuning-of-elasticsearch-cluster/
- https://luis-sena.medium.com/the-complete-guide-to-increase-your-elasticsearch-write-throughput-e3da4c1f9e92