Kubernetes Performance Testing How-to

Catching Little Issues Before They Become Big Problems when Running Kubernetes at Scale

Our customer Yahoo! Japan America (YJA) asked us to help them create a procedure for measuring Kubernetes performance. Having collaborated with them to build a large installation, we saw verifying its performance as the next logical step. One obvious benefit is protecting our mutual investment. Another is having a procedure that ensures updates do not degrade performance.

The intent of this document is to describe how to run an end-to-end performance test in a Kubernetes environment using the Kubemark utility.[1]

Service Level Objectives (SLOs)

The Kubernetes service level objectives (SLOs) are defined as follows:

API responsiveness

99% of all API calls return in less than 1s

Pod startup time

99% of Pods (with pre-pulled images) start within 5s
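As a rough illustration of how these two objectives can be checked against raw measurements, the following Python sketch computes a nearest-rank 99th percentile and compares it with the thresholds above. The function names and sample values are hypothetical and are not part of Kubemark or the Kubernetes test suite.

```python
import math

# SLO thresholds taken from the objectives above, in seconds.
API_CALL_SLO_SECONDS = 1.0
POD_STARTUP_SLO_SECONDS = 5.0


def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples_seconds, threshold_seconds, pct=99):
    """True if the pct-th percentile of the samples is below the threshold."""
    return percentile(samples_seconds, pct) < threshold_seconds


# Hypothetical measurements, for illustration only.
api_latencies = [0.02] * 99 + [1.5]   # one slow call out of 100
pod_startups = [1.0, 2.0, 2.1, 6.0]   # one slow pod out of 4

print(meets_slo(api_latencies, API_CALL_SLO_SECONDS))
print(meets_slo(pod_startups, POD_STARTUP_SLO_SECONDS))
```

With only four pod samples the 99th percentile is simply the slowest pod, so a single outlier fails the check. Larger sample sizes make the percentile more meaningful.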

The Tests

The tests primarily measure API call latency and pod startup times (see more about API testing here). The API calls measured are permutations of the verbs executed against the resources listed in the following table. That is to say, each verb on the left is tried with each resource on the right.
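The verb-by-resource pairing can be sketched in Python as a Cartesian product. The verbs and resources below are only the ones named elsewhere in this document, not the full set the tests exercise, and the variable names are hypothetical.

```python
from itertools import product

# A partial, illustrative selection; substitute the verbs and resources
# that your own test run actually exercises.
verbs = ["GET", "PUT", "POST"]
resources = ["clusterrolebindings", "endpoints"]

# Each verb is paired with each resource: len(verbs) * len(resources) cases.
combinations = [f"{verb}_{resource}" for verb, resource in product(verbs, resources)]

for name in combinations:
    print(name)
```

The `<verb>_<resource>` naming mirrors the labels used later when the results are exported to CSV.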






















Starting The Test Environment

  1. Get a copy of the Kubernetes source

git clone ./kubernetes

  2. Run the compilation process[2]

cd ./kubernetes

unset CDPATH[3]

make quick-release

  3. (optional) Create a Kubernetes cluster

If you do not have an existing cluster, or you would rather use an alternative, you can start a 10-node cluster on GCE with the following script[4]:

export NUM_NODES=10


  4. Start the Kubemark testing harness.


Running The Tests

Run the tests with the following commands. This will execute an end-to-end test with a focus on performance. The total testing time should be about 5 minutes. NOTE: It is assumed the cluster is in a newly-deployed functional state.

./test/kubemark/ \
  "--ginkgo.focus=\[Feature:Performance\]" \
  --gather-metrics-at-teardown=true \
  --output-print-type=json \


Taking Down The Test Environment

After completing the tests, stop the Kubemark test environment by running the following command:


Reviewing Results

API Latency

The test will generate a JSON file with “load” in its name, such as MetricsForE2E_load_2017-05-05T13:52:59-05:00.json (see Appendix A for an overview of the JSON output).

For brevity, we focus only on the 99% quantile and its analysis. To extract the 99% quantile from the test results, run the following command (see Appendix B for a breakdown of how the JSON data was transformed for reporting):

cat MetricsForE2E_load_2017-05-05T13:52:59-05:00.json | \
jq -r -c '.ApiServerMetrics.apiserver_request_latencies_summary[] |
  select(.metric.quantile == "0.99") |
  [ (.metric.verb + "_" + .metric.resource), (.value[1] | tonumber/1000) ] |
  @csv' > 99thquantile.csv

This will yield a file called 99thquantile.csv containing summary data. Here is a sampling of these results in graphical form (below). The graph indicates, for example, that 99% of the time, POST events took about 13 milliseconds or less.[5]
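For readers who prefer a script to a jq one-liner, here is a minimal Python sketch of the same extraction. It also flags any entry that exceeds the 1-second API objective. The function name and the embedded sample document are assumptions made for illustration, and the sample numbers are not real test results.

```python
import json

API_CALL_SLO_MS = 1000.0  # the 1 s API-call objective, in milliseconds


def p99_latencies_ms(metrics):
    """Map "<verb>_<resource>" to its 0.99-quantile latency in milliseconds."""
    results = {}
    for entry in metrics["ApiServerMetrics"]["apiserver_request_latencies_summary"]:
        labels = entry["metric"]
        if labels.get("quantile") != "0.99":
            continue
        key = f'{labels["verb"]}_{labels["resource"]}'
        # value[1] holds the latency in microseconds.
        results[key] = float(entry["value"][1]) / 1000
    return results


# Tiny hand-made document for illustration; the numbers are not real results.
sample = json.loads("""
{"ApiServerMetrics": {"apiserver_request_latencies_summary": [
  {"metric": {"quantile": "0.5", "resource": "pods", "verb": "POST"}, "value": [0, "4000"]},
  {"metric": {"quantile": "0.99", "resource": "pods", "verb": "POST"}, "value": [0, "13000"]},
  {"metric": {"quantile": "0.99", "resource": "endpoints", "verb": "PUT"}, "value": [0, "1200000"]}
]}}
""")

latencies = p99_latencies_ms(sample)
violations = {name: ms for name, ms in latencies.items() if ms >= API_CALL_SLO_MS}
print(latencies)
print(violations)
```

To run it against a real result file, load the file with `json.load` and pass the resulting dictionary to `p99_latencies_ms` in place of `sample`.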

Pod Startup Time

The pod startup times are recorded in a JSON file with a name similar to PodStartupLatency_density_2017-05-14T19:51:19-05:00.json.


To extract the pod startup times, run the following command:

cat PodStartupLatency_density_2017-05-14T19:51:19-05:00.json

The output will be something similar to this:

May 7 17:34:06.088: INFO: Pod startup latency: {
  "latency": {
    "Perc50": 991916170,
    "Perc90": 1996352483,
    "Perc99": 2082791907,
    "Perc100": 2082791907
  }
}

The values are in nanoseconds, so the “Perc99” and “Perc100” values above equate to approximately 2.08 seconds. Every pod started in roughly 2.1 seconds or less, well within the 5-second SLO.
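The unit conversion and the SLO comparison can be sketched in a few lines of Python. The percentile values are copied from the sample output above; the variable names are hypothetical.

```python
POD_STARTUP_SLO_SECONDS = 5.0
NANOSECONDS_PER_SECOND = 1_000_000_000

# Percentile values copied from the sample output above (nanoseconds).
latency_ns = {
    "Perc50": 991916170,
    "Perc90": 1996352483,
    "Perc99": 2082791907,
    "Perc100": 2082791907,
}

latency_s = {name: ns / NANOSECONDS_PER_SECOND for name, ns in latency_ns.items()}

for name, seconds in latency_s.items():
    print(f"{name}: {seconds:.3f} s")

# The pod startup SLO is stated for the 99th percentile.
slo_met = latency_s["Perc99"] <= POD_STARTUP_SLO_SECONDS
print("SLO met" if slo_met else "SLO missed")
```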

Further Reading

The Kubernetes blog posts here, here and here are a valuable resource. They explain in great detail how performance and capacity results are gathered.

For benchmarking purposes, you can compare your own results against the Kubernetes daily scalability regression results found here.

Appendix A – API Latency JSON Data Format

Output in the “load” JSON file generated by /test/kubemark/ is formatted like this.

JSON Data from MetricsForE2E_load_* file

"ApiServerMetrics": {
  "apiserver_request_latencies_summary": [
    {
      "metric": {
        "__name__": "apiserver_request_latencies_summary",
        "quantile": "0.5",
        "resource": "clusterrolebindings",
        "verb": "GET"
      },
      "value": [ ... ]
    },
    {
      "metric": {
        "__name__": "apiserver_request_latencies_summary",
        "quantile": "0.9",
        "resource": "clusterrolebindings",
        "verb": "GET"
      },
      "value": [ ... ]
    },
    {
      "metric": {
        "__name__": "apiserver_request_latencies_summary",
        "quantile": "0.99",
        "resource": "clusterrolebindings",
        "verb": "GET"
      },
      "value": [ ... ]
    },
    ...

For all tests, summaries are broken down into three types of result: the 50% quantile, the 90% quantile, and the 99% quantile. Each quantile value indicates the percentage of operations that took less than the amount of time (measured in microseconds) given in the value field. For example, the 0.5 quantile entry (above) indicates that “GET clusterrolebindings” test operations took less than 0.852 milliseconds in 50% of tests.
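To see all three quantiles for an operation side by side, the entries can be grouped by verb and resource. The sketch below assumes each entry has the "metric" and "value" fields described above, with the latency in `value[1]`. The 0.5 value (852 microseconds) matches the example in the text; the timestamps and the 0.9 and 0.99 values are made up, and the helper names are hypothetical.

```python
from collections import defaultdict


def quantile_table(summary):
    """Group summary entries as {(verb, resource): {quantile: milliseconds}}."""
    table = defaultdict(dict)
    for entry in summary:
        labels = entry["metric"]
        key = (labels["verb"], labels["resource"])
        # value[1] is the latency in microseconds; convert to milliseconds.
        table[key][labels["quantile"]] = float(entry["value"][1]) / 1000
    return dict(table)


def make_entry(quantile, microseconds):
    """Build one illustrative summary entry (timestamp set to 0)."""
    return {
        "metric": {
            "__name__": "apiserver_request_latencies_summary",
            "quantile": quantile,
            "resource": "clusterrolebindings",
            "verb": "GET",
        },
        "value": [0, str(microseconds)],
    }


summary = [make_entry("0.5", 852), make_entry("0.9", 1500), make_entry("0.99", 3000)]
table = quantile_table(summary)

for (verb, resource), quantiles in table.items():
    print(verb, resource, quantiles)
```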

Appendix B – Extracting JSON Data

To extract these findings in a usable form, the ‘jq’ utility was used. Here are the steps broken down:

1. Select the apiserver_request_latencies_summary array and iterate over its entries.

jq -r -c '.ApiServerMetrics.apiserver_request_latencies_summary[]

2. Pipe and select only the 0.99 quantile results (again, this is for brevity).

| select(.metric.quantile == "0.99")

3. Pipe and combine the verb and resource names with the number of milliseconds (converted from microseconds).

| [ (.metric.verb + "_" + .metric.resource), (.value[1] | tonumber/1000) ]

4. Pipe and convert the output to CSV, then redirect it to a file.

| @csv' > 99thquantile.csv

Data in the output file (99thquantile.csv) is now in the form:

<verb>_<resource>, <99% quantile latency in milliseconds>

For example:

"PUT_endpoints",5.631

This is read as: 99% of the time, the PUT endpoints action took less than 5.631 milliseconds.
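Once the CSV exists, a short script can rank the slowest operations. The following Python sketch assumes the two-column layout described above. Only the PUT_endpoints value comes from the text; the other rows and the function name are invented for illustration.

```python
import csv
import io


def slowest(csv_text, count=3):
    """Return the `count` rows with the highest latency, slowest first."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [(row[0], float(row[1])) for row in reader if row]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


# Hand-written sample in the same shape as 99thquantile.csv.
sample_csv = '"PUT_endpoints",5.631\n"POST_pods",13.2\n"GET_nodes",0.9\n'

for name, ms in slowest(sample_csv, count=2):
    print(f"{name}: {ms} ms")
```

To use it on a real run, read the contents of 99thquantile.csv into a string and pass that string in place of `sample_csv`.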

Appendix C – The preparation environment

All tests were run in two separate environments, Mac OS X and CentOS. The following is a toolchain list for each.

Mac OS X

Google Cloud:

Google Cloud SDK 153.0.0

alpha 2017.03.24

beta 2017.03.24

bq 2.0.24

core 2017.04.24


gsutil 4.25


zsh 5.3.1 (x86_64-apple-darwin16.3.0)

git version 2.12.2

GNU Make 3.81


ccze 0.2.1

go version go1.8.1 darwin/amd64


vagrant image used:

vagrant image version: 1703.01

VirtualBox 5.1.22r115126

Vagrantfile used to create CentOS instance

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"

  config.vm.provider "virtualbox" do |vb|
    # Customize the amount of memory on the VM:
    vb.memory = "8192"
  end

  # Install the build prerequisites and start Docker
  config.vm.provision "shell", inline: <<-SHELL
    yum install -y git docker wget
    sudo systemctl start docker
  SHELL
end


CentOS Linux release 7.3.1611 (Core)

Google Cloud:

Google Cloud SDK 155.0.0

alpha 2017.05.10

beta 2017.05.10

bq 2.0.24

core 2017.05.10


gsutil 4.26


bash 4.2.46(1)-release

GNU Make 3.82


vagrant 1.9.4

[1] See Appendix C for a listing of the complete toolset used to create this document.

[2] Before running this step, ensure you have Docker running with a minimum of 4GB of memory. If you are compiling on a CentOS VM, make sure it has at least 8GB of RAM.

[3] One of the scripts is confused by CDPATH, so unset this environment variable before running your build.

[4] The number of nodes is set to 10 because Kubemark defaults to a 10-node configuration.

[5] There will also be a special verb listed called “WATCH”. This is used to gather statistics for the other verbs. It can be disregarded.

Author: Tennis Smith, Senior DevOps Engineer, Solinea