Issue
I am having a hard time identifying the underlying issue behind the following latency pattern in the max percentile of my application:
[Gatling latency chart: https://i.stack.imgur.com/ooxwN.png]
This is a Gatling chart showing 4 minutes of load testing. The first two minutes are a warmup of the same scenario (that's why it has no latency graph).
Two triangles (sometimes more) with a nearly identical slope are clearly visible and reproducible across multiple test runs, no matter how many application instances we deploy behind our load balancer.
I am looking for more paths to investigate, as I have a hard time googling for this pattern. It strikes me as particularly odd that the triangle is not "filled" but consists only of spikes. Furthermore, the triangle feels "inverted": if this were a scenario with ever-increasing load (which it isn't), I would expect this kind of triangle to manifest with the opposite slope - the slope here just doesn't make any sense to me.
Technical context:
- This is for a Spring Boot application with a PostgreSQL database in AWS
- There are 6 pods deployed in our Kubernetes cluster, auto-scaling was disabled for this test
- Keep-alive is used by our Gatling test (see the solution below - it turns out this was not actually the case)
- The Kubernetes ingress configuration is left as-is, which implies keep-alive to each upstream if I read the defaults correctly
- Neither the database nor the per-pod CPU is maxed out
- The network uplink of our load testing machine is not maxed out and the machine does nothing else besides running the load test
- The load (requests/sec) on the application is nearly constant and does not change after the warmup or during the measurement
- Garbage collection activity is low
Here is another image demonstrating the "triangle" before we made some application-side optimizations to request latency:
Solution
This turned out to be a two-part issue:
- We thought our load test was using keep-alive connections, but it wasn't (TLS handshakes are expensive, and ephemeral ports run out after some time); see the first sketch below
- A custom priority-based task scheduling system (an earlier request and its subtasks have higher priority than later requests) "lost" its task priority because of how Kotlin coroutines work: thread A gets suspended during a coroutine and another thread picks up the remaining work later, losing any ThreadLocal-based priority. This can be fixed via asContextElement(); see the second sketch below
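For the first point, here is a minimal sketch of a Gatling simulation using the Java DSL from Kotlin (Gatling 3.7+). The class name, base URL and endpoint are hypothetical placeholders; the relevant piece is shareConnections(), which pools connections across virtual users instead of giving each new virtual user its own:

```kotlin
import java.time.Duration

import io.gatling.javaapi.core.CoreDsl.*
import io.gatling.javaapi.core.Simulation
import io.gatling.javaapi.http.HttpDsl.*

class KeepAliveSimulation : Simulation() {

    // By default every virtual user opens its own connections, so a steady
    // stream of fresh users means constant TCP/TLS handshakes and ephemeral
    // port churn even with keep-alive enabled per user. shareConnections()
    // reuses a global connection pool across all virtual users instead.
    private val httpProtocol = http
        .baseUrl("https://my-service.example.com") // hypothetical
        .shareConnections()

    private val scn = scenario("steady load")
        .exec(http("get resource").get("/api/resource")) // hypothetical endpoint

    init {
        setUp(
            scn.injectOpen(constantUsersPerSec(100.0).during(Duration.ofMinutes(4)))
        ).protocols(httpProtocol)
    }
}
```

Whether shared connections are appropriate depends on what you want to model: per-user connections are more realistic for browser traffic, while a shared pool better matches service-to-service callers that reuse connections.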
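For the second point, here is a minimal sketch of the ThreadLocal problem and the fix, assuming a hypothetical requestPriority ThreadLocal standing in for the scheduling system's priority. asContextElement() (from kotlinx-coroutines-core) attaches the ThreadLocal value to the coroutine context so it survives suspension and resumption on a different thread:

```kotlin
import kotlinx.coroutines.*

// Hypothetical per-request priority carried in a ThreadLocal, standing in
// for the custom priority-based scheduling system.
val requestPriority = ThreadLocal<Int>()

suspend fun handleRequest(priority: Int) {
    // asContextElement() binds the ThreadLocal value to the coroutine
    // context, so it is restored on whatever thread resumes the coroutine.
    withContext(requestPriority.asContextElement(value = priority)) {
        delay(10) // suspension point: we may resume on a different pool thread
        // Without asContextElement() this could read null or another
        // request's priority after resuming on a different thread.
        println("priority=${requestPriority.get()} on ${Thread.currentThread().name}")
    }
}

fun main() = runBlocking<Unit>(Dispatchers.Default) {
    // Two concurrent requests each keep their own priority intact.
    launch { handleRequest(priority = 1) }
    launch { handleRequest(priority = 2) }
}
```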
While this does not explain the peculiar shape of the latency pattern, fixing both issues resolved our main problems and the pattern is gone.
Answered By - roookeee