Issue
I have a Kubernetes deployment that deploys a Java application based on the anapsix/alpine-java image. There is nothing else running in the container except for the Java application and the container overhead.
I want to maximise the amount of memory the Java process can use inside the Docker container and minimise the amount of RAM that will be reserved but never used.
For example I have:
- Two Kubernetes nodes with 8 GiB of RAM each and no swap
- A Kubernetes deployment that runs a Java process consuming a maximum of 1 gig of heap to operate optimally
How can I safely maximise the number of pods running on the two nodes without Kubernetes ever terminating my pods because of memory limits?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-deployment
  template:
    metadata:
      labels:
        app: my-deployment
    spec:
      containers:
      - name: my-deployment
        image: myreg:5000/my-deployment:0.0.1-SNAPSHOT
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            memory: 1024Mi
          limits:
            memory: 1024Mi
Java 8 update 131+ has a flag -XX:+UseCGroupMemoryLimitForHeap (gated behind -XX:+UnlockExperimentalVMOptions) that makes the JVM respect the cgroup memory limit Docker applies from the Kubernetes deployment.
My Docker experiments illustrate what happens in Kubernetes.
If I run the following in Docker:
docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -version
I get:
VM settings:
Max. Heap Size (Estimated): 228.00M
This low value is because Java sets -XX:MaxRAMFraction to 4 by default, so the heap gets only about 1/4 of the available RAM.
If I run the same command with -XX:MaxRAMFraction=2 in Docker:
docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -XX:MaxRAMFraction=2 -version
I get:
VM settings:
Max. Heap Size (Estimated): 455.50M
Finally, setting MaxRAMFraction=1 quickly causes Kubernetes to kill my container.
docker run -m 1024m anapsix/alpine-java:8_server-jre_unlimited java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XshowSettings:vm -XX:MaxRAMFraction=1 -version
I get:
VM settings:
Max. Heap Size (Estimated): 910.50M
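Working backwards from the three measurements, the estimated heap is consistent with the JVM dividing roughly 911 MiB of usable memory by MaxRAMFraction (the 911 MiB figure is inferred from the numbers above, not a documented constant):

```shell
# Inferred from the measurements above: of the 1024 MiB cgroup limit the JVM
# treats roughly 911 MiB as usable and divides it by MaxRAMFraction.
usable_mib=911

for fraction in 4 2 1; do
  echo "MaxRAMFraction=$fraction -> ~$(( usable_mib / fraction )) MiB heap"
done
# ~227, ~455, ~911 MiB, matching the measured 228.00M, 455.50M and 910.50M
```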
Solution
The reason Kubernetes kills your pods is the resource limit. It is difficult to calculate correctly because of container overhead and the usual mismatches between decimal and binary prefixes when specifying memory. My solution is to drop the limit entirely and only keep the request (which is what your pod is guaranteed in any case once it is scheduled). Rely on the JVM to limit its heap via a static -Xmx setting, and let Kubernetes manage how many pods are scheduled on a node via the resource request.
First, determine the actual memory usage of your container when running with your desired heap size. Run a pod with -Xmx1024m -Xms1024m and connect to the Docker daemon of the host it is scheduled on. Run docker ps to find your pod and docker stats <container> to see its current memory usage, which is the sum of the JVM heap, other static JVM usage such as direct memory, and your container's overhead (Alpine with glibc). This value should only fluctuate within kibibytes because of some network usage that is handled outside the JVM. Add this value as the memory request in your pod template.
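A request-only resources section with a pinned heap could look like the following sketch (the command line, /app.jar path, and the 1200Mi figure are placeholders; substitute your own entrypoint and the value measured with docker stats):

```yaml
# Sketch: pin the JVM heap and set only a memory request, no limit.
spec:
  containers:
  - name: my-deployment
    image: myreg:5000/my-deployment:0.0.1-SNAPSHOT
    command: ["java", "-Xmx1024m", "-Xms1024m", "-jar", "/app.jar"]  # hypothetical entrypoint
    resources:
      requests:
        memory: 1200Mi   # placeholder: measured heap + JVM overhead + container overhead
```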
Calculate or estimate how much memory the other components on your nodes need to function properly. There will at least be the Kubernetes kubelet, the Linux kernel and its userland, probably an SSH daemon and, in your case, a Docker daemon running on them. You can choose a generous default like 1 GiB (excluding the kubelet) if you can spare the extra few bytes. Specify --system-reserved=1Gi and --kube-reserved=100Mi in your kubelet's flags and restart it. This adds those reserved resources to the scheduler's calculations when determining how many pods can run on a node. See the official Kubernetes documentation for more information.
This way there will probably be five to seven pods scheduled on a node with 8 GiB of RAM, depending on the values chosen and measured above. They will be guaranteed the RAM specified in the memory request and will not be terminated. Verify the memory usage via kubectl describe node under Allocated resources. As for elegance/flexibility: if you want to increase the RAM available to your application, you only need to adjust the memory request and the JVM heap size.
This approach only works as long as the pod's memory usage does not explode; if it were not bounded by the JVM, a rogue pod could cause eviction of other pods (see the Kubernetes documentation on out-of-resource handling).
Answered By - Simon Tesar
Answer Checked By - Timothy Miller (JavaFixing Admin)