Issue
I created a Target Group with two AWS Jenkins instances following the below documentation.
https://www.jenkins.io/doc/tutorials/tutorial-for-installing-jenkins-on-AWS/
Then I created an ALB and use the Target Group as the listener.
I used Amazon ACM and enable HTTPS on my ALB. I added a CNAME Record in the Route 53 for ALB DNS Name.
Now when I am trying to login using the CNAME I am observing the following scenario
If I have multiple EC2 instances in my TG and I keep on trying to login but it only succeeds after 3rd or 4th attempt. What is the reason for this ? How to debug this? Can I setup Cloudwatch logs at the ELB level to check this ?
If I have only one EC2 instance in my TG then the login always succeeds in first attempt.
If I login directly to each of the instances I can login to them always in first attempt.
Also if I enable stickiness at the TG level then even if I have both the EC2 instances in my TG then I can login with my first attempt using the CNAME ? Why do I have to enable stickiness and the impact of this ?
Is there way for me to know if I am deploying a web application(3rd party like Jenkins) if and when I need to enable stickiness and the side effect of doing this action ?
Thanks in advance.
Solution
Load balancers send traffic to one of the EC2 instances in the TG. The Jenkins responds with session tokens and cookies so your browser and the server are in sync.
When there is only 1 instance all the traffic is sent to it. When there are two or more instances then the traffic is sent to each of them in turn, typical round-robin behavior.
The problem is that the Jenkins Controller is not a clusterable resource. Basically Jenkins A has no idea that the other Jenkins exists.
So, what is happening is that the login request goes to Jenkins A, that responds with a session token, then the login redirect happens and your browser makes the request for the dashboard page and sends in the session token, this request gets sent to Jenkins B which promptly denies all knowledge of the session token and bounces you back to the login page.
The Advice from Jenkins is to have a main instance and a "warm" standby that is brought online when the main goes down.
If you are running a cluster in order to build more things then you probably need to run more Agents and connect them to the Controller so that they can be provisioned by an AutoScaling group and scaled up when needed and down when it is all quiet.
Answered By - edwardTew
Answer Checked By - Terry (JavaFixing Volunteer)