Issue
We have several lambda functions, and I've automated code deployment using the gradle-aws-plugin-reboot plugin.
It works great on all but one of the lambda functions. On that particular one, I'm getting this error:
com.amazonaws.services.lambda.model.ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:*redacted*:the-lambda-that-fails (Service: AWSLambda; Status Code: 409; Error Code: ResourceConflictException; Request ID: 8fef505a-587c-4e77-a257-182d6eecadd0; Proxy: null)
There's an additional caveat to that error, though: it only happens on Jenkins. Running the deployment task from my local machine works. I can kind of reproduce the issue locally by spamming deployments in rapid succession, in which case every second one fails, which is understandable given the error message.
Interestingly, though, while the local run fails with the same error, it does not fail at the same point as on Jenkins. Locally, it fails when deploying the environment; on Jenkins, it always fails when deploying the code. I'm not sure which one the plugin does first, though. It also doesn't quite always fail on Jenkins: there are rare instances when even the deployment of this lambda succeeds. None of the other lambdas has ever failed, though.
I am aware of the new lambda states feature, and that it can potentially produce this error. However, since all the other lambdas work, and they use the same code in both build.gradle and the Jenkinsfile, it seems rather unlikely that this would be my issue.
Here's what the deployment task in gradle looks like:
register<jp.classmethod.aws.reboot.gradle.lambda.AWSLambdaMigrateFunctionTask>("deploy") {
    // Create the environment variables from the gradle property configuration.
    // users and passwords should be stored in the system properties file, not the projects!
    val (environmentProperties, function) = if (branch == "master") {
        val webcamServicePutterProd: String by project
        val webcamServicePutterProdPwd: String by project
        mapLambdaProperties("deployProd_", webcamServicePutterProd, webcamServicePutterProdPwd) to
            "lambda-function-name-prod"
    } else {
        val webcamServicePutterDev: String by project
        val webcamServicePutterDevPwd: String by project
        mapLambdaProperties("deployDev_", webcamServicePutterDev, webcamServicePutterDevPwd) to
            "lambda-function-name-dev"
    }
    val jarFile = File("build/libs").walk().first { it.name.endsWith("-all.jar") }
    functionName = function
    zipFile = jarFile
    handler = "webcam.yellow.sqs.lambda.WebcamWorker::handleRequest"
    publish = true
    environment = environmentProperties
}
As mentioned, this is pretty much identical in all the lambdas, apart from properties, obviously. Properties can't really be the issue either though, since they're the same in my local environment and on jenkins.
The deployment execution in the Jenkinsfile is pretty unspectacular. It first uploads the jar to S3 for archival, then executes the gradle task for deploying the lambda. To be sure, I also tried without the S3 upload in case it had some obscure connection to the problem, but that didn't help either.
stage('Deploy artifact') {
    when {
        equals expected: 'true', actual: deployArtifact
    }
    steps {
        // archive build on S3
        withAWS() {
            s3Upload(
                workingDir: 'build/libs/',
                includePathPattern: '*-all.jar',
                bucket: 'yellow-artifacts',
                path: "webcam-worker-lambda/${artifactFolder}/"
            )
        }
        // deploy build to lambda
        sh './gradlew deploy'
    }
}
I've spent hours going over all the configurations of the different lambdas, comparing them, looking for differences that might be a source of the issue, but I'm pretty much out of ideas where the problem might be located by now. Anybody got any hunches?
Solution
I figured it out. You'd better not have anything in your mouth, because this is hilarious!
Being basically out of options, I latched onto the last discernible difference between this deployment and the ones that worked: the file size of the jar being deployed. The one that failed was by far the smallest. So I bloated it up by some 60% to make it comparable to everything else... and that fixed it!
This sounds preposterous. Here's my hypothesis on what's going on: if the upload takes too little time, the lambda somehow needs longer to change its state. I'm not sure why that would be. You'd expect the state to change when things are done, not to take longer when things are done faster, right? Maybe there's a minimum time for the state to remain? I wouldn't know. There's one thing that supports this hypothesis, though: the deployment from my local computer always worked. That upload would naturally take longer than the one Jenkins performs from inside the AWS VPC. So this hypothesis, as ludicrous as it sounds, fits all the facts that I have on hand.
Maybe somebody with a better understanding of the lambda-internal mechanisms can add a comment to this explaining how this can happen...
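Whatever the underlying cause, padding the jar only changes the timing. A less fragile approach might be to wait until Lambda reports the previous update as finished before sending the next request. The following is a minimal sketch of such a polling loop, not something the plugin is known to provide. The function name waitForUpdateToFinish and the fetchStatus parameter are hypothetical; fetchStatus is assumed to wrap a GetFunctionConfiguration call and return its LastUpdateStatus field, which Lambda documents as "InProgress", "Successful" or "Failed". The status source is simulated here so the snippet runs without AWS access.

```kotlin
// Hypothetical helper: poll until the function's last update is no longer in progress.
// fetchStatus stands in for a real GetFunctionConfiguration call returning LastUpdateStatus.
fun waitForUpdateToFinish(
    fetchStatus: () -> String,
    maxAttempts: Int = 30,
    delayMillis: Long = 2_000
): String {
    repeat(maxAttempts) {
        val status = fetchStatus()
        // "InProgress" is what Lambda reports while an update is still being applied.
        if (status != "InProgress") return status
        Thread.sleep(delayMillis)
    }
    error("Lambda update still in progress after $maxAttempts attempts")
}

fun main() {
    // Simulated status source (assumption): two polls in progress, then done.
    val simulated = listOf("InProgress", "InProgress", "Successful").iterator()
    val result = waitForUpdateToFinish({ simulated.next() }, delayMillis = 10)
    println(result) // prints "Successful"
}
```

In a real build, a wait like this would have to run between the code update and the configuration update. Whether and how that can be hooked into the deploy task depends on the plugin version and is not verified here.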
Answered By - UncleBob
Answer Checked By - Robin (JavaFixing Admin)