
Fast cold starts for Clojure in AWS Lambda using GraalVM Native Image

Published in Technology

Written by

Esko Luontola

Esko Luontola is an avid programmer and usability enthusiast. Esko has been using TDD in every project since 2007 and is also the author of the University of Helsinki’s TDD MOOC (tdd.mooc.fi). He is continually pushing for technical excellence and better software development practices.

Article

December 30, 2020 · 8 min read time

How to AOT compile a Clojure application to a native binary and run it in AWS Lambda as a Docker image.

Recently Amazon released support for container images in AWS Lambda. So instead of creating custom lambda runtimes ourselves, we can use one of the provided runtime libraries or base images to build a container image with Docker and deploy it to AWS Lambda. At the same time Amazon changed AWS Lambda’s billing granularity from 100 ms to 1 ms, making optimizations more worthwhile than before.

This seemed like a good time to try out AOT compilation with GraalVM Native Image. The last time I tried it was in 2018, when GraalVM had just been released, and back then it had quite a few rough edges. Since then GraalVM has introduced a Java agent which logs reflection usage and generates configuration files for native compilation, which should solve most of the issues I had with Native Image.

My language of choice is Clojure, which is famous for its slow startup times, so it would benefit greatly from AOT compilation. That would make Clojure a feasible language for AWS Lambda, where the cold start time is important. Unlike your average Java framework, Clojure code uses barely any reflection, so it should be quite easy to AOT compile. Just turn on :global-vars {*warn-on-reflection* true} in your project.clj file and add type hints wherever the Clojure compiler can't infer the Java types automatically (please submit PRs for any Clojure libraries which are missing type hints).
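For example, an interop call on an untyped parameter makes the Clojure compiler fall back to reflection at run time, which a type hint removes. This is a hypothetical snippet, not code from the example project:

```clojure
;; same effect as :global-vars {*warn-on-reflection* true} in project.clj
(set! *warn-on-reflection* true)

(defn shout [s]
  (.toUpperCase s))        ; reflection warning: the type of s is unknown

(defn shout-fast [^String s]
  (.toUpperCase s))        ; type hint: resolved at compile time, no reflection
```

After AOT compilation the hinted version compiles to a direct virtual method call, which is exactly what Native Image needs.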

In this article I will go through the main points of how to AOT compile a Clojure app with GraalVM Native Image and package it into a container image for AWS Lambda deployment. All code in this article is available at https://github.com/luontola/native-clojure-lambda as a fully configured project. This article shows only selected snippets from there, so see that project for the full context.

Handler function

As is normal with Lambda, you will need a handler function as an entrypoint to your application. For Java this means a POJO class or a class which implements com.amazonaws.services.lambda.runtime.RequestHandler or RequestStreamHandler. To avoid some Java interop boilerplate, I’m using lambada to generate a class hello_world.Handler which implements RequestStreamHandler and calls my code:

;; src/hello_world/main.clj
(lambada/deflambdafn hello_world.Handler [^InputStream in ^OutputStream out ^Context ctx]
  (println "Hello world")
  (println (slurp in)))
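Under the hood, deflambdafn is roughly equivalent to a gen-class declaration plus a method implementation. This is a simplified sketch of the idea, not the library's exact macro expansion:

```clojure
;; roughly what lambada's deflambdafn generates (simplified sketch)
(gen-class
  :name hello_world.Handler
  :implements [com.amazonaws.services.lambda.runtime.RequestStreamHandler])

;; RequestStreamHandler's handleRequest(InputStream, OutputStream, Context)
(defn -handleRequest [this in out ctx]
  (println "Hello world")
  (println (slurp in)))
```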

Lambda runtime

The container image support for Lambda includes runtime interface clients, which sit between your handler function and the AWS Lambda runtime. For Java this is the aws-lambda-java-runtime-interface-client library.

To start your lambda handler in a container, you would call the main method in com.amazonaws.services.lambda.runtime.api.client.AWSLambda and pass it “handlerClass” or “handlerClass::handlerMethod” as a parameter. To avoid the need for command line parameters, I wrap it in my own main method so that my uberjar can be called without parameters:

;; src/hello_world/main.clj
(defn -main [& _args]
  (AWSLambda/main (into-array String ["hello_world.Handler"])))

The lambda runtime interface client looks for an environment variable AWS_LAMBDA_RUNTIME_API, which contains the host and port of the lambda runtime API. It then polls http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next for pending lambda invocations and forwards them to your handler. But when testing locally there is no AWS_LAMBDA_RUNTIME_API, so you will need the help of the lambda runtime interface emulator:

Lambda runtime emulator

The Lambda Runtime Interface Emulator (RIE) comes as an aws-lambda-rie binary which you can bundle inside your container image. It takes the command for starting your application as a parameter. When the AWS_LAMBDA_RUNTIME_API environment variable is missing, you know that your application is running outside AWS Lambda, in which case the runtime emulator is needed.

For that purpose I’m using a lambda-bootstrap.sh script:

#!/bin/sh
# lambda-bootstrap.sh
set -e
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
  exec /usr/local/bin/aws-lambda-rie "$@"
else
  exec "$@"
fi

...which is called from the Dockerfile:

# Dockerfile-jvm
CMD ["/lambda-bootstrap.sh", "/usr/bin/java", "-jar", "hello-world.jar"]

When running the container locally, I can publish the container's port 8080 on localhost port 9000 (127.0.0.1:9000:8080) and call my lambda:

# smoke-test.sh
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
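For example, the container could be built and started like this before running the smoke test (hello-world-jvm is a hypothetical image tag, not from the example project):

```shell
docker build -t hello-world-jvm -f Dockerfile-jvm .
docker run --rm -p 127.0.0.1:9000:8080 hello-world-jvm
```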

GraalVM Native Image

With the information covered this far, you can run a Java/Clojure application in a container image inside AWS Lambda. The next step is AOT compiling it into a native binary. You will need to install Native Image for GraalVM and take its limitations into consideration. The most significant limitations are the lack of runtime code generation and the need to declare all reflection usages up front. The first is no problem for most Clojure applications (unless your application is a REPL), and there are helpers for the second.

When you run your application with GraalVM (the OpenJDK-compatible JVM, not Native Image/Substrate VM), you can use the native-image-agent Java agent. It will monitor all reflection, resource and JNI usages, and generate native-image configuration files. By default it writes the files when the process exits, but aws-lambda-rie doesn't seem to pass the shutdown signal to the application, so you will need the config-write-period-secs parameter to write the configuration files periodically while the process is running.

# Dockerfile-graalvm
CMD ["/lambda-bootstrap.sh", "/usr/bin/java", \
        "-agentlib:native-image-agent=config-merge-dir=/tmp/native-image,config-write-period-secs=5", \
        "-jar", "hello-world.jar"]

With your application running inside GraalVM with the native-image-agent, you can run your test suite against the application to exercise all code paths, and at the end you will have the configuration files for native-image. The generated configuration files may be larger than necessary and might be missing some things, so it's good to inspect and tweak them as necessary.

The configuration files should be packaged inside your JAR file under /META-INF/native-image or its subdirectories. You can also create native-image.properties files to specify the command line arguments to the native-image command.
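For example, a minimal native-image.properties could look like this (a hypothetical fragment; the Args property lists arguments that native-image picks up when the JAR is on its classpath):

```properties
# resources/META-INF/native-image/native-image.properties (hypothetical example)
Args = --no-fallback --initialize-at-build-time
```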

In my case, a bunch of configuration was needed specifically for the aws-lambda-java-runtime-interface-client library, but I'm working on a PR so that in the future the configuration will be embedded inside the library and you won't need it any more. There were also hundreds of reflection usages and resource accesses caused by loading Clojure namespaces, but those won't be needed after AOT compilation, so they can be removed from the configuration.

The only piece of configuration specific to my application is the reflective call of the lambda runtime interface client instantiating my handler class:

# resources/META-INF/native-image/reflect-config.json
[
  {
    "name": "hello_world.Handler",
    "methods": [{"name": "<init>", "parameterTypes": []}]
  }
]

After the configuration is bundled inside the uberjar, you can AOT compile it with the native-image command. Even for a hello world application this takes 80 seconds while fully utilizing 4 CPU cores and consuming 4 GB memory, so it’s best executed on a heavy CI server. The resulting hello-world binary's size was 24 MB.

# Dockerfile-native
RUN native-image \
        --no-fallback \
        --report-unsupported-elements-at-runtime \
        --initialize-at-build-time \
        -H:+PrintAnalysisCallTree \
        -jar hello-world.jar hello-world && \
    chmod a+x hello-world

The --initialize-at-build-time parameter executes all static initializer blocks at build time. If there are classes with static initializer blocks which must be executed at run time, you can specify them using the --initialize-at-run-time parameter.
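For example, if a class opens a network connection in its static initializer, its initialization could be deferred to run time like this (my.app.ConnectionHolder is a hypothetical class name for illustration):

```shell
native-image \
        --initialize-at-build-time \
        --initialize-at-run-time=my.app.ConnectionHolder \
        -jar hello-world.jar hello-world
```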

Using --initialize-at-build-time is especially important for Clojure, because it avoids all the work that Clojure does when a namespace is loaded (or more precisely, that work is done at build time). In fact, I doubt that it’s even possible to AOT compile Clojure code without this parameter. Without it native-image takes ten times longer than with it, and still hello_world.main.<clinit> calls clojure.lang.Compiler.eval which fails at run time.

Performance

Here is how long a typical cold start takes when running a Clojure hello world lambda locally on OpenJDK 11 (it also prints some system properties and JMX statistics for the curious):

START RequestId: ed22ee9a-f21f-4fa1-aa9f-522994605877 Version: $LATEST
Hello world
{}
JVM:
uptime 1450 ms
java.specification.version = 11
java.version = 11.0.9.1
java.vm.name = OpenJDK 64-Bit Server VM
java.vm.version = 11.0.9.1+12-LTS
java.vendor = Amazon.com Inc.
java.vendor.version = Corretto-11.0.9.12.1
GC:
G1 Young Generation - 1 collections, time spent 5 ms
G1 Old Generation - 0 collections, time spent 0 ms
END RequestId: ed22ee9a-f21f-4fa1-aa9f-522994605877
REPORT RequestId: ed22ee9a-f21f-4fa1-aa9f-522994605877 Init Duration: 0.22 ms Duration: 1482.93 ms Billed Duration: 1500 ms Memory Size: 3008 MB Max Memory Used: 3008 MB

And here is the same code as AOT compiled with GraalVM Native Image:

START RequestId: ccca0597-a65a-4488-8bc1-c13004885c4d Version: $LATEST
Hello world
{}
JVM:
uptime 6 ms
java.specification.version = 11
java.version = 11.0.9
java.vm.name = Substrate VM
java.vm.version = GraalVM 20.3.0 Java 11
java.vendor = Oracle Corporation
java.vendor.version = nil
GC:
young generation scavenger - 0 collections, time spent 0 ms
complete scavenger - 0 collections, time spent 0 ms
END RequestId: ccca0597-a65a-4488-8bc1-c13004885c4d
REPORT RequestId: ccca0597-a65a-4488-8bc1-c13004885c4d Init Duration: 0.38 ms Duration: 9.68 ms Billed Duration: 100 ms Memory Size: 3008 MB Max Memory Used: 3008 MB

The cold start time goes from 1500 ms down to 10 ms! When the application is deployed on AWS Lambda, it's a few hundred milliseconds slower, likely due to latency in the AWS Lambda infrastructure.

Here is the rundown of a few informal benchmarks of running the hello world application 5-10 times per configuration. The local machine was a maxed out 2020 Intel Macbook Pro 13” and the AWS Lambda was configured to use 256 MB memory.

Table, JIT vs AOT compiled start times

In addition to the native version starting faster, it’s also faster on subsequent requests. Presumably JIT compiled code would be faster than the AOT compiled code after it has been executing long enough for the optimizations to kick in, but that could take a large fraction of a lambda’s 10-15 minute lifetime.

The slowest part seems to be the AWS Lambda infrastructure outside our application. At best the cold start times are in the 300-400 ms range, but every now and then I see 2-3 second cold starts, which would be quite long for an interactive application (and it costs 100-1000 times more than the couple milliseconds that our code takes to run). I hope Amazon keeps optimizing their cold start times.

Conclusion

AOT compiling a Clojure application with GraalVM Native Image is not exactly a walk in the park, but neither is it an uphill battle; the Java agent helps a lot with Native Image configuration. And after compilation the application start times are fast enough to make Clojure feasible in new frontiers such as AWS Lambda and command line tools.

Photography by: Lionello DelPiccolo, Unsplash.
