From Flask API to a robust container "FROM scratch"

Summary
Requirements
I - WSGI server compliance and HTTPS
-- 1 - Prepare your app to run with gunicorn (headless)
-- 2 - Add HTTPS
II - Compile the python code
III - Go further
-- 1 - Docker Compose
-- 2 - Development and Engineering
IV - What I learned
-- 1 - My mistakes
---- A - Building on the machine
---- B - Using a dedicated compiler container the wrong way
---- C - Mounting Volumes
---- D - File permission pain
-- 2 - Lessons Learned
Last Words
Preamble
If you want to practice first before updating your own app, you can clone my GitHub repository here. You will find instructions to initialize the project in the README.md.
Requirements
-- A Linux/macOS workstation with a working terminal
-- Git, Python and curl installed
-- Docker ready and started (daemon running)
I - WSGI server compliance and HTTPS
1 - Prepare your app to run with gunicorn (headless)
Gunicorn is a WSGI (Web Server Gateway Interface) server. Your Flask app is already a WSGI application, so Gunicorn can serve it in production, with enhanced security compared to Flask's built-in development server.
To use it, install the gunicorn package:
python -m pip install gunicorn
In your main file, import BaseApplication from gunicorn.app.base as follows:
#!/usr/bin/env python
"""
Filename : main.py
Project : XXXXX
Nickname : YYYYY
Description : ZZZZZZZ ZZZ ZZZZZZ
"""
from flask import Flask, request, jsonify
from gunicorn.app.base import BaseApplication
...
Still in your main file, create a class like this:
# disable pylint's abstract-method warning
#pylint: disable=W0223
class GunicornApplication(BaseApplication):
    """Class to run the API using gunicorn WSGI"""

    def __init__(self, application, options=None):
        self.options = options or {}
        self.application = application
        super().__init__()

    def load_config(self):
        config = {key: value for key, value in self.options.items()
                  if key in self.cfg.settings and value is not None}
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.application
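To see what load_config does with the options, here is a minimal sketch of the same filtering logic outside Gunicorn. KNOWN_SETTINGS and filter_options are hypothetical names used for illustration only; in the real class the valid names come from self.cfg.settings.

```python
# Hypothetical stand-in for self.cfg.settings: a handful of Gunicorn setting names.
KNOWN_SETTINGS = {"bind", "workers", "certfile", "keyfile", "timeout"}


def filter_options(options, known_settings=KNOWN_SETTINGS):
    """Keep only options that are known settings and have a non-None value."""
    return {
        key.lower(): value
        for key, value in options.items()
        if key in known_settings and value is not None
    }


if __name__ == "__main__":
    print(filter_options({"bind": "0.0.0.0:5001", "workers": 1, "certfile": None}))
```

Options with a None value or an unknown name are dropped silently, so a typo in an option name will not raise an error; it will simply be ignored.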
Now, instead of calling app.run(), for example like this:
app.run(host="127.0.0.1", port=5001)
create a variable for your Gunicorn options and call your newly created class. Also change the listening IP address:
# app.run(host="127.0.0.1", port=5001)
options = {
    # change from 127.0.0.1 (localhost) to 0.0.0.0
    # to listen on every interface, required in a container.
    "bind": "0.0.0.0:5001",
    "workers": 1
}
GunicornApplication(application=app, options=options).run()
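Why 0.0.0.0? Inside a container, 127.0.0.1 is the container's own loopback interface, so a published port cannot reach a server bound to it. The sketch below classifies a bind string using only the standard library. bind_scope is a hypothetical helper, and it assumes an IPv4 literal like the ones used in this article.

```python
import ipaddress


def bind_scope(bind):
    """Classify a "host:port" bind string (IPv4 literal hosts only)."""
    host, _, port = bind.rpartition(":")
    address = ipaddress.ip_address(host)
    if address.is_unspecified:
        # 0.0.0.0 means "every interface of this machine or container"
        scope = "all interfaces"
    elif address.is_loopback:
        # 127.0.0.0/8 is only reachable from inside the same network namespace
        scope = "loopback only"
    else:
        scope = "single interface"
    return scope, int(port)


if __name__ == "__main__":
    print(bind_scope("0.0.0.0:5001"))
    print(bind_scope("127.0.0.1:5001"))
```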
Start your API and test it again. Everything should work as expected.
2 - Add HTTPS
Create a certificate in the same folder as your main file. You can use the command below; it will prompt you with a few questions in order to build the certificate. The most important one is Common Name (e.g. server FQDN or YOUR name) []: this is where you set the domain name for your certificate. You could use demo-api.local, which is safe to use because it shouldn't leave your private network.
openssl req -x509 -newkey rsa:4096 \
-keyout key.pem -out cert.pem \
-days 365 -nodes
With the command above, the certificate is valid for one year (-days 365); after that you have to renew it.
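To keep an eye on the renewal date, a small sketch like the following can convert the certificate's end date into a number of remaining days. It assumes you obtained the date string yourself, for example with openssl x509 -enddate -noout -in cert.pem, which prints a line of the form notAfter=Jan  5 09:34:43 2026 GMT. days_until_expiry is a hypothetical helper name.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before `not_after`, a date such as "Jan  5 09:34:43 2026 GMT"."""
    # ssl.cert_time_to_seconds parses the date format used in certificates
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


if __name__ == "__main__":
    # The date below is a made-up example, not a value from a real certificate.
    print(days_until_expiry("Jan  5 09:34:43 2026 GMT"))
```

A negative result means the certificate has already expired.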
Add the certificate in your gunicorn options :
options = {
    "bind": "0.0.0.0:5001",
    "certfile": "./cert.pem",
    "keyfile": "./key.pem",
    "workers": 1
}
Start your API.
Now test your API over HTTPS. Don't forget to replace THE_DOMAIN_YOU_SET with your domain name and to adjust the port if needed.
curl --cacert path/to/cert.pem \
'https://THE_DOMAIN_YOU_SET:5001/YOUR_API_PATH' \
--resolve "THE_DOMAIN_YOU_SET:5001:127.0.0.1"
If you use my repository and set your Domain Name as demo-api.local, you can test the 2 API endpoints with the following commands :
curl --cacert cert.pem \
'https://demo-api.local:5001/' \
--resolve "demo-api.local:5001:127.0.0.1"
And :
curl --cacert cert.pem \
'https://demo-api.local:5001/health' \
--resolve "demo-api.local:5001:127.0.0.1"
II - Compile the python code
Create a Dockerfile and fill it with the following content:
# Stage 1: Build stage
FROM python:3.12-bookworm AS builder
WORKDIR /work
# Create an empty directory that becomes /tmp in the final container
RUN mkdir /new_empty_dir
# Update the container and install patchelf (required by staticx)
RUN apt-get update -y \
    && apt-get upgrade -y \
    && apt-get install -y patchelf
# Install the tools used to compile the Python app
RUN pip install pyinstaller staticx
# Install the source code
COPY requirements.txt requirements.txt
COPY main.py main.py
# Compile the app
RUN pip install -r requirements.txt
RUN pyinstaller --hidden-import gunicorn.glogging --hidden-import gunicorn.workers.sync -F main.py -n app_unpackaged.elf
RUN staticx --strip dist/app_unpackaged.elf dist/app.elf
# Stage 2: Final container
FROM scratch
USER 65535
WORKDIR /app/
COPY --chown=65535:65535 --from=builder /work/dist/app.elf /app/app.elf
COPY --chown=65535:65535 --from=builder /new_empty_dir /tmp
ENTRYPOINT ["/app/app.elf"]
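The two --hidden-import flags are there because Gunicorn loads its logger and worker classes by name at runtime, so PyInstaller's static analysis does not see them. Before building, a quick check like the sketch below can confirm that the modules you pass to --hidden-import exist in your environment. missing_modules is a hypothetical helper, not part of PyInstaller.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example "gunicorn") is itself absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run this inside the venv used for the build
    print(missing_modules(["gunicorn.glogging", "gunicorn.workers.sync"]))
```

An empty list means every module was found; it does not prove that PyInstaller will bundle everything your app needs.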
Before compiling your program, make sure you have a requirements.txt file. To create it, make sure your venv is activated and run the following command:
pip freeze > requirements.txt
The Dockerfile does not create a named user; USER 65535 only tells Docker to run the app with UID 65535, so that the container can run without root privileges.
To share the certificate with your app, you therefore have to chown your cert and key files. This command requires root privileges; you can use it like this:
sudo chown -R 65535 ./*.pem
This command changes the owner of the files to UID 65535 so that the user running in the final container can read them. If your app requires a configuration file or any other file, don't forget to chown it too so that it can be read from inside the container.
Then restrict the permissions of the certificate files to read-only for the owner (UID 65535) and its group:
sudo chmod 440 ./*.pem
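If you want to double-check the result of chown and chmod before starting the container, a sketch along these lines can help. pem_permission_problems is a hypothetical helper; the default UID 65535 and mode 440 simply mirror the values used in this article.

```python
import os
import stat


def pem_permission_problems(path, expected_uid=65535, expected_mode=0o440):
    """Return a list of problems; an empty list means owner and mode look right."""
    info = os.stat(path)
    problems = []
    # Keep only the permission bits (drop the file-type bits)
    mode = stat.S_IMODE(info.st_mode)
    if mode != expected_mode:
        problems.append(f"mode is {mode:o}, expected {expected_mode:o}")
    if info.st_uid != expected_uid:
        problems.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    return problems


if __name__ == "__main__":
    for pem in ("cert.pem", "key.pem"):
        print(pem, pem_permission_problems(pem) or "OK")
```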
Build your final container :
docker build --tag app .
Run it, mounting your certificate files:
docker run \
--mount type=bind,source=$(pwd)/cert.pem,target=/app/cert.pem,readonly \
--mount type=bind,source=$(pwd)/key.pem,target=/app/key.pem,readonly \
--publish 5001:5001 app
Test your API again, this time served from the scratch container:
curl --cacert path/to/cert.pem \
'https://THE_DOMAIN_YOU_SET:5001/YOUR_API_PATH' \
--resolve "THE_DOMAIN_YOU_SET:5001:127.0.0.1"
If you use my repository and set your Domain Name as demo-api.local, you can test the 2 API endpoints with the following commands :
curl --cacert cert.pem \
'https://demo-api.local:5001/' \
--resolve "demo-api.local:5001:127.0.0.1"
And :
curl --cacert cert.pem \
'https://demo-api.local:5001/health' \
--resolve "demo-api.local:5001:127.0.0.1"
III - Go further
1 - Docker Compose
Use a compose.yml file to harden your container deployment.
You could set resource quotas (CPU, RAM, disk) in order to mitigate DoS attacks. To find out what your app requires, you could use the docker stats command.
You could set the container and its volumes to read-only if no writes are required.
You could use a dedicated network to reduce lateral movement.
You could set a health check to verify that your app is running.
You could drop Linux capabilities.
You could use a seccomp profile.
Here is an example of a docker-compose.yml with some hardening:
version: '3.9'
services:
  app:
    container_name: gunicorn_API
    build:
      context: .
    logging:
      options:
        max-size: "10m"
        max-file: "3"
    deploy:
      resources:
        limits:
          cpus: '0.1'
          memory: 128M
        reservations:
          cpus: '0.01'
          memory: 64M
    restart: always
    volumes:
      - ./cert.pem:/app/cert.pem:ro
      - ./key.pem:/app/key.pem:ro
    ports:
      - '5001:5001'
    cap_drop:
      - ALL
    networks:
      - app_network
networks:
  app_network:
    driver: bridge
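A scratch image contains no shell and no curl, so the health check suggested above cannot simply call curl inside the container. One option, sketched here under that assumption, is to probe the endpoint from the host or from a monitoring job. is_healthy is a hypothetical helper; the /health path comes from the demo repository, and the host name must resolve to the container (for example through an /etc/hosts entry).

```python
import ssl
import urllib.error
import urllib.request


def is_healthy(url, cafile=None, timeout=2.0):
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds."""
    # When a CA file is given (e.g. the self-signed cert.pem), trust it for this request
    context = ssl.create_default_context(cafile=cafile) if cafile else None
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, TLS failure, HTTP error status or malformed URL
        return False


if __name__ == "__main__":
    print(is_healthy("https://demo-api.local:5001/health", cafile="cert.pem"))
```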
2 - Development and Engineering
Use a venv when developing in Python to reduce the final package size and avoid embedding useless modules.
Use linting/quality tools, for example pylint (pip install pylint), in order to keep your code enjoyable to read.
Use security tools to make sure your base app is safe (for Python you can use bandit: pip install bandit).
Write tests to simplify future development and make sure that your app runs as expected.
Write good documentation (for a Flask API you could go with a Swagger endpoint using the flask-restx module) and don't forget the architecture document, with beautiful diagrams, to describe how the app will be deployed and integrated.
Part of good API documentation is making sure all your routes are tested; you could use the Bruno API client, for example.
To size the resource quotas of your container correctly, you can run a stress test using the Locust testing framework.
Integrate your tests in a CI/CD pipeline to make sure you didn't miss or forget something when testing on your workstation.
IV - What I learned
1 - My mistakes
A - Building on the machine
When trying to create this container from scratch, I first tried to compile the Python app on my PC. "It works on my machine", but once containerized the app crashed.
B - Using a dedicated compiler container the wrong way
I then used a dedicated container for the build, to reduce coupling with my machine, but the first version of my compiler container was a failure.
What I did was use the container to compile, mounting the current directory as /work in the container.
When building from the container, pyinstaller was able to compile, but staticx wasn't able to bundle all the packages and failed with the error staticx: /tmp/staticx-pyi-3ek9mteo/base_library.zip: Invalid ELF image: Magic number does not match.
To fix this error, I shared only my source file (main.py) and requirements.txt, plus a volume at ./dist in order to retrieve the final ELF file.
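The "Magic number does not match" message means the file being inspected does not start with the four bytes that identify an ELF binary. A minimal sketch to check this yourself on a build artifact (is_elf is a hypothetical helper, not a staticx function):

```python
ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic number."""
    with open(path, "rb") as handle:
        return handle.read(4) == ELF_MAGIC


if __name__ == "__main__":
    # Path assumed from the build steps in this article
    print(is_elf("dist/app.elf"))
```

A zip archive such as base_library.zip starts with the bytes PK instead, so it fails this check.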
C - Mounting Volumes
At this point I thought my problems were solved. But after building and running my final container, the API started, and when I sent it a request I got this error:
Traceback (most recent call last):
  File "gunicorn/workers/sync.py", line 131, in handle
  File "gunicorn/sock.py", line 228, in ssl_wrap_socket
  File "gunicorn/sock.py", line 224, in ssl_context
  File "gunicorn/config.py", line 2024, in ssl_context
  File "gunicorn/sock.py", line 218, in default_ssl_context_factory
IsADirectoryError: [Errno 21] Is a directory
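Basically, Gunicorn started without complaining about my certificate and key files (cert.pem and key.pem) but wasn't able to read them when serving a request. The IsADirectoryError points at the mount: with -v, a source given as a bare name (here cert.pem, without a path) is most likely treated as a named volume, so a directory ends up mounted at /app/cert.pem. So I changed my docker run option from -v to --mount, from: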
docker run -v cert.pem:/app/cert.pem \
-v key.pem:/app/key.pem \
--publish 5001:5001 final-container
To :
docker run \
--mount type=bind,source=$(pwd)/cert.pem,target=/app/cert.pem,readonly \
--mount type=bind,source=$(pwd)/key.pem,target=/app/key.pem,readonly \
--publish 5001:5001 app
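Because the error only showed up on the first request, it can help to fail fast at startup instead. The sketch below checks the certificate and key paths before Gunicorn is started; check_tls_file is a hypothetical helper that you would call yourself before GunicornApplication(...).run().

```python
import os


def check_tls_file(path):
    """Raise a descriptive error if `path` is not a readable regular file."""
    if os.path.isdir(path):
        raise IsADirectoryError(
            f"{path} is a directory; the mount probably created a volume "
            "instead of binding the file"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} does not exist")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"{path} is not readable by uid {os.getuid()}")


if __name__ == "__main__":
    for tls_file in ("./cert.pem", "./key.pem"):
        check_tls_file(tls_file)
    print("TLS files look usable")
```

The last check also covers the permission problem described in the next section.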
D - File permission pain
After all of this, my key file still wasn't readable from the container and requests failed with the traceback below. I had to chmod it (from 400 to 440):
Traceback (most recent call last):
  File "gunicorn/workers/sync.py", line 133, in handle
  File "gunicorn/http/parser.py", line 41, in __next__
  File "gunicorn/http/message.py", line 259, in __init__
  File "gunicorn/http/message.py", line 60, in __init__
  File "gunicorn/http/message.py", line 274, in parse
  File "gunicorn/http/message.py", line 326, in read_line
  File "gunicorn/http/message.py", line 262, in get_data
  File "gunicorn/http/unreader.py", line 36, in read
  File "gunicorn/http/unreader.py", line 63, in chunk
  File "gunicorn/workers/base.py", line 204, in handle_abort
SystemExit: 1
And my API finally worked.
2 - Lessons Learned
When building containers, reduce coupling with your hardware by running all your build workloads in containers. Keep interaction with your bare-metal machine/OS to a minimum.
Be aware of permissions and of details like using --mount instead of -v.
Last Words
It is common for new things to fail during the engineering process, but if you have the time, keep pushing: it will pay off.