Dockerfile.neuron.dev has been deprecated. Please refer to deep learning containers repository for neuron torchserve containers.
Dockerfile.dev has been deprecated. Please refer to Dockerfile for dev torchserve containers.
- Prerequisites
- Create TorchServe docker image
- Create torch-model-archiver from container
- Running TorchServe docker image in production
- docker - Refer to the official docker installation guide
- git - Refer to the official git set-up guide
- For base Ubuntu with GPU, install the NVIDIA container toolkit and the NVIDIA driver
- NOTE - Dockerfiles have not been tested on the Windows native platform.
If you have not cloned the TorchServe source, run:
git clone https://github.com/pytorch/serve.git
cd serve/docker
Use the build_image.sh script to build the docker images. The script builds the production, dev and ci docker images.
| Parameter | Description |
|---|---|
| -h, --help | Show script help |
| -b, --branch_name | Specify a branch name to use. Default: master |
| -g, --gpu | Build image with GPU based ubuntu base image |
| -bi, --baseimage | Specify base docker image. Example: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04 |
| -bt, --buildtype | Which type of docker image to build. Can be one of : production, dev, ci |
| -t, --tag | Tag name for image. If not specified, script uses torchserve default tag names. |
| -cv, --cudaversion | Specify the CUDA version to use. Supported values: cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121. Default: cu121 |
| -ipex, --build-with-ipex | Specify to build with intel_extension_for_pytorch. If not specified, script builds without intel_extension_for_pytorch. |
| -n, --nightly | Specify to build with TorchServe nightly. |
| -py, --pythonversion | Specify the python version to use. Supported values 3.8, 3.9, 3.10, 3.11. Default 3.9 |
PRODUCTION ENVIRONMENT IMAGES
Creates a docker image with publicly available torchserve and torch-model-archiver binaries installed.
- To create a CPU based image
./build_image.sh
- To create a GPU based image with CUDA 11.7. Options are cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121. GPU images are built with the NVIDIA CUDA base image. If you want to use ONNX, specify the base image as shown in the next section.
./build_image.sh -g -cv cu117
- To create an image with a custom tag
./build_image.sh -t torchserve:1.0
NVIDIA CUDA RUNTIME BASE IMAGE
To make use of ONNX, we need to use the NVIDIA CUDA runtime as the base image. This will increase the size of your Docker image.
./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117
DEVELOPER ENVIRONMENT IMAGES
Creates a docker image with torchserve and torch-model-archiver installed from source.
- For creating CPU based image:
./build_image.sh -bt dev
- For creating CPU based image with a different branch:
./build_image.sh -bt dev -b my_branch
- For creating GPU based image with cuda version 11.3:
./build_image.sh -bt dev -g -cv cu113
- For creating GPU based image with cuda version 11.1:
./build_image.sh -bt dev -g -cv cu111
- For creating GPU based image with cuda version 10.2:
./build_image.sh -bt dev -g -cv cu102
- For creating GPU based image with cuda version 10.1:
./build_image.sh -bt dev -g -cv cu101
- For creating GPU based image with cuda version 9.2:
./build_image.sh -bt dev -g -cv cu92
- For creating GPU based image with a different branch:
./build_image.sh -bt dev -g -cv cu113 -b my_branch
./build_image.sh -bt dev -g -cv cu111 -b my_branch
- For creating image with a custom tag:
./build_image.sh -bt dev -t torchserve-dev:1.0
- For creating image with Intel® Extension for PyTorch*:
./build_image.sh -bt dev -ipex -t torchserve-ipex:1.0
The following examples start the container with ports 8080, 8081, 8082, 7070 and 7071 exposed to localhost.
TorchServe's Dockerfile configures ports 8080, 8081, 8082, 7070 and 7071 to be exposed to the host by default.
When mapping these ports to the host, make sure to specify localhost or a specific IP address.
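The five port mappings are repeated in every command in this section, so it can be convenient to generate them once. The following is only a sketch: `torchserve_port_args` is a hypothetical helper (not part of TorchServe or Docker) that prints one `-p` flag per port, bound to loopback unless another address is given.

```shell
# Hypothetical helper (not part of TorchServe): print one "-p" flag per
# TorchServe port, each bound to the given host address.
torchserve_port_args() {
    bind_addr="${1:-127.0.0.1}"   # default: loopback only
    args=""
    for port in 8080 8081 8082 7070 7071; do
        args="$args -p $bind_addr:$port:$port"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${args# }"
}
```

With this helper defined, `docker run --rm -it $(torchserve_port_args) pytorch/torchserve:latest` should expand to the same loopback mappings that the commands in this section spell out by hand. The substitution is deliberately left unquoted so the shell splits it into separate arguments.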
For the latest version, you can use the latest tag:
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest
For specific versions you can pass in the specific tag to use (ex: pytorch/torchserve:0.1.1-cpu):
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cpu
To start a CPU container with Intel® Extension for PyTorch*, using the torchserve-ipex:1.0 image built above:
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 torchserve-ipex:1.0
For GPU latest image with gpu devices 1 and 2:
docker run --rm -it --gpus '"device=1,2"' -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu
For specific versions you can pass in the specific tag to use (ex: 0.1.1-cuda10.1-cudnn7-runtime):
docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cuda10.1-cudnn7-runtime
For the latest version, you can use the latest-gpu tag:
docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu
TorchServe's inference and management APIs can be accessed on localhost over ports 8080 and 8081 respectively. Example:
curl http://localhost:8080/ping
To create a mar [model archive] file for TorchServe deployment, you can use the following steps:
1. Start the container, sharing your local model-store directory (create it if it does not exist) and any directory containing custom/example mar contents:
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples pytorch/torchserve:latest
1.a. If starting the container with Intel® Extension for PyTorch*, add the following lines to config.properties to enable IPEX and the launcher with its default configuration, then mount that file when starting the container:
ipex_enable=true
cpu_launcher_enable=true
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/config.properties:/home/model-server/config.properties -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples torchserve-ipex:1.0
2. List your containers, or skip this step if you know the container name:
docker ps
3. Bind and get the bash prompt of the running container:
docker exec -it <container_name> /bin/bash
You will land in /home/model-server/.
4. Download the model weights if you have not done so already (they are not part of the repo):
curl -o /home/model-server/examples/image_classifier/densenet161-8d451a50.pth https://download.pytorch.org/models/densenet161-8d451a50.pth
5. Now execute the torch-model-archiver command, e.g.:
torch-model-archiver --model-name densenet161 --version 1.0 --model-file /home/model-server/examples/image_classifier/densenet_161/model.py --serialized-file /home/model-server/examples/image_classifier/densenet161-8d451a50.pth --export-path /home/model-server/model-store --extra-files /home/model-server/examples/image_classifier/index_to_name.json --handler image_classifier
Refer to torch-model-archiver for details.
6. The densenet161.mar file should now be present at /home/model-server/model-store.
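Because the container was started with the model-store directory bind-mounted, the archive should also be visible in the host's model-store directory. A minimal way to confirm the file was written is sketched below; `check_mar` is a hypothetical helper name, not a TorchServe command, and it only checks that the file exists and is non-empty.

```shell
# Hypothetical helper: report whether a model archive exists and is non-empty.
check_mar() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}
```

Inside the container this could be called as `check_mar /home/model-server/model-store/densenet161.mar`, and on the host as `check_mar "$(pwd)/model-store/densenet161.mar"`, assuming the host command is run from the directory used when starting the container.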
You may want to consider the following aspects / docker options when deploying torchserve in Production with Docker.
- Shared Memory Size
  - shm-size - The shm-size parameter allows you to specify the shared memory that a container can use. It enables memory-intensive containers to run faster by giving more access to allocated memory.
- User Limits for System Resources
  - --ulimit memlock=-1: Maximum locked-in-memory address space.
  - --ulimit stack: Linux stack size
  The current ulimit values can be viewed by executing ulimit -a. A more exhaustive set of options for constraining resources can be found in the Docker documentation.
- Exposing specific ports / volumes between the host & docker env.
  - -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071 - TorchServe uses default ports 8080 / 8081 / 8082 for the REST based inference, management & metrics APIs and 7070 / 7071 for the gRPC APIs. You may want to expose these ports to the host for HTTP & gRPC requests between Docker & host.
  - The model store is passed to torchserve with the --model-store option. You may want to consider using a shared volume if you prefer pre-populating models in the model-store directory.
For example,
docker run --rm --shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p 127.0.0.1:8080:8080 \
-p 127.0.0.1:8081:8081 \
-p 127.0.0.1:8082:8082 \
-p 127.0.0.1:7070:7070 \
-p 127.0.0.1:7071:7071 \
--mount type=bind,source=/path/to/model/store,target=/tmp/models <container> torchserve --model-store=/tmp/models
The TorchServe repository also includes an example that shows serving an MNIST model using Docker.
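To relate the `--ulimit` values in the production command above to the limits your shell currently has, a small sketch like the following may help. It assumes that Docker interprets the `stack` value in bytes and that `ulimit -s` reports the stack size in KiB; both are assumptions worth checking against your environment, and the variable names are illustrative only.

```shell
# Stack limit passed to docker in the example above (assumed to be bytes).
stack_bytes=67108864
stack_kib=$((stack_bytes / 1024))
stack_mib=$((stack_kib / 1024))
echo "requested stack limit: ${stack_kib} KiB (${stack_mib} MiB)"

# Current soft limits of this shell, for comparison.
echo "current stack limit: $(ulimit -s)"
echo "current locked-memory limit: $(ulimit -l)"
```

Under those assumptions the requested stack limit works out to 64 MiB, which can be compared with what the host shell reports before settling on a value.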