API¶
This repository includes a FastAPI server that exposes nnU-Net inference as an HTTP API. The server is implemented in src/nnunet_serve/nnunet_api.py (with the application entrypoint in src/nnunet_serve/nnunet_serve_api.py) and configured by model-serve-spec.yaml.
Model caching¶
Models are cached in a time-to-live (TTL) cache and stay in memory for 5 minutes (300 seconds). Whenever a model is needed, the server first checks whether it is already cached. If it is not, the model is loaded into the cache and returned. The cache is cleaned up periodically (every 60 seconds) to free up memory.
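The caching behaviour described above can be illustrated with a minimal sketch. This is not the server's actual implementation: the class name `TTLModelCache`, its methods, and the injectable `clock` are hypothetical, and the sketch assumes the TTL is not refreshed on access (this page does not say whether the server refreshes it).

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLModelCache:
    """Toy time-to-live cache (hypothetical; not the server's real class)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached object)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached object, loading it on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        # Assumption: the TTL starts at load time and is not refreshed on access.
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

    def cleanup(self) -> int:
        """Drop expired entries; intended to run periodically (e.g. every 60 s)."""
        now = self._clock()
        expired = [k for k, (expiry, _) in self._entries.items() if expiry <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def expire_all(self) -> None:
        """Drop everything at once, similar in spirit to GET /expire."""
        self._entries.clear()
```

Passing the clock in as a parameter keeps the sketch easy to exercise without waiting for real time to pass.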
Run the server¶
Locally¶
# optionally set the port via env var (defaults to 12345)
export NNUNET_SERVE_PORT=12345
uv run uvicorn nnunet_serve.nnunet_serve_api:create_app \
--host 0.0.0.0 \
--port ${NNUNET_SERVE_PORT} \
--reload
Environment variables:
- MODEL_SERVE_SPEC: path to a model serve spec file. Defaults to model-serve-spec.yaml in the working directory.
- TOTALSEG_WEIGHTS_PATH: optional override for where TotalSegmentator weights are downloaded/cached. Defaults to <model_folder>/totalseg based on model-serve-spec.yaml.
- NNUNET_SERVE_PORT: the port the server listens on (default: 12345).
Ensure your model-serve-spec.yaml is present and correctly references your models. A GPU and nvidia-smi must be available; the server waits for a GPU with enough free memory before running a job.
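To make the "wait for a GPU with enough free memory" idea concrete, here is a rough sketch built on the `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` query. The function names, the polling interval, and the "pick the GPU with the most free memory" rule are assumptions for illustration; the server's own logic may differ.

```python
import subprocess
import time
from typing import Callable, List, Optional


def parse_free_memory_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one free-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_free_memory_mib() -> List[int]:
    """Ask nvidia-smi for the free memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_free_memory_mib(result.stdout)


def pick_gpu(free_mib: List[int], required_mib: int) -> Optional[int]:
    """Return the index of the GPU with the most free memory, if it has enough."""
    if not free_mib:
        return None
    best = max(range(len(free_mib)), key=lambda i: free_mib[i])
    return best if free_mib[best] >= required_mib else None


def wait_for_gpu(required_mib: int, poll_seconds: float = 5.0,
                 query: Callable[[], List[int]] = query_free_memory_mib,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until some GPU has at least required_mib free, then return its index."""
    while True:
        index = pick_gpu(query(), required_mib)
        if index is not None:
            return index
        sleep(poll_seconds)  # assumed polling interval, not taken from the server
```

`query` and `sleep` are injectable so the loop can be tried on a machine without a GPU.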
Running as a Docker container¶
First, install Docker. Unless Docker is set up for non-root use, its commands must be run with sudo, so be mindful of this. Then:
- Adapt model-serve-spec.yaml with your preferred models; this is the blueprint for model-serve-spec-docker.yaml (same models but a different model directory).
- Build the container (sudo docker build -f Dockerfile . -t nnunet_predict).
- Run the container while specifying the relevant port (50422), GPU usage (--gpus all), and the model directory (-v /models:/models), as well as the output directory if necessary (-v /data/nnunet:/data/nnunet): docker run -it -p 50422:50422 --gpus all -v /models:/models -v /data/nnunet:/data/nnunet nnunet_predict uvicorn nnunet_serve.nnunet_serve_api:create_app. This launches the inference server. If the outputs do not need to be kept, we recommend using a Docker volume for the output directory, since it can be easily deleted. If the server runs internally, it may be useful to mount a host directory where outputs are stored.
Endpoints¶
- GET /model_info
  - Returns the server’s model registry resolved from model-serve-spec.yaml and the filesystem.
  - Response model: dict[str, Any] (JSON object with model entries).
- GET /request-params
  - Returns the JSON schema of the request body for /infer (Pydantic model InferenceRequest).
  - Response model: dict[str, Any].
- POST /infer
  - Runs inference for one or multiple models.
  - Response model: InferenceResponse (see response schema below).
- POST /infer_file
  - Accepts an archive upload (zip, tar, etc.), stores it, builds an InferenceRequest, and delegates to /infer. Keep in mind that while the nnunet_serve API does not require study_path for /infer_file, it still requires series_folders. This is to eliminate any ambiguity when selecting the relevant series for predictions.
  - Returns a job ID and inference result.
  - Response model: dict[str, Any] (includes job_id and same fields as /infer).
- GET /download/{job_id}
  - Serves the zip file containing the inference outputs for the given job ID.
  - Response class: FileResponse (application/zip).
- GET /healthz
  - Simple health check endpoint.
  - Response model: dict[str, Any] with {"status": "ok"}.
- GET /readyz
  - Readiness probe indicating whether models are loaded and a GPU is available.
  - Response model: dict[str, Any] with status and additional fields.
- GET /expire
  - Expires the TTL cache.
  - Response model: dict[str, Any] with status and message.
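As an illustration of how a client might use the two probe endpoints, the sketch below polls /healthz and /readyz before submitting work. It assumes that an HTTP 200 from /readyz means "ready" (the exact fields of that response are not listed here), and the helper names `get_json` and `wait_until_ready` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def get_json(url: str, timeout: float = 5.0) -> Tuple[int, dict]:
    """GET a URL and return (HTTP status, decoded JSON body or {})."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        status, body = err.code, err.read().decode("utf-8")
    try:
        return status, json.loads(body)
    except ValueError:
        return status, {}


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 2.0,
                     fetch: Callable[[str], Tuple[int, dict]] = get_json,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll /healthz and /readyz until both succeed or attempts run out."""
    for attempt in range(attempts):
        try:
            health_code, health = fetch(f"{base_url}/healthz")
            ready_code, _ = fetch(f"{base_url}/readyz")
            # Assumption: HTTP 200 on /readyz signals readiness.
            if health_code == 200 and health.get("status") == "ok" and ready_code == 200:
                return True
        except OSError:
            pass  # server not reachable yet (URLError is an OSError)
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_ready("http://localhost:12345")` before sending the first /infer request.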
Request body schema (InferenceRequest)¶
Please also refer to nnunet_serve.api_datamodels.InferenceRequest.
Required fields:
- nnunet_id: string or list of strings. Must match a model id, name, or any alias from model-serve-spec.yaml.
- study_path: string path to the study root directory (only for the /infer endpoint; not necessary for /infer_file).
- series_folders:
  - Single model: list of relative series folder names under study_path.
  - Multiple models: list of lists, one per model, each a list of relative series folder names under study_path.
  - DICOM inputs (is_dicom=true): each entry must point to a directory containing a single DICOM series (not a study root). For multi-series inputs per model (e.g., T2/DWI/ADC), additional series are rigidly resampled to the first series’ geometry for inference.
- output_dir: directory where outputs will be written.
Common optional fields (with server defaults or per-model default_args):
- class_idx: integer or list of integers per model. Keeps only selected classes in outputs and probability maps.
- checkpoint_name: checkpoint filename in each model folder. Default checkpoint_final.pth (or from default_args).
- tmp_dir: temp directory. Default .tmp.
- is_dicom: boolean. If true, reads DICOM series and exports DICOM-SEG/RTStruct using model metadata. Default false.
- tta: boolean. If true, enables mirroring. Default true.
- use_folds: list of ints. Default [0] unless overridden.
- proba_threshold: float or list of floats per model; required if save_proba_map=true.
- min_confidence: float or list of floats per model; filters candidate components.
- intersect_with: path to a mask image to intersect candidates; see min_intersection.
- min_intersection: float IoU threshold for candidate filtering. Default 0.1.
- crop_from: path to a mask used to crop inputs by bounding box. See crop_padding and cascade_mode.
- crop_padding: tuple of three ints. Default (10, 10, 10).
- cascade_mode: string, one of intersect or crop. Default intersect.
- Export controls:
  - save_proba_map: boolean. If true, exports probability maps. Requires proba_threshold not null.
  - save_nifti_inputs: boolean. If true and is_dicom=true, exports input volumes as NIfTI.
  - save_rt_struct_output: boolean. If true and is_dicom=true, also exports RT Struct.
  - suffix: string appended to output filenames (e.g., prediction_<suffix>.nii.gz).
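To give a feel for what cropping by bounding box with crop_padding could look like, here is a small pure-Python sketch. It assumes the padding is applied symmetrically per axis and clipped to the volume bounds; the function name `padded_bounding_box` and the voxel-list input are hypothetical, and the server's actual cropping code may behave differently.

```python
from typing import Iterable, List, Sequence, Tuple


def padded_bounding_box(nonzero_voxels: Iterable[Sequence[int]],
                        shape: Sequence[int],
                        padding: Sequence[int] = (10, 10, 10)) -> List[Tuple[int, int]]:
    """Return per-axis (start, stop) slices around the mask, grown by padding.

    nonzero_voxels: (i, j, k) coordinates where the crop mask is non-zero.
    shape: size of the volume along each of the three axes.
    """
    voxels = list(nonzero_voxels)
    if not voxels:
        raise ValueError("the crop mask is empty")
    box = []
    for axis in range(3):
        lowest = min(v[axis] for v in voxels) - padding[axis]
        highest = max(v[axis] for v in voxels) + padding[axis] + 1  # exclusive stop
        # Assumption: the padded box is clipped to the volume bounds.
        box.append((max(lowest, 0), min(highest, shape[axis])))
    return box
```

For example, a mask spanning voxels (5, 20, 30) to (8, 25, 31) in a 64 x 64 x 40 volume would, under these assumptions, be cropped to slices (0, 19), (10, 36), and (20, 40).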
Notes:
- For multi-model requests (nnunet_id is a list), series_folders must be a list of lists of the same length, and list-valued parameters (class_idx, proba_threshold, etc.) can be supplied per model. Defaults are merged accordingly.
- When is_dicom=true, each model must have metadata defined in model-serve-spec.yaml (either path to a DCMQI JSON template or an inline metadata object). Otherwise inference will fail.
- series_folders also supports cascade references via from:<model_or_alias>, from:<model_or_alias>=<label>, and from:<model_or_alias>[<index>]. Missing upstream stages are injected automatically when needed.
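The three cascade reference forms can be told apart with a small parser. The sketch below only handles the syntax listed in the notes; it makes no claim about how the server resolves a label or index, and the names `CascadeRef` and `parse_series_entry` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CascadeRef(NamedTuple):
    model: str               # model id, name, or alias after "from:"
    label: Optional[str]     # set for from:<model>=<label>
    index: Optional[int]     # set for from:<model>[<index>]


# from:<model>, from:<model>=<label>, or from:<model>[<index>]
_CASCADE_RE = re.compile(
    r"^from:(?P<model>[^=\[\]]+?)(?:=(?P<label>[^\[\]]+)|\[(?P<index>\d+)\])?$"
)


def parse_series_entry(entry: str) -> Optional[CascadeRef]:
    """Return a CascadeRef for cascade references, or None for plain folder names."""
    match = _CASCADE_RE.match(entry)
    if match is None:
        return None
    index = match.group("index")
    return CascadeRef(
        model=match.group("model"),
        label=match.group("label"),
        index=int(index) if index is not None else None,
    )
```

Entries that do not start with `from:` (for example `inputs/seriesT2`) return `None` and would be treated as ordinary relative folder names.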
Response schema (POST /infer)¶
On success (HTTP 200):
- time_elapsed: seconds to complete the request.
- nnunet_path: string or list of model paths used.
- metadata: metadata object(s) used for DICOM export (if any).
- request: echoed request body.
- status: done.
- Exported file paths (per-stage directories stage_0, stage_1, ...):
  - nifti_prediction: list of paths to prediction[_<suffix>].nii.gz.
  - nifti_proba: list of paths to proba[_<suffix>].nii.gz if save_proba_map=true.
  - nifti_inputs: list of input NIfTI paths if save_nifti_inputs=true.
  - If is_dicom=true:
    - dicom_segmentation: list of paths to prediction[_<suffix>].dcm.
    - dicom_struct: list of paths to struct[_<suffix>].dcm if save_rt_struct_output=true and masks are non-empty.
    - dicom_fractional_segmentation: list of paths to fractional DICOM-SEG for probability maps if save_proba_map=true.
- Empty predictions: when a stage’s mask is empty, DICOM-SEG/RTStruct export is skipped for that stage.
On failure:
- HTTP 400 for invalid nnunet_id or invalid series_folders shape; payload includes status="failed" and error message.
- HTTP 400 if series_folders is missing or inconsistent with the number of models.
- HTTP 500 for runtime exceptions during inference; payload includes status="failed" and error.
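A client can fold the success and failure cases into one helper. The sketch below assumes the exported path lists appear as top-level keys of the JSON payload, as the field list above suggests; `collect_outputs` and `InferenceFailed` are hypothetical names, not part of the server.

```python
from typing import Any, Dict, List

# Keys documented in the /infer response that hold lists of exported paths.
EXPORT_KEYS = (
    "nifti_prediction",
    "nifti_proba",
    "nifti_inputs",
    "dicom_segmentation",
    "dicom_struct",
    "dicom_fractional_segmentation",
)


class InferenceFailed(RuntimeError):
    """Raised when the server reports status="failed" or a non-200 code."""


def collect_outputs(http_status: int, payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return the non-empty exported path lists, or raise on failure."""
    if http_status != 200 or payload.get("status") != "done":
        message = payload.get("error", "unknown error")
        raise InferenceFailed(f"HTTP {http_status}: {message}")
    # Assumption: export keys sit at the top level of the payload.
    return {key: list(payload[key]) for key in EXPORT_KEYS if payload.get(key)}
```

Raising a dedicated exception keeps the calling code free of status checks and surfaces the server's error message directly.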
Response schema (POST /infer_file)¶
On success (HTTP 200):
- job_id: unique identifier for the inference job.
- All fields from the /infer response schema are included (time_elapsed, nnunet_path, metadata, request, status, exported file paths, etc.).
- The request field reflects the original request payload (without study_path as it is inferred from the uploaded file).
On failure (HTTP 400/500):
- Same error structure as /infer with an additional job_id field when applicable.
- Payload includes status="failed" and an error message describing the issue.
Examples¶
- Discover models and schema
curl -s http://localhost:12345/model_info | jq .
curl -s http://localhost:12345/request-params | jq .
- Run single-model inference (NIfTI inputs)
curl -X POST http://localhost:12345/infer \
-H 'Content-Type: application/json' \
-d '{
"nnunet_id": "prostate_whole_gland",
"study_path": "/data/study01",
"series_folders": ["inputs/seriesT2"],
"output_dir": "/data/out/study01",
"use_folds": [0,1,2,3,4],
"tta": true,
"save_proba_map": true,
"proba_threshold": 0.1,
"min_confidence": 0.5
}'
- Run multi-model cascade with DICOM input and RT Struct
curl -X POST http://localhost:12345/infer \
-H 'Content-Type: application/json' \
-d '{
"nnunet_id": ["prostate_whole_gland", "prostate_zone"],
"study_path": "/data/study02",
"series_folders": [["inputs/seriesT2"], ["inputs/seriesT2"]],
"output_dir": "/data/out/study02",
"is_dicom": true,
"cascade_mode": "intersect",
"save_rt_struct_output": true
}'
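The same requests can be issued from Python with only the standard library. The sketch below builds the POST /infer request for the multi-model example and checks the series_folders shape on the client side before sending; `build_infer_request` is a hypothetical helper, and the shape check mirrors the rule in the notes rather than the server's own validation code.

```python
import json
import urllib.request
from typing import Any, List, Union


def build_infer_request(base_url: str,
                        nnunet_id: Union[str, List[str]],
                        study_path: str,
                        series_folders: list,
                        output_dir: str,
                        **options: Any) -> urllib.request.Request:
    """Build (but do not send) a POST /infer request with a JSON body."""
    if isinstance(nnunet_id, list):
        # Multi-model requests need one list of series folders per model.
        shape_ok = (
            isinstance(series_folders, list)
            and len(series_folders) == len(nnunet_id)
            and all(isinstance(item, list) for item in series_folders)
        )
        if not shape_ok:
            raise ValueError("series_folders must be a list of lists, one per model")
    payload = {
        "nnunet_id": nnunet_id,
        "study_path": study_path,
        "series_folders": series_folders,
        "output_dir": output_dir,
        **options,  # optional fields such as is_dicom or cascade_mode
    }
    return urllib.request.Request(
        f"{base_url}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "http://localhost:12345",
    ["prostate_whole_gland", "prostate_zone"],
    "/data/study02",
    [["inputs/seriesT2"], ["inputs/seriesT2"]],
    "/data/out/study02",
    is_dicom=True,
    cascade_mode="intersect",
    save_rt_struct_output=True,
)
# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Separating request construction from sending makes the payload easy to inspect or log before any inference is triggered.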