
Deploy a Model API

This API deploys an open-source or fine-tuned model that is in the Ready to Deploy state. In the request body, you can configure deployment parameters, including hyperparameters, scaling settings, the deployment device, and optimization settings, to control how the model scales and performs.

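The response includes the dock status ID (dockStatusId), the model ID, and the deployment status. Because the deployment may still be in progress when the response is returned (the sample response shows IN_PROGRESS), pass the dockStatusId to the Get Dock Status API to verify that the model deployed successfully.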
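A client can automate this check by polling until the status is final. The following is a minimal POSIX shell sketch, not a platform tool. Because this page does not document the Get Dock Status API, the status lookup is passed in as a command that you supply (for example, a hypothetical my_get_dock_status wrapper around that API) that prints one of SUCCESS, IN_PROGRESS, or FAILED. The function name wait_for_deployment and the retry settings are illustrative.

```shell
# wait_for_deployment MAX_TRIES INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD (a user-supplied, hypothetical Get Dock Status wrapper that prints
# SUCCESS, IN_PROGRESS, or FAILED) until the status is final or tries run out.
# Returns 0 on SUCCESS, 1 on FAILED, 2 on timeout, 3 if CMD itself fails.
wait_for_deployment() {
  local max_tries="$1" interval="$2" tries=0 status=""
  shift 2
  while [ "$tries" -lt "$max_tries" ]; do
    status=$("$@") || return 3
    case "$status" in
      SUCCESS) echo "$status"; return 0 ;;
      FAILED) echo "$status"; return 1 ;;
    esac
    tries=$((tries + 1))
    # Wait before the next check, but not after the last one.
    if [ "$tries" -lt "$max_tries" ]; then
      sleep "$interval"
    fi
  done
  echo "$status"
  return 2
}
```

For example, wait_for_deployment 60 10 my_get_dock_status "$DOCK_STATUS_ID" checks up to 60 times at 10-second intervals and prints the last status it saw.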

Method POST
Endpoint https://{host}/api/public/models/{modelId}/deploy?modelType={modelType}
Content Type application/json
Authorization x-api-key header: the API key used for authentication.


URL Parameters

PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
host | The environment host name, for example, gale.kore.ai. | String | Required | N/A
modelId | Path parameter: the ID of the model to deploy. | String | Required | N/A
modelType | Query parameter: the type of model being deployed. | String | Required | ["openSource", "fineTune"]
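As a sketch of how the three URL parameters combine, the POSIX shell function below builds the request URL and rejects an unsupported modelType before any request is sent. The function name build_deploy_url is hypothetical, and stripping an optional https:// prefix and trailing slash from the host is a convenience assumption so that either gale.kore.ai or https://gale.kore.ai can be passed.

```shell
# build_deploy_url HOST MODEL_ID MODEL_TYPE
# Prints the Deploy a Model URL. HOST may include an https:// prefix;
# MODEL_TYPE must be openSource or fineTune (case-sensitive).
build_deploy_url() {
  local host="${1#https://}" model_id="$2" model_type="$3"
  host="${host%/}" # drop a trailing slash, if any
  case "$model_type" in
    openSource | fineTune) ;;
    *)
      echo "invalid modelType: $model_type (expected openSource or fineTune)" >&2
      return 1
      ;;
  esac
  if [ -z "$host" ] || [ -z "$model_id" ]; then
    echo "host and modelId are required" >&2
    return 1
  fi
  printf 'https://%s/api/public/models/%s/deploy?modelType=%s\n' \
    "$host" "$model_id" "$model_type"
}
```

You can then pass the output to curl, for example curl --location "$(build_deploy_url "$HOST" "$MODEL_ID" openSource)", together with the headers and body shown in the samples below.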

Sample Request

For an Open-Source Model

curl --location 'https://dev-agent-platform.kore.ai/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "Flant5_model",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'

For a Fine-Tuned Model

curl --location 'https://preprod-gale.kore.ai/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "gpt2",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'

Body Parameters

The following deployment parameters can be configured and passed in the body:

General Parameters

PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
name | Name of the model to deploy. | String | Required | N/A
isDeployedPreviously | Indicates whether the model was deployed before. | Boolean | Optional | [true, false]

Hyperparameters

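These fields go in the hyperParameters object of the request body.

PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE
temperature | Controls the randomness of the output; higher values produce more varied text. | Float | Required | 0-2
maxTokens | Maximum number of tokens to generate. | Integer | Required | 0-512
topP | Nucleus sampling threshold: the model samples only from the smallest set of tokens whose cumulative probability reaches topP. | Float | Required | 0-1
topK | Top-K sampling: the model samples only from the topK most likely tokens. | Integer | Required | 1-100
stopSequence | Strings that stop generation when the model produces them. | Array | Optional | N/A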
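To catch out-of-range values before sending a request, you can check the hyperparameters on the client. The POSIX shell sketch below checks temperature, maxTokens, topP, and topK against the ranges in the table. It assumes the ranges are inclusive and that integer fields must be whole numbers; the function names are hypothetical, and the server remains the authority on what it accepts.

```shell
# in_range VALUE MIN MAX: succeeds if VALUE is a non-negative decimal number
# with MIN <= VALUE <= MAX (ranges assumed inclusive).
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !((v ~ /^[0-9]+(\.[0-9]+)?$/) && (v + 0 >= lo + 0) && (v + 0 <= hi + 0)) }'
}

# is_int VALUE: succeeds if VALUE consists only of digits.
is_int() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# validate_hyperparameters TEMPERATURE MAX_TOKENS TOP_P TOP_K
validate_hyperparameters() {
  in_range "$1" 0 2 ||
    { echo "temperature must be between 0 and 2" >&2; return 1; }
  { is_int "$2" && in_range "$2" 0 512; } ||
    { echo "maxTokens must be an integer from 0 to 512" >&2; return 1; }
  in_range "$3" 0 1 ||
    { echo "topP must be between 0 and 1" >&2; return 1; }
  { is_int "$4" && in_range "$4" 1 100; } ||
    { echo "topK must be an integer from 1 to 100" >&2; return 1; }
}
```

With the values from the sample requests, validate_hyperparameters 1 512 1 50 succeeds; stopSequence is not checked because the table gives no constraints for it.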

Scaling Parameters

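These fields go in the scalingParameters object of the request body.

PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE
maxBatchSize | Maximum batch size. | Integer | Optional | 1-256
minReplicas | Minimum number of model replicas. | Integer | Optional | 1-10
maxReplicas | Maximum number of model replicas. | Integer | Optional | 1-50
scaleUpDelay | Delay before scaling up, in milliseconds. | Integer | Optional | 1-1000
scaleDownDelay | Delay before scaling down, in milliseconds. | Integer | Optional | 50-2000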
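The scaling values can be checked the same way before a request is sent. This hypothetical validate_scaling function expects all five values, even though the API marks them optional, and adds one check that this page does not state: minReplicas must not exceed maxReplicas (an assumption based on how replica ranges usually work).

```shell
# check_int NAME VALUE MIN MAX: succeeds if VALUE is an integer in [MIN, MAX].
check_int() {
  case "$2" in
    '' | *[!0-9]*)
      echo "$1 must be a non-negative integer" >&2
      return 1
      ;;
  esac
  if [ "$2" -lt "$3" ] || [ "$2" -gt "$4" ]; then
    echo "$1 must be between $3 and $4" >&2
    return 1
  fi
}

# validate_scaling MAX_BATCH_SIZE MIN_REPLICAS MAX_REPLICAS SCALE_UP_DELAY SCALE_DOWN_DELAY
validate_scaling() {
  check_int maxBatchSize "$1" 1 256 &&
    check_int minReplicas "$2" 1 10 &&
    check_int maxReplicas "$3" 1 50 &&
    check_int scaleUpDelay "$4" 1 1000 &&
    check_int scaleDownDelay "$5" 50 2000 || return 1
  # Assumption (not stated on this page): minReplicas <= maxReplicas.
  if [ "$2" -gt "$3" ]; then
    echo "minReplicas ($2) must not exceed maxReplicas ($3)" >&2
    return 1
  fi
}
```

The sample requests' values pass: validate_scaling 10 1 2 30 600.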

Deployment Device & Optimization

PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
deviceType | GPU instance type to deploy the model on. | String | Required | ["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"]
optimizationInfo | Object that holds the optimization settings below. | Object | Optional | N/A
optimizationInfo.optimizationType | Optimization framework to apply. | String | Optional | ["ctranslate2", "vllm"]
optimizationInfo.quantizationType | Quantization to apply to the model. | String | Optional | ["no_quantization", "int8_float16"]
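The allowed device and optimization values can also be checked on the client. In this sketch (validate_deployment_target is a hypothetical name), an empty optimizationType or quantizationType is accepted because both sample requests send empty strings; treating "" as "not set" is an assumption, since the table does not list it.

```shell
# validate_deployment_target DEVICE_TYPE OPTIMIZATION_TYPE QUANTIZATION_TYPE
validate_deployment_target() {
  # deviceType is required and must be one of the documented instance types.
  case "$1" in
    g4dn.xlarge | g5.xlarge | g5.2xlarge | g6e.xlarge | \
      g4dn.12xlarge | g5.12xlarge | g5.48xlarge | g4dn.metal) ;;
    *) echo "unsupported deviceType: $1" >&2; return 1 ;;
  esac
  # Assumption: an empty string means "not set", as in the sample requests.
  case "$2" in
    '' | ctranslate2 | vllm) ;;
    *) echo "unsupported optimizationType: $2" >&2; return 1 ;;
  esac
  case "$3" in
    '' | no_quantization | int8_float16) ;;
    *) echo "unsupported quantizationType: $3" >&2; return 1 ;;
  esac
}
```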

Sample Response

{
  "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
  "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
  "jobType": "MODELS",
  "action": "DEPLOY",
  "status": "IN_PROGRESS"
}

Response Parameters

PARAMETER | DESCRIPTION | TYPE
dockStatusId | Unique identifier for tracking the model deployment; pass it to the Get Dock Status API. The sample response returns this field as dock-statusId. | String
modelId | ID of the model being deployed. | String
jobType | Type of job (for example, "MODELS"). | String
action | Action performed ("DEPLOY"). | String
status | Deployment status: "SUCCESS", "IN_PROGRESS", or "FAILED". | String
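To chain this call with the Get Dock Status API, a script needs the dock status ID and the status from the response. The sketch below extracts them with sed, assuming a small, flat JSON response like the sample. Its key pattern accepts both dockStatusId and dock-statusId, since this page uses both spellings. It is not a general JSON parser (a real one such as jq is safer for complex responses), and the function names are hypothetical.

```shell
# json_string_field KEY_BRE: reads JSON from stdin and prints the string value
# of the first "key": "value" pair whose key matches the basic regular
# expression KEY_BRE. Line-oriented; intended only for small, flat responses.
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Matches both "dockStatusId" and "dock-statusId".
DOCK_ID_KEY='dock-\{0,1\}[sS]tatusId'

# read_deploy_response: reads a Deploy a Model response from stdin and prints
# "<dock status ID> <status>", or fails if no dock status ID is present.
read_deploy_response() {
  local body id status
  body=$(cat)
  id=$(printf '%s\n' "$body" | json_string_field "$DOCK_ID_KEY")
  status=$(printf '%s\n' "$body" | json_string_field 'status')
  if [ -z "$id" ]; then
    echo "response does not contain a dock status ID" >&2
    return 1
  fi
  printf '%s %s\n' "$id" "$status"
}
```

For example, if the deploy response was saved to response.json, read_deploy_response < response.json prints the dock status ID and the status separated by a space.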