Deploy a Model API¶
This API deploys an open-source or fine-tuned model that is in the Ready to Deploy state. The request body configures the deployment: hyperparameters, scaling parameters, the device type, and optimization settings.
The API response includes the model ID and the model deployment status. After receiving the response, pass the returned dockStatusId to the Get Dock Status API to verify that the model was deployed successfully.
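The deploy-then-verify flow described above can be sketched as a small polling loop. This is only an illustration: `wait_for_deployment` and `fetch_status` are hypothetical names, and the sketch assumes the Get Dock Status API reports the same status strings ("SUCCESS", "IN_PROGRESS", "FAILED") that this page lists for the deploy response. The actual HTTP call is left to the caller because the Get Dock Status endpoint is not described here.

```python
import time
from typing import Callable

# Assumption: these are the states after which polling should stop.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_deployment(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Call fetch_status() until it returns a terminal status.

    fetch_status is a caller-supplied (hypothetical) function that queries
    the Get Dock Status API with the dockStatusId and returns the status.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"deployment still in progress after {max_attempts} polls")


# Simulated status sequence standing in for real API calls.
simulated = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCESS"])
print(wait_for_deployment(lambda: next(simulated), interval_s=0))
```

Bounding the number of attempts keeps a stuck deployment from blocking the caller indefinitely; the interval and attempt count are arbitrary defaults, not documented values.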
Method | POST |
Endpoint | https://{host}/api/public/models/{modelId}/deploy?modelType={modelType} |
Content Type | application/json |
Authorization | x-api-key - The API key used for authentication. |
Where can I find the API key? Learn more.
Query Parameters¶
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
host | The environment URL. For example, https://gale.kore.ai. | String | Required | N/A |
modelId | The model ID to deploy. | String | Required | N/A |
modelType | Type of model being deployed. | String | Required | ["openSource", "fineTune"] |
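As a minimal sketch of how the three parameters above combine into the endpoint, the helper below assembles the URL and rejects a `modelType` outside the listed values. `build_deploy_url` is a hypothetical name and `cm-123` is a placeholder model ID. The sketch assumes `host` may be supplied with or without the `https://` scheme, since the endpoint template and the example value differ on this point.

```python
from urllib.parse import quote, urlencode

# Allowed values copied from the ENUM VALUES column above.
MODEL_TYPES = ("openSource", "fineTune")


def build_deploy_url(host: str, model_id: str, model_type: str) -> str:
    """Assemble the deploy endpoint from host, modelId, and modelType."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"modelType must be one of {MODEL_TYPES}, got {model_type!r}")
    # Assumption: accept the host with or without the https:// scheme.
    base = host if host.startswith("https://") else f"https://{host}"
    base = base.rstrip("/")
    path = f"/api/public/models/{quote(model_id, safe='')}/deploy"
    return f"{base}{path}?{urlencode({'modelType': model_type})}"


print(build_deploy_url("https://gale.kore.ai", "cm-123", "openSource"))
# https://gale.kore.ai/api/public/models/cm-123/deploy?modelType=openSource
```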
Sample Request¶
For an Open-Source Model
curl --location 'https://dev-agent-platform.kore.ai/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
  "name": "Flant5_model",
  "hyperParameters": {
    "temperature": 1,
    "maxTokens": 512,
    "topP": 1,
    "topK": 50,
    "stopSequence": []
  },
  "scalingParameters": {
    "maxBatchSize": 10,
    "minReplicas": 1,
    "maxReplicas": 2,
    "scaleUpDelay": 30,
    "scaleDownDelay": 600
  },
  "deviceType": "g5.xlarge",
  "optimizationInfo": {
    "optimizationType": "",
    "quantizationType": ""
  },
  "isDeployedPreviously": true
}'
For a Fine-Tuned Model
curl --location 'https://preprod-gale.kore.ai/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
  "name": "gpt2",
  "hyperParameters": {
    "temperature": 1,
    "maxTokens": 512,
    "topP": 1,
    "topK": 50,
    "stopSequence": []
  },
  "scalingParameters": {
    "maxBatchSize": 10,
    "minReplicas": 1,
    "maxReplicas": 2,
    "scaleUpDelay": 30,
    "scaleDownDelay": 600
  },
  "deviceType": "g5.xlarge",
  "optimizationInfo": {
    "optimizationType": "",
    "quantizationType": ""
  },
  "isDeployedPreviously": true
}'
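For callers who prefer a script over curl, the same request can be sketched with Python's standard library. `build_deploy_request` is a hypothetical helper; the URL, model ID, and API key below are placeholders, and the optional body fields are omitted for brevity. The example only constructs the request so that it can be inspected before it is sent.

```python
import json
import urllib.request


def build_deploy_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl samples."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = {
    "name": "Flant5_model",
    "hyperParameters": {"temperature": 1, "maxTokens": 512, "topP": 1, "topK": 50},
    "scalingParameters": {"minReplicas": 1, "maxReplicas": 2},
    "deviceType": "g5.xlarge",
}
request = build_deploy_request(
    "https://gale.kore.ai/api/public/models/cm-123/deploy?modelType=openSource",  # placeholder URL
    "YOUR_API_KEY",  # placeholder key
    body,
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```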
Body Parameters¶
The following deployment parameters can be configured and passed in the body:
General Parameters
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
name | Name of the model to deploy. | String | Required | N/A |
isDeployedPreviously | Indicates if the model was deployed before. | Boolean | Optional | [true, false] |
Hyperparameters
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE |
temperature | Controls randomness of output. | Float | Required | 0-2 |
maxTokens | Maximum tokens allowed. | Int | Required | 0-512 |
topP | Controls nucleus sampling. | Float | Required | 0-1 |
topK | Controls top-K sampling. | Int | Required | 1-100 |
stopSequence | Stop sequences for the model. | Array | Optional | N/A |
Scaling Parameters
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE |
maxBatchSize | Maximum batch size. | Int | Optional | 1-256 |
minReplicas | Minimum replicas. | Int | Optional | 1-10 |
maxReplicas | Maximum replicas. | Int | Optional | 1-50 |
scaleUpDelay | Delay before scaling up (ms). | Int | Optional | 1-1000 |
scaleDownDelay | Delay before scaling down (ms). | Int | Optional | 50-2000 |
Deployment Device & Optimization
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
deviceType | Device type for deployment. | String | Required | ["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"] |
optimizationInfo | Optimization details. | Object | Optional | N/A |
optimizationType | Type of optimization. Set inside optimizationInfo. | String | Optional | ["ctranslate2", "vllm"] |
quantizationType | Type of quantization. Set inside optimizationInfo. | String | Optional | ["no_quantization", "int8_float16"] |
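The constraints in the tables above can be checked on the client before a request is sent. The following is a minimal sketch, not the platform's own validation: `validate_deploy_body` is a hypothetical helper, and it assumes every listed range is inclusive at both ends and that values are numeric.

```python
from typing import List

DEVICE_TYPES = {
    "g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge",
    "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal",
}
# (low, high) bounds copied from the tables; assumed inclusive.
HYPER_RANGES = {"temperature": (0, 2), "maxTokens": (0, 512), "topP": (0, 1), "topK": (1, 100)}
SCALING_RANGES = {
    "maxBatchSize": (1, 256), "minReplicas": (1, 10), "maxReplicas": (1, 50),
    "scaleUpDelay": (1, 1000), "scaleDownDelay": (50, 2000),
}


def validate_deploy_body(body: dict) -> List[str]:
    """Return a list of problems found in a deploy request body."""
    errors = []
    if not body.get("name"):
        errors.append("name is required")
    if body.get("deviceType") not in DEVICE_TYPES:
        errors.append("deviceType is missing or not a listed device")
    # Hyperparameters with a range are listed as required.
    hyper = body.get("hyperParameters", {})
    for key, (low, high) in HYPER_RANGES.items():
        if key not in hyper:
            errors.append(f"hyperParameters.{key} is required")
        elif not low <= hyper[key] <= high:
            errors.append(f"hyperParameters.{key} must be between {low} and {high}")
    # Scaling parameters are optional, so only check the ones provided.
    scaling = body.get("scalingParameters", {})
    for key, (low, high) in SCALING_RANGES.items():
        if key in scaling and not low <= scaling[key] <= high:
            errors.append(f"scalingParameters.{key} must be between {low} and {high}")
    return errors


body = {
    "name": "Flant5_model",
    "hyperParameters": {"temperature": 3, "maxTokens": 512, "topP": 1, "topK": 50},
    "deviceType": "g5.xlarge",
}
print(validate_deploy_body(body))
# ['hyperParameters.temperature must be between 0 and 2']
```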
Sample Response¶
{
"dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
"modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
"jobType": "MODELS",
"action": "DEPLOY",
"status": "IN_PROGRESS"
}
Response Parameters¶
PARAMETER | DESCRIPTION | TYPE |
dockStatusId | The unique identifier for tracking the model deployment. | String |
modelId | The ID of the model that was deployed. | String |
jobType | Specifies the type of job (e.g., "MODELS"). | String |
action | Indicates the performed action ("DEPLOY"). | String |
status | Deployment status ("SUCCESS", "IN_PROGRESS", or "FAILED"). | String |
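A short sketch of reading the response into a typed object follows. `DeployResponse` and `parse_deploy_response` are hypothetical names, and the field names are taken from the sample response. Because the sample spells the tracking key "dock-statusId" while the table lists "dockStatusId", the sketch accepts either spelling rather than assuming one is correct.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployResponse:
    dock_status_id: str
    model_id: str
    job_type: str
    action: str
    status: str


def parse_deploy_response(raw: str) -> DeployResponse:
    """Parse the JSON body returned by the deploy call."""
    data = json.loads(raw)
    # Accept both spellings of the tracking ID seen on this page.
    dock_status_id = data.get("dockStatusId") or data.get("dock-statusId")
    if not dock_status_id:
        raise KeyError("response does not contain a dock status ID")
    return DeployResponse(
        dock_status_id=dock_status_id,
        model_id=data["modelId"],
        job_type=data["jobType"],
        action=data["action"],
        status=data["status"],
    )


raw = '{"dock-statusId": "ds-1", "modelId": "cm-1", "jobType": "MODELS", "action": "DEPLOY", "status": "IN_PROGRESS"}'
parsed = parse_deploy_response(raw)
print(parsed.dock_status_id, parsed.status)
# ds-1 IN_PROGRESS
```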