Deploy a Model API¶

This API deploys an open-source or fine-tuned model in the Ready to Deploy state. Users can configure deployment parameters, including hyperparameters, scaling, and optimization settings, allowing for flexible model scaling and performance tuning.

The API response includes the model ID and the model deployment status. After receiving the response, use the dockStatusId to call the Get Dock Status API and verify the successful deployment of the model.

Method	POST
Endpoint	`https://{host}/api/public/models/:{modelId}/deploy?modelType={modelType}`
Content Type	application/json
Authorization	`X-api-key` - The API key used for authentication.

Where can I find the API key? Learn more.

Query Parameters¶

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
host	The environment URL. For example, `https://gale.kore.ai`.	String	Required	N/A
modelId	The model ID to deploy.	String	Required	N/A
modelType	Type of model being deployed.	String	Required	["openSource", "fineTune"]

Sample Request¶

For an Opensource Model Source

curl --location 'https://dev-agent-platform.kore.ai/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "Flant5_model",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviouly": true
  }'

For a Finetune Model Source

curl --location 'https://preprod-gale.kore.ai/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "gpt2",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviouly": true
  }'

Body Parameters¶

The following deployment parameters can be configured and passed in the body:

General Parameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
name	Name of the model to deploy.	String	Required	N/A
isDeployedPreviously	Indicates if the model was deployed before.	Boolean	Optional	[true, false]

Hyperparameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
temperature	Controls randomness of output.	Float	Required	0-2
maxTokens	Maximum tokens allowed.	Int	Required	0-512
topP	Controls nucleus sampling.	Float	Required	0-1
topK	Controls top-K sampling.	Int	Required	1-100
stopSequence	Stop sequences for the model.	Array	Optional	N/A

Scaling Parameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	RANGE
maxBatchSize	Maximum batch size.	Int	Optional	1-256
minReplicas	Minimum replicas.	Int	Optional	1-10
maxReplicas	Maximum replicas.	Int	Optional	1-50
scaleUpDelay	Delay before scaling up (ms).	Int	Optional	1-1000
scaleDownDelay	Delay before scaling down (ms).	Int	Optional	50-2000

Deployment Device & Optimization

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
deviceType	Device type for deployment.	String	Required	["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"]
optimizationInfo	Optimization details.	Object	Optional	N/A
optimizationType	Type of optimization.	String	Optional	["ctranslate2", "vllm"]
quantizationType	Type of quantization.	String	Optional	["no_quantization", "int8_float16"]

Sample Response¶

{
  "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
  "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
  "jobType": "MODELS",
  "action": "DEPLOY",
  "status": "IN_PROGRESS"
}

Response Parameters¶

PARAMETER	DESCRIPTION	TYPE
dockStatusId	The unique identifier for tracking the model deployment.	String
guardrail	The model that was deployed.	String
jobType	Specifies the type of job (e.g., "MODELS").	String
action	Indicates the performed action ("DEPLOY").	String
status	Deployment status ("SUCCESS", "IN_PROGRESS", or "FAILED").	String