Model Deployment & Serving

Deploy ML models as REST endpoints or batch jobs using NATIS Model Serving.

7 min read · Updated May 2025

NATIS Model Serving deploys registered MLflow models as production REST endpoints with auto-scaling, A/B testing, traffic splitting, and built-in monitoring. No Kubernetes or container expertise required.

Deploying a Model via the UI

  1. Navigate to AI/ML → Model Registry and select your model.
  2. Click Create Serving Endpoint in the top-right corner.
  3. Give your endpoint a name (e.g., customer-churn-v1).
  4. Select the model version to deploy (the Staging or Production version).
  5. Configure compute: CPU Tiny (1 core), Small (4 cores), Medium (8 cores), or GPU (1 × NVIDIA T4).
  6. Set the auto-scaling bounds: Min Provisioned Concurrency and Max Concurrency.
  7. Click Create Endpoint. The endpoint will be live in ~5 minutes.
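
If you prefer to script endpoint creation rather than click through the UI, the same configuration can in principle be submitted over a REST API. The sketch below is illustrative only: the API path, field names, and payload shape are assumptions, not a documented NATIS contract.

PYTHON
import requests

token = "dapiXXXXXXXX"  # Personal Access Token

# Hypothetical payload mirroring the UI fields above; every field name
# and the /api/serving-endpoints path are assumptions for illustration.
config = {
    "name": "customer-churn-v1",
    "model_version": "Production",   # or "Staging"
    "compute": "CPU Small",          # Tiny | Small | Medium | GPU
    "min_provisioned_concurrency": 1,
    "max_concurrency": 8,
}

resp = requests.post(
    "https://app.natis.vn/api/serving-endpoints",  # assumed path
    headers={"Authorization": f"Bearer {token}"},
    json=config,
)
resp.raise_for_status()
print(resp.json())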

Calling the REST Endpoint

Once the endpoint is live, send prediction requests to its /invocations URL, authenticating with a Personal Access Token. Tip: before fully promoting a new model version, use the A/B Testing configuration (Endpoint Settings → Traffic Splitting) to route only a percentage of traffic to it.
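
Conceptually, a traffic split is a weighted route table over model versions. The sketch below illustrates that idea as a Python literal; the real schema behind the Traffic Splitting settings may differ, and every field name here is an assumption.

PYTHON
# Hypothetical 90/10 split between two registered model versions;
# field names are assumptions, not a documented NATIS schema.
traffic_split = {
    "routes": [
        {"model_version": "1", "traffic_percentage": 90},  # current version
        {"model_version": "2", "traffic_percentage": 10},  # candidate
    ]
}

A full single-record request/response round trip looks like this: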

PYTHON
import requests

endpoint_url = "https://app.natis.vn/serving-endpoints/customer-churn-v1/invocations"
token = "dapiXXXXXXXX"  # Personal Access Token

# Prepare input data
payload = {
    "dataframe_records": [
        {
            "customer_id": "cust_123",
            "total_orders_90d": 12,
            "total_spend_90d": 2450.00,
            "avg_order_value": 204.17,
            "days_since_last_order": 7,
            "clv_tier": "medium"
        }
    ]
}

response = requests.post(
    endpoint_url,
    headers={"Authorization": f"Bearer {token}"},
    json=payload,  # requests serializes the dict and sets Content-Type
)
response.raise_for_status()  # fail fast on auth or malformed-input errors

result = response.json()
print(f"Churn probability: {result['predictions'][0]:.2%}")
# Output: Churn probability: 23.40%
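
Because dataframe_records is a list, several records can be scored in one request, which covers small batch jobs; with standard MLflow serving semantics, predictions[i] corresponds to the i-th input record. A minimal sketch continuing from the example above (the second customer record is made-up sample data):

PYTHON
# Reuses requests, endpoint_url, and token from the example above.
batch_payload = {
    "dataframe_records": [
        {"customer_id": "cust_123", "total_orders_90d": 12,
         "total_spend_90d": 2450.00, "avg_order_value": 204.17,
         "days_since_last_order": 7, "clv_tier": "medium"},
        # Illustrative second record (fabricated sample values).
        {"customer_id": "cust_456", "total_orders_90d": 2,
         "total_spend_90d": 180.50, "avg_order_value": 90.25,
         "days_since_last_order": 45, "clv_tier": "low"},
    ]
}

response = requests.post(
    endpoint_url,
    headers={"Authorization": f"Bearer {token}"},
    json=batch_payload,
)
response.raise_for_status()  # surface auth/input errors early
for record, prob in zip(batch_payload["dataframe_records"],
                        response.json()["predictions"]):
    print(f"{record['customer_id']}: churn probability {prob:.2%}")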
