Setup LLM Provider in VS Code Extensions (Cline, Kilocode)
All models are accessed through Cloudflare AI Gateway (
demo-hkmci). Users only need their personal gateway token (cfut_...) β no provider keys required.
π€ User Guide
Step 1 β Get your Gateway Token
API Token: [Ask your admin β starts with cfut_...]
Step 2 β Configure the Extension
Cline:
- Click the Cline icon in the Activity Bar
- Click the Settings (gear) icon at the top of the Cline panel
- Under Provider, select OpenAI Compatible
- Fill in the fields below
Kilocode:
- Click the Kilocode icon in the Activity Bar
- Click your profile/account icon (bottom-left of the Kilocode panel) β Settings
- Go to the Providers tab
- Click + Add Provider and select OpenAI Compatible
- Fill in the fields below
| Field | What to enter |
|---|---|
| Provider | OpenAI Compatible |
| Base URL | (copy from the model table below β include the full URL) |
| API Key | (your gateway token: cfut_...) |
| Model | (type the Model ID manually β see note below) |
Important β Model field: The model list will not auto-populate. You must type the Model ID directly into the model field (e.g.
minimaxai/minimax-m2-maas). Do not wait for a dropdown.
API Key: Enter the
cfut_...token as-is. The extension sends it asAuthorization: Bearer cfut_...which the gateway accepts.
Step 3 β Select a Model
Use the tables below to pick a Base URL + Model ID pair. Every model uses OpenAI Compatible as the provider type.
Step 4 β Verify it Works
After saving, open a new chat in the extension and send a short message (e.g. βsay hiβ). If you see a response, the setup is complete. If you get an auth error, double-check the API Key. If you get a model error, make sure you typed the Model ID exactly as shown.
Screenshots
Kilocode:

Cline:

Available Models
Google Gemini (via Google AI Studio)
Base URL:
https://gateway.ai.cloudflare.com/v1/b326904912840c25f63808a1d1e479aa/demo-hkmci/google-ai-studio/v1beta/openai
| Model | Model ID | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | gemini-3.1-pro-preview |
$2 / $4* | $12 / $18* | 1M | Best quality; preview |
| Gemini 3 Flash | gemini-3-flash-preview |
$0.50 | $3 | 1M | Fast & balanced; preview |
| Gemini 3.1 Flash Lite | gemini-3.1-flash-lite-preview |
$0.25 | $1.50 | 1M | Cheapest; preview |
Prices per million tokens (USD). * = higher price applies when input > 200K tokens.
Model IDs verified working via CF AI Gateway on 2026-04-09.
GCP Vertex AI MaaS β MiniMax, Kimi, GLM
Base URL:
https://gateway.ai.cloudflare.com/v1/b326904912840c25f63808a1d1e479aa/demo-hkmci/google-vertex-ai/v1/projects/mcps-testing-cloudflare-ai/locations/global/endpoints/openapi
| Model | Model ID | Input | Output | Context | Notes |
|---|---|---|---|---|---|
| MiniMax-M2 | minimaxai/minimax-m2-maas |
$0.30 | $1.20 | 1M | Coding & agentic; 10B active / 230B total MoE |
| Kimi K2 Thinking | moonshotai/kimi-k2-thinking-maas |
$0.60 | $2.50 | 256K | Thinking model; 32B active / 1T total MoE |
| GLM-5 | zai-org/glm-5-maas |
$1.00 | $3.20 | 128K | Latest GLM; 40B active / 744B total |
| GLM-4.7 | zai-org/glm-4.7-maas |
$0.60 | $2.20 | 128K | Stable; good multilingual |
Prices per million tokens (USD). Source: GCP Vertex AI partner model pricing.
All model IDs verified working via CF AI Gateway on 2026-04-09.
Newer versions (MiniMax-M2.5, Kimi K2.5, GLM-5.1) exist in GCP Model Garden but are deploy-only β no MaaS endpoint yet.
Cloudflare Workers AI (CF-hosted, no GCP cost)
Base URL:
https://gateway.ai.cloudflare.com/v1/b326904912840c25f63808a1d1e479aa/demo-hkmci/workers-ai/v1
| Model | Model ID | Cost | Context | Notes |
|---|---|---|---|---|
| Nvidia Nemotron 3 120B | @cf/nvidia/nemotron-3-120b-a12b |
CF Neurons | 128K | 12B active MoE |
| Moonshot Kimi K2.5 | @cf/moonshotai/kimi-k2.5 |
CF Neurons | 256K | CF-hosted version of Kimi K2.5 |
| GLM 4.7 Flash | @cf/zai-org/glm-4.7-flash |
CF Neurons | 131K | Fast, multilingual |
Workers AI uses Cloudflareβs own infrastructure (not GCP). Billed in CF Neurons ($0.011/1K Neurons) with a free daily allowance. These are different model instances from the GCP Vertex AI ones above.
Anthropic Claude (Under Investigation)
Two paths are available β choose based on what the admin has configured:
Option A β Direct Anthropic via Gateway (requires Anthropic BYOK key in gateway) Provider Type: Anthropic Base URL:
https://gateway.ai.cloudflare.com/v1/b326904912840c25f63808a1d1e479aa/demo-hkmci/anthropic
Option B β Claude via GCP Vertex AI (requires Claude access approved in GCP) β οΈ Pending Provider Type: Anthropic Base URL:
https://gateway.ai.cloudflare.com/v1/b326904912840c25f63808a1d1e479aa/demo-hkmci/google-vertex-ai/v1/projects/mcps-testing-cloudflare-ai/locations/us-east5/publishers/anthropic/models
| Model | Model ID | Input | Output |
|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 |
$5.00 | $25.00 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 |
$3.00 | $15.00 |
Prices per million tokens via GCP Vertex AI.
Which Model Should I Use?
| Goal | Recommended Model |
|---|---|
| Best quality / complex reasoning | Gemini 3.1 Pro or Kimi K2 Thinking |
| Coding & agentic tasks | MiniMax-M2 or GLM-5 |
| Fast & cheap for everyday tasks | Gemini 3 Flash or Gemini 3.1 Flash Lite |
| Long context (>128K) | Gemini models or Kimi K2 Thinking |
| Multilingual | GLM-4.7 or Gemini 3 Flash |
| Zero/low cost (within free quota) | Cloudflare Workers AI models |
π§ Admin Guide: Gateway Setup
Architecture Overview
Cline / Kilocode
β cfut_... (per-user gateway token)
βΌ
Cloudflare AI Gateway (demo-hkmci)
βββ /google-ai-studio/... β Google Gemini [BYOK: AI Studio key]
βββ /google-vertex-ai/... β GCP Vertex AI MaaS [BYOK: service account JSON] β
βββ /workers-ai/v1 β CF Workers AI [built-in, CF Neurons]
βββ /anthropic β Anthropic Claude [BYOK: not configured]
Each user gets their own cfut_... token created from the gateway dashboard. No provider keys are ever exposed to users.
Gateway Details
| Setting | Value |
|---|---|
| Account | Master Concept Demo (mcmsp.dev) |
| Account ID | b326904912840c25f63808a1d1e479aa |
| Gateway Name | demo-hkmci |
| Authentication | Enabled |
| Gateway Token | created per user from dashboard; distributed as cfut_... |
Provider Keys (BYOK) Configuration
Navigate to: Cloudflare Dashboard β AI β AI Gateway β demo-hkmci β Provider Keys
| Provider | Alias | Key Source | Status |
|---|---|---|---|
| Google AI Studio | default |
aistudio.google.com/apikey |
β Configured |
| Google Vertex AI | default |
GCP service account JSON | β Configured (MaaS: MiniMax-M2, Kimi K2, GLM-5/4.7) |
| Anthropic | default |
console.anthropic.com |
β¬ Not configured |
Always set alias to
defaultso the gateway injects it automatically without requiring users to hold provider keys.
Customer Provisioning
Option A β Dashboard (manual)
AI Gateway β demo-hkmci β Settings β Authenticated Gateway β Create authentication token
- Name: userβs name (e.g.
alex) - Permission: Run
Copy the cfut_... value β it is shown only once.
Option B β Script (automated)
Gateway tokens are standard Cloudflare API tokens and can be created programmatically using /user/tokens.
One-time setup: Create a master CF API token from the dashboard with:
- User β API Tokens β Edit permission
Store it securely β this is the only dashboard step required.
Provision a token for each user:
CF_API_TOKEN=<master-token> \
CF_ACCOUNT_ID=b326904912840c25f63808a1d1e479aa \
python3 /Users/zorro/Dev/cloudflare/create_ai_gateway_token.py \
--gateway-id demo-hkmci \
--token-name alex
The script prints the cfut_... value β save it immediately, it will not be shown again.
Script location:
/Users/zorro/Dev/cloudflare/create_ai_gateway_token.py
How to Deliver Config to Users
Send users the following (nothing else β no provider keys):
Cloudflare AI Gateway β VS Code Setup
API Token: cfut_... β their personal token
For Base URLs, model IDs, and setup steps, see:
https://sharehub.zorro.hk/documents/2026-04-01-vscode-llm-provider-setup.html
Revoke a user
Dashboard: AI Gateway β demo-hkmci β Settings β Authenticated Gateway β [find token] β Delete
API:
# First get the token ID
curl "https://api.cloudflare.com/client/v4/user/tokens" \
-H "Authorization: Bearer <master-token>"
# Then delete by ID
curl -X DELETE "https://api.cloudflare.com/client/v4/user/tokens/<token-id>" \
-H "Authorization: Bearer <master-token>"
Revocation is immediate.
GCP Vertex AI Setup
Billing is enabled. Service account key is stored in the gateway as the Google Vertex AI BYOK.
- GCP Project:
mcps-testing-cloudflare-ai - Service Account:
cloudflare-ai-gateway-vertex@mcps-testing-cloudflare-ai.iam.gserviceaccount.com - Active MaaS models: MiniMax-M2, Kimi K2 Thinking, GLM-5, GLM-4.7 (all via
locations/global/endpoints/openapi)
Adding a New Provider
- Get the API key for the provider
- Go to demo-hkmci β Provider Keys β find provider β click
+ - Paste the key, set alias to
default, save - Update the User Guide table above with the new Base URL and model IDs
- Republish this note:
/kf-cli:publish 2026-04-01-vscode-llm-provider-setup.md
Captured: 2026-04-01 Last Updated: 2026-04-09 (added Vertex AI MaaS pricing; added Worker proxy for per-customer key provisioning)