Skip to content

Ollama User Guide - External API Version

Before You Begin

The following items must be ready before following this guide.

Item Description
gcube account Sign up at gcube.ai
Credit balance GPU usage costs apply on the gcube platform (billed hourly)

Overview

What is Ollama?

Ollama is a platform that lets you download and run open-source AI language models in a local environment. This guide walks you through running Ollama on gcube's cloud GPU environment and using the DeepSeek model via Chatbox WebUI.

Representative AI models available in Ollama:

Model Developer Features
Llama 3 Meta Excellent natural language processing
Phi 3 Microsoft Research Strong reasoning and language understanding
Mistral Mistral AI Optimized for various language tasks
Gemma 2 Google Strong at natural language processing and generation
CodeGemma Google Specialized in code generation and completion

Step 0 — Create and Log In to gcube Account

0-1. Sign Up

Go to https://gcube.ai and click the "Sign Up" button in the upper right. Complete email verification to create your account.

0-2. Log In

After signing up, log in on the same page.

0-3. Check Credits

gcube charges based on GPU usage time. Check your credit balance on the dashboard before use.

Billing Notice

Workloads are billed hourly from the time of deployment until stopped. Always stop the workload after use. Refer to Workload - Stop Workload for instructions.


Step 1 — Register Workload on gcube

1-1. Access Workload Page

Go to https://gcube.ai/ko/demand/workload/list.

① Register a new workload or ② select an existing one to modify.

New workload registration screen


1-2. Enter Description

Enter the workload name.

Example: ollama

Workload description input screen


1-3. Container Settings

Enter the following in order.

Item Value
Registry Type Docker Hub
Container Image ollama/ollama:latest
Container Port 11434

Tip

After entering the container image, click Validate Image next to it. Once validation is complete, the container port (11434) will be filled in automatically.

Official image reference: https://hub.docker.com/r/ollama/ollama

Container settings screen


1-4. Environment Variable Settings

Enter the following two environment variables. Both are required.

Key Value Description
OLLAMA_HOST 0.0.0.0 Allow access from all network interfaces (required for external WebUI connection)
OLLAMA_ORIGINS * Disable CORS restrictions to allow API calls from external clients such as Chatbox

Options settings screen


1-5. Select GPU Specs

Select the GPU specs that match your use case.

Tier Description
Tier 1 High performance
Tier 2 High reliability
Tier 3 Individual users

Model and GPU Selection Guide

The required GPU memory varies based on model size (number of parameters).

Model Tag Parameters Recommended GPU Memory
deepseek-r1:1.5b 1.5B 4GB or more
deepseek-r1:7b 7B 8GB or more
deepseek-r1:8b 8B 8GB or more
deepseek-r1:14b 14B 16GB or more

This guide uses the deepseek-r1:8b and Tier 3 — RTX 4070 (12GB) combination for a good balance of performance and speed.

GPU specs selection screen


1-6. Final Confirmation and Deploy

Check the estimated hourly cost for the selected specs.

Billing Information

The amount shown is the maximum hourly rate. You are billed proportionally to actual usage time, so always stop the workload after testing.

Select 'Deploy Immediately' and click 'Register' to complete deployment.

Estimated cost confirmation screen


Step 2 — Run DeepSeek Model

2-1. Check Created Workload

Click the workload name on the workload page to enter the detail screen.

Workload info screen

Key items available on the detail screen:

  • Overview: Workload number, status, service URL, etc.
  • Container: Image, port, creation/deploy/termination timestamps, etc.
  • GPU Specs: GPU information, etc.
  • Deployment Status: Pod status, container logs, terminal, etc.

Workload detail screen


2-2. Launch Container Terminal

When the pod status shows 'Running', click Container Terminal to launch it.

Tip

Immediately after deployment, it may take a few minutes for the pod to be ready. Wait until the status shows 'Running' before proceeding.

Container terminal launch screen


2-3. Download and Run DeepSeek Model

Enter the following command in the terminal. The model is approximately 4.7GB and may take a few minutes to download.

ollama run deepseek-r1:8b

Installation preparation screen

Once the download is complete, the model will start automatically.


Step 3 — Connect DeepSeek via Chatbox WebUI

Chatbox is a WebUI you can use directly in the browser without installation. Enter the gcube workload's service URL as the API host to connect to the running DeepSeek model.

Go to https://web.chatboxai.app/copilots.


3-1. Configure gcube API

When you first access Chatbox, the following screen appears. Click the "Settings" button at the bottom of the left menu bar.

Chatbox start popup


3-2. Select Ollama API

Click "Model Provider" in the settings menu. Then select "Ollama" from the list of AI models.

Chatbox model provider screen


3-3. Enter Service URL and Model

Copy the "Service URL" from the Overview tab of the gcube workload detail screen.

Tip

The service URL is located in the Overview section at the top of the workload detail screen. It takes the form https://xxxxxxxx.gcube.ai.

Paste the copied Service URL into the API Host field. Click "Import" next to the model field — a popup will appear showing the downloaded model. Click the "+" next to the model you want to add, then close the popup to add it.

Service URL and model input screen


3-4. Start Using

Click "ESC" to return to the main screen.

Chatbox main screen

Click "New Chat" from the left menu.

New chat screen

Click ① "Select Model" at the bottom of the chat input area, then click ② the downloaded model.

Model selection screen

You can now use it freely.

Chat screen


Usage Example

Q: Who does Dokdo belong to?

Usage example screen


Step 4 — Stop and Delete Workload

Important

If you do not stop the workload, charges will continue to accrue even when not in use.

4-1. Stop Workload

Click the "Stop Deployment" button for the running workload in workload management. When the status changes to 'Deployment Stopped', billing stops.

Tip

After stopping, you may need to re-download the model when restarting. If you use it frequently, consider keeping it running and deleting it when done instead of stopping.

Stop workload screen

4-2. Delete Workload

If you no longer need the workload, delete it from the workload list. Deleting removes all data inside the container (including downloaded models).


Troubleshooting (FAQ)

Q. The pod status is not changing to 'Running'.

Some preparation time is needed immediately after deployment. Try refreshing the page after a few minutes. If still not resolved, check the container logs in the Deployment Status tab.

Q. Model download is too slow.

The DeepSeek-r1:8b model is approximately 4.7GB. It may take time depending on network conditions. Do not close the terminal and wait until the download is complete.

Q. The model is not showing in Chatbox.

Check the following in order:

  1. Confirm the workload pod status is 'Running'
  2. Verify the service URL entered in the API Host is correct (check if https:// is included)
  3. Confirm the model name is entered exactly as deepseek-r1:8b

Q. Do I need to reinstall the model after stopping and restarting the workload?

If you Stop the container and restart it, existing data may not be preserved. If you Delete it, you must re-download the model.