
Token Factory

Serverless inference API for the best open-source models. Pay per token, scale instantly, no infrastructure to manage.

Start for free

Begin with $1 in free credits to explore our models through the Playground or API. Start building in minutes.

Playground

A web interface to try out and compare different AI models without writing any code. Test prompts, adjust parameters, see results instantly.

Two flavors

Choose between the fast flavor for latency-sensitive tasks and the base flavor for economical processing of larger workloads.

Text to text

Prices shown are per 1 million tokens. Batch inference is automatically billed at 50% of the base real-time model price.

Model Flavor Input / 1M tokens Output / 1M tokens
DeepSeek-R1-0528 FAST $2.00 $6.00
DeepSeek-R1-0528 BASE $0.80 $2.40
DeepSeek-V3-0324 FAST $0.75 $2.25
DeepSeek-V3-0324 BASE $0.50 $1.50
Llama-3.3-70B-Instruct FAST $0.25 $0.75
Llama-3.3-70B-Instruct BASE $0.13 $0.40
Llama-3.1-405B-Instruct BASE $1.00 $3.00
Llama-3.1-8B-Instruct FAST $0.03 $0.09
Llama-3.1-8B-Instruct BASE $0.02 $0.06
Qwen3-235B-A22B BASE $0.20 $0.80
Qwen3-32B FAST $0.20 $0.60
Qwen3-32B BASE $0.10 $0.30
QwQ-32B FAST $0.50 $1.50
QwQ-32B BASE $0.15 $0.45
Gemma-2-9b-it BASE $0.03 $0.09
Gemma-2-2b-it BASE $0.02 $0.06
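As a worked example of the pricing rule above (batch inference billed at 50% of the base real-time price), here is a small cost calculator. The function name is illustrative; the prices used come from the table:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price: float, output_price: float,
               batch: bool = False) -> float:
    """Cost in dollars; prices are quoted per 1M tokens.

    Batch inference is billed at 50% of the base real-time price.
    """
    cost = (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price
    return cost * 0.5 if batch else cost

# Llama-3.3-70B-Instruct, BASE flavor: $0.13 in / $0.40 out per 1M tokens
realtime = token_cost(2_000_000, 500_000, 0.13, 0.40)
batched = token_cost(2_000_000, 500_000, 0.13, 0.40, batch=True)
print(f"real-time: ${realtime:.2f}, batch: ${batched:.2f}")
# real-time: $0.46, batch: $0.23
```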

Vision

Multimodal models that accept both text and image inputs. Prices per 1 million tokens.

Model Flavor Input / 1M tokens Output / 1M tokens
Qwen2.5-VL-72B-Instruct BASE $0.30 $0.90
Llama-3.2-11B-Vision BASE $0.05 $0.15
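Assuming the vision models accept the OpenAI-style multimodal message format (text plus image_url content parts, which is how OpenAI-compatible APIs typically handle images), a request payload would look like this. The image URL is a placeholder:

```python
# Hypothetical payload, assuming OpenAI-style multimodal messages.
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.png"}},  # placeholder
    ],
}]
# This payload would then be passed to the chat completions endpoint, e.g.:
# client.chat.completions.create(model="Qwen2.5-VL-72B-Instruct",
#                                messages=messages)
print(len(messages[0]["content"]))  # two content parts: text + image
```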

Embeddings

Convert text into high-dimensional vector representations for search, similarity, and retrieval.

Model Price / 1M tokens
BAAI/bge-en-icl $0.02
BAAI/bge-multilingual-gemma2 $0.02
intfloat/e5-mistral-7b-instruct $0.02
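A typical use of these vectors is ranking documents by cosine similarity to a query. A minimal sketch with tiny stand-in vectors (real embeddings from the models above are high-dimensional and would come from the API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in 3-d vectors for illustration only.
query = [0.1, 0.9, 0.2]
docs = {
    "doc_a": [0.1, 0.8, 0.3],   # semantically close to the query
    "doc_b": [0.9, 0.1, 0.0],   # semantically distant
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```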

How it works

01

Get your API key

Sign up and receive an API key instantly. Start with $1 in free credits — no credit card required.

02

Call the API

OpenAI-compatible API. Switch your base URL and you're running on {{COMPANY_NAME}} infrastructure. Drop-in replacement.

03

Scale automatically

No capacity planning. We handle auto-scaling, load balancing, and failover. You just send requests.

OpenAI-compatible API

Switch your existing OpenAI code to {{COMPANY_NAME}} with a single line change. Our API is fully compatible — same request format, same response structure.

Supports streaming, function calling, JSON mode, and all standard chat completion parameters.

from openai import OpenAI

# Point the standard OpenAI client at the Token Factory endpoint.
client = OpenAI(
    base_url="https://api.company.com/v1",  # placeholder base URL
    api_key="your-api-key",                 # placeholder key
)

response = client.chat.completions.create(
    model="DeepSeek-R1-0528",
    messages=[{
        "role": "user",
        "content": "Explain transformers."
    }]
)

Questions and answers

Which models are available?

We host the most popular open-source models, including DeepSeek R1 & V3, Llama 3.3 & 3.1 (8B to 405B), Qwen3 (14B to 235B), QwQ-32B, Gemma 2, and embedding models. New models are added regularly.

What is the difference between the fast and base flavors?

Fast flavor uses more GPU resources per request for lower latency, making it ideal for real-time applications. Base flavor is optimized for throughput and cost, making it ideal for batch processing and async workloads.

Is the API compatible with OpenAI SDKs?

Yes. Our API follows the OpenAI chat completions format. You can use any OpenAI SDK by changing the base URL and API key. Streaming, function calling, and JSON mode are all supported.

How does batch inference pricing work?

Batch inference is automatically billed at 50% of the base real-time model price. Submit requests in bulk and results are returned asynchronously, which is ideal for large-scale data processing.

What are the rate limits?

Default rate limits are generous and scale with your usage. For enterprise workloads requiring higher limits, contact our sales team for custom arrangements.

Start building with Token Factory

$1 in free credits to get started. No credit card required.


All prices are shown exclusive of applicable taxes, including VAT. Prices are per 1 million tokens unless otherwise noted.