Fly.io Pricing in 2026: Plans, Cost & Free Trial

Every Fly.io plan, what's actually included at each tier, and whether the cost holds up against the alternatives.

AIPriceRadar Updated 2026-06-16 8 min read

Fly.io Plans & Pricing

Fly.io uses usage-based pricing: you pay for machine compute time by the second, volume storage by the GB per month, and outbound bandwidth above the free allowance.

There are no fixed plan tiers — a free monthly allowance covers basic usage, and everything beyond that is metered.

For AI applications, the key cost drivers are machine size (RAM and CPU requirements determine which machine type fits your AI workload), volume storage for model weights, and regional machine count (more regions means more machines running).

No credit card is required for the free allowance.

Plan	Price	Best For
Hobby (Free)	Free	Individuals & light usage
Pay-as-you-go	$5/mo	Most popular choice
Scale	Custom	Enterprise & custom needs

Is Fly.io Worth the Price?

Fly.io's pricing advantage for AI teams comes from two places: the global distribution model and the persistent VM architecture.

On global distribution: deploying an AI API to three regions on AWS means three EC2 instances, three Application Load Balancers, three sets of VPC networking, and three times the configuration and maintenance overhead. On Fly.io, adding a second or third region adds only the compute cost for machines in those regions: no additional infrastructure fees, no per-region load balancers, no VPC configuration. Fly.io's anycast networking routes users to the nearest region automatically.

For a three-region AI inference deployment serving users globally, Fly.io's total cost is typically 40-60% less than an equivalent multi-region AWS setup when infrastructure overhead is included.

Fly.io has no named plan tiers — instead, a free monthly allowance plus per-unit pricing applies to all accounts. The free monthly allowance includes 3 shared-CPU VMs (256MB RAM each), 3GB of persistent volume storage, and 160GB of outbound data transfer.

No credit card is required. This allowance resets each month and is sufficient for deploying small AI applications, testing the deployment workflow, and evaluating Fly.io's architecture before committing to paid resources.

Shared-CPU machines start around $1.94/month for the smallest 256MB RAM size. For AI workloads requiring dedicated compute, performance machines provide guaranteed CPU capacity. A performance-1x machine (1 dedicated vCPU, 2GB RAM) runs approximately $15.49/month. A performance-2x machine (2 dedicated vCPU, 4GB RAM) costs approximately $30.74/month. A performance-4x (4 vCPU, 8GB RAM) runs approximately $61.48/month.

These prices assume the machine runs continuously for the full month — machines stopped or destroyed mid-month are only charged for runtime.
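As a rough illustration of how per-second billing prorates the monthly figures above, the sketch below converts a quoted monthly price into a per-second rate and multiplies by actual runtime. The 30-day month and the function name `prorated_cost` are assumptions made for this example; the rates Fly.io actually bills are the ones listed at fly.io/pricing.

```python
# Monthly prices quoted in this guide (USD, machine running the whole month).
MONTHLY_PRICE_USD = {
    "performance-1x": 15.49,
    "performance-2x": 30.74,
    "performance-4x": 61.48,
}

# Assumption for this sketch: a 30-day billing month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def prorated_cost(machine: str, seconds_running: int) -> float:
    """Estimate the charge for a machine that ran only part of the month."""
    per_second = MONTHLY_PRICE_USD[machine] / SECONDS_PER_MONTH
    return per_second * seconds_running


# A performance-2x machine running 8 hours a day for 22 working days.
seconds = 8 * 60 * 60 * 22
print(f"${prorated_cost('performance-2x', seconds):.2f}")  # $7.51
```

Under these assumptions, a development machine that is stopped outside working hours costs about a quarter of the always-on price.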


Fly.io Free Trial: What's Included?

Fly.io's free monthly allowance functions as a permanent free tier — not a time-limited trial. Three shared-CPU VMs, 3GB of volume storage, and 160GB of outbound bandwidth are included every month at no cost, with no credit card required.


Frequently Asked Questions

Quick Answer

How much does Fly.io cost per month for an AI inference API?

A single-region Python AI inference API on Fly.io costs approximately $30-65/month depending on machine size. A performance-2x machine (2 dedicated vCPU, 4GB RAM) runs about $31/month and handles FastAPI services with LangChain or similar frameworks. Add a 10GB volume for model or data storage at $1.50/month. For a two-region deployment serving global users, double the machine costs. GPU Machines cost significantly more — contact Fly.io for current GPU pricing. Compare this to Render's Starter plan at $25/month (1 CPU, 2GB RAM), which is cheaper for low-RAM workloads but more expensive for 4GB+ RAM requirements.
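The single-region estimate above is just machine price plus volume price. A minimal sketch of that arithmetic, using only the figures quoted in this guide (the function name is made up for the example):

```python
PERFORMANCE_2X_MONTHLY_USD = 30.74  # figure quoted in this guide
VOLUME_USD_PER_GB_MONTH = 0.15      # figure quoted in this guide


def single_region_monthly_cost(machine_usd: float, volume_gb: float) -> float:
    """Machine running all month plus one persistent volume."""
    return machine_usd + volume_gb * VOLUME_USD_PER_GB_MONTH


estimate = single_region_monthly_cost(PERFORMANCE_2X_MONTHLY_USD, volume_gb=10)
print(f"${estimate:.2f}/month")  # $32.24/month
```

Outbound bandwidth above the free allowance is not included here; for most text-only APIs it stays within the allowance.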

Is Fly.io free to start with no credit card?

Yes — no credit card required for the free monthly allowance. Fly.io provides 3 shared-CPU VMs (256MB RAM each), 3GB of volume storage, and 160GB of outbound bandwidth every month at no cost. This is enough to deploy a test AI application, explore the CLI and deployment workflow, and evaluate Fly.io's architecture. For AI services requiring larger machines, GPU access, or more volume storage, a payment method is required. Creating an account and deploying within the free allowance takes under 15 minutes with the Fly CLI.

How does Fly.io charge for multi-region AI deployment?

Fly.io charges for the machines running in each region — no per-region infrastructure fee. Adding a second region means adding machine instances in that region, billed at the same per-second rate as your primary region. A performance-2x machine running in 3 regions (US-East, EU-London, Asia-Singapore) costs roughly 3x a single machine — about $92/month total. There are no load balancer fees, VPC charges, or regional networking overhead charges. Inbound data is free; outbound bandwidth from all regions combined counts toward your monthly allowance. This makes multi-region AI deployment significantly cheaper on Fly.io than equivalent AWS or GCP multi-region setups.
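The multi-region figure can be checked with a few lines of arithmetic. This sketch assumes one performance-2x machine per region at the monthly price quoted in this guide; the dictionary layout and function name are illustrative only.

```python
PERFORMANCE_2X_MONTHLY_USD = 30.74  # figure quoted in this guide

# One machine in each of the three example regions.
machines_per_region = {
    "US-East": 1,
    "EU-London": 1,
    "Asia-Singapore": 1,
}


def multi_region_monthly_cost(price_usd: float, machines: dict) -> float:
    """Total compute cost: every machine is billed at the same rate."""
    return price_usd * sum(machines.values())


total = multi_region_monthly_cost(PERFORMANCE_2X_MONTHLY_USD, machines_per_region)
print(f"${total:.2f}/month")  # $92.22/month
```

Because there is no per-region fee in this model, adding a region changes only the machine count.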

Does Fly.io charge for deployments or builds?

Fly.io does not charge for deploys or builds. You pay only for running machine time, volume storage, and outbound bandwidth. When you deploy a new version of your AI application, Fly.io briefly runs both the old and new machines simultaneously during the cutover — you're charged for that overlap, but it typically lasts under 60 seconds and costs fractions of a cent. Docker image builds happen on Fly.io's build infrastructure at no charge. For AI applications with frequent deployments during active development, this zero-cost deployment model keeps iteration costs minimal.
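To put a number on "fractions of a cent", the sketch below prices a 60-second overlap on a performance-2x machine. It assumes a 30-day month and the $30.74 monthly price quoted elsewhere in this guide; `overlap_cost` is a hypothetical helper, not part of any Fly.io tooling.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def overlap_cost(monthly_price_usd: float, overlap_seconds: float, deploys: int) -> float:
    """Cost of the extra machine-seconds billed while old and new machines overlap."""
    per_second = monthly_price_usd / SECONDS_PER_MONTH
    return per_second * overlap_seconds * deploys


one_deploy = overlap_cost(30.74, overlap_seconds=60, deploys=1)
hundred_deploys = overlap_cost(30.74, overlap_seconds=60, deploys=100)
print(f"one deploy: ${one_deploy:.5f}")        # about $0.00071
print(f"100 deploys: ${hundred_deploys:.3f}")  # about $0.071
```

Even at a hundred deploys a month, the overlap adds only a few cents under these assumptions.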

How much does Fly.io GPU inference cost compared to RunPod?

Fly.io GPU Machines and RunPod serve different use cases. RunPod's serverless GPU pricing charges per inference request with a per-second billing model — better for intermittent or batch GPU workloads where you don't want to pay for idle GPU time. Fly.io GPU Machines are always-on (you pay for the machine running, not per request) — better for real-time inference APIs that need consistent availability without cold starts. For always-available LLM inference where users make requests continuously, Fly.io's model provides consistent latency. For batch processing or infrequent inference where per-request billing is more economical, RunPod's serverless is typically cheaper. Contact Fly.io for current GPU Machine pricing.
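One way to reason about the always-on versus per-second trade-off is a break-even utilization: the share of the month the GPU must be busy before an always-on machine becomes the cheaper option. The prices in this sketch are placeholders chosen only to make the arithmetic concrete; they are not Fly.io or RunPod rates, and the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def break_even_utilization(always_on_monthly_usd: float,
                           serverless_usd_per_second: float) -> float:
    """Fraction of the month of busy GPU time at which both models cost the same."""
    break_even_seconds = always_on_monthly_usd / serverless_usd_per_second
    return break_even_seconds / SECONDS_PER_MONTH


# Placeholder prices, NOT real quotes from either provider.
hypothetical_always_on = 1000.0   # USD per month
hypothetical_serverless = 0.0005  # USD per busy GPU-second

ratio = break_even_utilization(hypothetical_always_on, hypothetical_serverless)
print(f"break-even at {ratio:.0%} utilization")
```

Below the break-even utilization, per-second serverless billing is cheaper; above it, the always-on machine is. Substitute current quotes from each provider before drawing conclusions.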

How much volume storage do I need for AI model weights on Fly.io?

Volume storage requirements depend entirely on your model. At $0.15/GB/month, storage is relatively inexpensive. Rough guidelines: a 7B parameter model with 4-bit quantization (Q4_K_M) requires approximately 4-5GB. A 13B model with Q4 quantization needs 7-8GB. A Llama 3 70B model requires 40+GB even with 4-bit quantized weights. Text embedding models (all-MiniLM, nomic-embed) are much smaller at 100-500MB. For a service running a single 7B model, a 10GB volume ($1.50/month) provides adequate space with room for the model, any fine-tuning adapters, and storage for cached data.
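The sizing guidance above can be turned into a small calculator. The weight sizes below are the upper ends of the ranges in this guide (the 70B figure is a lower bound, since the guide says "40+GB"); the 5GB of extra space for adapters and cache is an assumption for the example, as is the helper name `volume_plan`.

```python
import math

VOLUME_USD_PER_GB_MONTH = 0.15  # figure quoted in this guide

# Approximate weight sizes in GB, taken from the ranges in this guide.
MODEL_WEIGHTS_GB = {
    "7B Q4_K_M": 5,
    "13B Q4": 8,
    "70B Q4": 40,  # lower bound: the guide says "40+GB"
}


def volume_plan(weights_gb: float, extra_gb: float) -> tuple:
    """Return (volume size in whole GB, monthly cost in USD)."""
    size_gb = math.ceil(weights_gb + extra_gb)
    return size_gb, size_gb * VOLUME_USD_PER_GB_MONTH


for name, weights_gb in MODEL_WEIGHTS_GB.items():
    size_gb, cost = volume_plan(weights_gb, extra_gb=5)  # 5GB headroom is an assumption
    print(f"{name}: {size_gb}GB volume, ${cost:.2f}/month")
```

For the 7B case this reproduces the 10GB, $1.50/month figure given above.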

Can Fly.io machines be interrupted or preempted during AI inference?

Standard Fly.io machines are not preempted — they run continuously until you stop or destroy them. This is an important distinction from AWS spot instances or RunPod Community Cloud pods, which can be reclaimed with short notice during high demand. Fly.io's non-preemptible machines are more expensive than spot alternatives, but AI inference requests in progress are never interrupted by infrastructure events. For production AI APIs where mid-inference interruption would cause user-facing errors, Fly.io's consistent availability is worth the premium over preemptible compute options.

What machine size does Fly.io recommend for running Llama models?

Machine size depends on the model and quantization level. For 7B parameter models with 4-bit quantization, the weights alone are roughly 4-5GB, so a performance-2x machine (4GB RAM) is too tight; a performance-4x (8GB RAM) leaves headroom. For 13B models with Q4 quantization (7-8GB of weights), plan for more than 8GB of RAM, which means a size above performance-4x. For 70B models, you'll need GPU Machines, because CPU inference at that scale is impractically slow for real-time use. Generally, allocate RAM equal to the model's quantized file size plus 1-2GB overhead for the inference server and OS. Monitor actual RAM usage after deployment and resize if you're approaching the ceiling.
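The rule of thumb (quantized file size plus 1-2GB of overhead) can be sketched as a lookup against the machine sizes listed in this guide. The function names are hypothetical, the 2GB default overhead is the conservative end of the guide's range, and larger machine sizes that exist on Fly.io are simply not in this table.

```python
from typing import Optional

# RAM per machine size as listed in this guide (GB).
MACHINE_RAM_GB = {
    "performance-1x": 2,
    "performance-2x": 4,
    "performance-4x": 8,
}


def required_ram_gb(model_file_gb: float, overhead_gb: float = 2.0) -> float:
    """Quantized file size plus overhead for the inference server and OS."""
    return model_file_gb + overhead_gb


def smallest_fitting_machine(model_file_gb: float,
                             overhead_gb: float = 2.0) -> Optional[str]:
    """Smallest listed machine with enough RAM, or None if none is large enough."""
    needed = required_ram_gb(model_file_gb, overhead_gb)
    for name, ram in sorted(MACHINE_RAM_GB.items(), key=lambda item: item[1]):
        if ram >= needed:
            return name
    return None


print(smallest_fitting_machine(1.5))  # performance-2x
print(smallest_fitting_machine(5.0))  # performance-4x
print(smallest_fitting_machine(8.0))  # None: needs a larger size than listed here
```

Treat the result as a starting point and confirm it against measured RAM usage after deployment.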

Does Fly.io offer a startup program for AI companies?

Yes — Fly.io has a startup program providing compute credits to qualifying early-stage companies. AI startups from recognized accelerators (Y Combinator, Techstars, and others) typically qualify. Credits offset compute costs during early product development, reducing the runway impact of infrastructure spending. The application process is through Fly.io's website. In addition to financial credits, accepted startups often receive access to the Fly.io team for technical guidance on architecture and optimization. For pre-revenue AI startups where every dollar of runway matters, applying for Fly.io credits before committing to monthly compute spend is worth the effort.

How does Fly.io egress pricing work for streaming AI responses?

Outbound bandwidth beyond the 160GB monthly free allowance is billed per GB. For AI response streaming, each LLM response consumes bandwidth proportional to its length. A 1,000-token text response is approximately 4KB; a 4,000-token response is about 16KB. A moderately active AI chat API handling 50,000 requests per month at an average 2,000 tokens per response generates roughly 400MB of outbound data — well within the free bandwidth allowance. High-volume AI APIs handling millions of requests monthly with verbose responses (code generation, long-form content) should calculate expected bandwidth and include it in cost estimates. Current bandwidth rates are at fly.io/pricing.
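The bandwidth estimate above follows from about 4 bytes per token (1,000 tokens is roughly 4KB). A minimal sketch of that calculation, using decimal units (1GB = 1,000,000,000 bytes) and ignoring HTTP and streaming overhead, both of which are simplifying assumptions:

```python
BYTES_PER_TOKEN = 4        # from the guide: 1,000 tokens is about 4KB
FREE_ALLOWANCE_GB = 160    # monthly free outbound allowance quoted in this guide


def outbound_gb(requests_per_month: int, avg_tokens_per_response: int) -> float:
    """Approximate outbound data for text responses, in decimal GB."""
    total_bytes = requests_per_month * avg_tokens_per_response * BYTES_PER_TOKEN
    return total_bytes / 1e9


def billable_gb(requests_per_month: int, avg_tokens_per_response: int) -> float:
    """Outbound data above the free allowance (zero if within it)."""
    return max(0.0, outbound_gb(requests_per_month, avg_tokens_per_response) - FREE_ALLOWANCE_GB)


usage = outbound_gb(50_000, 2_000)
print(f"{usage * 1000:.0f}MB outbound")               # 400MB outbound
print(f"{billable_gb(50_000, 2_000):.1f}GB billable")  # 0.0GB billable
```

At this payload size it would take on the order of 20 million such responses a month to exhaust the 160GB allowance, so egress mostly matters for very high-volume or non-text workloads.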

Affiliate disclosure. AIPriceRadar may earn a commission when you click links and make a purchase. Our picks, ratings, and pricing breakdowns are independently verified. Affiliate relationships never influence which tools we recommend. Pricing data was current as of 2026-06-16; verify on the official site before paying.