Fine-Tuning AI Models: Comparing the Costs of OpenAI vs Azure OpenAI

Update 2024-07-01: Microsoft have updated their billing to bill based on the number of tokens in your training file, so the info in this guide doesn’t apply anymore. Read more about the changes here.

Update 2024-04-26: Updated with the new Azure pricing (good thing I kept that Excel file)

Update 2023-11-30: (Finally) updated pricing for OpenAI GPT 3.5 Turbo

About

Recently, Microsoft announced that Azure OpenAI would support fine-tuning everyone’s favorite OpenAI models, including instruct models such as GPT-3.5-Turbo but also base models like Davinci-002 and Babbage-002.

To tell you the truth, I got excited. I had been waiting for Azure to support fine-tuning OpenAI models ever since OpenAI announced this for their own hosted models back in August, and was eager to try them out.

But.

After looking at how much it costs to fine-tune and then use a fine-tuned model, I became just a bit less enthusiastic. It felt expensive.

Thing is: if you want to fine-tune a model, you will pay between $34 and $68 per compute hour, depending on the model. For who knows how many hours, as this will depend on your dataset. And this is just the training cost mind you, you will also need to pay between $1.7-$3 per hour for running the fine-tuned models.

This means we’re looking at anywhere between $1,224 to $2,160 per month just to run the fine-tunes, without even looking at the training costs.¹

Now, I understand why this is the case: with Azure, you get a certain degree of isolation for your training jobs, and benefit from added security and privacy. On the other hand, OpenAI’s fine-tuning seems to cost way less. How come? How do they do it?

Well.

OpenAI charges you by thousand tokens, not by the hour for this. Specifically, it costs between $0.0004 - $0.0080 per thousand tokens to fine-tune a model, and $0.0016-$0.0120 per thousand tokens to run the fine-tuned models.

Which, you know, apples and oranges.

So let’s find a way to compare the two.

The first step would be to aggregate everything we know, so here’s the aggregated pricing data for both Azure OpenAI and plain old OpenAI. For the record, I’m using Azure’s North Central US pricing (most of the others don’t have Babbage-002 and Davinci-002 available anymore).

Hosting	Model	Training per hour	Hosting per hour	Training per 1k tokens	Input per 1k tokens	Output per 1k tokens
Azure	Babbage-002	$34	$1.70		$0.0004	$0.0004
OpenAI	Babbage-002			$0.0004	$0.0016	$0.0016
Azure	Davinci-002	$40	$2		$0.0020	$0.0020
OpenAI	Davinci-002			$0.0060	$0.0120	$0.0120
Azure	GPT-3.5-Turbo-4k	$45	$3		$0.0005	$0.0015
Azure	GPT-3.5-Turbo-16k	$68	$3		$0.0005	$0.0015
OpenAI	GPT-3.5 Turbo			$0.0080	$0.0030	$0.0060

You’ll notice that in general OpenAI’s “per 1k tokens” prices are between 4 to 6 times higher than on Azure. That’s significant. But is it significant enough to compensate for that pesky Hosting per hour column? Maybe!

Let’s find out.

Another thing you might be quick notice is that without discussing fine-tuning datasets or the GPUs we use for training, we can’t really know how to map training per compute hour to training per 1,000 tokens. How many tokens can you ingest in an hour? It really depends, so in order to keep things simple we’ll just ignore that for now.

What we can map though, is the running costs.

And, to simplify things just a little bit, let’s pretend to forget about output usage and only compare the input costs.

Hosting	Models	Hosting per hour	Input per 1k tokens	Base cost per month
Azure	Babbage-002	$1.70	$0.0004	$1,224.00 ²
OpenAI	Babbage-002		$0.0016	$0.00
Azure	Davinci-002	$2.00	$0.0020	$1,440.00
OpenAI	Davinci-002		$0.0120	$0.00
Azure	GPT-3.5-Turbo (4k & 16k)	$3.00	$0.0005	$2,160.00
OpenAI	GPT-3.5 Turbo		$0.0030	$0.00

There, that’s better. When calculating Base cost per month, I’m assuming all months have 30 days and all days have 24 hours, which is naive, I know.

1,000,000 tokens

Now, let’s see how much it costs to input 1,000,000 tokens in a single month.

Hosting	Models	Hosting per hour	Input per 1k tokens	Base cost per month	Input per 1,000k tokens	Input per 1,000k tokens + Hosting
Azure	Babbage-002	$1.70	$0.0004	$1,224.00	$0.40 ³	$1.224,40
OpenAI	Babbage-002		$0.0016	$0.00	$1.60	$1,60
Azure	Davinci-002	$2.00	$0.0020	$1,440.00	$2.00	$1,442.00
OpenAI	Davinci-002		$0.0120	$0.00	$12.00	$12.00
Azure	GPT-3.5-Turbo (4k & 16k)	$3.00	$0.0005	$2,160.00	$0.50	$2,160.50
OpenAI	GPT-3.5 Turbo		$0.0030	$0.00	$3.00	$3,00

Ouch.

Going from $1,6 to $1.224 a month for a fine-tuned Babbage-002 is quite…daunting. Same for GPT-3.5 Turbo, I mean…how many Pumpkin Spice Double Mocha Latte Supreme 🎃✊ does one have to skip to afford this? I fear the answer might be too many, at least for scrappier apps who want to keep things nimble and run lots of small experiments. They’ll probably go and host their fine-tune their models on OpenAI without even thinking about it (unless they’re part of Microsoft for Startups I guess, then it’s anyone’s game).

But what about more established enterprises?

500,000,000 tokens

Let me whip out my trusty Excel sheet and simulate something like 500,000,000 tokens.

Hosting	Models	Hosting per hour	Input per 1k tokens	Base cost per month	Input per 500,000k tokens	Input per 500,000k tokens + Hosting
Azure	Babbage-002	$1.70	$0.0004	$1,224.00	$200.00 ⁴	$1,424.00
OpenAI	Babbage-002		$0.0016	$0.00	$800.00	$800.00
Azure	Davinci-002	$2.00	$0.0020	$1,440.00	$1,000.00	$2,440.00
OpenAI	Davinci-002		$0.0120	$0.00	$6,000.00	$6,000.00
Azure	GPT-3.5-Turbo (4k & 16k)	$3.00	$0.0005	$2,160.00	$250.00	$2,410.00
OpenAI	GPT-3.5 Turbo		$0.0030	$0.00	$1,500.00	$1,500.00

Not brilliant, but OHMYGOD DIDYOUSEE Davinci-002? Running it in Azure is like 2.5 times as cheap as running it in OpenAI. The others aren’t that far but still – it’s more cost-efficient to run them in OpenAI. Then again, if increased security and privacy are your thing, then it’s probably worth using Azure OpenAI even at this stage.

Let’s crank it up to 11.

1,100,000,000 tokens

Hosting	Models	Hosting per hour	Input per 1k tokens	Base cost per month	Input per 1,100,000k tokens	Input per 1,100,000k tokens + Hosting
Azure	Babbage-002	$1.70	$0.0004	$1,224.00	$440.00 ⁵	$1,664.00
OpenAI	Babbage-002		$0.0016	$0.00	$1,760.00	$1,760.00
Azure	Davinci-002	$2.00	$0.0020	$1,440.00	$2,200.00	$3,640.00
OpenAI	Davinci-002		$0.0120	$0.00	$13,200.00	$13,200.00
Azure	GPT-3.5-Turbo (4k & 16k)	$3.00	$0.0005	$2,160.00	$550.00	$2,710.00
OpenAI	GPT-3.5 Turbo		$0.0030	$0.00	$3,300.00	$3,300.00

1,1 billion tokens ⁶. That’s all it took to realize Azure OpenAI’s pricing is actually, quite reasonable. Even for Babbage (but let’s be honest, who here still finetunes Babbage?).

So it looks like Azure OpenAI is the clear choice for Davinci-002, and somewhat clear choice for Babbage-002 and GPT 3.5 Turbo, depending on your usecases.

Conclusion

If you’re small and scrappy, and don’t care a lot about all that security stuff that’s not fun at all to think about, then it’s probably best to stick with OpenAI’s offering instead of Azure. They’re cheap to get started, and will be cost effective for a while.

Specifically, until you get to a few hundred million tokens. That’s when, depending on the model you’re fine-tuning, you may want to think about choosing Azure OpenAI. In some cases (Davinci) it’s a no-brainer, while in other cases it depends.

If you’re inputting over 1 billion tokens a month then Azure OpenAI is pretty much the way to go.

That being said, don’t forget that I’ve conveniently left out of the comparison any training and output costs. While I expect the output costs to scale in pretty much the same way as the inputs, the training costs might make all the difference.

I would recommend running your own tests before making a decision, but you were probably going to do that already 😉.

P.S.

To tell you the truth, I started this post convinced that “Microsoft bad, how dare they charge us that much”, and after running the numbers I pretty much ended with “huh, once you go over 1 billion tokens, it’s actually quite reasonable”. Math is wonderful.

Also:

After you deploy a customized model, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment is deleted. The deployment of a customized model is inactive if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period.

The deletion of an inactive deployment doesn’t delete or affect the underlying customized model, and the customized model can be redeployed at any time.

Some resources:

If you’ve enjoyed this analysis, you might want to join my Substack below to receive emails with interesting stuff every two weeks or so.

All those news about Llama3 fine-tunes getting close to GPT-3.5 level begin to look more and more appealing, eh? ↩︎
$1,7 * 24 hours * 30 days ↩︎
$0.0004 * 1,000 ↩︎
$0.0004 * 1,000 * 500 ↩︎
$0.0004 * 1,000 * 1,100 ↩︎
From Wikipedia, and I quote: “Billion is a word for a large number”. ↩︎

About#

1,000,000 tokens#

500,000,000 tokens#

1,100,000,000 tokens#

Conclusion#

P.S.#