Update 2023-11-30: (Finally) updated pricing for OpenAI GPT 3.5 Turbo
Recently, Microsoft announced that Azure OpenAI would support fine-tuning everyone’s favorite OpenAI models: not only instruct models such as GPT-3.5 Turbo, but also base models like Davinci-002 and Babbage-002. All in public preview for now, but still.
To tell you the truth, I got excited. I had been waiting for Azure to support fine-tuning OpenAI models ever since OpenAI announced this for their own hosted models back in August, and was eager to try them out.
After looking at how much it costs to fine-tune and then use a fine-tuned model, I became just a bit less enthusiastic. It felt expensive.
Thing is: if you want to fine-tune a model, you will pay between $34 and $102 per compute hour, depending on the model. For who knows how many hours, as this will depend on your dataset. And this is just the training cost, mind you; you will also need to pay between $1.70 and $7 per hour to run the fine-tuned models.
This means we’re looking at anywhere between $1,224 and $5,040 per month just to run the fine-tunes, without even looking at the training costs.1
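That figure is just the hourly hosting rate times the hours in a month. A quick sketch, using the hourly rates quoted above and assuming a 30-day month:

```python
# Monthly hosting cost for an always-on fine-tuned deployment on Azure,
# using the hourly rates quoted above ($1.70 to $7.00 per hour).
HOURS_PER_MONTH = 30 * 24  # 720 hours, assuming a 30-day month

def monthly_hosting_cost(hourly_rate: float) -> float:
    """Cost of keeping a deployment up for a full month at a given hourly rate."""
    return hourly_rate * HOURS_PER_MONTH

print(monthly_hosting_cost(1.70))  # ≈ 1224 -> cheapest model, ~$1,224/month
print(monthly_hosting_cost(7.00))  # ≈ 5040 -> priciest model, ~$5,040/month
```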
Now, I understand why this is the case: with Azure, you get a certain degree of isolation for your training jobs, and benefit from added security and privacy. On the other hand, OpenAI’s fine-tuning seems to cost way less. How come? How do they do it?
OpenAI charges per thousand tokens, not per hour. Specifically, it costs between $0.0004 and $0.0080 per thousand tokens to fine-tune a model, and between $0.0016 and $0.0120 per thousand tokens to run the fine-tuned models.
Which, you know, apples and oranges.
So let’s find a way to compare the two.
| Hosting | Model | Training per hour | Hosting per hour | Training per 1k tokens | Input per 1k tokens | Output per 1k tokens |
| --- | --- | --- | --- | --- | --- | --- |
You’ll notice that in general OpenAI’s “per 1k tokens” prices are between 4 and 6 times higher than on Azure. That’s significant. But is it significant enough to compensate for that pesky Hosting per hour column? Maybe!
Let’s find out.
Another thing you might be quick to notice is that without discussing fine-tuning datasets or the GPUs we use for training, we can’t really know how to map training per compute hour to training per 1,000 tokens. How many tokens can you ingest in an hour? It really depends, so in order to keep things simple we’ll just ignore that for now.
What we can map though, is the running costs.
And, to simplify things just a little bit, let’s forget about output usage and compare only the input costs.
| Hosting | Model | Hosting per hour | Input per 1k tokens | Base cost per month |
| --- | --- | --- | --- | --- |
There, that’s better. When calculating Base cost per month, I’m assuming all months have 30 days and all days have 24 hours, which is naive, I know.
Now, let’s see how much it costs to input 1,000,000 tokens in a single month.
| Hosting | Model | Hosting per hour | Input per 1k tokens | Base cost per month | Input per 1,000k tokens | Input per 1,000k tokens + Hosting |
| --- | --- | --- | --- | --- | --- | --- |
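The math behind these columns is simple enough to sketch. The rates below are the Babbage-002 figures the post works with: OpenAI’s $0.0016 per 1k input tokens, Azure’s $1.70 per hour of hosting, and an assumed Azure input rate of $0.0004 per 1k tokens (roughly a quarter of OpenAI’s, per the “4 and 6 times higher” observation above):

```python
# Total monthly cost = flat hosting fee (if any) + per-token input charges.
# Babbage-002 rates from the post; Azure's per-1k input rate is an assumption.
HOURS_PER_MONTH = 30 * 24  # 720 hours, assuming a 30-day month

def monthly_cost(tokens_k: float, input_per_1k: float,
                 hosting_per_hour: float = 0.0) -> float:
    """Cost of inputting `tokens_k` thousand tokens over a 30-day month."""
    return hosting_per_hour * HOURS_PER_MONTH + input_per_1k * tokens_k

# 1,000k tokens (one million) per month:
openai = monthly_cost(1_000, input_per_1k=0.0016)                          # ≈ $1.60
azure = monthly_cost(1_000, input_per_1k=0.0004, hosting_per_hour=1.70)    # ≈ $1,224.40
```

At this volume the per-token part of Azure’s bill is pocket change; the flat hosting fee dominates, which is exactly what the next paragraph complains about.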
Going from $1.60 to $1,224 a month for a fine-tuned Babbage-002 is quite…daunting. Same for GPT-3.5 Turbo, I mean…how many Pumpkin Spice Double Mocha Latte Supreme 🎃✊ does one have to skip to afford this? I fear the answer might be too many, at least for scrappier apps that want to keep things nimble and run lots of small experiments. They’ll probably go and host their fine-tuned models on OpenAI without even thinking about it.
But what about more established enterprises?
Let me whip out my trusty Excel sheet and simulate something like 500,000,000 tokens.
| Hosting | Model | Hosting per hour | Input per 1k tokens | Base cost per month | Input per 500,000k tokens | Input per 500,000k tokens + Hosting |
| --- | --- | --- | --- | --- | --- | --- |
Not brilliant, but OHMYGOD DIDYOUSEE Davinci-002? Running it in Azure is like twice as cheap as running it in OpenAI. The others used to be close as well, but then OpenAI decided to slash prices for GPT-3.5 Turbo, and now here we are – it’s still more cost-efficient to run it in OpenAI. Then again, if increased security and privacy are your thing, then it’s probably worth using Azure OpenAI even at this stage.
Let’s crank it up to 11.
| Hosting | Model | Hosting per hour | Input per 1k tokens | Base cost per month | Input per 1,100,000k tokens | Input per 1,100,000k tokens + Hosting |
| --- | --- | --- | --- | --- | --- | --- |
1.1 billion tokens. That’s all it took to realize Azure OpenAI’s pricing is actually quite reasonable. Except for Babbage which, let’s be honest, won’t see as much usage as the other two.
But the others? Well, before OpenAI slashed their prices for fine-tuning GPT 3.5 Turbo, Azure was the definite pick for this kind of volume. A fine-tuned 3.5 Turbo would have cost something like $13,200.00 per month to run. Now it’s only $3,300.00 per month, which is significantly less than Azure OpenAI’s $7,410.00.
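You can solve for that break-even point directly: it’s the token volume at which Azure’s flat hosting fee is paid back by its lower per-token rate. Using the same Babbage-002 figures as before (Azure’s per-1k input rate is again an assumption), the answer lands right around the billion-token mark:

```python
# Break-even: hosting + azure_rate * x = openai_rate * x, solved for x.
# Babbage-002 rates from the post; Azure's per-1k input rate is an assumption.
AZURE_HOSTING_PER_MONTH = 1.70 * 30 * 24  # ≈ $1,224 flat fee per month
AZURE_INPUT_PER_1K = 0.0004               # assumed, ~1/4 of OpenAI's rate
OPENAI_INPUT_PER_1K = 0.0016

# x is in thousands of tokens per month
break_even_k = AZURE_HOSTING_PER_MONTH / (OPENAI_INPUT_PER_1K - AZURE_INPUT_PER_1K)
print(f"{break_even_k * 1000:,.0f} tokens per month")  # ≈ 1.02 billion
```

Which lines up nicely with the 1.1 billion tokens it took the spreadsheet to flip in Azure’s favor. For GPT-3.5 Turbo, with its narrower per-token gap after the price cut, the break-even volume sits much higher, if it exists at all.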
So it looks like Azure OpenAI is the clear choice for Davinci-002, a somewhat clear choice for Babbage-002, and a very muddy if not unclear choice for GPT 3.5 Turbo. Yay!
If you’re small and scrappy, and don’t care a lot about all that security stuff that’s not fun at all to think about, then it’s probably best to stick with OpenAI’s offering instead of Azure. It’s cheap to get started with, and will stay cost-effective for a while.
Specifically, until you get to a few hundred million tokens. That’s when, depending on the model you’re fine-tuning, you may want to think about choosing Azure OpenAI. In some cases (Davinci) it’s a no-brainer, while in other cases it depends.
If you’re inputting over 1 billion tokens a month then Azure OpenAI is pretty much the way to go. Unless you’re using GPT 3.5 Turbo, but even then it might be a good idea, for security and privacy and all that enterprise-y stuff.
That being said, don’t forget that I’ve conveniently left out of the comparison any training and output costs. While I expect the output costs to scale in pretty much the same way as the inputs, the training costs might make all the difference.
I would recommend running your own tests before making a decision, but you were probably going to do that already 😉.
To tell you the truth, I started this post convinced that “Microsoft bad, how dare they charge us that much”, and after running the numbers I pretty much ended with “huh, once you go over 1 billion tokens, it’s actually quite reasonable”. Math is wonderful.
After you deploy a customized model, if at any time the deployment remains inactive for greater than fifteen (15) days, the deployment is deleted. The deployment of a customized model is inactive if the model was deployed more than fifteen (15) days ago and no completions or chat completions calls were made to it during a continuous 15-day period.
The deletion of an inactive deployment doesn’t delete or affect the underlying customized model, and the customized model can be redeployed at any time.
If you’ve enjoyed this analysis, you might want to join my Substack below to receive emails with interesting stuff every two weeks or so.