I’ve been thinking about the best ways to use a model router ever since Azure AI Foundry added one in preview, coupled with last week’s GPT-5 announcement, and…I have some thoughts on this. Whether it’s the right call basically depends on what you’re trying to do; what’s clear is that using one needs to be a conscious decision.

Cons

Here’s what worries me:

  1. You’ll get wildly varying answer quality, depending on your inputs. As Ethan Mollick noticed with the new ChatGPT, you might get an answer from one of the best AIs available, or from one of the worst. And you won’t know which one you got, nor be able to change it after the fact.
  2. The router, by definition, needs to be lightweight to keep latency low (read: small, fast, less capable). This means there’s a non-zero chance it will misinterpret subtler nuances in your inputs and mess up the routing.
  3. Running an evaluation suite on your outputs becomes an order of magnitude harder now that they’re generated by a non-deterministic model selection. You’ll probably need to add a new eval suite just for the router, and hope for the best.
  4. If your inputs pretty much all look the same (e.g. structured data extraction from standard-ish documents, as opposed to chatbots), it doesn’t make sense to continually test several models that may or may not get picked by the router. Just pick the one with the best cost-to-quality ratio and use that.

Pros

That being said, I see a lot of value in using a router when:

  1. The inputs vary wildly (like, say, with chatbots), and you’re not able to predict beforehand how difficult they are to answer properly
  2. Maximizing cost savings and/or minimizing latency is the priority, even if this (may) hurt the quality of the outputs
  3. Speaking of quality, a router is especially useful when the answers don’t need to be top-notch. Think planning a birthday, as opposed to designing a nuclear power plant.

Conclusion

All that said, I definitely recommend that everyone try implementing a simple router, just to see how well it works and whether it’s worth it. The easiest, most basic way to do it imo is to use another LLM call (preferably the smallest LLM you have available) to decide whether the request is “easy”, “medium”, or “hard” (or whatever, these are just samples). Then route the request to a corresponding model and see how it goes. Here’s a rough sketch of what I mean.
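Just to make that concrete, here’s a minimal Python sketch of the idea. It assumes the OpenAI Python SDK and an API key in your environment; the model names, the difficulty labels, and the routing table are all placeholders, so swap in whatever small and large models you actually have access to.

```python
# Minimal LLM-based router sketch. Assumes the OpenAI Python SDK
# (`pip install openai`) and OPENAI_API_KEY set in the environment.
# Model names below are placeholders -- use whatever you have available.
from openai import OpenAI

client = OpenAI()

# Example mapping from difficulty tier to model (adjust to your own lineup).
ROUTES = {
    "easy": "gpt-4o-mini",
    "medium": "gpt-4o",
    "hard": "gpt-5",  # placeholder for your strongest model
}

CLASSIFIER_MODEL = "gpt-4o-mini"  # the smallest model you have available


def classify(request: str) -> str:
    """Ask a small model to label the request as easy, medium, or hard."""
    resp = client.chat.completions.create(
        model=CLASSIFIER_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Classify the difficulty of the user's request. "
                           "Reply with exactly one word: easy, medium, or hard.",
            },
            {"role": "user", "content": request},
        ],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in ROUTES else "medium"  # fall back if it misbehaves


def route_and_answer(request: str) -> str:
    """Route the request to the model matching its difficulty tier."""
    model = ROUTES[classify(request)]
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": request}],
    )
    return resp.choices[0].message.content


print(route_and_answer("What's the capital of France?"))          # likely routed as "easy"
print(route_and_answer("Design a sharding strategy for my DB."))  # likely routed as "hard"
```

Note that every request now pays for an extra (cheap) classifier call, which is exactly the kind of cost/latency/quality trade-off you’re trying to evaluate here.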

And, when you’re ready to try out something more “profesh”, remember that there’s an updated model router in Azure AI Foundry. Just putting this out there 🙂.