Hi, I’m Vlad 👋

I’m a software architect, founder, and Microsoft MVP on AI. I write and speak about machine learning in general and large language models in particular. Follow me on Substack or plain old RSS.

Here are some of my highlights from the past decade:

  • Co-founded NRGI.ai, a startup focused on forecasting energy prices and connecting small businesses with energy suppliers. As the technical co-founder, I was excited to know my price forecasts were used by big names such as Electrica and Hidroelectrica.
  • Partnered with some old friends and joined Strongbytes as Head of AI to try and create a kick-ass outsourcing company. Before that, I served as technical director for Maxcode, also focused on outsourcing.
  • Co-founded NDR – an AI conference, the first of its kind in Iaşi – along with the friends at Codecamp. Handled the agenda, scouted for speakers, and MC’d every edition.

You’ll find some of the apps I’ve built on GitHub and on the Chrome Web Store, while videos of some of my favorite talks can be found on YouTube.

To get in touch, just pick your favorite social platform below and drop me a line 👋.

Prompt Caching with Azure OpenAI

How Azure OpenAI’s prompt caching feature works, its benefits, caveats, and a quick experiment

January 12, 2025 · 9 min

Grabit, the Web Page Downloader

A web page downloader for humans and large language models alike

January 7, 2025 · 2 min

(Better) Dependency Injection in FastAPI

A bit of a rant on the state of dependency injection in Python/FastAPI, and an implementation using the Injector and FastAPI-Injector libraries

December 15, 2024 · 5 min

Lessons Learned 2 - 8 December 2024

Interesting things I’ve learned in the week of 2 - 8 December 2024 (apart from the fact that democracy is fragile)

December 8, 2024 · 4 min

Fine-Tuning AI Models: Comparing the Costs of OpenAI vs Azure OpenAI

Understand the differences in pricing between Azure OpenAI and OpenAI for fine-tuning AI models, with a detailed analysis of token and hosting costs.

July 1, 2024 · 7 min

How I've Used Whisper to Transcribe, GPT-4 to Summarize, DALL·E to Illustrate, and Text-to-speech to Narrate OpenAI's DevDay Keynote

I heard you like OpenAI, so I used OpenAI’s Whisper to transcribe the OpenAI DevDay Keynote, OpenAI GPT-4 Turbo to summarize the transcript, come up with ideas that illustrate the main points, and generate DALL·E prompts for said ideas, OpenAI DALL·E 3 to generate the images, and OpenAI Text to Speech to narrate the summary. Xzibit would be like, so proud.

February 10, 2024 · 14 min

Orca-2 and How to Run It on Apple Silicon with llama.cpp

The fine folk at Microsoft Research have recently published Orca 2, a new small large language model, and apparently it’s quite good! Just look at the test results below – on average, both the 7B and the 13B variants are significantly better than Llama-2-Chat-70B, with Orca-2-13B surpassing even WizardLM-70B. Pretty cool! 🚀 I also love the idea behind it: prompting a big large language model (in our case GPT-4) to answer some rather convoluted logic questions while aided by some very specific system prompts, and then fine-tuning a smaller model (Llama-2-7B and 13B respectively) on just the question and answer pairs, leaving out the detailed system prompts....

December 5, 2023 · 5 min

Resources for Building an Internet-Connected Search Assistant from Scratch (Poor Man’s BingChat)

These are the slides and notebook I’ve used during my talk on how to build an Internet-connected search assistant almost from scratch. AKA Poor Man’s BingChat. The first time I talked about it was at Codecamp Iasi, where it got a lot of positive feedback, plus it was awesome to share the stage with established speakers (and personal heroes of mine) like Mark Richards, Venkat Subramaniam, Eoin Woods, and Dylan Beattie. Yes, you can see them in the hero picture 😱....

November 27, 2023 · 14 min · Vlad Iliescu

Running Llama2 on Apple silicon with llama.cpp

Recently, I was curious to see how easy it would be to run Llama2 on my MacBook Pro M2, given the impressive amount of memory it makes available to both CPU and GPU. This led me to the excellent llama.cpp, a project focused on running simplified versions of the Llama models on both CPU and GPU. The process felt quite straightforward, except for some instability in the llama.cpp repo just as I decided to try it out, which has since been fixed....

September 20, 2023 · 3 min

The One Where Bing Becomes Chandler: A Study on Prompt Injection in Bing Chat

An experiment with prompt injecting Bing Chat – successfully changing its persona, exploring data extraction potential, limitations, and future implications.

April 10, 2023 · 9 min

3 Tips for Working with Azure ML Compute Instances

My top 3 tips for working better, faster, and just a bit stronger with Azure ML Compute Instances

March 18, 2023 · 6 min

Azure ML Managed Online Endpoints - Quickstart

A quickstart guide to deploying machine learning models in production using Azure Machine Learning’s managed online endpoints

February 18, 2023 · 9 min

How to run Stable Diffusion Web UI on Azure ML Compute Instances

A guide to creating GPU compute instances on Azure ML, installing Stable Diffusion, and running AUTOMATIC1111’s Web UI.

January 29, 2023 · 12 min

Continuous Deployment for Azure ML Pipelines with Azure DevOps

Because life’s too short to deploy things manually

August 29, 2021 · 9 min

GitHub Copilot: First Impressions

A glimpse of the upcoming paradigm shift in how we do development

July 18, 2021 · 6 min

3 Ways to Pass Data Between Azure ML Pipeline Steps

The issue with machine learning pipelines is that they need to pass state from one step to another. When this works, it’s a beautiful thing to behold. When it doesn’t, well, it’s not pretty, and I think the clip below sums this up pretty well: “made a Rube Goldberg machine” (COLiN BURGESS, @Colinoscopy, April 30, 2020, pic.twitter.com/gWRNnmm5Ic). Azure ML Pipelines are no stranger to this need for passing data between steps, so you have a variety of options at your disposal....

April 26, 2021 · 11 min

How I Got Caching Working with Netlify and Cloudflare, or How I Almost Ditched Cloudflare for No Good Reason

A story about love, loss, and caching

March 31, 2021 · 6 min

Reverse Engineering an Azure AutoML Forecasting Model

How to create a model based on an Azure AutoML-trained baseline, using standard open-source components where possible and adapting AutoML specific code where needed

March 24, 2021 · 12 min

Using Azure Automated ML to Predict Ethereum Prices (Crypto Prices with ML)

The first in a series of articles about building production machine learning systems in Azure, thinly veiled as an attempt to predict cryptocurrency prices

January 24, 2021 · 14 min

Deploying a Machine Learning Model with Azure ML Pipelines

Machine learning pipelines are a way to describe your machine learning process as a series of steps such as data extraction and preprocessing, but also training, deploying, and running models. In this article, I’ll show you how you can use Azure ML Pipelines to deploy an already trained model such as this one, and use it to generate batch predictions multiple times a day. But before we do that, let’s understand why pipelines are so important in machine learning....

December 30, 2020 · 15 min

Getting Started with Automated ML in Azure

A step by step introduction to Automated Machine Learning in Azure while gathering data, creating the necessary Azure resources, and automatically training a model

November 15, 2020 · 15 min

Migrating from Mercurial to Git (and from Bitbucket to GitHub)

A quick and dirty tutorial for migrating from Mercurial to everyone’s favorite distributed version control system, Git

March 8, 2020 · 2 min

[Talk] Getting Started with Machine Learning Using Azure Machine Learning Studio and Kaggle Competitions

Long title, I know 🤫. It used to be shorter, as some earlier versions of this talk were called ‘Predicting Survivability on the Titanic’, but this time I wanted to experiment a bit and make it real easy for the audience to decide whether or not this would be interesting for them. And so they did. You see, they wanted to learn more about machine learning. And, the way I see it, the two tools I talked about - Azure Machine Learning Studio and Kaggle Competitions - can help you get started with ML, while also making it fun to do so....

March 20, 2019 · 2 min · Vlad Iliescu

[Talk] Machine Learning in Azure: Service versus Studio

This is a more detailed version of my Boy meets Girl talk, created specially for Microsoft Ignite | The Tour Amsterdam 2019. Whereas Boy meets Girl was mostly focused on how to deploy a trained model using either Azure ML Service or ML Studio, here I wanted to create a more in-depth comparison of the two tools. This is what led me to the concept of having multiple rounds, with the audience voting for their favourite tool (truth be told, I think I just wanted another go at delivering something similar to my TypeScript versus CoffeeScript talk 🤓)....

March 20, 2019 · 3 min · Vlad Iliescu

[Talk] Boy meets Girl: A Machine Learning Deployment Story

This was a fun talk to write :). Ever since I saw Azure ML Service being announced, I knew I wanted to compare it with ML Studio, a tool with which I had a bit more experience. And so I did. Since 45 minutes is nowhere near enough to compare the two tools (lesson re-learned the hard way while designing Service versus Studio), I decided to only compare their deployment capabilities, given an already trained model....

February 23, 2019 · 2 min · Vlad Iliescu