How I've Used Whisper to Transcribe, GPT-4 to Summarize, DALL·E to Illustrate, and Text-to-Speech to Narrate OpenAI's DevDay Keynote

I heard you like OpenAI, so I used OpenAI’s Whisper to transcribe the OpenAI DevDay Keynote, OpenAI GPT-4 Turbo to summarize the transcript, come up with ideas that illustrate the main points and generate DALL-E prompts for said ideas, OpenAI DALL·E 3 to generate the images, and OpenAI Text to Speech to narrate the summary. Xzibit would be like, so proud.

February 10, 2024 · 14 min

Orca-2 and How to Run It on Apple Silicon with llama.cpp

About Orca-2 The fine folks at Microsoft Research have recently published Orca 2, a new small large language model, and apparently it’s quite good! Just look at the test results below – on average, both the 7B and the 13B variants are significantly better than Llama-2-Chat-70B, with Orca-2-13B surpassing even WizardLM-70B. Pretty cool! 🚀 I also love the idea behind it: prompting a big large language model (in our case GPT-4) to answer some rather convoluted logic questions while aided by some very specific system prompts, and then fine-tuning a smaller model (Llama-2-7B and 13B respectively) on just the question and answer pairs, leaving out the detailed system prompts. ...

December 5, 2023 · 5 min

Resources for Building an Internet-Connected Search Assistant from Scratch (Poor Man’s BingChat)

These are the slides and notebook I’ve used during my talk on how to build an Internet-connected search assistant almost from scratch, AKA Poor Man’s BingChat. The first time I gave this talk was at Codecamp Iasi, where it got a lot of positive feedback, plus it was awesome to share the stage with established speakers (and personal heroes of mine) like Mark Richards, Venkat Subramaniam, Eoin Woods, and Dylan Beattie. Yes, you can see them in the hero picture 😱. ...

November 27, 2023 · 14 min

Running Llama2 on Apple silicon with llama.cpp

Recently, I was curious to see how easy it would be to run Llama2 on my MacBook Pro M2, given the impressive amount of memory it makes available to both CPU and GPU. This led me to the excellent llama.cpp, a project focused on running simplified versions of the Llama models on both CPU and GPU. The process felt quite straightforward, except for some instability in the llama.cpp repo just as I decided to try it out, which has since been fixed. Incidentally, this prompted me to document the whole process, just in case I want to do it again in the future. ...

September 20, 2023 · 3 min

The One Where Bing Becomes Chandler: A Study on Prompt Injection in Bing Chat

An experiment with prompt injecting Bing Chat – successfully changing its persona, exploring data extraction potential, limitations, and future implications.

April 10, 2023 · 9 min

3 Tips for Working with Azure ML Compute Instances

My top 3 tips for working better, faster, and just a bit stronger with Azure ML Compute Instances

March 18, 2023 · 6 min

Azure ML Managed Online Endpoints - Quickstart

A quickstart guide to deploying machine learning models in production using Azure Machine Learning’s managed online endpoints

February 18, 2023 · 9 min

How to run Stable Diffusion Web UI on Azure ML Compute Instances

A guide to creating GPU compute instances on Azure ML, installing Stable Diffusion, and running AUTOMATIC1111’s Web UI.

January 29, 2023 · 12 min

Continuous Deployment for Azure ML Pipelines with Azure DevOps

Because life’s too short to deploy things manually

August 29, 2021 · 9 min

GitHub Copilot: First Impressions

A glimpse of the upcoming paradigm shift in how we do development

July 18, 2021 · 6 min