GitHub Copilot is a tool that helps you write better, faster, and most importantly, more code.

I’ve been lucky enough to use it for the past few weeks, and so far it has proven quite useful, earning a place in my toolbox despite its rough edges. I also feel it signals a coming change in how we develop and reason about systems, a change that will allow us to go up a few layers of abstraction in the coming decades.

But those decades are a long way off, so let’s see what happens before them.

For starters, I’m increasingly convinced that in the near future, three to five years tops, we’ll all be writing a whole lot more comments, using a whole lot more descriptive names for everything, and writing a whole lot less code.

But not just yet

We’ll also do code reviews. Lots and lots of code reviews. Like, all the time. The algorithm will have to be kept in check.

Let me tell you why I think that’ll happen.

The Good

GitHub Copilot has been described as ‘magical’, ‘god send’, ‘seriously incredible work’, et cetera. I agree, it’s a pretty impressive tool, something I see myself using daily. Especially once they add support for PyCharm. Heck, I’ve been using it daily while Cmd+Tabbing between PyCharm and VSCode, writing code in PyCharm whenever I wanted to think for myself and in VSCode whenever I wanted the algorithm to do it for me.

In my experience, Copilot excels at writing repetitive, tedious, boilerplate-y code. With minimal context, it can whip up a function that slices and dices a dataset, trains and evaluates several ML models, and, if you ask it nicely, also makes a nice batch of french fries. Not just that: it can look at an example and a list of items, and apply that example to each and every item in the list, the kind of stuff you’d record a quick macro to fix.

dict(zip(words, words_in_english))
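For reference, the one-liner above pairs two parallel lists into a lookup table. Here is a minimal sketch of it in action; the sample words are assumed for illustration and are not from the original post.

```python
# Hypothetical sample data, made up for this example.
words = ["uno", "dos", "tres"]
words_in_english = ["one", "two", "three"]

# zip pairs items by position; dict turns those pairs into a lookup table.
translations = dict(zip(words, words_in_english))

print(translations["dos"])  # two
```

Keep in mind that zip stops at the shorter list, so a missing translation silently drops a word instead of raising an error. That is exactly the kind of detail worth checking in a suggestion.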

The Bad

When it comes to more advanced stuff, Copilot’s usefulness is a bit more nuanced.

Its ability to generate a large amount of code that may or may not do the right thing is not to be trifled with. At times it’s brilliant, at other times... less so. This is especially visible when writing important code, code you need to focus on and make sure you get right. Code reviews come into play here by the way, and they’ll become more important as tools like this gain traction.

GitHub Copilot can also suggest obsolete versions of libraries, produce syntactically incorrect or undefined code, and happily fill in hyperparameters for non-existent ML algorithms.

It was an honest mistake

I’ve found it helps to think of it as a preview version of Tesla’s Autopilot, where every 10 minutes or so it may or may not swerve into the opposite lane, so you need to pay attention at all times. Hands on the wheel, eyes on the road, close that tab running YouTube.

Long story short, while most of these issues will be fixed in time, it looks like others might take their place. For the moment, you should limit its usage if you don’t know or don’t care what you’re doing. There be dragons.

The Research

I’ve found the paper on Codex, the GPT language model that powers GitHub Copilot, to be quite insightful for understanding Copilot’s strengths and weaknesses, and when to use it and when not to.

Here are some of my favorite bits from that paper.

Potential

Codex has the potential to be useful in a range of ways. For example, it could help onboard users to new codebases, reduce context switching for experienced coders, enable non-programmers to write specifications and have Codex draft implementations, and aid in education and exploration.

Having a Copilot model transfer learn your company’s codebase and then suggest patterns and modules used throughout the company, now that would be a dream come true. Just think how much this’ll help standardize your patterns and practices. It will most likely not happen in the next decade, as the computing power needed to train and run a version of the model will remain prohibitive for a while, but I can definitely see this happening in the long run.

I’m also really excited about enabling non-programmers to write specs. Specifically, testers. Testers who cannot write the tiniest bit of code to test an API or a UI, but who can write a description of what they want to achieve. Most of the code they need should be simple enough that Copilot gets it right the first time, and it would massively increase their productivity.

That’s already possible to some extent, even with the current preview version of Copilot.

🤭

Limitations

Due to the limitations described above as well as alignment issues described below, Codex may suggest solutions that superficially appear correct but do not actually perform the task the user intended. This could particularly affect novice programmers, and could have significant safety implications depending on the context. We discuss a related issue in Appendix G, namely that code generation models can suggest insecure code. For these reasons, human oversight and vigilance is required for safe use of code generation systems like Codex.

Code reviews, code reviews, code reviews. But even they might not help for long because:

One challenge researchers should consider is that as capabilities improve, it may become increasingly difficult to guard against “automation bias.”

So we’ll be hit by a double-whammy: the better GitHub Copilot and similar systems become, the less willing we’ll be to look for bugs in the generated code. And when we do look for bugs in the generated code, they’ll be really subtle and hard to identify.

I’m curious to see what safeguards we’ll build against these issues.

Incorrect Code

Applying this framework, we find that Codex can recommend syntactically incorrect or undefined code, and can invoke functions, variables, and attributes that are undefined or outside the scope of the codebase.

Yup.
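Some of this can be caught mechanically before a human ever reads the suggestion. Below is a rough sketch of such a check, using Python’s standard ast module. The check_suggestion helper and the SuperBoostClassifier name are made up for illustration, and the check deliberately ignores scoping rules, so treat it as a first filter and not as a replacement for a linter or a code review.

```python
import ast
import builtins


def check_suggestion(source: str, known_names: set[str]) -> list[str]:
    """Return the problems found in a suggested snippet (empty list if none).

    known_names holds names already available in the surrounding code.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    # Collect every name the snippet binds itself, anywhere, ignoring scope.
    defined = set(known_names) | set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)

    # Anything that is read but never bound is suspicious.
    problems = {
        f"undefined name: {node.id}"
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Load)
        and node.id not in defined
    }
    return sorted(problems)


# A classifier that was never imported (and, here, does not exist at all).
print(check_suggestion("model = SuperBoostClassifier(n_trees=10)", set()))
```

A check like this only tells you the code refers to things that exist. It says nothing about whether the code does what you meant, which is where the reviews come in.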

Less is More

Moreover, Codex struggles to parse through increasingly long and higher-level or system-level specifications.(…) We find that as the number of chained building blocks in the docstring increases, model performance decreases exponentially.

That’s an interesting one. I had been under the impression that the more details I’d write in a docstring, the better Copilot would perform. The exact opposite appears to be true.
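One way to build intuition for that exponential drop is a toy model: assume each building block in the docstring is handled correctly with some fixed probability, independently of the others. Both the independence and the 0.9 figure below are assumptions made for illustration, not measurements from the paper.

```python
def chain_success(p_block: float, n_blocks: int) -> float:
    """Chance that all n_blocks steps come out right, assuming independence."""
    return p_block ** n_blocks


# Assumed 90% success rate per building block, purely illustrative.
for n in (1, 2, 4, 8):
    print(n, round(chain_success(0.9, n), 3))
```

Under these assumptions, a single block works 90% of the time, while a chain of eight works only about 43% of the time. That is a decent argument for splitting a long docstring into several small functions.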

Parting Words

I’m excited. Real excited. I think we’re fast approaching a paradigm shift in how we do development, taking us up one level of abstraction. I look forward to the day when a Copilot-powered compiler takes in my English description and compiles it to Python, or JavaScript, or C#, or all of them.

The future is now, might as well embrace it.

P.S.

No, no part of this article has been generated by Copilot; all the good and the bad are mine to own. God knows I’ve tried to use it for the intro though.

Guess so

By the way, if you’ve enjoyed this article you might want to read the others, too. I usually write a new one each month, focused mostly on Azure ML but with other stuff thrown in for good measure.
