GitHub Copilot: Will It Take Our Jerbs?

Normally, if I'm writing about robots, I'm writing about Kraftwerk. But it's 2023: The year that AI is goddamned EVERYWHERE. I'm pretty sure this stuff is the real deal. This ain't no blockchain'd NFT. I've been using GitHub Copilot daily for the past 3 months or so. I can't decide if it's a game changer (yet) or not. In this post, I will hold forth about what I've experienced using the tool thus far.
Background - What Is GitHub Copilot?
It's a natural language model trained on publicly available code. Basically, GitHub and Microsoft have scraped up a shitload of public repos and used that data to create a tool that guesses what you're trying to do based on patterns it's noticed. Here are a couple of things from the official description that stand out to me.
Copilot analyzes the context in the file you are editing, as well as related files, and offers suggestions from within your text editor. GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI.
The friction between open files, related files, and whatever else the model has previously ingested can be tricky. More on this later.
GitHub Copilot is trained on all languages that appear in public repositories. For each language, the quality of suggestions you receive may depend on the volume and diversity of training data for that language. For example, JavaScript is well-represented in public repositories and is one of GitHub Copilot's best supported languages. Languages with less representation in public repositories may produce fewer or less robust suggestions.
Representation issues are a problem here as well. I see this extending beyond the representation of individual languages. What other kinds of bias can be introduced here?
But I have digressed (duh). Basically, with Copilot, you pop open an editor and it tries to guess at what you're doing and fill in the blank. So how does it work in practice?
Stack Overflow, Watch Out
I feel like software development generally consists of three broad phases:
- What am I going to do to solve a given problem?
- How will I accomplish this? What tool / pattern / flow of instructions am I going to provide?
- Implementation.
As of now, Copilot is really only helpful for step 3 of this flow (in my opinion). But in this application, it is already VERY helpful.
Think of a case where you need a regex. Say you want to test that an email is valid. We've all done this so many times, but who memorizes the regex? Copilot did.
const emailRegex = /[^@]+@[^\.]+\..+/;
In this example, I just declared a semantically reasonable variable, and Copilot did the rest. Then, with the next keystroke, it guessed that I might want a function to apply this regex. Copilot spat one out, complete with types...
const emailRegex = /[^@]+@[^\.]+\..+/;

function validateEmail(email: string): boolean {
  return emailRegex.test(email);
}
So it all looks right. Do we just take Copilot's word for it? Let's write a unit test for this stuff, with Copilot's help.
Tests, and Some Questions
I made a test file that is sitting next to the file exporting the above email function. As soon as I opened the file, Copilot offered the following suggestion:
describe("validateEmail", () => {
  it("should return true if email is valid", () => {
    const email = "
  })
})
I'm actually surprised at this - it's broken code and doesn't actually spec out a test. I would've expected a full suite of tests for something as simple and semantically named as this email example. But this touches on the point I made in the intro - I feel like Copilot sometimes struggles to balance the immediate file being edited, the other files in the project, and the broader bucket of data it's inhaled. I would assume that the more the tool is used in a given project, the more familiar it becomes with that project's naming conventions, types, etc., but I haven't necessarily seen that in practice, and there's no transparency into how Copilot generated a given guess.
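For the record, here's roughly the suite I was hoping for. This is a minimal sketch I wrote by hand - assuming Jest, and assuming the email function above lives in a hypothetical validateEmail.ts file - the cases are mine, not Copilot's:

import { validateEmail } from "./validateEmail"; // hypothetical path to the file above

describe("validateEmail", () => {
  it("returns true for a well-formed address", () => {
    expect(validateEmail("hello@example.com")).toBe(true);
  });

  it("returns false when the @ is missing", () => {
    expect(validateEmail("helloexample.com")).toBe(false);
  });

  it("returns false when the domain has no dot", () => {
    expect(validateEmail("hello@example")).toBe(false);
  });
});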
Moreover - and this is maybe a bit paranoid - if someone is using the same tool to both write the implementation of a given piece of logic and to TEST it, does that not leave plenty of room for the system to use its own bias to create false positives? Shit!
React
I'll just briefly note that Copilot is really handy with React and especially TSX. There's a shitload of React on GitHub, hence plenty of data for training the model. Copilot is great at inferring the types of props passed through components, and it's really good at generating semantically correct markup - better than a lot of human-written stuff out there. I think tools like this could really help with the low-hanging HTML fruit that devs often forget about. That stuff helps a lot with accessibility.
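To give a flavor of what I mean, here's roughly the kind of completion I'll get after typing a props type and a component signature. This is my own made-up example (the component and prop names are invented, not a captured Copilot suggestion), and it assumes the automatic JSX runtime:

type AvatarProps = {
  name: string;
  imageUrl: string;
};

function Avatar({ name, imageUrl }: AvatarProps) {
  // The suggested markup tends to lean on semantic elements
  // (figure/figcaption, alt text) rather than a pile of divs.
  return (
    <figure>
      <img src={imageUrl} alt={`Portrait of ${name}`} />
      <figcaption>{name}</figcaption>
    </figure>
  );
}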
Weirdness
In experimenting for this post, I asked Copilot to write a function that adds a list of numbers together:
function addListOfNumbers(list: number[]): number {
  let sum = 0;
  for (let num of list) {
    sum += num;
  }
  return sum;
}
There's nothing wrong with this answer, but it jumped out at me. Using a reduce/proc function is generally what I've seen across languages to accomplish this. Yet the example here is what I would consider the more "junior" approach. It got me thinking - does that matter? The task is accomplished either way. Is the larger set of code that this model is trained on full of more "junior" code?
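For comparison, here's the version I'd usually expect to see - my own rewrite, not a Copilot suggestion:

function addListOfNumbers(list: number[]): number {
  // Same result, just folding the list down instead of mutating a counter.
  return list.reduce((sum, num) => sum + num, 0);
}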
The pattern extended to other languages, too. When asked to add up a list, Copilot wrote out functions/methods that declared a variable outside of a loop and added to it, rather than reducing the input list down.
Will it take our jobs?
Sort of? It will certainly augment them. Before Copilot came along, software development (for most engineers) was already done through layers and layers of abstraction. Hardly anyone is writing machine-level instructions. We're all using JavaScript, a high-level language that gets interpreted/compiled down to lower-level code, and so on. I picture AI tools like this one as a new layer of abstraction. Maybe that mental model is off, but for now, it works for me.
Earlier in this post, I described three rough phases of software development:
- What am I going to do to solve a given problem?
- How will I accomplish this? What tool / pattern / flow of instructions am I going to provide?
- Implementation.
Copilot is great for part 3, but I don't see it helping with parts 1 and 2 for now. As is the case with any large language model, the AI has NO IDEA what it's doing, or why it's doing it. It is merely interpreting precedent and spitting it back based on probability.
But down the road, who knows? There's reason to think that points 1 and 2 will be backed by so much data and precedent that hardly anything will be truly novel anymore...
From a rosier perspective, having robots spend time solving fairly trivial problems frees up minds to work on problems that haven't yet been solved, and NEED TO BE. Maybe the robots can help us move brainpower from dating and food delivery apps to systems that can distribute solar power more efficiently. Or something.