All this talk I’m seeing about AI being close to replacing programmers indicates there’s a significant gap between what people think programming is like and what programming is actually like. I get the sense that most people who don’t work in tech think that programming is like sitting down in front of a computer, saying to yourself, “alrighty, let’s make an app,” and expertly busting out code until you have a fresh app. It’s more like getting onboarded into an organization that has hundreds of thousands of lines of archaic, institutional code, and being tasked with finding and fixing the 1-10 lines that happen to be somehow causing the most urgent bug, and then doing this over and over.
There’s a reason we tend to call it “software development” or “software engineering” instead of “programming.” The actual “programming” (the code composition, the fingers-on-the-keys part) is a very small part of the job. Most of the job is code maintenance (things like fixing bugs) and technology integration (things like connecting a UI framework to an API that provides data). Yes, there is composition of novel functionality involved; it’s just a comparatively small and easy part of the work. Most software work—and the hardest software work—is maintaining big existing institutional software. And even when you are creating new things, the state of the software universe is such that almost all of the work has already been done for you, and all you have to do is the technology integration: people have already created the libraries and frameworks and resources that implement basically any fundamental thing you could ever need to do, and there’s no reason to re-implement them, so the code you write mostly just connects those libraries and frameworks and resources together.
I spent several hours last weekend doing exactly this. I had an idea for an app, I knew what API provided the data I needed, and I knew what frontend framework I wanted to use (real-world software expertise has a lot more to do with knowledge of different frameworks and APIs and their use cases than it does with writing slick code), so I set up the project and started working on it. The work was about 80% reading the API’s documentation, 18% configuring my API keys and downloading the example project and things like that, and 2% writing code to hook everything up.
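The 2% of actual code looked more or less like this (a hypothetical sketch, not the real project; the endpoint and field names are made up just to show the shape of the work):

```python
import requests  # the HTTP library does all the heavy lifting


def fetch_items(api_key):
    # Glue code: call the (made-up) API, check for errors, and hand the
    # parsed JSON to whatever frontend component wants to display it.
    response = requests.get(
        "https://api.example.com/v1/items",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["items"]
```

Nearly every line is dictated by the API’s documentation and the library’s conventions; the “programming” is just wiring them together.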
From over here in the Real World of programming, the question of whether AI¹ will replace or even just meaningfully displace programmers does not cause concern. Language model AIs like ChatGPT might be close to pretty good automatic code composition, but it just doesn’t matter.
In the same way that ChatGPT can write banal prose, it can write banal code—what we call “boilerplate” (brainless code that is just sort of necessary to get things to work; boilerplate is often abstracted away into a package so that instead of having to write it you only have to call a function)—that almost works.
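To make “boilerplate” concrete, here’s the kind of thing I mean in Python (a generic sketch, not from any particular project):

```python
import argparse
import logging


def main():
    # Argument parsing: nearly identical at the top of countless scripts.
    parser = argparse.ArgumentParser(description="Process an input file.")
    parser.add_argument("input_path", help="path to the input file")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    args = parser.parse_args()

    # Logging setup: also nearly identical everywhere.
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    logging.info("processing %s", args.input_path)

    # ...the actual, project-specific work would go here...


if __name__ == "__main__":
    main()
```

None of it is hard, none of it is interesting, and none of it is specific to the problem at hand; it just has to be there.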
You know what else can produce boilerplate that almost works? Stack Overflow, the programming Q&A site universally loved by programmers all over the world. ChatGPT is like a glorified Stack Overflow, and it’s really not even that good yet, because it’s not so good at identifying issues in code, something Stack Overflow can do for you in minutes. But anyway, maybe we’re really close to a glorified Stack Overflow. Maybe in just a few more years an AI will be able to respond to complex problem definitions, identify issues in code, and write code that works. But it still wouldn’t matter.
There’s a commandment you often hear when you’re getting started in programming: “Thou shalt not copy-paste code from Stack Overflow.” It’s a good rule, because copy-pasting code gets you in trouble—if you don’t understand what’s in there, then when something breaks, you won’t be able to fix it. And something will break.
In real-life programming, there’s this enormous emphasis on readability and comprehension. It’s super important that we all write code in such a way that we’re able to understand what it’s doing. Because eventually, someone is going to be the person tasked with fixing a bug caused by your code (quite possibly you), and if they can’t figure out what’s going on in that code, they won’t be able to fix it. Again, real-life programming is much more about code maintenance—there is a lot more work to do on existing software than there are fresh new apps to make—and code maintenance is mostly about reading code. It is said that programming is 80% reading code and 20% writing code, but it would be more accurate to say that programming is 80% reading code, 15% editing code, and 5% writing code (and remember that the whole 100% of programming is still only a small part of the software development job).
If you started using ChatGPT or any other AI to write your code, you would be committing the same sin as someone copy-pasting code from Stack Overflow. Because when it comes time to do maintenance, someone will ask: “What’s this code doing?” or “Why did you do it this way?” or “How can we make it do this instead of that?” or “How can we add on another module right here?” and you will only be able to answer: “I don’t know, the AI wrote it.”
But will a future language model be something more than a glorified Stack Overflow? It’s only a matter of time before AIs are able to compose not just boilerplate but really novel code, you say. I say that, at best, code-writing AIs might become something like interns. Deep learning AIs will never really be able to move beyond boilerplate, because deep learning models are only capable of picking up on recurring patterns and relationships (even if they are very complex patterns and relationships), and the patterns in the code are the boilerplate. The rest of the code is specific to each project. Maybe, someday, there will be an AI sophisticated enough to write things like unit tests—little functions that you write to automatically verify that your real functions produce the expected results. They’re not quite boilerplate, because you do have to understand what the function you’re testing is supposed to do, but they are pretty brainless. I can imagine it’s possible that a deep learning model would be able to analyze a function definition and produce a unit test. But:
The big salaries fund the software maintenance and the tech integration, the work that requires intimacy with the org’s codebase, knowledge of frameworks and resources, and raw Experience with software. It’s the interns who typically work on the code composition tasks (the comparatively easy stuff), like writing unit tests, which is one of the few places where orgs still commonly need plain code composition. So yes, maybe some future language models will displace software engineering interns, or will write unit tests for us. Hurrah if so. Writing unit tests is incredibly boring.
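(If you’ve never seen one, here’s roughly what a unit test looks like in Python; the function under test is a made-up example, included only so the snippet stands on its own:)

```python
import unittest


def total_price(unit_price, quantity, tax_rate=0.0):
    """Made-up function under test: price * quantity, plus tax, rounded to cents."""
    return round(unit_price * quantity * (1 + tax_rate), 2)


class TestTotalPrice(unittest.TestCase):
    # Each test just calls the function and checks the expected result.
    def test_no_tax(self):
        self.assertEqual(total_price(10.0, 3), 30.0)

    def test_with_tax(self):
        self.assertEqual(total_price(10.0, 3, tax_rate=0.1), 33.0)

    def test_zero_quantity(self):
        self.assertEqual(total_price(10.0, 0), 0.0)


if __name__ == "__main__":
    unittest.main()
```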
Or here’s another angle on it: in real-life programming work, when that work does happen to be code composition, the boilerplate is not the bottleneck. If you’re an experienced programmer, boilerplate is not hard, and it is not time-consuming. Why would you even bother going to the OpenAI website and typing in a prompt to get the boilerplate (maybe only after some finagling of the prompt to get the output right) to copy into your code editor? Just write it out yourself real quick. This stuff is the easy stuff. The bottleneck is the documentation-reading, the data-interpretation, the code-comprehension. The bottleneck is always the problem at hand; it’s never the boilerplate, never the patterns. The expertise of the programmer isn’t in knowing how to set up a Flask server, or anything else you could get ChatGPT to write, or even in being able to write basic functionality that ChatGPT can’t write—it’s in knowing how to deal with the specificities and intricacies of the problem you’re working on.
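For reference, “setting up a Flask server” amounts to roughly this (the standard minimal Flask app, give or take):

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # A trivial endpoint; the real work is whatever you hook up behind it.
    return "Hello, world!"


if __name__ == "__main__":
    # Development server only; you'd run a proper WSGI server in production.
    app.run(debug=True)
```

ChatGPT will happily produce that for you. It will not tell you what your endpoints should do with the messy data that actually shows up.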
Nonetheless, ChatGPT’s code-writing capabilities are impressive. It’s a little shocking that a computer can write almost usable code (isn’t this supposed to be the thing we’re worried about? A computer program writing its own code until it achieves superhuman intelligence?), and future language models can only get better. We can at least imagine some future language model that accepts prompts in English and does amazing things—maybe it could analyze a codebase, identify bugs, and write fixes that respect existing conventions; maybe it could somehow ingest messy real-world data and figure out how to write an interface on top of it; maybe it could have built-in knowledge of frameworks and resources; and so on. Even if these things are extremely difficult and a long way off, we can at least imagine them, and there is no reason to believe they would be impossible.
Alas, it still doesn’t matter. Even this hypothetical super code-writing AI would not meaningfully displace programmers.
If you’re using a language model to write code, what you’re doing is using English as a programming language. It’s the exact same job, just a different representation. It’s exactly like using ChatGPT to compile English into Python code the way gcc compiles C++ into binary. So you’re choosing English as your programming language over Python, C++, and all the other programming languages. And English is a terrible programming language.
Because here’s another thing people are confused about when it comes to programming, something even programmers don’t all recognize: the code isn’t for the computer—it’s for you. It’s for humans. If the programming language were meant for the computer, we would all be writing in pure binary instead of these abstracted, symbolic languages. The Python/C++/whatever code isn’t some obstacle we’re trying to overcome. The code is the interface we designed so that we could program the computer. It’s what we need. It’s objective, explicit, unambiguous, (relatively) static, internally consistent, and robust. English has none of these properties—it’s subjective, its meaning is often implicit, it’s ambiguous, it’s always changing, it contradicts itself, and its structure does not hold up to analysis.
Consider, for instance, the simple existence of the term “prompt engineering,” which describes the practice of iteratively fine-tuning the prompts you submit to ChatGPT (or whatever) to get your desired output. Prompt engineering is partly a matter of identifying what the model responds to, and mostly a matter of guesswork. It means treating the natural language of the prompt as a formal language, manipulating symbols that have lost their human meaning and intuitive structure, like some sort of abstract association game. You’re working with the worst programming language imaginable.
There are properties of programming languages that make it difficult to write code (for instance, the demand that you declare a variable’s type, or that you adhere to a very rigid syntax), but it’s exactly these properties (which you are circumventing by using a language model) that make the programming language useful. The creators of C++ didn’t include these properties to be malicious or to make it harder to write code; they included them because they make a programmer’s job easier. These properties make it possible to know what the program is going to do, or they make errors identifiable earlier on, or they prevent side effects. All of the exasperating demands and specificities of C++ or any other programming language are what make it a good programming language. They’re how we specify large, sophisticated, complicated software such that it’s unlikely to break, it does exactly what we want it to, and we can come back to it, read it, and figure out what’s going on.
(Programmers reading this know that Python, which is over 30 years old, already constitutes a shift away from more arcane-looking programming languages like C++ toward more human-looking text, and that shift comes with a price. When you work in C++, you’re much more likely to identify errors before the program even runs; with Python, errors are liable to appear at any time during a run, which can make them much harder to identify and more time-consuming to fix. Some programmers hate Python for this reason. And the demands of some projects and domains simply rule out a language like Python, however much easier it is to learn, write, and read than C++.)
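(A toy illustration of that trade-off, using nothing but standard Python: the bug below goes unnoticed until the empty-list branch actually runs, whereas a statically typed, compiled language like C++ would reject the equivalent type mismatch before the program ever started.)

```python
def format_total(scores):
    if not scores:
        # Bug: "Total: " + 0 mixes str and int and raises a TypeError,
        # but only when this branch actually executes.
        return "Total: " + 0
    return "Total: " + str(sum(scores))


print(format_total([3, 4]))  # works fine, prints "Total: 7"
print(format_total([]))      # blows up at runtime, possibly long after shipping
```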
It might seem like it would be convenient to generate code from natural language prompts, or it might even seem like a really good thing—now everyone can program computers! Just tell the computer what you want to do, in plain English!—but unfortunately, it’s not. Using a code-generating AI instead of a programming language would simply mean that your job is figuring out how to specify software in natural language instead of in a programming language, and that wouldn’t be an improvement. Trying to specify a piece of software in English would be a proper nightmare. I can say so with confidence because this is the first part of every software project—you do your best to describe it in natural language first so that everyone is on the same page and you have a good idea of what you’re going to do. Inevitably, the natural language specification falls short (and this is a serious understatement). There are all these considerations, technical details, compatibilities, versions, integrations, real-world data, and so many other things you have to worry about. Dealing with all of that is the job of a programmer, not so much the programming.
Software is hard. Computers are difficult, finicky, alien things. Programming languages are our most promising source of power over them. I imagine a world where, instead of hiring programmers, managers simply tell AIs what they want in plain English, then pat themselves on the back for saving so much on payroll; now the manager is the programmer, and he’s writing code in English. And I laugh to myself heartily.
¹ I’m talking about all AI short of superhuman artificial general intelligence (in which case, all bets are off), i.e. any feasible deep-learning model.