GitHub and OpenAI Launch a New AI Tool
GitHub and OpenAI have launched a technical preview of a new AI tool called Copilot. The tool is available inside the Visual Studio Code editor and autocompletes code snippets. According to GitHub, Copilot does more than just parrot back code it’s seen before. It instead analyzes the code you’ve already written and generates new matching code, including specific functions that were previously called.
Copilot is built on a new algorithm called OpenAI Codex, which OpenAI CTO Greg Brockman describes as a descendant of GPT-3. GPT-3 is OpenAI’s flagship language-generating algorithm, which can generate text that is sometimes indistinguishable from human writing. It’s able to write so convincingly because of its sheer size: 175 billion parameters, or adjustable knobs that allow the algorithm to draw connections between letters, words, phrases, and sentences.
This project is the first major result of Microsoft’s $1 billion investment into OpenAI, the research firm now led by former Y Combinator president Sam Altman. Since Altman took the reins, OpenAI has pivoted from nonprofit status to a “capped-profit” model, taken on the Microsoft investment, and started licensing its GPT-3 text-generation algorithm.
While GPT-3 generates English, OpenAI Codex generates code. OpenAI plans to release a version of Codex through its API later this summer so developers can build their own apps with the tech, a representative for OpenAI told The Verge in an email. Codex was trained on terabytes of openly available code pulled from GitHub, as well as English-language examples.
While testimonials on the site rave about the productivity gains Copilot provides, GitHub implies that not all of the code used was vetted for bugs, insecure practices, or personal data. The company writes that it has put a few filters in place to prevent Copilot from generating offensive language, but that the filtering might not be perfect.
Copilot’s website says, “Due to the pre-release nature of the underlying technology, GitHub Copilot may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs.” Given criticisms of GPT-3’s bias and abusive language patterns, it seems that OpenAI hasn’t found a way to prevent algorithms from inheriting their training data’s worst elements.
The company also warns that the model could suggest email addresses, API keys, or phone numbers, but that this is rare and the data has been found to be synthetic or pseudo-randomly generated by the algorithm. However, the code generated by Copilot is largely original. A test performed by GitHub found that only 0.1 percent of generated code could be found verbatim in the training set.
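The article does not say how GitHub ran that test. As a rough illustration only, one way to estimate a verbatim-match rate is to flag any suggestion that shares a long enough run of tokens with the training code. The function names, the whitespace tokenization, and the run length `n` below are all assumptions made for this sketch, not GitHub's method.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return every run of n consecutive whitespace-separated tokens in text."""
    tokens = text.split()
    # If the text has fewer than n tokens, the range is empty and so is the set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_rate(suggestions: Iterable[str],
                  training_files: Iterable[str],
                  n: int = 6) -> float:
    """Fraction of suggestions that share at least one n-token run with the training code.

    Hypothetical helper: the choice of n and the tokenization are assumptions.
    """
    seen: Set[Tuple[str, ...]] = set()
    for source in training_files:
        seen |= ngrams(source, n)

    suggestions = list(suggestions)
    if not suggestions:
        return 0.0

    # A suggestion counts as a hit if any of its n-token runs was seen in training.
    hits = sum(1 for s in suggestions if ngrams(s, n) & seen)
    return hits / len(suggestions)
```

With a small `n`, common boilerplate will register as a match; with a large `n`, only long copied passages will. Any reported percentage depends heavily on that choice.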