Training and Sharing Large Language Models

The Data Exchange Podcast: Connor Leahy on building models and datasets for the natural language research community.

Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS.

This week’s guest is Connor Leahy, AI Researcher at Aleph Alpha GmbH and founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. As NLP research becomes more computationally demanding and data intensive, there is a need for researchers to work together to develop tools and resources for the broader community. While relatively new, EleutherAI has already released models and data that many researchers are benefitting from.

Take the 2021 NLP Industry Survey and get a free pass to the 2021 NLP Summit.

We discussed EleutherAI’s projects including:

Large language models: GPT-Neo and GPT-NeoX
Open datasets for language modelling: The Pile

We also discussed the state of natural language research, the rise of large language models, embeddings, and a new role that Connor and others have described as “prompt engineer” (see here and here, as well as attempts to automatically discover diverse prompts).

Connor Leahy:

So right now we’re going to be in a period where there are people I know personally, people that are just beautiful prompt engineers, in a way that they can just make these models do things that I can’t make them do. They can produce much better text, much more beautiful prose or better answers or something. So there is that and I think “prompt engineers” will exist for some time. But at some point that too will be replaced by AI. At some point AI will be so smart that there will be no need for prompt engineering. When will that happen? Well, I don’t know. It might be anytime soon, it might take a long time.

… In my day job, if you’re a maestro prompter, we would like to talk to you, we might hire you. Because we experiment with stuff like this. … we think that this will become a thing. Currently no one really is a “prompt engineer”. The few people I know that I would consider “master prompters” or “prompt engineers” are more like hobbyists. … I know a few people that have become really masterful prompters and are able to use these models to write stories and stuff. But I expect there to also be actual “prompt engineers” in the future.

Related content:

A video version of this conversation is available on our YouTube channel.
“Navigate the road to Responsible AI”
Jack Morris: “Increasing the robustness of natural language applications”
Marco Ribeiro: “Testing Natural Language Models”
Neil Thompson: “The Computational Limits of Deep Learning”
Rumman Chowdhury: “Responsible AI meets Reality”
Ram Shankar: “Securing machine learning applications”

Subscribe to our Newsletter:
We also publish a popular newsletter where we share highlights from recent episodes, trends in AI / machine learning / data, and a collection of recommendations.

2021 NLP Survey

The 2021 NLP Industry Survey is now open and we need your help. The survey takes less than 5 minutes to fill out and in exchange we’ll send you a copy of the survey results + a FREE pass to the 2021 NLP Summit (a virtual conference slated for October).

[Image: Books in Byeol-Madang Library, at the Starfield COEX Shopping Mall in Seoul from Wikimedia.]
