How to use OpenAI's GPT-3 to clean up data

Learn how to use Open AI and GPT-3 inside the spreadsheet to clean up company data, addresses, capitalize text and more.

About OpenAI and GPT

OpenAI developed GPT, a type of artificial intelligence model that is capable of generating human-like text. It uses a machine learning technique called pre-training to learn the statistical patterns of a large dataset and then fine-tunes its knowledge on a smaller dataset to perform a specific task. It is trained on a dataset with billions of elements, including a significant part of the internet webpages, books and more.

GPT can be used for a variety of language-based tasks, such as language translation, summarization, question answering, and text generation. It can generate coherent and coherently structured paragraphs, articles, and even entire books, given a prompt or topic. GPT has the ability to learn and adapt to new tasks quickly, making it a powerful tool for natural language processing.

NOTE: There are known limitations to these models and the output is not guaranteed to be factually accurate.

How to use OpenAI in Rows

OpenAI template

There's a near-infinite amount of tasks you can solve using OpenAI. Follow this guide on how to set up the integration and use this template showcase to get started with 10 pre-built examples, follow along the list, or watch our video tutorial.

Using the ASK_OPENAI function

The OpenAI integration contains one function - ASK_OPENAI - that you can use to perform hundreds of tasks

You can use the ASK_OPENAI function via the Autocomplete in the editor, or via the Actions wizard.

ASK_OPENAI on the editor

The function has 4 parameters: 1 mandatory and 3 optional, with the following signature: ASK_OPENAI(prompt,[temperature], [max_tokens],[model]) prompt. Let's go one by one.

Prompt

The prompt is the instruction to give to the model. This is where you'll enter the "ask" you want the AI to answer. You can use the prompt to solve a task by explicitly writing it in prose or you can use spreadsheet functions to concatenate values in different cells to construct the prompt. Examples:

  • Use =ASK_OPENAI("Population of France is, in millions, :") if you are not using any cell references inside the prompt.
  • Use =ASK_OPENAI(CONCATENATE("The population of ",A2," in millions, is: ")) if you're using A2 as an input (the country name) to the prompt.

Prompt creation with Concatenate

Temperature (optional)

The temperature is the sampling temperature to use, varying between 0 and 1. Use 1 for creative applications, and 0 for well-defined answers.

If you're doing tasks that require a factual answer (e.g. country populations, capitalize text), then 0 (the default) is a better fit. If you're using the AI for tasks where there aren't definite answers - such as generating text, summarizing text, or translating - then experiment with a higher temperature

Max_tokens (optional)

This max_tokens represents the maximum number of tokens to generate in the completion. You can think of tokens as pieces of words. Here are a few helpful rules of thumb examples from the Open AI Help center:

  • 1 token ~= 4 chars in English
  • 1 token ~= 3/4 words

You can use any number starting with 0. The default value is 200. Most models have a context length of 2048 tokens, except for the newest models which support a maximum of 4096. For tasks that require more text output - text generation/summarization/translation - pick a higher value (e.g. 250).

Model (optional)

The AI model to use to generate the answer. By default, it uses "text-davinci-003". You can see here a list of all of the available GPT-3 models.

Examples of ASK_OPENAI to clean up data

There are several ways to use OpenAI to clean lists of data:

  • Clean up Company names: Clean up a list of company names and remove legal abbreviations and filler text.
  • Clean up addresses: Extract Zip Code, State, and Country Code from an address.
  • Capitalize words: Correctly fix capitaliztion in a list of words (e.g. company names)
  • Classify email providers: Clean up a list of emails by classifying the email providers and personal or company addresses.

Clean up Company names

Goal: Clean up a list of company names and remove legal abbreviations and filler text.

Example:

1=ASK_OPENAI(CONCATENATE("Remove legal entity abbreviations like GmbH, LLC, Inc., emojis, special characters and unnecessary text from ",A2,". Company name: "))

Details: Assumes that A2 contains the company name.

Cleanup Companies GIF

Clean up Addresses

Goal: Extract Zip Code, State, and Country Code from an address.

Examples:

Zip Code:

1=ASK_OPENAI(CONCATENATE("The Zip code of ",A2," is: "))

State:

1=ASK_OPENAI(CONCATENATE("The State of ",A2," is: "))

Country Code:

1=ASK_OPENAI(CONCATENATE("The Country Code of ",A2," is: "))

Details: All examples assume that A2 contains the company name.

Cleanup addresses GIF

Capitalize words

Goal: Correctly fix capitaliztion in a list of words.

Example:

1 =ASK_OPENAI(CONCATENATE("Capitalize the words in the following text: ",A2))

Details: All examples assume that A2 contains the company name.

Capitalize

Classify email providers

Goal: Clean up a list of emails by classifying the email providers and personal or company addresses.

Example:

1 =ASK_OPENAI(CONCATENATE("Classify this email provider address as either 'personal' or 'company'. Don't return anything else. Email: ",A2))

Details: All examples assume that A2 contains the email address.

Email providers

💡 Be specific in the prompt to reduce variability in the AI response. In this example by adding "Don't return anything else" to the prompt it guarantees that the response only contains the word personal or company.