How to use OpenAI's ChatGPT to clean up data

Learn how to use OpenAI and GPT-3 inside the spreadsheet to clean up company data, addresses, capitalize text and more.

How to use OpenAI in Rows

OpenAI template

There's a near-infinite amount of tasks you can solve using OpenAI. Follow this guide on how to set up the integration and use this template showcase to get started with 10 pre-built examples, follow along the list, or watch our video tutorial.

Connecting the OpenAI integration in Rows

You can find the OpenAI integration by browsing the integrations gallery and searching for "OpenAI".

OpenAI integration in gallery

To connect the integration and use the power of AI inside Rows all you need is an API Key. You can get your API key by going to the View API Keys option on your OpenAI account. If you don't have an account yet, sign-up here. All free accounts have API access.

API Key panel in OpenAI

Now simply copy the API key, go to the OpenAI integration page, press Connect, paste it and click Connect. Your Rows workspace is now connected to your OpenAI account and you're ready to go.

OpenAI integration page

Using the OpenAI functions

The OpenAI integration comes with five proprietary functions that automate prompts to address specific types of tasks:

  • ASK_OPENAI(), which aims at leveraging the power of GPT to solve general tasks.
  • CREATE_LIST_OPENAI(), which is designed specifically to create tables and list of dummy data, for testing purposes.
  • CLASSIFY_OPENAI(), which is designed specifically to classify texts into a given set of tags.
  • TRANSLATE_OPENAI(), which translates texts from/into a wide range of languages.
  • APPLY_TASK_OPENAI(), which is designed specifically to clean up or apply logic rules to data.

You can use them via the Autocomplete in the editor,

ASK_OPENAI on the editor

or via the Actions wizard:

Screenshot 2023-04-19 at 16.58.52

All OpenAI functions need to be configured through mandatory and optional parameters, depending on their purpose. Let's go through them one by one.

Prompt

The prompt is the instruction to give to the model in our most generic function ASK_OPENAI(). This is where you'll enter the "ask" you want the AI to answer. You can use the prompt to solve a task by explicitly writing it in prose. Example:

1=ASK_OPENAI("Generate 100-word paragraph about the latest iPhone release")

Ask anything to OpenAI

Tips for creating Prompts

The Open AI integration uses its Completions capability, which means that the artificial intelligence model will predict the next word(s) that follow the prompt. With that in mind, here are a few tips on how to construct the right prompt for your task:

  • Be specific: The more specific the prompt, the most likely it is to get the intended result. If you're looking for the Population of the country in millions, "The Population of France, in millions is: " is a better prompt than simply "The Population of France".
  • Give examples: You can train the model on the type of answer you're looking for. If you are using Open AI for text classification, use the prompt to give a couple of examples of inputs and expected outputs. For example: "Categorize job title by function name. Head of Marketing:Marketing, COO:C-Level, CMO: "
  • Phrase the end of the prompt as the start of the answer: The model will answer with a direct continuation to the prompt. Use that insight to end the prompt with the structure you expect from the answer. If you want to use the OpenAI integration to summarize text, be clear on how to start. Example: "What are 2 main takeaways from this review: ",A2(cell reference with the product review)," ? Summarize it into 2 bullet points. Main takeaways: ")

Temperature (optional)

The temperature is common to all functions and is used to fine tune the sampling temperature, varying between 0 and 1. Use 1 for creative applications, and 0 for well-defined straight answers.

If you're doing tasks that require a factual answer (e.g. country populations, capitalize text), then 0 (the default) is a better fit. If you're using the AI for tasks where there aren't definite answers - such as generating text, summarizing text, or translating - then experiment with a higher temperature, which allows the engine to better capture text nuances and idiomatic expressions.

Max_tokens (optional)

This max_tokens represents the maximum number of tokens to generate in the completion. It's present in all OpenAI functions. You can think of tokens as pieces of words. Here are a few helpful rules of thumb examples from the OpenAI Help center:

  • 1 token ~= 4 chars in English
  • 1 token ~= 3/4 words
  • 100 tokens ~= 75 words
  • 1-2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2048 tokens

You can use any number starting with 0. The default value is 200. Most models have a context length of 2048 tokens, except for the newest models which support a maximum of 4096. For tasks that require more text output - text generation/summarization/translation - pick a higher value (e.g. 250).

Model (optional)

The AI model to use to generate the answer. It can be chosen in both functions, and by default, it uses "gpt-3.5-turbo". Below you find a list of all of the available GPT-3.5 models:

LATEST MODELDESCRIPTIONMAX TOKENSTRAINING DATA
gpt-3.5-turboMost capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration.4,096 tokensUp to Sep 2021
gpt-3.5-turbo-0301Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will only be supported for a three month period ending on June 1st 2023.4,096 tokensUp to Sep 2021
text-davinci-003Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text.4,097 tokensUp to Jun 2021
text-davinci-002Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning4,097 tokensUp to Jun 2021

Number of items (optional)

The number of items is available only in the CREATE_LIST_OPENAI() function, and represents the expected number of items in the list.

1=CREATE_LIST_OPENAI("Full names and email address",5,,500)

Tags and multi-tag (optional)

The tags and multi-tag properties are available only in the CLASSIFY_OPENAI() function. The first is mandatory and represents the categories you want your text to be classified into.

For example, if you need to classify a list of product reviews in column A, into positive, neutral, negative and very negative, you just need to input those tags separated by a coma, as follows:

1=CLASSIFY_OPENAI(A2, "positive, neutral, negative, very negative")

The second is optional and can be "true" (default) or "false". If true, the function can use more than one tag to classify your text. If false, it will only use one tag. Use false when you need a mutually exclusive strict categorization.

Language

The language is available only in the TRANSLATE_OPENAI() function, and indicates the destination language for your translation tasks. Use the function as follows:

1=TRANSLATE_OPENAI(B1,"hebrew")

Task and text

The task and text are available only in the APPLY_TASK_OPENAI() function, and are used to specify the logic rule to some text.

For example, if you need to capitalize a string of text, use the function as follows:

1=APPLY_TASK_OPENAI("Capitalize all letters", "i HavE a doG")

Clean up Company names

Goal: Clean up a list of company names and remove legal abbreviations and filler text.

Example:

1=ASK_OPENAI(CONCATENATE("Remove legal entity abbreviations like GmbH, LLC, Inc., emojis, special characters and unnecessary text from ",A2,". Company name: "))

Details: Assumes that A2 contains the company name.

Cleanup Companies GIF

Clean up Addresses

Goal: Extract Zip Code, State, and Country Code from an address.

Examples:

Zip Code:

1=ASK_OPENAI(CONCATENATE("The Zip code of ",A2," is: "))

State:

1=ASK_OPENAI(CONCATENATE("The State of ",A2," is: "))

Country Code:

1=ASK_OPENAI(CONCATENATE("The Country Code of ",A2," is: "))

Details: All examples assume that A2 contains the company name.

Cleanup addresses GIF

Capitalize words

Goal: Correctly fix capitaliztion in a list of words.

Example:

1 =ASK_OPENAI(CONCATENATE("Capitalize the words in the following text: ",A2))

Details: All examples assume that A2 contains the company name.

Capitalize

Classify email providers

Goal: Clean up a list of emails by classifying the email providers and personal or company addresses.

Example:

1 =ASK_OPENAI(CONCATENATE("Classify this email provider address as either 'personal' or 'company'. Don't return anything else. Email: ",A2))

Details: All examples assume that A2 contains the email address.

Email providers

💡 Be specific in the prompt to reduce variability in the AI response. In this example by adding "Don't return anything else" to the prompt it guarantees that the response only contains the word personal or company.