How to use OpenAI's ChatGPT to clean up data
Learn how to use OpenAI and GPT-3 inside the spreadsheet to clean up company data, addresses, capitalize text and more.
You can find the OpenAI integration by browsing the integrations gallery and searching for "OpenAI".
To connect the integration and use AI inside Rows, all you need is an API key. You can get it from the View API Keys option in your OpenAI account. If you don't have an account yet, sign up on the OpenAI website. All free accounts have API access.
Now simply copy the API key, go to the OpenAI integration page, press Connect, paste it and click Connect. Your Rows workspace is now connected to your OpenAI account and you're ready to go.
The OpenAI integration comes with five proprietary functions that automate prompts to address specific types of tasks:
- ASK_OPENAI(), which aims at leveraging the power of GPT to solve general tasks.
- CREATE_LIST_OPENAI(), which is designed specifically to create tables and lists of dummy data for testing purposes.
- CLASSIFY_OPENAI(), which is designed specifically to classify texts into a given set of tags.
- TRANSLATE_OPENAI(), which translates texts from/into a wide range of languages.
- APPLY_TASK_OPENAI(), which is designed specifically to clean up or apply logic rules to data.
You can use them via the Autocomplete in the editor or via the Actions wizard.
All OpenAI functions need to be configured through mandatory and optional parameters, depending on their purpose. Let's go through them one by one.
prompt is the instruction to give to the model in our most generic function ASK_OPENAI(). This is where you'll enter the "ask" you want the AI to answer. You can use the prompt to solve a task by explicitly writing it in prose. Example:
=ASK_OPENAI("Generate a 100-word paragraph about the latest iPhone release")
Tips for creating Prompts
The OpenAI integration uses OpenAI's Completions capability, which means that the model predicts the next word(s) that follow the prompt. With that in mind, here are a few tips on how to construct the right prompt for your task:
- Be specific: The more specific the prompt, the more likely it is to get the intended result. If you're looking for the population of a country in millions, "The population of France, in millions, is: " is a better prompt than simply "The population of France".
- Give examples: You can show the model the type of answer you're looking for. If you are using OpenAI for text classification, use the prompt to give a couple of examples of inputs and expected outputs. For example: "Categorize job title by function name. Head of Marketing:Marketing, COO:C-Level, CMO: "
- Phrase the end of the prompt as the start of the answer: The model answers with a direct continuation of the prompt, so end the prompt with the structure you expect the answer to have. If you want to use the OpenAI integration to summarize text, be clear about how the answer should start. Example, where A2 contains the product review: =ASK_OPENAI(CONCATENATE("What are the 2 main takeaways from this review: ",A2,"? Summarize them in 2 bullet points. Main takeaways: "))
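The three tips can be combined into a single prompt string. The Python sketch below only illustrates how such a prompt is assembled; the helper `build_few_shot_prompt` is hypothetical and not part of Rows or OpenAI. It states the task, lists example input:output pairs, and ends with the new input plus ": ", so the model's continuation is the answer itself.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], new_input: str) -> str:
    """Assemble instruction, worked examples, then the open-ended new item."""
    # Write each example as "input:output", like the job-title example above.
    shots = ", ".join(f"{inp}:{out}" for inp, out in examples)
    # End with the new input and a trailing ": " so the continuation is the label.
    return f"{instruction} {shots}, {new_input}: "


prompt = build_few_shot_prompt(
    "Categorize job title by function name.",
    [("Head of Marketing", "Marketing"), ("COO", "C-Level")],
    "CMO",
)
print(repr(prompt))
```

The result is exactly the job-title prompt from the "Give examples" tip, including the trailing space after the final colon.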
temperature is common to all functions and fine-tunes the sampling temperature, which ranges from 0 to 1. Use 1 for creative applications and 0 for well-defined, straightforward answers.
If you're doing tasks that require a factual answer (e.g. country populations, capitalizing text), then 0 (the default) is a better fit. If you're using the AI for tasks without a single correct answer, such as generating, summarizing, or translating text, experiment with a higher temperature, which allows the engine to better capture text nuances and idiomatic expressions.
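To build intuition for what temperature does, the sketch below applies the standard temperature-scaled softmax to a made-up set of next-word scores. This is a generic illustration of temperature sampling, not OpenAI's internal code, and the words and scores are invented. Dividing the scores by a small temperature concentrates probability on the top word; temperature 0 is treated here as always picking the top word (greedy decoding).

```python
import math


def next_word_probs(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-word scores (logits) into probabilities at a given temperature."""
    if temperature == 0:
        # Treat temperature 0 as greedy decoding: all probability on the top-scoring word.
        best = max(scores, key=scores.get)
        return {w: (1.0 if w == best else 0.0) for w in scores}
    # Scale scores by 1/temperature, then apply a numerically stable softmax.
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())
    exps = {w: math.exp(s - top) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Invented scores for the word that follows "The capital of France is"
scores = {"Paris": 4.0, "Lyon": 2.0, "beautiful": 1.5}
for t in (0, 0.5, 1.0):
    probs = next_word_probs(scores, t)
    print(t, {w: round(p, 3) for w, p in probs.items()})
```

Lower temperatures make "Paris" more dominant, which is why 0 suits factual lookups, while higher values leave room for the less likely, more varied words that help in generation tasks.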
max_tokens represents the maximum number of tokens to generate in the completion. It's present in all OpenAI functions. You can think of tokens as pieces of words. Here are a few helpful rules of thumb from the OpenAI Help Center:
- 1 token ~= 4 chars in English
- 1 token ~= 3/4 words
- 100 tokens ~= 75 words
- 1-2 sentences ~= 30 tokens
- 1 paragraph ~= 100 tokens
- 1,500 words ~= 2048 tokens
Any whole number from 0 upward is accepted, and the default value is 200. Most models have a context length of 2,048 tokens, except for the newest models, which support a maximum of 4,096. For tasks that require longer output, such as text generation, summarization, or translation, pick a higher value (e.g. 250).
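The rules of thumb above can be turned into a quick estimate when choosing max_tokens. The sketch below uses the "1 token ~= 4 chars" and "100 tokens ~= 75 words" rules from the list; the helper names are hypothetical, and this is a rough approximation, not OpenAI's tokenizer.

```python
import math


def estimate_tokens_from_chars(text: str) -> int:
    """Rule of thumb: 1 token is about 4 characters of English text."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rule of thumb: 100 tokens ~= 75 words, i.e. about 4/3 tokens per word."""
    return math.ceil(word_count * 4 / 3)


# A 250-word answer needs roughly 334 tokens.
print(estimate_tokens_from_words(250))
print(estimate_tokens_from_chars("The population of France, in millions, is: "))
```

Since the default max_tokens is 200, a request for a 250-word summary would likely be cut short unless max_tokens is raised.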
model is the model used to generate the answer. By default, it uses "gpt-3.5-turbo". Below is a list of the available GPT-3.5 models:
| Latest model | Description | Max tokens | Training data |
|---|---|---|---|
| gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-0301 | Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will only be supported for a three-month period ending on June 1st 2023. | 4,096 tokens | Up to Sep 2021 |
| text-davinci-003 | Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text. | 4,097 tokens | Up to Jun 2021 |
| text-davinci-002 | Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning. | 4,097 tokens | Up to Jun 2021 |
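The Max tokens column is the model's context length, which OpenAI counts across the prompt and the completion together, so a long prompt leaves less room for the answer. A minimal sketch of that budget check, with limits copied from the table above; the helper `fits_in_context` is hypothetical, the prompt token count is assumed to come from an estimate such as the rules of thumb, and the check is approximate because chat models add a few formatting tokens per message.

```python
# Context lengths copied from the model table above.
CONTEXT_LENGTH = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-0301": 4096,
    "text-davinci-003": 4097,
    "text-davinci-002": 4097,
}


def fits_in_context(model: str, prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested completion fit in the model's context."""
    return prompt_tokens + max_tokens <= CONTEXT_LENGTH[model]


print(fits_in_context("gpt-3.5-turbo", 3000, 200))  # True: 3,200 <= 4,096
print(fits_in_context("gpt-3.5-turbo", 4000, 200))  # False: 4,200 > 4,096
```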
Number of items (optional)
number of items is available only in the CREATE_LIST_OPENAI() function, and represents the expected number of items in the list.
=CREATE_LIST_OPENAI("Full names and email address",5,,500)
Tags and multi-tag (optional)
tags and multi-tag are available only in the CLASSIFY_OPENAI() function. The first, tags, is mandatory and lists the categories you want your text to be classified into.
For example, if you need to classify a list of product reviews in column A into positive, neutral, negative and very negative, you just need to input those tags separated by a comma, as follows:
=CLASSIFY_OPENAI(A2, "positive, neutral, negative, very negative")
The second is optional and can be "true" (default) or "false". If true, the function can use more than one tag to classify your text. If false, it will only use one tag. Use false when you need a mutually exclusive strict categorization.
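Tags are passed as one comma-separated string. The sketch below shows one way to split that string and to check an answer against it while honoring the multi-tag setting. It assumes the answer comes back as comma-separated tags; `parse_tags` and `check_answer` are hypothetical helpers, not how Rows implements CLASSIFY_OPENAI().

```python
def parse_tags(tags: str) -> list[str]:
    """Split a comma-separated tag string, trimming spaces and dropping empty entries."""
    return [t.strip() for t in tags.split(",") if t.strip()]


def check_answer(answer: str, tags: str, multi_tag: bool = True) -> list[str]:
    """Return the (lowercased) tags in the answer; raise if the answer breaks the rules."""
    allowed = {t.lower() for t in parse_tags(tags)}
    chosen = [t.lower() for t in parse_tags(answer)]
    unknown = [t for t in chosen if t not in allowed]
    if unknown:
        raise ValueError(f"tags not in the list: {unknown}")
    if not multi_tag and len(chosen) != 1:
        raise ValueError("multi-tag is false, so exactly one tag is expected")
    return chosen


TAGS = "positive, neutral, negative, very negative"
print(parse_tags(TAGS))
print(check_answer("Very negative", TAGS, multi_tag=False))
```

With multi-tag set to false, an answer such as "positive, neutral" is rejected, mirroring the mutually exclusive categorization described above.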
language is available only in the TRANSLATE_OPENAI() function, and indicates the destination language for your translation tasks. Use the function as follows:
Task and text
task and text are available only in the APPLY_TASK_OPENAI() function: task specifies the logic rule to apply, and text is the text it is applied to.
For example, if you need to capitalize a string of text, use the function as follows:
=APPLY_TASK_OPENAI("Capitalize all letters", "i HavE a doG")
Goal: Clean up a list of company names and remove legal abbreviations and filler text.
=ASK_OPENAI(CONCATENATE("Remove legal entity abbreviations like GmbH, LLC, Inc., emojis, special characters and unnecessary text from ",A2,". Company name: "))
Details: Assumes that A2 contains the company name.
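CONCATENATE simply joins the fixed instruction and the cell value, and filling the formula down repeats this for each row. The Python sketch below mirrors that string assembly for a few invented company names; the list and the helper `company_prompt` are hypothetical. Each prompt ends with "Company name: ", following the tip about phrasing the end of the prompt as the start of the answer.

```python
INSTRUCTION = (
    "Remove legal entity abbreviations like GmbH, LLC, Inc., emojis, "
    "special characters and unnecessary text from "
)


def company_prompt(cell_value: str) -> str:
    """Mirror CONCATENATE(INSTRUCTION, A2, ". Company name: ") for one row."""
    return INSTRUCTION + cell_value + ". Company name: "


# Invented column A values; filling the formula down builds one prompt per row.
column_a = ["Acme Widgets GmbH", "Globex LLC!!"]
for value in column_a:
    print(company_prompt(value))
```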
Goal: Extract Zip Code, State, and Country Code from an address.
=ASK_OPENAI(CONCATENATE("The Zip code of ",A2," is: "))
=ASK_OPENAI(CONCATENATE("The State of ",A2," is: "))
=ASK_OPENAI(CONCATENATE("The Country Code of ",A2," is: "))
Details: All examples assume that A2 contains the address.
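Because the model's answer is free text, it can help to sanity-check extracted values before relying on them. A minimal sketch for the Zip code column, assuming US addresses (a 5-digit ZIP code, optionally followed by a hyphen and 4 more digits); `looks_like_us_zip` is a hypothetical helper, and other countries use different postal code formats.

```python
import re

# US ZIP code: 5 digits, optionally followed by "-" and 4 more digits (ZIP+4).
US_ZIP = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def looks_like_us_zip(answer: str) -> bool:
    """True if the answer, after trimming whitespace, is a well-formed US ZIP code."""
    return US_ZIP.fullmatch(answer.strip()) is not None


print(looks_like_us_zip(" 94105"))            # plain 5-digit ZIP
print(looks_like_us_zip("94105-1234"))        # ZIP+4
print(looks_like_us_zip("The zip is 94105"))  # extra text: flag for review
```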
Goal: Fix the capitalization in a list of words.
=ASK_OPENAI(CONCATENATE("Capitalize the words in the following text: ",A2))
Details: Assumes that A2 contains the text to capitalize.
Goal: Clean up a list of emails by classifying each address as either personal or company.
=ASK_OPENAI(CONCATENATE("Classify this email provider address as either 'personal' or 'company'. Don't return anything else. Email: ",A2))
Details: Assumes that A2 contains the email address.
💡 Be specific in the prompt to reduce variability in the AI response. In this example, adding "Don't return anything else" to the prompt makes it much more likely that the response contains only the word 'personal' or 'company'.
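Even with a strict prompt, the answer is not guaranteed, so it is worth checking each one before using it. A small sketch with a hypothetical helper `normalize_label` that trims whitespace, surrounding quotes and periods, and case from the answer, and returns None for anything other than 'personal' or 'company' so those rows can be reviewed by hand.

```python
from typing import Optional

ALLOWED = {"personal", "company"}


def normalize_label(answer: str) -> Optional[str]:
    """Return 'personal' or 'company', or None if the answer needs a manual look."""
    # Trim whitespace, then surrounding periods and quotes, then normalize case.
    cleaned = answer.strip().strip(".'\"").lower()
    return cleaned if cleaned in ALLOWED else None


for raw in [" Company.", "'personal'", "It is a company address"]:
    print(repr(raw), "->", normalize_label(raw))
```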