2023 W11: GPT 3.0 vs 3.5

Humberto Ayres Pereira, CEO and Co-Founder, Rows

Henrique Cruz, Head of Growth, Rows

Every week I post about one thing that happened at Rows. We're building in public!

---

On Friday we upgraded all our functions that use OpenAI.

By default they were all using GPT-3 models (text-davinci-003) and we updated them to use GPT-3.5 models (gpt-3.5-turbo).

The 3.5 generation is, OpenAI says, smarter. It's also 10x cheaper than the previous model. So this is something that interests our users a lot.

We did a case by case comparison of current test use cases. Our conclusion is that the quality of responses is similar, if you're willing to adapt the prompts a bit.

The newer GPT-3.5-turbo models do seem a tad bit smarter for everyday spreadsheets tasks.
On some occasions this improved model still outputs unwanted periods or quotes in responses. (Those can be removed if you ask in the answer to remove them explicitly).

Note. We know that the 4.0 generation is out too. At this moment it's in a limited beta, which means limited availability (quantity of requests) and limited speed of execution. It's significantly more expensive too, at 15-30x the price of the previous generation. This is all evolving very fast, so things will change. We're on it!

Testing 3 vs. 3.5 models

To test the models, we build a spreadsheet in Rows (duh). You can open it to check results, or duplicate it to play with it.

There's several pages, and they cover different use cases for OpenAI in a spreadsheet. Below we link directly to each of the page.

Classification
- GPT-3.5 wins, by a small margin.
- The first table uses CLASSIFY_OPENAI() to classify social network messages according to topic. We built this function so that you only need to add the text, the tags, and it does the job for you; the function is quite smart, as it allows single-tag or multi-tag results. Results are similar, though with 3.5 sometimes you get a period (".") at the end. You can see the formulas in columns C and D.
- The second table runs 2 tests to classify job titles. The first test (columns B and C) uses our generic function ASK_OPENAI(), and results are the same. The second test (cols D and E) uses our special function CLASSIFY_OPENAI(). Results are better in the 3.5.
Clean up
- Same performance.
- The first table extracts legal endings and unwanted decorations to get to a pure company name.
- The second table breaks down parts of an address.
- Both these tests use the generic ASK_OPENAI(). Note to self/team: Maybe we should build a custom function for it 😉.
Summarize
- Same performance.
- The table uses ASK_OPENAI() to summarize text into 2 bullets.
Translate
- Better translations for 3.5, though of one of them comes with extra double quotes.
- The table uses TRANSLATE_OPENAI() to translate text between several languages.
Find Facts
- Slightly better answers for 3.5, including for Rows.
- The table uses ASK_OPEN() to find the full address including company name.
Create Lists
- Same performance.
- Here I used our Wizard to generate a table with the US Presidents and their birthdays.

Very important Note: In the functions used, you will see that we specifically typed the model to be used. Normally you don't actually need to refer the model in the function, as by default we use the best (gpt-3.5-turbo) that's also the cheaper option. We did this so that it's explicit what we're doing. Later on, as other models emerge, this test spreadsheet will still work exactly the same.

In conclusion, the upgrade is only marginally better for common spreadsheet use cases, but given that it's also 10x cheaper, we're very happy with results.

Soon we will have more on the AI topic.

More building in public next week!

- Humberto (and Henrique, who built the spreadsheet in the first place)