ChatGPT Plugins - A Step by Step Tutorial & Best Practices - (Part 1)
Author: John Hwang (@nextworddev)
I recently built a ChatGPT plugin to retrieve real time stock and crypto prices, and wanted to share what I learned.
In this guide, I will go over key concepts that any ChatGPT developer should know:
- what is a ChatGPT plugin (in plain English)
- how ChatGPT plugins are different from iOS apps
- how the ChatGPT plugin dispatch model works
- gotchas and issues with building and using ChatGPT plugins
- whether you should build a ChatGPT plugin
By the end of this tutorial, you will 1) know what ChatGPT plugins are, and 2) have built a ChatGPT plugin that lets you fetch real time crypto and stock prices from within ChatGPT.
This is part 1 of a 2-part tutorial series on ChatGPT plugins. Part 1 (this post) covers the concepts, and part 2 covers a code walkthrough. Follow me on twitter to be alerted when part 2 is published.
All code for the plugin is available in this github repo.
Before jumping into the code, let’s level-set on what ChatGPT Plugins are, and what they are used for.
What is a ChatGPT Plugin
Note, ChatGPT plugins are currently in developer preview. Make sure to request to join the developer preview here: https://openai.com/waitlist/plugins
ChatGPT plugins are basically to ChatGPT what apps are to your iPhone or Android phone.
Just like apps, plugins are extra capabilities that users can “enable”. Examples include:
- Booking reservations at restaurants (OpenTable plugin)
- Getting latest info on tickets and other travel planning info (Expedia plugin)
- Giving ChatGPT access to live Internet (Browser plugin)
- and so forth.
These are not capabilities you have out of the box with ChatGPT. So plugins basically “extend” what ChatGPT can do.
Also notable is that all these interactions with plugins happen only via “chat”.
For example, when interacting with the OpenTable plugin, you are not clicking on any buttons, but directly asking “When can I get a table at Wolfgang's steakhouse?”.
This may or may not be the best experience compared to clicking on buttons, depending on the use case (see the sections on whether you should build a plugin).
What Are the Use Cases for ChatGPT Plugins
So what are "good fits" for building ChatGPT plugins?
Plugins are useful for enabling ChatGPT to work on outside data or take actions outside of ChatGPT (like booking reservations or sending emails). For example, you could imagine a plugin that allows you to directly send your ChatGPT response as an email.
We will be building a plugin that fetches stock and crypto prices, so it falls into the “plugin fetching outside data” category.
But if your plugin would benefit from a custom / fancy / complex user interface, then I wouldn't build a ChatGPT plugin. That basically rules out most entertainment use cases.
More useful plugins are things like OpenAI's retrieval plugin, which allows users to pull custom data (real time data, paywalled data, vendor data, etc) into ChatGPT. This is useful because ChatGPT - by definition - doesn’t have your custom data, so this type of plugin is necessary.
But before building, we need to learn how plugins work in more detail.
How ChatGPT Plugins Work
It is easiest to think of plugins as the ChatGPT version of iOS apps - but there are some key differences. I’ll first talk about what’s similar, then highlight the differences.
- Like iOS apps, plugins have a plugin store where you can discover plugins and activate them.
- Some plugins will redirect you to login and make accounts, while some plugins will require no login.
The key differences between plugins and iOS apps are these:
Unlike iOS apps, there is no separate visual user interface. All interactions with plugins will be text based! (I imagine voice interactions will come at some point).
iOS apps are basically binaries (software) that you install onto your device. But ChatGPT plugins are actually just REST APIs, and the user doesn’t need to install anything. The user just “enables” plugins, essentially. Here are some implications:
- No download required means that users can easily activate or disable plugins
- For developers, they are making REST APIs that work for any device (iOS, Android, Web) and don’t have to target any specific device (i.e. no separate iOS vs Android versions of plugins)
But the most important difference is the “dispatch model”.
- For iOS apps, the user explicitly decides when and when not to use the app - by opening and closing it.
- For ChatGPT, it is ChatGPT that decides when and when not to use the plugin. When you type “book me a reservation at XYZ restaurant”, ChatGPT will discern that the OpenTable plugin might be useful, and pass your request through to OpenTable’s plugin server.
  - Note, this won’t happen if the OpenTable plugin is not activated by the user.
  - And all ChatGPT-to-plugin interactions are done via REST API calls.
This is important so I’ll reiterate: ChatGPT is basically the control center for utilizing plugins. Unlike iOS apps, where you need to actively press buttons, etc, ChatGPT does all those interactions in the background and just gives you the end result, or asks clarifying questions.
This model is very much like Alexa (the device) in that you can ask it something and Alexa will interact with services in the background.
But this also means that users don’t have explicit control over when or when not to use a plugin. And it opens the possibility of ChatGPT “misinterpreting” the user and using a plugin when it’s not supposed to, and vice versa.
This can be confusing. For example, say both the OpenTable and Expedia plugins are enabled. The user says “book a reservation at the Hyatt”, which also has a restaurant. Which plugin should be used?
To understand this problem further, let’s look at how ChatGPT figures out which plugin to invoke.
How ChatGPT Routes Requests to Plugins
Let’s learn how ChatGPT uses plugins by tracing through the query “Where is Apple trading at?”. Let’s assume our stock and crypto price plugin is installed already.
Step 1 - All ChatGPT plugins need to provide OpenAI with two files: ai-plugin.json and openapi.yaml. These are the files that tell ChatGPT:
- What your plugin does - at a high level - and how the user logs in. This is basically what ai-plugin.json is about.
- How ChatGPT calls your plugin service. Since ChatGPT plugins are REST APIs, they require well-formed HTTP requests. So we need a file that describes what APIs are available in the plugin, how to send requests, and what responses to expect. That’s what openapi.yaml is for.
These two files are validated when the plugin is installed, and read again at plugin runtime (when it’s used).
Here’s a small section of ai-plugin.json for our stock price plugin, to drive this home. Note the description_for_model field. This is what ChatGPT reads to figure out if your plugin is useful in answering the user’s question.
...
"description_for_model": "Plugin for fetching the latest stock and crypto prices in real time",
"auth": {
  "type": "none"
},
...
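For context, here is a sketch of what the rest of the manifest might contain. The field names follow the plugin manifest format as documented during the developer preview, to the best of my knowledge; every value other than description_for_model and auth (the names, URLs, and email) is a hypothetical placeholder and not taken from the actual plugin.

```python
import json

# Fields the plugin manifest format expects (per the developer preview docs,
# as far as I know; double-check against the current documentation).
REQUIRED_FIELDS = [
    "schema_version", "name_for_human", "name_for_model",
    "description_for_human", "description_for_model",
    "auth", "api", "logo_url", "contact_email", "legal_info_url",
]


def build_manifest(base_url: str) -> dict:
    """Assemble an ai-plugin.json manifest as a dict.

    Only description_for_model and auth come from the tutorial;
    everything else is an illustrative placeholder.
    """
    return {
        "schema_version": "v1",
        "name_for_human": "Stock & Crypto Prices",  # hypothetical
        "name_for_model": "stock_crypto_prices",    # hypothetical
        "description_for_human": "Look up the latest stock and crypto prices.",
        "description_for_model": "Plugin for fetching the latest stock and crypto prices in real time",
        "auth": {"type": "none"},
        "api": {"type": "openapi", "url": f"{base_url}/openapi.yaml"},
        "logo_url": f"{base_url}/logo.png",          # hypothetical
        "contact_email": "support@example.com",      # hypothetical
        "legal_info_url": f"{base_url}/legal",       # hypothetical
    }


def missing_fields(manifest: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not manifest.get(f)]


manifest = build_manifest("https://example.com")
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Running missing_fields(manifest) before deploying is a cheap way to catch an incomplete manifest early.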
Here’s a section of openapi.yaml that shows which API path (and request method) our plugin exposes. ChatGPT essentially reads this file into its context, and figures out that a POST request to /get_prices is a tool it can use. ChatGPT looks at the description field to figure out which API endpoint to use within your plugin.
paths:
  "/get_prices":
    post:
      summary: Get Prices
      operationId: get_prices_get_prices_post
      description: Fetch the latest stock or crypto price
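To make "plugins are just REST APIs" concrete, here is a minimal standard-library sketch of a server exposing POST /get_prices. It is not the plugin's actual implementation (part 2 covers that). The request body shape ({"symbol": ...}), the response shape, and the BTC value are assumptions made up for illustration, and prices are hard-coded so the sketch runs offline.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hard-coded sample data; a real plugin would query a market-data provider.
# The BTC figure is a made-up placeholder.
SAMPLE_PRICES = {"AAPL": 169.595, "BTC": 27000.0}


def lookup_price(body):
    """Return (status_code, response_dict) for a /get_prices request body."""
    if not isinstance(body, dict):
        return 400, {"error": "request body must be a JSON object"}
    symbol = str(body.get("symbol", "")).upper()
    if symbol not in SAMPLE_PRICES:
        return 404, {"error": f"unknown symbol: {symbol!r}"}
    return 200, {"symbol": symbol, "price": SAMPLE_PRICES[symbol]}


class PluginHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/get_prices":
            self._send(404, {"error": "unknown path"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and bad encodings
            self._send(400, {"error": "request body must be valid JSON"})
            return
        self._send(*lookup_price(body))

    def _send(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the console quiet


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), PluginHandler)
```

To try it locally, call make_server(8000).serve_forever() and send a POST request with a JSON body such as {"symbol": "AAPL"} to http://127.0.0.1:8000/get_prices.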
Step 2 - When the user types a query - say “Where is Apple trading at?” - ChatGPT will look at the ai-plugin.json of every activated plugin, and see if any of them are relevant or useful to answering the question.
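How ChatGPT does this internally is not public, so the following is only a mental model, not the real mechanism: the descriptions of the activated plugins end up as text the model can read, and the model itself judges relevance. The text format, the plugin names, and the second plugin's description below are all invented for illustration.

```python
def build_plugin_context(manifests: list, enabled: set) -> str:
    """Render the descriptions of enabled plugins as text a model could read.

    The output format is hypothetical; only activated plugins are included,
    which is why a disabled plugin can never be chosen.
    """
    lines = []
    for m in manifests:
        if m["name_for_model"] in enabled:
            lines.append(f"- {m['name_for_model']}: {m['description_for_model']}")
    return "Available plugins:\n" + "\n".join(lines)


manifests = [
    {
        "name_for_model": "stock_crypto_prices",  # hypothetical name
        "description_for_model": "Plugin for fetching the latest stock and crypto prices in real time",
    },
    {
        "name_for_model": "restaurant_booking",   # hypothetical plugin
        "description_for_model": "Plugin for booking restaurant reservations",
    },
]

context = build_plugin_context(manifests, enabled={"stock_crypto_prices"})
print(context)
```

Note that the query “Where is Apple trading at?” shares no keywords with the description, so a simple keyword match would miss it. The match depends on the model understanding the description, which is why its wording matters so much.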
When a plugin is “chosen”, you will see a green box like this.
If ChatGPT deems your plugin useful, it will use openapi.yaml to figure out how to make valid API requests to the plugin (yes - multiple API calls can be made in one turn).
Then, it will “construct” the HTTP request, send it, and receive a response. Below, you see the exact request and response ChatGPT sent and received from the plugin.
We got $169.595 as the response.
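As a rough illustration of the “construct the HTTP request” step, the sketch below builds (but does not send) a request for the POST /get_prices operation described in openapi.yaml. The base URL and the {"symbol": "AAPL"} body are assumptions; the actual request schema is not shown in this post.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, method: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON HTTP request for one plugin operation."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method.upper(),
    )


# Path and method come from openapi.yaml; the base URL and body are hypothetical.
req = build_request("https://example.com", "/get_prices", "post", {"symbol": "AAPL"})
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it. ChatGPT performs the equivalent of this step on its own servers, using only what the two files tell it.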
Step 3 - ChatGPT will then synthesize the result
The final step is ChatGPT using the plugin’s response, and generating some answer in words.
If these boxes for the plugin interactions weren’t shown, we would have never known that plugins were involved at all.
This all seems straightforward, but there are many gotchas and issues when building more complex plugins. Here are some.
Downsides and Gotchas
The above example of handling “Where is Apple trading at?” shows that ChatGPT does a lot, which also means more things can go wrong at each step.
And each time there’s a mishap, the query will be retried, which adds significant delays and makes for a bad experience.
Let’s zoom out and see what ChatGPT does for even a simple plugin use, and what can go wrong:
- ChatGPT figures out when and which plugin is relevant to the query (plugin selection)
  - Issue: Might select the wrong plugin
- ChatGPT figures out how to call your plugin (plugin invocation)
  - Issue: Might hallucinate the wrong API (pretend like the plugin has an API that isn’t implemented)
  - Issue: Might form an invalid request
- ChatGPT figures out how many API calls are needed (query planning)
  - Issue: Might not realize that it needs to make 2 separate calls to get stock price data and fundamentals data (e.g. P/E) but just make one API call.
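One thing a plugin server can do about the invocation issues above is answer bad requests with specific error messages, so that a retry has something to go on. Whether this shortens recovery in practice is an assumption on my part. The helper name, the "symbol" field, and the message wording below are hypothetical.

```python
# Operations the plugin really implements (from openapi.yaml).
KNOWN_OPERATIONS = {("POST", "/get_prices")}


def check_request(method: str, path: str, body) -> list:
    """Return a list of human-readable problems; an empty list means the request looks valid."""
    method = method.upper()
    if (method, path) not in KNOWN_OPERATIONS:
        # Covers the "hallucinated API" case: say what does exist.
        known = ", ".join(f"{m} {p}" for m, p in sorted(KNOWN_OPERATIONS))
        return [f"No such operation: {method} {path}. Available: {known}."]
    if not isinstance(body, dict):
        return ["Request body must be a JSON object."]
    problems = []
    symbol = body.get("symbol")  # hypothetical request field
    if not isinstance(symbol, str) or not symbol.strip():
        problems.append('Missing "symbol": expected a ticker string such as "AAPL".')
    return problems


print(check_request("GET", "/get_fundamentals", {}))
print(check_request("POST", "/get_prices", {}))
print(check_request("POST", "/get_prices", {"symbol": "AAPL"}))
```

The server would return these messages in a 4xx response body instead of a bare error code.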
Usually ChatGPT is good at eventually figuring out what went wrong (if anything) and recovering. The real problem is that every such mistake in an intermediate step adds to the delay in getting an answer, which leads to user frustration.
LLMs like GPT-3.5 and GPT-4 are really great, but they still make these mistakes occasionally - often enough for things to be annoying.
Luckily, there are ways to mitigate these issues - mainly by keeping your plugin simple and providing really good descriptions for your plugin in the ai-plugin.json and openapi.yaml files.
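Since ChatGPT relies on the description fields, a small check that every operation has one can catch gaps before you deploy. The sketch below is a hypothetical helper, not part of any OpenAI tooling. To avoid a YAML dependency, it works on the spec as a Python dict that mirrors the openapi.yaml excerpt shown earlier.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def operations_missing_description(spec: dict) -> list:
    """List 'METHOD path' entries whose operation has no (or an empty) description."""
    missing = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Skip non-operation keys such as "parameters".
            if method in HTTP_METHODS and not (op.get("description") or "").strip():
                missing.append(f"{method.upper()} {path}")
    return missing


# Same content as the openapi.yaml excerpt, written as a dict.
spec = {
    "paths": {
        "/get_prices": {
            "post": {
                "summary": "Get Prices",
                "operationId": "get_prices_get_prices_post",
                "description": "Fetch the latest stock or crypto price",
            }
        }
    }
}

print(operations_missing_description(spec))
```

A check like this only confirms that a description exists. Whether it is good enough for ChatGPT to pick the right endpoint still has to be tested by hand.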
Should You Build a ChatGPT Plugin?
From the user’s perspective, I’d give ChatGPT plugins a 5/10. But for developers, I think the time to learn how to build plugins is now - so they will be in a good position to capitalize once OpenAI improves the user experience.
For use cases such as data retrieval, I think plugins are a perfect fit. I don’t see ChatGPT going anywhere as a medium, and it will also get better and faster at utilizing plugins.
Thus, there’s going to be some market for plugins, and it is up to ambitious developers to discover and implement such use cases.
Recap
In this post, we explored what ChatGPT plugins are, how they work, and some gotchas and advice.
In part 2 of this tutorial, we will actually build the plugin and I’ll walk you through the code. Follow me on twitter to be alerted when it drops.