The $113 Threshold for OpenAI-Powered Products
LLMs are amazing, but most people aren’t thinking about the unit costs of doing business with them, and therefore which use cases they can viably support.
If you’re building a product that uses OpenAI’s APIs under the hood, and if you want to have a long-term viable business, then you need to be charging your customers at least $113/month. I’m sharing the research that informs this figure for three reasons:
First, in our justified exuberance for shipping fast with large language models (LLMs) like the ones behind ChatGPT, it’s easy to get intoxicated to the point where we forget that part of Product’s job is to assess the business viability of a solution, and OpenAI’s API costs add up a lot faster than they seem.
Second, from what I’ve experienced, a lot of product managers have forgotten how to methodically determine the unit economics for a product, and OpenAI provides us with a timely and relevant example.
Third, the fact that the number is over $100/month teaches us that many use cases won’t be suitable for LLMs (for the time being).
For anyone not concerned with #1 and #2, scroll to the bottom. I made an easy calculator for you to use with your unique product inputs.
PLEASE CHECK: https://openai.com/api/pricing/ as PRICES CHANGE REGULARLY
So how did I get to $113/month? Let’s build up the math one step at a time:
Understanding inputs and outputs to LLMs
We know that LLMs take a text prompt (“input”) and return a response (“output”). OpenAI charges its API customers separate prices for these inputs and outputs.
The price is based on the number of tokens in the input text and the number of tokens in the output text. But what the heck is a token? In short, a token is a sequence of characters loosely correlated with a word; it is the unit the model uses to encode the text we type. The model parses text more granularly than we do, so short, common words count as just one token, while longer or more complex words break into multiple tokens.
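To make the token-versus-word distinction concrete, here’s a minimal sketch using tiktoken, OpenAI’s open-source tokenizer library (the example strings are illustrative, and exact counts depend on the encoding a given model uses):

```python
# pip install tiktoken
import tiktoken

# GPT-4 uses the cl100k_base encoding; tiktoken maps the model name to it.
enc = tiktoken.encoding_for_model("gpt-4")

examples = [
    "Where should I eat in New York City?",   # short, common words: roughly one token each
    "Good date-night ideas",                  # hyphenation and rarer words split into more tokens
    "antidisestablishmentarianism",           # a single long word can be several tokens
]
for text in examples:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(text.split())} words -> {len(tokens)} tokens")
```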
In essence, though, we’re getting charged based on how much text goes into the prompt submitted to the OpenAI API, and then again based on how much text the AI returns in its response.
But it gets more complicated…
The intuition is deceptively simple, however. For entirely user-generated requests, this would mean small submissions in the form of questions from the user and then longer responses back from OpenAI. Consider that “Where should I eat in New York City?” is about 9 tokens long, but the response I get back is 293 tokens long, covering many options around the city, so there is serious asymmetry between what requests cost and what responses cost.
It gets more complicated, though, because most apps using OpenAI are doing extensive “prompt engineering” to fully articulate all the parameters that describe how the AI should respond, including providing background, personas, and even examples. The actual prompt length likely requires a discussion with an engineer to understand. It might not be obvious. For example, the product manager might think the input cost is the number of tokens in “Good date-night ideas,” but each user prompt like this might be wrapped in an envelope of 1,000 tokens’ worth of additional context. The longer the prompt that surrounds the user’s input, the higher the cost for every single transmission going in.
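To illustrate, here’s a sketch of that wrapping (the system prompt below is hypothetical; the point is that the billed input includes every token of the envelope, not just the user’s question):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

# Hypothetical prompt-engineering "envelope" an app might wrap around every request.
SYSTEM_PROMPT = (
    "You are a helpful concierge for busy professionals. "
    "Always respond with three concrete suggestions, each with a one-line rationale. "
    "...(background, personas, formatting rules, few-shot examples, etc.)..."
)

user_prompt = "Good date-night ideas"

user_tokens = len(enc.encode(user_prompt))
billed_input_tokens = len(enc.encode(SYSTEM_PROMPT)) + user_tokens
print(f"The user typed {user_tokens} tokens, "
      f"but the API bills {billed_input_tokens} input tokens.")
```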
So now we need to understand what a reasonable rule of thumb is for how long a question might be and how long a response might be. This is where I can give you a rough idea but can’t do your homework for you. For prompts with very little adornment, if you will, it feels sensible to me to assume 25 tokens going in. And for responses of modest complexity coming out, 500 tokens feels like a safe average bet.
Now let’s convert into dollars…
Great: for OpenAI’s GPT-4 API, that’s $0.03 per 1,000 tokens going in and $0.06 per 1,000 tokens coming back out. We’ll need to divide our token totals by 1,000 later to account for this. (Note: as of 7/25/24, the prices are now $5/1M input tokens and $15/1M output tokens, which cut the input price by about 6x and the output price, the bulk of the cost here, by 4x relative to the figures quoted in this article. The calculator at the bottom has been updated to reflect these new default values.)
Now we need to estimate how many requests and responses a user will generate each day that they are using our product. Let’s assume they use the product every day of the year. For a search-based product, 30 searches per day doesn’t feel unreasonable.
For a single user of the product for a year, that’s 273,750 input tokens and 5,475,000 output tokens. When we divide by 1,000 tokens and multiply by our per-1,000 token costs, we get $8.21 per year for the inputs and $328.50 per year for the outputs, or a total of $336.71 per year for OpenAI usage per user.
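Here is that arithmetic spelled out so you can sanity-check it against your own assumptions (a sketch using the per-1,000-token GPT-4 prices quoted above):

```python
# Per-user assumptions from above
requests_per_day = 30
input_tokens_per_request = 25
output_tokens_per_request = 500
days_per_year = 365

# GPT-4 prices quoted in this article (per 1,000 tokens)
input_price_per_1k = 0.03
output_price_per_1k = 0.06

input_tokens_per_year = requests_per_day * days_per_year * input_tokens_per_request    # 273,750
output_tokens_per_year = requests_per_day * days_per_year * output_tokens_per_request  # 5,475,000

input_cost = input_tokens_per_year / 1_000 * input_price_per_1k      # ~$8.21
output_cost = output_tokens_per_year / 1_000 * output_price_per_1k   # $328.50
annual_openai_cost = input_cost + output_cost
print(f"OpenAI cost per user per year: ${annual_openai_cost:,.2f}")  # ~$336.71
```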
…and account for proper business margins
But this isn’t the end of the story, because we don’t build products with zero margins or we go out of business. In fact, if the cost of goods sold (COGS) is outside of the 20% to 30% range, we’ll certainly raise eyebrows, and that’s assuming OpenAI is the only cost of what we are selling. Assuming it is, and targeting the midpoint of that range so that OpenAI spend is 25% of revenue, we need to sell our product for at least $1,346.85 per year.
In monthly SaaS terms, we’re talking about a plan that runs $112.24 a month, or for simplicity about $113/month. Anything less than this may not prove to be viable for the business.
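Spelled out as a continuation of the sketch above:

```python
annual_openai_cost = 336.7125   # per user, unrounded total from the previous step
target_cogs_ratio = 0.25        # OpenAI spend targeted at 25% of revenue

minimum_annual_price = annual_openai_cost / target_cogs_ratio   # $1,346.85
minimum_monthly_price = minimum_annual_price / 12               # ~$112.24
print(f"Price floor: ${minimum_monthly_price:,.2f}/month")
```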
Every team with a unique product will have different usage patterns in terms of how heavily it leans on OpenAI. For example, a tool that summarizes emails might get 5 to 10 summary requests per user per day with fairly short questions (“summarize this”), except that the prompt itself also includes the email, so each input is probably at least 1,000 tokens for even a modest thread among colleagues. You’ll have to assess who your users are, how they interact, and how much prompt engineering is done.

Further, OpenAI has multiple flavors of its API. GPT-4-32k is double the cost I just showed, so if you need that one, double everything here, up to $226/month. Meanwhile, GPT-3.5-Turbo is about 3% of the cost of GPT-4. The point is that these details matter, they determine which use cases are business-viable, and PMs should be asking “which GPT do we need to support our use case?” and “how hard will we hammer it?”
To make this easy, I’ve built out a simple calculator you can play with to test your assumptions, along with a variety of prompt and response examples to help you choose the right number of tokens.
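I can’t embed the interactive version here, but the sketch below mirrors the same calculation, with the updated $5/1M input and $15/1M output prices from the note above as defaults; every parameter is an assumption you should replace with your own numbers.

```python
def minimum_monthly_price(
    requests_per_user_per_day: float = 30,
    input_tokens_per_request: float = 25,      # include any prompt-engineering envelope here
    output_tokens_per_request: float = 500,
    input_price_per_million: float = 5.00,     # updated default input price noted above
    output_price_per_million: float = 15.00,   # updated default output price noted above
    target_cogs_ratio: float = 0.25,           # OpenAI spend as a share of revenue
    days_per_year: int = 365,
) -> float:
    """Rough per-user monthly price floor, assuming OpenAI is the only COGS."""
    requests_per_year = requests_per_user_per_day * days_per_year
    annual_cost = requests_per_year * (
        input_tokens_per_request * input_price_per_million
        + output_tokens_per_request * output_price_per_million
    ) / 1_000_000
    return annual_cost / target_cogs_ratio / 12


# The scenario from this article at the original GPT-4 prices ($0.03/1K in, $0.06/1K out):
print(minimum_monthly_price(input_price_per_million=30.0,
                            output_price_per_million=60.0))  # ~112.24
```

At the updated default prices, the same usage pattern works out to roughly $28/month, consistent with the roughly 4x price drop noted earlier.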
Hi! If you enjoyed these insights, please subscribe, and if you are interested in tailored support for your venture, please visit our website at First Principles, where we focus on product to help the world’s most ambitious founders make a difference.