There are two different questions merchants ask about AI shopping visibility, and they need different answers. "Where do I rank today?" is a snapshot, and we covered how to take one in our guide on how to see where your store ranks in AI shopping. This post is about the second question, the one that actually tells you whether your work is paying off: how do I track it over time?
Tracking is what turns AI visibility from an anecdote into a metric. One check tells you ChatGPT skipped you on a Tuesday. Twelve identical checks across three months tell you whether you are gaining ground, losing it, or watching a competitor eat your category. Here is the system.
Step 1: Freeze a query list
Everything downstream depends on asking the same questions every time, so start by writing them down. Eight to twelve queries, chosen the way a stranger shops:
- Category head terms. "Ceramic pour-over dripper," "merino running socks." The searches that bring customers you do not have yet.
- Qualified variants. Add the modifiers real shoppers use: "under $50," "for wide feet," "ships to Canada." Assistants handle qualifiers well, and they change the answer set.
- One or two brand checks. Your brand name, as a control. If you ever stop winning your own name, something is badly wrong.
The rule that matters: once the list is written, do not fiddle with it. Numbers are only comparable if the questions stay identical. Add new queries at the end if you must, but never rewrite old ones.
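The append-only rule is easy to check mechanically before each run. Here is a minimal sketch in Python, assuming you keep the list as plain strings. The function name `is_append_only` and the sample queries are illustrative, not part of any tool.

```python
def is_append_only(frozen: list[str], proposed: list[str]) -> bool:
    """Return True if `proposed` keeps every frozen query verbatim and in order,
    and only adds new queries at the end."""
    return proposed[:len(frozen)] == frozen


# Hypothetical frozen list, built from the example queries above.
FROZEN_QUERIES = [
    "ceramic pour-over dripper",
    "ceramic pour-over dripper under $50",
    "merino running socks",
]

# Appending a new query at the end is allowed.
ok = is_append_only(
    FROZEN_QUERIES,
    FROZEN_QUERIES + ["merino running socks for wide feet"],
)

# Rewording an existing query breaks comparability with earlier runs.
bad = is_append_only(
    FROZEN_QUERIES,
    ["best ceramic pour-over dripper"] + FROZEN_QUERIES[1:],
)
```

If the check fails, the safer move is to restore the old wording and add the new phrasing as an extra query at the end.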
Step 2: Build the log
A spreadsheet is genuinely enough. Six columns:
| Column | What goes in it |
|---|---|
| Date | The run date, so trends line up |
| Query | The exact text, copy-pasted |
| Surface | ChatGPT, Shop app, Gemini, Perplexity |
| Present | Did any product of yours appear at all |
| Position | Where in the recommendation list |
| Beaten by | Competitors ranked above you |
Add a notes column for what the assistant actually said, because the prose is diagnostic gold. An assistant quoting last month's price, calling your product the wrong material, or citing a competitor's review page is telling you exactly which data layer to go fix.
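If you would rather keep the log as a CSV file than a spreadsheet, the same columns map onto a small record type. This is a sketch under assumptions: the field names, the `LogRow` class, and the sample row are made up for illustration, and the only libraries used are Python's standard `csv` and `dataclasses` modules.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LogRow:
    date: str                # run date in ISO format, so rows sort chronologically
    query: str               # the exact query text, copy-pasted
    surface: str             # e.g. "ChatGPT", "Shop app", "Gemini", "Perplexity"
    present: bool            # did any product of yours appear at all
    position: Optional[int]  # place in the recommendation list, None if absent
    beaten_by: str           # competitors ranked above you, semicolon-separated
    notes: str = ""          # what the assistant actually said


def write_rows(rows, out) -> None:
    """Write a header plus one CSV line per log row to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(LogRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Illustrative row only; the values are invented for the example.
sample = LogRow(
    date="2025-01-07",
    query="merino running socks",
    surface="ChatGPT",
    present=True,
    position=3,
    beaten_by="Brand A; Brand B",
    notes="quoted last month's price",
)

buffer = io.StringIO()
write_rows([sample], buffer)
csv_text = buffer.getvalue()
```

Keeping the position empty when you are absent, rather than writing a zero, stops missing results from dragging down any averages you compute later.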
Step 3: Probe the surfaces that matter
Run the list against the assistants your customers actually use. In practice that means ChatGPT and the Shop app first, since both read Shopify's Global Catalog, then Gemini, and Perplexity if research-heavy shoppers matter in your category. Keep each run clean: fresh conversation, no context from previous questions, the query verbatim.
Two realities to respect while you do this. Assistants vary their answers, so a one-position wobble between runs is noise, not news. And the surfaces do not agree with each other, because they read different mixes of catalog data, web crawl, and their own signals. That disagreement is information: present in Shop but absent in ChatGPT points at a different failure than absent everywhere. Our post on why your products don't show up in AI shopping walks the failure tree.
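The "one-position wobble is noise" rule can be written down so you apply it the same way every month. A minimal sketch, assuming positions are recorded as integers with `None` for "not present". The function name `classify_move`, its labels, and the `noise_band` parameter are hypothetical choices for this example.

```python
from typing import Optional


def classify_move(
    previous: Optional[int],
    current: Optional[int],
    noise_band: int = 1,
) -> str:
    """Label the change in position between two runs of the same query.

    Positions are 1-based ranks; None means the store did not appear.
    Moves of `noise_band` positions or fewer are treated as noise.
    """
    if previous is None and current is None:
        return "absent"
    if previous is None:
        return "appeared"
    if current is None:
        return "dropped out"
    if abs(current - previous) <= noise_band:
        return "noise"
    # A lower number is a better rank.
    return "improved" if current < previous else "worsened"
```

For example, `classify_move(3, 4)` is labeled as noise, while `classify_move(2, None)` flags that you dropped out of the answer entirely, which is worth a note in the log even before a trend forms.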
Step 4: Sample the catalog underneath
The assistant's-eye view tells you what shoppers experience. To understand why, go one layer down. For Shopify merchants, the index behind ChatGPT, Gemini, Copilot, and Shop is the Global Catalog, and it is directly queryable: you can run your fixed query set against the catalog itself and record presence, position, and the competing offers behind each search. That gives you a cleaner, less noisy series than assistant answers, and it separates "the catalog ranks me poorly" from "the assistant chose not to mention me."
This layer also exposes the competition mechanics worth logging: when several stores sell the same product, the catalog clusters them and one offer wins the recommendation, a dynamic we unpack in our post on the agentic buy box. If you sell anything that other stores also carry, track whether you are winning the cluster, not just appearing in it.
Step 5: Read trends, not snapshots
The discipline that makes the log worth keeping: never react to one run. React to slopes. Presence trending up across your query set after a catalog cleanup means the fixes are being absorbed. A competitor newly above you on four queries at once usually means they changed something (price, availability, or data quality), and that is worth investigating. A single bad Tuesday means nothing.
Monthly is the right cadence for most stores. Add an off-schedule run two or three weeks after any major fix, because the catalog re-indexes on Shopify's schedule, not yours, and you want to see the change land. The scorecard framing, what winning and losing actually look like across five dimensions, is in our post on how to tell if you're winning or losing in the Shopify Catalog.
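To make "react to slopes" concrete, you can reduce each run to one number, the share of queries where you appeared, and fit a straight line through the runs. The sketch below assumes log rows reduced to `(run_date, present)` pairs with ISO dates. The helper names are hypothetical, and an ordinary least-squares slope is one reasonable choice of trend measure, not the only one.

```python
from collections import defaultdict


def presence_by_run(rows):
    """Turn (run_date, present) pairs into [(run_date, share_present)], oldest first.

    Dates are assumed to be ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for run_date, present in rows:
        totals[run_date] += 1
        if present:
            hits[run_date] += 1
    return [(d, hits[d] / totals[d]) for d in sorted(totals)]


def slope(values):
    """Least-squares slope of values against run index 0, 1, 2, ...

    Positive means the series is trending up from run to run.
    """
    n = len(values)
    if n < 2:
        return 0.0  # one run is a snapshot, not a trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Invented example: three monthly runs of two queries each.
rows = [
    ("2025-01-07", False), ("2025-01-07", False),
    ("2025-02-04", True), ("2025-02-04", False),
    ("2025-03-04", True), ("2025-03-04", True),
]
series = presence_by_run(rows)
trend = slope([share for _, share in series])
```

With only a handful of runs the slope is a rough guide at best, so treat it as a prompt to look closer at the log, not as a verdict.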
When a tool earns its keep
The manual system above works, and running it even once will put you ahead of nearly every store in your category. Its weakness is stamina. Month three is where spreadsheets go to die, and a tracking series with holes in it loses most of its value.
This is the point of the rank tracking inside AgentReady: it runs a fixed query set against the catalog on a schedule, charts your presence and position over time, and pairs each movement with a diagnosis (the category, price, or data change that explains it), so the "why" arrives attached to the "what." The honest framing is the same one we hold ourselves to everywhere: no tool can promise you rankings. What tracking buys you is the feedback loop, seeing which fixes move the numbers, and the early warning when a competitor starts winning your queries.
Before you track anything, though, make sure the foundations are not quietly broken, because there is no point charting your position on surfaces that cannot read your store. The free AI-Readiness Checker grades that in about a minute, and it is where every tracking program should start.
