Identifying Traffic from Generative AI / LLMs



Everyone is saying it:
traffic from Generative AI / LLMs (like ChatGPT, Perplexity, Claude, Google Gemini/AI Mode, and more) is exploding. At Adobe we've written about how Gartner foresees a 50% decrease in brands' organic search traffic by 2028. And how Search Engine Land has seen an 800% increase in LLM referral traffic between Q3 and Q4 of 2024. Even Adobe Analytics data has shown a 3,500% increase in traffic to US retail sites from generative AI sources.
These are big numbers, and they cannot be ignored.
Everyone is also asking the same question:
how can I confidently identify this LLM traffic? Sure, I'm being told that referrals are increasing, I'm seeing organic search referrers are dropping, but what are my options for identifying this traffic?
Well ... it's complicated.
There are many LLMs and there are many ways to prompt those LLMs. Just off the top of my head:
1. chatgpt.com
2. ChatGPT MacOS/Windows/iOS/Android Apps
3. perplexity.ai
4. Perplexity MacOS/Windows/iOS/Android Apps
5. Google AI Mode
6. Google Gemini
7. Google Gemini iOS/Android Apps
8. claude.ai
9. Claude MacOS/Windows/iOS/Android Apps
10. Microsoft Copilot on the web (there's multiple domains for this for some reason)
11. Copilot MacOS/Windows/iOS/Android Apps (and again, somehow I've got two different Copilot apps installed on my Mac, so...)
And as you either already know, learned from my LinkedIn Learning course, read about in this post, or are about to learn:
Web traffic can be identified in two ways.
Referrer information gives a website the exact URL of the previous page that linked the user to your web page. This is extremely valuable information - it's what identifies that a user came from a social media page or a search engine or, if it's missing, typed your webpage's URL in directly. Over the past few decades, referrer data has degraded. Google was the first one to purposefully degrade it - in the name of privacy they removed the keyword that was searched on Google.com so marketers could no longer see the term. Instead, all this referrer data now tells us is that the user searched on Google and landed on the website. Is removing the keyword from the referring URL actually helping consumer privacy? Decide for yourself as you learn about...
Landing Page Query Parameters can be stuffed with valuable information about the referring page that marketers can use to attribute behavior back to. For example, if I share this link on Bluesky: https://blog.ericmatisoff.com/?cid=bluesky then I can use my web analytics tool to align the clickthrough and any subsequent behavior back to that link, because I included cid=bluesky in the URL. Of course, this doesn't help for organically clicked links - whether search results or social media posts or anything that the website owner doesn't own - because as a marketer, I can't demand that every single person on the Internet uses a unique query parameter. So generally, these query parameters are used to differentiate Paid and Owned links, but not Earned. Paid links like Paid Search, Paid Social, Display, Marketing Emails can put anything into their URLs, including query parameters. This means that Paid Search Ads on Google can have query parameters in them that identify the keyword that the visitor searched in them. I guess consumer privacy matters less if they click a paid ad... The same can be said for Owned links - these are links on other parts of the web that the brand owns. Whether unpaid social posts from the brand or other domains that are linking, query parameters can help attribute behavior back to them.
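To make the two signals concrete, here is a minimal Python sketch that pulls both out of a single page view. It assumes you already have the landing page URL and the referrer as plain strings (for example from a data feed or a web log); the function name `describe_hit` is made up for illustration, and `cid` is simply the parameter used in the Bluesky example above.

```python
from urllib.parse import urlsplit, parse_qs

def describe_hit(landing_url, referrer=None):
    """Return the two signals available for attributing a page view."""
    query = parse_qs(urlsplit(landing_url).query)
    # parse_qs returns a list of values per key; take the first one if present.
    campaign = query.get("cid", [None])[0]
    # A missing referrer usually means direct entry or an app that sent none.
    referrer_host = urlsplit(referrer).hostname if referrer else None
    return {"referrer_host": referrer_host, "campaign": campaign}

# The Bluesky link from above, clicked in a context that sent no referrer.
print(describe_hit("https://blog.ericmatisoff.com/?cid=bluesky"))
```

Even with no referrer at all, the query parameter still carries the attribution, which is exactly why it matters for traffic coming from apps.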
OK, so what?
That's the right question to ask - so what? Well, since these are the only two ways to identify referring traffic, they must also be the only ways that marketers can identify traffic from those 11+ LLMs I have listed above. Sounds easy, right? Just get a list of the domains and use them to identify LLM-driven traffic. Boom. Blog post done?
But what about the apps? Well, apps can't pass referrers to your browser the way that websites can. And since these apps are sending links that are neither Paid nor Owned, we just have to hope that the LLMs are sending information in query parameters out of the kindness of their hearts? Well guess what, some of them are. Sometimes. And others aren't. Usually.
So it's consistent right?
Nope. Or at least there's barely any consistency across the different LLMs. In fact, there isn't even significant consistency across links within a single LLM.

Well, they're not completely inconsistent. They're a little bit consistent in that when the LLMs that send query parameters do feel like sending query parameters, they use utm_source. So this means if ChatGPT adds a query parameter to the landing page that it's sending visitors to, it'll look like: https://blog.ericmatisoff.com/?utm_source=chatgpt.com
I've seen perplexity use this same utm_source query parameter, but for them it'd look like: https://blog.ericmatisoff.com/?utm_source=perplexity
So I set out on a mission to try and track whether or not Referrers or Query Parameters are getting set across these LLMs. It's messy business, because the LLMs each have multiple ways of using them (.com, OS apps, mobile apps...) and multiple pay tiers (does Pro set different parameters than Plus?), AND even multiple places that they link to websites (directly inline in the response vs the "sources" of the response, and probably more).
This means that sometimes, chatgpt.com includes the utm_source query parameter, sometimes it doesn't. Yet Gemini, Claude, and Copilot have never sent a query parameter in my testing.
Either way, I set out on this mission to try and figure out what is being set consistently, what is being set inconsistently, and what is totally missing.
Here's a first stab at organizing the results of my research. I'm going to keep testing and occasionally update this table as I find more consistency.
| LLM | Referrer | Query Parameter |
| --- | --- | --- |
| chatgpt.com | https://chatgpt.com/ | utm_source=chatgpt.com (sometimes) |
| ChatGPT MacOS App | None | utm_source=chatgpt.com (sometimes) |
| perplexity.ai | https://www.perplexity.ai/ | utm_source=perplexity (sometimes) |
| Perplexity MacOS App | None | utm_source=perplexity (sometimes) |
| Google AI Mode | https://www.google.com | None |
| gemini.google.com | https://gemini.google.com | None |
| claude.ai | https://claude.ai | None |
| Microsoft Copilot at https://m365.cloud.microsoft/ | https://m365.cloud.microsoft/ | None |
| Copilot MacOS App | None | None |
See what I mean? A ton of inconsistency. And honestly kind of crazy that Google doesn't have a Query Parameter. I mean, this means it's just getting bucketed into Organic Search? That's koo koo.
For ChatGPT, it seems that they're angling to use referrals with the utm_source=chatgpt.com query parameter to attribute shopping/commerce use cases. I'm going to keep testing with multiple prompts to see if the query parameter only shows for these types of use cases. More on this soon.
Anyway - I've got a number of different prompts that I'm running to test these out, and I end up clicking each of the different links in the LLM response. So a "sometimes" in the Query Parameter results above means that some links have the parameter and some don't, and that a link in the same location can have it on one prompt and be missing it on another.
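If you want to turn the results above into something you can run against your own data, a rough Python sketch could look like this. The lookup values come straight from my testing so far; the function name `classify_llm` and the decision to check utm_source before the referrer are my own assumptions for illustration, not anything the LLMs document. Treat it as a starting point that will need updating as their behavior changes.

```python
from urllib.parse import urlsplit, parse_qs

# Values observed in testing so far; extend these as more results come in.
UTM_SOURCES = {"chatgpt.com": "ChatGPT", "perplexity": "Perplexity"}
REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Google Gemini",
    "claude.ai": "Claude",
    "m365.cloud.microsoft": "Microsoft Copilot",
}

def classify_llm(landing_url, referrer=None):
    """Return the LLM name for a hit, or None if neither signal matches."""
    query = parse_qs(urlsplit(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()
    # Check the query parameter first: app traffic carries no referrer.
    if utm_source in UTM_SOURCES:
        return UTM_SOURCES[utm_source]
    host = (urlsplit(referrer).hostname or "") if referrer else ""
    return REFERRER_HOSTS.get(host)

# App click with a query parameter but no referrer.
print(classify_llm("https://blog.ericmatisoff.com/?utm_source=chatgpt.com"))
# Google AI Mode click: indistinguishable from organic search, so no match.
print(classify_llm("https://blog.ericmatisoff.com/", "https://www.google.com"))
```

Notice what falls through the cracks: an app click with no query parameter looks like direct traffic, and Google AI Mode looks like plain old Google.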
Anything else?
In fact, yes. This article has so far only focused on traffic that is referred to websites from LLMs. But what about when an LLM is scraping content from your site in order to provide answers to prompts? That's a whole other consideration. I'll give you the CliffsNotes.
Wouldn't it be great to know what information LLMs are scraping from your site? Giving you, essentially, a glimpse directly into the type of content being used to reply to LLM prompts? Of course it would be! But there's a challenge - LLMs use crawlers that generally don't trigger JavaScript. And this means that your average web analytics platform won't even know they're there. Web logs and CDN logs do, however, still have this behavioral data. All you need is access to those logs and to know which user-agents each of the LLMs is using (here are OpenAI / ChatGPT's user-agents), and then you can see which pages on your site are being accessed.
Is this easy? Oh heck no. But that's where technologies like Adobe LLM Optimizer come into play - to make that data more accessible.
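For a sense of what that log work involves, here is a small Python sketch that counts crawler hits per page. It makes a few assumptions: that your logs are in the common "combined" format (request, status, size, referrer, user-agent), and that the user-agent substrings listed are the ones OpenAI currently uses, which you should verify against their published list. The sample log lines are invented, and other vendors' crawlers would need their own tokens added.

```python
import re
from collections import Counter

# Assumed user-agent substrings for OpenAI; verify against the vendor's list.
LLM_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]

# Captures the request path and the last quoted field (the user-agent)
# from a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_llm_crawls(lines):
    """Count (crawler token, path) pairs for lines with a known user-agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Not a combined-format line; skip it.
        for token in LLM_CRAWLER_TOKENS:
            if token in match.group("agent"):
                counts[(token, match.group("path"))] += 1
                break
    return counts

# Hypothetical log lines, for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.1"',
    '203.0.113.9 - - [01/Jan/2025:00:00:05 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"',
]
print(count_llm_crawls(sample))
```

Only the first sample line is counted; the second is an ordinary browser and is ignored.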
Oh and one more thing - Operators. We've already started to see these newfangled things where Operators or Agents are doing the bidding of the user based on a prompt. ChatGPT Agent (as Operator is now called) can do exactly this. Prompt "build me an agenda for an upcoming trip to Chicago on my Google Calendar and book some reservations for dinner each night". The agent will then go to my calendar, ask me to log in, then find the trip, confirm it with me, then find some dinner reservation ideas, check with me to see if I'm interested, then book them on my behalf. Now THIS experience - this uses JavaScript and will be captured in web analytics tools. Well, assuming the user-agent isn't blocked by any bot-blocking services that you're using. So it's worth looking into this by checking your Data Feeds and your bot data to understand what is and isn't filtered out, and then to analyze it.
Oy vey.
Yeah, I know, it's a lot. I hope the information is still helpful for you, even if all over the place - because there's just so much changing and so rapidly. Keep an eye out for content from my Adobe colleague, Brian Au, who helped with this article and will be publishing content on these topics and more.
If you've got other LLMs that you'd like me to test, send them my way. Next up is testing iOS apps too.
If you're seeing consistently different results, send them my way too. This latest tech wild west is changing rapidly.
Go forth and analyze that LLM referral traffic (if you can), you Rockstar you!