But there is a fundamental problem: fewer and fewer people are answering - and more and more of those who do are AI agents.
I explore these two converging trends below. Then, I’ll show that anybody (including me) can easily set up an AI agent to earn some money by taking surveys. I’ll then estimate the impact further down the line in three main fields and propose some solutions.
Problem 1: The increase in non-response rates
If you use survey data, it probably hasn’t gone unnoticed: survey response rates have plummeted. In the 1970s and 1980s, response rates ranged between 30% and 50%. Today, they can be as low as 5%.
To give some (shocking) examples: the UK's Office for National Statistics (ONS) experienced a drop in response rates from approximately 40% to 13%, leading to instances where only five individuals responded to certain labor market survey questions. In the US, the Current Population Survey dropped from a 90% response rate to a record low of 65%. (...)
Problem 2: The rise of AI agents
How difficult is it to build an agent? So… I did what any overcaffeinated social data nerd would do: I built a simple Python pipeline for my own AI agent to take surveys for me (don’t worry, I promise I didn’t actually use it!). The pipeline just requires three things:
- Access to a powerful language model (I just used OpenAI’s API - but perhaps an uncensored model would better represent the full distribution of opinions!).
- A survey parser: this can be as simple as a list of questions in a .txt file or a JSON pulled from Qualtrics or Typeform. The real pros would scrape the survey live though!
- A persona prompt: the easiest is to build a mini “persona generator” that rotates between types - urban lefty, rural centrist, climate pessimist, you name it.
That’s it - a rough sketch of the whole thing is below. With a bit more effort, this could scale to dozens or hundreds of bots. Vibe coding it from scratch (see my previous Substack on how to do vibe coding) would work perfectly too.
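To make those three ingredients concrete, here is a minimal sketch of what such a pipeline can look like. This is illustrative rather than my exact code: the file name, the persona list and the model name are placeholders, and it assumes the official `openai` Python client (v1+) with an API key in your environment.

```python
import random
from openai import OpenAI  # assumes the official openai Python client (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# 1. Survey parser: the simplest possible version - one question per line in a .txt file.
def load_questions(path="survey_questions.txt"):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# 2. Mini "persona generator" that rotates between stereotyped respondent types.
PERSONAS = [
    "a 28-year-old urban lefty who rents and bikes to work",
    "a 55-year-old rural centrist who runs a small business",
    "a 40-year-old suburban climate pessimist",
]

def random_persona():
    return random.choice(PERSONAS)

# 3. Ask the language model to answer each question in character.
def answer_survey(questions, persona, model="gpt-4o-mini"):
    answers = []
    for question in questions:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": f"You are {persona}. Answer survey questions briefly and in character."},
                {"role": "user", "content": question},
            ],
        )
        answers.append(response.choices[0].message.content)
    return answers

if __name__ == "__main__":
    questions = load_questions()
    persona = random_persona()
    for q, a in zip(questions, answer_survey(questions, persona)):
        print(f"[{persona}] {q} -> {a}")
```

Wrap that in a loop over personas and a form-filling library, and you have a bot farm.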
Don’t worry, by the way: I didn’t deploy it on a real platform. But other people did. Below, I extrapolated the trend in survey-taking AI agents from data points in existing research, since systematic data is very hard to find...
Downstream problems
Let’s explore how this impacts three main fields in which surveys are used: political polls, market research and public policy.
Political polls. Many polls depend heavily on post-stratification weighting to correct for underrepresentation in key demographic groups. But when response rates fall and LLM answers increase, the core assumptions behind these corrections collapse. For instance, turnout models become unstable: if synthetic agents overrepresent politically “typical” speech (e.g., centrist or non-committal), models overfit the middle and underpredict the extremes. Similarly, calibration failures increase: AI-generated responses often mirror majority-opinion trends scraped from high-volume internet sources (like Reddit or Twitter), not the minority voter. This results in high-confidence, stable predictions that are systematically biased.
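To see why the weighting breaks, here is a toy post-stratification calculation. All the numbers (age groups, population shares, respondent counts) are made up purely for illustration: the weight for each cell is its population share divided by its sample share, and once synthetic respondents flood the “typical” cells, the few remaining respondents in the rare cells get enormous weights and end up driving the estimate.

```python
# Toy post-stratification example - all numbers are invented for illustration.
# Weight for each cell = population share / sample share.

population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# A "healthy" sample that roughly mirrors the population.
healthy_sample = {"18-34": 280, "35-64": 520, "65+": 200}

# A polluted sample: synthetic respondents flood the survey claiming to be 18-34,
# while genuine 65+ respondents barely show up.
polluted_sample = {"18-34": 700, "35-64": 280, "65+": 20}

def poststrat_weights(sample_counts):
    n = sum(sample_counts.values())
    return {g: round(population_share[g] / (c / n), 2) for g, c in sample_counts.items()}

print(poststrat_weights(healthy_sample))   # weights all close to 1.0
print(poststrat_weights(polluted_sample))  # the 20 respondents aged 65+ each get a weight of ~10,
                                           # so a handful of answers (human or bot) drives the estimate
```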
Market research. AI-generated responses are, by design, probabilistic aggregations of likely human language conditioned on previous examples. That’s great for fluency and coherence, but not good for capturing edge-case consumer behavior. Real customer data is heteroskedastic and noisy: people contradict themselves, change preferences, or click randomly. AI, in contrast, minimises entropy. Synthetic consumers will never hate a product irrationally, misunderstand your user interface, or misinterpret your branding. The upshot is that product teams build for a latent “mean” user, which leads to poor performance across actual market segments, particularly underserved or hard-to-model populations.
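A tiny, entirely simulated illustration of that point (the distributions are invented, not real data): real customers split into lovers and haters, while model-generated ratings cluster tightly around the “likely” answer, so both the variance and the haters vanish.

```python
import random
import statistics

random.seed(0)

# Simulated illustration: real customers are polarised and noisy,
# synthetic "customers" cluster around the most likely answer.
real_ratings = [random.choice([1, 2]) if random.random() < 0.3 else random.choice([4, 5])
                for _ in range(1000)]
synthetic_ratings = [min(5, max(1, round(random.gauss(3.8, 0.4)))) for _ in range(1000)]

for name, ratings in [("real", real_ratings), ("synthetic", synthetic_ratings)]:
    share_of_haters = sum(r <= 2 for r in ratings) / len(ratings)
    print(name,
          "mean:", round(statistics.mean(ratings), 2),
          "stdev:", round(statistics.stdev(ratings), 2),
          "share of 1-2 star ratings:", round(share_of_haters, 2))

# The means look similar, but the synthetic sample has far less spread and
# almost no haters - exactly the edge cases a product team needs to see.
```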
Public policy. Governments often rely on survey data to estimate local needs and allocate resources: think of labor force participation surveys, housing needs assessments, or vaccine uptake intention polls. When the data is LLM-generated, this can result in vulnerable populations becoming statistically invisible and lead to underprovision of services in areas with the greatest need. Even worse, AI-generated answers may introduce feedback loops: as agencies “validate” demand based on polluted data, their future sampling and resource targeting become increasingly skewed.
So what can we actually do about this?
Unfortunately, there’s no silver bullet (believe me - if there were, my start-up dream would be reality and I’d already have a VC pitch deck and a logo). But here are a few underdeveloped but, in my humble opinion, promising ideas:
by Lauren Leek, Lauren's Data Substack | Read more:
Image: Lauren Leek compilation of sources
[ed. I never answer surveys because, why assist people in figuring out new and innovative ways to manipulate and sell me things (including politicians)? So, I'm not surprised this tool is tanking. What is surprising is the claim that AI bots are a significant reason. I guess if you're a professional survey taker and have the coding skills then yeah, it would make sense to automate the process (more surveys, more money). But really, how many people can do that? More than anything, I'm surprised that prediction markets aren't mentioned here. Those seem to be the most accurate and granular tools for achieving the same purpose these days.]