As a Growth PM, a big part of my job is data analysis. Depending on what I’m tackling, I’ll usually jump between different tools: good old SQL remains incredibly useful for certain tasks, Mixpanel is my go-to for day-to-day tracking, and for more advanced analytics, Python libraries often do the trick.
Generally speaking, I feel pretty comfortable handling most types of analysis. But lately, I've been experimenting more actively with LLMs (like Claude and ChatGPT) to see if they can help optimize or speed up parts of my workflow. Which brings me back to the question in the title: Are LLMs actually good for analytics? I'll have to lean on the classic cliché - it depends. LLM-driven analytics definitely have their place, and they're great at democratizing certain analytical tasks. However, there are some important caveats we need to consider before diving deeper into exactly where LLM analytics offer the most value.
Limitations of using LLMs for analytics
Non-deterministic nature of LLMs
I won’t bore you with a lecture on how LLMs operate under the hood. But by now, most of you probably know that LLM outputs are inherently non-deterministic. In simpler terms, giving the same input to an LLM multiple times can yield different results each time. For certain tasks, variability isn't necessarily a bug; it can even be a feature. This is especially true for creative work, where the line between good and bad is often subjective.

However, in fields like the hard sciences (math, physics, etc.) and data analytics, results are usually not subjective; they have an objective, innate ground truth that consistently applies under identical conditions. Current LLMs can manage some of this uncertainty by using tool calls (i.e., relying on deterministic systems to perform specific actions) or by employing built-in code interpreters (OpenAI leverages Python, Claude uses JavaScript). Yet even these workarounds only reduce the level of uncertainty; they don't eliminate it completely. Therefore, LLM outputs will never be purely binary (correct or incorrect) but will sit somewhere along a spectrum of correctness, typically expressed as a probability from 0% to 100%.
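To see this concretely, here's a minimal sketch, assuming the openai Python SDK and an API key in your environment (the model name and prompt are purely illustrative): ask the same question twice, and you'll often get two different answers.

```python
# Minimal sketch of LLM non-determinism; assumes the openai package is
# installed and OPENAI_API_KEY is set. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # default-style sampling; higher means more variance
    )
    return response.choices[0].message.content

prompt = "Write a SQL query returning weekly active users from an `events` table."
first, second = ask(prompt), ask(prompt)
print(first == second)  # frequently False: identical input, different output
```

Lowering the temperature narrows the spread but, as noted above, doesn't make the system deterministic in the way a SQL engine is.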
While the probability of accuracy can be very high in many cases, even a small analytical error can have significant repercussions. Of course, humans also make mistakes, but this doesn't negate the caution needed when using LLMs for analytics.
How does this personally impact my use of LLMs for analytics? Because I can't fully trust the outputs to be 100% accurate, I typically spend extra time manually running basic tests to catch any obvious errors in the queries generated by LLMs (a minimal sketch of this kind of check is below). Another approach is to create your own LLM Judge: essentially prompting a second LLM to evaluate the first one's output. Both methods take additional time, so for each task you'll have to assess whether the productivity gains from using an LLM outweigh the cost of verifying its outputs. If the answer is yes, the task is likely a good candidate for an LLM.
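Here's what I mean by basic tests, as a hedged sketch assuming a hypothetical events.csv with ts and user_id columns: compute the same metric a second, independent way and assert the two agree.

```python
import pandas as pd

# Hypothetical event log; file and column names are assumptions.
events = pd.read_csv("events.csv", parse_dates=["ts"])
events["day"] = events["ts"].dt.date

# --- Suppose the LLM-generated code produced this daily-active-users series ---
llm_dau = events.groupby("day")["user_id"].nunique()

# --- Independent baseline, deliberately computed a different way ---
manual_dau = events.drop_duplicates(["user_id", "day"]).groupby("day").size()

# Cheap invariants that catch most obvious errors:
assert len(llm_dau) == len(manual_dau), "number of days diverges"
assert (llm_dau.sort_index().values == manual_dau.sort_index().values).all(), \
    "DAU values diverge"
```

The point isn't exhaustive testing; a couple of cheap invariants (row counts, totals, uniqueness) catch a surprising share of obvious mistakes.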
LLM inference time and breaking the flow state
For analytics tasks specifically, you often need to use reasoning models or agents to improve the reliability of your results. However, these reasoning models aren't instantaneous; they take a noticeable amount of time to generate their outputs, which introduces a subtle but important issue.

Personally, when I’m working on complex analytical tasks, after some initial effort I typically enter a deeply focused state, often called a "flow state." In this state, I can dive into one problem for extended periods without distraction. What I’ve noticed when working with LLM-based tools, though, is that the waiting period for the model's output frequently interrupts my flow. Instead of staying engaged, I find myself drifting toward distractions like Instagram or TikTok during these brief waiting periods.
As many of you probably know firsthand, it's not easy to regain your flow state once it's broken. Constant context switching due to waiting on LLM responses also creates additional cognitive fatigue. To manage this effectively, you'll need to cultivate some AI-specific work hygiene. For example, while waiting for the LLM to complete its inference, consider doing adjacent tasks related to your analysis or planning your next analytical step, something to help maintain your context and avoid straying too far from your main focus.
Cost of micro-adjustments
Going from 0 to 1 (i.e., getting about 80% of the task done) using LLMs often feels like magic, but optimizing that final 20% is notoriously challenging. Sometimes, I get an impressive result on my very first try. More often than not, however, I still need to make minor, targeted adjustments (things like tweaking graph labels, changing the visualization style, or reordering values). Unfortunately, making these small adjustments isn't always as straightforward as you'd expect.
If you're anything like me, you might become overly fixated on getting these tiny details perfect through the LLM, wasting way too much time trying to perfect your prompts, only to end up frustrated when the model doesn’t cooperate. From my experience, it's usually better to accept "good enough" rather than chase perfection with an LLM. Otherwise, you'll quickly burn through any productivity gains you initially achieved. Given sufficient skills in Python, SQL, or any other analytical tool, it's often quicker and far less frustrating to handle these micro-adjustments yourself.
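For instance, a label or ordering tweak that might take several prompt iterations is often just a few lines of matplotlib. A minimal sketch, with illustrative data:

```python
import matplotlib.pyplot as plt

# Suppose the LLM produced this chart; channel names and values are illustrative.
channels = ["Paid", "Organic", "Referral", "Email"]
signups = [420, 610, 180, 240]

# Reorder the bars by value yourself: a three-line fix versus another prompt cycle.
order = sorted(range(len(signups)), key=lambda i: signups[i], reverse=True)

fig, ax = plt.subplots()
ax.bar([channels[i] for i in order], [signups[i] for i in order])
ax.set_title("Signups by channel")       # tweak labels directly,
ax.set_ylabel("Signups (last 30 days)")  # instead of prompting for them
plt.show()
```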
Complexity of the task
It probably won't surprise you: the more complex a task is, and the more unpredictable its context (e.g., data quality, dataset size, or the organization's data maturity), the harder it becomes to manage the limitations I described earlier (points 1–3).
Verifying simple tasks, like basic segmentation or visualization, is straightforward. However, as soon as you move into sophisticated, multi-step analyses involving multiple confounding variables, things can quickly get messy. When dealing with advanced analytics, especially scenarios with numerous variables that must be controlled and accounted for, relying solely on an LLM without proper analytical expertise can lead you to mistakenly accept false-negative outcomes.

Non-technical users should be particularly aware of the risks and unknown unknowns they face in these scenarios. Ideally, they should either consult internal data experts or set up an LLM Judge to assist with validation (a minimal sketch follows below). Remember, the complexity of verification and the need for rigorous test coverage grow in proportion to task complexity.
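If you do go the LLM Judge route, the setup can be as simple as a second prompt. A minimal sketch, again assuming the openai SDK; the judge prompt and model name are illustrative, not a prescribed setup:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative judge prompt; tailor the failure modes to your own analyses.
JUDGE_PROMPT = """You are reviewing an analytics answer.
Question: {question}
Proposed analysis/answer: {answer}
List any methodological errors (ignored confounders, wrong joins,
leaky filters, unsupported causal claims). End with VERDICT: PASS or VERDICT: FAIL."""

def judge(question: str, answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # keep the judge as stable as possible
    )
    return response.choices[0].message.content

# Usage: verdict = judge("Did the pricing change lift retention?", llm_answer)
```

Keep in mind the judge is itself an LLM, so this shrinks the risk rather than eliminating it.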
With increasingly complex tasks, the cost of micro-adjustments (mentioned earlier) rises significantly. When an LLM makes changes across different parts of a complex analysis, it can inadvertently break parts that were previously correct, making iterative refinement challenging.
Another factor to keep in mind is context rot and attention dilution. The more data points and variables involved, the higher the likelihood an LLM will hallucinate or inadvertently overlook critical details. This circles back to the fundamental nature of LLMs as non-deterministic systems, as opposed to traditional, rule-based computing.
Finally, data quality plays a major role. Many organizations lack strict data governance policies, leaving you with messy datasets full of quirks. If you're not already familiar with these quirks, it's unrealistic to expect an LLM to understand the organizational context for you. Therefore, when working with complex datasets, be mindful of the inherent limitations and ensure your data is in a state that the LLM can correctly interpret.
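Before handing data to an LLM, it helps to surface those quirks explicitly. A minimal profiling sketch in pandas, using the same hypothetical events.csv:

```python
import pandas as pd

df = pd.read_csv("events.csv")  # hypothetical dataset

# Quick data-quality profile: types, missingness, cardinality, duplicates.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_pct": df.isna().mean().round(3),
    "n_unique": df.nunique(),
})
print(profile)
print("duplicate rows:", df.duplicated().sum())
```

Pasting a summary like this into your prompt lets the model work from your data's actual quirks rather than from assumptions about a clean dataset.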
Productivity_Gain ≥ Task_Complexity × (Verification_Cost + Inference_Time + Micro_Adjustment_Cost) + Cost_of_Error
This formula helps assess whether an LLM will improve your productivity, based on your task's complexity and your level of expertise. Some elements can be optimized:
Task Complexity
Break the task into smaller subtasks to make it more manageable and easier to verify.
Verification Cost
As mentioned earlier, well-prompted LLM Judges can help validate outputs and catch major gaps.
Inference Time
Improving your AI workflow hygiene (better prompting, caching patterns) reduces wait times and boosts efficiency.
Micro Adjustments
If your skills allow, fine-tuning small details manually can sometimes be faster and more accurate.
Cost of Error
Reduce error risk by increasing verification checks. If you're not a technical data user, share your approach with a data analyst to validate insights.
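To make the trade-off explicit, here's the formula above as a literal Python sketch. Putting all terms on a common unit (say, hours) is the hard, subjective part; the code just states the inequality:

```python
def llm_worth_it(
    productivity_gain: float,
    task_complexity: float,
    verification_cost: float,
    inference_time: float,
    micro_adjustment_cost: float,
    cost_of_error: float,
) -> bool:
    """Heuristic from the formula above: use an LLM when the expected
    gain covers verification, waiting, tweaking, and error risk."""
    overhead = task_complexity * (
        verification_cost + inference_time + micro_adjustment_cost
    )
    return productivity_gain >= overhead + cost_of_error

# Illustrative numbers: moderate task with cheap verification.
print(llm_worth_it(8.0, 2.0, 1.0, 0.5, 1.0, 2.0))  # 8 >= 2*(2.5) + 2 -> True
```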
Ultimately, LLMs are just one of many tools available for analytics. At the time of writing, they still come with several limitations, but that doesn't mean they aren't valuable. It all depends on how you plan to use them and which tasks you're solving. Sometimes, the productivity gains you anticipate from an LLM can be offset by hidden costs such as verification effort, inference delays, and the complexity of micro-adjustments. Nor should you forget the cost of error that comes from blindly accepting a false-negative output from an LLM. If these costs start to outweigh the benefits, it might make sense to fall back on human-driven analytics, supplemented with occasional LLM assistance.