What a Trustpilot star rating doesn't tell you
A star average is shaped as much by how a company collects reviews as by how good it is. We compared UK energy and insurance reviews on the same platform, and the gap had little to do with quality.
If you manage a brand's public reputation, the star average on Trustpilot is the number you get asked about. It is also the number that tells you the least.
We have been running large batches of public reviews through Sunbeam, which reads the open text at scale and groups it by what customers actually mention. Two of the batches sat next to each other in a way that made a point worth writing down: a set of UK domestic energy suppliers, and a set of UK car insurers.
The same platform, two different worlds
The energy suppliers clustered near the top. On an NPS-style scale of -100 to 100, the major suppliers all sat in the low-to-mid 90s. The car insurers clustered far lower, mostly in the 50s, some lower still.
Read quickly, that says energy is wonderful and insurance is mediocre. Anyone who has actually dealt with a UK energy supplier in the last two years will find that hard to believe. So what is going on?
The most likely explanation is not service quality but the way these companies collect reviews. Energy suppliers tend to invite a review at the end of a routine, neutral-to-positive interaction: a completed switch, a smart meter installed, or an engineer who turned up on time. Insurance is different. A large share of insurance reviews are written by people who came to Trustpilot unprompted, usually because a claim or a renewal went badly and they wanted somewhere to say so.
One process samples happy, low-stakes moments, while the other lets the average sink because the people most motivated to write are the ones who were let down. The star number is measuring the collection habit at least as much as the underlying experience.
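To make the arithmetic concrete, here is a minimal sketch of how the mix of invited and unprompted reviews can move an average while the experience behind each kind of review stays the same. The function name and every figure in it are hypothetical, chosen only to illustrate the effect; none of them come from the batches we analysed.

```python
def blended_average(invited_share, invited_mean, unprompted_mean):
    """Average star rating for a mix of invited and unprompted reviews."""
    return invited_share * invited_mean + (1 - invited_share) * unprompted_mean


# Hypothetical figures, not measured values.
INVITED_MEAN = 4.8     # reviews requested after a routine interaction
UNPROMPTED_MEAN = 2.0  # reviews written unprompted, often after a problem

# Same per-channel experience in both cases; only the collection mix differs.
mostly_invited = blended_average(0.9, INVITED_MEAN, UNPROMPTED_MEAN)
mostly_unprompted = blended_average(0.3, INVITED_MEAN, UNPROMPTED_MEAN)

print(f"90% invited: {mostly_invited:.2f}")
print(f"30% invited: {mostly_unprompted:.2f}")
```

Under these assumed numbers the two averages land well over a star apart, even though nothing about the service itself has changed between the two cases.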
Why this matters for anyone benchmarking
If the average is partly an artefact of how reviews are gathered, then comparing your score to a competitor's tells you very little. A 4.8 and a 4.2 in different sectors, or even in the same sector with different review-invitation practices, are not measuring the same thing. Neither comparison amounts to a true customer experience benchmark.
The second problem is what the average hides inside a single company. When we read the energy reviews by theme, the high scores did not mean the absence of pain. There were sharp negative pockets sitting underneath: refunds that dragged on, billing disputes that never quite resolved, and a specific tariff change that a cluster of customers were angry about. None of it moved the headline, because it was a thin slice of a large, invitation-padded sample. But it was still real, and it was still the thing those customers would tell a friend about.
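A rough sketch shows how little a thin slice moves the headline. The review counts, shares and star values below are hypothetical, and the helper name is ours for illustration only.

```python
def average_with_pocket(total_reviews, pocket_share, pocket_stars, rest_stars):
    """Overall average when a small share of reviews sits in a negative pocket."""
    pocket = round(total_reviews * pocket_share)
    rest = total_reviews - pocket
    return (pocket * pocket_stars + rest * rest_stars) / total_reviews


# Hypothetical sample: 10,000 reviews, 3% of them one-star refund complaints,
# everything else five stars.
baseline = average_with_pocket(10_000, 0.0, 1, 5)
with_pocket = average_with_pocket(10_000, 0.03, 1, 5)
angry_customers = round(10_000 * 0.03)

print(f"no pocket:   {baseline:.2f}")
print(f"with pocket: {with_pocket:.2f} ({angry_customers} one-star reviews)")
```

In this made-up case, hundreds of one-star reviews cost the average barely a tenth of a star, which is easy to miss on a dashboard and impossible to miss if you are one of those customers.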
The insurers ran the other way. The low averages hid genuinely loved parts of the journey. Pricing and the online buying experience scored well. People liked getting a quote. The score collapsed at the claim and the renewal, and because those moments are emotionally loud, they dominated the average and buried the parts that worked.
The only number that survives is the theme
The takeaway is not that Trustpilot is broken. It is that the star average is a summary statistic doing a job it was never built for. It compresses thousands of specific experiences into one figure, and the compression throws away exactly the part you can act on.
If you own a public review score, the useful questions are not "is it going up or down" but "which themes are dragging, which are carrying, and is a new one emerging that the average has not caught yet". A high score with a worsening refunds theme is a warning. A low score with a loved buying experience is a map of what to protect while you fix the claim.
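As a sketch of what those questions look like in practice, the snippet below groups a handful of reviews by theme and marks each theme as dragging or carrying relative to the overall average. The reviews, theme labels and the dragging/carrying rule are all hypothetical and hand-written for illustration; this is not a description of how Sunbeam assigns themes or scores sentiment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-labelled reviews: (theme, star rating).
reviews = [
    ("buying online", 5), ("buying online", 5), ("buying online", 4),
    ("pricing", 5), ("pricing", 4),
    ("claims", 1), ("claims", 2), ("claims", 1),
    ("renewal", 2), ("renewal", 1),
]


def theme_breakdown(reviews):
    """Return the overall average and (theme, average, count) rows, worst first."""
    by_theme = defaultdict(list)
    for theme, stars in reviews:
        by_theme[theme].append(stars)
    overall = mean(stars for _, stars in reviews)
    rows = [(theme, mean(stars), len(stars)) for theme, stars in by_theme.items()]
    rows.sort(key=lambda row: row[1])
    return overall, rows


overall, rows = theme_breakdown(reviews)
print(f"overall: {overall:.2f}")
for theme, avg, count in rows:
    # Assumed rule: a theme below the overall average is "dragging".
    label = "dragging" if avg < overall else "carrying"
    print(f"{theme:<14} {avg:.2f}  n={count}  {label}")
```

The overall figure alone would read as a middling score. The per-theme rows show two themes pulling it down and two holding it up, which is the part you can act on.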
You get to those questions by reading the reviews grouped by theme, not by watching the average. That is the whole reason we built Sunbeam: paste a public review URL and it groups the text into themes, scores the sentiment of each, and shows you where the real signal is hiding under the number everyone reports.
If you want to see it on your own reviews, you can run a page through it here: sunbeam.cx/try.