Why aggregate review scores hide the signal that matters
Three case studies show the same pattern. Aggregate sentiment is a bad starting point. What surfaces actionable signal is segmentation, and the right kind depends on what's hiding the signal.
We've been pulling public App Store reviews into Sunbeam dashboards as case studies. Three recent posts cover MyFitnessPal (1,981 reviews), Trading 212 (926 reviews), and Discord (1,854 reviews), 4,761 reviews in total. They tell three different product stories, but they teach the same methodology lesson, and the rest of this post walks through how it shows up in each dataset.
Why aggregate review sentiment is the wrong starting point
Most teams reading their own App Store or Trustpilot reviews do one of two things. They look at the headline NPS or star rating. Or they read the loudest five reviews and react to those.
Both fail for the same reason. A review pile is several stories at once, and aggregating across those stories produces a number that doesn't mean what it looks like it means.
A +20 NPS could mean 40% of customers are mild promoters, 20% are mild detractors, and the rest sit in the middle. It could also mean 60% of customers love a specific feature and 40% are furious about a specific decision, and the math nets out to the same +20. The first case calls for no particular action. The second calls for an immediate fix to a specific feature for a specific cohort.
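To make the arithmetic concrete, here's a minimal sketch of the NPS calculation over two illustrative review piles. The score distributions are invented for the example, not taken from the case studies:

```python
def nps(scores: list[int]) -> float:
    """NPS: % of promoters (9-10) minus % of detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Two very different review piles, same aggregate.
lukewarm = [9] * 40 + [7] * 40 + [5] * 20  # mild praise, mild gripes
polarized = [10] * 60 + [1] * 40           # love vs. fury

print(nps(lukewarm))   # 20.0
print(nps(polarized))  # 20.0
```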
The aggregate doesn't distinguish. Segmentation does. Each of the three case-study posts shows a different kind of segmentation surfacing different actionable signal that aggregate sentiment was hiding.
Segmentation by suggestion intent in MyFitnessPal's reviews
MyFitnessPal shipped a redesign in mid-April 2026. Spans about app updates in App Store reviews jumped from a steady 4–86 per fortnight to 597 in the launch window. The aggregate sentiment cratered.
Read top-down, the reviews look like an undifferentiated pile of "I hate the new design." Most analytics tools would summarise that as "redesign-related sentiment is negative" and stop.
The segmentation move that surfaces signal: extract specific actionable suggestions, then count which ones come up most often. Sunbeam's suggestion extraction returned 99 reviews containing the same single ask: give us a "classic look" toggle that lets us choose between the old interface and the new.
That's the strongest single piece of guidance MyFitnessPal could draw from those 1,981 reviews. Not "the redesign is bad" (too vague to act on), but "ship a setting that lets us revert" (specific, actionable, defensible). The full breakdown is in the MyFitnessPal case study.
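The heavy lifting in that move is the extraction itself, which the case study doesn't spell out, but once each review is tagged with normalized suggestion strings the counting step is mechanical. A minimal sketch, with hypothetical field names and data:

```python
from collections import Counter

# Hypothetical shape: each review carries zero or more normalized
# suggestion strings produced by an upstream extraction step.
reviews = [
    {"text": "...", "suggestions": ["add classic-look toggle"]},
    {"text": "...", "suggestions": []},
    {"text": "...", "suggestions": ["add classic-look toggle", "fix barcode scanner"]},
]

# Count how often each distinct ask appears across the pile.
counts = Counter(s for r in reviews for s in r["suggestions"])
for suggestion, n in counts.most_common(10):
    print(f"{n:>4}  {suggestion}")
```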
Segmentation by user cohort in Trading 212's reviews
Trading 212's aggregate NPS sits around +19. By itself, that looks fine. Slightly positive, no urgent signal.
The data segments cleanly by which surface customers are reviewing. Beginner-facing surfaces (Cash ISA, Learning Tools, basic ease of use) are strongly positive. The single most frequent topic in the dataset is "overall ease of use," at 416 reviews, mostly positive.
Active-trader surfaces are catastrophic. Order Controls sits at NPS -85, with 25 of 27 reviews negative. CFD Trading is -72. The Positions view is -64. These aren't outlier reviews. They're the experience of an entire user cohort.
The same +19 aggregate that looked fine across the dataset is, broken out by cohort, +47 for one population and -85 for another. Two different stories being averaged into something that isn't true of either.
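Mechanically, that re-cut is a groupby over whatever cohort tag you have. A sketch assuming each review has already been labeled with the surface it discusses; the column names and rows are hypothetical:

```python
import pandas as pd

def nps(scores: pd.Series) -> float:
    """% promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    return 100 * ((scores >= 9).mean() - (scores <= 6).mean())

# Hypothetical pre-tagged data: one row per review.
df = pd.DataFrame({
    "surface": ["Cash ISA", "Cash ISA", "Order Controls", "Order Controls"],
    "score":   [10, 9, 1, 0],
})

print(nps(df["score"]))                           # the aggregate hides the split: 0.0
print(df.groupby("surface")["score"].apply(nps))  # Cash ISA 100.0, Order Controls -100.0
```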
For Trading 212, the action implied by segmentation is different from the action implied by the aggregate. The aggregate suggests "everything is fine, make marginal improvements." The segmentation points to one specific UI change for active traders. Full breakdown in the Trading 212 case study.
Segmentation by topic theme in Discord's reviews
Discord's reviews are dominated by a news cycle: age verification, biometric data collection, recent breaches, the Palantir association. Privacy and Security as a category sits at NPS -94. By volume and by negativity, that's most of what shows up.
A team reading the reviews would spend most of their time on those topics, and they'd be right to, in the sense that those are the loudest signal. But "act on the privacy crisis" is largely a legal and policy decision, not a product-team-actionable item. The product team that owns Nitro, Orbs, advertising, and other monetization surfaces sees their reviews drown in privacy noise that has nothing to do with what they own.
The segmentation move that surfaces signal: filter by category. Drop the privacy and security cluster, look at what's underneath in the surfaces a specific team owns.
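As a sketch, assuming an upstream classifier has already tagged each review with a category (the categories and data shape here are illustrative):

```python
from collections import Counter

# Hypothetical pre-classified reviews.
reviews = [
    {"category": "Privacy & Security", "text": "..."},
    {"category": "Nitro Pricing", "text": "..."},
    {"category": "Orbs Rewards", "text": "..."},
]

# Drop the dominant cluster, then re-rank what's underneath.
remaining = [r for r in reviews if r["category"] != "Privacy & Security"]

for category, n in Counter(r["category"] for r in remaining).most_common():
    print(f"{n:>4}  {category}")
```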
In Discord's monetization surfaces, three distinct complaint clusters separate cleanly. Structural pricing objections to Nitro at $9.99 a month (23 reviews). A Nitro purchase activation bug where payment goes through but subscription doesn't activate (9 reviews). And a recent Orb video-reward cut from 700 to 200 (10 reviews). Three different kinds of fix, three different owners, separable as long as you don't aggregate them with the privacy noise. Full breakdown in the Discord case study.
The pattern
In all three cases, the aggregate sentiment was misleading or unactionable on its own. In all three cases, the right segmentation surfaced specific, customer-articulated, actionable findings.
Three different segmentation moves did the work:
- By suggestion intent. Count the specific actionable asks customers are making. Useful when reviews are clustered around an event (a launch, a redesign, a price change) and the aggregate is dominated by reaction rather than direction.
- By user cohort or product surface. Split the dataset by which type of customer or which feature is being talked about. Useful when a product serves multiple distinct populations whose experiences are different.
- By topic theme. Filter out a dominant news topic to see what's underneath. Useful when a controversy or external event is drowning the product signal.
Different review piles call for different segmentation moves. The same dashboard built from reviews can be segmented all three ways depending on what story you're trying to surface.
What this means for product teams reading their own reviews
If your aggregate review score doesn't change much month over month, that doesn't mean nothing is happening in your reviews. It probably means a few different stories are averaging out to the same number. Re-cut the data and the stories show up.
A useful starting question is: what's the dominant complaint cluster, and what would the data look like if I filtered it out? Often the most actionable signal is the second-loudest one.
A useful follow-up question is: which suggestions are customers literally asking for, by count? Customer-articulated requests have a specificity that sentiment scores don't. They give you a roadmap line, not a vibe.
A useful third question is: which cohorts do your reviewers fall into? A beginner has a different relationship with the product than a power user. A monthly subscriber has different priorities from a free user. Aggregating reviews across these cohorts blurs each one.
Most companies don't run these segmentations because each one is a day or two of reading reviews and tagging them. Sunbeam's dashboards do this in a few hours from public reviews, no integration required. The MyFitnessPal, Trading 212, and Discord dashboards linked from this post were each built without access to those companies' internal data.
Run this on your own reviews
If your reviews look noisy or undifferentiated, the same analytical moves take minutes to set up against your own App Store reviews. Try it at sunbeam.cx/try.