Stars Lie, Topics Don't: How to Read Trustpilot Data Without Being Fooled

Why a 4.3-star market and a 1.3-star market can be telling the same story.

When analysts compare Trustpilot scores across markets, they often assume they're comparing customer satisfaction. Usually, they're comparing review collection systems.

It's common to find the same brand scoring 4.3 stars in one country and 1.3 in another despite selling the same products, using similar operations, and following the same customer-service policies. The gap looks like evidence of a major performance problem. Taken at face value, you'd conclude that customer service in the second market is more than three times worse.

Most of the time, it isn't.

That gap is one of the most misread signals in Voice of Customer data. Before you brief a turnaround team or escalate to a regional director, it pays to understand what Trustpilot data actually is, how it's collected, and why the headline star number is the least reliable part of it.

What Trustpilot Actually Collects 

Trustpilot is a service-review platform. Each review is a 1–5 star rating plus free text, attached to a company "profile"; global brands usually have one profile per country. It's an open platform: anyone with a genuine experience of the business can leave a review, and they don't need to have made a purchase or to have the business "verify" them.

What matters for analysis is that not all reviews arrive the same way. Trustpilot labels them by collection method:

  • Organic (spontaneous): The customer goes to Trustpilot on their own initiative and writes a review. Nobody asked them to. These reviews carry no invitation label.
  • Invited: The business sent a request, such as an email or a link, asking the customer to review.
  • Verified: The invitation was triggered automatically by a real transaction (an order confirmation, a shipping notification), so Trustpilot can confirm the experience happened and labels it "Verified."

This distinction is visible on Trustpilot and has a direct impact on the resulting score. Since 2020, Trustpilot's "Transparent Inviting" feature has made each company's invited-versus-organic split publicly visible. Trustpilot has also said that a meaningful share of all reviews on the platform are organic, arriving with no prompt from the business at all.

The bias beneath the stars: spontaneous vs. campaign reviews

Here's the part that breaks naive comparisons. Spontaneous and solicited reviews are not drawn from the same population.

Trustpilot itself acknowledges the skew: organic reviews tend to lean negative, because people are most motivated to seek out a review form after a bad experience. Invited reviews, by contrast, pull in customers who had a perfectly fine experience but would never have thought to post about it unprompted, so they lean positive.

So a market that runs systematic post-purchase review invitations builds a balanced sample: happy, neutral, and unhappy voices all show up. A market that does nothing collects only the spontaneous reviews, which means mostly complaints. Same underlying service quality but wildly different star averages.
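
To see how far collection method alone can move an average, here is a minimal sketch with invented numbers. The satisfaction distribution and the response rates below are illustrative assumptions, not Trustpilot figures; only the mechanism is the point.

```python
# Illustrative only: every number below is an assumption, not Trustpilot data.

# Share of customers at each true satisfaction level (1 to 5 stars).
# Both markets are given the SAME underlying distribution.
TRUE_SHARE = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.30, 5: 0.50}

# Probability that a customer at each level writes a review unprompted.
# Assumed to be much higher after a bad experience.
ORGANIC_RATE = {1: 0.200, 2: 0.050, 3: 0.005, 4: 0.002, 5: 0.005}

# Probability that a customer answers a post-purchase invitation.
# Assumed to be the same at every satisfaction level.
INVITED_RATE = {1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10, 5: 0.10}


def expected_average(true_share, response_rate):
    """Average star rating among the customers who actually leave a review."""
    weights = {s: true_share[s] * response_rate[s] for s in true_share}
    total = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / total


organic_avg = expected_average(TRUE_SHARE, ORGANIC_RATE)
invited_avg = expected_average(TRUE_SHARE, INVITED_RATE)
print(f"organic-only market:   {organic_avg:.2f} stars")
print(f"invitation-led market: {invited_avg:.2f} stars")
```

With these made-up rates, the organic-only market averages roughly 1.95 stars and the invitation-led market 4.15, even though the customers in both are equally satisfied.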

A balanced view is important here. Critics, including consumer watchdogs, have noted that invitation-based review programs can produce ratings that are consistently higher than expected. The issue isn't that invited reviews are more legitimate and organic reviews are less so. Instead, each method tends to attract a different set of customers, and the resulting rating reflects whichever group is most represented.

The score amplifies the effect

Even if the review mix were identical everywhere, the TrustScore still wouldn't behave like a simple average, because three factors shape it:

  1. Recency: Recent reviews carry more weight than older ones. A surge of complaints over the last few months can move a score quickly even when historical performance was strong.
  2. Frequency: The scores of businesses with a steady flow of reviews tend to be more stable than those of businesses with sporadic bursts.
  3. Bayesian anchor: New profiles start near a neutral anchor (a small number of notional 3.5-star reviews built into the calculation), so a low-volume profile stays close to the middle until enough real reviews dilute the anchor.

The practical upshot: The star rating shown on a Trustpilot profile is often different from a simple average of the reviews in a dataset. As a result, two businesses that provide the same level of service can end up with noticeably different ratings simply because of differences in how many reviews they receive, how recent those reviews are, and how frequently new reviews are posted.
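
The following is only a toy model of how recency weighting and a neutral anchor can pull a score away from the simple average. It is not Trustpilot's formula: the exponential half-life, the anchor weight of 7, and the function name are all assumptions chosen for illustration.

```python
from datetime import date

# Toy model only. Trustpilot's real weights are not reproduced here:
# the exponential half-life and the prior weight are assumptions.
PRIOR_STARS = 3.5        # neutral anchor mentioned in the text
PRIOR_WEIGHT = 7.0       # assumed strength of the anchor, in "review units"
HALF_LIFE_DAYS = 180.0   # assumed: a review loses half its weight every 180 days


def toy_trust_score(reviews, today):
    """Recency-weighted average of (stars, review_date) pairs, pulled toward a prior."""
    weighted_sum = PRIOR_STARS * PRIOR_WEIGHT
    weight_total = PRIOR_WEIGHT
    for stars, written_on in reviews:
        age_days = (today - written_on).days
        weight = 0.5 ** (age_days / HALF_LIFE_DAYS)  # newer reviews weigh more
        weighted_sum += stars * weight
        weight_total += weight
    return weighted_sum / weight_total


today = date(2024, 6, 30)

# Profile A: 200 five-star reviews, all two years old, then 20 recent one-star reviews.
old_praise = [(5, date(2022, 6, 30))] * 200
recent_complaints = [(1, date(2024, 6, 1))] * 20
profile_a = old_praise + recent_complaints

# Profile B: only three reviews in total, all one star and all recent.
profile_b = [(1, date(2024, 6, 1))] * 3

simple_avg_a = sum(stars for stars, _ in profile_a) / len(profile_a)
simple_avg_b = sum(stars for stars, _ in profile_b) / len(profile_b)
print(f"A: simple average {simple_avg_a:.2f}, toy score {toy_trust_score(profile_a, today):.2f}")
print(f"B: simple average {simple_avg_b:.2f}, toy score {toy_trust_score(profile_b, today):.2f}")
```

Under these assumptions, profile A's toy score lands well below its simple average of about 4.64, because the recent complaints outweigh two-year-old praise. Profile B's three one-star reviews, by contrast, are pulled well above 1.0 by the anchor.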

The Same Pattern Appears Across Online Reviews

This issue is not specific to Trustpilot, or even to service-review platforms in general.

Research on online reviews consistently finds that people with strong opinions are more likely to leave feedback than those with average experiences.

The outcome is a polarized distribution:

  • Many 5-star reviews
  • Many 1-star reviews
  • Relatively few middle ratings

Researchers often describe this as a J-shaped distribution.

When feedback is collected from all customers rather than only the most motivated ones, ratings become less extreme and more representative, which shows that the underlying issue is selection bias. Soliciting reviews is one of the few mechanisms shown to partially de-bias the picture, precisely because it coaxes the moderate middle into speaking up.

What to do instead: four rules for reading review data

If you manage Voice of Customer, CX, or e-commerce performance across markets, this is how to keep the stars from misleading you.

1. Segment by collection method before you compare. Before drawing conclusions, separate reviews by collection method and look at the balance between invited and organic feedback. Trustpilot provides this breakdown, and it can make a significant difference. For example, a market with a 1.3-star rating based entirely on organic reviews and a market with a 4.3-star rating driven mostly by invited reviews could, in practice, be delivering a very similar level of service.

2. Read the topics, not just the score. The most useful signals are the issues customers repeatedly mention. If the same concerns, such as delivery delays or poor customer support, appear consistently across multiple markets, that points to a genuine operational problem, regardless of whether the average rating is 1.3 or 4.3 stars. Ratings can be influenced by how reviews are collected, but recurring customer feedback provides a much clearer view of what needs attention.

3. Normalize for recency and volume. Because the TrustScore gives more weight to recent reviews, it is more useful to track changes in incoming reviews over time than to focus on the lifetime average. An older profile with few recent reviews provides limited insight into current performance, whereas a sudden increase in negative feedback can be a strong indicator of emerging issues.

4. Fix the collection gap and the service gap separately. If a low-scoring market simply isn't soliciting reviews, launching a balanced invitation program is likely to both give a more representative picture of sentiment and lift the public score. However, if customers are genuinely experiencing issues with delivery, service, or support, collecting more reviews will not solve the underlying problem. Before taking action, determine whether the low score reflects a review collection issue or a real operational challenge.
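
The first three rules translate into a few lines of analysis code. The sketch below assumes reviews have already been exported into records with a market, a collection method, a date, a star rating, and a list of topic tags. The field names, market codes, sample rows, and topic tags are all hypothetical, and real topic tagging would come from a text-analysis step that is not shown here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Review:
    market: str
    method: str        # "organic", "invited", or "verified"
    written_on: date
    stars: int
    topics: tuple      # hypothetical tags from an upstream text-analysis step


# Invented sample rows, for illustration only.
REVIEWS = [
    Review("NL", "verified", date(2024, 5, 2), 5, ("delivery",)),
    Review("NL", "verified", date(2024, 5, 9), 4, ("product quality",)),
    Review("NL", "invited", date(2024, 5, 20), 2, ("delivery delay",)),
    Review("NL", "organic", date(2024, 6, 1), 1, ("delivery delay", "support")),
    Review("IT", "organic", date(2024, 5, 5), 1, ("delivery delay",)),
    Review("IT", "organic", date(2024, 6, 3), 1, ("support",)),
    Review("IT", "organic", date(2024, 6, 10), 2, ("delivery delay", "support")),
]


def average_by_method(reviews):
    """Rule 1: average stars per (market, collection method)."""
    groups = defaultdict(list)
    for r in reviews:
        groups[(r.market, r.method)].append(r.stars)
    return {key: mean(stars) for key, stars in groups.items()}


def complaint_topics(reviews, max_stars=2):
    """Rule 2: topic counts per market among low-star reviews."""
    counts = defaultdict(Counter)
    for r in reviews:
        if r.stars <= max_stars:
            counts[r.market].update(r.topics)
    return counts


def recent_average(reviews, since):
    """Rule 3: average stars per market for reviews written on or after `since`."""
    groups = defaultdict(list)
    for r in reviews:
        if r.written_on >= since:
            groups[r.market].append(r.stars)
    return {market: mean(stars) for market, stars in groups.items()}


print(average_by_method(REVIEWS))
print({m: c.most_common(2) for m, c in complaint_topics(REVIEWS).items()})
print(recent_average(REVIEWS, since=date(2024, 6, 1)))
```

In this invented sample the two markets have very different overall averages, yet their organic-only averages are close and "delivery delay" is a leading complaint in both. That is the pattern the rules above are designed to surface.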

The takeaway

Star ratings are useful for attracting attention. The brands that get real value from Trustpilot are the ones that look past the average to the structure of the data: how it was collected, when, and above all, what customers are actually saying.

That is why topic-level Voice of Customer analysis consistently outperforms score-based benchmarking. Once reviews are grouped into themes across Trustpilot, marketplaces, app stores, and owned channels, the focus shifts from how customers rated an experience to what customers actually experienced.

The score gets attention.

The topics explain the problem.

Sources & further reading

  • Trustpilot Help Center — How are reviews collected? (organic vs. invited vs. verified)
  • Trustpilot Help Center — Submitting a Verified review vs. an Organic review
  • Trustpilot press — Transparent Inviting (2020) and the move to a five-point, half-star TrustScore (2019)
  • Trustpilot Business — How your TrustScore works (recency, frequency, Bayesian average)
  • Hu, Pavlou & Zhang — Why Do Online Product Reviews Have a J-Shaped Distribution? (Communications of the ACM / SSRN)
  • Schoenmueller, Netzer & Stahl — Online Review Solicitations Reduce Extremity Bias (Management Science)

The most valuable insights rarely come from the rating itself. They come from understanding what created the gap.

About Wonderflow

Wonderflow helps leading consumer brands transform unstructured feedback into actionable insights. Its AI Product Intelligence platform analyzes millions of online ratings, reviews, surveys, and customer comments, empowering teams to make smarter product, marketing, and customer experience decisions.