What can I learn from this page? | Learnings and recommendations on statistical significance in employee surveys |
Who is this guide for? | Account Admins, Survey Admins |
We’re often asked if differences between populations or scores are statistically significant: “Our Engagement score dropped 5% since last year - is that statistically significant?”
It can be hard to tell whether differences between past surveys or between groups are meaningful, especially when dealing with people data. Here we explore the concept of statistical significance in the context of employee feedback surveys, such as engagement surveys.
What is statistical significance, in layman’s terms?
Statistical significance describes how likely it is that an observed difference in scores would arise by chance alone if the true underlying difference were zero. For example: “Our engagement score dropped 5% since last year - are employees meaningfully less engaged than last year? Or is that observed score difference merely chance or noise?” To answer this question, many people will want to know if a 5% drop is statistically significant.
Why statistical significance can be misleading in employee surveys
There are several reasons why we believe statistical significance testing is potentially misleading when interpreting data from employee surveys. Here are some of the main problems:
Statistical tests are very sensitive to sample size and don’t help you understand the ‘size’ of differences (known as effect size). With larger companies and larger groups, you will often find that almost any minor difference is statistically significant. The opposite occurs with smaller companies, where even sizable differences may not reach technical significance levels. In practice, this means that for large companies nearly everything looks significant, and for small companies almost nothing does.
Statistical testing is designed to answer a somewhat abstract question that may not even be applicable to the question we want to ask of our data. Statistical tests assess whether the results we see are likely to represent some unseen greater population of people we couldn’t survey. However, typically we are just trying to understand how the specific people in our teams or companies feel. The results we have are in effect for the entire population we are interested in, which means the results are the results.
Focusing only on significant differences can distract you from what matters most. It is more important to use the impact analysis to guide which areas to focus on; then, regardless of significance, you are best off attending to the biggest groups that have the biggest opportunities to improve in that area. (Even better, our focus agent will help you do this.)
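To illustrate the first point above about sample size, here is a minimal sketch using a standard pooled two-proportion z-test. The respondent counts and the function name `two_proportion_p_value` are hypothetical, chosen only for illustration; this is not a description of how Culture Amp analyzes results. The same 5-point drop (70% to 65% favorable) is tested at two different survey sizes.

```python
import math

def two_proportion_p_value(favorable_a, n_a, favorable_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a = favorable_a / n_a
    p_b = favorable_b / n_b
    # Pooled proportion under the assumption that the true difference is zero.
    pooled = (favorable_a + favorable_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 70% favorable last year, 65% favorable this year.
p_small = two_proportion_p_value(70, 100, 65, 100)          # 100 respondents per survey
p_large = two_proportion_p_value(7000, 10000, 6500, 10000)  # 10,000 respondents per survey

print(f"n=100 per survey:    p = {p_small:.3f}")
print(f"n=10,000 per survey: p = {p_large:.2e}")
```

With these made-up numbers, the small-survey p-value lands well above the conventional 0.05 cutoff while the large-survey p-value is far below it, even though the size of the drop is identical in both cases. The test result changes with the number of respondents; the effect size does not.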
Using statistical significance: What we've learned, and what we recommend
We've found that discussions around sample size, statistical significance, and the like can often derail the true purpose of the survey, which is to receive feedback from your employees and act on it. All analytical tools (including significance testing) should be relied on only for directional guidance and moving to action.
We recommend keeping it simple. One approach is to shift the phrasing to "statistically meaningful" (because statistical significance is very much a defined term, and you have to consider sample size, margin of error, and confidence interval).
In this case, generally speaking: +/- 5 points is statistically meaningful for large departments and the overall company score.
Another approach we recommend is presenting the data in terms of the people it represents. For example, you could ask, "How do we feel that women are almost 4x more likely than men to disagree with the statement 'People from all backgrounds have equal opportunities to succeed at my company'?" Using the concept of "the number of people agreeing to something" can be a helpful way to frame the conversation and keep the focus on understanding what your employees are telling you.
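As a rough sketch of how a "4x more likely" framing can be derived, the snippet below compares disagreement rates between two groups and reports the head counts behind them. All counts are invented for illustration, and `disagree_rate` is a hypothetical helper, not part of any Culture Amp product.

```python
def disagree_rate(disagree_count, respondents):
    """Share of respondents in a group who disagreed with the statement."""
    return disagree_count / respondents

# Hypothetical counts for one survey question.
women_disagree, women_total = 38, 200
men_disagree, men_total = 10, 200

women_rate = disagree_rate(women_disagree, women_total)
men_rate = disagree_rate(men_disagree, men_total)

# How many times more likely women were to disagree than men.
ratio = women_rate / men_rate

print(f"Women disagreeing: {women_disagree} of {women_total} ({women_rate:.0%})")
print(f"Men disagreeing:   {men_disagree} of {men_total} ({men_rate:.0%})")
print(f"Women were {ratio:.1f}x as likely as men to disagree")
```

Reporting the head counts alongside the ratio keeps the conversation anchored to actual people rather than to a test statistic.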
📌 Note: Culture Amp isn't alone in treating statistical significance with considerable caution as a measure of whether results are meaningful; some reputable scientific journals have banned its use in peer-reviewed publications.