A Common Problem with Survey Data - And How to Avoid It

Sampling bias occurs when some members of the group you are trying to understand are less likely to be included in your data than others. Survey data is especially vulnerable to sampling bias: the data you collect is often not representative of the whole group that received the survey, but only of the subgroup that was willing to complete it, as illustrated in the “sketchplanation” above. So here are some quick tips to avoid this type of bias in your survey data:

  • Clearly define the group and related subgroups that you want to understand. Consider what might be necessary to collect sufficient data from all of the subgroups.

  • Follow up with those who don’t respond to the survey to understand why they didn’t respond. Did you ask the wrong questions or target the wrong audience? Apply these insights next time you are planning a survey.

  • Make your survey brief and easy to understand.

  • Finally, don’t overinterpret your survey data. Assess which types of respondents were least likely to respond and interpret accordingly, as sketched below.
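
To make that last tip concrete, here is a minimal sketch in Python of one way to check who isn’t responding: compare surveys sent with surveys completed for each subgroup. All of the subgroup names and counts below are made up for illustration.

```python
# A minimal sketch of a nonresponse check: compare surveys sent vs.
# surveys completed within each subgroup. All subgroup names and
# counts are hypothetical placeholders.
surveys_sent = {"clients": 120, "board members": 15, "volunteers": 60}
surveys_completed = {"clients": 30, "board members": 12, "volunteers": 10}

for group, sent in surveys_sent.items():
    rate = surveys_completed[group] / sent
    print(f"{group}: {rate:.0%} response rate")
```

A much lower rate for one subgroup (here, the hypothetical volunteers at 17%) is a red flag that their views are underrepresented in your results.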

For a fuller explanation of types of sampling bias and strategies to avoid it, check out this SurveyMonkey article.


How to Extract Meaning from Survey Data

You just conducted a survey of your clients, participants, board members, visitors, or community members, and chances are you used some Likert scales in that survey. In other words, you asked respondents to state their level of agreement or disagreement on a symmetric agree-disagree scale. A typical 5-level Likert scale is:

Strongly Disagree - Disagree - Neither Agree nor Disagree - Agree - Strongly Agree

Here’s a FAQ (which is more like a QYSA: Questions You Should Ask) on visualizing Likert Scale data to extract useful information.

Have you collected data just once or multiple times with this survey?

If this is a one-time deal, then I would suggest that you visualize the data using a stacked bar chart. Exactly which type of stacked bar chart depends on what you are trying to understand and show. Check out this Daydreaming With Numbers blog post: 4 ways to visualize Likert Scales. It walks you through the various options. If you have collected the data two or more times, read on.

Should I calculate average scores and compare them?

A common way to look at change over time with Likert scale data is to assign numerical values to each response (e.g., Strongly Disagree: 1, Disagree: 2, Neither Agree nor Disagree: 3, Agree: 4, Strongly Agree: 5), then calculate the average across respondents at two or more points in time and compare them. Some may even use a statistical test (such as a paired sample t-test) to assess whether the averages are “significantly” different. This may seem like an obvious way to deal with the data (a code sketch of the approach appears after the list below), but there are problems with it:

  • On the numeric scale, the distance between 4 and 5 is always the same as the distance between 2 and 3. However, the distance between Agree and Strongly Agree is not necessarily the same as the distance between Disagree and Neither Agree nor Disagree. So we may be distorting respondents’ opinions and emotions by assigning numbers to these response options.

  • Respondents are often reluctant to express strong opinions and thus gravitate to the middle options. Averaging a bunch of middle options (2, 3, and 4) only amplifies the impression that respondents are on the fence.

  • Averages do not give us a sense of the range of responses. The average of these 4 responses (5, 1, 1, 1) is the same as the average of these 4 responses (2, 3, 2, 1). Averages also produce fractional values, which can be hard to interpret. Does an increase from 4.32 to 4.71, even if it’s statistically significant, really mean anything in the real world? At best, we can say that the aggregated results changed from somewhere between Agree and Strongly Agree to another place that is a little closer to Strongly Agree.
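
For reference, here is a minimal sketch in Python of the averaging approach described above, using scipy’s paired sample t-test and made-up responses. The point is not to endorse it but to show exactly what it assumes: that 1 through 5 are equally spaced, interval-level numbers.

```python
# A minimal sketch of the common (but problematic) averaging approach.
# Responses are hypothetical; 1 = Strongly Disagree ... 5 = Strongly Agree.
from statistics import mean
from scipy.stats import ttest_rel  # paired sample t-test

before = [3, 4, 2, 5, 3, 4, 3, 2]  # same respondents, time 1
after = [4, 4, 3, 5, 4, 4, 3, 3]   # same respondents, time 2

print(f"mean before: {mean(before):.2f}, mean after: {mean(after):.2f}")
t_stat, p_value = ttest_rel(after, before)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
# Even a "significant" p-value inherits every caveat listed above.
```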

What are alternatives to calculating averages?

Visualize the spread of responses. If you don’t have too many questions (or can group questions together), some simple side-by-side stacked bar charts might do the trick. See sketch 1 below.
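
If you want to build such a chart in code rather than in a spreadsheet tool, here is a minimal sketch using Python and matplotlib. The wave names, counts, and colors are all made up for illustration.

```python
# A minimal sketch of side-by-side stacked bars for Likert responses
# from two survey waves. All data here are hypothetical.
import matplotlib.pyplot as plt

labels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
colors = ["#ca0020", "#f4a582", "#cccccc", "#92c5de", "#0571b0"]
waves = ["2023 survey", "2024 survey"]
counts = [[4, 10, 12, 18, 6],   # respondents per option, wave 1
          [2, 6, 10, 22, 10]]   # respondents per option, wave 2

fig, ax = plt.subplots()
left = [0, 0]  # running left edge of each wave's bar
for i, label in enumerate(labels):
    values = [counts[w][i] for w in range(len(waves))]
    ax.barh(waves, values, left=left, color=colors[i], label=label)
    left = [l + v for l, v in zip(left, values)]
ax.set_xlabel("Number of respondents")
ax.legend(ncol=2, fontsize="small")
plt.tight_layout()
plt.show()
```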

Use the mode or median rather than the average. The mode is the number that occurs most often in a data set and may be a good way to describe the data if one response dominated. The median is the middle value when a data set is ordered from least to greatest. If responses tend toward one end of the scale (i.e., are skewed), it may be more reasonable to use the median rather than the average. If you feel that the assumption of equal spacing between response options is legit, then you might stick with the average.
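
Both statistics are one-liners in most tools; here is a minimal sketch in Python with hypothetical responses on a 1-5 scale.

```python
# A minimal sketch of summarizing Likert responses with the mode and
# median instead of the mean. Responses are hypothetical (1-5 scale).
from statistics import mean, median, mode

responses = [5, 4, 4, 4, 3, 2, 4, 5, 4, 1]

print("mode:  ", mode(responses))            # 4: the most common response
print("median:", median(responses))          # 4.0: the middle value
print("mean:  ", round(mean(responses), 2))  # 3.6, shown for comparison
```

Here the mode and median both map directly back to a response option (Agree), while the fractional mean does not.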

Visualize the average, mode, or median using one of the following chart types (see sketches 2-4) to understand and show change over time.
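
As one example of such a chart, here is a minimal sketch in Python that plots the median response across three survey waves. All data are made up, and you could swap in the mode or average the same way.

```python
# A minimal sketch of tracking the median Likert response over time.
# All responses and wave labels are hypothetical.
import matplotlib.pyplot as plt
from statistics import median

waves = ["2022", "2023", "2024"]
responses_by_wave = [
    [2, 3, 3, 4, 2, 3],  # hypothetical responses, 2022
    [3, 3, 4, 4, 3, 4],  # 2023
    [4, 4, 4, 5, 3, 4],  # 2024
]
medians = [median(r) for r in responses_by_wave]

fig, ax = plt.subplots()
ax.plot(waves, medians, marker="o")
ax.set_yticks([1, 2, 3, 4, 5])
ax.set_yticklabels(["Strongly Disagree", "Disagree", "Neutral",
                    "Agree", "Strongly Agree"])
ax.set_ylim(0.5, 5.5)
ax.set_title("Median response over time")
plt.tight_layout()
plt.show()
```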

To see past data tips, click HERE.


Let’s talk about YOUR data!

Got the feeling that you and your colleagues would use your data more effectively if you could see it better? Data Viz for Nonprofits (DVN) can help you get the ball rolling with an interactive data dashboard and beautiful charts, maps, and graphs for your next presentation, report, proposal, or webpage. Through a short-term consultation, we can help you to clarify the questions you want to answer and goals you want to track. DVN then visualizes your data to address those questions and track those goals.