I Used Free ChatGPT To Analyze Survey Data. Here's What I Learned.

1. You collect this type of survey data.

Here’s a familiar scenario. You survey your clients, participants, donors, volunteers, etc., and you include some “Other, please specify” options or other “open-ended” questions to better understand respondents’ opinions, experiences, and so on.

2. But you don’t know what to do with it.

You collect your survey data but don’t have the time and/or analytical skills to deal with this qualitative data.* Maybe you create one of those horrible word clouds or, even more likely, you just analyze the quantitative data and ignore the qualitative data.

If you had the time and know-how, you might have “coded” the data in order to analyze it. This involves assigning themes to each open-ended survey response in Excel (or the like) or perhaps using one of these free tools.
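If you do have a bit of a scripting skillset, a rough first pass at this kind of coding can even be automated with simple keyword matching. Here’s a minimal Python sketch; the themes, keywords, and responses below are all hypothetical, and real coding would need a human review pass:

```python
# A rough first pass at "coding" open-ended survey responses by keyword.
# THEME_KEYWORDS is hypothetical -- build your own from a read of the data.
THEME_KEYWORDS = {
    "safety": ["crime", "police", "safety"],
    "transportation": ["traffic", "parking", "bike"],
    "parks": ["park", "green space", "playground"],
}

def code_response(response):
    """Return the themes whose keywords appear in a response.
    Uses naive substring matching, so e.g. 'parking' also matches 'park'."""
    text = response.lower()
    return [theme for theme, words in THEME_KEYWORDS.items()
            if any(w in text for w in words)]

responses = [
    "More bike lanes and better parking near the park",
    "Reduce crime around the el stops",
]
for r in responses:
    print(r, "->", code_response(r))
```

This only catches exact keyword hits, so it’s a starting point for human coding, not a replacement for it.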

*Quantitative data is numerical, countable, or measurable. Qualitative data is interpretation-based, descriptive, and relating to language.

3. But what about AI?

You’ve heard that it’s supposed to make tedious, repetitive tasks much easier, and coding survey responses certainly qualifies as both. Could you use the free version of ChatGPT to get this job done? I shared your curiosity and gave it a try. Bottom line: It helped to identify themes to use as codes but it didn’t do all the work for me. For a little longer description of my experience, keep reading.

4. Prepare for AI.

I watched this YouTube video based on this article to learn how to craft a prompt that would likely get me what I wanted. I also found free survey data on the City of Chicago Data Portal to use for my experiment. The survey asked 43rd Ward residents about “other priorities” for their ward. I thought I could just upload the CSV data file to Chat, but it turns out you need the paid version for that. So I ended up pasting in the survey answers after entering the prompt. Also note that I used publicly available data. You should think twice about entering any type of sensitive data into Chat.

5. Craft the prompt.

Here it is. Yes, it’s long and yes, I say “please,” although I’m not sure if that affected the results!

6. Here’s what happened.

I first tried pasting in the prompt plus the data, but that was too long for Chat. So I had to feed the data (all 23 pages) in batches of about 3 pages at a time. Despite entreaties to Chat to update the charts based on ALL of the data I had shared so far, it only gave me charts for the last batch I had entered, and I had to combine them in Excel. At first I was impressed with the almost instant tables, but I felt my AI assistant wasn’t quite listening to my instructions, or just not understanding them. Still, I did develop this list of themes, and Chat did code each survey response according to these themes. But I would not feel comfortable relying on these results: I would want to combine some of the themes and read through all the responses to see if I agree with the coding.
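If you hit the same paste-length limit, you can pre-split your responses into batches before heading to the chat box. Here’s a small Python sketch; the 8,000-character default is a guess, not a documented ChatGPT limit, so adjust it to whatever the chat box accepts:

```python
def batch_responses(responses, max_chars=8000):
    """Split a list of survey responses into paste-sized batches.
    max_chars is an assumed paste limit, not an official one."""
    batches, current, size = [], [], 0
    for r in responses:
        # Start a new batch when adding this response would exceed the limit.
        if current and size + len(r) > max_chars:
            batches.append("\n".join(current))
            current, size = [], 0
        current.append(r)
        size += len(r) + 1  # +1 for the newline joining responses
    if current:
        batches.append("\n".join(current))
    return batches
```

You would then paste each batch in turn, which at least makes the batches consistent — though, as described above, Chat may still only chart the most recent batch.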

To see past data tips, including those about other chart types, scroll down or click HERE.


Let’s talk about YOUR data!

Got the feeling that you and your colleagues would use your data more effectively if you could see it better? Data Viz for Nonprofits (DVN) can help you get the ball rolling with an interactive data dashboard and beautiful charts, maps, and graphs for your next presentation, report, proposal, or webpage. Through a short-term consultation, we can help you to clarify the questions you want to answer and goals you want to track. DVN then visualizes your data to address those questions and track those goals.


A Common Problem with Survey Data - And How to Avoid It

Sampling bias occurs when some members of the group you are trying to understand are less likely to be included in your data than others. Survey data is especially vulnerable to sampling bias. The data you collect is often not representative of the whole group that received the survey but rather the subgroup that was willing to complete the survey — as illustrated in the “sketchplanation” above. So here are some quick tips to avoid this type of bias in your survey data:

  • Clearly define the group and related subgroups that you want to understand. Consider what might be necessary to collect sufficient data from all of the subgroups.

  • Follow up with those who don’t respond to the survey to understand why they didn’t respond. Did you ask the wrong questions or target the wrong audience? Apply these insights next time you are planning a survey.

  • Make your survey brief and easy to understand.

  • Finally, don’t overinterpret your survey data. Assess which types of respondents were the least likely to respond and interpret accordingly.

For a fuller explanation of types of sampling bias and strategies to avoid it, check out this SurveyMonkey article.




How to Extract Meaning from Survey Data

You just conducted a survey of your clients, participants, board members, visitors, or community members, and chances are you used some Likert Scales in that survey. In other words, you asked respondents to state their level of agreement or disagreement on a symmetric agree-disagree scale. A typical 5-level Likert scale is:

Strongly Disagree - Disagree - Neither Agree nor Disagree - Agree - Strongly Agree

Here’s a FAQ (which is more like a QYSA: Questions You Should Ask) on visualizing Likert Scale data to extract useful information.

Have you collected data just once or multiple times with this survey?

If this is a one-time deal, then I would suggest that you visualize the data using a stacked bar chart. Exactly which type of stacked bar chart depends on what you are trying to understand and show. Check out this Daydreaming With Numbers blog post: 4 ways to visualize Likert Scales. It walks you through various options. If you have collected the data two or more times, read on.

Should I calculate average scores and compare them?

A common way to look at change over time with Likert Scale data is to assign numerical values to each response (e.g. Strongly Disagree: 1, Disagree: 2, Neither Agree nor Disagree: 3, Agree: 4, Strongly Agree: 5) then calculate the average across respondents at two or more points in time and compare them. Some may even use a statistical test (such as a paired sample t-test) to assess whether the averages are “significantly” different. This may seem like an obvious way to deal with the data, but there are problems with it:

  • The distance between 4 and 5 is always the same as the distance between 2 and 3. However, the distance between Agree and Strongly Agree is not necessarily the same as the distance between Disagree and Neither Agree nor Disagree. So we may be distorting respondents’ opinions and emotions by assigning numbers to these response options.

  • Respondents are often reluctant to express strong opinions and thus gravitate to the middle options. Averaging a bunch of middle options (2, 3, and 4) only amplifies the impression that respondents are on the fence.

  • Averages do not give us a sense of the range of responses. The average of these 4 responses (5, 1, 1, 1) is the same as the average of these 4 responses (2, 3, 2, 1). Also, averages produce fractional results, which can be hard to interpret. Does an increase from 4.32 to 4.71, even if it’s statistically significant, really mean anything in the real world? At best, we can say that the aggregated results changed from somewhere between Agree and Strongly Agree to another place that is a little closer to Strongly Agree.
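To see the range problem in numbers, here’s a quick Python check of the example above, using the 1–5 mapping described earlier:

```python
from statistics import mean

# The common 1-5 mapping: Strongly Disagree = 1 ... Strongly Agree = 5.
polarized = [5, 1, 1, 1]  # one enthusiast, three strong detractors
lukewarm = [2, 3, 2, 1]   # everyone clustered near the middle

print(mean(polarized))  # 2
print(mean(lukewarm))   # 2 -- the same average hides two very different stories
```

Both groups average to 2 (“Disagree”), even though the first group contains a Strongly Agree and the second contains none.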

What are alternatives to calculating averages?

Visualize the spread of responses. If you don’t have too many questions (or can group questions together) some simple side-by-side stacked bar charts might do the trick. See sketch 1 below.

Use the mode or median rather than average. The mode is the number that occurs most often in a data set and may be a good way to describe the data if one response dominated. The median is the middle value when a data set is ordered from least to greatest. If responses tend toward one end of the scale (i.e. are skewed), it may be more reasonable to use the median rather than the average. If you feel that the assumption of equal spacing between response options is legit, then you might stick with the average.
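All three summaries are one-liners in Python’s standard statistics module. Here’s a sketch with hypothetical responses on the 1–5 scale, skewed toward Strongly Agree:

```python
from statistics import mean, median, mode

# Hypothetical Likert responses (1-5 scale), skewed toward Strongly Agree.
responses = [5, 5, 5, 4, 4, 4, 5, 2, 5, 5]

print(mode(responses))    # 5 -- the dominant response
print(median(responses))  # 5.0 -- the middle value, robust to the skew
print(mean(responses))    # 4.4 -- pulled down by the lone detractor
```

Here the mode and median both say “Strongly Agree,” while the average lands at a fractional 4.4 that maps to no actual response option.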

Visualize average, mode, or median using one of the following chart types (see sketches 2-4) to understand and show change over time.

To see past data tips, click HERE.
