A Simple and Powerful Way To Extract Meaning From Your Data

May 23, 2018

Here’s an easy, quick, and powerful way to visualize your data. It leads to significant insights in seconds.

You have probably seen quadrants graphs. People, programs, or projects are graphed along two measures, one on the Y-axis and one on the X-axis. The graph is divided into four quadrants based on the average or midpoint (or some other meaningful dividing point) of the two measures. That makes it sound more complicated than it really is. Check out the quadrants graph above. Each circle is a participant in a tutoring program. The measures are grade point average (Y-axis) and attendance in weekly tutoring (X-axis). So participants in the:

Top right quadrant are above average on both their tutoring attendance and their GPA.

Top left quadrant are above average on GPA but below average on tutoring attendance.

Bottom right quadrant are below average on GPA but above average on tutoring attendance.

Bottom left quadrant are below average on both GPA and tutoring attendance.
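
The quadrant assignment described above can be sketched in a few lines of Python. This is only an illustration: the participant names, attendance counts, and GPAs are made up, and splitting at the mean is just one of the dividing points mentioned earlier.

```python
from statistics import mean

# Hypothetical participants: (name, tutoring sessions attended, GPA)
participants = [
    ("Ana", 12, 3.6),
    ("Ben", 3, 3.4),
    ("Cal", 11, 2.1),
    ("Dee", 2, 1.9),
]

def assign_quadrants(rows):
    """Label each participant by comparing them to the mean of each measure."""
    x_mid = mean(r[1] for r in rows)  # dividing line on the X-axis (attendance)
    y_mid = mean(r[2] for r in rows)  # dividing line on the Y-axis (GPA)
    labels = {}
    for name, attendance, gpa in rows:
        vertical = "top" if gpa > y_mid else "bottom"
        horizontal = "right" if attendance > x_mid else "left"
        labels[name] = f"{vertical} {horizontal}"
    return labels

quadrants = assign_quadrants(participants)
for name, quadrant in quadrants.items():
    print(name, "->", quadrant)
```

Anyone who lands in the "bottom right" or "top left" group is a participant worth a closer look.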

Well, if the tutoring program is designed to boost GPA, then you’d hope to see most of the participants in the top right quadrant. Or you’d at least want to see participants who are low on attendance also low on GPA. But if there are participants in the other quadrants, we need to figure out why these particular students defy our predictions. For example, what else is going on with participants with high attendance/low GPA that might be undermining their progress?

Other measure pairs of interest to many nonprofits might include:

Value/Action: How do staff members who value a certain program or curriculum actually perform in putting that program or curriculum into action? If not well, why not? (Survey data would be needed here.)

Cultivation/Donation Level: Are the donors you are cultivating the same ones making the largest donations to your organization? If not, why not?

Cost/Funds Raised: Did the highest cost fundraising events result in the most funds raised? If not, why not?

See other data tips in this series for more information on how to effectively visualize and make good use of your organization's data.

How To Change The Behavior of Your Participants And Donors With Data

May 16, 2018

Sometimes it's best to harness the power of the crowd rather than resist it.

Sometimes we do the right thing not because it’s the right thing but because (wait for it) other people are doing it. And this doesn’t only apply to middle schoolers. It’s all of us. Sociologists call it “social influence,” and it can be a powerful force for good or ill. What does this have to do with data? Well, to follow the lead of others, we must first know what they are doing. And that’s where data comes in.

We all know that teens' friends' drinking habits can affect their own. So a common approach to reducing substance abuse among adolescents is to encourage them to resist the influence of peers. Yet, research evidence suggests that rather than attempting to tamp down the power of social influence, we would do better to harness it. Consider an intervention called “normative education” designed to reduce substance abuse among students. Rather than subjecting young people to long lectures or counseling, this approach is simply about sharing data. Students are shown data about the prevalence of drinking among their peers, which is usually lower than kids expect. This information, in turn, reduces substance abuse among all students in a school, more so than does resistance training. (Check out the research evidence to learn more.)

So if we want to change the behavior of our clients, participants, visitors, or donors, we should consider making data visible about what others, like them, are doing. Take the case of donors. Over a century ago, two YMCA executives developed a potent fundraising strategy that relied on social influence. As told by Steve MacLaughlin in Data Driven Nonprofits, the strategy included time-bound fundraising campaigns that focused on sharing information with prospective donors about major gifts already made by prominent others. They also published campaign clocks and thermometers to keep the public apprised of their progress and of the urgency to make gifts before the campaign deadline.

This doesn't mean we should give up on convincing clients, participants, visitors, or donors to do something differently, but we also should consider simply sharing data with them.

The Why of the Y (Axis)

May 9, 2018

When was the last time you pondered the Y-axis? Wait, you might be thinking, what’s the Y-axis again? In a typical chart (think bar chart, line graph), it’s the vertical axis. For example, in a bar chart, the Y-axis indicates the length of the bars and thus how much of something is in a category or at a certain point in time. The categories or time periods are indicated along the horizontal or X-axis. You might recall from your middle school days that the X and Y axes are part of the Cartesian coordinate system that René Descartes (pictured here) invented in the 17th century.

What's to ponder about a Y-axis? Well, at least two questions:

1. What should be the lowest and highest number on the axis?

2. What should be the interval between numbers on the axis?

The lowest point is called the “origin.” It’s where the Y and X axes intersect. Much has been written about the importance of starting the Y-axis at zero because, when you don’t, you can make a small difference look like a big one (see the two bar charts below for a case in point). However, when none of the numbers you are charting are anywhere near zero, starting at zero can make differences hard to detect (see the two line graphs below for examples). And, if your high point is too high, your data will be crammed into the lower part of your chart, leaving a lot of useless empty space above.
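
To put a number on how a non-zero starting point exaggerates a difference, here is a small Python sketch. The two values (52 and 50) and the axis start of 48 are hypothetical, chosen only to show the arithmetic.

```python
def drawn_height_ratio(a, b, axis_start=0.0):
    """Ratio of two bar heights as drawn when the Y-axis starts at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must sit above the axis start")
    return (a - axis_start) / (b - axis_start)

# Hypothetical values: 52 vs. 50 participants
honest = drawn_height_ratio(52, 50)         # Y-axis starts at 0
stretched = drawn_height_ratio(52, 50, 48)  # Y-axis starts at 48
print(round(honest, 2), round(stretched, 2))
```

With the axis starting at zero, the taller bar is drawn only 4 percent taller. With the axis starting at 48, the same bar is drawn twice as tall.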

Think of the low point and the high point as reference points for your data. Do you want to show progress compared to historic low or high points? Do you want to show progress in relation to goals? The answer to such questions will help you decide where to start and end your Y-axis.

As for what falls in between these two points, you should consider the range of your data points and how much accuracy and ease your viewer will want. If the data ranges from 2 to 12 with slight differences between points, then you might want intervals of .01. However, if the data ranges from 6 to 10,000, then intervals of 10, 100, or even 1,000 might be sufficient to give viewers a general, easy-to-interpret sense of the data.
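
One common heuristic for picking an interval, sketched below in Python, is to aim for a handful of gridlines and round the step to 1, 2, or 5 times a power of ten. The function name and the default of roughly five intervals are assumptions for illustration, not a rule from this tip. Viewers who need finer precision would ask for more intervals.

```python
import math

def nice_interval(low, high, target_ticks=5):
    """Pick a round step (1, 2, 5, or 10 times a power of ten)
    that yields at most about target_ticks intervals between low and high."""
    raw = (high - low) / target_ticks               # exact step before rounding
    magnitude = 10 ** math.floor(math.log10(raw))   # power of ten at or below raw
    for multiple in (1, 2, 5, 10):
        if raw <= multiple * magnitude:
            return multiple * magnitude

print(nice_interval(6, 10000))  # wide range: a coarse step
print(nice_interval(2, 12))     # narrow range: a fine step
```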

If you’d like to ponder the Y-axis a bit further, check out this great video from Vox.

Frans Hals - Portret van René Descartes, André Hatala [e.a.] (1997) De eeuw van Rembrandt, Bruxelles: Crédit communal de Belgique, ISBN 2-908388-32-4.

Drill Into Your Data

May 2, 2018

I’m not sure how a term from construction and dentistry became so ubiquitous in the world of data analysis. Perhaps because when you are “drilling down” into data, you are going deeper. Drilling down means viewing data at increasing levels of detail. (By the way, the term for viewing data at decreasing levels of detail is “rolling up.” A culinary term?)

Applications such as Tableau and Qlik Sense allow you to create interactive data visualizations, which means users can use filters to drill down into the data. If, for example, you see an overall downward trend in program participation, you might want to see if the trend holds for subgroups of participants such as women, men, or those in certain age groups.

Why is drilling down important? Because it helps you to identify both strengths to build on and problems to address. An overall upward trend can hide problems in subgroups. Perhaps participants in a certain age group are not doing as well as others in a substance abuse program. Conversely, overall negative results can hide positive findings. For example, although on average the wages of participants in an employment program have gone down, they may have increased for a subgroup who entered the program after a certain date.
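
The wage example can be sketched in Python with made-up numbers. The cohort labels, the 2017 cutoff, and the wages below are all hypothetical. The point is only that the overall average and the subgroup averages can point in opposite directions.

```python
from statistics import mean

# Hypothetical records: (cohort, hourly wage before program, hourly wage after)
records = [
    ("entered before 2017", 15.0, 13.0),
    ("entered before 2017", 16.0, 13.5),
    ("entered before 2017", 14.0, 12.5),
    ("entered 2017 or later", 12.0, 13.0),
    ("entered 2017 or later", 13.0, 14.5),
]

def average_change(rows):
    """Mean wage change (after minus before) across the given rows."""
    return mean(after - before for _, before, after in rows)

def change_by_group(rows):
    """Drill down: compute the mean wage change separately for each cohort."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return {name: average_change(members) for name, members in groups.items()}

overall = average_change(records)
by_cohort = change_by_group(records)
print("Overall:", overall)
for cohort, change in by_cohort.items():
    print(cohort, "->", change)
```

Here the overall change is negative, yet the later cohort's wages went up, which is exactly the kind of finding an overall trend hides.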

If, after using various filters, it appears that results vary significantly across a certain type of category, you might want to create several small visuals and place them side by side to more easily make comparisons among subgroups (for more on this, see Tip #25).

Bottom line: Don’t only look at the forest. Check out groups of trees to get the whole story.

How To Consume Data

April 25, 2018

Charts, graphs, maps, and other types of data visualizations (aka “data viz”) often pull me in, especially if they are visually striking. But until I became versed in the art and science of data visualization, even dazzling charts often would frustrate me. I could not extract their meaning quickly and thus moved on.

There are five steps in quickly consuming a data viz. I know that doesn’t sound quick, but most steps take only seconds to do. In each step, you answer a simple question. The questions are:

1. What’s this about? What question is it answering?

This first question comes from a 1940 classic book called How To Read A Book by Mortimer J. Adler. Adler maintains that you don’t save time on books by learning to speed-read. Instead, you save time by making an informed decision about what to read and what not to read. And the best way to make this decision is to do an “inspectional read,” which means skimming through titles, headings, tables of contents, etc. Similarly, when you encounter a chart, map, or graph in text, skim over it by reading the title and subtitle, and any captions or annotations. Then determine what it’s about and, more specifically, what question it is trying to answer.

2. What’s my guess about the answer to that question?

This might seem like an unnecessary step, but studies have shown that comprehension increases when a reader forms questions about a text before consuming it. A question primes your brain for an answer. The more our curiosity is piqued, the easier all learning becomes.

3. What’s the quality of the data?

This might be the most important step and the least likely to be taken. At least determine the source of the data and whether the source appears to be reliable and credible. True, individuals will disagree on which sources are reliable and credible. Some of us, however, might be wary of data from institutions with clear political leanings or agendas. If no data source is noted, the viz is not worth your time.

For extra credit, look for information on what is and what is NOT included in the data. Consider, for example, the time period of the data and the demographics of people represented by the data. You are trying to determine if the data are equal to the task of the visualization. Can it really answer its question(s)? Or are there gaps in the data that weaken its ability to answer the questions fully or at all?

4. What more can I learn from the structure of the viz?

If you have gotten this far, you are engaged by the viz. Now consider what it all means. A visualization is, by nature, an abstraction of reality. It shows data collected in the real world using position, color, shape, and size to represent the data. Thus it’s important to understand what these visual cues mean in the particular viz you are consuming.

5. What is the answer to the question and what questions am I left with?

Finally, consider what answers you see in the viz and how they compare to your expectations. And to prime yourself to consider future information on the same subject, ask yourself what else you’d like to know about it.

Data Viz Vs. Infographic

April 18, 2018

Infographic and data visualization often are used interchangeably. And, indeed, the distinction is not hard and fast. They both focus on showing rather than telling. They explain something using more visual cues than words or numbers and so take advantage of our visual superpowers. (For more on these superpowers, see Tip #1.) The difference is that an infographic is more of a story, and a data visualization is more of a tool.

An infographic typically uses images to lead the viewer through a story. Some of those images might be visualizations of data. For example, the point of this infographic is to show the viewer the negative impact of homelessness in contrast to the positive impact of a program called Ability Housing. Infographics are usually meant to explain or show something to people who are not all that familiar with the topic.

A data visualization, unlike an infographic, uses visual cues (shape, color, size, etc.) primarily to represent data. Think bar chart, line graph, pie chart, and map. And though the creator of the data visualization may have a story he/she wants to tell, the viewer can use the visualization to discern any number of stories.

For example, on the quadrants chart below, each circle represents an educational strategy. The strategies are plotted along two measures: how much importance educators place on the strategies and how often they put these strategies into practice. We can use this chart as a tool to decide what to do next. Clearly, most of the educators represented in the data already feel these strategies are important. But they use the tactics less than 50 percent of the time. So we need not waste time explaining the value of the strategies to them. Instead, we should figure out what is getting in the way of their implementing the strategies.

So if you are looking to tell a specific story particularly to an outside audience, consider an infographic. If you are looking for a tool to explore data, consider a data visualization.

Data Viz Lineup

April 11, 2018

Eyes beat memory, according to Tamara Munzner. Her idea (which she shares with others, including me) is simple. It’s easier to compare two things you can see at the same time than to compare something you can see to something you can only remember.

When several small visualizations are placed side by side (called “small multiples”), you can see the power of eyes over memory. Take a few seconds to check out this great small multiples viz by Doug McCune. You can quickly scan the images to make easy comparisons.

In each chart, the X-axis shows time of day, and the Y-axis shows number of crimes. Daytime crimes are displayed with yellow bars in the top half of the chart. Nighttime crimes are displayed with blue bars in the bottom half.

It’s easy to see that driving under the influence and drunkenness occur more often during the night and trespassing and suicide occur more often during the day. It would be much harder to draw this conclusion flipping through pages or clicking through screens.

Flatten Your Data

April 2, 2018

Bells and whistles can be a problem when visualizing data. Edward Tufte, the grandfather of modern data viz, entreated us to remove any non-data ink. The idea is to focus on what matters — the story the data is telling us — without any unnecessary distractions.

Making visualizations look three-dimensional is almost always a distraction and a distortion. To make something look 3D, you have to use a technique called “foreshortening” which means that parts that are supposed to be perceived as closer in space are larger (see red slice of the pie in the image below), and parts that are supposed to be perceived as farther away are smaller (see green and blue slices). The angles represented on the 2D chart on the left, as you can see, are distorted on the 3D chart on the right, making it more difficult to judge the relative size of the slices.

Another way of creating the illusion of three dimensions is to obscure some objects with others to make it appear that one object is in front of another. But, of course, this is a problem for accurate assessment in a data viz. For example, in the 3D bar chart below, the green bars for "C" are barely visible whereas the flat image shows the green bars clearly.

Is it ever a good idea to make data visualizations look 3D? Yes, but rarely. The rule is simple. Only use 3D visualizations for 3D spatial data such as a diagram showing airflow over a spacecraft. Otherwise keep it flat.

How to Extract Your Head From The Sand

March 20, 2018

We think we know more than we actually do. In fact, we are wired that way. This illusion helps us to get along in the world. But it also gets us into trouble sometimes. Like when we are planning what our organization should do next.

Over time, we have relied less on our own abilities to build houses, cure diseases, or fix toilets and more on others’ knowledge in these areas. We each specialize, gaining more in-depth knowledge in one area than any generalist could. Then we trade our knowledge for that of others. Seems like a good idea, but there are downsides.

We collaborate so effectively that, as Sloman and Fernbach argue in The Knowledge Illusion: Why We Never Think Alone, the lines between our understanding and that of others blur. We perceive others’ knowledge as our own, even when that “knowledge” is actually baseless opinion. “This is how a community of knowledge can become dangerous,” according to Sloman and Fernbach.

Every organization has its orthodoxies, but not all of them are true. How can we distinguish the truth from our own and others' deeply-held but false beliefs?

The answer is: data. In other words, we can use the scientific method and put our assumptions to the test. If we cannot find evidence (aka data) that sufficiently refutes our assumptions, we can feel encouraged that we MIGHT be right, as long as new data doesn’t come along and undermine our beliefs.

Progress in organizations — and in all of human history — starts with the concession that we might be wrong. As Yuval Noah Harari suggests in Sapiens: A Brief History of Humankind, the scientific revolution was the point in history when “humankind admits its ignorance and (as a result) begins to acquire unprecedented power.”

On a more modest scale, we can start asking questions like: what would we expect to see in the short and long run if our programs work how we expect them to? And then we can look for data that either supports or refutes our expectations. And if we bristle at spending our time with data when so much else needs to be done, we can make data more digestible by visualizing it. (See Data Tip #1 for more on the power of data visualization.)

Photo by Tyler Nix on Unsplash

Upcoming Data Viz Workshops

March 6, 2018

This week's tip is short and sweet: bring your planning, fundraising, communications, and assessments to life by visualizing them. Check out my upcoming workshops below. Three are in the Chicago area and one is online.

Data visualization: Using Your Organization’s Secret Weapons To Boost Fundraising and Impact

Tuesday, April 17, 2018, 8:00 am to 9:30 am at the Evanston Community Foundation, One Rotary Center, 1560 Sherman Ave., Evanston, IL.

Click here for more info and to register.

Data Visualization Webinar: Using Your Organization’s Secret Weapons To Boost Fundraising and Impact

Thursday, May 24, 2018, 1:00 pm to 2:00 pm online.

Click here for more info and to register.

The Power of Data Mirrors

February 28, 2018

Looking into a data mirror can be a powerful experience. In the 60-Second Data Tip series, we have talked quite a bit about nonprofit managers, fundraisers, board members, and funders looking at organization-wide or individual program data to understand what to do next. And last week, we spoke about sharing data charts, graphs, and maps with our clientele to better understand trends. However, data can be a tool not just for planning and evaluation at the organizational level, but for personal change.

You may ask, don’t I already know a lot about myself? Do I really need to consult a data chart for self discovery? Well, research evidence suggests that we often think we know more than we actually do. We are wired to rely on the knowledge of others and sometimes we mistake their knowledge for our own (stay tuned for a tip on this). Also, sometimes we simply are not paying much attention to ourselves.

So a data mirror can be revealing. For example, you may think you spent the whole day on your feet, but the data on your Fitbit may show you otherwise.

As discussed in Data Tip #1, nonprofits tend to have a lot of data that never gets used or used well. Instead, it collects virtual dust on your server. But what if you blew the dust off of some of that data, visualized a single client’s data (e.g. her level of participation in your programs over time) and shared it with her? The data could lead to a conversation about what promoted progress and what stumbling blocks led to downward trends. Regular data feedback can be motivating, as we know from Fitbits and video games and goal-setting apps. Sharing data with clientele could be a secret weapon you didn’t know you had.

Give Voice To Data Points

February 21, 2018

We are, each of us, data points in an unlimited number of data stories. These stories can be told using charts, maps, and graphs. Depending on the topic, you might be a data point lost in a crowd of other points (in other words, you are in the norm) or you might be hanging out on the fringe (aka an outlier). Either way, you hold valuable information that the chart, map, or graph cannot tell on its own.

Let’s say that you’re a dot on a scatterplot. A scatterplot uses horizontal and vertical axes to plot data points and shows the relationship (or correlation) of one variable to another. A scatterplot might show the amount of time spent in a program (say, an employment training program) on one axis and an intended program outcome (say, current wages) on the other. Each point is a participant in the program.

Of course, you would hope to see that wages increase as duration in the program increases, at least to a certain point. But what if that’s not the case? That’s when it’s good to get the data points' points of view. Share the scatterplot with a few participants with different durations in the program and at different wage levels and ask: "Why do you think the data looks like this?"

They are going to share their personal experiences, their understanding of causes and effects in their own lives. And these stories will help you to understand general trends across all of the participants in the program. Perhaps one will tell you that he dropped out of the program several times when he gained new employment, which reduced his time in the program but also increased his wages. This insight, along with insights from other individuals (either collected informally or through a survey) can lead your organization to program reforms which, in turn, change trends in your data.

Images created by Ilaria Bernareggi and n.o.o.m. for Noun Project.

A Valentine's Day Post On (Data) Relationships

February 14, 2018

Data on any one thing isn’t that interesting. You might know the ages of all the participants in a program or the average grant amounts for a group of foundations. But that doesn’t really tell you anything until you look at that data in relationship to something else. That “something else” usually falls into these categories: time, other data, space, rank, or networks.

Time. Is participation now greater or less than in the past? To see this, you need a line chart with some measure of time, such as each month of a year, along the horizontal (aka X) axis and number of participants along the vertical (aka Y) axis. Each point shows how many people participated in a given month. Connecting the dots gives you a slope, which instantly shows you whether participation is increasing, decreasing, or varying over time.

Other data. You might want to know how participation relates to other data you have on participants. For example, are the ages of participants related to their satisfaction with your program (as reported on a survey)? In this case, you could use a scatter plot with satisfaction scores along the vertical axis and age along the horizontal axis. Each dot shows the age and satisfaction of a single participant. If the dots suggest a rough increasing slope, then older participants are often more satisfied with your program than younger ones. You might then color the dots representing females red and those representing males blue to see if and how gender relates to the age and satisfaction of participants.

Space. To show where participants live in relation to each other and to your organization, participant dots can be placed on a map. If you size the dots to show another factor, such as income, then you have a bubble map.

Rank. These types of charts show your data in relationship to a scale that indicates importance, prevalence, or some other metric. Perhaps the most common type of chart in this category is the tree diagram, which is often used to show reporting relationships among staff in an organizational chart. You might use it to show the educational institutions of participants in your program, starting with school districts at the top, individual schools in the middle, and classrooms at the bottom.

Networks. In network visuals, the relationships among individuals, groups, things, concepts, etc., are shown using connecting lines. For example, you might visualize participants as dots, with connecting lines showing which other participants they referred to your program. In this way, you can quickly distinguish frequent referrers from infrequent ones.
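
For the "other data" case above, the "rough increasing slope" you see in a scatter plot can also be summarized as a single number, the Pearson correlation. The sketch below uses made-up ages and satisfaction scores, and it is one of several possible ways to measure a relationship, not the only one.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: near +1 means a rising line, near -1 a falling one,
    and near 0 means no straight-line relationship."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - x_bar) ** 2 for x in xs))
    spread_y = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical survey data: one age and one satisfaction score (1-5) per participant
ages = [19, 24, 31, 38, 45, 52]
satisfaction = [2, 3, 3, 4, 4, 5]

r = pearson_r(ages, satisfaction)
print(round(r, 2))
```

A value close to +1 matches the picture of dots rising from left to right. It says nothing about why older participants report higher satisfaction.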

Data Is Not The Answer

February 6, 2018

Love might be the answer. But data is not. Data is more a suggestion than a solution. We get data-driven suggestions all of the time: movie suggestions from Netflix, book suggestions from Amazon, mate suggestions from Match.com.

Netflix data can take us only so far. Once we get their suggestions, we then apply knowledge that even Netflix doesn’t have: what mood we are in right now, whom we plan to watch the movie with, etc. Netflix’s suggestions + our knowledge/wisdom can lead to a good decision.

The data we house in our organizations also can make suggestions worthy of our consideration. But we must apply knowledge and wisdom before moving forward. A key source of this knowledge is staff members at different levels of an organization, who can apply their experience and professional knowledge. Executives are more likely to apply broad knowledge from the field while those on the ground are more likely to apply first-hand knowledge gleaned from experiences with certain programs, clients, etc. Accessing this knowledge is as simple as showing a line chart to staff and asking: why do you think this is happening?

Another source of invaluable wisdom is our clientele (service users, participants, visitors, patients, etc.). Unfortunately, many organizations do not tap this resource well or at all. Clientele knowledge will be the topic of a future data tip.

Show Order

January 30, 2018

We visualize data to take advantage of our visual superpowers. And when doing so, we should keep in mind how our mind works. We humans are great at detecting patterns, even when none exists (think conspiracy theories). From an evolutionary perspective, pattern recognition has helped us to understand what we see and make predictions that help us survive and reproduce.

Order is a particular type of pattern. It is the arrangement of people or things in relation to each other according to a particular sequence. So when there is an order to our data, we should show it. Our pattern-seeking minds will thank us for delivering up a real pattern and making it so easy for us to see.

For example, arrange bars on a bar chart in descending order so that viewers can easily pick out the top/bottom or the most/least. In this visualization from The Economist, we can easily see that Japan is the most expensive place to make pancakes (assuming you are buying all of your ingredients there). It also gives you a sense of what is driving the difference in cost of pancake ingredients: butter in Japan, eggs in Switzerland.
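
Putting bars in descending order usually means sorting the data before charting it. Here is a minimal Python sketch using hypothetical fundraising events and amounts, with a rough text bar chart standing in for a real one.

```python
# Hypothetical funds raised per event, in no particular order
funds_raised = {"Gala": 42000, "5K Run": 18500, "Auction": 27300, "Bake Sale": 2100}

# Sort by amount, largest first, so the top and bottom are easy to spot
ordered = sorted(funds_raised.items(), key=lambda item: item[1], reverse=True)

for name, amount in ordered:
    bar = "#" * (amount // 2000)  # one "#" per $2,000, just for a quick look
    print(f"{name:<10} {bar} {amount}")
```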

Camel by Tatiana Belkina from the Noun Project

Choose the Right Viz for the Job

January 22, 2018

When you think of visualizing data, your mind probably goes to bar graphs or maybe pie charts. However, there are many more species of visualizations. Ever heard of a waterfall or a circular area chart? Your first decision when visualizing data is what type of chart or graph to choose and that depends on what you want to show and what type of data you have.

I highly recommend Andrew Abela’s simple decision tree called “Chart Suggestions—A Thought-Starter” (see image below). It’s based on Gene Zelazny's classic work Saying It With Charts. The decision tree starts with the basic question: “What would you like to show?” And provides four options:

Comparison. You have two or more groups of things or people and you want to see which group is largest or smallest (or somewhere in between) on some measure. You also may want to see how these groups compare on the measure over time.

Distribution. You have a bunch of data points (e.g. the ages of participants in a program or test scores of students in a class) and you want to know how spread out or bunched up they are. Are most of the ages, test scores (whatever) near the average? Or is there a wide range? Are there some extreme outliers?

Composition. You want to understand who or what makes up a larger group such as how many of the participants in a program are in different age brackets or how many have been in the program for different lengths of time.

Relationship. You want to know if one thing is related to another, either at one point in time or over time. Do participants in a mental health program report less distress over time? Do those with lower incomes have higher heart rates?

Once you answer this basic question, the decision tree helps you to choose a specific chart based on the type of data you have. Abela’s chart chooser includes the types of charts you are most likely to select. But there are more rare species out there. To learn more about the wide array of ways to visualize data, check out the Data Visualization Catalogue.

Color Coordinate Your Data

January 17, 2018

Color is a great tool for drawing attention to certain data points in a graph, chart, map, or diagram. But, WARNING, color also can confuse the viewer. Adopting a few rules of thumb will turn a rainbow of confusion into an elegant and clear picture:

1) Limit each color to one meaning. If you are color coding a map and assigning blue to a certain income range, then do not use blue to mean anything else in that map or adjacent related visuals. Blue always means that specific income range.

2) Limit your color palette. Restrict your graph, chart, map, or diagram to a few complementary or monochromatic colors. Remember the color wheel? (See image above.) Choose complementary colors that are on opposite sides of the wheel: think orange and blue, or yellow and purple. Or choose several tones of one color (a monochromatic color scheme). Looking for an effective ready-made color palette? Check out sites like color-hex.

3) Avoid reds with greens. Seven to ten percent of men are red-green colorblind and have trouble telling the two apart. So avoid using both in the same visualization.

4) Dial up one data point and mute the rest. If you want to draw attention to one point, line, bar, or pie slice, give it a bright color and color the rest a muted shade or gray.
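As a rough illustration of rule 4, the Python sketch below builds a color list with one bright color for the data point you want to emphasize and gray for everything else. The program names, the hex codes, and the helper highlight_colors are all made up for this example.

```python
HIGHLIGHT = "#d95f02"  # a bright orange (an assumed choice; any vivid color works)
MUTED = "#bdbdbd"      # a light gray for every other data point


def highlight_colors(labels, focus):
    """Return one color per label: bright for the focus label, gray for the rest."""
    return [HIGHLIGHT if label == focus else MUTED for label in labels]


# Hypothetical program names for a bar chart.
programs = ["Tutoring", "Mentoring", "Sports", "Arts"]
colors = highlight_colors(programs, focus="Mentoring")
print(colors)
```

Many charting tools accept a list like this as per-bar colors. In matplotlib, for instance, it can be passed as the color argument of a bar chart.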

What Averages Obscure

January 10, 2018

Nonprofits (and everyone else) are addicted to averages. We like to talk about how participants do on average. We might describe how many visitors we have in an average week. But how much are we missing when we focus solely on averages? Short answer: it depends, but it could be a lot. If I only showed you the average-sized guy in the picture, would you appreciate the full range of sizes?

To figure out what and how much we are missing, we need to calculate—or better yet show—how spread out our data points are. Understanding the spread gives us an idea of how well the average or the median represents the data. When the spread of values in the data set is large, the average obscures the real picture more than when the spread is small.

Spread measures include range, quartiles, absolute deviation, variance and standard deviation. For more on these measures, check this out.
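Here is a small Python sketch, using only the standard library, that computes a few of these spread measures for two made-up sets of weekly visitor counts with the same average. The numbers and the helper describe_spread are invented for illustration, and the variance and standard deviation are the population versions.

```python
import statistics

# Made-up weekly visitor counts for two sites with the same average.
site_a = [48, 49, 50, 51, 52]
site_b = [10, 30, 50, 70, 90]


def describe_spread(values):
    """Return the mean plus several spread measures for a list of numbers."""
    mean = statistics.mean(values)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "mean_abs_deviation": statistics.mean([abs(v - mean) for v in values]),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }


for name, values in [("Site A", site_a), ("Site B", site_b)]:
    print(name, describe_spread(values))
```

Both sites average 50 visitors a week, but Site A's range is 4 while Site B's is 80, and the standard deviations are about 1.4 versus about 28.3. The average describes Site A well and Site B poorly.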

A great way to quickly grasp the spread of your data is to make a box plot. A box plot (a.k.a. a box and whisker diagram) shows the distribution of data including the minimum, first quartile, median, third quartile, and maximum. The box plots below show the affordability of neighborhoods in five cities. Each red circle represents a zip code area. The gray boxes show where the middle 50 percent of the zip code areas fall on the affordability scale. And the median is where the dark gray meets the light gray. You can see that, in general (i.e., according to the median), New York is more affordable than Los Angeles. However, New York has some zip code areas that are much less affordable than the median seems to suggest.
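The five numbers behind a box plot can be sketched in a few lines of Python with the standard library. The scores below are made up, five_number_summary is a hypothetical helper, and the "inclusive" quartile method is just one convention. Other tools may compute quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that a box plot displays."""
    # quantiles(..., n=4) returns the three cut points between the four quarters.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up scores with one extreme value at the top.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = five_number_summary(scores)
print(summary)
print("mean:", statistics.mean(scores))
```

In this example the box runs from 3 to 7 with the median at 5, while the maximum sits far away at 30. The mean is pulled above 7 by that single extreme value, which is the kind of detail an average alone hides.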

So when looking at your data, don’t just look at averages; also consider the spread.

Image created by Moxilla for Noun Project.

A Good Cause Is No Coincidence

January 2, 2018

We all know that correlation does not equal causation. Just because something occurs with something else doesn’t mean that one caused the other. If you do a dance and then it rains, that’s not enough evidence that the dance caused it to rain. Even if it rains almost every time you dance, it could be that something else is causing both the rain and your dancing. Perhaps a drop in barometric pressure causes your joints to hurt and you dance to loosen them up while the same drop in pressure causes rain. It’s a silly example. But you get the point. (Check out this hilarious website which shows other spurious correlations, such as the one between cheese consumption and death-by-bedsheets.)

Nonprofits (and everyone else) often make erroneous claims based on correlation. We might conclude, based on our data, that participation in our employment training leads to higher wages over time. Well, maybe. But perhaps employment in our city is on the rise and affecting everyone, not just participants in our program. Or maybe our program tends to attract participants who are quite motivated to find jobs and would do just as well without the program.
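The employment-training example can be made concrete with a quick benchmark check: compare your participants' wage growth with wage growth across the city over the same period. The Python sketch below uses entirely made-up hourly wages, and pct_change is a hypothetical helper.

```python
def pct_change(before, after):
    """Percent change from a starting value to an ending value."""
    return (after - before) / before * 100


# Made-up average hourly wages, for illustration only.
participant_growth = pct_change(before=15.00, after=16.50)  # about 10 percent
citywide_growth = pct_change(before=16.00, after=17.28)     # about 8 percent

# How much faster participants' wages grew than wages in the city overall.
gap = participant_growth - citywide_growth

print(f"Participants: {participant_growth:.1f}%")
print(f"Citywide:     {citywide_growth:.1f}%")
print(f"Gap:          {gap:.1f} percentage points")
```

In this invented case most of the participants' gain matches the citywide trend, so the program can claim at most the small gap. Even that gap is not proof, because it could still reflect who chose to enroll rather than what the program did.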

Correlation is necessary but not sufficient to prove causation. Indeed, causation is a very high bar to reach. Three conditions must hold: 1) correlation: two factors co-occur, 2) precedence: the supposed cause comes before the supposed effect in time, and 3) no plausible alternatives. This third condition is the trickiest. It involves ruling out other causes of the observed effect. So you can see why even carefully designed studies can rarely produce incontrovertible evidence of causation. (For more on establishing cause and effect, read this.)

Short of hiring researchers to design and conduct rigorous (and usually expensive) studies of your work, you can at least consider plausible alternatives. When you observe something good or bad happening in your organization, consider possible causes both within and outside of your organization’s control. If possible, try altering just one factor, collect data over time, chart it, and see if a trend changes. Explore rather than assume.

Photo by Ariel Lustre on Unsplash

Simplify

December 20, 2017

This data tip comes from the grandfather of modern data visualization: Edward Tufte. He originally recommended eliminating any non-data ink from data visualizations, although today we might think more in terms of pixels than ink. The idea is to remove anything that distracts from the story that a data visualization (such as a bar chart or line graph) shows. Such distractors can include bells and whistles such as bars on a bar chart drawn as people or buildings (Tufte called this “chartjunk”). But there are more subtle distractors, like grid lines and background color. The two images here show the same data, but the one on the right is stripped down to the essentials: no grid lines, no axis titles, only the visual information necessary to see the slope and to quantify it. So next time you visualize data, try simplifying so that your story shines through.

Note: 60-Second Data Tips will resume in January 2018. Happy New Year!
