Clean Data Tell Clear Stories


Many nonprofits have entry-level staff or multiple staff entering data into management information systems or spreadsheets. The result can be “dirty” data — data with a troubling level of inaccuracy because it has not been entered correctly and/or consistently. If, for example, Michael Smith is entered twice, once with a middle initial and once without, then tracking his progress through your program will be difficult.

To make sure data is accurate and thus of any value at all, it should be regularly "cleaned." A few simple procedures for cleaning data include:

  • Spell Check: Use a spell checker to find values that are not used consistently, such as a program name.
  • Remove Duplicate Rows or Entries: Sort data or use conditional formatting in Excel to find duplicates. Filter data for unique values to find near duplicates.
  • Find and Replace: Use find and replace function to correct data entered incorrectly in multiple rows or entries.
  • Use Upper Case and Trim: Change all text to upper case and remove extras spaces before and after values to ensure consistency. The UPPER(text) and TRIM(text) functions in Excel can do this.
  • Make Date Format Consistent: Make sure that dates are entered in a consistent format (such as MM/DD/YYYY). Excel has several functions that can help you convert date formats.

See other data tips in this series for more information on how to effectively visualize and make good use of your organization's data.