Data Cleaning 101

This paper from the Robert Wood Johnson Medical School outlines a step-by-step data cleaning process for verifying that data values are correct or, at the very least, conform to a set of rules.
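The rule-based checking described above can be illustrated with a short sketch. The paper's own examples are written in SAS. The Python below is not taken from the paper and only shows the same idea: each character value is tested against a set of allowed codes, each numeric value against an allowed range, and every violation is reported with the record's ID. The field names `gender` and `heart_rate`, the `id` key, the valid codes, the range limits, and the function name `find_invalid` are all hypothetical.

```python
# Hypothetical validity rules: field names, codes and ranges are illustrative only.
VALID_CODES = {"gender": {"M", "F"}}      # allowed values for character fields
VALID_RANGES = {"heart_rate": (40, 100)}  # inclusive (low, high) limits for numeric fields


def find_invalid(records):
    """Return (id, field, value) for every value that breaks a rule.

    Missing values (None) are skipped rather than reported.
    """
    problems = []
    for r in records:
        # Character checks: the value must be one of the allowed codes.
        for field, allowed in VALID_CODES.items():
            value = r.get(field)
            if value is not None and value not in allowed:
                problems.append((r["id"], field, value))
        # Numeric checks: the value must fall inside the allowed range.
        for field, (low, high) in VALID_RANGES.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((r["id"], field, value))
    return problems
```

Keeping the rules in dictionaries rather than in the loop body means a new check is one more entry, not new code.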

Contents

  • A Sample Data Set
  • Description Of The File Patients.Txt
  • Checking For Invalid Character Values
  • Using A Data Step To Identify Invalid Character Values
  • Using Proc Print With A Where Statement To List Invalid Data Values
  • Using A Where Statement With Proc Print To List Out-Of-Range Data
  • Using User Defined Formats To Detect Invalid Values
  • Checking For Invalid Numeric Values
  • Using Proc Means, Proc Tabulate, And Proc Univariate To Look For Outliers
  • Using A Data Step To Check For Invalid Values
  • Using Formats For Range Checking
  • Extending Proc Univariate To Look For Lowest And Highest Values By Percentage
  • Creating Another Way To Find Lowest And Highest Values
  • Checking A Range Using An Algorithm Based On Standard Deviation
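The last two checks in the list, finding the lowest and highest values and flagging values by their distance from the mean in standard deviations, can be sketched as follows. This is a Python analogue written for illustration, not the paper's SAS code. The function names `lowest_highest` and `sd_outliers`, the `id` key, and the default cutoff of two standard deviations are assumptions.

```python
import statistics


def lowest_highest(records, field, n=3):
    """Return the n lowest and n highest (id, value) pairs for a numeric field.

    Missing values (None) are skipped; highest values are listed largest first.
    """
    pairs = sorted(
        ((r["id"], r[field]) for r in records if r[field] is not None),
        key=lambda p: p[1],
    )
    return pairs[:n], pairs[-n:][::-1]


def sd_outliers(records, field, k=2.0):
    """Flag records whose value lies more than k sample standard deviations from the mean."""
    values = [r[field] for r in records if r[field] is not None]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; raises an error with fewer than two values
    return [
        (r["id"], r[field])
        for r in records
        if r[field] is not None and abs(r[field] - mean) > k * sd
    ]
```

A single extreme value inflates both the mean and the standard deviation, so in a small sample it can partly mask itself. Reviewing the lowest and highest values alongside the standard-deviation check helps catch such cases.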

Source

Cody, R. Robert Wood Johnson Medical School, Dept of Environmental and Community Medicine. (n.d.). Data cleaning 101. Retrieved from http://www.ats.ucla.edu/stat/sas/library/nesug99/ss123.pdf

A special thanks to this page's contributors
Author
Melbourne.