Data Cleaning 101

This paper from the Robert Wood Johnson Medical School outlines a step-by-step data cleaning process for verifying that data values are correct or, at the very least, conform to a set of rules.

Contents

  • A Sample Data Set
  • Description Of The File Patients.Txt
  • Checking For Invalid Character Values
  • Using A Data Step To Identify Invalid Character Values
  • Using Proc Print With A Where Statement To List Invalid Data Values
  • Using A Where Statement With Proc Print To List Out-Of-Range Data
  • Using User Defined Formats To Detect Invalid Values
  • Checking For Invalid Numeric Values
  • Using Proc Means, Proc Tabulate, And Proc Univariate To Look For Outliers
  • Using A Data Step To Check For Invalid Values
  • Using Formats For Range Checking
  • Extending Proc Univariate To Look For Lowest And Highest Values By Percentage
  • Creating Another Way To Find Lowest And Highest Values
  • Checking A Range Using An Algorithm Based On Standard Deviation

Source

Cody, R. Robert Wood Johnson Medical School, Dept of Environmental and Community Medicine. (n.d.). Data cleaning 101. Retrieved from website: http://web.archive.org/web/20060212051538/http://www.ats.ucla.edu/stat/s... (archived link)


Comments

Danielle McArthur

This resource is no longer available at this URL.