Resources
This paper from the Robert Wood Johnson Medical School outlines a step-by-step data cleaning process for verifying that data values are correct or, at the very least, conform to a set of rules.
Contents
- A Sample Data Set
- Description Of The File Patients.txt
- Checking For Invalid Character Values
- Using A Data Step To Identify Invalid Character Values
- Using Proc Print With A Where Statement To List Invalid Data Values
- Using A Where Statement With Proc Print To List Out-Of-Range Data
- Using User Defined Formats To Detect Invalid Values
- Checking For Invalid Numeric Values
- Using Proc Means, Proc Tabulate, And Proc Univariate To Look For Outliers
- Using A Data Step To Check For Invalid Values
- Using Formats For Range Checking
- Extending Proc Univariate To Look For Lowest And Highest Values By Percentage
- Creating Another Way To Find Lowest And Highest Values
- Checking A Range Using An Algorithm Based On Standard Deviation
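The kind of rule-based checking listed above (invalid character values, out-of-range numeric values) can be sketched in a few lines. The paper itself works with SAS data steps and procedures; the Python version below is only an illustrative stand-in, and the field names (`gender`, `heart_rate`), the valid codes, and the range limits are hypothetical assumptions rather than values taken from the paper.

```python
# Hypothetical validation rules: the field names, valid codes, and limits
# are assumptions for illustration, not taken from the paper.
VALID_GENDER = {"M", "F"}
HR_RANGE = (40, 100)


def find_invalid(records):
    """Return (record id, field, value) tuples for values that break a rule."""
    problems = []
    for rec in records:
        # Character check: the value must be one of the allowed codes.
        if rec["gender"] not in VALID_GENDER:
            problems.append((rec["id"], "gender", rec["gender"]))
        # Numeric check: the value must fall inside the allowed range.
        # None stands for a missing value and is not flagged here.
        hr = rec["heart_rate"]
        if hr is not None and not (HR_RANGE[0] <= hr <= HR_RANGE[1]):
            problems.append((rec["id"], "heart_rate", hr))
    return problems


# Made-up sample records, used only to exercise the checks.
sample = [
    {"id": "001", "gender": "M", "heart_rate": 88},
    {"id": "002", "gender": "X", "heart_rate": 68},
    {"id": "003", "gender": "F", "heart_rate": 210},
    {"id": "004", "gender": "F", "heart_rate": None},
]

for problem in find_invalid(sample):
    print(problem)
```

Whether a missing value should count as an error is a choice each data set has to make; this sketch skips missing values so that only values actually present are tested against the rules.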
Sources
Cody, R. Robert Wood Johnson Medical School, Dept of Environmental and Community Medicine. (n.d.). Data cleaning 101. Retrieved from website: http://web.archive.org/web/20060212051538/http://www.ats.ucla.edu/stat/s... (archived link)
'Data cleaning 101' is referenced in:
Method