Archive data for future use

By putting systems in place to store de-identified data so that it can be accessed later for verification or for further analysis and research, researchers can extend the reach of their data collection efforts and encourage future innovation and collaboration.

Data can be archived for future use in a number of ways. It can be uploaded onto a website, stored within an institutional repository, or deposited within a specialist data archive centre.

The advantages of depositing data with a data centre are that the data will be protected, regularly maintained and converted to new software formats as technology changes. The data centre will also manage any access restrictions on sensitive data and allow for easy public dissemination and access (Van den Eynden et al. 2011, p. 4).

Advice for using this method

When creating a data management plan for the archiving of your data, it’s important to consider the following:

  • Data documentation (Van den Eynden et al. 2011, p. 9):
    • Original project aims and objectives
    • Data collection methodology
    • Information on hardware and software used in collection and analysis
    • Structure of data files
    • Data quality and data cleaning processes
    • Access restrictions and confidentiality information
    • Variable names, labels and descriptions
    • Codes and classification schemes
    • Definitions of terminology
    • Missing values
    • Weighting and grossing variables
  • Converting data into file formats suitable for long-term, widely accessible storage
    • From Van den Eynden et al. (2011, p. 12):
    • Quantitative data with extensive metadata:
      • SPSS portable (.por)
      • Delimited text and command ('setup') file, containing metadata
      • Structured text or mark-up file (e.g. DDI XML file) containing metadata information
    • Quantitative data with minimal metadata:
      • Comma-separated values file (.csv)
      • Tab-delimited file (.tab), including delimited text of a given character set, with SQL data definition statements where necessary
    • Geospatial data
      • ESRI Shapefile
        • Necessary: .shp, .shx, .dbf
        • Optional: .prj, .sbx, .sbn
      • Geo-referenced TIFF (.tif, .tfw)
      • CAD data (.dwg)
      • Tabular GIS attribute data
    • Qualitative data
      • eXtensible Mark-up Language (XML) text with appropriate Document Type Definition (DTD) or schema (.xml)
      • Rich Text Format (.rtf)
      • Plain text data, ASCII (.txt)
    • Digital Image Data
      • TIFF version 6 uncompressed (.tif)
    • Digital Audio Data
      • Free Lossless Audio Codec (.flac)
      • Waveform Audio File Format (.wav)
    • Digital Video Data
      • MPEG-4 (.mp4)
      • Motion JPEG 2000 (.jp2)
    • Documentation
      • Rich Text Format (.rtf)
      • PDF/A or PDF (.pdf)
      • OpenDocument Text (.odt)
  • Organization of files and folders (Van den Eynden et al. 2011, pp. 13-14)
    • File and folder names should be short but meaningful
    • It’s useful to include some metadata in the file names, such as dates of creation and modification and file type
    • Folders could be organized by:
      • Data type
      • Research activity
      • Material
  • Data quality
  • Secure storage of data
    • Storage facilities should be safe from fire and flood
    • Digital storage media should be regularly checked and upgraded every two to five years
    • Controlled access should be put in place to avoid non-authorized use
  • Ethics and consent
    • Informed consent
      • Participants should be made aware of how their data will be stored, used and archived in the future, and what measures will be taken to protect their anonymity before they give their consent
    • Anonymizing data
      • It is easiest to do this during data collection and analysis, rather than at the end of the project
      • Keep an original copy of the data and a record of the changes made, stored separately from the edited files
      • Identifying information should be removed or aggregated
      • Generalize specific details
      • Replace named persons with pseudonyms
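
The anonymizing steps listed above (replacing names with pseudonyms, generalizing specific details and keeping a record of changes) could be sketched in Python roughly as follows. This is a minimal illustration and not a method taken from Van den Eynden et al.: the records, the field names, the village-to-region lookup and the ten-year age bands are all invented assumptions, and a real project would need to choose its own rules. The result is written out as comma-separated text, one of the formats listed for quantitative data with minimal metadata.

```python
import csv
import io

# Hypothetical interview records; every field name and value is invented for illustration.
RECORDS = [
    {"name": "Alice Smith", "age": 34, "village": "Northfield", "response": "Satisfied"},
    {"name": "Bob Jones", "age": 58, "village": "Eastbrook", "response": "Unsure"},
    {"name": "Alice Smith", "age": 34, "village": "Northfield", "response": "Dissatisfied"},
]

# Assumed lookup from village to a broader region, used to generalize location.
REGION_OF = {"Northfield": "North region", "Eastbrook": "East region"}


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records):
    """Return (anonymized records, pseudonym key, change log).

    The input records are not modified, so the original copy stays intact.
    The pseudonym key and the change log should be stored separately from
    the edited files.
    """
    pseudonyms = {}
    anonymized = []
    for record in records:
        name = record["name"]
        # The same person always receives the same pseudonym.
        if name not in pseudonyms:
            pseudonyms[name] = f"Participant {len(pseudonyms) + 1:03d}"
        anonymized.append({
            "participant": pseudonyms[name],
            "age_band": age_band(record["age"]),
            "region": REGION_OF.get(record["village"], "Other"),
            "response": record["response"],
        })
    change_log = [
        "name -> participant: replaced with pseudonym",
        "age -> age_band: generalized to 10-year bands",
        "village -> region: generalized to region",
    ]
    return anonymized, pseudonyms, change_log


def to_csv(rows):
    """Serialize rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


anonymized, key, log = anonymize(RECORDS)
print(to_csv(anonymized))
```

Because the pseudonym key maps real names to participant labels, it is itself identifying information and would need the same controlled access as the original data.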

Resources

Sources

Van den Eynden, V., Corti, L., Woollard, M., Bishop, L. and Horton, L. (2011). Managing and sharing data: Best practice for researchers. Essex: UK Data Archive, University of Essex.
