TEDS Data Dictionary

21 Year Study Data Files

Introduction

This page relates specifically to data collected in the 21 Year study. More general issues relating to the storage and organisation of TEDS data files are discussed on another page.

Raw Data files

These are currently stored in the \System\Rawdata\21yr\ folder, and the list below refers to files and sub-directories within this folder.

  • 21yr.accdb.
    This is the Access database file (2007 format) containing aggregated and cleaned TEDS21 questionnaire data that were collected by means of paper booklets (but not data collected electronically via the CMS or web backup). The database contains booklet data that were entered by two means: optical scanning and manual keying - data entered by these two methods have been aggregated together. It also contains administrative data relating to the data collection, for example dates when participants were contacted.
    This Access database is now treated as the master copy of the paper booklet and administrative data, and is the source of such data for the analysis dataset. The main data tables in the database are TwinPhase1Part1, TwinPhase1Part2 (containing data entered from the phase 1 twin paper booklets), TwinPhase2Part1, TwinPhase2Part2 (containing data entered from the phase 2 twin paper booklets) and ParentPhase1 (containing data entered from the phase 1 parent booklets). There are two admin data tables: TEDS21progress and GgameCovidProgress, containing administrative data relating respectively to the TEDS21 data collections and to the g-game and covid study data collections.
  • \Export\ subdirectory, containing raw data files that have been exported from the Access database. They are csv text files. They collectively contain all the TEDS21 paper questionnaire data, plus essential admin data such as return dates, that are needed for building the dataset (alongside the web data). The files in this subdirectory are:
    • TEDS21admin.csv (admin data from table TEDS21Progress in the Access database, including questionnaire return dates)
    • ParentPhase1.csv (TEDS21 phase 1 parent paper questionnaire data, exported from the Access database)
    • TwinPhase1Part1.csv, TwinPhase1Part2.csv (TEDS21 phase 1 twin paper questionnaire data, exported from the Access database)
    • TwinPhase2Part1.csv, TwinPhase2Part2.csv (TEDS21 phase 2 twin paper questionnaire data, exported from the Access database)
  • \web data files\ subdirectory, containing aggregated web and app questionnaire/test data files. There is one file for each of the main data collections at age 21+. In each file, identifying fields (such as names) have been removed. The files have been saved in csv text format. These files were aggregated, with some cleaning, from the raw analysis files that were originally downloaded from the web and app servers. For each TEDS21 data collection, web and app data have been aggregated together into a single file. The web data files are as follows.
    • 21yr_twin_phase1_TEDS21.csv (TEDS21 twin phase 1 app and web questionnaire data).
    • 21yr_twin_phase2_TEDS21.csv (TEDS21 twin phase 2 app and web questionnaire data).
    • 21yr_parent_TEDS21.csv (TEDS21 parent phase 1 app and web questionnaire data).
    • 21yr_ggame.csv (twin g-game study web test data).
    • 21yr_covid1.csv (twin covid phase 1 study web questionnaire data).
    • 21yr_covid2.csv (twin covid phase 2 study web questionnaire data).
    • 21yr_covid3.csv (twin covid phase 3 study web questionnaire data).
    • 21yr_covid4.csv (twin covid phase 4 study web questionnaire data).
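
As an illustration only, the file lists above can be turned into a quick completeness check before building the dataset. This is a minimal Python sketch and not part of the TEDS scripts: the function name missing_raw_files is hypothetical, and the root folder passed in is assumed to be wherever \System\Rawdata\21yr\ is located on your system.

```python
from pathlib import Path

# File names copied from the lists above, grouped by subdirectory.
EXPECTED_FILES = {
    "Export": [
        "TEDS21admin.csv",
        "ParentPhase1.csv",
        "TwinPhase1Part1.csv",
        "TwinPhase1Part2.csv",
        "TwinPhase2Part1.csv",
        "TwinPhase2Part2.csv",
    ],
    "web data files": [
        "21yr_twin_phase1_TEDS21.csv",
        "21yr_twin_phase2_TEDS21.csv",
        "21yr_parent_TEDS21.csv",
        "21yr_ggame.csv",
        "21yr_covid1.csv",
        "21yr_covid2.csv",
        "21yr_covid3.csv",
        "21yr_covid4.csv",
    ],
}


def missing_raw_files(raw_root):
    """Return 'subdir/name' for every expected csv file not found under raw_root.

    Hypothetical helper: raw_root is assumed to point at the 21yr raw data folder.
    """
    root = Path(raw_root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

Calling missing_raw_files with the path of the 21yr raw data folder returns an empty list when all fourteen csv files are present.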

Data flow

The 21 Year study involved a number of independent data collections, from different participants (parents and twins) and in distinct phases (phase 1, phase 2, g-game, covid phases). In each TEDS 21 data collection (but not in the g-game or covid studies), participants had a choice of methods of providing data: paper booklets, web (backup system) or app/web (CMS system). In addition, TEDS 21 paper booklets were entered by two methods: optical scanning and manual keying. As a result, there were originally multiple raw data files for each TEDS 21 data collection.

The TEDS21 data files have subsequently been aggregated as far as possible, in order to simplify the dataset construction process. Firstly, data from the paper booklets, whether optically scanned or entered manually, have been aggregated together within the Access database. Secondly, data from the web have been aggregated together with data from the app, so there is now a single web+app csv data file for each TEDS21 data collection (parent, twin phase 1, twin phase 2).

This process of aggregation has simplified the task of building the 21 Year dataset from the raw data. Further details are in the page describing the processing of 21 Year data, but the overall data flow is outlined here:

  1. TEDS21 app and web data. For each data collection (parent, twin phase 1, twin phase 2):
    1. At the end of data collection, there were separate raw data files of web and app data downloaded from the servers.
    2. Each raw data file was cleaned: direct identifiers were removed, and each variable was recoded to be consistent with the paper data coding.
    3. The cleaned app and web data files were aggregated together. Duplicates, in the form of twins who had completed the questionnaire twice, were eliminated.
    4. Where text responses were collected (e.g. for some medical conditions), these were numerically coded and the coded variables were added to the same file.
    5. A single, cleaned raw data file was saved. This is the csv file, named above and stored in the \web data files\ folder.
  2. TEDS21 paper data. For each data collection (parent, twin phase 1, twin phase 2):
    1. During data collection, some paper questionnaires were optically scanned into a set of raw text files. Other paper questionnaires were manually entered directly into the Access database.
    2. At the end of data collection, all the raw scanned files were aggregated, cleaned, and inserted into the Access database alongside the manually entered data. Cleaning involved correcting any coding differences and eliminating any duplicates.
    3. Where text responses were collected (e.g. for some medical conditions), these were numerically coded in exactly the same way as for the app/web data. The coded data fields were added to the database tables alongside other item variables.
    4. The cleaned and aggregated paper data are now stored in the tables of the Access database, as described above.
    5. Prior to dataset construction, data from the Access database tables are exported into csv text files in the \Export\ folder as listed above.
  3. G-game and covid study data. For each data collection (g-game and covid phases 1, 2, 3 and 4):
    1. At the end of data collection, raw data files were downloaded from the web server. There were in fact two raw files per data collection, containing summary data and detailed item data respectively.
    2. The two files were merged together and cleaned. Cleaning involved recoding of variables, removal of any direct identifiers, and incorporation of any coded data from recoded text items.
    3. A single, cleaned csv text file was saved in the \web data files\ folder. File names are listed above.
  4. Building the dataset. Each of the cleaned and aggregated files described above is imported into SPSS, where the files are merged together for further processing. This subsequent processing is described in more detail in the 21 Year processing page.
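
The two aggregation steps in the data flow above (combining app and web data with removal of duplicates, and merging the summary and item files for the g-game and covid studies) can be sketched as follows. This is an illustrative Python sketch only, not the procedure actually used. The field names twin_id, q1, status and item1 are hypothetical, and keeping the first record found for a duplicated participant is an assumption, since the rule used to choose between duplicates is not stated here.

```python
def aggregate_web_and_app(web_rows, app_rows, id_field="twin_id"):
    """Combine cleaned web and app rows into one list, one row per participant.

    Assumption: when a participant appears more than once, the first row
    encountered (web rows are read before app rows) is the one kept.
    """
    seen = set()
    combined = []
    for row in list(web_rows) + list(app_rows):
        key = row[id_field]
        if key in seen:
            continue  # duplicate completion, skip it
        seen.add(key)
        combined.append(row)
    return combined


def merge_summary_and_items(summary_rows, item_rows, id_field="twin_id"):
    """Join each summary row to its item-level row on the participant ID."""
    items_by_id = {row[id_field]: row for row in item_rows}
    merged = []
    for summary in summary_rows:
        items = items_by_id.get(summary[id_field], {})
        merged.append({**summary, **items})
    return merged


# Small made-up example (IDs and values are invented for illustration).
web = [{"twin_id": "A1", "q1": "2"}, {"twin_id": "B2", "q1": "1"}]
app = [{"twin_id": "B2", "q1": "3"}, {"twin_id": "C3", "q1": "0"}]
combined = aggregate_web_and_app(web, app)

summary = [{"twin_id": "A1", "status": "complete"}]
items = [{"twin_id": "A1", "item1": "4"}]
merged = merge_summary_and_items(summary, items)
```

In the example, participant B2 appears in both the web and app data, so only the web row is retained in combined.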

Dataset files

These files are currently stored in the \System\Datasets\21yr\ folder. The following list refers to items within this folder.

  • Udb9456_full.sav - the SPSS version of the full 21 Year dataset, including every variable
  • \working files\ - this subdirectory contains various intermediate files, saved during the process of converting the raw data into the dataset. These files include working datasets TEDS21parent, TEDS21twin, ggame_and_covid, u2merge, u3clean, u4derive, u5label, u6double (all .sav files), saved from scripts 1 to 6. The last of these files, u6double.sav, is identical (except for the name) to the full dataset mentioned above.
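
If it is useful to confirm that the final working file matches the full dataset, one option is to compare file checksums. The Python sketch below is an illustration under an assumption: it treats "identical" as meaning byte-for-byte identical, which may not hold if the two .sav files were saved separately. The function names are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if the two files have identical bytes (names are ignored)."""
    return file_digest(path_a) == file_digest(path_b)
```

A result of False does not necessarily mean the data differ, because files saved at different times may differ in embedded metadata; in that case the datasets would need to be compared within SPSS.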

Syntax files (scripts)

These files are currently stored in the \System\Scripts\21yr\ folder.
Note that these are SPSS syntax files. The names of the scripts are U1a_import_TEDS21, U1b_import_ggame_and_covid, U2_merge, U3_clean, U4_derive, U5_label, U6_double (all .sps files). The processing carried out by these scripts is described on another page.