TEDS Data Dictionary

LLC NHS data linkage

Introduction

This page describes the sample and structure of the TEDS datasets made available to researchers within the LLC TRE, where the twins' TEDS data are linked to NHS medical records. It does not give a full explanation of the linkage procedure, which is described in more detail on the main TEDS website (https://www.teds.ac.uk/). The outline below describes how the linked twin sample has been established.

LLC is the UK Longitudinal Linkage Collaboration, led by the universities of Bristol and Edinburgh and involving a range of longitudinal research studies, including TEDS. Their website is https://ukllc.ac.uk/.

The TRE is the Trusted Research Environment managed by the LLC. Any researcher wishing to analyse the TEDS data linked to NHS medical records may only do so by applying for access to this highly secure research environment.

The linkage of TEDS twin data to twin medical records, within the LLC TRE, follows approval by the CAG: the Health Research Authority Confidentiality Advisory Group. Their website is https://www.hra.nhs.uk/about-us/committees-and-services/confidentiality-advisory-group/. The linkage has gone ahead with full ethical approval from the NHS.

TEDS has approval to carry out the linkage under Section 251 of the NHS Act 2006. Under Section 251, TEDS is authorised to link twin medical records after carrying out fair processing. The fair processing involves making contact with those twins for whom we have recent and reliable address and email details, to fully inform them of the planned linkage and to give them an opportunity to opt out. The fair processing materials sent to twins are provided on the TEDS website (linked above), including the TEDS Medical Record Linkage Information Sheet, and an animation describing the linkage.

Every twin who has been in contact with TEDS in recent studies (since age 21), and for whom we hold address and email details believed to be correct, has been sent the fair processing materials both by post and by email. These communications were followed by postal and email reminders. On each occasion, twins are invited to contact TEDS if they wish to opt out of the NHS linkage. This process is ongoing, and twins may opt out at any time.

The linkage sample

The sample of twins whose NHS data are linked within the LLC TRE has been established through fair processing as outlined above. The sample includes twins who meet the following criteria:

  • They participated recently in TEDS, so we are confident that their contact details are correct.
  • They have been sent the fair processing materials by post (with a postal reminder) and by email (with an email reminder).
  • They have not asked to opt out of the linkage.
  • They have not withdrawn from TEDS.

At the time of writing, the sample sent for linkage includes approximately 10,700 individual twins. Roughly 80% of them are paired twins and the remainder are unpaired twins.

The actual number of twins linked to NHS medical records within the LLC TRE may be smaller if some of the twins cannot be identified and linked. Twins who have opted out nationally, independently of TEDS, will also be removed during the linkage process. Once placed in the LLC TRE, all twin records are de-identified, and it is not possible for TEDS staff or researchers to determine which twins have or have not been linked successfully, nor to determine which twins have opted out nationally.

The linked twin sample will change gradually over time. Further twins will be sent the fair processing materials if and when recent contact is established, while other twins will decide to opt out of the linkage or to withdraw from TEDS. TEDS can send an updated sample for linkage every 3 months. This means that the linkage sample, and versions of the linked datasets, are likely to change at 3-month intervals.

Dataset details

Single entry

As described above, the sample of twins whose NHS data are linked, and whose TEDS data will be placed in the LLC TRE, is determined by the fair processing. Twins are removed from the dataset samples if (a) they were not contacted during fair processing, in other words they are not recent participants with known contact details; (b) they have opted out of linkage following fair processing; or (c) they have withdrawn from TEDS at any time.

The TEDS datasets uploaded to the LLC TRE will include only this sample of twins, which is somewhat smaller than the sample in conventional TEDS datasets. Twins not in the sample must have their data removed from the datasets placed in the LLC TRE. In many cases, only one twin from a given pair is in the sample, so only that twin's data may be included.

The data for ineligible twins cannot easily be removed from conventional TEDS double-entered datasets, because this would involve removal of twin data from some rows and cotwin data from other rows.

Ineligible twins are therefore removed more conveniently by using single-entered datasets. Each row of a TEDS single-entered dataset contains data for the twin, but not for the cotwin. The single-entered datasets uploaded to the LLC TRE therefore contain the twin variables, whose names end in "1", but not the cotwin variables, whose names end in "2".

Note that the linked NHS datasets within the LLC TRE will also be single-entered.

If double-entered datasets are required for twin modelling analysis within the TRE, then it will be necessary for the researcher first to merge the required datasets then convert from single-entry to double-entry. TEDS will be able to assist by providing sample scripts that will achieve this.
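
The conversion from single-entry to double-entry can be sketched as follows. This is only a minimal illustration in Python, not one of the TEDS sample scripts. It assumes that each row has been loaded as a dictionary, that rows for paired twins share the same randomfamid value, and that twin variables are exactly those whose names end in "1". The function name double_enter, the variable score1 and all the values shown are hypothetical.

```python
def double_enter(rows, family_key="randomfamid"):
    """Add cotwin variables (names ending in "2") to single-entered rows."""
    # Group rows by family so that each twin can find their cotwin.
    families = {}
    for row in rows:
        families.setdefault(row[family_key], []).append(row)

    result = []
    for row in rows:
        # The cotwin is the one other row in the same family, if present.
        others = [r for r in families[row[family_key]] if r is not row]
        cotwin = others[0] if len(others) == 1 else None
        new_row = dict(row)
        for name in row:
            if name.endswith("1"):
                # Copy the cotwin's "1" variable into this row's "2" variable;
                # unpaired twins get a missing value (None).
                cotwin_name = name[:-1] + "2"
                new_row[cotwin_name] = cotwin.get(name) if cotwin else None
        result.append(new_row)
    return result


# Hypothetical single-entered data: one complete pair and one unpaired twin.
single = [
    {"randomfamid": 101, "twin": 1, "score1": 12},
    {"randomfamid": 101, "twin": 2, "score1": 15},
    {"randomfamid": 102, "twin": 1, "score1": 9},
]
double = double_enter(single)
```

In this sketch an unpaired twin keeps their own row but has missing values in every cotwin variable, which reflects the fact that many pairs in the linkage sample are represented by only one twin.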

All data specifically relating to an ineligible twin will be removed from each single-entered TEDS dataset. This will include all data provided by the twins themselves, and data provided by any teacher of the twin in earlier studies. In data provided by the parent of the twin, data reported by the parent specifically describing that twin will be removed, for example the parent-reported SDQ measure. However, parent data describing the home environment, for example the home CHAOS measure, will be included in the dataset if it does not refer specifically to the given twin.

Dataset variables

Nearly all variables in the TEDS datasets uploaded to the LLC TRE will be familiar variables that are fully documented within this data dictionary.

As described above, the TEDS datasets in the LLC TRE will be single-entered. They will contain variables describing each included twin (variable names ending in "1") and per-family variables originating from parent questionnaires (variable names ending neither in "1" nor in "2"). However, they will not include variables describing the cotwin of each included twin (variable names ending in "2").

The datasets will additionally include some new variables that have been created or derived especially for use within LLC datasets. These are described in the following table.

Variable: STUDY_ID
  Examples: STUDY_ID
  Coding or values: Hashed string values
  Description: Unique, pseudonymous twin identifier used in all TEDS datasets in the LLC TRE, including the linked NHS datasets. See the scrambled IDs page for more information.

Variable: LLC age variables
  Examples: gpbLLCage (twin age when the 7 Year parent booklet was returned); zmhLLCage1 (twin age when the 26 Year MHQ was collected)
  Coding or values: Integer number of months
  Description: Twin age in months, for use in LLC datasets, replacing the usual TEDS age variables used elsewhere. See the background variables page for more information.

Variable: LLC date variables
  Examples: aLLCdate (year and month when the 1st Contact booklet was returned); pcwebLLCdate1 (year and month when the 16 Year twin web tests were started)
  Coding or values: String in format 'yyyy-mm'
  Description: Partial date, in the form of year and month, for use in LLC datasets. Such 'timestamp' variables are required by the LLC for the purpose of sequencing TEDS events (like questionnaire measures) relative to events in the NHS data (like diagnoses). See the background variables page for more information.

There are many LLC age and date variables, one for each of the main TEDS data collections. For derivation details, see the derived variables page of the relevant study.

In TEDS datasets uploaded to the LLC TRE, these special age variables (measured in integer months) replace the age variables more usually used in TEDS datasets used in other contexts (where age is measured in decimal years). The ages in months are designed to be compatible with the LLC date variables that provide the month and year of each data collection. The background variables page provides more details about these variables.
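
As an illustration of how the age and date variables might be combined, the sketch below estimates a twin's age in months at an NHS event from the LLC age and LLC date of a TEDS data collection. It is a hedged example rather than a documented procedure: the function names and values are hypothetical, and it assumes that the date of the NHS event has already been reduced to the same 'yyyy-mm' form (the format of dates in the NHS datasets is not described on this page).

```python
def month_index(yyyy_mm):
    """Convert a 'yyyy-mm' string to a running count of months."""
    year, month = yyyy_mm.split("-")
    return int(year) * 12 + (int(month) - 1)


def age_at_event(llc_age_months, llc_date, event_date):
    """Approximate age in months at an event, given the age and date
    recorded for a TEDS data collection."""
    offset = month_index(event_date) - month_index(llc_date)
    return llc_age_months + offset


# Hypothetical values: a twin aged 312 months when a questionnaire was
# collected in 2020-05, with an NHS event recorded in 2021-02.
age = age_at_event(312, "2020-05", "2021-02")

# A negative offset means the event preceded the TEDS data collection.
earlier_age = age_at_event(312, "2020-05", "2019-11")
```

Because both the age and the dates are rounded to whole months, the result should be treated as accurate only to within about a month.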

Shared datasets

Access to the TEDS datasets and linked NHS data within the LLC TRE will be subject to approval firstly by TEDS and secondly by the UK LLC, through application processes that are not described here. Successful application will further be subject to agreement and signing of contractual documents.

When these formalities have been completed, the researcher will be given access within the LLC TRE to three types of dataset:

  1. The NHS medical record dataset(s). These datasets will be documented elsewhere, not on this page.
  2. A generic TEDS dataset of background and demographic variables. The variables in this dataset are outlined below.
  3. A customised TEDS dataset for the project, as agreed through the data request process.

All three datasets will include the STUDY_ID variable, as mentioned above, a unique but de-identified twin identifier that will allow the datasets to be linked or merged together.

Each dataset uploaded to the LLC TRE is limited to a maximum of 1024 variables. If a project requires more variables than this in the customised TEDS dataset, then the dataset will be divided into two or more parts; each part will contain the STUDY_ID variable, and each will be uploaded for the researcher's use. The researcher will then be required to merge the parts together within the LLC TRE.
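
Merging the parts on STUDY_ID might look like the following minimal sketch. It assumes that each part has been loaded as a list of row dictionaries; the function name merge_parts, the variable names var_a and var_b, and the identifier values are hypothetical, and the tools actually available within the LLC TRE are not described here.

```python
def merge_parts(*parts, key="STUDY_ID"):
    """Merge dataset parts on a shared identifier, keeping every twin
    that appears in at least one part."""
    merged = {}
    for part in parts:
        for row in part:
            # Start a combined row for a new identifier, then add the
            # variables from this part.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical parts of a customised dataset, in different row orders.
part_a = [{"STUDY_ID": "a1f3", "var_a": 1}, {"STUDY_ID": "b7c2", "var_a": 2}]
part_b = [{"STUDY_ID": "b7c2", "var_b": 20}, {"STUDY_ID": "a1f3", "var_b": 10}]
combined = merge_parts(part_a, part_b)
```

Matching on STUDY_ID rather than on row position means the parts do not need to be sorted in the same order. The same approach would apply when merging the generic and customised TEDS datasets.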

The generic TEDS dataset of background variables will be shared with all TEDS researchers who are granted access. Most of the variables in this dataset are those described on the background variables page:

  • randomfamid and twin. These will enable researchers to identify paired twins with a common family identifier, and to double-enter the datasets if needed.
  • random, for filtering one twin per pair.
  • The twin sex and zygosity variables.
  • The standard exclusion variables.
  • The standard genotypic covariates, for use with any polygenic scores included in the customised dataset.

The generic dataset will include the following additional variables from the 1st Contact dataset (parent-reported) and from the 26 Year MHQ dataset (twin self-reported):

  • Ethnic origin: aethnic and zmhethnic1.
  • Parent SES: ases and its components amohqual, afahqual, amosoc, afasoc, amagechl.
  • Twin SES: zmhses1 and its components zmhhqual1, zmhempinc1, zmhecvul1.
  • Language spoken at home, at time of 1st contact: alang.

The generic dataset will also include the following special LLC variables, as described above on this page:

  • STUDY_ID
  • aLLCdate, the month and year when 1st Contact data were collected.
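
As a sketch of how the randomfamid and random variables in the generic dataset might be used, the example below first keeps only twins whose cotwin is also in the dataset, then selects one twin per pair. It assumes that random is coded 0 or 1 with opposite values for the two twins in a pair; this coding is an assumption here and should be checked against the background variables page. The function names and all values are hypothetical.

```python
from collections import Counter


def complete_pairs(rows, family_key="randomfamid"):
    """Keep only twins whose cotwin is also present in the dataset."""
    counts = Counter(row[family_key] for row in rows)
    return [row for row in rows if counts[row[family_key]] == 2]


def one_twin_per_pair(rows, selector="random", keep=1):
    """Select one twin per pair, assuming the selector variable takes
    opposite values for the two twins in each pair."""
    return [row for row in rows if row[selector] == keep]


# Hypothetical generic dataset: one complete pair and one unpaired twin.
rows = [
    {"STUDY_ID": "a1f3", "randomfamid": 101, "random": 1},
    {"STUDY_ID": "b7c2", "randomfamid": 101, "random": 0},
    {"STUDY_ID": "c9d4", "randomfamid": 102, "random": 0},
]
paired = complete_pairs(rows)
selected = one_twin_per_pair(paired)
```

Note that filtering on random alone, without first restricting to complete pairs, would also drop any unpaired twin whose value of random does not match the value kept.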