TEDS Data Dictionary

TEDS Glossary

This page gives brief explanations for terms commonly used in the TEDS study, whose meanings in the context of TEDS may differ from their everyday meanings. These terms are all used on other pages in this data dictionary.

The terms are listed in alphabetical order below.

Definitions

This is an alphabetical list of terms with their definitions in the context of the TEDS study.

Address problem
This refers to a TEDS parent or twin whose recorded address is known to be incorrect. A participant becomes categorised as an address problem when TEDS mail addressed to them is returned undelivered. The address problem category is removed after successful tracing of the participant, i.e. when a new address is confirmed and recorded. While a participant is categorised as an address problem, they will be excluded from TEDS mailings.
Cohorts
In the context of the early TEDS booklet studies, 'cohort' typically referred to the birth year of the twins: 1994, 1995 or 1996. In the context of more recent TEDS studies, 'cohort' usually refers to school-year groupings: cohort 1 for twins born January - August 1994, cohort 2 for September 1994 - August 1995, cohort 3 for September 1995 - August 1996, and cohort 4 for September - December 1996. These cohort groupings were generally made for administrative reasons: from the 7 Year study onwards, twins' teachers were contacted one cohort at a time, during the school year in which the twins reached a given age. The cohort categories are otherwise arbitrary in many respects, although in some studies the data collection procedures changed from one cohort to the next. For further details, see the main TEDS study pages.
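Because the school-year cohorts are defined purely by date boundaries, the assignment can be expressed as a short function. The sketch below is illustrative only: the function name school_year_cohort is hypothetical and is not taken from any TEDS processing script; it simply encodes the date ranges listed above.

```python
from datetime import date


def school_year_cohort(birth: date) -> int:
    """Return the school-year cohort (1-4) for a TEDS twin birth date.

    Hypothetical helper encoding the glossary boundaries:
    cohort 1 = Jan-Aug 1994, cohort 2 = Sep 1994-Aug 1995,
    cohort 3 = Sep 1995-Aug 1996, cohort 4 = Sep-Dec 1996.
    """
    if not date(1994, 1, 1) <= birth <= date(1996, 12, 31):
        raise ValueError(f"birth date {birth} is outside the TEDS range 1994-1996")
    # School years run September to August, so each 1 September
    # starts a new cohort.
    if birth < date(1994, 9, 1):
        return 1
    if birth < date(1995, 9, 1):
        return 2
    if birth < date(1996, 9, 1):
        return 3
    return 4
```

For example, twins born on 14 March 1995 fall in cohort 2, because that date lies between 1 September 1994 and 31 August 1995.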
Data cleaning
This term refers to processes by which 'unclean' data are identified and corrected. Examples of unclean data include invalid or infeasible variable values; unidentifiable data, or data with incorrect subject IDs; duplicated data (recorded more than once for the same subject); and empty rows of data (e.g. from a blank booklet). Raw data can be cleaned at several stages before they are added to the analysis dataset, through quality control measures during data entry, validation of data items by the data entry software, and routine checking of all raw data after data entry. For full details, see the data cleaning page.
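The routine checks described above can be illustrated with a minimal sketch that flags the four kinds of unclean data listed. It assumes raw data held as a list of row dictionaries with an "id" key, a set of known subject IDs, and a table of permitted codes per variable; all of these names and structures are hypothetical and do not describe the actual TEDS cleaning procedures, which are documented on the data cleaning page.

```python
from collections import Counter


def is_empty(row):
    """True if every item except the ID is blank (e.g. a blank booklet)."""
    return all(value is None for key, value in row.items() if key != "id")


def find_unclean_rows(rows, valid_ids, allowed_codes):
    """Return (row_index, problem) pairs for a list of raw data rows.

    rows: list of dicts mapping variable name -> value (None = blank).
    valid_ids: set of subject IDs known to the admin records (hypothetical).
    allowed_codes: dict mapping variable name -> set of valid values.
    """
    problems = []
    # Count IDs among non-empty rows only, so that a blank booklet does
    # not make a genuine record look duplicated.
    id_counts = Counter(row.get("id") for row in rows if not is_empty(row))
    for i, row in enumerate(rows):
        if is_empty(row):
            problems.append((i, "empty row"))
            continue
        subject = row.get("id")
        if subject not in valid_ids:
            problems.append((i, "unidentifiable or incorrect ID"))
        elif id_counts[subject] > 1:
            problems.append((i, "duplicated ID"))
        for var, value in row.items():
            allowed = allowed_codes.get(var)
            if var != "id" and value is not None and allowed is not None and value not in allowed:
                problems.append((i, f"invalid value {value!r} for {var}"))
    return problems
```

A check like this only identifies problems; deciding how each flagged row should be corrected still requires reference to the original raw data.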
Data entry
The conversion of data in its rawest form (historically on paper) into coded electronic form. In TEDS studies, most data have been entered by one of two methods: (1) manual data entry; (2) optical scanning. TEDS data entry work has often been contracted out to commercial companies (notably NOP and Group Sigma), although some manual data entry has been done within TEDS by admin staff. See the data entry page for further details.
Double entered data
In the context of TEDS analysis and twin datasets, double entry refers to a way of structuring the dataset to enable between-twin comparisons during analysis. In double entered data, each twin's data appears alongside their co-twin's data in the same row so that, for example, correlations can be computed between paired twins. This means that each twin's data is duplicated, appearing twice in the dataset (once as twin and once as co-twin). To distinguish between the same data items for twin and co-twin, variable names by convention end in '1' for the twin and '2' for the co-twin. To distinguish between individual twins (as cases) in different rows of the dataset, each twin is given a unique twin ID; a twin birth order variable indicates the elder and younger twins of each pair (with values 1 and 2 respectively); and a family ID, unique to each twin pair, is shared between the two twins. Note that double entering of twin data does not imply that the data have to be entered twice (which would be error-prone); instead, the raw data are copied and re-combined in the required way. See the data processing page for descriptions of how this is done both for parent data and for twin data.
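The copy-and-recombine step can be sketched as follows, assuming one record per twin with hypothetical keys twin_id, family_id and birth_order plus one key per data item. Each pair produces two rows: in the first the elder twin's values carry the suffix '1' and the younger twin's the suffix '2', and in the second the roles are swapped. This is an illustration of the structure only, not the procedure described on the data processing page; in particular it assumes every family has exactly two twin records.

```python
def double_enter(twins, variables):
    """Build double-entered rows from one record per twin.

    twins: list of dicts with 'twin_id', 'family_id', 'birth_order'
    (1 = elder, 2 = younger) and one key per name in `variables`.
    Returns one row per twin: the twin's values get suffix '1' and the
    co-twin's values get suffix '2'.
    """
    by_family = {}
    for t in twins:
        by_family.setdefault(t["family_id"], []).append(t)

    rows = []
    for family_id, pair in by_family.items():
        if len(pair) != 2:
            raise ValueError(f"family {family_id} does not have exactly two twins")
        # Each twin appears once as 'twin' and once as 'co-twin'.
        for twin, cotwin in ((pair[0], pair[1]), (pair[1], pair[0])):
            row = {"family_id": family_id,
                   "twin_id": twin["twin_id"],
                   "birth_order": twin["birth_order"]}
            for var in variables:
                row[var + "1"] = twin[var]
                row[var + "2"] = cotwin[var]
            rows.append(row)
    return rows
```

With a pair whose scores are 10 (elder) and 14 (younger), the result has two rows: score1=10, score2=14 for the elder twin, and score1=14, score2=10 for the younger twin.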
Exclusions from analysis
See the exclusions page for more detail. In the context of the TEDS datasets, these are cases that are excluded from analysis, typically by filtering on the double entered variables exclude1/2. To exclude individual twins, select cases using the filter exclude1=0. To exclude whole twin pairs, select cases using the filter (exclude1=0 AND exclude2=0). These exclusion variables exclude all cases that fall into one or more of the following categories: (1) medical exclusions; (2) perinatal outliers; (3) unknown twin sex and/or zygosity; (4) absence of 1st Contact data. Medical exclusions apply to individual twins (hence the double entered variables), whereas the other three categories apply to twin pairs.
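The logic of the combined flags and the two filters can be sketched in Python, assuming each double-entered row is a dictionary. Only exclude1/2 and medexcl1/2 are TEDS variable names; the three pair-level inputs (perinatal_outlier, sex_zyg_unknown, no_first_contact) are hypothetical stand-ins for however those categories are coded in the real datasets.

```python
def exclusion_flags(medexcl1, medexcl2, perinatal_outlier,
                    sex_zyg_unknown, no_first_contact):
    """Return (exclude1, exclude2) for one double-entered row.

    Medical exclusion is twin-specific; the other three categories
    apply to the whole pair, so they set both flags.
    """
    pair_excluded = bool(perinatal_outlier or sex_zyg_unknown or no_first_contact)
    exclude1 = int(bool(medexcl1) or pair_excluded)
    exclude2 = int(bool(medexcl2) or pair_excluded)
    return exclude1, exclude2


def select_twins(rows):
    """Individual-twin analysis: keep rows where this twin is not excluded."""
    return [r for r in rows if r["exclude1"] == 0]


def select_pairs(rows):
    """Twin-pair analysis: keep rows where neither twin is excluded."""
    return [r for r in rows if r["exclude1"] == 0 and r["exclude2"] == 0]
```

Note that a row whose co-twin is a medical exclusion survives select_twins but not select_pairs, which is the reason the two filters differ.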
Exclusions from participation
The major TEDS studies have generally included as many parents and twins as possible. Nevertheless, parents and twins may be excluded (i.e. not invited) for a variety of reasons. Until twin age 18, inclusion or exclusion was applied to entire families; in more recent studies, it may be decided independently for the parent and each twin. Families, parents or twins are most commonly excluded for the following reasons: (a) withdrawn - always excluded; (b) inactive - always excluded since age 10; (c) address problems - always excluded from postal invitations; (d) phone problems - excluded from calling; (e) email problems - excluded from email invitations; (f) medical exclusion from studies - usually considered on a case-by-case basis, and generally excluded from reminders; (g) special cases - considered on a case-by-case basis, and generally excluded from reminders.
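The rules (a) to (g) can be summarised as a simplified decision sketch. The Participant fields and the invitation_plan function are hypothetical; real sample selection also involves case-by-case judgements that no code captures, so categories (f) and (g) are only flagged for review here rather than decided automatically.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical admin flags for one family, parent or twin."""
    withdrawn: bool = False
    inactive: bool = False
    address_problem: bool = False
    phone_problem: bool = False
    email_problem: bool = False
    medical_exclusion: bool = False
    special_case: bool = False


def invitation_plan(p: Participant) -> dict:
    """Simplified summary of which contact routes the glossary rules allow."""
    if p.withdrawn or p.inactive:
        # (a) and (b): never invited.
        return {"post": False, "phone": False, "email": False,
                "reminders": False, "needs_review": False}
    # (f) and (g): decided case by case, generally no reminders.
    case_by_case = p.medical_exclusion or p.special_case
    return {
        "post": not p.address_problem,   # (c)
        "phone": not p.phone_problem,    # (d)
        "email": not p.email_problem,    # (e)
        "reminders": not case_by_case,
        "needs_review": case_by_case,
    }
```

For instance, a twin with only an address problem could still be invited by email or phone, while a withdrawn twin is not contacted by any route.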
Fair processing
Fair processing is the procedure by which twins are contacted, by post and by email, to give them advance notice of the TEDS NHS medical record linkage that is being carried out as part of the LLC, within the LLC TRE. Fair processing gives every contacted twin an opportunity to opt out of the TEDS NHS linkage. Only twins who have been active recently in TEDS, and for whom TEDS holds contact details that are confidently thought to be correct, are included in the fair processing mailings; hence only these same twins, except for those who choose to opt out or withdraw, are included in the linked TEDS NHS datasets in the LLC TRE. Further details of TEDS NHS linkage and fair processing may be found on the main TEDS web site (www.teds.ac.uk).
Group Sigma
Group Sigma is a commercial company whose services were used extensively by TEDS during later TEDS studies (from the 7 Year study onwards). The main service provided by Group Sigma was optical scanning for data entry. For further details, see the main TEDS study pages.
Inactive families
These are TEDS families who were part of the initial ONS sample, but who have not returned data in any of the main TEDS studies. Such families have effectively been removed from the TEDS sample since the time of the 10 Year study: they have not been included in study samples, and have not been sent the annual TEDS newsletter. See also the 1st Contact page.
Manual data entry
The default method of data entry, in which data from its original raw source (typically paper, although it could be audio) is typed at a keyboard into a computerised system. The software used for manual data entry can be a simple worksheet (e.g. Excel or SPSS), with a column for each variable and a row for each case. Better still are customised systems (e.g. using Access or web forms) in which the layout of fields on screen is matched to the layout of items on the paper source, or to the order of items in an audio interview. The electronic data may be saved in a variety of formats, for example Excel or SPSS files, or Access database tables. See the data entry page for the pros and cons of this method.
LLC
The UK Longitudinal Linkage Collaboration, a consortium of longitudinal studies including TEDS. The primary purpose for which TEDS has joined the LLC is the secure linkage of NHS medical records for research analysis. Further details about the LLC may be found at their website, https://ukllc.ac.uk/. See also TRE and fair processing on this page, and the LLC page in this data dictionary.
Medical exclusions: analysis
See the exclusions page for more detail. Medical exclusion is a routine exclusion from analysis because of a severe medical condition. The medical exclusion category now applies to individual twins, not twin pairs, and is applied using double entered variables medexcl1/2 (1=exclude, 0=include). The medical exclusion category generally applies to medical conditions that are likely to interfere with participation in TEDS activities and/or are likely to be associated with mental impairment. Medical exclusions are now treated as age-independent, because most such medical conditions have been present at or soon after birth, or diagnosed in childhood; they therefore apply to longitudinal analysis. Most medical exclusions are conditions in the following categories: (1) severe ASD; (2) severe cerebral palsy; (3) chromosomal disorders; (4) inherited or single-gene disorders associated with mental impairment; (5) brain damage or disorders affecting brain function; (6) profound deafness or complete blindness; (7) global developmental delay. Other severe medical conditions have been considered on a case-by-case basis before being categorised as medical exclusions.
Medical exclusions: participation
Within the TEDS admin database, certain twins are categorised as potential exclusions from studies because of severe medical conditions. (Note that this category is distinct from the analysis medical exclusions.) During selection of a sample for a new TEDS study, these 'exclude from studies' twins are generally considered on a case-by-case basis, depending on the demands of the study. This category depends not only on available information about the severity of each twin's medical condition but also on the twin's recent history of participation. Twins who have been active in more recent studies are unlikely to be classified in this way, even if their conditions are severe.
NOP
NOP (National Opinion Poll), or specifically their data-processing division called NOP Numbers, is a commercial company whose services were used extensively by TEDS during the early studies (up to and including the 7 Year study). The main services provided by NOP were data entry during the early booklet studies, and telephone calling during the first cohort of the 7 Year study. For further details, see the main TEDS study pages.
ONS sample
The initial TEDS sample of 16810 families, recruited via the ONS (Office for National Statistics). ONS contacted the families of all twins born in England and Wales in the years 1994, 1995, 1996, and asked them whether they would be part of the TEDS study. For further details, see the 1st Contact Study page.
Opt out
A family (or individual parent or twin) may opt out of, or refuse to participate in, a given TEDS study, while keeping the option of participating in later studies. Opting out is therefore temporary, unlike withdrawing, which is permanent. Opted-out individuals are excluded from subsequent contacts, such as reminders, in the given study.
Optical scanning
Data entry method, in which pages are passed through a machine and responses detected by the reflection of light from pre-defined regions of the page. Has been used extensively in TEDS from the 7 Year study onwards (most scanning has been done by Group Sigma). Ideally suited to questionnaires in which categorical responses are collected by means of tick boxes. Can also be used to detect numbers or characters, using optical character recognition software, although this is less reliable. Optical scanning is generally only possible with suitably designed questionnaires (in which each response is recorded in a clearly defined area of the page). The scanning machine requires an initial set-up for a given design of questionnaire, in order to specify the areas of the page to scan and the appropriate electronic codes to be recorded. The electronic data are typically saved in plain text files, e.g. csv files with one row per case and commas between variables. See the data entry page for the pros and cons of this method.
Passively withdrawn
A family (or individual family member) may be judged to be passively withdrawn if they no longer participate in any way, but have not explicitly asked to withdraw from TEDS. Families become passively withdrawn not only by failing to respond to mailings, but also by failing to inform TEDS when they change address or telephone number. Passively withdrawn participants are difficult to classify, as they sometimes unexpectedly take part in new studies; they are therefore generally not excluded from the sample selections for new studies. See also inactive families.
Perinatal outliers
These are families in which one or both twins, or the mother of the twins, were subject to extreme adverse circumstances at or around the time of birth. Perinatal outliers are routinely treated as exclusions from analysis of the TEDS data. Five perinatal factors are taken into account when making such exclusions: (1) low birth weight; (2) short gestational age; (3) maternal drinking during pregnancy; (4) long period of special care after birth; (5) long stay in hospital after birth. For further details of the criteria used, see the description of the aperinat variable in the derived variables page of the 1st Contact dataset. All the perinatal criteria are based on responses in the 1st Contact booklet.
Phone problem
This refers to a TEDS family (or individual family member) whose recorded phone numbers are all known to be incorrect, or who have no recorded phone numbers at all. A participant becomes categorised as a phone problem when all the recorded phone numbers have been tried and found to be incorrect (at which point the numbers are generally deleted). The phone problem category is removed after successful tracing of the family, i.e. when a new phone number is confirmed and recorded. While a participant is categorised as a phone problem, they will be excluded from TEDS calling lists.
Random variable
In a double entered dataset, the data for each twin pair appear twice, in two separate rows (firstly for twin 1, with twin 2 as co-twin; secondly for twin 2, with twin 1 as co-twin). This duplication of data can cause errors in the results of twin pair comparisons such as correlations; to avoid such errors, it is necessary to filter the dataset first, selecting each twin pair exactly once. The 'random' variable is designed to be used for such filters. It has values 0 and 1, such that in each pair of twins, one twin has the value 0 and the other twin has the value 1. Filtering on either random=0 or random=1 therefore selects each twin pair exactly once, selecting elder twins from some pairs and younger twins from other pairs. The randomisation of elder and younger twins using this variable avoids any possible confounding effects of twin birth order. For further details of how 'random' is computed, see the data processing page.
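The idea behind the 'random' variable can be illustrated with a sketch that gives one twin in each family the value 0 and the other the value 1 by a coin flip, and then filters on random=0. This is an assumed method for illustration only (the function name assign_random and the use of Python's random module are not taken from TEDS); the way 'random' is actually computed is described on the data processing page.

```python
import random


def assign_random(rows, seed=None):
    """Add a 'random' value (0 or 1) to each double-entered row in place.

    Within each family, one twin gets 0 and the other gets 1, with the
    choice made by a coin flip so that neither birth order is favoured.
    """
    rng = random.Random(seed)
    by_family = {}
    for row in rows:
        by_family.setdefault(row["family_id"], []).append(row)
    for family_id, pair in by_family.items():
        if len(pair) != 2:
            raise ValueError(f"family {family_id} does not have exactly two rows")
        first = rng.randint(0, 1)  # randint includes both end points
        pair[0]["random"] = first
        pair[1]["random"] = 1 - first
    return rows


def one_row_per_pair(rows, value=0):
    """Select each twin pair exactly once by filtering on random=value."""
    return [r for r in rows if r["random"] == value]
```

Filtering on random=1 instead would select the other member of every pair, so either filter gives a dataset with each pair counted once.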
Special cases
In the TEDS admin database, a small number of families are categorised as special cases if they have unusual family circumstances that could affect participation in studies (the reason is also recorded in the form of text). Special cases may be excluded from new study samples, considered on a case-by-case basis. Special cases are also routinely excluded from reminders during TEDS studies.
Spin-Offs and Sub-studies
A sub-study generally means a study involving a relatively small, selected subset of the TEDS families, rather than the entire sample. This may be a specialised study administered within TEDS or a separately funded study administered outside TEDS. A spin-off study generally refers to the latter: it is a type of sub-study that is run by collaborating researchers, either within KCL (e.g. E-Risk, ECHO) or occasionally at another university (e.g. Eating, OSCCI). This distinction is not always clear-cut, and the two terms are sometimes used interchangeably. For example, some spin-offs have been partly or wholly administered by TEDS staff even though they were funded separately and had an external collaborating PI.
Tracing
This term refers to attempts to obtain correct contact details (address and/or phone, and sometimes email) for participants who have been categorised as address problems or phone problems. Typical tracing methods include re-trying any postal address, phone number or email address that has been recorded for the family; contacting any relatives or friends of the family whose details have been recorded; and liaising with sub-studies (such as E-Risk) that might have had contact with the family. Commercial tracing services, such as those based on the electoral register, have also sometimes been used.
TRE
Trusted Research Environment. A highly secure and regulated platform in which NHS medical records may be analysed, alongside linked TEDS datasets. The linkage service is carried out by NHS Digital (https://digital.nhs.uk/). At the time of writing, TEDS has obtained permission to use linked NHS datasets within the TRE set up for the LLC.
Wave
A wave refers to an administrative phase of data collection for a given study. This has a similar meaning to cohort, except that while 'cohort' refers to a grouping based on twin ages, 'wave' refers to a group of families contacted at a particular time. See for example the 8 Year study (where wave 1 comprised cohorts 1 and 2) and the 12 Year study (where wave 2 comprised cohorts 2, 3 and 4).
Withdrawn
These are families that have been removed from the TEDS study at their own request. When a family withdraws, their contact details are permanently deleted from the TEDS admin records, and the family is not contacted again. However, any data provided (with consent) by the family prior to withdrawal is retained, and will continue to be used as part of the TEDS dataset. From twin age 18 onwards, individual twins and parents may choose to withdraw independently of each other.