TEDS Data Dictionary

TEDS DNA and Genotyping Studies

Introduction

This page describes the collection of DNA samples from TEDS twins, together with a broad outline of the selection of twins for genotyping. This page does not describe the genotypic dataset. Note that DNA samples have been sought only from the twins, not from parents or from siblings.

The DNA sampling methods are described in detail below; sampling involved the collection of cheek swabs, and later saliva sample tubes, sent by post. There have been several phases of DNA collection, as described below. Generally, families (or individual twins) were prioritised for DNA collection if they had recently returned data to TEDS. This helped to ensure that active families were contacted, and that the genetic data obtained from the DNA could be related to recent phenotypic data. In the later phases, mailings were supplemented by telephone calls to seek consent and to issue reminders.

In the initial phases of DNA collection, E-Risk families were not contacted by TEDS because E-Risk had collected DNA samples during their visits, and an arrangement was in place for these DNA samples to be shared with TEDS. These initial samples eventually became depleted, and E-Risk families were included in the later TEDS re-sampling phases.

Collection of DNA in the 2007-09 phases was primarily for the WTCCC study, for genotyping on the Affymetrix platform. Collection of salivary DNA in the 2014-15 phase was primarily for OEE genotyping within the SGDP Centre, to supplement the sample of twins genotyped in the WTCCC study. Selection of twins for both these phases was complex, and is described further below.

Sampling method

The TEDS twin DNA samples have been collected using cheek swabs (until 2009) and then using saliva samples (from 2014). DNA collection in the earliest phases was administered by parents, and in later phases by twins themselves. In all cases, the samples were collected at home and returned by post to TEDS.

Cheek swabs

In all DNA collections until 2009, cheek swabs were collected from twin pairs (not from unpaired twins) with the explicit consent of the parent. Each family was sent a pack containing the following materials:

  • A letter inviting them to take part
  • A consent form
  • A sheet of instructions for sampling
  • An information sheet
  • Two sealed, labelled tubes containing cotton wool buds and a preserving fluid
  • A padded return envelope for the tubes and consent form

Over the years of this study, there have been many slightly different versions of the letter, consent form, instruction sheet and information sheet. A recent version combined the letter, consent form and instruction sheet (pdf) in one document; a recent version of the information sheet (pdf) is also reproduced here.

Each tube in the pack was pre-labelled with the ID and name of one of the twins, to ensure that the returned samples could be properly identified.

The initial pack mailing was followed up with one or more written reminders for families that had not returned samples promptly. In some phases of the study, families were also phoned to encourage them to return their samples. See collection phases below for further details.

The contents of each pack (consent form and samples) were logged in the TEDS admin database on return to the TEDS office by post. If the samples were returned and the consent form was completed, then the samples were passed to the SGDP lab for extraction. If the samples were returned but the consent form had not been completed, the family was contacted again to obtain consent; no samples were extracted unless written consent had been obtained.

As a reward for returning the DNA samples, families of same-sex twin pairs were offered DNA zygosity tests (via the consent form). Requests for zygosity tests were passed to the lab along with the returned samples.

Saliva samples

In the final phase of DNA collection, from 2014 to 2015, saliva samples were collected from individual twins rather than from pairs, with the explicit consent of the twins themselves (because the twins were now aged over 16). The aim here was to maximise the twin sample for OEE genotyping, which is described in more detail below. Each twin was sent a pack containing the following:

  • A letter with an invitation to take part (pdf)
  • A consent form (pdf)
  • A sheet of instructions for sampling (pdf)
  • An information sheet (pdf)
  • One proprietary salivary DNA pack including a tube, a funnel, a lid containing preserving fluid and printed instructions
  • A padded return envelope for the tube and consent form

The saliva tube in the pack was pre-labelled with the ID and name of the twin, to ensure that every returned sample could be properly identified, and to avoid confusion in cases where both twins were sent tubes at the same address.

The initial pack mailing was followed up with one or more written and email reminders for twins who had not returned samples promptly. In certain prioritised cases (335 twins) callers were allocated to phone the twins to remind them to return their samples.

The contents of each pack (consent form and saliva sample) were logged in the TEDS admin database on return to the TEDS office by post. Every sample returned with a completed consent form was passed directly to the SGDP lab for extraction. If a sample was returned but the consent form had not been completed, the twin was contacted again to obtain consent; no sample was extracted without written consent.

As a reward for returning the saliva sample, each twin was offered a £15 electronic flexecode voucher, which could be redeemed at a variety of online retail outlets.

Collection phases

There have been five main phases of DNA collection in TEDS, and these are summarised in the table below. In all phases, families were excluded if they had withdrawn from TEDS, if they had known address problems, or if they were medical exclusions.

| Phase | Years | Sample type | Selection criteria | Contact methods | Families contacted | Packs returned with consent | Refused | Packs returned but no consent | % returned with consent |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1998 to 2003 | Cheek swabs | Families who returned the 4 year booklets; not in E-Risk | Mail only (no phone calls). Up to 3 written reminders sent to families that did not respond | 7680 | 5089 | 1312 | 101 | 66.3% |
| 2 (new samples) | 2005 | Cheek swabs | Families who returned data at age 7 or later; no response (or not contacted) in phase 1; not in E-Risk | Families were initially phoned for verbal consent, before sending the pack. Up to 2 written reminders were sent to families who had given verbal consent but had not returned their packs | 1801 | 943 | 172 | 5 | 52.4% |
| 2 (re-sampling) | 2005 | Cheek swabs | Sample provided in phase 1; one or both twin samples severely depleted | As for new samples | 469 | 368 | 14 | 3 | 78.5% |
| 3 (new samples) | 2007 to 2008 | Cheek swabs | Families who returned data at age 12; no response (or not contacted) in phase 2, and no response or a refusal (or not contacted) in phase 1 | Families were initially mailed the pack; those who did not respond promptly to the initial mailing were contacted by phone for verbal consent. In addition, up to 2 written reminders were sent to families who had not returned their packs | 1234 | 436 | 410 | 2 | 35.3% |
| 3 (re-sampling) | 2007 to 2008 | Cheek swabs | Families who returned data at age 12; sample provided in phase 1 or phase 2; one or both twin samples severely depleted | As for new samples | 1258 | 990 | 58 | 3 | 78.7% |
| 4 | 2008 to 2009 | Cheek swabs | Re-sampling only (no new samples), specifically for WTCCC study: see below for selection criteria | Families were initially mailed the pack; they were phoned for verbal consent and reminders if they did not return their packs promptly | 872 | 492 | 55 | 2 | 56.4% |
| 5 | 2014 to 2015 | Saliva | Both new samples and re-sampling, specifically for OEE genotyping: see below for selection criteria | Mail only, with reminders, in most cases. Some prioritised twins were given phone reminders by callers | 7275 (individual twins) | 2211 | 507 | 9 | 30.4% |
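
The percentage column in the figures above appears to be the number of packs returned with consent divided by the number of families (or, in phase 5, individual twins) contacted. The short sketch below reproduces the published percentages under that assumption; the labels and the function name `consent_rate` are illustrative and are not part of any TEDS dataset.

```python
# Counts copied from the collection phases summary above:
# (phase label, number contacted, number of packs returned with consent)
phases = [
    ("1", 7680, 5089),
    ("2 new", 1801, 943),
    ("2 re-sampling", 469, 368),
    ("3 new", 1234, 436),
    ("3 re-sampling", 1258, 990),
    ("4", 872, 492),
    ("5", 7275, 2211),  # phase 5 counts individual twins, not families
]


def consent_rate(contacted: int, returned_with_consent: int) -> float:
    """Percentage returned with consent, rounded to one decimal place."""
    return round(100 * returned_with_consent / contacted, 1)


rates = {label: consent_rate(contacted, returned)
         for label, contacted, returned in phases}

for label, rate in rates.items():
    print(f"Phase {label}: {rate}%")
```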

Genotyping

The WTCCC study

Selected TEDS twin DNA samples were included in WTCCC-2 (the Wellcome Trust Case Control Consortium, phase 2) for a genetic study of reading and mathematics ability in 2009. For this study, TEDS submitted DNA samples, as well as phenotypic reading and maths data, to the Sanger Institute for a large sample of twins.

The genetic data extracted from the DNA samples were eventually returned to TEDS. The data from the WTCCC study have therefore enabled TEDS researchers to perform their own genome wide association studies (GWAS), and other genotypic analysis studies, relating to a wide range of different phenotypes.

Having provided the initial "discovery" sample to WTCCC, TEDS was also required to supply a replication sample to WTCCC. TEDS also prepared its own in-house replication sample. The selection criteria for these three twin samples are summarised in a table below.

For their discovery and replication samples, WTCCC specified certain minimum criteria for the mass, concentration and purity of each DNA sample. These formed part of the selection criteria when TEDS were preparing the twin samples. In the TEDS DNA collection study (see collection phases above), phase 3 was carried out prior to and in preparation for the WTCCC study, in order to maximise the number of twins available for selection; phase 4 was carried out during the preparation of the WTCCC discovery and replication samples, in order to re-sample DNA where existing samples had been found to fall below the thresholds set by WTCCC.

For all three twin samples (WTCCC discovery, WTCCC replication and in-house replication), the following exclusions were made:

  • Medical exclusions
  • Perinatal outliers
  • Ethnic origin not known to be white
  • English not known to be the language spoken at home
  • Unknown twin sex
  • Twin birth order records had been changed (leading to doubt over the identity of DNA samples)

Further exclusions and selection criteria are described in the table below.

| Twin sample | WTCCC discovery sample | WTCCC replication sample | In-house replication sample |
|---|---|---|---|
| DNA sample criteria | Concentration and mass to be measured by fluorimetry; mass > 5 micrograms; concentration > 50 nanograms per microlitre | Concentration and mass to be measured by fluorimetry; mass > 3 micrograms; concentration > 60 nanograms per microlitre | Concentration and mass to be measured either by fluorimetry or by spectrometry; mass > 1 microgram according to fluorimetry OR mass > 2 micrograms according to spectrometry |
| Phenotypic data criteria | Each selected twin was required to have at least one of the following: data from the maths web test at age 12; data from at least one of the three reading web tests at age 12 (PIAT, GOAL, Yes/No); data from both the maths web test and the PIAT reading web test at age 10. Where appropriate (e.g. when selecting the best twin within a twin pair), priority was given to twins having data from a greater number of the four relevant web tests at age 12 | As for the WTCCC discovery sample | Each selected twin was required to have at least one of the following: reading/maths web data as for the WTCCC twin samples; data from any other web tests at age 12; data from booklets (parent and/or twin and/or teacher) at age 12 |
| Twin criteria | Only one twin per pair. If both twins eligible then select the twin with more phenotypic data (more maths and reading web tests completed at age 12); if both twins have the same amount of data, select the twin having a larger mass of DNA | Not already selected for the WTCCC discovery sample; not an MZ cotwin of a twin selected for the WTCCC discovery sample; only one twin per pair (if both twins eligible then select as for the discovery sample) | This sample was allowed to overlap with the WTCCC replication sample but not with the WTCCC discovery sample: not already selected for the WTCCC discovery sample; not an MZ cotwin of a twin selected for the WTCCC discovery sample; both twins could be selected from a DZ pair (if neither already selected for the WTCCC discovery sample); only one twin could be selected from an MZ pair (if both twins eligible and one twin already selected for WTCCC replication, then select this twin; if neither twin selected for WTCCC replication, select the twin with the greater DNA sample mass) |
| Number of twins selected | 4440 | 2750 | 4923 |
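
As an illustration of how the discovery sample criteria above combine, the following sketch picks at most one twin per pair: it applies the DNA thresholds, then prefers the twin with more age 12 web tests, breaking ties on DNA mass. It is a minimal sketch only, not the procedure actually run by TEDS. The `Twin` class and its field names are hypothetical, and the general exclusions and phenotypic eligibility checks are assumed to have been applied beforehand.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Twin:
    # Hypothetical record layout, for illustration only.
    twin_id: str
    mass_ug: float           # DNA mass by fluorimetry, in micrograms
    conc_ng_per_ul: float    # DNA concentration by fluorimetry, in ng per microlitre
    web_tests_age12: int     # how many of the four relevant web tests were completed


def meets_discovery_dna_criteria(twin: Twin) -> bool:
    """Discovery sample thresholds: mass > 5 micrograms, concentration > 50 ng/ul."""
    return twin.mass_ug > 5 and twin.conc_ng_per_ul > 50


def select_for_discovery(pair: Sequence[Twin]) -> Optional[Twin]:
    """Return the single twin to select from a pair, or None if neither qualifies."""
    eligible = [t for t in pair if meets_discovery_dna_criteria(t)]
    if not eligible:
        return None
    # More phenotypic data wins; equal data falls back to the larger DNA mass.
    return max(eligible, key=lambda t: (t.web_tests_age12, t.mass_ug))


twin_a = Twin("A", mass_ug=6.0, conc_ng_per_ul=55.0, web_tests_age12=2)
twin_b = Twin("B", mass_ug=8.0, conc_ng_per_ul=70.0, web_tests_age12=2)
chosen = select_for_discovery([twin_a, twin_b])
print(chosen.twin_id if chosen else "no eligible twin")
```

The tuple key keeps the two-level tie-break in one place, so an additional criterion could be added as another tuple element.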

In order to maximise the size of each of these twin samples, TEDS staff went through several cycles of collecting DNA samples from families, extracting and quantifying DNA in the lab, reprecipitating DNA samples to increase the concentration, and re-assessing the best twin from each pair. The selection criteria (shown in the table above) evolved over time according to practical considerations as well as the requirements of the WTCCC. The selection of families for phases 3 and 4 of DNA collection (especially for re-sampling) was gradually refined accordingly.

As part of the WTCCC study, some twin pairs who had not completed the maths and reading web tests in wave 2 of the 12 year study were contacted for a follow-up wave of web data collection in 2008. This is described in more detail on the 12 Year Study page. Twins who participated in this wave of web testing were also re-sampled (if necessary) in phase 4 of the DNA collection.

The genotypic data were eventually returned to TEDS by the Sanger Institute, for a subset of the 4440 twins who had been included in the discovery sample. Some samples had failed quality control measures at Sanger, and after further quality control steps within TEDS, the final sample of genotypic data included 3152 individual twins. This sample is sometimes referred to as the "Affymetrix sample" because this is the platform on which the twins were genotyped. This Affymetrix sample of 3152 twins included only unrelated twins (one per pair), from both MZ and DZ pairs.

The Affymetrix sample of genotypic data was later supplemented by a sample on the OEE platform - see below.

The OEE study

The 'OEE study' had the broad aim of maximising the size of the genotypic data sample for TEDS twins, building on the existing Affymetrix sample. DNA samples were genotyped in the SGDP labs on the OEE platform. This genotyping used existing (cheek swab) DNA samples where available, supplemented by newly-collected (salivary) DNA samples. The collection of new salivary DNA samples, as described above, took place in 2014-15. The OEE genotyping took place in several waves in 2015-16.

Over the two years of the study, and across successive waves of genotyping, the criteria for selecting twin DNA samples were modified. The DNA sample criteria were adjusted by trial and error, as feedback from genotyping determined the minimum sample characteristics (concentration, mass, quality) that were likely to produce a successful result. The phenotypic criteria for selecting appropriate twins were gradually broadened: in the earliest waves, unrelated twins with recent data were prioritised; in later waves, twins with less recent data were also selected, along with DZ twin pairs. Rather than describing the selection criteria for each wave, the paragraphs below describe the final criteria that applied broadly to the entire sample.

Salivary DNA collection and the related genotyping took place in cycles, such that twins previously selected but with failed genotyping (for whatever reason) would be re-selected if a new DNA sample could be collected. Similarly, a twin previously rejected might be selected in a later wave after the selection criteria (phenotypic or DNA-related) had been relaxed. A similar approach was taken with the selection of a twin from a pair; if one twin had been prioritised but genotyping had failed, the other twin might subsequently be prioritised if suitable.

The OEE sample was designed to supplement the existing Affymetrix (WTCCC) sample. The 3152 twins already genotyped on Affymetrix, with satisfactory QC, were not genotyped again on OEE. The aim was to include the maximum number of unrelated twins from MZ pairs (one twin per pair), plus the maximum number of paired DZ twins, plus the maximum number of unpaired DZ twins where pairing proved infeasible. The only phenotypic data requirement was that 1st Contact data should be available, subject to certain exclusions.

Exclusions of twins from the sample were made as follows:

  • Individual twins already genotyped on Affymetrix (WTCCC)
  • No 1st Contact data available
  • Twin ethnic origin was non-white, or unknown
  • Medical exclusion (other than autism/ASD)
  • Perinatal outliers (as defined in the 1st Contact dataset)
  • Co-twin already genotyped (Affymetrix or OEE) in an MZ pair
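
The exclusion rules listed above can be expressed as a simple per-twin check. The sketch below is illustrative only: the `TwinRecord` class and all of its field names are hypothetical and do not correspond to variables in any TEDS dataset. It returns the list of reasons for exclusion, so an empty list means the twin passes this stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TwinRecord:
    # All field names are hypothetical, chosen for illustration only.
    genotyped_on_affymetrix: bool
    has_first_contact_data: bool
    ethnic_origin: Optional[str]     # "white", another value, or None if unknown
    medical_exclusion: bool
    exclusion_is_autism_asd: bool    # True if the medical exclusion is autism/ASD
    perinatal_outlier: bool
    zygosity: Optional[str]          # "MZ", "DZ", or None if unknown
    cotwin_genotyped: bool           # co-twin genotyped on Affymetrix or OEE


def exclusion_reasons(twin: TwinRecord) -> List[str]:
    """List every exclusion rule that applies to this twin."""
    reasons = []
    if twin.genotyped_on_affymetrix:
        reasons.append("already genotyped on Affymetrix")
    if not twin.has_first_contact_data:
        reasons.append("no 1st Contact data")
    if twin.ethnic_origin != "white":
        reasons.append("ethnic origin non-white or unknown")
    # Autism/ASD is the one medical exclusion that does not rule a twin out.
    if twin.medical_exclusion and not twin.exclusion_is_autism_asd:
        reasons.append("medical exclusion")
    if twin.perinatal_outlier:
        reasons.append("perinatal outlier")
    # A genotyped co-twin only matters within MZ pairs.
    if twin.zygosity == "MZ" and twin.cotwin_genotyped:
        reasons.append("MZ co-twin already genotyped")
    return reasons
```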

After exclusion, twins were selected according to the availability of a suitable DNA sample. The final minimum DNA criteria were as follows:

  • A new saliva sample was available (regardless of extracted DNA mass/volume/concentration)
  • If only a cheek swab DNA sample was available, then it should have a volume of at least 8 µl, a concentration of at least 30 ng/µl, and a mass of at least 240 ng.
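
The two minimum criteria above amount to a short check, sketched here. The function name, argument names and the "saliva"/"cheek" labels are assumptions made for illustration; the thresholds are those stated above. Mass is treated as a separately recorded value rather than being derived from volume and concentration.

```python
def meets_minimum_dna_criteria(sample_type: str,
                               volume_ul: float = 0.0,
                               conc_ng_per_ul: float = 0.0,
                               mass_ng: float = 0.0) -> bool:
    """Check a sample against the final minimum DNA criteria."""
    if sample_type == "saliva":
        # A new saliva sample qualifies regardless of mass, volume or concentration.
        return True
    # Cheek swab samples must meet all three thresholds.
    return volume_ul >= 8 and conc_ng_per_ul >= 30 and mass_ng >= 240


print(meets_minimum_dna_criteria("saliva"))
print(meets_minimum_dna_criteria("cheek", volume_ul=10, conc_ng_per_ul=35, mass_ng=350))
print(meets_minimum_dna_criteria("cheek", volume_ul=6, conc_ng_per_ul=60, mass_ng=360))
```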

For many twins (especially those with the older cheek swab DNA), multiple samples and dilutions were stored in the lab; a process of selecting the 'best' available sample was needed. Similarly, in the case of MZ pairs, only one twin could be chosen and this was done on the basis of the twin with the 'best' DNA sample. Samples were prioritised in the following order:

  1. Use a salivary DNA sample if available
  2. For an MZ pair with salivary DNA samples, select the one with the higher mass
  3. If no saliva sample is available, preferentially select a cheek swab sample with 'ideal' characteristics: a volume of at least 8 µl, a concentration of at least 50 ng/µl, and a mass of at least 400 ng (over a sample with minimal characteristics as described above)
  4. Prioritise certain named types of plates of cheek swab DNA, known to be newer and higher-quality (often used for the WTCCC study) over certain older and more obscure types of plate.
  5. Select the cheek swab sample with higher mass
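
The priority order above can be sketched as a sort key, so that the 'best' sample is simply the maximum. This is an illustration under stated assumptions rather than the lab's actual procedure: the `Sample` class, its field names and the `preferred_plate` flag (standing in for the named newer plate types) are hypothetical, and all candidate samples are assumed to have already met the minimum criteria.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Sample:
    # Hypothetical record layout, for illustration only.
    sample_id: str
    kind: str                      # "saliva" or "cheek"
    volume_ul: float
    conc_ng_per_ul: float
    mass_ng: float
    preferred_plate: bool = False  # stands in for the newer, higher-quality plate types


def is_ideal_cheek(s: Sample) -> bool:
    """'Ideal' cheek swab: volume >= 8 ul, concentration >= 50 ng/ul, mass >= 400 ng."""
    return (s.kind == "cheek" and s.volume_ul >= 8
            and s.conc_ng_per_ul >= 50 and s.mass_ng >= 400)


def priority_key(s: Sample) -> Tuple[bool, bool, bool, float]:
    # Compared left to right: saliva first, then 'ideal' cheek swabs,
    # then preferred plates (cheek swabs only), then higher mass.
    return (s.kind == "saliva",
            is_ideal_cheek(s),
            s.kind == "cheek" and s.preferred_plate,
            s.mass_ng)


def best_sample(samples: Sequence[Sample]) -> Sample:
    """Return the highest-priority sample from a non-empty collection."""
    return max(samples, key=priority_key)
```

For an MZ pair, passing the pooled samples of both twins to `best_sample` would identify the twin to select, since that choice was made on the basis of the 'best' DNA sample.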

The number of twins selected for genotyping is difficult to measure exactly. Some twins were selected more than once, because genotyping failed and was attempted again with a different sample. Some twins were initially selected but rejected prior to genotyping (for example, because the DNA sample could not be found or was found to be inadequate). A few twins were included by mistake then eliminated during QC checks, and so on. In all, in addition to the 3152 twins previously genotyped on Affymetrix (WTCCC study), around 4500 more unrelated twins and around 4000 DZ cotwins were selected for genotyping.

The genotypic sample

The TEDS twin genotypic data sample now includes both the Affymetrix and OEE data. The data from the two platforms were combined and subjected to common QC checks, after which the Affymetrix sample size dropped from 3152 to 3057 twins. The size of the combined sample can be described as follows:

  • 10346 individual twins:
    • 3057 genotyped on Affymetrix
    • 7289 genotyped on OEE
  • Counting pairwise:
    • 3320 DZ pairs in which both twins have been genotyped
    • 3706 pairs of any zygosity in which only one twin has been genotyped (2666 MZ, 1017 DZ, 23 unknown zygosity)
    • A total of 7026 pairs containing either one or two genotyped twins (hence it is possible to select 7026 unrelated genotyped twins)
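
The counts above are mutually consistent, as the following arithmetic check shows. The variable names are illustrative; the numbers are those listed above.

```python
# Individual twins by genotyping platform
affymetrix_twins = 3057
oee_twins = 7289

# Pairs in which both twins, or only one twin, have been genotyped
dz_pairs_both_genotyped = 3320
pairs_one_genotyped = {"MZ": 2666, "DZ": 1017, "unknown": 23}

total_individuals = affymetrix_twins + oee_twins
total_single_pairs = sum(pairs_one_genotyped.values())
total_pairs = dz_pairs_both_genotyped + total_single_pairs

# Each fully genotyped DZ pair contributes two twins; every other pair contributes one.
individuals_from_pairs = 2 * dz_pairs_both_genotyped + total_single_pairs

print(total_individuals, total_single_pairs, total_pairs, individuals_from_pairs)
```

Both routes to the number of individual twins give 10346, and the 7026 pairs correspond to the maximum number of unrelated genotyped twins.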

The availability of phenotypic data among the genotyped twins is highly variable. On the whole, those who were genotyped had of course provided DNA samples, which generally meant that they were responsive at least up to the 4 year study and/or in more recent studies. Of the 10346 individual genotyped twins, for example:

  • 10337 have 1st Contact data
  • 8200 have 4 year booklet data
  • 8263 have 7 year parent booklet data
  • 6935 have 12 year twin booklet data
  • 7390 have 16 year GCSE/exam questionnaire data

As usual, the availability of data from some other TEDS phenotypic studies is more limited because not all twin cohorts were included, or because data returns were lower.

Data sharing

Please refer to the TEDS privacy policy and the TEDS data access policy, on the main TEDS web site, for detailed statements describing our policies for sharing data including DNA and genotypic data.

The twin DNA samples are not available for external sharing. They will only be used for new research by TEDS researchers within KCL.

Similarly, the raw genotypic data are not available for sharing outside KCL. The data access policy describes circumstances under which the genotypic data may be analysed (within KCL) as a part of a collaborative research project.

Polygenic scores are variables that are derived from the raw genotypic data for TEDS twins. These scores are easily shared and, unlike the raw genotypic data, they do not carry any risk of making participants identifiable. See the polygenic scores page for further details.