TEDS Data Dictionary

21 Year Study


Introduction

The "21 Year Study" is here used conveniently to include several data collections that took place over a period of roughly five years after twins reached the age of 21. The first and largest of these data collections is referred to as the "TEDS 21 Study".

The 21 Year Study data were collected in the following ways:

  • TEDS-21 twin questionnaires.
    Collected in two separate data-collection phases. Wide-ranging measures relating to emerging adulthood.
  • TEDS-21 Parent questionnaires.
    Collected in the first phase of TEDS 21 only. Parental SES and twin behaviour measures.
  • Twin "g-game".
    Cognitive test data collected using a gamified web test.
  • Covid-19 twin questionnaires.
    Collected in four phases in which the same questionnaire was repeated, during the 2020-21 covid-19 crisis. Wide-ranging measures designed to measure reactions to the crisis.

The first phase of TEDS 21 questionnaires included parents as well as twins. All TEDS cohorts were included. Data collection started in June 2017.

The second phase of TEDS 21 questionnaires included twins only, incorporating measures that could not be accommodated in the first phase due to length. All TEDS cohorts were included. Data collection started in February 2018.

In all phases of TEDS 21 questionnaire data collection, participants were offered a variety of means to return the data: via a smartphone app, via the web, or on paper.

The "g game", comprising a cognitive web test, included all cohorts of TEDS twins and involved all those who could be invited to take part by email. Data collection started in April 2020.

The covid-19 questionnaires, also administered exclusively via the web, involved the same twin sample as the g-game. The study involved several phases in which the same questionnaire (with minor variations) was repeated; the first phase started directly after the g-game in April 2020, roughly a month after the "lockdown" imposed in the UK for the covid-19 crisis. The second phase followed roughly two months later, with "lockdown" controls still in place. The third phase followed in October 2020, with some restrictions still in place but the national lockdown temporarily lifted. The fourth and final phase followed in March 2021, at the end of a further strict "lockdown" that had started in December 2020.

The measures used in the study are described in full in a separate page.

Summary table

This table summarises some essential features of the different data collections in the 21 Year Study.

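  • TEDS 21 phase 1
    Participants: parents and twins, all cohorts.
    Sample selection: all contactable families.
    Numbers invited: 10571 families.
    Data collection methods: mobile app, web or paper questionnaires; participants invited by email and post.
    Timing: June 2017 to February 2019.
    Approximate twin ages: 20.5 to 24 years.
  • TEDS 21 phase 2
    Participants: twins only, all cohorts.
    Sample selection: all twin pairs in which a family member had participated in phase 1 and/or at least one twin had email.
    Numbers invited: 8529 twin pairs.
    Data collection methods: mobile app, web or paper questionnaires; participants invited by email and post.
    Timing: February 2018 to February 2019.
    Approximate twin ages: 21 to 24 years.
  • G game
    Participants: twins only, all cohorts.
    Sample selection: all individual twins with email addresses (same sample as covid phase 1).
    Numbers invited: 7945 families (6673 twin pairs plus 1272 unpaired).
    Data collection methods: web tests; twins invited by email.
    Timing: April to May 2020.
    Approximate twin ages: 23 to 26 years.
  • Covid phase 1
    Participants: twins only, all cohorts.
    Sample selection: all individual twins with email addresses (same sample as the g-game).
    Numbers invited: 7945 families (6673 twin pairs plus 1272 unpaired).
    Data collection methods: web questionnaire; twins invited by email.
    Timing: April-May 2020.
    Approximate twin ages: 23 to 26 years.
  • Covid phases 2/3/4
    Participants: twins only, all cohorts.
    Sample selection: twins with email, in pairs where at least one twin had participated in the g-game or earlier covid phases.
    Numbers invited: 4343 families (4059 twin pairs plus 284 unpaired).
    Data collection methods: web questionnaire; twins invited by email.
    Timing: phase 2: June-July 2020; phase 3: October 2020; phase 4: March 2021.
    Approximate twin ages: 23 to 27 years.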

The samples

The sample selected for the first phase of the TEDS 21 study included all the TEDS sample (twins and parents) except for the following: (a) families and individuals withdrawn from TEDS; this included entire families in which one or both twins were withdrawn, even if the parent was not withdrawn; (b) individuals who were uncontactable because of address problems and the lack of viable email addresses. "Medical exclusions", as defined for previous TEDS studies, were generally included in initial invitations but not in reminders. In families where the parent had withdrawn but not the twins, the twins were included in the study. However, in families where either or both twins had withdrawn, the entire family was excluded because it was felt inappropriate to ask the remaining family member(s) questions about the withdrawn twin(s).

Family members were contacted independently, even if living at the same address, with email messages and/or postal invitations addressed separately to each twin and parent.

In TEDS 21 phase 1, 20986 twins and 10443 parents were invited to take part, from a total of 10571 different TEDS families. Hence, of the 13945 families in the original TEDS sample, around 3380 families were not contacted. Of these, roughly 1620 families had withdrawn (or, at least, one or both twins had withdrawn); 1750 were uncontactable; and fewer than 10 were excluded because of other special family circumstances.
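The family counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures quoted above; because the withdrawn and uncontactable counts are rounded, the residual is a rough consistency check rather than an exact count of special cases.

```python
# Figures quoted for TEDS 21 phase 1.
original_families = 13945
invited_families = 10571

# Families in the original sample that were not contacted.
not_contacted = original_families - invited_families  # 3374, "around 3380"

# Rounded breakdown given in the text.
withdrawn = 1620
uncontactable = 1750

# What is left over should be small ("fewer than 10" special cases),
# allowing for rounding in the two figures above.
residual = not_contacted - (withdrawn + uncontactable)

print(f"not contacted: {not_contacted}, residual: {residual}")
# prints: not contacted: 3374, residual: 4
```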

The phase 1 invited families included roughly 200 in which only one or two family members were contacted. These included around 20 families in which the parent had withdrawn but the twins remained in TEDS; however, the majority were cases where the parent or one or both twins were not contactable, because of address problems and absence of email addresses.

TEDS 21 phase 2 involved twins but not parents. The twin sample for phase 2 was the same as for phase 1 except that the following categories of twins were removed: (a) twins who had withdrawn during phase 1; (b) twins who had become uncontactable due to address and email problems during phase 1; (c) twins who had opted out of phase 1 for any reason that would also affect phase 2; (d) twin pairs in which neither twin had a viable email address, neither twin had participated in phase 1, their parent had not participated in phase 1, and neither twin had been genotyped.

As a result, in phase 2, 17128 individual twins were invited to take part, including 8529 twin pairs and 70 unpaired twins. Hence, around 1970 twin pairs who had been invited in phase 1 were not contacted, and nearly all of these were in category (d) above. The fact that these excluded twins had no email addresses and had not been genotyped was a sign of inactivity in the various twin studies that had taken place since age 16; and since they had not responded to the postal contact in phase 1, it was not thought worthwhile to contact them again by post in phase 2.
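Exclusion category (d) removes a pair only when every one of its four conditions holds, so any single sign of activity keeps the pair in the sample. The Python sketch below expresses that rule and reconciles the phase 2 counts. The `PairHistory` record and its field names are hypothetical illustrations, not TEDS variable names; categories (a) to (c), which apply to individual twins, are not modelled.

```python
from dataclasses import dataclass


@dataclass
class PairHistory:
    """Hypothetical per-pair summary; field names are illustrative, not TEDS variables."""
    any_twin_has_email: bool
    any_twin_in_phase1: bool
    parent_in_phase1: bool
    any_twin_genotyped: bool


def dropped_under_rule_d(pair: PairHistory) -> bool:
    """Category (d): drop the pair only if there is no sign of activity at all."""
    return not (
        pair.any_twin_has_email
        or pair.any_twin_in_phase1
        or pair.parent_in_phase1
        or pair.any_twin_genotyped
    )


# Reconciling the phase 2 counts quoted above.
phase1_families = 10571
phase2_pairs, phase2_unpaired = 8529, 70
phase2_twins = 2 * phase2_pairs + phase2_unpaired      # 17128
phase2_families = phase2_pairs + phase2_unpaired       # 8599
not_recontacted = phase1_families - phase2_families    # 1972, "around 1970"
```

For example, `dropped_under_rule_d(PairHistory(False, False, False, True))` returns `False`: a genotyped pair stays in the sample even with no email and no phase 1 response. The difference of 1972 also includes the few pairs removed under categories (a) to (c), which is why the text says "nearly all".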

At the start of TEDS 21 phase 1 data collection, twin ages ranged from roughly 20.5 to 23.5 years.

The g-game and covid-19 studies were both started in April 2020, soon after the "lockdown" movement restrictions imposed (in the UK and elsewhere) to slow down the spread of the covid-19 virus. Because these restrictions applied to TEDS staff as well as to twin participants, the sample for both studies was restricted to those individual twins who could be contacted by email. Both studies were administered exclusively via the web, without other options such as paper questionnaires. The following categories of twins were removed from the sample for the g-game and phase 1 of the covid study: (a) individual twins who had withdrawn; (b) individuals who were not contactable by email, either because there was no email address recorded in the TEDS admin system or because they had been logged as email problems. Generally, "medical exclusions" were included in these studies except in a small number of selected cases where the medical conditions were known to be very severe.

The number of individual twins invited to take part in the g-game and covid phase 1 studies was 14619, including 6674 twin pairs and 1271 unpaired twins. Hence, of the 13945 families in the original TEDS sample, around 6000 families were not contacted. Of these, roughly 1720 families had withdrawn; 4280 were uncontactable by email; and around 10 were excluded because of other special family circumstances. The 1271 families in which only one twin was contacted included 60 families in which one twin was withdrawn or excluded for other special circumstances; in the remainder, the co-twin simply did not have a viable email address recorded.
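The counts in this paragraph can be reconciled as follows. This Python sketch takes the original sample size of 13945 families from the TEDS 21 phase 1 description; with that figure, the number of families not contacted comes out at exactly 6000, matching the stated breakdown of withdrawn and email-uncontactable families.

```python
# G-game and covid phase 1 invitations, as quoted in the text.
twin_pairs = 6674
unpaired_twins = 1271

individual_twins = 2 * twin_pairs + unpaired_twins   # 14619
families_invited = twin_pairs + unpaired_twins       # 7945

# Original TEDS sample size, as stated for TEDS 21 phase 1.
original_families = 13945
families_not_contacted = original_families - families_invited   # 6000

# Rounded breakdown from the text: withdrawn + uncontactable by email
# (plus around 10 families excluded for special circumstances).
breakdown = 1720 + 4280                                          # 6000
```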

For phase 2 of the covid study, the sample of twins invited was reduced by removing twin pairs in which neither twin had participated in either the g-game or phase 1. This reduced the number of individual twins invited to 8394, including 4059 twin pairs and 276 unpaired twins. Hence, compared with phase 1 and the g-game, the number of families contacted was reduced by around 3610.
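The covid phase 2 selection rule and the resulting counts can be sketched in Python as below. The `InvitedTwin` record and the function name are hypothetical; the rule keeps a family if at least one invited twin took part in either the g-game or covid phase 1, and the arithmetic reproduces the 8394 twins and the reduction of around 3610 families quoted above.

```python
from dataclasses import dataclass


@dataclass
class InvitedTwin:
    """Hypothetical flags for one twin invited to the g-game/covid phase 1."""
    did_ggame: bool
    did_covid1: bool


def keep_for_covid_phase2(family: list[InvitedTwin]) -> bool:
    """Keep the family unless no invited twin took part in either study."""
    return any(t.did_ggame or t.did_covid1 for t in family)


# Reconciling the covid phase 2 counts quoted above.
pairs, unpaired = 4059, 276
phase2_twins = 2 * pairs + unpaired      # 8394
phase2_families = pairs + unpaired       # 4335
reduction = 7945 - phase2_families       # 3610 fewer families than phase 1
```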

For phases 3 and 4 of the covid study, the sample was the same as in phase 2, with only very minor adjustments. Twin pairs were added to the sample if they had recently made contact with TEDS for any reason; a few twins were removed if their email addresses were found to be incorrect.

At the start of the g-game and covid phase 1 studies, twin ages ranged from roughly 23 to 26 years. Twins were correspondingly older at the start of covid phases 2, 3 and 4 (roughly 2, 6 and 11 months later, respectively).

The data returns for the 21 Year studies are summarised in a separate page. There are further pages comparing samples and returns across different TEDS studies.

TEDS 21 data collection

Preparation and timing

Preparations for TEDS 21 started in late 2015. A commercial company called ETT was recruited to develop a customised app that would allow twins and their parents to complete TEDS activities on mobile devices (smartphones and tablets). The app was compatible with devices running the Android and iOS operating systems. ETT simultaneously developed a web version (for browsers on desktop and laptop computers), allowing participants to switch between app and web if they wished; all versions were linked to the same database. The app and the accompanying web presentation were designed to be used flexibly, incorporating a wide range of measures (especially questionnaires) for an indefinite number of future studies. These app and web versions are administered via a CMS (content management system), hence they are sometimes referred to as the "CMS version" of a given questionnaire.

The activities planned for TEDS 21 included questionnaires and cognitive tests, which were piloted during 2016 while the app was being developed. See the pilot studies page for further details. After piloting, a decision was made to split the data collection into two or more phases, with the first phase to include only questionnaire measures with a general theme of emerging adulthood. Further questionnaire measures, and the cognitive tests (g-game), were postponed until subsequent phases of data collection.

Once the phase 1 measures were finalised, a decision was made to implement additional versions of the questionnaires, in order to give participants more choice and to maximise data returns. Firstly, paper booklet versions were produced. Secondly, a "backup" web version (for parents and for twins) was implemented in-house by a TEDS researcher, Nic Shakeshaft.

Although ETT had already developed a web version, the backup web version had two advantages. Firstly, it was compatible with a wider range of browsers and devices, including older browser versions and mobile devices. Secondly, it was faster for participants to complete, presenting many items per page (the ETT web version presented only one item per screen) and without noticeable delays between pages (the ETT web version had a noticeable delay of around a second between items).

Each individual twin, and each parent, was allocated unique login details (username and password). These login details were valid in all electronic versions of the relevant questionnaire: the app version, ETT's web version, and the backup web version.

Phase 1 of TEDS 21, involving questionnaire measures for twins and their parents, started in June 2017. As a precaution against unexpected technical problems, invitations were initially sent in small waves of 300 families, gradually building to larger waves of thousands of families. Cohort 1 families were contacted first, during June 2017, in case any unexpected problems should necessitate a change of plans before contacting the other cohorts. No serious problems arose, and cohorts 2, 3 and 4 were invited during July 2017. Data collection for phase 1 continued until early 2019.

Phase 2 of TEDS 21, involving further questionnaire measures for twins (but not parents), started in February 2018. Email invitations were sent in waves of 500 twin pairs, over a period of several weeks. Over this same time period, twins included in the sample who did not have viable email addresses were sent postal invitations. Data collection for phase 2 continued until early 2019.

Implementation differences

As described above, each TEDS 21 questionnaire was implemented in several different ways:

  • Versions implemented by ETT, sometimes referred to as the "CMS version":
    • The app (for phones and tablets)
    • A web version running on the same infrastructure
  • The backup web version designed in-house, sometimes referred to as the "backup version"
  • Paper booklets

Compatibility of data between different implementations was essential, hence for a given data collection (parent phase 1, twin phase 1, twin phase 2) the measures had to be presented in equivalent ways. In particular, the wording of each question, and the wording of each response option, had to be identical. As far as possible, questions were presented in the same order and with the same instructions.

For phase 1, in chronological order, the ETT app and web versions were implemented first. These were followed by the backup web version, and finally the paper booklet version. Hence, the backup web version was designed to be as similar as possible to the app version of each questionnaire, and the paper version was designed to be as similar as possible to both app and web implementations. For phase 2, the backup web version was implemented first, because it had been found that the questionnaires were easier to modify and update in the backup system than in the ETT system. The phase 2 questionnaire was refined and updated in a series of prototypes in the backup web system before being copied relatively quickly into equivalent versions on paper and in the ETT system.

The ETT app and web versions were essentially identical in terms of wording, order of items, branching and other rules, and general appearance. However, there were some minor differences between these ETT versions, the backup web version and the paper version, as follows.

  • Number of questions visible.
    In the ETT app and web versions, only one question could be presented at a time. In the backup web version, questions were typically tabulated for each measure and each web page displayed a group of questions. This was similar to the presentation in the paper version.
  • Response selection.
    The paper version used conventional tick boxes. The backup web version used similar tick boxes or radio buttons, or drop-down lists where there were many response options. In the ETT app/web versions, instead of ticking a box, usually the respondent would tap or click on the chosen response then tap/click on a "next" button; or for some questions, the respondent would drag a slider to choose from a range of ordinal response options.
  • Reviewing responses.
    A paper booklet allows the respondent to review earlier responses and even to amend some responses. The backup web version allowed this in a more limited form, within a page of questions, but without the option of going back to previous pages. The ETT app/web version presented one question per screen, without the option of going back to previous questions, hence it did not allow a participant to review any previous responses.
  • Branching.
    Some multi-part questions were conditional, whereby the later parts only needed to be answered if a particular response was given to the initial part. In the app and web versions, this could be implemented using logical branching, so that follow-up questions were either hidden or disabled if they did not apply. The paper version naturally relied on written instructions to direct participants whether or not to answer follow-up questions.
  • Order of questions and measures.
    There were minor differences in the order in which some questions or measures were presented. The ETT app/web versions were designed first but were then difficult to modify; hence any late additions tended to be placed at the end of a given section. In the backup web version, these late additions could be moved to a more natural order in the questionnaire. In the paper version, the order was swapped for a few measures in order to aid page layout; and the CoTEDS measure was moved to the end to help separation for administrative data entry.
  • Optional responses.
    In the paper version, participants could arbitrarily choose not to answer any given question. In the electronic versions, most questions were compulsory (participants could not proceed without answering) unless explicitly programmed to be optional. This could be done in more than one way. The response options could be supplemented by a "prefer not to answer" option, which would appear in coded form in the raw data - this implementation was generally preferred in the backup web version, and was occasionally used in the ETT version. Alternatively, in the ETT version, a question could be made optional with the appearance of a separate "prefer not to answer" button on screen - if selected, this would appear as a missing value in the data.
  • Interval between items.
    The ETT web version imposed a noticeable time interval between successive items, of roughly one second. This, with the requirement to click on "next" in addition to selecting a response on each screen, made a significant difference to the completion time. In the ETT app version, the delay between items was only a fraction of a second and was not noticeable. In the backup web version, there was no noticeable delay between successive screens of questions.
  • Coding.
    The underlying coding of responses sometimes differed between versions - this was generally a programming choice for the electronic versions, and a design choice for data entry of the paper version. Coding differences did not alter the presentation for participants. The electronic versions collected time variables and device-related variables that were not applicable to the paper version.
  • Layout and instructions.
    The wording of instructions, typically given at the start of each measure or group of measures, differed according to the presentation. Page layout, fonts and so on were chosen for legibility and clarity according to the implementation. These were generally trivial differences that should not have had any effect on data collection.

Data collection: TEDS 21 phase 1

At the start of the study, TEDS had records of email addresses for just over 14000 twins (nearly 70% of the total) and over 5000 parents (nearly 50% of the total). These individuals were all sent email invitations. The remainder were sent postal invitations.

In the initial twin email invitation (pdf), each twin was given their login details and instructions for downloading the app. A version of the information sheet was appended at the end of the email. The aim, in the first instance, was to encourage all twins to use the app if possible. Details of the web and paper versions were not explicitly mentioned in the initial email invitation.

The initial parent email invitation (pdf) was similar, with the addition of the link to the backup web version, and an instruction to contact TEDS if a paper booklet was preferred. This approach was taken because it was thought that many parents might prefer to complete the questionnaire on the web or on paper rather than using an app. A version of the information sheet was appended at the end of the email.

The initial twin postal invitation included a printed information sheet in addition to a letter (at this stage, no paper booklets were sent, nor a paper consent form, because twins were being asked for online consent). The content of the letter was similar to the email invitation, except that the link to the backup web version was provided as an alternative to the app. These twins were harder to contact than those with email addresses, so it was decided to send them more information in the initial invitation.

The parent postal invitation included a letter incorporating a written consent form, a printed information sheet, a copy of the paper booklet (pdfs), and a freepost return envelope. The invitation therefore gave these parents all options (app, web, paper) for completing the questionnaire.

Parents and twins with email addresses were initially sent up to two email reminders if there was no response. For parents and twins for whom mobile phone numbers were recorded, a text message reminder was sent. In these reminders, participants were given both app details and the link to the backup web version, and were prompted to contact TEDS if they wanted to be sent a paper booklet instead.

After this initial round of email and text reminders, those who had originally been invited by email were sent a postal invitation/reminder. This took the same form as the postal invitations sent to those without email addresses, including a copy of the paper booklet for each parent but not, at this stage, for twins.

Later reminders became more focused on target groups of twins and parents: those who had started but not finished (in the web or app); families with partial but not complete data; and twins who had been active in recent previous studies. The main target was to increase the completion of data for twin pairs.

Targeted reminders included the use of email, phone and post, often with the promise of a forthcoming prize draw for participating twins. Postal reminders for prioritised sets of twins included paper booklets with a contact and consent form (pdfs). Twins who had finished were asked to remind their co-twins also to finish. Likewise, parents who had finished were asked to remind their twins to finish.

By the closing stages of phase 1, non-participating twins had been sent up to 6 email reminders, a mobile phone text reminder, and up to two written reminders; targeted groups of twins had also been sent paper booklets and had been contacted by callers.

Data collection: TEDS 21 phase 2

Phase 2 data collection was broadly similar to phase 1 but with some key differences. Firstly, parents were not included in phase 2. Secondly, the number of invited twin pairs was reduced in size by nearly 20%, with the removal of largely inactive twins (see sample description above). Thirdly, a higher proportion of twins could be invited by email in phase 2; this was partly because of the collection of new email addresses during phase 1, and partly because twins eliminated from the sample were largely those without email addresses. Fourthly, patterns of response (and participant feedback) during phase 1 led to changes in methods of contact, for example with the backup web and paper versions being introduced at earlier stages in phase 2.

At the start of phase 2, 17128 twins were invited to take part. 14221 of these twins had viable email addresses (83% of the total) and were invited by email; the remainder were invited by post.

In the initial twin email invitation, each twin was given their login details, instructions for downloading the app, and a link to the backup web version. Twins were also told that a paper version was available on request. A version of the information sheet (pdf) was appended at the end of the email.

The initial twin postal invitation, for twins without email addresses, included a copy of the paper booklet and a printed information sheet in addition to a letter with consent form (pdfs). The content of the letter was similar to the email invitation.

Reminders followed a similar pattern to those in phase 1 (with the omission of parent contacts). By the closing stages of phase 2, non-participating twins had been sent up to 5 email reminders, a mobile phone text reminder and up to two written reminders; targeted groups of twins had also been sent paper booklets and had been contacted by callers.

As phases 1 and 2 progressed in parallel during the latter half of 2018, twins who had completed phase 1 but not phase 2 (or vice versa) became increasingly the targets for reminder efforts such as booklet mailings and calling.

By the time of the TEDS 21 study, all twins had reached the age of at least 20 years and were adults. Twins were therefore asked individually to give consent. Parents were asked to complete consent forms for their own data submissions, but were not asked for consent for their twins to participate.

The app and web versions all included built-in electronic consent forms and requests to update contact details:

  • After logging in (app or web), and before starting the questionnaire measures, each participant was required to complete a consent form.
  • Linked to the consent form was a copy of the information sheet describing the study.
  • In order to consent, the participant was required to tick boxes to say that they had read the information sheet and agreed to their data being used in the ways described, then to click on a button to confirm.
  • After consent, each participant was asked to provide their email address. This was compulsory for twins (required for sending a reward voucher) but optional for parents.
  • Each participant was then asked for an optional phone number.
  • Each participant was then presented with the address currently held in the TEDS admin records, and was asked whether it was correct.
  • If the answer was no, the participant was asked to record their correct address.
  • Each twin was then asked whether the address given was also that of their parent(s).
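The sequence of screens listed above, and the two conditions stated as compulsory (consent, and an email address for twins), can be sketched in Python as below. The function names and screen labels are hypothetical; this is an illustration of the described flow, not the ETT or backup web implementation.

```python
from typing import Optional


def consent_screens(role: str, address_on_file_correct: bool) -> list[str]:
    """Ordered screens after login, following the list above (labels are descriptive only)."""
    screens = [
        "consent form (with linked information sheet)",
        "email address (compulsory)" if role == "twin" else "email address (optional)",
        "phone number (optional)",
        "confirm address held in TEDS admin records",
    ]
    if not address_on_file_correct:
        screens.append("record correct address")
    if role == "twin":
        screens.append("is this also your parent(s)' address?")
    return screens


def may_continue(role: str, ticked_read: bool, ticked_agree: bool,
                 clicked_confirm: bool, email: Optional[str]) -> bool:
    """Consent needs both boxes ticked plus confirmation; twins must also give an email."""
    consented = ticked_read and ticked_agree and clicked_confirm
    email_ok = role != "twin" or bool(email)
    return consented and email_ok
```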

Paper booklets were sent with an accompanying paper version of the consent and contact details form (phase 1 parent, phase 1 twin and phase 2 twin versions (pdfs)). Twins and parents were asked to return this form with the booklet in the freepost envelope provided.

Whichever version was used, the contact details were extracted and used to update the TEDS admin database for future contacts with each participant.

Part of the phase 1 questionnaire, for twins and for parents, asked for information about the twins' children or pregnancies. This information was gathered for the CoTEDS (Children of TEDS) study. Details were extracted from the questionnaires and passed to the CoTEDS team, who subsequently contacted families for consent to join the CoTEDS study and entered relevant details in the TEDS admin database.

Twin rewards

In TEDS 21, each participating twin was offered a £10 electronic Flexecode voucher on completion of the questionnaire, both in phase 1 and in phase 2 (hence each twin could earn up to £20 in total). However, in both phases, twins were asked to consider whether to decline some or all of this reward, and were asked to choose between three options: taking the £10 reward, taking only £5, or forfeiting the entire reward. In the electronic versions (app and web), this choice was offered immediately after completion of the questionnaire. For the paper version, the choice was offered as part of the consent form. Twins who failed to express a choice, but who nevertheless completed the questionnaire, were sent the £10 reward by default.
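The reward rule for a single phase can be expressed as a small function. In this Python sketch the names are hypothetical; it encodes the three choices (£10, £5 or nothing), the £10 default for a completing twin who made no choice, and no voucher for an incomplete questionnaire. Summing over the two phases gives the £20 maximum.

```python
from typing import Optional

ALLOWED_CHOICES = (10, 5, 0)   # pounds: full reward, half, or forfeit
DEFAULT_CHOICE = 10            # used when a completing twin expressed no choice


def voucher_for_phase(completed: bool, choice: Optional[int]) -> int:
    """Voucher value (GBP) owed to one twin for one phase."""
    if not completed:
        return 0
    if choice is None:
        return DEFAULT_CHOICE
    if choice not in ALLOWED_CHOICES:
        raise ValueError(f"unexpected reward choice: {choice}")
    return choice


# Example: completed phase 1 without choosing; completed phase 2 taking £5.
total = voucher_for_phase(True, None) + voucher_for_phase(True, 5)   # 15
```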

Twins were also offered entries in prize draws at intervals during the data collections of phase 1 and phase 2. The deadline for each prize draw was stated in relevant reminders, and in some cases also given by callers. In each prize draw, each twin was given a single entry for completion of the questionnaire; and each twin who had finished was given an entry in every prize draw that followed. In each draw there were two (equal) prizes. A single twin was selected at random as the first winner in each draw. For each twin drawn as a winner, the co-twin was given the same prize provided that s/he had also completed the battery. If the co-twin had not completed the battery, another singleton twin was randomly selected for the second prize.
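The prize draw procedure can be sketched in Python as below. The caller passes every twin who has finished by the draw's deadline, so twins who finished earlier automatically have an entry in each later draw. Twin IDs and the function name are hypothetical. The text does not say what happens if the second winner's co-twin has also finished; this sketch assumes the draw simply stops once both prizes are awarded, and it does not restrict the second draw to any subgroup.

```python
import random


def prize_draw(completed: set[str], co_twin: dict[str, str],
               rng: random.Random) -> list[str]:
    """Return the winners of one draw with two equal prizes.

    completed: IDs of twins who have finished by the draw deadline (one entry each).
    co_twin:   maps each twin ID to the co-twin's ID (IDs are hypothetical).
    Assumption: the draw stops once both prizes have been awarded.
    """
    entrants = sorted(completed)           # sorted so a seeded rng gives repeatable results
    if not entrants:
        return []
    first = rng.choice(entrants)
    partner = co_twin.get(first)
    if partner in completed:
        return [first, partner]            # co-twin also finished: the pair shares the prizes
    others = [t for t in entrants if t != first]
    if not others:
        return [first]
    return [first, rng.choice(others)]     # otherwise draw another twin for the second prize
```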

Parents were not offered rewards.

G-game and covid data collection

These two studies were administered in very similar ways. The g-game and covid phase 1 started almost at the same time in April 2020, involved the same sample of twins, similar patterns of contact (mainly email) and similar administration of activities via the web. Covid phase 2, phase 3 and phase 4 followed in June 2020, October 2020 and March 2021 respectively, with a reduced sample, but in other respects the methods were the same.

A web design and hosting company called Quodit, run by former TEDS researcher Nic Shakeshaft, was hired to implement both the g-game and the covid questionnaires on the web. For the g-game (but not the covid study), Quodit also added elements of gamification in order to improve its visual appeal: see the g-game description page.

For each study, including each phase of the covid study, each twin was assigned a unique login code (different codes were used for different studies). In the email invitation, twins were sent the web link and their respective login codes, and a copy of the study's information sheet. The text of the email invitations is in the g-game email invitation and the covid email invitation (pdfs).

The web implementation for each study included a built-in electronic consent form. After logging in on the web, and before starting the activity (g-game or covid questionnaire), each twin was required to complete this consent form. Screen shots of the login and consent screens can be seen in the covid login and consent page and the g-game description page. Linked to the consent form was the study's information sheet. In order to consent, the participant was required to tick boxes to say that they had read the information sheet and that they agreed to their data being used in the ways described, then to click on a button to continue with the activities. See the information sheets for further details: g-game and covid study phase 1, phase 2, phase 3 and phase 4 (pdfs).

The g-game refers to a cognitive test, comprising selected items from verbal and non-verbal tests and designed to deliver a measurement of 'g' (general cognitive ability). From the start, a "gamified" version was anticipated, hence "g-game". A version of the g-game was originally planned to be administered alongside the questionnaires in the TEDS 21 study. Candidate cognitive test measures were piloted, both with twins and with unrelated adults, in 2016; a version of the g-game was implemented in the app but was not felt to be ready in time for the start of TEDS 21 data collection. In 2020, an opportunity arose to administer a new version, and the 2016 pilot data were used to select a set of 40 items (20 verbal and 20 non-verbal) that could be answered by twins in around 15 minutes. For the benefit of twins taking part, the g-game study was named the PathFinder study. The g-game, including its cognitive test components and its visual elements, is described in more detail on separate pages.

Administration of the g-game was designed to be as quick and simple as possible. These plans were brought rapidly into focus by the covid-19 virus crisis and the consequent "lockdown" conditions, which coincided roughly with the intended start of the g-game study. It was therefore decided to invite twins only via email, and to implement the tests only via the web (without the app or paper alternatives used in TEDS 21). Invitations to twins to take part in the g-game were then accelerated by the planned start of the covid study directly afterwards.

Twins taking part in the g-game were not offered voucher rewards individually, but instead each participating twin was rewarded with entries into prize draws. There were two prize draws, each with over 150 prizes. The first prize draw took place roughly one month after the start of the study; the second prize draw took place 6 weeks later, immediately after the end of data collection. As in the TEDS 21 prize draws, if a twin was drawn as a prize-winner and the co-twin had also completed the activities, then both twins were given the same prize; otherwise, prizes were awarded individually to participating twins.
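The co-twin rule used in the prize draws can be illustrated with a minimal Python sketch. This is not the procedure TEDS actually used. The function name `award_prizes` and the `twin_pairs` mapping are hypothetical. The sketch also assumes that a pair of co-twins sharing the same prize uses up a single prize slot in the draw.

```python
import random


def award_prizes(participants, twin_pairs, n_prizes, seed=None):
    """Sketch of one prize draw with the co-twin rule (hypothetical helper).

    participants: set of twin IDs who completed the activities
    twin_pairs:   dict mapping each twin ID to its co-twin ID (assumed structure)
    n_prizes:     number of prizes available in this draw
    Returns a dict mapping prize number -> list of twin IDs receiving that prize.
    """
    rng = random.Random(seed)
    pool = sorted(participants)  # fixed order so a given seed is reproducible
    rng.shuffle(pool)
    awarded = {}
    already_won = set()
    for twin in pool:
        if len(awarded) >= n_prizes:
            break
        if twin in already_won:
            continue  # already rewarded as the co-twin of an earlier winner
        recipients = [twin]
        co_twin = twin_pairs.get(twin)
        if co_twin in participants:
            # Assumption: a pair sharing the same prize uses one prize slot.
            recipients.append(co_twin)
        already_won.update(recipients)
        awarded[len(awarded) + 1] = recipients
    return awarded
```

A twin whose co-twin did not take part is rewarded alone. Twins in a pair where both took part receive the same prize.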

For the g-game, a single email reminder was sent, before the prize draw, to twins who had already finished the covid study (which had started by this time) but who had not finished the g-game. When the first covid study email reminder was sent, it included a brief reminder to twins also to finish the g-game if they had not done so; no other g-game reminders were sent. Data collection ran for just over two months, from early April to mid-June 2020.
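The targeting rule for the single g-game reminder amounts to a set difference. The following is a minimal sketch. The helper name and its inputs, sets of twin IDs for those who finished each activity, are hypothetical.

```python
def ggame_reminder_targets(finished_covid, finished_ggame):
    """Twins who finished the covid questionnaire but not the g-game (sketch)."""
    return sorted(set(finished_covid) - set(finished_ggame))


# Example with made-up twin IDs:
# ggame_reminder_targets({"T1", "T2", "T3"}, {"T2"}) -> ["T1", "T3"]
```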

Plans for the covid study questionnaire were started as soon as the covid-19 lockdown began. The g-game study was then started immediately, while plans were rapidly put into effect for the covid study. The covid questionnaire included selected short measures repeated from TEDS 21, designed to measure changes in the twins that might have been related to the covid crisis. The questionnaire also included a set of questions relating to the crisis itself and its effects on twins' health, lifestyle, movement and so on. The measures are described in more detail elsewhere. The questionnaires used in phase 1, phase 2, phase 3 and phase 4 (pdfs) are all very similar; these documents describe the variable names and coding in addition to the text of the questions. In each new phase, a small number of questions were added, removed or modified, and these changes are documented in covid phase changes (pdf).

In each phase of the covid study, a single email reminder was sent to all twins who had been invited but who had not participated; this was sent roughly two weeks after the invitations. In phases 1 and 2, a text message reminder was also sent to twins for whom a mobile phone number was recorded. In each phase, data collection ran for approximately three weeks (four weeks in phase 1).
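The covid reminder schedule can be sketched in Python as follows. This is an illustration only. The `Twin` record and the `plan_reminders` function are hypothetical. The fixed 14-day delay is an assumed stand-in for "roughly two weeks" and does not describe the system actually used.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Twin:
    """Hypothetical per-phase record for one invited twin."""
    twin_id: str
    invited_on: date
    participated: bool
    mobile: Optional[str] = None  # None when no mobile number is recorded


def plan_reminders(twins: List[Twin], phase: int, delay_days: int = 14
                   ) -> Tuple[List[Tuple[str, date]], List[str]]:
    """Return (email reminders with send dates, text-message targets) for a phase.

    delay_days=14 is an assumed stand-in for "roughly two weeks".
    """
    non_responders = [t for t in twins if not t.participated]
    emails = [(t.twin_id, t.invited_on + timedelta(days=delay_days))
              for t in non_responders]
    # Text reminders were only used in phases 1 and 2.
    texts = [t.twin_id for t in non_responders if t.mobile] if phase in (1, 2) else []
    return emails, texts
```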

As in the g-game, participating twins in the covid study were not offered voucher rewards individually, but were offered entries into prize draws. The first prize draw took place at the end of phase 2; a prize draw entry was given to each twin who had completed phase 1 and additionally to each twin who had completed phase 2, so that a twin who had completed both phases had two entries in the draw and hence a greater chance of winning. The second prize draw took place at the end of phase 3; a prize draw entry was given to each twin who had completed phase 3, regardless of participation in phases 1 and 2. A similar prize draw took place at the end of phase 4. As in the g-game and TEDS 21 prize draws, the co-twins of winners were also rewarded if they had participated.
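The entry rules for the covid prize draws can be sketched in Python. The `DRAW_PHASES` table and both helper functions are hypothetical. The sketch also assumes that the draw at the end of phase 4 gave one entry per phase 4 completer, mirroring the phase 3 draw. The weighted pick shows why two entries double a twin's chance of being drawn.

```python
import random
from collections import Counter

# Phases whose completions earn entries in each covid prize draw.
# Draw 3 (end of phase 4) is assumed to follow the same rule as draw 2.
DRAW_PHASES = {1: (1, 2), 2: (3,), 3: (4,)}


def draw_entries(completed, draw):
    """Count prize-draw entries per twin.

    completed: dict mapping phase number -> set of twin IDs who completed it
    """
    entries = Counter()
    for phase in DRAW_PHASES[draw]:
        for twin in completed.get(phase, set()):
            entries[twin] += 1
    return dict(entries)


def pick_winner(entries, rng):
    """Pick one winner, weighting each twin by their number of entries."""
    pool = [twin for twin, n in sorted(entries.items()) for _ in range(n)]
    return rng.choice(pool)


completed = {1: {"T1", "T2"}, 2: {"T1", "T3"}, 3: {"T2"}}
print(draw_entries(completed, 1))  # T1 has 2 entries, T2 and T3 have 1 each
```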

Data entry

General data entry issues (for all studies including 21 Year) are described in a separate page.

Parents and twins who participated (TEDS 21, g-game, covid) via the app or the web effectively entered their data themselves. Their responses were submitted from their devices to a database running on the web/app server (in TEDS 21 this was a TEDS server based in KCL; in the g-game and covid studies this was the server administered by Quodit). The server was programmed to produce, when required, files containing all relevant data recorded in the respective activities. In TEDS 21, separate files were needed for the parent and twin questionnaires, and for the ETT (app/web) and backup (web) systems, hence a total of four sets of files was required in phase 1. In all studies, the raw data files were plain text files, with fields delimited either by the pipe symbol (ETT system) or by tabs (web/backup systems). The files have been retained for the purpose of building the analysis dataset. Further details are given in the 21 Year data files page.
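The Python sketch below reads a raw file with the delimiter appropriate to its source system. It rests on three assumptions. The first line of each file is assumed to hold field names, and fields are assumed never to be quoted; neither point is confirmed here. The system keys `ett`, `web` and `backup` are hypothetical labels.

```python
import csv
import io

# Field delimiter for each source system (hypothetical labels for the
# ETT app/web system and the web/backup systems described above).
DELIMITERS = {"ett": "|", "web": "\t", "backup": "\t"}


def read_raw_file(text, system):
    """Parse the text of one raw export into a list of dicts, one per row.

    Assumes the first line holds field names and that fields are not quoted.
    """
    reader = csv.DictReader(io.StringIO(text),
                            delimiter=DELIMITERS[system],
                            quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```

For a file on disk, the text could be read with `open(path, newline="")`. The file encoding is not documented here and would need to be checked.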

The raw data files downloaded from the server were also the source of important admin data. As described above, each TEDS 21 participant was asked for email, phone and address details; twins were asked for their reward preferences; and in phase 1, participants were asked for details of twins' children for the CoTEDS study. In the covid study, twins were again asked to give details of children for CoTEDS (but were not asked for updated email/phone/address details). All these details, where given, were used to update the TEDS admin database, generally by a process of copying and pasting but with a great deal of manual input for the purpose of data cleaning. For example, new contact details were compared carefully with existing details, and obvious typos and formatting anomalies were corrected.
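Only part of the contact-detail checking described above lends itself to automation. The Python sketch below normalises submitted email addresses and UK phone numbers and flags fields that differ from the existing record, leaving the final decision to manual review. All function names are hypothetical. Lower-casing the whole email address and the UK-only phone rule are simplifying assumptions.

```python
import re


def normalise_email(email):
    """Trim surrounding whitespace and lower-case an email address."""
    return email.strip().lower()


def normalise_uk_phone(phone):
    """Keep digits only and rewrite a +44 or 0044 prefix as a leading 0."""
    digits = re.sub(r"\D", "", phone)
    if digits.startswith("0044"):
        digits = "0" + digits[4:]
    elif digits.startswith("44"):
        digits = "0" + digits[2:]
    return digits


def contact_changes(existing, submitted):
    """Return the fields whose submitted value differs after normalisation."""
    rules = {"email": normalise_email, "phone": normalise_uk_phone}
    changes = {}
    for field, normalise in rules.items():
        new = submitted.get(field)
        if not new:
            continue  # nothing submitted, so the existing value is kept
        if normalise(new) != normalise(existing.get(field, "")):
            changes[field] = normalise(new)  # candidate update for manual review
    return changes
```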

TEDS 21 paper questionnaires were returned by post to the TEDS office, after which they were entered in one of two ways. Most parent booklets (phase 1 only) were entered by optical scanning: they were sent in large batches to Group Sigma, who had scanned TEDS booklets in many earlier studies. Each batch of booklets was scanned into a single raw data file. At the same time, images of the pages of the scanned booklets were created and returned to TEDS. Because the booklets were digitised by means of electronic images, there was no need to retain the paper copies after scanning; hence, once the data had been returned and checked, Group Sigma were asked to shred the paper copies. After data cleaning, the scanned parent booklet data were aggregated and stored in an Access database file. The original raw scanned data files have been retained for future reference.

Other TEDS 21 booklets were entered by manual keying; this included all twin paper booklets (phases 1 and 2) and those parent booklets that had been returned too late to be included in the larger batches for scanning. Manual data entry was carried out by TEDS staff in the TEDS office. These data were entered into the Access database, alongside the aggregated parent data from the scanned files. This ensured consistent coding and data quality in all data entered from the paper booklets. It also reduced the number of files that must be processed when building the analysis dataset.
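Bringing scanned batches and manually keyed booklets into one store can be sketched in Python, using an in-memory SQLite database as a stand-in for the Access file. The long (booklet, item, value) layout, the table name and the `source` column are all hypothetical. The primary key shows one way to stop the same booklet item from being entered twice.

```python
import sqlite3


def build_booklet_store(scanned_batches, keyed_rows):
    """Combine scanned and manually keyed booklet data in one table (sketch).

    scanned_batches: list of batches, each a list of (booklet_id, item, value)
    keyed_rows:      list of (booklet_id, item, value) keyed in by hand
    """
    con = sqlite3.connect(":memory:")  # in-memory stand-in for the Access file
    con.execute(
        "CREATE TABLE booklet_data ("
        " booklet_id TEXT, item TEXT, value TEXT, source TEXT,"
        " PRIMARY KEY (booklet_id, item))"  # rejects a booklet item entered twice
    )
    for batch in scanned_batches:
        con.executemany(
            "INSERT INTO booklet_data VALUES (?, ?, ?, 'scanned')", batch)
    con.executemany(
        "INSERT INTO booklet_data VALUES (?, ?, ?, 'keyed')", keyed_rows)
    con.commit()
    return con
```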

In conclusion, the raw data from the various parts of the 21 Year study have been stored in a set of files downloaded from web and app servers, and in an Access database containing the data from paper booklets. After data collection, all data have been subjected to data cleaning, both at the stage of data entry (for paper booklets) and at the stage of dataset construction. For a discussion of general data cleaning issues in the TEDS data, see the data cleaning page.