TEDS Data Dictionary

TEDS Variable Names

Introduction

This page describes some of the conventions of variable naming in the TEDS phenotypic datasets.

While attempts have been made to use systematic naming of TEDS variables, some of the naming conventions have evolved over time and differ between datasets.

In the earlier TEDS datasets, variable names were limited to a maximum length of 8 characters; this limitation was imposed by earlier versions of the software (SAS, SPSS) used to build and analyse the datasets. In later datasets, this limitation has been relaxed, although efforts have been made to avoid variable names longer than roughly 12 characters, so that syntax and scripts remain easy to write.

First letter prefix: TEDS study

One of the most consistent variable naming conventions, applying across all the main TEDS studies, is the use of the first letter to denote the TEDS study. The ordering 1 to 26 of the letters 'a' to 'z' in the English alphabet has been used to denote the twin age (in years) corresponding to each main TEDS study. Hence, in the 1st Contact dataset all variables start with 'a' because the study was planned to collect data when twins were aged roughly 1 year; in the 2 Year study, all variables start with 'b', and so on.

The correspondence of this first letter with actual twin age has become increasingly approximate in later studies. As described elsewhere, later studies have often involved multiple data collections at different times, and sometimes data have been collected from several TEDS cohorts at the same time, leading to a range of twin ages within each dataset. However, in all cases, the convenient study name corresponds with the variable name prefix, even if not very exactly with actual twin ages.

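| Variable name prefix | TEDS study | Approximate actual twin ages (years) |
|---|---|---|
| a | 1st Contact | 1.5 |
| b | 2 Year | 2 |
| c | 3 Year | 3 |
| d | 4 Year | 4 |
| e | In Home | 4 to 5 |
| g | 7 Year | 6.5 to 7.5 |
| h | 8 Year | 7 to 9 |
| i | 9 Year | 8.5 to 9.5 |
| j | 10 Year | 9.5 to 10.5 |
| l | 12 Year | 10.5 to 12.5 |
| n | 14 Year | 12 to 14 |
| p | 16 Year | 15 to 17 |
| r | 18 Year | 18 to 20 |
| u | 21 Year | 21 to 26 |
| z | 26 Year | 26 to 30 |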
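As a rough illustration of the first-letter rule, the nominal age is simply the letter's position in the alphabet. The sketch below is not part of the TEDS datasets or tooling: the names `STUDY_BY_PREFIX`, `nominal_age` and `study_for` are hypothetical, and the dictionary just copies the table above.

```python
from typing import Optional

# Study names keyed by first-letter prefix, copied from the table above.
STUDY_BY_PREFIX = {
    "a": "1st Contact", "b": "2 Year", "c": "3 Year", "d": "4 Year",
    "e": "In Home", "g": "7 Year", "h": "8 Year", "i": "9 Year",
    "j": "10 Year", "l": "12 Year", "n": "14 Year", "p": "16 Year",
    "r": "18 Year", "u": "21 Year", "z": "26 Year",
}


def nominal_age(prefix: str) -> int:
    """Return the alphabet position (a=1 ... z=26) of a study prefix letter."""
    if len(prefix) != 1 or not "a" <= prefix <= "z":
        raise ValueError(f"expected one lowercase letter, got {prefix!r}")
    return ord(prefix) - ord("a") + 1


def study_for(variable_name: str) -> Optional[str]:
    """Look up the study from a variable's first letter (None if no study uses it)."""
    return STUDY_BY_PREFIX.get(variable_name[:1])
```

For example, `nominal_age("g")` gives 7 and `study_for("gcg2c")` gives "7 Year". A match on the first letter is only suggestive: variables that do not originate from a specific study (see Exceptions) can begin with any letter.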

Additional prefixes: data collections

Later TEDS studies have often involved multiple data collections. Beginning with the 7 Year study, questionnaires were collected from twins' teachers as well as from parents and twins. Starting with the 9 Year study, some of the same measures were collected simultaneously from parents and from twins themselves. Starting with the 10 Year study, web data collections were used alongside paper booklet data collections. In the 16, 18 and 21 Year studies, there were multiple data collections from twins themselves, carried out independently and at different times.

To differentiate between variables from different data collections within the same main TEDS study, additional prefixes (directly after the first letter) have been used as shown in the table below. While these prefixes have not been used entirely systematically, they do at least serve to distinguish between similar or identical measures from different data collections within the same main study.

The most consistently used second letters for these prefixes are as follows:

  • p: parent data
  • c: twin (child) data
  • t: teacher data
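
| TEDS study | Variable name prefix | Data collection |
|---|---|---|
| In Home | ec | Twin tests |
| In Home | ep | Parent questionnaire |
| In Home | epv | Post-visit questionnaire |
| 7 Year | g | Parent questionnaire, twin phone interviews |
| 7 Year | gt | Teacher questionnaire |
| 9 Year | ip | Parent questionnaire |
| 9 Year | ic | Twin questionnaire |
| 9 Year | it | Teacher questionnaire |
| 10 Year | j | Twin web data |
| 10 Year | jpq | Parent web questionnaire |
| 10 Year | jt | Teacher questionnaire |
| 12 Year | lp | Parent questionnaire |
| 12 Year | lpnc | Parent reported NC levels |
| 12 Year | lt | Teacher questionnaire |
| 12 Year | lc | Twin questionnaire |
| 12 Year | l | Twin web and phone tests |
| 14 Year | np | Parent questionnaire |
| 14 Year | nsl | Parent SLQ |
| 14 Year | nt | Teacher questionnaire |
| 14 Year | nc | Twin questionnaire |
| 14 Year | n | Twin web tests |
| 16 Year | ppbh | Parent 'behaviour' questionnaire |
| 16 Year | ppl2 | Parent 'Leap-2' questionnaire |
| 16 Year | pp | Parent web questionnaire |
| 16 Year | pcex | Twin exam results |
| 16 Year | pcbh | Twin 'behaviour' questionnaire |
| 16 Year | pcl2 | Twin 'Leap-2' questionnaire |
| 16 Year | p | Twin web activities |
| 18 Year | rcq | Twin 18 year questionnaire |
| 18 Year | rcp | Twin Perception web study |
| 18 Year | rcb | Twin Bricks web study |
| 18 Year | rck | Twin Kings Challenge web study |
| 18 Year | rcn | Twin Navigation web study |
| 18 Year | rcf | Twin FFMP web study |
| 21 Year | u1p | Parent TEDS21 phase 1 questionnaire |
| 21 Year | u1c | Twin TEDS21 phase 1 questionnaire |
| 21 Year | u2c | Twin TEDS21 phase 2 questionnaire |
| 21 Year | ucg | Twin G-game web study |
| 21 Year | ucv1 | Twin Covid study phase 1 questionnaire |
| 21 Year | ucv2 | Twin Covid study phase 2 questionnaire |
| 21 Year | ucv3 | Twin Covid study phase 3 questionnaire |
| 21 Year | ucv4 | Twin Covid study phase 4 questionnaire |
| 26 Year | zmh | Twin TEDS26 mental health questionnaire (MHQ) |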
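Because some prefixes in the table are themselves prefixes of others (for example p, pp and ppbh), one plausible way to split a variable name is to take the longest table prefix that matches. The sketch below assumes that heuristic; it is not a documented TEDS rule, the names `COLLECTION_PREFIXES` and `split_collection_prefix` are hypothetical, and only part of the table is copied for brevity.

```python
from typing import Optional, Tuple

# A subset of the collection prefixes from the table above.
COLLECTION_PREFIXES = {
    "j": "Twin web data",
    "jpq": "Parent web questionnaire",
    "jt": "Teacher questionnaire",
    "l": "Twin web and phone tests",
    "lp": "Parent questionnaire",
    "lpnc": "Parent reported NC levels",
    "lt": "Teacher questionnaire",
    "lc": "Twin questionnaire",
    "p": "Twin web activities",
    "pp": "Parent web questionnaire",
    "ppbh": "Parent 'behaviour' questionnaire",
    "pcbh": "Twin 'behaviour' questionnaire",
    "u1c": "Twin TEDS21 phase 1 questionnaire",
    "u2c": "Twin TEDS21 phase 2 questionnaire",
}


def split_collection_prefix(name: str) -> Optional[Tuple[str, str]]:
    """Split `name` into (prefix, remainder) using the longest matching prefix.

    Returns None when no prefix in COLLECTION_PREFIXES starts the name.
    """
    matches = [p for p in COLLECTION_PREFIXES if name.startswith(p)]
    if not matches:
        return None
    prefix = max(matches, key=len)
    return prefix, name[len(prefix):]
```

With this subset, `split_collection_prefix("pcbhsdq20r")` returns `("pcbh", "sdq20r")` and `split_collection_prefix("jpc07s")` returns `("j", "pc07s")`. The heuristic can misattribute a variable: a twin web test whose measure abbreviation happens to begin with p, c or t would be matched to the parent, twin or teacher questionnaire prefix instead, so results should be checked against the study variable lists.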

Abbreviations for measures, items and scales

After the prefixes described above, in cases where the variables form a set of items from a named measure, the next parts of the variable name are typically an acronym or abbreviation of the measure name followed by an item number.

There are too many measures in the TEDS dataset to list their variable name abbreviations here. Some measures were included in many TEDS data collections, although for historical reasons the same measure does not always have the same variable name abbreviation in different datasets. For full details, see the study variable lists (links top left on this page) and other pages such as the questionnaires annotated with dataset variable names. Here are a few illustrative examples, showing the extended variable name prefix (study, measure, item):

  • dbh09: 4 Year study, Behaviour section, item 9 (the Behaviour section in fact included several measures with items randomly mixed)
  • gcg2c: 7 Year study, Conceptual Grouping measure (twin phone test), item 2, part c
  • jpc07s: 10 Year study, Picture Completion web test, item 7, score
  • ltaps19: 12 Year study, teacher questionnaire, APSD measure, item 19
  • pcbhsdq20r: 16 Year study, twin 'Behaviour' questionnaire, SDQ measure, item 20, reversed version
  • u2cvict05: 21 Year study, TEDS21 phase 2, twin questionnaire, Victimisation measure, item 5

The abbreviation denoting the measure typically has two, three or four letters as shown in these examples.

The item numbering reflects the ordering of items as presented to participants in the original questionnaire or test. Where there were fewer than 10 items, they are numbered 1-X with a single digit. Where there were 10 or more items, they are numbered 01-XX with two digits.
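The numbering rule can be sketched as a small name-building helper. This is an illustration only: the function name `item_variable_name` and its parameters are hypothetical, and the item counts used in the usage note are assumed rather than taken from the TEDS documentation.

```python
def item_variable_name(prefix: str, measure: str, item: int,
                       n_items: int, descriptor: str = "") -> str:
    """Build an item variable name: prefix + measure + item number + descriptor.

    Items are written with one digit when the measure has fewer than 10 items,
    and zero-padded to two digits when it has 10 or more. The source does not
    say how measures with 100 or more items are numbered; here the number is
    simply allowed to grow to three digits.
    """
    if not 1 <= item <= n_items:
        raise ValueError(f"item {item} is outside 1..{n_items}")
    width = 2 if n_items >= 10 else 1
    return f"{prefix}{measure}{item:0{width}d}{descriptor}"
```

Assuming a 5-item measure, `item_variable_name("g", "cg", 2, 5, "c")` gives "gcg2c"; assuming a 20-item measure, `item_variable_name("lt", "aps", 19, 20)` gives "ltaps19".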

As in some of the examples above, the item number may be followed by one or more additional letters as further descriptors. Examples are the use of 'r' to denote a reverse-coded version of an item (e.g. pcbhsdq20r); the use of letters (a, b, c, ...) to denote parts of a multi-part question (e.g. gcg2c); the use of various letters to denote different measurements for a test item (for example, 's' for score, 'a' for answer, 'rt' for response time), e.g. jpc07s.
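The measure, item number and trailing descriptor can be pulled apart with a regular expression, as in the following sketch. It assumes the study and data collection prefix has already been removed (some prefixes, such as u2c, contain digits) and that the name has no twin suffix; the names `ItemName` and `parse_item` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ItemName(NamedTuple):
    measure: str      # measure abbreviation, e.g. "sdq"
    item: int         # item number, e.g. 20
    descriptor: str   # trailing letters, e.g. "r", "c", "s" (may be empty)


# Letters (measure), then digits (item number), then optional letters (descriptor).
_ITEM_PATTERN = re.compile(r"([a-z]+)(\d+)([a-z]*)")


def parse_item(remainder: str) -> Optional[ItemName]:
    """Parse the part of an item variable name that follows the prefix.

    Returns None if the text does not have the measure/number/descriptor shape,
    which is expected for scale variables and for the exceptions to the rules.
    """
    match = _ITEM_PATTERN.fullmatch(remainder)
    if match is None:
        return None
    return ItemName(match.group(1), int(match.group(2)), match.group(3))
```

For instance, `parse_item("sdq20r")` yields measure "sdq", item 20 and descriptor "r", while `parse_item("gktot")` yields None because there is no item number. If a twin suffix were left on the name, it would be read as part of the item number (or as a separate unmatched digit), so it should be stripped first.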

Many derived variables are computed as means or totals from the items of a measure. These derived variables are usually referred to as scales (total or mean of all items in a measure), subscales (total or mean of a subset of items), total scores (total of test item scores) and composites (typically derived from more than one measure). Where the variable is derived from items of a single measure, usually the variable name includes the same abbreviation of the measure as used in the items; the item number is then usually replaced by one of these suffixes:

  • t: denotes a 'total' of measure items (although often computed as a re-scaled mean)
  • tot: sometimes used instead of t for test total scores
  • m: denotes a simple mean of measure items
  • xxt or xxm: a subscale, where 'xx' is replaced by two or three letters forming an abbreviation of the subscale name

Here are some illustrative examples of scale/subscale/score variable names:

  • lgktot: 12 Year study, General Knowledge twin web test, total score
  • ppbhconnt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score
  • ppbhconnimpt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure, subscale for impulsivity
  • u1cpilm: 21 Year study, TEDS21 phase 1, twin questionnaire, Purpose In Life measure, mean score
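The suffix conventions for scales and subscales can be summarised in a short helper. This is a sketch under the conventions listed above, not a TEDS utility; the function name `scale_variable_name` and its parameters are hypothetical, and real variable names may deviate from the pattern.

```python
def scale_variable_name(prefix: str, measure: str, kind: str,
                        subscale: str = "") -> str:
    """Build a derived variable name: prefix + measure + [subscale] + kind.

    kind is "t" (total, often a re-scaled mean), "tot" (test total score)
    or "m" (simple mean). A subscale abbreviation, if given, goes before
    the kind; the source only describes subscales ending in t or m.
    """
    if kind not in {"t", "tot", "m"}:
        raise ValueError(f"unknown scale suffix {kind!r}")
    if subscale and kind == "tot":
        raise ValueError("subscale names are only described with 't' or 'm'")
    return f"{prefix}{measure}{subscale}{kind}"
```

The examples above can be reproduced this way: `scale_variable_name("ppbh", "conn", "t")` gives "ppbhconnt", and adding `subscale="imp"` gives "ppbhconnimpt".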

Suffix for twin variables

Twin-specific phenotypic data in the TEDS datasets always comprise 'double entered' variables (see glossary for further explanation). This means that any variable referring to a specific twin, including items and derived variables, and including data collected from parents and teachers as well as from twins themselves, is effectively duplicated within the dataset: the same variable value is shown once for the twin in question, and again as a co-twin variable for the other twin of the same pair.

Each twin variable therefore effectively appears as a pair of variables, distinguished by the variable name suffix: 1 or 2. A variable name ending in 1 contains data for the twin identified in any given row of data in the dataset; the same variable but with name ending in 2 contains data, from the same source, for the co-twin.
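To make the double-entry layout concrete, the sketch below builds the two rows for one twin pair from each twin's own values. It is only an illustration: the function name `double_enter`, the `twin_id` column, the twin identifiers and the values in the usage note are all made up.

```python
from typing import Dict, List


def double_enter(pair: Dict[str, Dict[str, float]]) -> List[dict]:
    """Return one row per twin, holding own values (suffix 1) and co-twin values (suffix 2).

    `pair` maps each of exactly two twin identifiers to that twin's own
    values, keyed by variable name without the 1/2 suffix.
    """
    if len(pair) != 2:
        raise ValueError("a twin pair must contain exactly two twins")
    (id_a, data_a), (id_b, data_b) = pair.items()
    rows = []
    for twin_id, own, co in ((id_a, data_a, data_b), (id_b, data_b, data_a)):
        row = {"twin_id": twin_id}  # hypothetical identifier column
        for base, value in own.items():
            row[base + "1"] = value  # data for the twin in this row
        for base, value in co.items():
            row[base + "2"] = value  # same variable, for the co-twin
        rows.append(row)
    return rows
```

Calling `double_enter({"A": {"u1cpilm": 3.5}, "B": {"u1cpilm": 4.0}})` produces a row for twin A with u1cpilm1 = 3.5 and u1cpilm2 = 4.0, and a row for twin B with the two values swapped.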

For this reason, a twin variable in a dataset may often be referenced as a pair of variables with '1/2' written at the end, e.g. u1cpilm1/2.

Here are some illustrative examples, using some of the same variable name prefixes that were illustrated above.

| Variable description | Twin variable name | Co-twin variable name | Twin pair variables |
|---|---|---|---|
| 12 Year study, teacher questionnaire, APSD measure, item 19 | ltaps191 | ltaps192 | ltaps191/2 |
| 16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score | ppbhconnt1 | ppbhconnt2 | ppbhconnt1/2 |
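When working from documentation that uses the '1/2' shorthand, it can help to expand it into the two actual variable names. A minimal sketch, with the hypothetical function name `expand_pair_notation`:

```python
from typing import Tuple


def expand_pair_notation(text: str) -> Tuple[str, str]:
    """Expand shorthand such as 'ltaps191/2' into (twin name, co-twin name)."""
    if not text.endswith("1/2"):
        raise ValueError(f"{text!r} does not end in '1/2'")
    base = text[:-3]  # drop the trailing '1/2'
    return base + "1", base + "2"
```

For example, `expand_pair_notation("ltaps191/2")` returns `("ltaps191", "ltaps192")`. Going the other way is less safe: a name ending in 1 or 2 is not necessarily a twin variable, because an unsuffixed item name can also end in a digit.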

Wherever possible, variable name suffixes '1' and '2' have been avoided for variables that apply to the parent, the family or the twins as a pair rather than individually.

Exceptions

The TEDS dataset contains many thousands of variables, from data collections going back over many years. Some variables were named historically without thought of consistent and systematic naming across future data collections. Other variables simply do not fit into the patterns described above. There are therefore many exceptions to the variable naming conventions described above. This section briefly describes a few of the many types of exceptions to the variable naming rules.

Variables that do not originate from a specific TEDS study do not have the prefix a, b, c, etc. Examples of these are the background variables, ID variables, and non-phenotypic variables such as polygenic scores.

Item variable names may not follow the naming convention described above, of measure abbreviation followed by item number. In some cases this may be because the item is not part of a clearly defined measure. Even where the measure is clearly defined, the items may not follow a clearly numbered sequence or they may have widely-varying formats, so item numbering may not be used. The 1st Contact dataset contains many variables of these sorts; while these variables do have the 'a' prefix (denoting 1st Contact), and do have the 1/2 suffix if twin-specific, the remainder of the variable name is typically an abbreviation of some description of the question.

Some derived variables like twin ages do not relate to any given measure, even if they do relate to a specific study. Other derived variables, usually referred to as 'composites', are derived from more than one measure. Variables of these sorts will generally have abbreviated descriptive names although with prefixes and suffixes as above if appropriate. Examples include pcbhage1/2 (twin age in the 16 Year study when the twin 'Behaviour' questionnaire was returned) and drawg1/2 (4 Year study, general cognitive ability or 'g' composite, derived from several different cognitive tests).