TEDS Data Dictionary

TEDS Variable Names

Contents of this page:

Introduction
First letter prefix: TEDS study
Additional prefixes: data collections
Abbreviations for measures and items and scales
Suffix for twin variables
Exceptions

Introduction

This page describes some of the conventions of variable naming in the TEDS phenotypic datasets.

While attempts have been made to use systematic naming of TEDS variables, some of the naming conventions have evolved over time and differ between datasets.

In the earlier TEDS datasets, variable names were limited to a maximum of 8 characters, a limitation imposed by earlier versions of the software (SAS, SPSS) used to build and analyse the datasets. In later datasets this limitation has been relaxed, although efforts have been made to keep variable names no longer than roughly 12 characters, so that syntax and scripts do not become overly complicated to write.

First letter prefix: TEDS study

One of the most consistent variable naming conventions, applying across all the main TEDS studies, is the use of the first letter of the variable name to denote the TEDS study. The position of each letter in the English alphabet (1 for 'a' through 26 for 'z') denotes the nominal twin age, in years, of the corresponding main TEDS study. Hence all variables in the 1st Contact dataset start with 'a', because that study was planned to collect data when the twins were roughly 1 year old; all variables in the 2 Year study start with 'b'; and so on.

The correspondence of the first letter with actual twin age has become increasingly approximate in later studies. As described elsewhere, later studies have often involved multiple data collections at different times, and sometimes data have been collected from several TEDS cohorts at the same time, leading to a range of twin ages within each dataset. In all cases, however, the study name corresponds to the variable name prefix, even where it matches actual twin ages only approximately.

Variable name prefix	TEDS study	Approximate actual twin ages (years)
a	1st Contact	1.5
b	2 Year	2
c	3 Year	3
d	4 Year	4
e	In Home	4 to 5
g	7 Year	6.5 to 7.5
h	8 Year	7 to 9
i	9 Year	8.5 to 9.5
j	10 Year	9.5 to 10.5
l	12 Year	10.5 to 12.5
n	14 Year	12 to 14
p	16 Year	15 to 17
r	18 Year	18 to 20
u	21 Year	21 to 26
z	26 Year	26 to 30
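Because the letter's position in the alphabet encodes the nominal twin age, the study prefix can be decoded mechanically. The Python sketch below is illustrative only: the dictionary is transcribed from the table above, and the functions `study_for_variable` and `nominal_age` are hypothetical helpers, not part of any TEDS tooling. It assumes the variable is already known to come from a main TEDS study, since background, ID and other non-study variables are exceptions to this convention.

```python
from typing import Optional

# Transcribed from the prefix table above: first letter -> TEDS study.
STUDY_BY_PREFIX = {
    "a": "1st Contact", "b": "2 Year", "c": "3 Year", "d": "4 Year",
    "e": "In Home", "g": "7 Year", "h": "8 Year", "i": "9 Year",
    "j": "10 Year", "l": "12 Year", "n": "14 Year", "p": "16 Year",
    "r": "18 Year", "u": "21 Year", "z": "26 Year",
}


def nominal_age(letter: str) -> int:
    """Alphabet position of the letter ('a' = 1 ... 'z' = 26), i.e. the
    nominal twin age in years that the study prefix stands for."""
    return ord(letter.lower()) - ord("a") + 1


def study_for_variable(name: str) -> Optional[str]:
    """Hypothetical helper: return the study implied by the first letter of
    a variable name, or None if that letter is not a study prefix.

    Only meaningful for variables already known to be study variables;
    background and ID variables do not follow this convention.
    """
    if not name:
        return None
    return STUDY_BY_PREFIX.get(name[0].lower())


print(study_for_variable("u1cpilm1"))  # 21 Year
print(nominal_age("u"))                # 21
```

As the table shows, the nominal age is only a rough guide to the actual twin ages in the later studies.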

Additional prefixes: data collections

Later TEDS studies have often involved multiple data collections. Beginning with the 7 Year study, questionnaires were collected from twins' teachers as well as from parents and twins. Starting with the 9 Year study, some of the same measures were collected simultaneously from parents and from twins themselves. Starting with the 10 Year study, web data collections were used alongside paper booklet data collections. In the 16, 18 and 21 Year studies, there were multiple data collections from twins themselves, carried out independently and at different times.

To differentiate between variables from different data collections within the same main TEDS study, additional prefixes (directly after the first letter) have been used as shown in the table below. While these prefixes have not been used entirely systematically, they do at least serve to distinguish between similar or identical measures from different data collections within the same main study.

The most consistently used second letters for these prefixes are as follows:

p: parent data
c: twin (child) data
t: teacher data

TEDS study	Variable name prefix	Data collection
In Home	ec	Twin tests
	ep	Parent questionnaire
	epv	Post-visit questionnaire
7 Year	g	Parent questionnaire, twin phone interviews
	gt	Teacher questionnaire
9 Year	ip	Parent questionnaire
	ic	Twin questionnaire
	it	Teacher questionnaire
10 Year	j	Twin web data
	jpq	Parent web questionnaire
	jt	Teacher questionnaire
12 Year	lp	Parent questionnaire
	lpnc	Parent reported NC levels
	lt	Teacher questionnaire
	lc	Twin questionnaire
	l	Twin web and phone tests
14 Year	np	Parent questionnaire
	nsl	Parent SLQ
	nt	Teacher questionnaire
	nc	Twin questionnaire
	n	Twin web tests
16 Year	ppbh	Parent 'behaviour' questionnaire
	ppl2	Parent 'Leap-2' questionnaire
	pp	Parent web questionnaire
	pcex	Twin exam results
	pcbh	Twin 'behaviour' questionnaire
	pcl2	Twin 'Leap-2' questionnaire
	p	Twin web activities
18 Year	rcq	Twin 18 year questionnaire
	rcp	Twin Perception web study
	rcb	Twin Bricks web study
	rck	Twin Kings Challenge web study
	rcn	Twin Navigation web study
	rcf	Twin FFMP web study
21 Year	u1p	Parent TEDS21 phase 1 questionnaire
	u1c	Twin TEDS21 phase 1 questionnaire
	u2c	Twin TEDS21 phase 2 questionnaire
	ucg	Twin G-game web study
	ucv1	Twin Covid study phase 1 questionnaire
	ucv2	Twin Covid study phase 2 questionnaire
	ucv3	Twin Covid study phase 3 questionnaire
	ucv4	Twin Covid study phase 4 questionnaire
26 Year	zmh	Twin TEDS26 mental health questionnaire (MHQ)
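Because a shorter prefix (such as 'p' or 'pp') can be the start of a longer one (such as 'pcbh' or 'ppbh'), one plausible way to identify the data collection for a variable is to try the longest matching prefix first. The Python sketch below assumes that heuristic; the function `data_collection` is hypothetical, it covers only the 16 and 21 Year rows of the table for brevity, and since these prefixes have not been used entirely systematically its answer should be checked against the study variable lists.

```python
from typing import Optional

# Subset of the data-collection table above (16 and 21 Year studies only).
COLLECTION_BY_PREFIX = {
    "ppbh": "16 Year, parent 'behaviour' questionnaire",
    "ppl2": "16 Year, parent 'Leap-2' questionnaire",
    "pp": "16 Year, parent web questionnaire",
    "pcex": "16 Year, twin exam results",
    "pcbh": "16 Year, twin 'behaviour' questionnaire",
    "pcl2": "16 Year, twin 'Leap-2' questionnaire",
    "p": "16 Year, twin web activities",
    "u1p": "21 Year, parent TEDS21 phase 1 questionnaire",
    "u1c": "21 Year, twin TEDS21 phase 1 questionnaire",
    "u2c": "21 Year, twin TEDS21 phase 2 questionnaire",
    "ucg": "21 Year, twin G-game web study",
    "ucv1": "21 Year, twin Covid study phase 1 questionnaire",
    "ucv2": "21 Year, twin Covid study phase 2 questionnaire",
    "ucv3": "21 Year, twin Covid study phase 3 questionnaire",
    "ucv4": "21 Year, twin Covid study phase 4 questionnaire",
}

# Longest prefixes first, so that e.g. 'pcbh' is tried before 'p'.
_PREFIXES_LONGEST_FIRST = sorted(COLLECTION_BY_PREFIX, key=len, reverse=True)


def data_collection(name: str) -> Optional[str]:
    """Heuristic longest-prefix match; None if no listed prefix fits."""
    for prefix in _PREFIXES_LONGEST_FIRST:
        if name.startswith(prefix):
            return COLLECTION_BY_PREFIX[prefix]
    return None


print(data_collection("pcbhsdq20r1"))  # 16 Year, twin 'behaviour' questionnaire
print(data_collection("u2cvict051"))   # 21 Year, twin TEDS21 phase 2 questionnaire
```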

Abbreviations for measures and items and scales

After the prefixes described above, in cases where the variables form a set of items from a named measure, the next parts of the variable name are typically an acronym or abbreviation of the measure name followed by an item number.

There are too many measures in the TEDS dataset to list their variable name abbreviations here. Some measures were included in many TEDS data collections, although for historical reasons the same measure does not always have the same variable name abbreviation in different datasets. For full details, see the study variable lists and other pages such as the questionnaires annotated with dataset variable names. Here are a few illustrative examples, showing the extended variable name prefix (study, measure, item):

dbh09: 4 Year study, Behaviour section, item 9 (the Behaviour section in fact included several measures with items randomly mixed)
gcg2c: 7 Year study, Conceptual Grouping measure (twin phone test), item 2, part c
jpc07s: 10 Year study, Picture Completion web test, item 7, score
ltaps19: 12 Year study, teacher questionnaire, APSD measure, item 19
pcbhsdq20r: 16 Year study, twin 'Behaviour' questionnaire, SDQ measure, item 20, reversed version
u2cvict05: 21 Year study, TEDS21 phase 2, twin questionnaire, Victimisation measure, item 5

The abbreviation denoting the measure typically has two, three or four letters as shown in these examples.

The item numbering reflects the order in which items were presented to participants in the original questionnaire or test. Where there were fewer than 10 items, they are numbered 1-X with a single digit. Where there were 10 or more items, they are numbered 01-XX with two digits.
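The digit-width rule can be expressed as a short Python sketch. The function `item_names` and the stem 'xq' are hypothetical and used only for illustration; the stem stands for the study prefix plus measure abbreviation, and the twin suffix described later on this page is not added.

```python
from typing import List


def item_names(stem: str, n_items: int) -> List[str]:
    """Build item variable names for a measure with n_items items:
    single-digit numbers if there are fewer than 10 items, else two digits."""
    width = 1 if n_items < 10 else 2
    # Zero-pad each item number to the chosen width.
    return [f"{stem}{i:0{width}d}" for i in range(1, n_items + 1)]


print(item_names("xq", 3))       # ['xq1', 'xq2', 'xq3']
print(item_names("xq", 12)[:3])  # ['xq01', 'xq02', 'xq03']
```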

As in some of the examples above, the item number may be followed by one or more letters serving as further descriptors. For example, 'r' denotes a reverse-coded version of an item (e.g. pcbhsdq20r); letters a, b, c, ... denote parts of a multi-part question (e.g. gcg2c); and various letters denote different measurements for a test item, such as 's' for score, 'a' for answer and 'rt' for response time (e.g. jpc07s).

Many derived variables are computed as means or totals from the items of a measure. These derived variables are usually referred to as scales (total or mean of all items in a measure), subscales (total or mean of a subset of items), total scores (total of test item scores) and composites (typically derived from more than one measure). Where the variable is derived from items of a single measure, usually the variable name includes the same abbreviation of the measure as used in the items; the item number is then usually replaced by one of these suffixes:

t: denotes a 'total' of measure items (although often computed as a re-scaled mean)
tot: sometimes used instead of t for test total scores
m: denotes a simple mean of measure items
xxt or xxm: a subscale, where 'xx' is replaced by two or three letters forming an abbreviation of the subscale name
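Given the stem shared by a measure's items (study prefix, any data-collection prefix and the measure abbreviation), the suffixes listed above can be recognised with a simple rule. The Python sketch below is hypothetical: `classify_derived` is not an existing TEDS tool, it expects any twin suffix to have been removed already, and because the conventions are not applied entirely consistently it resolves one ambiguity by assumption, treating a 'tot' ending as a total score rather than as a subscale abbreviated 'to'.

```python
from typing import Optional


def classify_derived(stem: str, name: str) -> Optional[str]:
    """Classify a derived variable from the suffix that follows the measure stem.

    Returns None if the name does not start with the stem, or if the remainder
    is not one of the suffix patterns (e.g. an item number such as '19').
    """
    if not name.startswith(stem):
        return None
    rest = name[len(stem):]
    if rest == "tot":
        return "total score"
    if rest == "t":
        return "total"
    if rest == "m":
        return "mean"
    # 'xxt' / 'xxm': a two- or three-letter subscale abbreviation plus t or m.
    if len(rest) in (3, 4) and rest[:-1].isalpha() and rest[-1] in ("t", "m"):
        kind = "total" if rest[-1] == "t" else "mean"
        return f"subscale '{rest[:-1]}' ({kind})"
    return None


print(classify_derived("ppbhconn", "ppbhconnimpt"))  # subscale 'imp' (total)
print(classify_derived("lgk", "lgktot"))             # total score
```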

Here are some illustrative examples of scale/subscale/score variable names:

lgktot: 12 Year study, General Knowledge twin web test, total score
ppbhconnt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score
ppbhconnimpt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure, subscale for impulsivity
u1cpilm: 21 Year study, TEDS21 phase 1, twin questionnaire, Purpose In Life measure, mean score

Suffix for twin variables

Twin-specific phenotypic data in the TEDS datasets always comprise 'double entered' variables (see glossary for further explanation). This means that any variable referring to a specific twin, including items and derived variables, and including data collected from parents and teachers as well as from twins themselves, is effectively duplicated within the dataset: the same variable value is shown once for the twin in question, and again as a co-twin variable for the other twin of the same pair.

Each twin variable therefore effectively appears as a pair of variables, distinguished by the variable name suffix: 1 or 2. A variable name ending in 1 contains data for the twin identified in any given row of data in the dataset; the same variable but with name ending in 2 contains data, from the same source, for the co-twin.

For this reason, a twin variable in the dataset is often referred to as a pair of variables with '1/2' written at the end, e.g. u1cpilm1/2.
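The '1/2' shorthand, and the twin or co-twin suffix itself, can be handled with two small helpers. The Python sketch below is hypothetical (neither `expand_pair` nor `split_twin_suffix` is an existing TEDS tool). It assumes the variable is already known to be twin-specific: many non-twin variables end in a digit (for example an item number), so a trailing 1 or 2 on its own does not prove that a name is a twin variable.

```python
from typing import Optional, Tuple


def expand_pair(shorthand: str) -> Tuple[str, str]:
    """Expand 'name1/2' into the twin and co-twin variable names."""
    if not shorthand.endswith("1/2"):
        raise ValueError(f"not in 'name1/2' form: {shorthand!r}")
    base = shorthand[:-3]
    return base + "1", base + "2"


def split_twin_suffix(name: str) -> Tuple[str, Optional[str]]:
    """Split a twin-specific variable name into its base name and role."""
    roles = {"1": "twin", "2": "co-twin"}
    if name and name[-1] in roles:
        return name[:-1], roles[name[-1]]
    return name, None


print(expand_pair("u1cpilm1/2"))      # ('u1cpilm1', 'u1cpilm2')
print(split_twin_suffix("ltaps192"))  # ('ltaps19', 'co-twin')
```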

Here are some illustrative examples, using some of the same variable name prefixes that were illustrated above.

Variable description	Twin variable name	Co-twin variable name	Twin pair variables
12 Year study, teacher questionnaire, APSD measure, item 19	ltaps191	ltaps192	ltaps191/2
16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score	ppbhconnt1	ppbhconnt2	ppbhconnt1/2
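The double-entered layout described at the start of this section can be illustrated with a short Python sketch. The field names 'pair' and 'twin', the variable name 'xscale' and its values are all hypothetical stand-ins chosen for illustration, not real TEDS identifiers or data. The sketch assumes at most two twins per pair; a twin with no co-twin row gets None for the co-twin variable.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def double_enter(rows: List[Row], var: str) -> List[Row]:
    """Return one row per twin with var1 (this twin) and var2 (co-twin)."""
    # Group rows by pair so each twin's co-twin can be found.
    by_pair: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        by_pair[row["pair"]].append(row)

    result = []
    for row in rows:
        cotwins = [r for r in by_pair[row["pair"]] if r is not row]
        new_row = dict(row)  # copy, leaving the input rows untouched
        new_row[var + "1"] = new_row.pop(var)
        new_row[var + "2"] = cotwins[0][var] if cotwins else None
        result.append(new_row)
    return result


# Arbitrary illustrative values for a hypothetical variable 'xscale'.
rows = [
    {"pair": 1, "twin": "A", "xscale": 10},
    {"pair": 1, "twin": "B", "xscale": 14},
    {"pair": 2, "twin": "A", "xscale": 7},
]
for r in double_enter(rows, "xscale"):
    print(r)
```

Each value appears twice in the output: once as the twin variable on its own row and once as the co-twin variable on the other twin's row.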

Wherever possible, variable name suffixes '1' and '2' have been avoided for variables that apply to the parent, the family or the twins as a pair rather than individually.

Exceptions

The TEDS dataset contains many thousands of variables, from data collections going back over many years. Some variables were named historically without regard to consistent, systematic naming across future data collections. Other variables simply do not fit into the patterns described above. There are therefore many exceptions to the variable naming conventions described above. This section briefly describes a few of the many types of exceptions to the variable naming rules.

Variables that do not originate from a specific TEDS study do not have the prefix a, b, c, etc. Examples of these are the background variables, ID variables, and non-phenotypic variables such as polygenic scores.

Some item variable names do not follow the convention described above (measure abbreviation followed by item number). In some cases this is because the item is not part of a clearly defined measure. Even where the measure is clearly defined, the items may not follow a clearly numbered sequence or may have widely varying formats, so item numbering may not be used. The 1st Contact dataset contains many variables of these sorts; while these variables do have the 'a' prefix (denoting 1st Contact), and do have the 1/2 suffix if twin-specific, the remainder of the variable name is typically an abbreviation of some description of the question.

Some derived variables like twin ages do not relate to any given measure, even if they do relate to a specific study. Other derived variables, usually referred to as 'composites', are derived from more than one measure. Variables of these sorts will generally have abbreviated descriptive names although with prefixes and suffixes as above if appropriate. Examples include pcbhage1/2 (twin age in the 16 Year study when the twin 'Behaviour' questionnaire was returned) and drawg1/2 (4 Year study, general cognitive ability or 'g' composite, derived from several different cognitive tests).