TEDS Data Dictionary

Derived Variables in the In Home Dataset

This page gives a listing of derived variables in the In Home dataset, in alphabetical order of variable name. For each variable, a short written description is followed by the SPSS syntax (in a box) that was used to derive the variable.

This page does not include descriptions of background variables that are derived from other sources and that are included in the In Home dataset. For information about such variables, see pages describing background variables, exclusions and scrambled IDs.

Most of the variables from the twin tests and post-visit reports were derived prior to double entry of the dataset. Hence the variables used in the syntax on this page (with names ending in "1") were later used to make the corresponding co-twin variables (with names ending in "2").

Definitions of derived variables

Listed alphabetically

dlparca1/2

Twin low-Parca flag (1=low, 0=not low) based strictly on the Parca measures from the 4 Year booklet study. Note that dlparca1/2 is an alternative to the historical low-Parca flag variable dlowpar1/2, also in the dataset, which was used at the time of selection of families for in-home visits, but which is difficult to reconcile with Parca data currently in the dataset.
The dlparca1/2 variable is derived from 4 Year booklet dataset scales dparca1/2, dreparc1/2 and dadparc1/2.

* Low-Parca: in theory this should be based on the Parca composite dparca1/2.
* However dparca is missing for 10 pairs classified as medical exclusions.
* because they were excluded from the standardisation in the 4yr dataset.
* Therefore, rederive the composite.
* using means and standard deviations checked in the larger 4 year dataset.
* Rescale dreparc and dadparc to the mean 0 and SD 1 of the 4 year dataset.
COMPUTE zdreparc1 = (dreparc1 - 6.51) / 2.10.
COMPUTE zdreparc2 = (dreparc2 - 6.51) / 2.10.
COMPUTE zdadparc1 = (dadparc1 - 0.579) / 0.136.
COMPUTE zdadparc2 = (dadparc2 - 0.579) / 0.136.
EXECUTE.
* Now take the mean, again rescaling to mean 0 and SD 1 in the 4yr dataset.
COMPUTE dparcax1 = (MEAN(zdreparc1, zdadparc1) + 0.0015) / 0.834.
COMPUTE dparcax2 = (MEAN(zdreparc2, zdadparc2) + 0.0015) / 0.834.
EXECUTE.
* The recomputed dparcax correlates 1.00 with the original dparca.
* but the difference is that dparcax has fewer missing values.

* If dparca is non-missing, use the cut-off of -1.72.
* which is the 5th percentile of dparca in the larger 4 year dataset.
NUMERIC dlparca1 dlparca2 (F1.0).
VARIABLE LEVEL dlparca1 dlparca2 (NOMINAL).
RECODE dparca1 dparca2
 (LOWEST THRU -1.72=1) (-1.7199 THRU HIGHEST=0)
INTO dlparca1 dlparca2.
EXECUTE.
* But if dparca is missing, use recomputed dparcax instead.
DO IF (SYSMIS(dparca1)).
 RECODE dparcax1 
  (LOWEST THRU -1.72=1) (-1.7199 THRU HIGHEST=0)
 INTO dlparca1.
END IF.
DO IF (SYSMIS(dparca2)).
 RECODE dparcax2
  (LOWEST THRU -1.72=1) (-1.7199 THRU HIGHEST=0)
 INTO dlparca2.
END IF.
EXECUTE.
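For example (illustrative values only): for a twin whose dparca1 is missing but who has dreparc1 = 4 and dadparc1 = 0.45, the rescaled scores are (4 - 6.51)/2.10 = -1.20 and (0.45 - 0.579)/0.136 = -0.95, giving dparcax1 ≈ (-1.07 + 0.0015)/0.834 ≈ -1.28; this is above the -1.72 cut-off, so dlparca1 = 0 (not low).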
ecbasgt1/2, ecbaslt1/2, ecbast1/2

Total scores for the BAS measure.
Ecbasgt1 is the total for the Grammatical sub-test.
Ecbaslt1 is the total for the Lexical sub-test.
Ecbast1 is the overall total score.
Each is derived as a sum of item scores, requiring at least half of the relevant item scores to be present (any missing items are treated in the same way as zero scores).

* BAS total scores.
* Each item is coded 1=right 0=wrong.
* Require at least half the items to be non-missing.
* but compute score using SUM so that any missing are treated like zero scores.

* Overall total out of 27 for all items.
COMPUTE ecbast1 = SUM.14(ecbas011, ecbas021, ecbas031, ecbas041, ecbas051, ecbas061,
 ecbas071, ecbas081, ecbas091, ecbas101, ecbas111, ecbas121, ecbas131, ecbas141, ecbas151, 
 ecbas161, ecbas171, ecbas181, ecbas191, ecbas201, ecbas211, ecbas221, ecbas231, ecbas241,
 ecbas251, ecbas261, ecbas271).
EXECUTE.

* Items 1-16 comprise a lexical sub-test.
COMPUTE ecbaslt1 = SUM.8(ecbas011, ecbas021, ecbas031, ecbas041, ecbas051, ecbas061, ecbas071,
 ecbas081, ecbas091, ecbas101, ecbas111, ecbas121, ecbas131, ecbas141, ecbas151, ecbas161) .
EXECUTE.

* Items 17-27 comprise a grammatical sub-test.
COMPUTE ecbasgt1 = SUM.6(ecbas171, ecbas181, ecbas191, ecbas201, ecbas211,
 ecbas221, ecbas231, ecbas241, ecbas251, ecbas261, ecbas271) .
EXECUTE.
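The SUM.n convention used here (and for other test totals on this page) can be illustrated with a small hypothetical example; the data and variable names below are purely illustrative and do not form part of the derivation syntax.

* Hypothetical illustration of the SUM.n convention (not derivation syntax).
DATA LIST FREE / item1 item2 item3 item4.
BEGIN DATA
1 0 1 1
1 . 1 .
1 . . .
END DATA.
* Require at least 2 of the 4 items to be non-missing; any missing items count as zero.
COMPUTE total = SUM.2(item1, item2, item3, item4).
EXECUTE.
* The first case scores 3; the second scores 2 (missing items add nothing to the sum).
* The third has only one valid item, fewer than the required 2, so total is system-missing.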
ecgft1/2

Total score for the Goldman Fristoe test.
Derived as a sum of item scores, requiring at least half of the item scores to be present (any missing items are treated in the same way as zero scores). Items are coded 1=correct, 0=incorrect, with the code 2=not elicited formatted in SPSS as a missing value so it does not affect the score.

* Goldman Fristoe total score.
* Each item is coded as 1=right 0=wrong.
* (2=not elicited is set as a missing value so it won't contribute to the total).

* Derive simple total out of 74 for all items.
* Require at least half the items to be non-missing.
* but compute score using SUM so that any missing are treated like zero scores.
COMPUTE ecgft1 = SUM.37(ecgf01i1, ecgf01m1, ecgf01f1, ecgf02i1, ecgf02m1, ecgf02f1, ecgf03i1,
 ecgf03m1, ecgf03f1, ecgf04i1, ecgf05i1, ecgf06i1, ecgf06m1, ecgf06f1, ecgf07i1, ecgf07m1,
 ecgf07f1, ecgf08i1, ecgf08m1, ecgf08f1, ecgf09i1, ecgf09m1, ecgf09f1, ecgf10i1, ecgf10m1,
 ecgf10f1, ecgf11i1, ecgf11m1, ecgf11f1, ecgf12i1, ecgf13i1, ecgf13m1, ecgf13f1, ecgf14i1,
 ecgf14m1, ecgf14f1, ecgf15i1, ecgf15m1, ecgf15f1, ecgf16i1, ecgf16m1, ecgf16f1, ecgf17i1,
 ecgf17m1, ecgf17f1, ecgf18i1, ecgf18m1, ecgf18f1, ecgf19i1, ecgf19m1, ecgf19f1, ecgf20i1,
 ecgf20m1, ecgf20f1, ecgf21i1, ecgf21m1, ecgf21f1, ecgf22i1, ecgf22m1, ecgf22f1, ecgf23i1,
 ecgf23m1, ecgf24b1, ecgf25b1, ecgf26b1, ecgf27b1, ecgf28b1, ecgf29b1, ecgf30b1, ecgf31b1,
 ecgf32b1, ecgf33b1, ecgf34b1, ecgf35b1).
EXECUTE.
ecmcgci1/2, ecmcmmi1/2, ecmcmti1/2, ecmcppi1/2, ecmcqni1/2, ecmcvbi1/2

McCarthy cognitive index scores, each computed by the standard method as specified in the McCarthy handbook, as a weighted sum of relevant test scores. The individual test scores are item variables in the dataset.
ecmcgci1/2 is the general cognitive index.
ecmcmmi1/2 is the memory index.
ecmcmti1/2 is the motor index.
ecmcppi1/2 is the perceptual-performance index.
ecmcqni1/2 is the quantitative index.
ecmcvbi1/2 is the verbal index.
The computation, using the + operator rather than the SUM function in SPSS, requires every component score to be non-missing.

* McCarthy index scores as per McCarthy handbook.
* Note that the scores are computed by addition rather than using SUM.
* This means that the index score will be missing if any one term is missing.

* Verbal index.
COMPUTE ecmcvbi1 = ecmcpic1 + ecmcwpv1 + ecmcwov1
	+ (0.5 * ecmcvws1) + ecmcvst1 + ecmcvfl1 + (2 * ecmcopp1) .
EXECUTE.

* Perceptual-performance index.
COMPUTE ecmcppi1 = ecmcblo1 + (0.5 * ecmcpuz1) + ecmctap1 
	+ ecmcdes1 + ecmcchi1 + ecmccgr1 .
EXECUTE.

* Quantitative index.
COMPUTE ecmcqni1 = (2 * ecmcnum1) + ecmcnmf1 
	+ (2 * ecmcnmb1) + ecmccso1 .
EXECUTE.

* Memory index.
COMPUTE ecmcmmi1 = ecmcpic1 + ecmctap1 + (0.5 * ecmcvws1)
	+ ecmcvst1 + ecmcnmf1 + (2 * ecmcnmb1) . 
EXECUTE.

* Motor index.
COMPUTE ecmcmti1 = ecmcleg1 + ecmcabb1 + ecmcabc1 
	+ ecmcabt1 + ecmcimi1 + ecmcdes1 + ecmcchi1 .
EXECUTE.

* General cognitive index.
COMPUTE ecmcgci1 = ecmcblo1 + (0.5 * ecmcpuz1) + ecmcpic1 
	+ ecmcwpv1 + ecmcwov1 + (2 * ecmcnum1) + ecmctap1 
	+ (0.5 * ecmcvws1) + ecmcvst1 + ecmcdes1 + ecmcchi1 
	+ ecmcnmf1 + (2 * ecmcnmb1) + ecmcvfl1 + ecmccso1 
	+ (2 * ecmcopp1) + ecmccgr1 .
EXECUTE.
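The difference between the + operator and the SUM function, noted above, can be seen in a small hypothetical example (illustrative data and variable names only, not part of the derivation syntax).

* Hypothetical illustration of + versus SUM (not derivation syntax).
DATA LIST FREE / a b c.
BEGIN DATA
2 3 4
2 . 4
END DATA.
COMPUTE idx_plus = a + b + c.
COMPUTE idx_sum = SUM(a, b, c).
EXECUTE.
* For the first case both new variables equal 9.
* For the second, idx_plus is system-missing while SUM ignores the missing value and returns 6.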
ecmchand1/2

Ordinal handedness scale, derived from raw items counting the use of each hand (left, right or both) in McCarthy ball, bean bag and drawing exercises. The handedness scale is coded as 1=entirely left (only left hand used), 2=mainly left (largest count is for left, but some use of right/both), 3=equally both (equal counts for left and right, and/or use of both), 4=mainly right, 5=entirely right. Further derivation details are in the syntax below.

* McCarthy handedness.
* Derived from raw item counts of use of hands in McCarthy coordination tests.
* ecmchr1 (right hand) and ecmchl1 (left hand) each 0-4, and ecmchb1 (both hands) 0-1.
* Convert the three items to an ordinal handedness scale.
* from 1=left to 5=right and 3=both equally.
* The combinations below were double checked with crosstabs.
* Entirely left (1) or right (5) handed: non-zero response for left/right.
* with zero in the other variables.
IF (ecmchl1 > 0 & SUM(ecmchr1, ecmchb1) = 0) ecmchand1 = 1.
IF (ecmchr1 > 0 & SUM(ecmchl1, ecmchb1) = 0) ecmchand1 = 5.
* Equal (3): non-zero equal left/right, or zero left/right and non-zero 'both hands'.
IF (SUM(ecmchl1, ecmchr1) > 0 & (ecmchl1 = ecmchr1)) ecmchand1 = 3.
IF (SUM(ecmchl1, ecmchr1) = 0 & ecmchb1 > 0) ecmchand1 = 3.
* The remainder are mainly left (2) or mainly right (4).
* Non-zero combinations of left/both or right/both.
IF (ecmchl1 > 0 & ecmchb1 > 0 & ecmchr1 = 0) ecmchand1 = 2.
IF (ecmchr1 > 0 & ecmchb1 > 0 & ecmchl1 = 0) ecmchand1 = 4.
* and non-zero but imbalanced combinations of left/right.
IF (ecmchl1 > 0 & ecmchr1 > 0 & ecmchl1 > ecmchr1) ecmchand1 = 2.
IF (ecmchl1 > 0 & ecmchr1 > 0 & ecmchl1 < ecmchr1) ecmchand1 = 4.
EXECUTE.
* Drop the 3 raw items because they are no longer needed.
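For example (illustrative counts only): a twin with ecmchl1 = 1, ecmchr1 = 3 and ecmchb1 = 0 has non-zero but unequal left and right counts, with right greater, so ecmchand1 = 4 (mainly right).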
ecmcnmt1/2, ecmcvmt1/2, ecmcwkt1/2

McCarthy total scores, each derived as a weighted sum of two related test scores.
ecmcnmt1/2 is the numerical memory total score.
ecmcvmt1/2 is the verbal memory total score.
ecmcwkt1/2 is the word knowledge total score.
The computation, using the + operator rather than the SUM function in SPSS, requires every component score to be non-missing.

* McCarthy total scores.
* Note that individual tests have already been totalled in the raw data.
* but here we add some related scores into composites.

* Total for the two Word Knowledge scores.
COMPUTE ecmcwkt1 = ecmcwpv1 + ecmcwov1.
EXECUTE.

* Scaled total for the two Verbal Memory scores.
COMPUTE ecmcvmt1 = (0.5 * ecmcvws1) + ecmcvst1.
EXECUTE. 

* Scaled total for the two Numerical Memory scores.
COMPUTE ecmcnmt1 = ecmcnmf1 + (2 * ecmcnmb1).
EXECUTE.
ecmcppi1/2, ecmcqni1/2, ecmcvbi1/2

See ecmcgci1/2, etc above.

ecnont1/2

Total score for the Non Word Repetition test (0-20). Derived as a sum of the 20 item scores, each coded 1=correct, 0=incorrect, requiring at least half of the item scores to be present (any missing items are treated in the same way as zero scores).

* Non Word Repetition total score.
* Each item is coded as 1=right 0=wrong.
* Simply sum all items to get total score out of 20.
* Require at least half the items to be non-missing.
* but compute score using SUM so that any missing are treated like zero scores.
COMPUTE ecnont1 = SUM.10(ecnon011, ecnon021, ecnon031, ecnon041, ecnon051, 
 ecnon061, ecnon071, ecnon081, ecnon091, ecnon101, ecnon111, ecnon121, 
 ecnon131, ecnon141, ecnon151, ecnon161, ecnon171, ecnon181, ecnon191, ecnon201).
EXECUTE.
econtrola, econtrolb

Flag variables to show whether the family is a control (coded 1) or a family in which one or both twins are 'lows' (coded 0). 'Low' twins may be either low-language or low-Parca, according to the 4 year booklet data.
Controls are defined in two ways, labelled A and B. Definition A: econtrola is derived using historical categories (dlowlan1/2 and dlowpar1/2) which were used at the time of selection of families for in-home visits.
Definition B: econtrolb is derived strictly using data from the current 4 Year booklet dataset (dllang1/2 is derived within that dataset, while dlparca1/2 is described elsewhere on this page).
The two definitions agree for the majority of families in the sample, but disagree for a significant minority. Researchers may of course choose to use either definition for analysis. Definition B is used for standardising In Home measures to the 'control' sample distributions (see 'ez' variables described elsewhere on this page).

* Categorise control families.
* A Control family is one in which neither twin is low (language or Parca).
* We need two definitions, labelled A and B.

* Start by defining families as controls by default.
* (this will include those with missing 4yr data).
COMPUTE econtrola = 1.
COMPUTE econtrolb = 1.
EXECUTE.

* Definition A is the old one based on historical 'low' flag variables.
* that were used at the time of sample selection for the study.
IF (dlowlan1 = 1 | dlowlan2 = 1 | dlowpar1 = 1 | dlowpar2 = 1) econtrola = 0.
EXECUTE.

* Definition B is based strictly on 4 year booklet data.
IF (dllang1 = 1 | dllang2 = 1 | dlparca1 = 1 | dlparca2 = 1) econtrolb = 0.
EXECUTE.
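For example, a family where dllang1 = 1 but none of the historical flags dlowlan1/2 or dlowpar1/2 equals 1 would be a control under definition A (econtrola = 1) but not under definition B (econtrolb = 0).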
ecphhand1/2

Handedness scale derived from raw item responses recorded in the Phonological Awareness test.
Each raw item is coded with integer values 1-5: 1=used left hand, 3=used both hands equally, 5=used right hand. However, codes 2 and 4 are problematic (picked up with one hand then posted with the other) because they do not clearly indicate either left- or right-handedness, so these codes are first recoded to missing values. The scale is then derived as a simple mean, with decimal values 1-5.

* Phonological Awareness handedness.
* A raw handedness score is recorded for each of the 12 items.
* Conveniently coded 1-5 in the same direction as the McCarthy handedness scale.
* but codes 2 and 4 are problematic: 2=picked up with left then posted with right.
* 4=picked up with right then posted with left.
* These codes do not necessarily imply predominantly left or right handedness.
* so recode them to missing before scaling.
RECODE ecph01h1 ecph02h1 ecph03h1 ecph04h1 ecph05h1 ecph06h1 
    ecph07h1 ecph08h1 ecph09h1 ecph10h1 ecph11h1 ecph12h1
   (2=SYSMIS) (4=SYSMIS) (ELSE=COPY).
EXECUTE.
* Now make a simple, decimal mean from the 12 recoded items.
COMPUTE ecphhand1 = RND(MEAN(ecph01h1, ecph02h1, ecph03h1, ecph04h1, 
  ecph05h1, ecph06h1, ecph07h1, ecph08h1, ecph09h1, ecph10h1, ecph11h1, ecph12h1), 0.1).
EXECUTE.
* The 12 raw handedness items are no longer useful and are dropped.
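For example (illustrative responses only): a twin with ten items coded 5 and two items coded 3 has mean (10×5 + 2×3)/12 ≈ 4.67, so ecphhand1 = 4.7 after rounding to one decimal place.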
ecpht1/2

Total score for the Phonological Awareness test, using only the first 8 items.
Derived as a sum of item scores, requiring at least half of the item scores to be present (any missing items are treated in the same way as zero scores).

* Phonological Awareness total score.
* Each item is coded 1=right 0=wrong.
* Only use first 8 items, summing to get total score out of 8.
* Require at least half the items to be non-missing.
* but compute score using SUM so that any missing are treated like zero scores.
COMPUTE ecpht1 = SUM.4(ecph011, ecph021, ecph031, ecph041, ecph051, ecph061, ecph071, ecph081).
EXECUTE.
ectestn1/2

Number of twin cognitive tests completed (0 to 20).
This count includes only the 20 tests that are used to make the language and non-verbal cognitive composites (described elsewhere on this page): 14 of the McCarthy tests plus the 6 other main tests, excluding the Bayley behaviour ratings. Derivation of the count is explained in the syntax below. The counts of non-missing scores, and the data flags for some tests, are treated as temporary variables and are not retained in the dataset.

* Evaluate completeness of data.
* Start by counting the number of non-missing scores in each test.
* that includes a set of item scores from which a total is derived.
COUNT ecbasnn1 = ecbas011 ecbas021 ecbas031 ecbas041 ecbas051 ecbas061 ecbas071 
  ecbas081 ecbas091 ecbas101 ecbas111 ecbas121 ecbas131 ecbas141 ecbas151 
  ecbas161 ecbas171 ecbas181 ecbas191 ecbas201 ecbas211 ecbas221 
  ecbas231 ecbas241 ecbas251 ecbas261 ecbas271 (0 THRU HIGHEST).
COUNT ecgfnn1 = ecgf01i1 ecgf01m1 ecgf01f1 ecgf02i1 ecgf02m1 ecgf02f1 
  ecgf03i1 ecgf03m1 ecgf03f1 ecgf04i1 ecgf05i1 ecgf06i1 ecgf06m1 ecgf06f1
  ecgf07i1 ecgf07m1 ecgf07f1 ecgf08i1 ecgf08m1 ecgf08f1
  ecgf09i1 ecgf09m1 ecgf09f1 ecgf10i1 ecgf10m1 ecgf10f1
  ecgf11i1 ecgf11m1 ecgf11f1 ecgf12i1 ecgf13i1 ecgf13m1 ecgf13f1
  ecgf14i1 ecgf14m1 ecgf14f1 ecgf15i1 ecgf15m1 ecgf15f1
  ecgf16i1 ecgf16m1 ecgf16f1 ecgf17i1 ecgf17m1 ecgf17f1
  ecgf18i1 ecgf18m1 ecgf18f1 ecgf19i1 ecgf19m1 ecgf19f1
  ecgf20i1 ecgf20m1 ecgf20f1 ecgf21i1 ecgf21m1 ecgf21f1
  ecgf22i1 ecgf22m1 ecgf22f1 ecgf23i1 ecgf23m1
  ecgf24b1 ecgf25b1 ecgf26b1 ecgf27b1 ecgf28b1 ecgf29b1
  ecgf30b1 ecgf31b1 ecgf32b1 ecgf33b1 ecgf34b1 ecgf35b1 (0 THRU HIGHEST).
COUNT ecnonnn1 = ecnon011 ecnon021 ecnon031 ecnon041 ecnon051 ecnon061
 ecnon071 ecnon081 ecnon091 ecnon101 ecnon111 ecnon121 ecnon131 ecnon141
 ecnon151 ecnon161 ecnon171 ecnon181 ecnon191 ecnon201 (0 THRU HIGHEST).
* Phonological awareness: count just the scores, not the handedness items.
* and only count the first 8 items as in the scale.
COUNT ecphonnn1 = ecph011 ecph021 ecph031 ecph041 ecph051 ecph061 ecph071
  ecph081 (0 THRU HIGHEST).
EXECUTE.
* Now derive test data flags for the above plus Bus and Action tests.
* The cut-offs for meaningful numbers of item scores are based on.
* the numbers required for total scores in the next script.
RECODE ecbasnn1 (LOWEST THRU 13=0) (14 THRU HIGHEST=1) INTO ecbasdata1.
RECODE ecgfnn1 (LOWEST THRU 36=0) (37 THRU HIGHEST=1) INTO ecgfdata1.
RECODE ecnonnn1 (LOWEST THRU 9=0) (10 THRU HIGHEST=1) INTO ecnondata1.
RECODE ecphonnn1 (LOWEST THRU 3=0) (4 THRU HIGHEST=1) INTO ecphondata1.
EXECUTE.
* Bus info test: 4 scores recorded but the critical one is ecbusin1.
* so all is OK if that score is present.
RECODE ecbusin1 (SYSMIS=0) (ELSE=1) INTO ecbussdata1.
* Similarly Action pictures test: 2 scores but ecactgr1 is critical.
RECODE ecactgr1 (SYSMIS=0) (ELSE=1) INTO ecactpdata1.
EXECUTE.

* McCarthy test scores.
* Note that each non-missing score represents a completed test (in theory).
* Focus on the 14 specific McCarthy scores that are used to make verbal and.
* non-verbal composites in the next script: count these.
COUNT ecmnn1 = ecmcblo1 ecmcpuz1 ecmcnum1 ecmctap1 ecmcdes1 ecmcchi1 
  ecmcnmf1 ecmcnmb1 ecmccso1 ecmccgr1 ecmcwpv1 ecmcwov1 ecmcvfl1 ecmcopp1 (0 THRU HIGHEST).
EXECUTE.

* Now make an overall count of the number of non-missing verbal and non-verbal tests.
* that contain meaningful data and that are used in the cognitive composites (next script).
* Do this by summing the count of the 'useful' 14 McCarthy tests.
* with the flags for other tests as derived above.
COMPUTE ectestn1 = SUM(ecmnn1, ecbasdata1, ecgfdata1, ecnondata1, ecphondata1, ecbussdata1, ecactpdata1).
EXECUTE.
* This reveals 5 cases with no meaningful test data at all.
* although some of these have Bayley behaviour ratings.

* Recode this count into the child test data flag.
RECODE ectestn1 (0=0) (1 THRU HIGHEST=1)
INTO ecdata1.
EXECUTE.
ectgft1/2

Log transformation of the Goldman Fristoe total score, scaled to decimal values between 0 and 1.
The purpose of the transformation is to reduce the skewness in the scores.
Derived variable ecgft1/2 (the untransformed total score) is described elsewhere on this page.

* Log transformation of Goldman Fristoe total, scaled to decimal values 0-1.
COMPUTE ectgft1 = (LG10(75) - LG10(75 - ecgft1)) / LG10(75).
EXECUTE.
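As a rough numerical check (illustrative values only): a raw total of 0 maps to (LG10(75) - LG10(75))/LG10(75) = 0, the maximum total of 74 maps to (LG10(75) - LG10(1))/LG10(75) = 1, and a total of 50 maps to roughly (1.875 - 1.398)/1.875 ≈ 0.25, so scores near the ceiling are spread out relative to lower scores.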
eexclude

Exclusion variable for the in-home dataset (1=exclude, 0=not).
Based on twin-pair medical exclusions assessed at the time of the in-home visits (emedexcl) and known or unknown twin-pair sex and zygosity.

* Create general exclusion variable.
* Exclude if a medical exclusion (variable emedexcl, assessed at the time of the visit, 44 pairs).
* or if the pair has unknown sex/zygosity (1 pair).
* Note that all pairs in the dataset have paired twin data.
* and 1st Contact data, are not perinatal outliers, and have alang=1.
* so no need to consider these as exclusion criteria.
* (2 pairs have aethnic=0 but this is not a standard exclusion).
COMPUTE eexclude = 0.
IF (sexzyg = 7 | emedexcl = 1) eexclude = 1.
EXECUTE.
eprvoc01 through to eprvoc50

Scores (1=correct, 0=incorrect) for vocabulary items 1 to 50 in the parent questionnaire.
Computed simply by recoding the raw response variables (which have response codes 1-4) as shown in the syntax.
The syntax uses raw variable names pvoc01 - pvoc50, later renamed to eprvoc01 - eprvoc50.

* Items for which 1 is the correct response.
RECODE 
  pvoc07 pvoc14 pvoc18 pvoc19 pvoc23 pvoc33 pvoc35 pvoc44 pvoc47 
  (1=1) (2=0) (3=0) (4=0) (SYSMIS=SYSMIS) .
EXECUTE.
* Items for which 2 is the correct response.
RECODE
  pvoc02 pvoc03 pvoc13 pvoc17 pvoc22 pvoc24 pvoc38 pvoc42 pvoc43 pvoc45 
  pvoc46 pvoc48 pvoc49 (1=0) (2=1) (3=0) (4=0) (SYSMIS=SYSMIS) .
EXECUTE.
* Items for which 3 is the correct response.
RECODE
  pvoc01 pvoc06 pvoc09 pvoc11 pvoc12 pvoc15 pvoc16 pvoc20 pvoc21 pvoc25 
  pvoc26 pvoc27 pvoc29 pvoc30 pvoc32 pvoc36 pvoc41 pvoc50
  (1=0) (2=0) (3=1) (4=0) (SYSMIS=SYSMIS) .
EXECUTE.
* Items for which 4 is the correct response.
RECODE
  pvoc04 pvoc05 pvoc08  pvoc10 pvoc28 pvoc31 pvoc34 pvoc37 pvoc39 pvoc40
  (1=0) (2=0) (3=0) (4=1) (SYSMIS=SYSMIS) .
EXECUTE.
eprvoct

Total score for the parent vocabulary test.
Derived as a sum of the item scores, requiring at least half of the item scores to be present (any missing items are treated in the same way as zero scores). The item scores eprvocXX are derived from item responses, and are described elsewhere on this page.

* Parent vocabulary total score.
* Each item is coded 1=right 0=wrong, so sum to get total out of 50.
* Require at least half the item scores to be non-missing.
* but otherwise if some are missing then use SUM to treat missing as zero.
COMPUTE eprvoct = SUM.25(eprvoc01, eprvoc02, eprvoc03, eprvoc04, eprvoc05, eprvoc06,
 eprvoc07, eprvoc08, eprvoc09, eprvoc10, eprvoc11, eprvoc12, eprvoc13, eprvoc14, eprvoc15,
 eprvoc16, eprvoc17, eprvoc18, eprvoc19, eprvoc20, eprvoc21, eprvoc22, eprvoc23, eprvoc24, 
 eprvoc25, eprvoc26, eprvoc27, eprvoc28, eprvoc29, eprvoc30, eprvoc31, eprvoc32, eprvoc33,
 eprvoc34, eprvoc35, eprvoc36, eprvoc37, eprvoc38, eprvoc39, eprvoc40, eprvoc41, eprvoc42,
 eprvoc43, eprvoc44, eprvoc45, eprvoc46, eprvoc47, eprvoc48, eprvoc49, eprvoc50) .
EXECUTE.
etestage

Age of the twins on the date of the in-home visit (measured in decimal years).
Variable evisdate is an admin item containing the visit date. Variable aonsdob is the twin birth date, again from admin data. These date variables are not retained in the dataset.

* Create variable for child age at visit.
COMPUTE etestage = RND((DATEDIFF(evisdate, aonsdob, "days")) / 365.25, 0.1) .
EXECUTE.
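For example (illustrative dates only): a twin born on 1 March 1995 and visited on 1 September 1999 gives a DATEDIFF of 1645 days, hence etestage = RND(1645/365.25, 0.1) = 4.5.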
etestLLCage, etestLLCdate

Age and date variables (for in-home visit and testing) derived for use in datasets in the LLC TRE (but not to be used in other datasets).
The LLC date variables contain only the month and year, not the day, as a means of reducing identifiability. The date variables are strings formatted as 'yyyy-mm'. These LLC dates are designed to enable the TEDS measures to be placed in a time sequence with NHS medical diagnosis dates in the TRE data.
The LLC age variables are integers measuring the number of months between birth and the given TEDS activity, consistent with the matching LLC date variables.
Variable aonsdob is the twin birth date - the raw date variables are not retained in the dataset.

* First extract year and month as temp variables, from birth date and activity dates.
COMPUTE birthyear = XDATE.YEAR(aonsdob).
COMPUTE birthmonth = XDATE.MONTH(aonsdob).
COMPUTE evisyear = XDATE.YEAR(evisdate).
COMPUTE evismonth = XDATE.MONTH(evisdate).
EXECUTE.

* The agreed LLC date format is a string yyyy-mm (nominal by default for strings).
* adding '0' where necessary for two-digit months.
STRING etestLLCdate (A7).
IF (evismonth < 10) etestLLCdate = CONCAT(STRING(evisyear, F4), '-0', STRING(evismonth, F1)).
IF (evismonth >= 10) etestLLCdate = CONCAT(STRING(evisyear, F4), '-', STRING(evismonth, F2)).
EXECUTE.

* The agreed LLC age variable is in integer months.
* and it must agree with the birth and booklet year/month variables that will be available in the LLC.
COMPUTE etestLLCage = (evismonth + (evisyear * 12)) - (birthmonth + (birthyear * 12)).
EXECUTE.
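For example (illustrative dates only): a twin born in June 1995 and visited in January 2000 gets etestLLCdate = '2000-01' and etestLLCage = (1 + 2000×12) - (6 + 1995×12) = 55 months.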
ezartic1/2

Articulation composite score, standardised to the control distribution.
See comments in the syntax below for details of derivation. Variables eztgft1/2, eznont1/2, eexclude and econtrolb are all derived variables described elsewhere on this page.

* Create Articulation composite.
* First find mean of standardised transformed Goldman Fristoe and Non Word scores.
COMPUTE artic1 = MEAN(eztgft1, eznont1).
EXECUTE.

* Apply filter to select the control families minus any exclusions.
USE ALL.
COMPUTE filter_$=(econtrolb = 1 & eexclude = 0).
VARIABLE LABELS filter_$ 'econtrolb = 1 & eexclude = 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

DESCRIPTIVES VARIABLES= artic1
  /STATISTICS=MEAN STDDEV .

* remove filter.
FILTER OFF.
USE ALL.
EXECUTE .

* use the control mean and standard deviation to standardise the new composite.
COMPUTE ezartic1 =  RND(((artic1 - 0.0004)/0.8992), 0.01) .
EXECUTE.
ezllan1/2, ezlnv1/2, ezslan1/2, ezsnv1/2

Language and Non-Verbal composite scores, standardised to the control distribution.
Ezllan1/2 is the 'long' language composite (computed from 8 components).
Ezslan1/2 is the 'short' language composite (computed from 7 components).
Ezlnv1/2 is the 'long' non-verbal composite (computed from 9 components).
Ezsnv1/2 is the 'short' non-verbal composite (computed from 4 components).
See syntax for details of derivation. All the standardised component scores (names starting with 'ez') and filtering variables (eexclude, econtrolb) are derived variables that are described elsewhere on this page.

* Compute Language and Non-Verbal composites.
* Derived as means of the standardised scores already computed.

* Long versions.
* ('long' in the sense that they are derived from a long list of scores).
* As computed by Tom Price, and used to identify lows for DNA pooling.

* The long language composite is the mean of 8 verbal measures.
COMPUTE llan1 = MEAN.5(ezbusin1, ezactgr1, ezbast1, 
  ezmcwkt1, ezmcvfl1, ezmcopp1, ezpht1, ezartic1).
EXECUTE.

* The long non-verbal composite is the mean of 9 non-verbal measures.
COMPUTE lnv1 = MEAN.4(ezmcblo1, ezmcpuz1, ezmcnum1,
  ezmctap1, ezmcdes1, ezmcchi1, ezmcnmt1, ezmccso1, ezmccgr1).
EXECUTE.

* Short versions.
* ('short' because derived from fewer scores than those above).
* As computed by Essi.

* The short language composite is the mean of 7 verbal measures.
* (does not include articulation).
COMPUTE slan1 = MEAN.5(ezbusin1, ezactgr1, ezbast1, 
  ezmcwkt1, ezmcvfl1, ezmcopp1, ezpht1).
EXECUTE.

* The short non-verbal composite is the mean of 4 non-verbal measures.
COMPUTE snv1 = MEAN.3(ezmcblo1, ezmcpuz1, ezmctap1, ezmcdes1).
EXECUTE.

* Apply filter to select the control families minus any exclusions.
USE ALL.
COMPUTE filter_$=(econtrolb = 1 & eexclude = 0).
VARIABLE LABELS filter_$ 'econtrolb = 1 & eexclude = 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

DESCRIPTIVES VARIABLES= llan1 lnv1 slan1 snv1 
  /STATISTICS=MEAN STDDEV .

* remove the filter.
FILTER OFF.
USE ALL.
EXECUTE .

* As with other variables, standardise using control sample means and SDs.
VARIABLE LEVEL ezllan1 ezlnv1 ezslan1 ezsnv1 (SCALE).
* Note that these four means are negative and close to zero.
COMPUTE ezllan1 = RND(((llan1 + 0.002) / 0.654), 0.01) .
COMPUTE ezlnv1 =  RND(((lnv1 + 0.004) / 0.578), 0.01) .
COMPUTE ezslan1 = RND(((slan1 + 0.001) / 0.673), 0.01) .
COMPUTE ezsnv1 =  RND(((snv1 + 0.001) / 0.663), 0.01) .
EXECUTE.
ezactgr1/2, ezactin1/2, ezbasgt1/2, ezbaslt1/2, ezbast1/2, ezbusa51/2, ezbusin1/2, ezbussc1/2, ezbussl1/2, ezmcblo1/2, ezmccgr1/2, ezmcchi1/2, ezmccso1/2, ezmcdes1/2, ezmcgci1/2, ezmcimi1/2, ezmcmmi1/2, ezmcmti1/2, ezmcnmt1/2, ezmcnum1/2, ezmcopp1/2, ezmcpic1/2, ezmcppi1/2, ezmcpuz1/2, ezmcqni1/2, ezmctap1/2, ezmcvbi1/2, ezmcvfl1/2, ezmcvmt1/2, ezmcvst1/2, ezmcvws1/2, ezmcwkt1/2, ezmcwov1/2, ezmcwpv1/2, eznont1/2, ezpht1/2, eztgft1/2

Standardised versions of various test scores.
Each is standardised to its respective distribution for 'control' twins.
Some of these test scores are item variables, while others are derived. All derived variables used in the syntax below (eexclude, econtrolb, ecbast1/2, ecbaslt1/2, ecbasgt1/2, ecmcgci1/2, ecmcmmi1/2, ecmcmti1/2, ecmcnmt1/2, ecmcppi1/2, ecmcqni1/2, ecmcvbi1/2, ecmcvmt1/2, ecmcwkt1/2, ecnont1/2, ecpht1/2, ectgft1/2) are described elsewhere on this page.
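For example, using the control mean and SD for the BAS total score shown in the syntax below, a twin with ecbast1 = 20 gets ezbast1 = RND((20 - 23.674) / 2.175, 0.01) = -1.69.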

* Standardise according to distribution of controls (ignore low cases from 4yr booklets).
* Apply filter to select the control families minus any exclusions.
USE ALL.
COMPUTE filter_$=(econtrolb = 1 & eexclude = 0).
VARIABLE LABELS filter_$ 'econtrolb = 1 & eexclude = 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

* Now determine means and standard deviations for twin test scores.
DESCRIPTIVES VARIABLES= ecbast1 ecbaslt1 ecbasgt1 ecmcblo1 ecmcpuz1 ecmcpic1 
  ecmcwpv1 ecmcwov1 ecmcnum1 ecmctap1 ecmcvws1 ecmcvst1 
  ecmcimi1 ecmcdes1 ecmcchi1 ecmchr1 
  ecmchl1 ecmchb1 ecmcnmf1 ecmcnmb1 ecmcvfl1 ecmccso1 ecmcopp1 
  ecmccgr1 ecnont1 ecbusin1 ecbussl1 ecbusa51 ecbussc1 ecactin1 ecactgr1 
  ecpht1 ectgft1 ecmcwkt1 ecmcvmt1 ecmcnmt1
  ecmcvbi1 ecmcppi1 ecmcqni1 ecmcmmi1 ecmcmti1 ecmcgci1 
  /STATISTICS=MEAN STDDEV .

* Remove the filter.
FILTER OFF.
USE ALL.
EXECUTE .

* Standardise to control distribution, in new variables.
* Subtract the control mean then divide by the control standard deviation.
COMPUTE ezbast1 = RND(((ecbast1 - 23.674) / 2.175), 0.01).
COMPUTE ezbaslt1 = RND(((ecbaslt1 - 15.43) / 0.983), 0.01).
COMPUTE ezbasgt1 = RND(((ecbasgt1 - 8.245) / 1.66), 0.01).
COMPUTE ezmcblo1 = RND(((ecmcblo1 - 8.761) / 1.586), 0.01).
COMPUTE ezmcpuz1 = RND(((ecmcpuz1 - 12.316) / 6.495), 0.01).
COMPUTE ezmcpic1 = RND(((ecmcpic1 - 2.917) / 1.356), 0.01).
COMPUTE ezmcwpv1 = RND(((ecmcwpv1 - 8.908) / 0.513), 0.01).
COMPUTE ezmcwov1 = RND(((ecmcwov1 - 4.471) / 3.241), 0.01).
COMPUTE ezmcnum1 = RND(((ecmcnum1 - 3.843) / 1.014), 0.01).
COMPUTE ezmctap1 = RND(((ecmctap1 - 3.572) / 1.572), 0.01).
COMPUTE ezmcvws1 = RND(((ecmcvws1 - 17.648) / 7.366), 0.01).
COMPUTE ezmcvst1 = RND(((ecmcvst1 - 3.231) / 2.894), 0.01).
COMPUTE ezmcvfl1 = RND(((ecmcvfl1 - 13.232) / 5.452), 0.01).
COMPUTE ezmcimi1 = RND(((ecmcimi1 - 3.226) / 1.021), 0.01).
COMPUTE ezmcdes1 = RND(((ecmcdes1 - 5.501) / 2.434), 0.01).
COMPUTE ezmcchi1 = RND(((ecmcchi1 - 8.419) / 3.273), 0.01).
COMPUTE ezmccso1 = RND(((ecmccso1 - 5.803) / 1.717), 0.01).
COMPUTE ezmcopp1 = RND(((ecmcopp1 - 3.993) / 1.791), 0.01).
COMPUTE ezmccgr1 = RND(((ecmccgr1 - 7.644) / 2.078), 0.01).
COMPUTE eznont1 = RND(((ecnont1 - 13.222) / 4.567), 0.01).
COMPUTE ezbusin1 = RND(((ecbusin1 - 18.8) / 10.379), 0.01).
COMPUTE ezbussl1 = RND(((ecbussl1 - 33.901) / 15.562), 0.01).
COMPUTE ezbusa51 = RND(((ecbusa51 - 6.841) / 3.031), 0.01).
COMPUTE ezbussc1 = RND(((ecbussc1 - 0.424) / 0.859), 0.01).
COMPUTE ezactin1 = RND(((ecactin1 - 27.188) / 5.469), 0.01).
COMPUTE ezactgr1 = RND(((ecactgr1 - 18.6) / 6.302), 0.01).
COMPUTE ezpht1 = RND(((ecpht1 - 4.864) / 2.205), 0.01).
COMPUTE eztgft1 = RND(((ectgft1 - 0.718) / 0.229), 0.01).
COMPUTE ezmcwkt1 = RND(((ecmcwkt1 - 13.315) / 3.371), 0.01).
COMPUTE ezmcvmt1 = RND(((ecmcvmt1 - 11.578) / 5.612), 0.01).
COMPUTE ezmcnmt1 = RND(((ecmcnmt1 - 6.491) / 3.194), 0.01).
COMPUTE ezmcvbi1 = RND(((ecmcvbi1 - 51.911) / 13.072), 0.01).
COMPUTE ezmcppi1 = RND(((ecmcppi1 - 40.182) / 8.562), 0.01).
COMPUTE ezmcqni1 = RND(((ecmcqni1 - 20.438) / 5.017), 0.01).
COMPUTE ezmcmmi1 = RND(((ecmcmmi1 - 26.698) / 7.386), 0.01).
COMPUTE ezmcmti1 = RND(((ecmcmti1 - 29.862) / 6.999), 0.01).
COMPUTE ezmcgci1 = RND(((ecmcgci1 - 114.097) / 21.91), 0.01).
EXECUTE.