TEDS Data Dictionary

Derived Variables in the 26 Year Dataset

This page gives a listing of derived variables in the 26 Year dataset, in alphabetical order of variable name. For each variable, a short written description is followed by the SPSS syntax (in a box) that was used to derive the variable.

Most variables described on this page were derived prior to double entering the dataset. The variable name suffix "1" or "2" (to denote twin and co-twin) was added to each twin-specific variable in a later script, hence is not generally shown in the syntax on this page.

This page does not include descriptions of background variables that are derived from other sources and that are included in the 26 Year dataset. For information about such variables, see pages describing background variables, exclusions and scrambled IDs.

Definitions of derived variables

Listed alphabetically

zcage1/2

Age of twin (in decimal years) when the CATSLife web study was started. Derived from variables representing respective dates, as mentioned in syntax comments; aonsdob is the twin birth date. These date variables are not retained in the dataset.

* Twin ages derived as differences between birth date and other event dates.
* Compute number of days then divide by 365.25 to get decimal years.
* and round to nearest 0.1 as in other datasets.
* Catslife: derive age when activities started (use consent date).
IF (zcdata1 = 1) zcage = RND(((DATEDIFF(zcssconstart, aonsdob, "days")) / 365.25), 0.1) .
EXECUTE.
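As a rough cross-check of the arithmetic (not part of the TEDS syntax), the following Python sketch repeats the calculation for a pair of made-up dates. The function name and the dates are hypothetical, and ties are assumed to round upwards, as SPSS RND is understood to do for positive values.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def age_decimal_years(birth: date, event: date) -> float:
    """Age in decimal years: whole days / 365.25, rounded to the nearest 0.1."""
    days = (event - birth).days
    years = Decimal(days) / Decimal("365.25")
    return float(years.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


# Hypothetical dates, for illustration only: 9594 days apart.
print(age_decimal_years(date(1995, 3, 15), date(2021, 6, 20)))  # 26.3
```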
zcdurn1/2, zcssdurn1/2, zctdurn1/2

Durations, measured in decimal minutes, for the CATSLife battery as a whole (zcdurn) and for its two main parts: Spatial Spy (zcssdurn) and Test My Brain (zctdurn).
These are derived by summing the durations of component missions (Spatial Spy) and tests (Test My Brain), such durations being item variables that were generated on the web servers.

* Overall durations for TMB, Spatial Spy, and entire battery, ignoring consent and feedback.
* Only derive these when the activities have been finished.
* Spatial Spy: sum the mission durations and divide by 60 to convert seconds to minutes.
* This works nicely because each mission has a time limit so no extreme outliers.
IF (zcssstat = 2) zcssdurn = RND((SUM(zcssod1durn, zcssod3durn, zcssod5durn,
 zcssol1durn, zcssol3durn, zcssol5durn, zcssmn1durn, zcssmn3durn, zcssmn5durn) / 60), 0.1).
EXECUTE.
* TMB: sum the four test durations and divide by 60.
IF (zctstat = 2) zctdurn = RND((SUM(zctvcdurn, zctsndurn, zctmwdurn, zctmndurn) / 60), 0.1).
EXECUTE.
* Entire battery: sum the two variables above if both non-missing.
COMPUTE zcdurn = SUM.2(zcssdurn, zctdurn).
EXECUTE.
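The duration arithmetic can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names and example values are hypothetical, missing values are represented by None, and the status conditions (zcssstat = 2, zctstat = 2) are left out for brevity.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Sequence


def rnd(value: float, step: str = "0.1") -> float:
    """Round to the nearest multiple of step, ties upwards (assumed to mimic RND)."""
    return float(Decimal(str(value)).quantize(Decimal(step), rounding=ROUND_HALF_UP))


def part_duration_minutes(seconds: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the non-missing component durations (seconds) and convert to minutes."""
    present = [s for s in seconds if s is not None]
    if not present:
        return None
    return rnd(sum(present) / 60)


def battery_duration(spatial_spy: Optional[float], tmb: Optional[float]) -> Optional[float]:
    """Like SUM.2: only defined when both parts are non-missing."""
    if spatial_spy is None or tmb is None:
        return None
    return spatial_spy + tmb


# Hypothetical mission and test durations in seconds.
ss = part_duration_minutes([45, 120, 150, 100, 160, 170, 40, 50, 55])  # 890 s
tmb = part_duration_minutes([300, 120, 400, 200])                      # 1020 s
print(ss, tmb, battery_duration(ss, tmb))
```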
zcLLCage1/2, zcLLCdate1/2, zmhLLCage1/2, zmhLLCdate1/2

Age and date variables derived for use in datasets in the LLC TRE (but not to be used in other datasets).
Ages and dates are derived for the MHQ ('zmh') and CATSLife ('zc') activities.
The LLC date variables contain only the month and year, not the day, as a means of reducing identifiability. The date variables are strings formatted as 'yyyy-mm'. These LLC dates are designed to enable the TEDS measures to be placed in a time sequence with NHS medical diagnosis dates in the data in the TRE.
The LLC age variables are integers measuring the number of months between birth and the given TEDS activity, consistent with the matching LLC date variables.
Variable aonsdob is the twin birth date - the raw date variables are not retained in the dataset.

* First we need best estimate of MHQ date (temp variable).
* MHQ logged as paper booklet return: use return date.
IF (zmhdata1 = 1 & zmhpaper1 = 1) zmhdate = zmhrdate1.
* If not logged as paper booklet return, use start date from Qualtrics.
IF (zmhdata1 = 1 & zmhpaper1 = 0) zmhdate = startDate.
EXECUTE.

* Now extract year and month as temp variables, from birth date and activity dates.
* zcssconstart is the consent date from the start of Catslife activities.
NUMERIC birthyear zmhyear zcyear (F4.0).
NUMERIC birthmonth zmhmonth zcmonth (F2.0).
COMPUTE zmhyear = XDATE.YEAR(zmhdate).
COMPUTE zcyear = XDATE.YEAR(zcssconstart).
COMPUTE birthyear = XDATE.YEAR(aonsdob).
COMPUTE zmhmonth = XDATE.MONTH(zmhdate).
COMPUTE zcmonth = XDATE.MONTH(zcssconstart).
COMPUTE birthmonth = XDATE.MONTH(aonsdob).
EXECUTE.

* The agreed LLC date format is a string yyyy-mm.
* padding single-digit months with a leading '0' so that months always have two digits.
STRING zmhLLCdate zcLLCdate (A7).
IF (zmhmonth < 10) zmhLLCdate = CONCAT(STRING(zmhyear, F4), '-0', STRING(zmhmonth, F1)).
IF (zmhmonth >= 10) zmhLLCdate = CONCAT(STRING(zmhyear, F4), '-', STRING(zmhmonth, F2)).
IF (zcmonth < 10) zcLLCdate = CONCAT(STRING(zcyear, F4), '-0', STRING(zcmonth, F1)).
IF (zcmonth >= 10) zcLLCdate = CONCAT(STRING(zcyear, F4), '-', STRING(zcmonth, F2)).
EXECUTE.

* The agreed LLC age variable is in integer months.
* and it must agree with the birth and booklet year/month variables that will be available in the LLC.
COMPUTE zmhLLCage = (zmhmonth + (zmhyear * 12)) - (birthmonth + (birthyear * 12)).
COMPUTE zcLLCage = (zcmonth + (zcyear * 12)) - (birthmonth + (birthyear * 12)).
EXECUTE.
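The following Python sketch shows the same 'yyyy-mm' formatting and month-difference arithmetic on made-up dates. The function names and dates are hypothetical. Note that the day of the month is ignored, so the age in months follows from the two year/month pairs alone.

```python
from datetime import date


def llc_date(d: date) -> str:
    """Format a date as 'yyyy-mm', zero-padding the month."""
    return f"{d.year:04d}-{d.month:02d}"


def llc_age_months(birth: date, activity: date) -> int:
    """Integer months between birth and activity, using year and month only."""
    return (activity.month + activity.year * 12) - (birth.month + birth.year * 12)


# Hypothetical dates, for illustration only.
birth = date(1995, 11, 28)
activity = date(2021, 3, 2)
print(llc_date(activity))               # 2021-03
print(llc_age_months(birth, activity))  # 304
```

In this example the result is 304 months even though fewer than 304 complete calendar months have elapsed, because the days (28 and 2) play no part in the calculation.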
zcssdevmob1/2, zcssdevwdth1/2

Device categories, estimated from raw data collected at the start of the CATSLife Spatial Spy activities. Both categories can be used as quality control variables, because the Spatial Spy activities were designed only to be carried out on laptops or desktops with relatively large screens.
zcssdevwdth is the device's screen width, numerically categorised as small, medium, large or very large. For small devices (mobiles) an adjustment is made to detect cases where the device was turned sideways.
zcssdevmob is a binary category (1=yes 0=no) which estimates whether or not a mobile device was being used. This is derived from substrings of the raw 'user agent' variable, which does not always give clear-cut results.
The raw device variables, from which these are derived, are not retained in the dataset.

* Screenwidth: arbitrary categories of small/medium/large.
* Make temporary version of raw screenwidth.
COMPUTE screenwidth = zcssscrw.
EXECUTE.
* For cases where width > height, if the recorded "width" is small/medium (< 1200).
* then assume this is a small screen turned sideways and use the smaller measurement (height) as width.
IF ((zcssscrw > zcssscrh) & zcssscrw < 1200) screenwidth = zcssscrh.
EXECUTE.
* Note that Spatial Spy does not flip to landscape mode when a mobile device is turned.
* Categorise screen widths small, medium or large using cutoffs 768 and 1200 pixels.
* with new category of very large >= 1680.
* Mobile phones are invariably small (<768), tablets and small/cheap laptops medium.
* larger/better laptops and some desktops large, modern desktops very large.
* (Note that laptops and desktops typically have width > height).
RECODE screenwidth
 (LOWEST THRU 767=1) (768 THRU 1199=2) (1200 THRU 1679=3) (1680 THRU HIGHEST=4)
INTO zcssdevwdth.
EXECUTE.
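A minimal Python sketch of the screen width categorisation is given here, using the same cutoffs as the syntax above. The function name and the example screen sizes are hypothetical.

```python
def screen_width_category(width: int, height: int) -> int:
    """Return 1=small, 2=medium, 3=large, 4=very large."""
    effective = width
    # A small/medium screen with width > height is assumed to be turned sideways,
    # so the smaller measurement (height) is used as the width.
    if width > height and width < 1200:
        effective = height
    if effective <= 767:
        return 1
    if effective <= 1199:
        return 2
    if effective <= 1679:
        return 3
    return 4


# Hypothetical screen sizes in pixels (width, height).
for size in [(375, 667), (667, 375), (1024, 768), (1366, 768), (1920, 1080)]:
    print(size, screen_width_category(*size))
```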

* Device type: binary 'mobile' variable.
* Designed to identify mobile devices, meaning mobile phones and tablets.
* although there can be a grey area for small/cheap laptops and large tablets, for example.
* Use the raw zcssuseragent field to identify crude device types from substrings.
* 'Windows', 'Macintosh' and 'X11' are probably all laptops or desktops (not mobile).
IF (CHAR.INDEX(zcssuseragent, 'Windows') > 0
 | CHAR.INDEX(zcssuseragent, 'Macintosh') > 0
 | CHAR.INDEX(zcssuseragent, 'X11') > 0) zcssdevmob = 0.
EXECUTE.
* 'iPhone' and 'Android' are likely to be mobile phones.
* 'iPad' are assumed to be tablets, also counted as mobile.
* 'Mobile' is another useful substring (note 'Tablet' seems to have largely gone out of use).
* These can override 'Windows' etc above.
IF (CHAR.INDEX(zcssuseragent, 'iPhone') > 0 
   | CHAR.INDEX(zcssuseragent, 'Android') > 0 
   | CHAR.INDEX(zcssuseragent, 'iPad') > 0 
   | CHAR.INDEX(zcssuseragent, 'Mobile') > 0
   | CHAR.INDEX(zcssuseragent, 'Tablet') > 0) zcssdevmob = 1.
EXECUTE.
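The substring rules can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the user agent strings are shortened examples rather than TEDS data, and None stands for a system-missing value where no substring matches.

```python
from typing import Optional

DESKTOP_MARKERS = ("Windows", "Macintosh", "X11")
MOBILE_MARKERS = ("iPhone", "Android", "iPad", "Mobile", "Tablet")


def is_mobile(user_agent: str) -> Optional[int]:
    """Return 1 (mobile), 0 (laptop/desktop) or None (no marker found)."""
    result = None
    if any(marker in user_agent for marker in DESKTOP_MARKERS):
        result = 0
    # Mobile markers are checked second so that they override desktop markers.
    if any(marker in user_agent for marker in MOBILE_MARKERS):
        result = 1
    return result


print(is_mobile("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))          # 0
print(is_mobile("Mozilla/5.0 (iPhone; CPU iPhone OS 14_6) Mobile"))    # 1
print(is_mobile("Mozilla/5.0 (Linux; Android 11)"))                    # 1
```

The matching is case-sensitive, as with CHAR.INDEX in the syntax above.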
zcssdurn1/2

See zcdurn1/2, zcssdurn1/2, zctdurn1/2 above.

zcssmnXas1/2, zcssodXas1/2, zcssolXas1/2

Mission accuracy scores for each of the 9 missions in the CATSLife Spatial Spy battery. X is 1, 3 or 5 (the mission number) for each mission type. The mission types are Map Reading No Memory (zcssmn), Orientation Direction (zcssod) and Orientation Landmarks (zcssol).
Each orientation mission comprised 3 tasks, and the mission score has integer values 0 to 3 indicating the number of tasks successfully completed. Each map reading mission has a single task, for which the score is 0 (failed), 1 (completed by an indirect route) or 2 (completed by the most direct route).
The derivation of each score is explained in the syntax below.

* Map reading missions - very simple because there is only one task per mission.
* Use response code scores 1 and 2, recoding timeout (-1) to score of 0.
* giving mission scores 0/1/2.
RECODE zcssmn1r zcssmn3r zcssmn5r
 (-1=0) (1=1) (2=2)
INTO zcssmn1as zcssmn3as zcssmn5as.
EXECUTE.
* Orientation direction missions: 3 tasks per mission.
* with each task response coded 1=correct 0=wrong or -1=timed out.
* and tasks 2/3 can also be -2=discontinued if mission timed out during earlier attempt.
* Start by recoding the response to task 1 into scores.
RECODE zcssod1t1r zcssod3t1r zcssod5t1r
 (-1=0) (0=0) (1=1)
INTO zcssod1as zcssod3as zcssod5as.
EXECUTE.
* Now increment the mission score by 1 if task 2 was successful.
IF (zcssod1t2r = 1) zcssod1as = zcssod1as + 1.
IF (zcssod3t2r = 1) zcssod3as = zcssod3as + 1.
IF (zcssod5t2r = 1) zcssod5as = zcssod5as + 1.
EXECUTE.
* and increment again from task 3.
IF (zcssod1t3r = 1) zcssod1as = zcssod1as + 1.
IF (zcssod3t3r = 1) zcssod3as = zcssod3as + 1.
IF (zcssod5t3r = 1) zcssod5as = zcssod5as + 1.
EXECUTE.
* Orientation Landmarks are similar: 3 tasks per mission.
* In this case, each task response is 1=correct or -1=timed out (incorrect).
* and tasks 2/3 may be discontinued (-2) if a previous task was timed out.
* Start by recoding the response to task 1 into scores.
RECODE zcssol1t1r zcssol3t1r zcssol5t1r 
 (-1=0) (1=1)
INTO zcssol1as zcssol3as zcssol5as.
EXECUTE.
* Now increment the mission score to 2 if task 2 was successful.
IF (zcssol1t2r = 1) zcssol1as = zcssol1as + 1.
IF (zcssol3t2r = 1) zcssol3as = zcssol3as + 1.
IF (zcssol5t2r = 1) zcssol5as = zcssol5as + 1.
EXECUTE.
* and increment again from task 3.
IF (zcssol1t3r = 1) zcssol1as = zcssol1as + 1.
IF (zcssol3t3r = 1) zcssol3as = zcssol3as + 1.
IF (zcssol5t3r = 1) zcssol5as = zcssol5as + 1.
EXECUTE.
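The scoring rules above can be summarised in a short Python sketch. The function names are hypothetical, None stands for a missing value, and the orientation function assumes the response coding described in the syntax comments (code 0 only occurs for Orientation Direction).

```python
from typing import Optional


def map_reading_accuracy(response: Optional[int]) -> Optional[int]:
    """Timeout (-1) scores 0; response codes 1 and 2 are kept as scores."""
    return {-1: 0, 1: 1, 2: 2}.get(response)


def orientation_accuracy(t1: Optional[int], t2: Optional[int],
                         t3: Optional[int]) -> Optional[int]:
    """Count of correct tasks (0-3); missing if task 1 has no usable code."""
    score = {-1: 0, 0: 0, 1: 1}.get(t1)
    if score is None:
        return None
    # Tasks 2 and 3 each add one point only when coded 1 (correct).
    return score + int(t2 == 1) + int(t3 == 1)


print(orientation_accuracy(1, 1, 1))     # 3
print(orientation_accuracy(0, 1, 1))     # 2
print(orientation_accuracy(-1, -2, -2))  # 0
```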
zcssmnXr1/2, zcssodXtYr1/2, zcssolXtYr1/2

Coded response outcomes for each mission, or for each task in each mission, in the Spatial Spy activities. These are used to replace the raw item variables which have obscure item coding. The coding used here is comparable to that used in the very similar 18 Year Navigation study activities.
Note that, in each variable name, X refers to the mission number (1, 3 or 5). Orientation missions (od, ol) are divided into tasks, and Y refers to the task number (1, 2 or 3).
zcssmnXr: coded response outcomes for Map Reading No Memory missions. Here, the possible outcomes for each mission are coded -1=failed (timed out), 1=succeeded by correct but indirect route, 2=succeeded by the most direct route.
zcssodXtYr: coded response outcomes for Orientation Direction tasks in each mission. Here, the possible task outcomes are coded -4=terminated when browser closed, -2=discontinued, -1=timed out, 0=incorrect direction, 1=followed correct direction.
zcssolXtYr: coded response outcomes for Orientation Landmarks tasks in each mission. The possible task outcomes are coded as for Orientation Direction, except that here there is no code 0 (because failure results from a timeout, not from taking an incorrect direction).
See comments in the syntax below for full explanations of mission/task rules and the coding used. Repetitive sections of syntax have been shortened where indicated in [comments like this].
The raw completion (cm) and error (er) item variables have not been retained in the dataset because they become redundant once the new response outcomes have been derived.

* Recode Spatial Spy items.
* ------------------------.
* Use coding as in 18yr Navigation.
* but adapted because of some changes to the activities.
* In the raw data, each task has 4 item variables.
* cm=completion, er=error, ct=completion time, rt=reaction time.
* These all vary in coding, with some redundancy, for the 3 mission types.
* Each od and ol mission has 3 tasks, each mn mission has a single task.
* Start by replacing cm and er with a single outcome variable (suffix r).
* Successful outcomes.
* OD and OL (3 tasks per mission): success if cm=1 and er=0.
IF (zcssod1t1cm = 1 & zcssod1t1er = 0) zcssod1t1r = 1.
[syntax repeated for each task in each od mission]
IF (zcssol1t2cm = 1 & zcssol1t2er = 0) zcssol1t2r = 1.
[syntax repeated for each task in each ol mission]
EXECUTE.
* MN (1 task per mission): perfect response (code 2) if cm=1 and er=0.
* and correct but imperfect response (code 1) if cm=1 and er=1.
IF (zcssmn1cm = 1 & zcssmn1er = 0) zcssmn1r = 2.
IF (zcssmn1cm = 1 & zcssmn1er = 1) zcssmn1r = 1.
[syntax repeated for each of three missions]
EXECUTE.

* Now in some missions, durn can be 0 or missing depending on circumstances.
* If start and end datetimes are both non-missing, use these to replace durn.
IF ((zcssod1durn = 0 | SYSMIS(zcssod1durn)) & ~SYSMIS(zcssod1end) & ~SYSMIS(zcssod1start)) 
   zcssod1durn = DATEDIFF(zcssod1end, zcssod1start, "seconds").
IF ((zcssol1durn = 0 | SYSMIS(zcssol1durn)) & ~SYSMIS(zcssol1end) & ~SYSMIS(zcssol1start)) 
   zcssol1durn = DATEDIFF(zcssol1end, zcssol1start, "seconds").
IF ((zcssmn1durn = 0 | SYSMIS(zcssmn1durn)) & ~SYSMIS(zcssmn1end) & ~SYSMIS(zcssmn1start)) 
   zcssmn1durn = DATEDIFF(zcssmn1end, zcssmn1start, "seconds").
[syntax repeated for mission 3 and mission 5 of each mission type]
EXECUTE.

* Incorrect outcomes including timeouts and discontinue.
* Only OD can be explicitly wrong, OL and MN are failed by timeout.
* OD: cm=0 and er=1 if wrong response.
IF (zcssod1t1cm = 0 & zcssod1t1er = 1) zcssod1t1r = 0.
[syntax repeated for each task in each mission for od]
EXECUTE.
* OD has mission time out (60 for od1, 180 for od3 and od5).
* (could time out during task 1, or during task 2 having completed task 1, etc).
* Task timed out (code -1) if cm=0 and er=0.
IF (zcssod1t1cm = 0 & zcssod1t1er = 0) zcssod1t1r = -1.
[syntax repeated for each task in each mission for od]
EXECUTE.
* Because timeout applies to entire mission, it is effectively a discontinue rule.
* for tasks 2/3 if timed out on task 1, or for task 3 if timed out on task 2.
* In such cases, change coding from -1 (timeout) to -2 (discontinued).
IF (zcssod1t1r = -1) zcssod1t2r = -2.
IF (zcssod1t1r = -1) zcssod1t3r = -2.
[syntax repeated for task 1 of od missions 3 and 5]
EXECUTE.
IF (zcssod1t2r = -1) zcssod1t3r = -2.
IF (zcssod3t2r = -1) zcssod3t3r = -2.
IF (zcssod5t2r = -1) zcssod5t3r = -2.
EXECUTE.
* In these cases, should also find task ct and rt missing, and mission durn = 60 or 180.

* OL has time out for each task (60) but this is not recorded in ct/rt if timed out.
* Timed out (-1) if cm=0 and er=0.
IF (zcssol1t1cm = 0 & zcssol1t1er = 0) zcssol1t1r = -1.
[syntax repeated for each task in each ol mission]
EXECUTE.
* There is also an effective discontinue rule.
* If timed out on task 1, tasks 2 and 3 are discontinued (same for task 2 to task 3).
* In such cases, change coding from -1 (timed out) to -2 (discontinued).
IF (zcssol1t1r = -1) zcssol1t2r = -2.
IF (zcssol1t1r = -1) zcssol1t3r = -2.
[syntax repeated for ol missions 3 and 5]
EXECUTE.
IF (zcssol1t2r = -1) zcssol1t3r = -2.
IF (zcssol3t2r = -1) zcssol3t3r = -2.
IF (zcssol5t2r = -1) zcssol5t3r = -2.
EXECUTE.
* MN: only one task (so no discontinue).
* Timed out if cm=0 and er=0.
IF (zcssmn1cm = 0 & zcssmn1er = 0) zcssmn1r = -1.
IF (zcssmn3cm = 0 & zcssmn3er = 0) zcssmn3r = -1.
IF (zcssmn5cm = 0 & zcssmn5er = 0) zcssmn5r = -1. 
EXECUTE.

* Other outcomes.
* OD and OL may effectively be terminated by closing the browser when not finished.
* In such cases, apparently durn=0 and task times (ct and rt) are missing.
* (does not seem to affect task 1 of each OD mission).
* Code terminated/crashed as -4.
IF (zcssod1durn = 0 & SYSMIS(zcssod1t1ct)) zcssod1t1r = -4.
[syntax repeated for each task in each od mission]
EXECUTE.
IF (zcssol1durn = 0 & SYSMIS(zcssol1t1ct)) zcssol1t1r = -4.
[syntax repeated for each task in each ol mission]
EXECUTE.
* Note that in these cases the mission duration is unavailable because durn=0.
* and also end datetime is missing; only ct and rt are non-missing for task 1.
* if it was completed before the browser was closed.
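The main coding rules can be sketched in Python as follows. This is a simplified illustration: the function names are hypothetical, None stands for a missing value, and the -4 (terminated) code and the duration repairs are left out.

```python
from typing import List, Optional


def task_outcome(mission_type: str, cm: int, er: int) -> Optional[int]:
    """Combine completion (cm) and error (er) into one outcome code.

    mission_type is 'od', 'ol' or 'mn'.
    """
    if cm == 1 and er == 0:
        return 2 if mission_type == "mn" else 1  # success (most direct route for mn)
    if cm == 1 and er == 1 and mission_type == "mn":
        return 1                                 # correct but indirect route
    if cm == 0 and er == 1 and mission_type == "od":
        return 0                                 # wrong direction (od only)
    if cm == 0 and er == 0:
        return -1                                # timed out
    return None


def apply_discontinue(outcomes: List[int]) -> List[int]:
    """Recode tasks after a timed-out task from -1 to -2 (discontinued)."""
    t1, t2, t3 = outcomes
    if t1 == -1:
        t2, t3 = -2, -2
    if t2 == -1:
        t3 = -2
    return [t1, t2, t3]


print(apply_discontinue([task_outcome("od", 1, 0),
                         task_outcome("od", 0, 0),
                         task_outcome("od", 0, 0)]))  # [1, -1, -2]
```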
zcssmnXss1/2, zcssodXss1/2, zcssolXss1/2

Mission speed scores for each of the 9 missions in the CATSLife Spatial Spy battery. X is 1, 3 or 5 (the mission number) for each mission type. The mission types are Map Reading No Memory (zcssmn), Orientation Direction (zcssod) and Orientation Landmarks (zcssol).
Each speed score is derived from the mission duration (how long it took) as a decimal number between 0 and 1, where 1 represents the fastest possible correct response, lower values represent slower correct responses, and 0 is reserved for incorrect responses. For each mission, the 'fastest possible' correct response is determined from observed mission times in the data.
The derivation of each score is explained in the syntax below.

* As in 18yr navigation, derive these as decimals with range 0-1.
* If accuracy score is 0 then nothing was achieved so speed score should also be 0.
IF (zcssod1as = 0) zcssod1ss = 0.
[syntax repeated for the other 8 missions]
EXECUTE.
* In other cases, speed score is based on the mission durations (durn variables).
* Map reading missions: time limit 60 seconds (durn < 60 unless timed out hence zero score above).
* Minimum time needed for successful completion is 20 (mn1), 22 (mn3), 19 (mn5).
* Round durn down to nearest second, subtract it from 60 to get range roughly 0 (slow) to ~40 (fast).
* and divide by suitable (60 - minimum) value to get a speed score from 0 to maximum close to 1.
* (but only if accuracy score was non-zero); round to 0.01.
IF (zcssmn1as > 0) zcssmn1ss = RND(((60 - TRUNC(zcssmn1durn)) / (60 - 20)), 0.01).
IF (zcssmn3as > 0) zcssmn3ss = RND(((60 - TRUNC(zcssmn3durn)) / (60 - 22)), 0.01).
IF (zcssmn5as > 0) zcssmn5ss = RND(((60 - TRUNC(zcssmn5durn)) / (60 - 19)), 0.01).
EXECUTE.
* Orientation direction missions: time limit 60 for od1, 180 for od3 and od5.
* Minimum time needed for success is 20 (od1), 20 (od3), 25 (od5).
* Derive speed scores in the same way as above.
IF (zcssod1as > 0) zcssod1ss = RND(((60 - TRUNC(zcssod1durn)) / (60 - 20)), 0.01).
IF (zcssod3as > 0) zcssod3ss = RND(((180 - TRUNC(zcssod3durn)) / (180 - 20)), 0.01).
IF (zcssod5as > 0) zcssod5ss = RND(((180 - TRUNC(zcssod5durn)) / (180 - 25)), 0.01).
EXECUTE.
* Orientation landmarks missions: time limit 180.
* Minimum time needed for success is 43 (ol1), 54 (ol3), 86 (ol5).
IF (zcssol1as > 0) zcssol1ss = RND(((180 - TRUNC(zcssol1durn)) / (180 - 43)), 0.01).
IF (zcssol3as > 0) zcssol3ss = RND(((180 - TRUNC(zcssol3durn)) / (180 - 54)), 0.01).
IF (zcssol5as > 0) zcssol5ss = RND(((180 - TRUNC(zcssol5durn)) / (180 - 86)), 0.01).
EXECUTE.
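The speed score formula can be illustrated with a small Python sketch. The function names and example durations are hypothetical. The time limits and minimum times are those quoted in the syntax comments above, and ties are assumed to round upwards as with RND.

```python
import math
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def rnd(value: float, step: str = "0.01") -> float:
    """Round to the nearest multiple of step, ties upwards (assumed to mimic RND)."""
    return float(Decimal(str(value)).quantize(Decimal(step), rounding=ROUND_HALF_UP))


def speed_score(accuracy: Optional[int], duration: Optional[float],
                time_limit: int, min_time: int) -> Optional[float]:
    """Speed score: 0 if nothing was achieved, otherwise a scaled value up to about 1."""
    if accuracy is None:
        return None
    if accuracy == 0:
        return 0.0
    if duration is None:
        return None
    # Truncate the duration to whole seconds, then scale by (limit - minimum).
    return rnd((time_limit - math.trunc(duration)) / (time_limit - min_time))


print(speed_score(2, 30.4, 60, 20))     # mn1 example: (60 - 30) / 40 = 0.75
print(speed_score(3, 100.9, 180, 20))   # od3 example: (180 - 100) / 160 = 0.5
print(speed_score(1, 86.2, 180, 86))    # ol5 example: (180 - 86) / 94 = 1.0
```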
zcssmnXstat1/2, zcssodXstat1/2, zcssolXstat1/2, zcssstat1/2

CATSLife Spatial Spy activities: status flags for each mission and for the battery as a whole. Each status flag is coded 0=not started, 2=successfully completed, 3=data compromised. The overall flag may also be coded 1=started but not finished.
In all variable names, X is the mission number (1, 3 or 5). The mission types are Map Reading No Memory (zcssmn), Orientation Direction (zcssod) and Orientation Landmarks (zcssol).
The status flags are derived from raw item data, as indicated by comments in the syntax below. Repetitions of similar lines of syntax have sometimes been removed for the sake of brevity, and replaced by [comments like this].

* Spatial Spy activities (9 missions): make a data flag and a status variable for each mission.
* Overall data flag already exists; add overall status, and count of activities.
* Note: missions are numbered 1, 3, 5 not 1, 2, 3.
* (and each OD/OL mission comprises 3 tasks numbered t1, t2, t3).
* Start with status flags.
* Mission not started (0) if start datetime is missing.
IF (SYSMIS(zcssod1start)) zcssod1stat = 0.
[syntax repeated for each od, ol and mn mission]
EXECUTE.
* Finished (2) if end datetime is non-missing.
IF (~SYSMIS(zcssod1end)) zcssod1stat = 2.
[syntax repeated for each od, ol and mn mission]
EXECUTE.
* If start date is present but end date is missing, this really means.
* that the browser has been closed in mid-mission, so the mission has been.
* compromised rather than half-finished: set status flag to 3 rather than 1.
IF (SYSMIS(zcssod1end) & ~SYSMIS(zcssod1start)) zcssod1stat = 3.
[syntax repeated for each od, ol and mn mission]
EXECUTE.

* Unfinished (compromised) missions are not useful (they are fortunately rare).
* These are generally cases where the browser has been closed.
* and so at least one of tasks 1/2/3 is coded -4 above (does not seem to affect MN missions).
* In such cases, recode the mission item data to missing.
* (the status variable has already been set to 3 = compromised above).
DO IF (zcssod1stat = 3).
 RECODE zcssod1durn zcssod1start zcssod1end 
   zcssod1t1ct zcssod1t1er zcssod1t1rt zcssod1t1cm zcssod1t2ct zcssod1t2er zcssod1t2rt zcssod1t2cm 
   zcssod1t3ct zcssod1t3er zcssod1t3rt zcssod1t3cm zcssod1t1r zcssod1t2r zcssod1t3r (ELSE=SYSMIS).
END IF.
[syntax repeated for od missions 3 and 5]
EXECUTE.
DO IF (zcssol1stat = 3).
 RECODE zcssol1durn zcssol1start zcssol1end 
   zcssol1t1ct zcssol1t1er zcssol1t1rt zcssol1t1cm zcssol1t2ct zcssol1t2er zcssol1t2rt zcssol1t2cm 
   zcssol1t3ct zcssol1t3er zcssol1t3rt zcssol1t3cm zcssol1t1r zcssol1t2r zcssol1t3r (ELSE=SYSMIS).
END IF.
[syntax repeated for ol missions 3 and 5]
EXECUTE.
DO IF (zcssmn1stat = 3).
 RECODE zcssmn1durn zcssmn1start zcssmn1end 
  zcssmn1ct zcssmn1er zcssmn1rt zcssmn1cm zcssmn1r (ELSE=SYSMIS).
END IF.
[syntax repeated for mn missions 3 and 5]
EXECUTE.

* Convert status variables to data flags.
RECODE  zcssod1stat zcssod3stat zcssod5stat
 zcssol1stat zcssol3stat zcssol5stat
 zcssmn1stat zcssmn3stat zcssmn5stat
 (0=0) (2=1) (3=0)
INTO zcssod1data zcssod3data zcssod5data
 zcssol1data zcssol3data zcssol5data
 zcssmn1data zcssmn3data zcssmn5data.
EXECUTE.

* Count successfully completed missions as sum of data flags.
COMPUTE zcssnact = SUM(zcssod1data, zcssod3data, zcssod5data, 
 zcssol1data, zcssol3data, zcssol5data, zcssmn1data, zcssmn3data, zcssmn5data ).
EXECUTE.

* Overall Spatial Spy battery status: code 0/1/2 for now.
* Not started if no activities completed (could be that some started but were abandoned).
IF (zcssnact = 0) zcssstat = 0.
* Battery is clearly finished if all 9 missions successfully completed.
IF (zcssnact = 9) zcssstat = 2.
* Can also assume finished if last mission (od5) successfully completed.
* and fewer than 9 missions completed.
IF (zcssod5stat = 2 & RANGE(zcssnact, 1, 8)) zcssstat = 2.
* Can also assume finished if the following feedback activity was started.
* provided that at least one mission was successfully completed.
* (caters for cases where last activity was abandoned).
IF (zcssnact > 0 & ~SYSMIS(zcssfdbkstart)) zcssstat = 2.
* Battery started but unfinished in other cases.
* namely between 1 and 8 missions successfully completed, final mission not completed.
* and the following feedback activity was not started.
IF (RANGE(zcssnact, 1, 8) & ANY(zcssod5stat, 0, 1, 3) & SYSMIS(zcssfdbkstart)) zcssstat = 1.
EXECUTE.
* When cleaning (next script) look for cases with zcssstat = 2 & zcssnact < 8.
* (probably serial crashes making the battery invalid).
* and cases with zcssstat = 1 & zcssnact < 7 (decide on exclusion for incomplete data).

* There is at least one case of a major crash where at least one activity successfully completed.
* but other activities missing (status=0) and feedback started so zcssstat=2.
* Here, recode status from 0 to 3 for 'missing' missions.
DO IF (RANGE(zcssnact, 1, 8) & zcssstat = 2).
 RECODE zcssod1stat zcssod3stat zcssod5stat zcssol1stat zcssol3stat zcssol5stat 
    zcssmn1stat zcssmn3stat zcssmn5stat (0=3).
END IF.
EXECUTE.
* Recode the existing Spatial Spy data flag to 0 if no activities successfully completed.
IF (zcssstat = 0) zcssdata = 0.
EXECUTE.

* Now deal with exclusions (compromised data).
* Identify cases where multiple missions seem to have crashed (status=3).
COUNT zcsscrashed = zcssod1stat zcssod3stat zcssod5stat 
 zcssol1stat zcssol3stat zcssol5stat zcssmn1stat zcssmn3stat zcssmn5stat (3).
EXECUTE.
* Exclusions: recode Spatial Spy status from 0/1/2 to 3.
* Firstly, if started/finished and more than one mission crashed.
IF (ANY(zcssstat, 1, 2) & zcsscrashed > 1) zcssstat = 3.
* Secondly, if started and one mission crashed and < 7 successfully completed.
IF (zcssstat = 1 & zcsscrashed = 1 & zcssnact < 7) zcssstat = 3.
EXECUTE.
* Thirdly, assume data are compromised (unusable) if started, no crashes.
* but fewer than 4 missions successfully completed.
IF (zcssstat = 1 & zcsscrashed = 0 & zcssnact < 4) zcssstat = 3.
EXECUTE.
* Scores are nearly always low where both zcssdevwdth = 1 (small screen).
* and zcssdevmob = 1 (mobile device).
* Exclude all such cases.
IF (zcssdevwdth = 1 & zcssdevmob = 1 & ANY(zcssstat, 1, 2)) zcssstat = 3.
EXECUTE.
* Later, Spatial Spy data will be deleted where zcssstat = 3.
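The overall status logic is easier to follow when condensed into a single function. The Python sketch below is a hypothetical restatement of the rules above, not the TEDS implementation: the function and argument names are invented, and it assumes each mission status is already coded 0, 2 or 3.

```python
from typing import Dict

MISSIONS = ["od1", "od3", "od5", "ol1", "ol3", "ol5", "mn1", "mn3", "mn5"]


def battery_status(missions: Dict[str, int], feedback_started: bool,
                   small_mobile: bool = False) -> int:
    """Overall status: 0=not started, 1=unfinished, 2=finished, 3=compromised."""
    status = dict(missions)  # copy so the caller's data are untouched
    n_done = sum(1 for s in status.values() if s == 2)

    if n_done == 0:
        overall = 0
    elif n_done == 9 or status["od5"] == 2 or feedback_started:
        overall = 2
    else:
        overall = 1

    # Major crash: battery counts as finished but some missions have no data.
    if overall == 2 and 1 <= n_done <= 8:
        status = {k: (3 if s == 0 else s) for k, s in status.items()}

    # Exclusions, applied in the same order as in the syntax.
    n_crashed = sum(1 for s in status.values() if s == 3)
    if overall in (1, 2) and n_crashed > 1:
        overall = 3
    if overall == 1 and n_crashed == 1 and n_done < 7:
        overall = 3
    if overall == 1 and n_crashed == 0 and n_done < 4:
        overall = 3
    if small_mobile and overall in (1, 2):
        overall = 3
    return overall


all_done = {name: 2 for name in MISSIONS}
print(battery_status(all_done, feedback_started=True))  # 2
```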
zcssmnXts1/2, zcssodXts1/2, zcssolXts1/2

Mission total scores for each of the 9 missions in the CATSLife Spatial Spy battery. X is 1, 3 or 5 (the mission number) for each mission type. The mission types are Map Reading No Memory (zcssmn), Orientation Direction (zcssod) and Orientation Landmarks (zcssol).
For every mission, the total score is derived as a weighted mean of the accuracy score and the speed score (both described elsewhere on this page), with accuracy given twice the weighting of speed. This mean is scaled so that total score decimal values are in the range 0 to 1.

* As at 18yr, combine the accuracy score and the speed score as a mean.
* but giving the accuracy score twice the weighting of the speed score.
* and scale the result to decimal values 0-1.
* Map reading missions.
* Accuracy scores 0-2, speed scores 0-1.
COMPUTE zcssmn1ts = RND((SUM(zcssmn1as, zcssmn1ss) / 3), 0.01).
COMPUTE zcssmn3ts = RND((SUM(zcssmn3as, zcssmn3ss) / 3), 0.01).
COMPUTE zcssmn5ts = RND((SUM(zcssmn5as, zcssmn5ss) / 3), 0.01).
EXECUTE.
* Orientation missions.
* Accuracy scores 0-3, speed scores 0-1.
COMPUTE zcssod1ts = RND((SUM(((zcssod1as * 2) / 3), zcssod1ss) / 3), 0.01).
COMPUTE zcssod3ts = RND((SUM(((zcssod3as * 2) / 3), zcssod3ss) / 3), 0.01).
COMPUTE zcssod5ts = RND((SUM(((zcssod5as * 2) / 3), zcssod5ss) / 3), 0.01).
COMPUTE zcssol1ts = RND((SUM(((zcssol1as * 2) / 3), zcssol1ss) / 3), 0.01).
COMPUTE zcssol3ts = RND((SUM(((zcssol3as * 2) / 3), zcssol3ss) / 3), 0.01).
COMPUTE zcssol5ts = RND((SUM(((zcssol5as * 2) / 3), zcssol5ss) / 3), 0.01).
EXECUTE.
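The weighting can be written as one formula: the accuracy score is rescaled to 0-1, given weight 2, added to the speed score (weight 1) and divided by 3. The Python sketch below illustrates this with hypothetical names and example values. Like SUM in the syntax, it ignores a missing component rather than returning a missing result.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def rnd(value: float, step: str = "0.01") -> float:
    """Round to the nearest multiple of step, ties upwards (assumed to mimic RND)."""
    return float(Decimal(str(value)).quantize(Decimal(step), rounding=ROUND_HALF_UP))


def total_score(accuracy: Optional[int], speed: Optional[float],
                max_accuracy: int) -> Optional[float]:
    """Weighted mean of accuracy (weight 2, rescaled to 0-1) and speed (weight 1)."""
    parts = []
    if accuracy is not None:
        parts.append(2 * accuracy / max_accuracy)
    if speed is not None:
        parts.append(speed)
    if not parts:
        return None
    return rnd(sum(parts) / 3)


print(total_score(2, 0.75, max_accuracy=2))  # map reading: (2 + 0.75) / 3 -> 0.92
print(total_score(2, 0.5, max_accuracy=3))   # orientation: (4/3 + 0.5) / 3 -> 0.61
```

With max_accuracy = 2 the first term reduces to the accuracy score itself, which matches the simpler map reading syntax.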
zcssodas1/2, zcssolas1/2, zcssmnas1/2, zcssas1/2, zcssodss1/2, zcssolss1/2, zcssmnss1/2, zcssss1/2, zcssodts1/2, zcssolts1/2, zcssmnts1/2, zcssts1/2

Summary scores for the CATSLife Spatial Spy battery. These include overall scores for each of the three mission types (Map Reading No Memory: zcssmn; Orientation Direction: zcssod; Orientation Landmarks: zcssol), and overall across all three mission types. There are three types of score: accuracy score (as), speed score (ss) and total score (ts). Each is derived as a mean, starting from the scores already derived for each individual mission (see elsewhere on this page).
Every one of these scores has decimal values in the range 0 to 1, rounded to the nearest 0.01. The derivation is further explained in the syntax below.

* Summary scores.
* First for each of the 3 mission types, then overall across all mission types.
* Accuracy, speed and total scores.
* These are all decimal so make all in range 0-1.
* For each mission type, take the mean across the 3 missions and (if necessary) scale to 0-1.
* also rounding values to the nearest 0.01.
* Require at least 2 of 3 missions to be completed in each case.
COMPUTE zcssodas = RND(SUM.2(zcssod1as, zcssod3as, zcssod5as) / 9, 0.01).
COMPUTE zcssolas = RND(SUM.2(zcssol1as, zcssol3as, zcssol5as) / 9, 0.01).
COMPUTE zcssmnas = RND(SUM.2(zcssmn1as, zcssmn3as, zcssmn5as) / 6, 0.01).
COMPUTE zcssodss = RND(MEAN.2(zcssod1ss, zcssod3ss, zcssod5ss), 0.01).
COMPUTE zcssolss = RND(MEAN.2(zcssol1ss, zcssol3ss, zcssol5ss), 0.01).
COMPUTE zcssmnss = RND(MEAN.2(zcssmn1ss, zcssmn3ss, zcssmn5ss), 0.01).
COMPUTE zcssodts = RND(MEAN.2(zcssod1ts, zcssod3ts, zcssod5ts), 0.01).
COMPUTE zcssolts = RND(MEAN.2(zcssol1ts, zcssol3ts, zcssol5ts), 0.01).
COMPUTE zcssmnts = RND(MEAN.2(zcssmn1ts, zcssmn3ts, zcssmn5ts), 0.01).
EXECUTE.
* Grand total Spatial Spy scores: simply take the mean across the 3 mission types.
* requiring all 3 to be non-missing.
COMPUTE zcssas = RND(MEAN.3(zcssodas, zcssolas, zcssmnas), 0.01).
COMPUTE zcssss = RND(MEAN.3(zcssodss, zcssolss, zcssmnss), 0.01).
COMPUTE zcssts = RND(MEAN.3(zcssodts, zcssolts, zcssmnts), 0.01).
EXECUTE.
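The following Python sketch illustrates the SUM.2, MEAN.2 and MEAN.3 rules used above, with hypothetical function names and example values (None stands for a missing score). One consequence of the syntax as written is visible in the example: when one of three missions is missing, the accuracy sum is still divided by the full maximum (9 or 6), whereas the speed and total scores are means of the available missions.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional, Sequence


def rnd(value: float, step: str = "0.01") -> float:
    """Round to the nearest multiple of step, ties upwards (assumed to mimic RND)."""
    return float(Decimal(str(value)).quantize(Decimal(step), rounding=ROUND_HALF_UP))


def valid(values: Sequence[Optional[float]]) -> List[float]:
    return [v for v in values if v is not None]


def type_accuracy(scores: Sequence[Optional[float]], max_total: int,
                  min_valid: int = 2) -> Optional[float]:
    """Like RND(SUM.2(...) / max_total, 0.01)."""
    present = valid(scores)
    if len(present) < min_valid:
        return None
    return rnd(sum(present) / max_total)


def type_mean(scores: Sequence[Optional[float]],
              min_valid: int = 2) -> Optional[float]:
    """Like RND(MEAN.n(...), 0.01), with n = min_valid."""
    present = valid(scores)
    if len(present) < min_valid:
        return None
    return rnd(sum(present) / len(present))


# Hypothetical orientation scores with mission 5 missing.
print(type_accuracy([3, 2, None], max_total=9))   # 5 / 9 -> 0.56
print(type_mean([0.8, 0.6, None]))                # 0.7
# Grand score across mission types requires all three (MEAN.3).
print(type_mean([0.56, 1.0, 0.5], min_valid=3))   # 0.69
```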
zctdurn1/2

See zcdurn1/2, zcssdurn1/2, zctdurn1/2 above.

zctmnstat1/2, zctmwstat1/2, zctsnstat1/2, zctvcstat1/2, zctstat1/2

Status flags for each of the four CATSLife Test My Brain tests, and overall. Each is coded 0=not started or 2=completed. The overall status flag may also be coded 1=started but not finished, if between 1 and 3 of the 4 tests were completed.
The tests are Remembering Numbers (zctmn), Remembering Word Pairs (zctmw), Digit Symbol Matching (zctsn) and Vocabulary (zctvc).
The status flags are derived from reliable test data flags already present in the raw item data. Note that each test is apparently complete, with no unfinished tests present.
The status flags may be recoded to value 4=excluded because of random responding or lack of engagement by a twin for each test. This recoding is not shown here.

* Reliable data flags already exist for the 4 TMB activities: ensure non-missing.
RECODE zctvcdata zctsndata zctmwdata zctmndata (SYSMIS=0).
EXECUTE.
* Derive status variables and an overall data flag from these.
NUMERIC zctvcstat zctsnstat zctmwstat zctmnstat zctstat zctdata (F1.0).
RECODE zctvcdata zctsndata zctmwdata zctmndata
 (0=0) (1=2)
INTO zctvcstat zctsnstat zctmwstat zctmnstat.
EXECUTE.
* and overall TMB status.
IF (SUM(zctvcdata, zctsndata, zctmwdata, zctmndata) = 4) zctstat = 2.
IF (SUM(zctvcdata, zctsndata, zctmwdata, zctmndata) = 0) zctstat = 0.
IF (RANGE(SUM(zctvcdata, zctsndata, zctmwdata, zctmndata), 1, 3)) zctstat = 1.
EXECUTE.
* recode this into overall data flag.
RECODE zctstat (0=0) (1=1) (2=1) INTO zctdata.
EXECUTE.
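As an illustration of the flag-to-status mapping (before exclusions), here is a minimal Python sketch. The function name is hypothetical and None stands for a missing data flag.

```python
from typing import List, Optional, Sequence, Tuple


def tmb_status(flags: Sequence[Optional[int]]) -> Tuple[List[int], int, int]:
    """Return (per-test status list, overall status, overall data flag).

    flags holds the four test data flags (1 = data present, 0 or None = absent).
    """
    clean = [f if f is not None else 0 for f in flags]
    test_status = [2 if f == 1 else 0 for f in clean]
    n_done = sum(clean)
    if n_done == 4:
        overall = 2      # all four tests completed
    elif n_done == 0:
        overall = 0      # nothing started
    else:
        overall = 1      # between 1 and 3 tests completed
    data_flag = 0 if overall == 0 else 1
    return test_status, overall, data_flag


print(tmb_status([1, 1, None, 0]))  # ([2, 2, 0, 0], 1, 1)
```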

* Now deal with exclusions.
* Matching Shapes and Numbers test.
* Use quality control guidelines as suggested by Corianna (Many Brains).
* to detect disengaged twins and code status as 4.
* generally where twins are clicking either very rapidly or very slowly or randomly.
* First: invalid if < x correct.
* Corianna suggested 6 but data show < 35 are extreme outliers.
IF (zctsntot < 35) zctsnstat = 4.
EXECUTE.
* Second: invalid if fewer than x% correct of those answered.
* Corianna suggested 50% but data show 75% detects outliers.
IF ((zctsntot / zctsnatn) < 0.75) zctsnstat = 4.
EXECUTE.
* Third: invalid if implausibly fast reaction time.
* Corianna suggests zctsncortmd < 300ms but < 500ms are very extreme outliers.
IF (zctsncortmd < 500) zctsnstat = 4.
EXECUTE.
* Fourth: invalid if SD/median for correct response times > x.
* Corianna suggests > 2 but > 1.2 cuts off extreme outliers.
IF ((zctsncortsd / zctsncortmd) > 1.2) zctsnstat = 4.
EXECUTE.
* Fifth (not suggested by Corianna): invalid if same key pressed repeatedly.
* Graphs show clear pattern for (number times key x pressed) / total score > y.
* where y is 0.5 for keys 2 and 3, y is 0.4 for key 1.
* (fewer correct responses resulted from key 1 than from keys 2 and 3).
IF ((zctsna1n / zctsntot) > 0.4) zctsnstat = 4.
IF ((zctsna2n / zctsntot) > 0.5) zctsnstat = 4.
IF ((zctsna3n / zctsntot) > 0.5) zctsnstat = 4.
EXECUTE.
* These exclusion categories overlap to a large extent.

* Vocabulary test.
* Fewer problems with this straightforward test.
* which allows items to be timed out then repeated.
* First exclusion: very rapid responding.
* Mean item response times < 1.5s are outlying and always have low scores.
IF (zctvctmn < 1.5) zctvcstat = 4.
EXECUTE.
* Second exclusion: repeated (probably random) selection of same answer option.
* Detect using temporary variable: SD of item responses (having values 1-5).
COMPUTE zctvcasd = SD(zctvc01a, zctvc02a, zctvc03a, zctvc04a, zctvc05a, zctvc06a, zctvc07a, 
  zctvc08a, zctvc09a, zctvc10a, zctvc11a, zctvc12a, zctvc13a, zctvc14a, zctvc15a, zctvc16a, 
  zctvc17a, zctvc18a, zctvc19a, zctvc20a).
EXECUTE.
* Values below about 1.1 are very low, below 1.0 are extreme.
* Exclude outliers with SD < 1.1 if score <= 6 (probably random).
IF ((zctvcasd < 1.1) & (zctvctot <= 6)) zctvcstat = 4.
EXECUTE.

* Remembering Words test.
* Test has a similar structure to Vocabulary.
* but data show more problems because of high difficulty levels.
* hence presumably participant boredom and loss of engagement.
* Take care not to exclude twins who simply could not remember any word pairs.
* so focus on the first 10 items where twins should have been trying hard.
* First exclusion: very rapid responding in first 10 items.
COMPUTE zctmwtmn10 = RND((MEAN(zctmw01t, zctmw02t, zctmw03t, zctmw04t, zctmw05t, zctmw06t, 
 zctmw07t, zctmw08t, zctmw09t, zctmw10t) / 1000), 0.1).
EXECUTE.
* Mean item response times <= 1.6s are outlying and always have low scores.
IF (zctmwtmn10 <= 1.6) zctmwstat = 4.
EXECUTE.
* Second exclusion: same response repeatedly chosen in first 10 items.
COMPUTE zctmwasd10 = SD(zctmw01a, zctmw02a, zctmw03a, zctmw04a, zctmw05a, zctmw06a, zctmw07a, 
  zctmw08a, zctmw09a, zctmw10a).
EXECUTE.
* Values of 0.6 or below are clearly outlying and have low scores, so exclude.
IF (zctmwasd10 <= 0.6) zctmwstat = 4.
EXECUTE.

* Remembering Numbers test.
* This test seems to have been robust, short and with few problems.
* The test had a discontinue rule: discontinue if both items wrong at a given level.
* which seems to have eliminated some problems.
* Items had no time limit: very long times may suggest disengagement.
* although rapid responding does not seem to have been a factor.
* First exclusion: discontinued after first two items (minimal score = 1).
* with comparatively long median item response time of > 5s.
IF (zctmntot = 1 & zctmntmd > 5) zctmnstat = 4.
EXECUTE.
* Second exclusion: apparently random responses in the first two items.
* leading to immediate discontinue.
* Correct responses were '87' and '13' respectively.
* so exclude if neither response contained any correct digits.
* (indicating that twin misunderstood or didn't try).
IF (zctmnstat > 0
  & (~ANY(CHAR.SUBSTR(zctmn01a, 1, 1), '7', '8'))
  & (~ANY(CHAR.SUBSTR(zctmn01a, 2, 1), '7', '8'))
  & (~ANY(CHAR.SUBSTR(zctmn02a, 1, 1), '1', '3'))
  & (~ANY(CHAR.SUBSTR(zctmn02a, 2, 1), '1', '3'))) zctmnstat = 4.
EXECUTE.
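
As an illustration only, the overall status rule and the five Digit Symbol Matching exclusion rules above can be sketched in Python. This is not part of the TEDS syntax: the function and argument names are hypothetical, the thresholds are copied from the syntax above, and missing values are assumed to have been dealt with beforehand.

```python
def overall_status(data_flags):
    """Overall TMB status from the four 0/1 data flags: 0 = none, 1 = some, 2 = all."""
    n_present = sum(data_flags)
    if n_present == len(data_flags):
        return 2
    return 1 if n_present > 0 else 0


def digit_symbol_excluded(total, attempted, rt_median_ms, rt_sd_ms, key_counts):
    """Return True if any of the five disengagement rules applies.

    key_counts holds the number of presses of keys 1, 2 and 3 (hypothetical layout).
    """
    if total < 35:                                    # rule 1: too few correct
        return True
    if attempted > 0 and total / attempted < 0.75:    # rule 2: low proportion correct
        return True
    if rt_median_ms < 500:                            # rule 3: implausibly fast
        return True
    if rt_sd_ms / rt_median_ms > 1.2:                 # rule 4: erratic response times
        return True
    limits = (0.4, 0.5, 0.5)                          # rule 5: same key pressed repeatedly
    return any(n / total > limit for n, limit in zip(key_counts, limits))
```

A twin for whom `digit_symbol_excluded` returns True would correspond to status value 4 for this test.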
zctmntmd1/2, zctmwtmn1/2, zctvctmn1/2

Average item response times for CATSLife Test My Brain tests:
zctmntmd is the median time for the Remembering Numbers test;
zctmwtmn is the mean time for the Remembering Words test;
zctvctmn is the mean time for the Vocabulary test.
All are derived as the mean or median of the respective item response times. The item times are measured in milliseconds; the derived averages are converted to seconds and rounded to the nearest 0.1. A median is used for Remembering Numbers because this test had no item time limit, so there were extreme high outliers that would distort a mean.
No derivation was needed for the Digit Symbol Matching test, for which average times were generated on the web server and included as raw item variables.

* Derive mean times for Vocabulary and Remembering Words.
* and median time for Remembering Numbers, all converted to seconds and rounded to 0.1 .
COMPUTE zctvctmn = RND((MEAN(zctvc01t, zctvc02t, zctvc03t, zctvc04t, zctvc05t, zctvc06t, 
 zctvc07t, zctvc08t, zctvc09t, zctvc10t, zctvc11t, zctvc12t, zctvc13t, zctvc14t, zctvc15t, 
 zctvc16t, zctvc17t, zctvc18t, zctvc19t, zctvc20t) / 1000), 0.1).
COMPUTE zctmwtmn = RND((MEAN(zctmw01t, zctmw02t, zctmw03t, zctmw04t, zctmw05t, zctmw06t, 
 zctmw07t, zctmw08t, zctmw09t, zctmw10t, zctmw11t, zctmw12t, zctmw13t, zctmw14t, zctmw15t, 
 zctmw16t, zctmw17t, zctmw18t, zctmw19t, zctmw20t, zctmw21t, zctmw22t, zctmw23t, zctmw24t, 
 zctmw25t) / 1000), 0.1).
COMPUTE zctmntmd = RND((MEDIAN(zctmn01t, zctmn02t, zctmn03t, zctmn04t, zctmn05t, zctmn06t, 
 zctmn07t, zctmn08t, zctmn09t, zctmn10t, zctmn11t, zctmn12t, zctmn13t, zctmn14t, zctmn15t, 
 zctmn16t, zctmn17t, zctmn18t, zctmn19t, zctmn20t) / 1000), 0.1).
EXECUTE.
* These can be used for identifying exclusions.
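
The same calculation can be sketched in Python, as a rough illustration rather than the method actually used. The function name is hypothetical, missing item times are assumed to be represented as None, and Python's round may differ from SPSS RND at exact ties.

```python
from statistics import mean, median


def average_seconds(times_ms, use_median=False):
    """Average of the non-missing item times, converted from ms to seconds (0.1 precision)."""
    valid = [t for t in times_ms if t is not None]
    if not valid:
        return None
    average = median(valid) if use_median else mean(valid)
    return round(average / 1000, 1)
```

With item times of 1000, 2000 and 60000 ms, the mean is 21.0 seconds but the median is 2.0 seconds, which shows why a median is preferred when there is no item time limit.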
zctmntotc1/2, zctsncorp1/2

Alternative scores for CATSLife Test My Brain tests. For each of the four tests, there is a test score that was generated on the web server and is therefore treated as an item variable. Here, the raw test scores are supplemented by other potentially useful scores.
zctmntotc is a conventional score, measuring the number of items answered correctly, for the Remembering Numbers test (the raw score, zctmntot, has a different meaning). It is derived as a sum of the item scores and has integer values in the range 0 to 20.
zctsncorp measures the proportion of items answered, within the 90 second time limit, that were answered correctly. It has decimal values between 0 and 1.

* Remembering Numbers.
* The raw score is the longest string of numbers remembered (1-11).
* Add a conventional score: number of items correct (0-20).
COMPUTE zctmntotc = SUM(zctmn01s, zctmn02s, zctmn03s, zctmn04s, zctmn05s, zctmn06s, 
 zctmn07s, zctmn08s, zctmn09s, zctmn10s, zctmn11s, zctmn12s, zctmn13s, zctmn14s, 
 zctmn15s, zctmn16s, zctmn17s, zctmn18s, zctmn19s, zctmn20s).
EXECUTE.

* Matching Shapes and Numbers.
* Raw score is number correct within the 90 second time limit.
* Make an alternative score: divide number correct by number attempted.
IF (zctsnatn > 0) zctsncorp = RND((zctsntot / zctsnatn), 0.01).
EXECUTE.
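
A minimal Python sketch of the two alternative scores, for illustration only. The function names are hypothetical and missing values are assumed to be represented as None.

```python
def items_correct(item_scores):
    """Conventional score: number of items answered correctly (item scores are 0/1)."""
    valid = [s for s in item_scores if s is not None]
    return sum(valid) if valid else None


def proportion_correct(total, attempted):
    """Number correct divided by number attempted, rounded to 0.01."""
    if total is None or attempted is None or attempted <= 0:
        return None
    return round(total / attempted, 2)
```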
zmhage1/2

Age of twin (in decimal years) when the MHQ (TEDS26 questionnaire) was started (if online) or returned (if on paper). Derived from variables representing respective dates, as mentioned in syntax comments; aonsdob is the twin birth date. These date variables are not retained in the dataset.

* First we need best estimate of MHQ date (temp variable).
* MHQ logged as paper booklet return: use return date.
IF (zmhdata1 = 1 & zmhpaper1 = 1) zmhdate = zmhrdate1.
* If not logged as paper booklet return, use start date from Qualtrics.
IF (zmhdata1 = 1 & zmhpaper1 = 0) zmhdate = startDate.
EXECUTE.
* Use this date to derive twin age, by subtraction of birth date.
IF (zmhdata1 = 1) zmhage = RND(((DATEDIFF(zmhdate, aonsdob, "days")) / 365.25), 0.1) .
EXECUTE.
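
The age calculation can be sketched in Python as follows. This is an illustration under the assumption that both dates are available as date objects; the function name is hypothetical.

```python
from datetime import date


def age_decimal_years(birth_date, event_date):
    """Age in decimal years: whole days elapsed divided by 365.25, rounded to 0.1."""
    return round((event_date - birth_date).days / 365.25, 1)
```

For example, a twin born on 15 June 1995 with an MHQ date of 15 December 2021 has an age of 26.5.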
zmhagoradiag1/2

Diagnosis flag, coded 1=yes 0=no, for Agoraphobia, derived from responses in the measure of the same name in the twin MHQ.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Agoraphobia diagnosis.
* The conditions for diagnosis are.
* (1) At least 2 of the 5 initial screening items (1a-1e) are 'yes' (coded 1).
* and (2) Situations always or almost always cause fear - item 2.
* and (3) One or more of.
*        (a) avoid situations - item 3a.
*        (b) endure with intense anxiety - item 3b.
*        (c) require companion - item 3c.
* and (4) One or more of.
*        (a) worried about fainting, etc - item 4a.
*        (b) worried about escape - item 4b.
*        (c) worried help not available - item 4c.
* and (5) Fears lasted over 6 months - item 5c.
* and (6) Fears interfered with everyday life (some or a lot) - item 6.
* and (7) Fears out of proportion - item 7.

* Note that, in the questionnaire, conditions (1) and (2) screen by branching.
* so if condition (1) is negative then the rest of the items are missing.
* while if condition (1) is positive but condition (2) is negative.
* then all the remaining items are missing.

* Convert each of these conditions into a 1/0 flag (note item 7 already coded this way).
* For condition (1), sum the 5 parts of item 1 (all parts coded 0/1).
COMPUTE zmhagora01sum = SUM(zmhagora01a, zmhagora01b, zmhagora01c, zmhagora01d, zmhagora01e).
EXECUTE.
* and count the number of parts that are non-missing (if agora data present).
DO IF (ANY(zmhagorastat, 1, 2)).
 COUNT zmhagora01count = zmhagora01a zmhagora01b zmhagora01c zmhagora01d zmhagora01e (0, 1).
END IF.
EXECUTE.
* Condition (1) is positive if sum is 2 or more, regardless of any missing data.
IF (zmhagora01sum >= 2) zmhagora01flag = 1.
EXECUTE.
* Negative if sum is 0 and at least 4 flags are present.
* (unknown if sum is 0 and 2+ flags missing because these might be positive).
IF (zmhagora01sum = 0 & zmhagora01count >= 4) zmhagora01flag = 0.
EXECUTE.
* Also negative if sum is 1 and all 5 flags are present.
* (unknown if sum is 1 and 1 or more flags missing because these might be positive).
IF (zmhagora01sum = 1 & zmhagora01count = 5) zmhagora01flag = 0.
EXECUTE.
RECODE zmhagora02 (3 THRU 4=1) (0 THRU 2=0) INTO zmhagora02flag.
EXECUTE.
* In item 3 parts a-c, for a negative result we need all 3 parts to be non-missing.
* (if 1 or 2 are negative but others are missing, this flag will be missing).
IF (SUM.3(zmhagora03a, zmhagora03b, zmhagora03c) = 0) zmhagora03flag = 0.
IF (SUM(zmhagora03a, zmhagora03b, zmhagora03c) > 0) zmhagora03flag = 1.
EXECUTE.
* Likewise in item 4 parts a-c, for a negative result we need all 3 parts to be non-missing.
IF (SUM.3(zmhagora04a, zmhagora04b, zmhagora04c) = 0) zmhagora04flag = 0.
IF (SUM(zmhagora04a, zmhagora04b, zmhagora04c) > 0) zmhagora04flag = 1.
EXECUTE.
RECODE zmhagora05c (1=0) (2 THRU 5=1) INTO zmhagora05cflag.
EXECUTE.
RECODE zmhagora06 (0 THRU 1=0) (2 THRU 3=1) INTO zmhagora06flag.
EXECUTE.
* Now sum the flags for the 5 post-screening criteria to get a score 0-5.
COMPUTE zmhagoradiagscore = SUM(zmhagora03flag, zmhagora04flag, zmhagora05cflag, zmhagora06flag, zmhagora07).
EXECUTE.
* Also count the number of non-missing post-screening criteria variables (if agoraphobia data present).
DO IF (ANY(zmhagorastat, 1, 2)).
 COUNT zmhagoradiagflags = zmhagora03flag zmhagora04flag zmhagora05cflag zmhagora06flag zmhagora07 (0, 1).
END IF.
EXECUTE.

* Now set the diagnosis flag (0 or 1).
* Note that, for twins screening positive on items 1 and 2.
* over 80% have positive diagnostic results in each of the other 5 criteria.
* so allow a probable positive diagnosis if 4 criteria are met and the other is missing.
* Diagnosis is positive if positively screened and all 5 other criteria are met.
IF (zmhagora01flag = 1 & zmhagora02flag = 1 & zmhagoradiagscore = 5) zmhagoradiag = 1.
* Also allow positive diagnosis if 4 criteria are met, and the other 1 is missing.
IF (zmhagora01flag = 1 & zmhagora02flag = 1 & zmhagoradiagscore = 4 & zmhagoradiagflags = 4) zmhagoradiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by item 1 or item 2.
IF (zmhagora01flag = 0) zmhagoradiag = 0.
IF (zmhagora01flag = 1 & zmhagora02flag = 0) zmhagoradiag = 0.
EXECUTE.
* Diagnosis is also negative if items 1 and 2 positive but.
* the other 5 criteria are all non-missing and the score is less than 5.
IF (zmhagora01flag = 1 & zmhagora02flag = 1 & zmhagoradiagscore < 5 & zmhagoradiagflags = 5) zmhagoradiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 3 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the five criteria is non-missing and negative).
IF (zmhagora01flag = 1 & zmhagora02flag = 1 & zmhagoradiagscore <= 3 
    & (zmhagoradiagscore < zmhagoradiagflags) ) zmhagoradiag = 0.
EXECUTE.
* In all other (rare) cases, the diagnosis variable will be missing.
* including cases where item 1 flags positive and item 2 is missing.
* and cases where items 1 and 2 flag positive and (a) the 5 other criteria are all missing.
* or (b) 1 is positive and the other 4 missing, or (c) 2 are positive and the other 3 missing.
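
The final decision step above can be summarised by the following Python sketch. It is an illustration of the logic, not the syntax used by TEDS: the function name is hypothetical, the inputs are the flags for conditions (1) and (2) plus a list of the five post-screening criteria flags, and None stands for a missing value.

```python
def agoraphobia_diagnosis(flag1, flag2, criteria):
    """Return 1 (diagnosis), 0 (no diagnosis) or None (cannot be determined)."""
    if flag1 == 0:
        return 0                      # screened out by item 1
    if flag1 == 1 and flag2 == 0:
        return 0                      # screened out by item 2
    if flag1 != 1 or flag2 != 1:
        return None                   # a screening flag is missing
    valid = [c for c in criteria if c is not None]
    if not valid:
        return None                   # all five criteria missing
    score, n_present = sum(valid), len(valid)
    if score == 5 or (score == 4 and n_present == 4):
        return 1                      # all met, or 4 met with the other missing
    if score < 5 and n_present == 5:
        return 0                      # complete data, not all criteria met
    if score <= 3 and score < n_present:
        return 0                      # at least one criterion present and negative
    return None
```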
zmhagoradurn1/2, zmhalcodurn1/2, zmhbdddurn1/2, etc, and
zmhagorapaus1/2, zmhalcopaus1/2, zmhbddpaus1/2, etc

zmhXXXXdurn1/2: duration in seconds of each block (section) in the twin MHQ.
zmhXXXXpaus1/2: flag to show that a pause occurred during the block (1=yes 0=no).
(XXXX denotes the block name abbreviation.)
The duration is measured by subtracting the start date-time from the end date-time.
A pause is detected in two ways: firstly, if the block did not start and end on the same day; and secondly if the duration exceeded a cut-off at the extreme high end of the distribution (typically 5 minutes). Where a pause is detected, the duration value is deleted, hence removing extreme outliers in the duration distribution.
There are roughly 40 blocks in the questionnaire. For the sake of brevity, the syntax below is shown for only a few of the blocks.

* Derive a duration for each block, using the start/end date/time variables.
* Contraception block was not attempted if not female (item zmhgender).
* but for some reason start and end times are still recorded so recode those times to missing.
DO IF (SYSMIS(zmhgender) | zmhgender ~= 2).
 RECODE Contraception_EndDate Contraception_StartDate Contraception_EndTime Contraception_StartTime 
 (ELSE=SYSMIS).
END IF.
EXECUTE.
* If started and ended on the same date, simply subtract times as number of seconds.
IF (DATEDIFF(Dem_EndDate, Dem_StartDate, 'days') = 0) 
  zmhdemogdurn = DATEDIFF(Dem_EndTime, Dem_StartTime, 'seconds').
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* If started and ended either side of midnight (consecutive days), subtract and adjust.
IF (DATEDIFF(Dem_EndDate, Dem_StartDate, 'days') = 1) 
  zmhdemogdurn = DATEDIFF(Dem_EndTime, Dem_StartTime, 'seconds') + (24 * 60 * 60).
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* In other cases (difference of 2+ days) the duration is missing.

* Remove outlying long times, also flagging probable pauses (number of days more than 1.
* or duration more than X, where X varies between measures according to length).
* The aim is to get a meaningful distribution of times without extreme outliers for each block.
* and to get an indication of where a block was probably paused (or at least, took a very long time).
* First set each pause flag to a default value of 0 if the block was completed.
* Use a modified rule for Contraception block: only attempted if female (zmhgender=2).
* but can still use end date to determine if completed.
IF ((~SYSMIS(Contraception_EndDate)) & zmhgender = 2) zmhcontrapaus = 0.
EXECUTE.
* All other blocks: flag as 0 (not paused) by default.
* provided that the block was completed (end date not missing).
IF (~SYSMIS(Dem_EndDate)) zmhdemogpaus = 0.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Flag as paused if 2 or more days have elapsed, meaning a definite pause.
IF (DATEDIFF(Dem_EndDate, Dem_StartDate, 'days') > 1) zmhdemogpaus = 1.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Flag as paused for extreme outliers of high duration.
* Attempt to identify genuine pauses rather than just slow completion.
* but using cut-offs beyond the region of continuous variation shown by the histograms.
* By default, use a minimum cut-off of 300s (5 min) for each measure.
* but increase this to 10 min or 15 min for longer measures; the 98%-ile is a rough guide.
IF (zmhdemogdurn > 300) zmhdemogpaus = 1.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS, with some variations in the cut-off]

* and recode the outlying durations to missing if above these cut-offs.
RECODE zmhdemogdurn zmhmedhisdurn zmhphqdurn zmhmhddurn zmhmfqdurn 
 zmhganxdurn zmhspephdurn zmhsocphdurn zmhagoradurn zmhgaddurn zmhphqdepdurn 
 zmhwasasdurn zmhctqdurn zmhatsdurn zmhpcldurn zmhlifevdurn zmhslfhmdurn 
 zmhqoldurn zmhbdddurn zmhconndurn zmhraadsdurn 
 zmhicudurn zmhcanndurn zmhsmodurn zmhdietdurn zmhexerdurn zmhphydurn 
 zmhsaspddurn zmhspeqheddurn
 (301 THRU HIGHEST=SYSMIS).
EXECUTE.
RECODE zmhpanicdurn zmhsdqdurn zmheatddurn zmhmctqdurn 
 zmhspeqdurn zmhmdqdurn zmhcontradurn zmhalcodurn
 (601 THRU HIGHEST=SYSMIS).
RECODE
 zmhcididdurn zmhcidiadurn
 (901 THRU HIGHEST=SYSMIS).
EXECUTE.
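
The duration and pause rules can be sketched in Python for a single completed block. This is a simplified illustration: it assumes that each date and time pair has been combined into one datetime value, the function name is hypothetical, and the cut-off defaults to the 300 seconds used for most blocks.

```python
from datetime import datetime


def block_duration(start, end, cutoff=300):
    """Return (duration in seconds or None, pause flag) for one completed block."""
    days = (end.date() - start.date()).days
    # A duration is only derived for the same day or either side of midnight.
    seconds = (end - start).total_seconds() if days in (0, 1) else None
    paused = 1 if days > 1 or (seconds is not None and seconds > cutoff) else 0
    if paused:
        seconds = None  # outlying durations are set to missing
    return seconds, paused
```

Passing `cutoff=600` or `cutoff=900` would correspond to the longer measures that use 10 or 15 minute cut-offs.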
zmhagorastat1/2, zmhalcostat1/2, zmhbddstat1/2, etc

Status variable for each block (section) in the twin MHQ.
Each is coded 0=not started, 1=started but not finished, 2=completed successfully, 3=entire block skipped, 4=excluded as a careless responder.
There are roughly 40 blocks in the questionnaire. For the sake of brevity, the syntax below is shown for only a few of the blocks.

* Add a status variable for each of the 39 data 'blocks' in Qualtrics (generally one measure per block).
* ignoring admin blocks prior to Demographics and after SPEQ Hedonia.
* Each status variable will be coded 0=not started, 1=started but not finished, 2=finished.
* 3=skipped (branched out or all questions skipped), 4=excluded due to random responding (coded in next script).

* Not started if first item is missing: code 0.
* Note that optional items, if seen but not answered, still have non-missing -99 values.
* which are later recoded to missing in the dataset.
* If block was skipped due to branching (wasas, contra), will be recoded from 0 to 3 later.
IF (SYSMIS(zmhrelst)) zmhdemogstat = 0.
IF (SYSMIS(height)) zmhmedhisstat = 0.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Started but not finished: code 1.
* Detect if first item is non-missing but end date is missing.
IF (~SYSMIS(zmhrelst) & SYSMIS(Dem_EndDate)) zmhdemogstat = 1.
IF (~SYSMIS(height) & SYSMIS(MedHis_EndDate)) zmhmedhisstat = 1.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Finished: code 2.
* Detect if first item is non-missing and end date is also non-missing.
IF (~SYSMIS(zmhrelst) & ~SYSMIS(Dem_EndDate)) zmhdemogstat = 2.
IF (~SYSMIS(height) & ~SYSMIS(MedHis_EndDate)) zmhmedhisstat = 2.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Skipped with no meaningful data: code 3.
* Note that this category does not apply in blocks that are non-branched.
*   and include at least one item that has forced response (cannot be -99).
*   and that has no opt-out responses (don't know, prefer not to answer).
*   These blocks that cannot be skipped are SDQ, Conners, Exercise (all items forced).
*   and MCTQ, Smoking, Diet, Hedonia (at least one item forced).
* Skipping can occur due to branching for WASAS and Contraception blocks: recode 0 to 3.
* WASAS is skipped by branching if both GAD2 and PHQ2 blocks have been completed.
* and the GAD2 score is 0 or 1 and the PHQ2 score is 0 or 1.
IF (zmhgadstat = 2 & zmhphqdepstat = 2 & GAD2_score < 2 & PHQ2_score < 2) zmhwasasstat = 3.
EXECUTE.
* Contraception block is skipped if Demographics block was completed and gender was not stated as female.
IF (zmhdemogstat = 2 & zmhgender ~= 2) zmhcontrastat = 3.
EXECUTE.
* More generally, a block can be skipped if a participant chose not to give a meaningful answer in any question.
* In an optional question, the participant can choose not to answer (value -99).
* and in some questions the participant can opt out of answering with 'prefer not to answer' (-11).
* or "don't know" (-88).
* For each measure in which every question may be affected this way, count the meaningful responses.
* Count non-missing responses that are non-negative (not -99, -88 or -11).
* ignoring QC items and branched items, and taking account of value coding characteristics.
* In most blocks, raw variables have value codes from 1 (sometimes 0) upwards.
* In some blocks only a few screening items need to be counted; often the items were optional so could be -99.
COUNT zmhmedhiscount = height weight zmhcovid (0 THRU HIGHEST).
[SYNTAX REPEATED FOR ALL BLOCKS WHERE APPLICABLE]
* Alcohol, Cannabis, Smoking: each has a single screening item, the rest are branched.
COUNT zmhalcocount = zmhalco1 (1 THRU HIGHEST).
COUNT zmhcanncount = zmhcann1 (1 THRU HIGHEST).
EXECUTE.
* Demographics: all variables have meaningful response values between 1 and 18.
* Note that zmhethnic is coded 19 instead of -11 for prefer not to answer.
* Ignore branched items zmhuniexpX and zmhbenfXXX.
COUNT zmhdemogcount = zmhrelst zmhhqualc zmhempst zmhempzh zmhempinc 
    zmhbenf zmhethnic zmhgender zmhtransg zmhsexor (1 THRU 18).
EXECUTE.
* General medical health: count responses in first two items.
COUNT zmhmhdcount = zmhmhddis zmhmhdprof (1 THRU HIGHEST).
EXECUTE.
* then add 'yes' responses (boxes ticked) in the two lists of diagnoses.
* including "none of above" but not counting "don't know" or "prefer not to answer" options.
COMPUTE zmhmhdcount = SUM(zmhmhdcount, zmhmhddx1a, zmhmhddx1b, zmhmhddx1c, zmhmhddx1d, 
 zmhmhddx1e, zmhmhddx1f, zmhmhddx1g, zmhmhddx1h, zmhmhddx1i, zmhmhddx1j, zmhmhddx1k, 
 zmhmhddx1l, zmhmhddx1m, zmhmhddx1n, zmhmhddx1none, zmhmhddx2a, zmhmhddx2b, zmhmhddx2c, zmhmhddx2d, 
 zmhmhddx2e, zmhmhddx2f, zmhmhddx2g, zmhmhddx2h, zmhmhddx2i, zmhmhddx2j, zmhmhddx2none). 
EXECUTE.
* MDQ: count 13 initial screening items, plus last two items which can branch in from CIDID block.
* even if intermediate MDQ items are branched out.
COUNT zmhmdqcount = zmhmdq1a zmhmdq1b zmhmdq1c zmhmdq1d zmhmdq1e zmhmdq1f zmhmdq1g zmhmdq1h zmhmdq1i zmhmdq1j zmhmdq1k 
    zmhmdq1l zmhmdq1m zmhmdq6a zmhmdq6b (1 THRU HIGHEST).
EXECUTE.
* Contraception block: first count responses in the PMS items (can be -99).
COUNT zmhcontracount = zmhpms1 zmhpms2 zmhpms3 zmhpms4 zmhpms5 zmhpms6 (1 THRU HIGHEST).
EXECUTE.
* then add 'yes' responses (boxes ticked) in the initial contraception screening items.
* including "none of above" but not counting "prefer not to answer" options.
COMPUTE zmhcontracount = SUM(zmhcontracount, zmhcontra1a, zmhcontra1b, zmhcontra1c, zmhcontra1d, 
 zmhcontra1e, zmhcontra1f, zmhcontra1g, zmhcontra1h, zmhcontra1i, zmhcontra1none).
EXECUTE.
* Physical illness diagnosis block: comprises 4 tables of 'tick all that apply' items.
* Here, sum the 'yes' responses including 'none of the above' but not "don't know" or "prefer not to answer".
COMPUTE zmhphycount = SUM(zmhphy1a, zmhphy1b, zmhphy1c, zmhphy1d, zmhphy1none, 
  zmhphy2a, zmhphy2b, zmhphy2c, zmhphy2d, zmhphy2e, zmhphy2f, zmhphy2none, 
  zmhphy3a, zmhphy3b, zmhphy3c, zmhphy3d, zmhphy3none, 
  zmhphy4a, zmhphy4b, zmhphy4c, zmhphy4d, zmhphy4e, zmhphy4none).
EXECUTE.

* Where there are no data, recode the status variable from 1 or 2 to 3.
IF (ANY(zmhdemogstat, 1, 2) & zmhdemogcount = 0) zmhdemogstat = 3.
IF (ANY(zmhmedhisstat, 1, 2) & zmhmedhiscount = 0) zmhmedhisstat = 3.
[SYNTAX REPEATED FOR ALL OTHER BLOCKS]

* Measure mean item response times.
* Check this for each measure that has a QC item.
* First count the actual number of items answered in each measure (including QC item).
* (alcohol and cannabis affected by branching, others might be incomplete in a few cases).
* remembering to include -11 'prefer not to answer' responses.
* Count all items that are included in the time variable, even if (for measures like.
* speq, alcohol and cannabis) parts of the measure are separate from the part with the QC item.
COUNT zmhphqnansw = zmhphq01 zmhphq02 zmhphq03 zmhphq04 zmhphq05 zmhphq06 zmhphq07 zmhphqqc zmhphq08 
 zmhphq09 zmhphq10 zmhphq11 zmhphq12 zmhphq13 zmhphq14 zmhphq15 (-11 THRU HIGHEST).
[SYNTAX REPEATED FOR 12 OTHER BLOCKS CONTAINING QC ITEMS]

* Now for each measure with a QC item, compute the mean time per item answered, in seconds.
* Use the times from the timer items, not the derived durations.
* Note that for alcohol and cannabis, analysis of the QC item will automatically eliminate.
* trivially fast cases where only the first screening item was answered.
IF (zmhphqnansw > 0) zmhphqitemtime = zmhphqtime / zmhphqnansw.
[SYNTAX REPEATED FOR 12 OTHER BLOCKS CONTAINING QC ITEMS]

* Exclusions within specific measures.
* These only apply in those 13 measures that include a QC item.
* Exclude if QC error and very fast time (low mean item response time).
* Use 10%-ile of the mean item time as a guide.
* adjusting slightly if there is an obvious grouping on the graph, roughly between 5% and 20%-ile.
IF (zmhphqqcer = 1 & zmhphqitemtime < 1.9) zmhphqexcl1rapid = 1.
[SYNTAX REPEATED FOR 12 OTHER BLOCKS CONTAINING QC ITEMS, with varying time parameters]

* Convert these to exclusion flags (0/1) for each QC measure.
IF (zmhphqstat > 0) zmhphqexclude = 0.
IF (zmhphqexcl1rapid = 1) zmhphqexclude = 1.
[SYNTAX REPEATED FOR 12 OTHER BLOCKS CONTAINING QC ITEMS]

* For each of these measures, indicate exclusion using value 4 in status flag.
IF (zmhphqexclude = 1) zmhphqstat = 4.
[SYNTAX REPEATED FOR 12 OTHER BLOCKS CONTAINING QC ITEMS]
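
The way the status codes build up can be sketched in Python for a single block. This is a simplified illustration with hypothetical names: it assumes that the first item, the end date, the count of meaningful responses and the QC results are already available, and it uses the 1.9 second threshold shown above for the PHQ block as a default.

```python
def block_status(first_item, end_date, meaningful_count,
                 qc_error=False, item_time=None, min_item_time=1.9):
    """Return status 0-4 for one block; None stands for a missing value."""
    if first_item is None:
        return 0                                  # not started
    status = 1 if end_date is None else 2         # started / finished
    if meaningful_count == 0:
        status = 3                                # skipped: no meaningful responses
    if qc_error and item_time is not None and item_time < min_item_time:
        status = 4                                # QC error with very rapid responding
    return status
```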
zmhalcoaudit1/2

AUDIT scale derived from items of the Alcohol Use measure in the twin MHQ. This scale is equivalent to the one used in TEDS at age 21, and is designed to be as similar as possible to the published scale.
The scale is effectively a total score from 10 items, all coded 0/1/2/3/4, hence the value range of the scale is 0-40. It is computed as 10 times the mean of the items, provided that at least 5 of the 10 items are non-missing.

* AUDIT scale as at age 21, derived from items 4, 5, 6a-f, 7a-b, based on the published measure.
* The 4 parts of item 5 are already converted into estimated total units.
* Now recode into categories 0-4 as in the published measure (temporary variable).
RECODE zmhalco5
 (0 THRU 2=0) (2.1 THRU 4=1) (4.1 THRU 6=2) (6.1 THRU 9.9=3) (10 THRU HIGHEST=4)
INTO zmhalco5un.
EXECUTE.
* Now create a total AUDIT score from the 10 items, all coded 0-4, including recoded item 5.
COMPUTE zmhalcoaudit = 10 * MEAN.5(zmhalco4, zmhalco5un, 
 zmhalco6a, zmhalco6b, zmhalco6c, zmhalco6d, zmhalco6e, zmhalco6f, zmhalco7a, zmhalco7b).
EXECUTE.
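
The pro-rating used here (a multiple of the item mean, with a minimum number of non-missing items) can be sketched in Python. The names are hypothetical and None stands for a missing value. Note one deliberate difference: the sketch assigns every unit value to a category, whereas the RECODE above leaves small gaps between categories (for example between 2 and 2.1).

```python
def units_category(units):
    """Recode estimated total units into the 0-4 categories of the published measure."""
    if units is None:
        return None
    if units <= 2:
        return 0
    if units <= 4:
        return 1
    if units <= 6:
        return 2
    if units < 10:
        return 3
    return 4


def prorated_total(items, min_valid):
    """Number of items times the mean of the non-missing items, or None if too few."""
    valid = [i for i in items if i is not None]
    if len(valid) < min_valid:
        return None
    return len(items) * sum(valid) / len(valid)
```

For example, five items each scored 2 with the other five missing give a pro-rated total of 20, while four non-missing items give a missing scale value.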
zmhatsdomab1/2, zmhatsevab1/2

These are not scales but flags (coded 1=yes, 0=no) to indicate whether or not responses indicate any form of abuse, in each of the two parts of the ATS measure.
zmhatsevab1/2: this flags affirmative response(s) in any of the five items of the traumatic events part of the measure.
zmhatsdomab1/2: this flags responses indicating abuse in any of the five items of the domestic abuse part of the measure (an affirmative response in items 2, 3 or 4, or a 'no' response in items 1 or 5).
In both parts of the measure, item responses are coded 0=no (never) and 1/2=yes (in different time frames).

* This measure is not scaled, but flag 'any abuse', separately in the events and domestic sections.
* Events items: flag abuse if there is a 'yes' response (coded 1 or 2) in any item.
IF (SUM(zmhatsev1, zmhatsev2, zmhatsev3, zmhatsev4, zmhatsev5) > 0) zmhatsevab = 1.
* Domestic items: flag abuse if 'yes' (coded 1/2) in items 2/3/4, or if 'no' (coded 0) in items 1/5.
IF (zmhatsdom1 = 0 | zmhatsdom2 > 0 | zmhatsdom3 > 0 | zmhatsdom4 > 0 | zmhatsdom5 = 0) zmhatsdomab = 1.
EXECUTE.
* Now count the number of responses, of any sort, in each section.
COUNT zmhatsevcount = zmhatsev1 zmhatsev2 zmhatsev3 zmhatsev4 zmhatsev5 (0 THRU HIGHEST).
COUNT zmhatsdomcount = zmhatsdom1 zmhatsdom2 zmhatsdom3 zmhatsdom4 zmhatsdom5 (0 THRU HIGHEST).
EXECUTE.
* Set flag to zero if not set to 1 above, and if at least 3 of 5 items answered.
IF (SYSMIS(zmhatsevab) & zmhatsevcount >= 3) zmhatsevab = 0.
IF (SYSMIS(zmhatsdomab) & zmhatsdomcount >= 3) zmhatsdomab = 0.
EXECUTE.
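
Both flags follow the same pattern, which can be sketched in Python as below. This is an illustration with hypothetical names: responses are 0, 1 or 2 with None for missing, and `reverse` lists the zero-based positions of items where a 'no' (0) response indicates abuse.

```python
def any_abuse_flag(items, reverse=()):
    """Return 1 if any item indicates abuse, 0 if none does and at least 3 are answered."""
    answered = 0
    for position, value in enumerate(items):
        if value is None:
            continue
        answered += 1
        indicates_abuse = (value == 0) if position in reverse else (value > 0)
        if indicates_abuse:
            return 1
    return 0 if answered >= 3 else None
```

The events flag would use the default `reverse=()`, and the domestic flag would use `reverse=(0, 4)` for items 1 and 5.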
zmhbddt1/2

Total scale, from all 7 items of the DCQ-BDD measure in the twin MHQ. Each item has values 0/1/2/3, hence the scale values have range 0 to 21.

* Total scale: sum of all 7 items (pro-rated, requiring at least 4 non-missing items).
COMPUTE zmhbddt = 7 * MEAN.4(zmhbdd1, zmhbdd2, zmhbdd3, zmhbdd4, zmhbdd5, zmhbdd6, zmhbdd7).
EXECUTE.
zmhbmi1/2, zmheatd02bmi1/2

BMI (body mass index) in units of kilograms per square metre.
zmhbmi1/2 is the BMI derived from the self-reported height and weight in the initial Medical History section of the twin MHQ.
zmheatd02bmi1/2 is the BMI derived from the 'lowest' weight reported in the Eating Disorder measure of the MHQ, together with the same height as above.

* Height variable is in centimetres.
* We want BMI in units of kilograms per square metre.
* So include a scaling factor of 10000 in the BMI calculation.
* and round to nearest 0.1.
COMPUTE zmhbmi = RND((10000 * zmhweight / (zmhheight * zmhheight)), 0.1).
* Derive similar BMI at lowest weight but current height, for eating disorder item.
COMPUTE zmheatd02bmi = RND((10000 * zmheatd02b / (zmhheight * zmhheight)), 0.1).
EXECUTE.

* Eliminate a few extreme outliers.
DO IF (zmhheight <= 140 | zmhheight > 200 | zmhweight > 144 | zmhbmi < 14 | zmhbmi > 50).
 RECODE zmhheight zmhweight zmhbmi (ELSE=SYSMIS).
END IF.
EXECUTE.
* The eating disorder lowest weight effectively has no lower limit.
* resulting in some data that are clearly infeasible outliers - eliminate these.
* Note also that weights above 80kg are outliers in this context.
DO IF (zmheatd02b > 80 | zmheatd02b < 26 | zmheatd02bmi < 14 | zmheatd02bmi > 30).
 RECODE zmheatd02b zmheatd02bmi (ELSE=SYSMIS).
END IF.
EXECUTE.
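
A short Python sketch of the BMI calculation and the first outlier rule, for illustration only. The function names are hypothetical; the limits are copied from the syntax above.

```python
def bmi(weight_kg, height_cm):
    """BMI in kg per square metre, from weight in kg and height in cm, rounded to 0.1."""
    if weight_kg is None or not height_cm:
        return None
    return round(10000 * weight_kg / (height_cm * height_cm), 1)


def is_plausible(height_cm, weight_kg, bmi_value):
    """False where height, weight or BMI is an extreme outlier under the rule above."""
    return not (height_cm <= 140 or height_cm > 200 or weight_kg > 144
                or bmi_value < 14 or bmi_value > 50)
```

For example, a weight of 70 kg and a height of 175 cm give 10000 * 70 / (175 * 175), which rounds to a BMI of 22.9.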
zmhcidiadiag1/2

Generalised Anxiety Disorder diagnosis flag derived from responses in the CIDIA measure in the twin MHQ.
The diagnosis variable is coded 1=yes, 0=no.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire.
The criteria are complex and are described in detail in the syntax below.
Note that condition (2) has been slightly modified from an earlier version of this derivation, as described in the syntax below, and this has resulted in an increase of roughly 90 diagnosed twins in the dataset.
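
The syntax comments below repeatedly apply one principle for combining 0/1 items when some may be missing: a positive result needs enough positive items regardless of missing ones, while a negative result is only recorded if the missing items could not have changed the outcome. A Python sketch of this principle, with a hypothetical function name and None standing for a missing value:

```python
def threshold_flag(values, k):
    """Return 1 if at least k items are positive, 0 if that is impossible, else None."""
    valid = [v for v in values if v is not None]
    score = sum(valid)
    n_missing = len(values) - len(valid)
    if score >= k:
        return 1          # enough positives, whatever the missing items are
    if score + n_missing < k:
        return 0          # even if every missing item were positive
    return None           # undetermined
```

With `k=1` this gives the 'either/or' conditions, and with `k=3` and six values it gives the rule for the six sub-conditions of condition (6).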

* CIDIA: generalised anxiety disorder.
* The conditions for diagnosis are.
* (1) at least one of screening items 1 and 2 was 'yes'.
* and (2) worrying continued for 6 months or more - item 3.
* and (3) worried most days - item 6b.
* and (4) EITHER worried about more than one thing - item 7.
*        OR many worries on your mind - item 9.
* and (5) One or more of.
*        (a) difficult to stop worrying - item 8.
*        (b) often cannot put out of mind - item 10a.
*        (c) Often cannot control worrying - item 10b.
* and (6) at least 3 of 6 sub-conditions are met, as follows.
*          (a) EITHER restless OR on edge - items 11a, 11b.
*          (b) easily tired - 11c. 
*          (c) difficulty concentrating - 11d.
*          (d) irritable - 11e.
*          (e) sore muscles - 11f.
*          (f) sleep problems - 11g.
* and (7) problems interfered some or a lot with everyday life - item 18.

* NOTE: condition (2) has been changed from the earlier version of this algorithm.
* The condition is now zmhcidia03 >= 7 (>=6 months).
* but was originally zmhcidia03 > 7 (>6 months).
* This change brings the algorithm closer to the DSM5 criteria.
* and to the GLAD version of the derived diagnosis.

* Note that, in the questionnaire, condition (1) screens by branching.
* so if condition (1) is negative then the rest of the items are missing.

* Convert each of the above conditions into a 1/0 flag (item 6b is already coded this way).
* (need intermediate flags for items 7, 10a & 10b, 11a & 11b before combining with others).
* For the condition based on items 1/2, for a negative result we need both items to be non-missing.
* (if one is negative but the other is missing, this flag will be missing).
IF (SUM.2(zmhcidia01, zmhcidia02) = 0) zmhcidia012flag = 0.
* For a positive result, if at least 1 item is positive then missingness in the other does not matter.
IF (SUM(zmhcidia01, zmhcidia02) > 0) zmhcidia012flag = 1.
EXECUTE.
* NOTE change in cidia03 condition from earlier version: now cidia03 >=7, previously cidia03 >7.
RECODE zmhcidia03 (7 THRU 11=1) (1 THRU 6=0) INTO zmhcidia03flag.
RECODE zmhcidia07 (2=1) (1=0) INTO zmhcidia07flag.
EXECUTE.
* For the condition based on items 7/9, for a negative result we need both items to be non-missing.
* in case there is a missing positive.
* (if one is negative but the other is missing, this flag will be missing).
IF (SUM.2(zmhcidia07flag, zmhcidia09) = 0) zmhcidia079flag = 0.
* For a positive result, if at least 1 item is positive then missingness in the other does not matter.
IF (SUM(zmhcidia07flag, zmhcidia09) > 0) zmhcidia079flag = 1.
EXECUTE.
RECODE zmhcidia10a zmhcidia10b (3=1) (0 THRU 2=0) INTO zmhcidia10aflag zmhcidia10bflag.
EXECUTE.
* For the condition based on items 8/10a/10b, for a negative result all 3 must be non-missing.
IF (SUM.3(zmhcidia08, zmhcidia10aflag, zmhcidia10bflag) = 0) zmhcidia0810flag = 0.
* For a positive result, if at least 1 item is positive then missingness in the others does not matter.
IF (SUM(zmhcidia08, zmhcidia10aflag, zmhcidia10bflag) > 0) zmhcidia0810flag = 1.
EXECUTE.
* For the condition based on items 11a/11b, for a negative result both must be non-missing.
IF (SUM.2(zmhcidia11a, zmhcidia11b) = 0) zmhcidia11abflag = 0.
* For a positive result, if at least 1 item is positive then missingness in the other does not matter.
IF (SUM(zmhcidia11a, zmhcidia11b) > 0) zmhcidia11abflag = 1.
EXECUTE.
* The next condition is based on a score of 3 or more from 6 sub-conditions.
* Count the non-missing flags for these 6 sub-conditions.
COUNT zmhcidia11subflags = zmhcidia11abflag zmhcidia11c zmhcidia11d zmhcidia11e 
    zmhcidia11f zmhcidia11g (0,1).
EXECUTE.
* For a negative result, we need the sum of the flags to be 0 with no more than 2 missing.
* or 1 with no more than 1 missing, or 2 with none missing, in case the missing flags.
* might indicate a positive.
IF (SUM.4(zmhcidia11abflag, zmhcidia11c, zmhcidia11d, zmhcidia11e, 
    zmhcidia11f, zmhcidia11g) = 0) zmhcidia11flag = 0.
IF (SUM.5(zmhcidia11abflag, zmhcidia11c, zmhcidia11d, zmhcidia11e, 
    zmhcidia11f, zmhcidia11g) = 1) zmhcidia11flag = 0.
IF (SUM.6(zmhcidia11abflag, zmhcidia11c, zmhcidia11d, zmhcidia11e, 
    zmhcidia11f, zmhcidia11g) = 2) zmhcidia11flag = 0.
* For a positive, if sum is at least 3 then missingness in others is unimportant.
IF (SUM(zmhcidia11abflag, zmhcidia11c, zmhcidia11d, zmhcidia11e, 
    zmhcidia11f, zmhcidia11g) >= 3) zmhcidia11flag = 1.
* Frequencies show that over 75% of twins respond positively in each of these conditions.
* except for item 11f (< 40%).
* Assume a positive outcome is likely if the sum of the flags is 2 and the other 4 are missing.
* or if the sum is 2, one flag is negative, and the other 3 are missing.
* on the basis that at least one of the missing 3/4 conditions is likely to be positive.
IF (SUM(zmhcidia11abflag, zmhcidia11c, zmhcidia11d, zmhcidia11e, 
    zmhcidia11f, zmhcidia11g) = 2 & zmhcidia11subflags <= 3) zmhcidia11flag = 1.
EXECUTE.
RECODE zmhcidia18 (2 THRU 3=1) (0 THRU 1=0) INTO zmhcidia18flag.
EXECUTE.

* Now sum the flags for the 6 main non-branching criteria to get a score 0-6.
COMPUTE zmhcidiadiagscore = SUM(zmhcidia03flag, zmhcidia06b,
    zmhcidia079flag, zmhcidia0810flag, zmhcidia11flag, zmhcidia18flag).
EXECUTE.
* Also count the number of these same variables that are non-missing (if cidia data present).
DO IF (ANY(zmhcidiastat, 1, 2)).
 COUNT zmhcidiadiagflags = zmhcidia03flag zmhcidia06b
    zmhcidia079flag zmhcidia0810flag zmhcidia11flag zmhcidia18flag (0, 1).
END IF.
EXECUTE.

* Now set the diagnosis flag (0 or 1).
* Note that, for twins screening positive on items 1/2.
* around 90% have a positive result in each of item conditions 6b, 7/9, 8/10 and 11.
* around 70% in item condition 18 but only 50% in item 3.
* Therefore allow a probable positive diagnosis if 5 of 6 criteria are positive and the other one missing.
* but make an exception to this rule if item 3 is the missing item.
* Diagnosis is positive if positively screened by items 1/2 and all 6 other criteria are met.
IF (zmhcidia012flag = 1 & zmhcidiadiagscore = 6) zmhcidiadiag = 1.
EXECUTE.
* Also allow positive diagnosis if 5 criteria are met and the other 1 is missing.
* unless the missing one is item 3.
IF (zmhcidia012flag = 1 & zmhcidia03flag = 1 & zmhcidiadiagscore = 5 
    & zmhcidiadiagflags = 5) zmhcidiadiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by items 1/2.
IF (zmhcidia012flag = 0) zmhcidiadiag = 0.
EXECUTE.
* Diagnosis is also negative if item 1/2 screen is positive but.
* the other 6 criteria are all non-missing and the score is less than 6.
IF (zmhcidia012flag = 1 & zmhcidiadiagscore < 6 & zmhcidiadiagflags = 6) zmhcidiadiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 4 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the six criteria is non-missing and negative).
IF (zmhcidia012flag = 1 & zmhcidiadiagscore <= 4 
    & (zmhcidiadiagscore < zmhcidiadiagflags) ) zmhcidiadiag = 0.
EXECUTE.
* In all other cases, the diagnosis variable will be missing.
* including cases where item 1 is negative and item 2 is missing or vice versa (quite common).
* and much rarer cases where item 1 and/or item 2 is positive and (a) all other criteria are missing.
* or (b) one is positive and 5 are missing, or (c) two are positive and 4 are missing, and so on.
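
The two-step treatment of the screening items above (SUM.2 for a negative result, plain SUM for a positive one) can be tried out on toy data. The following is a minimal sketch only, not part of the TEDS script: the names item1, item2 and screenflag are hypothetical, and the value 9 is used purely as a stand-in for a missing response.

```spss
* Toy data with hypothetical variable names (9 stands for a missing response).
DATA LIST FREE / item1 item2.
BEGIN DATA
0 0
0 9
1 9
9 9
END DATA.
RECODE item1 item2 (9=SYSMIS).
* Negative only if both items are answered and both are 0.
IF (SUM.2(item1, item2) = 0) screenflag = 0.
* Positive if any answered item is 1, whatever the other item holds.
IF (SUM(item1, item2) > 0) screenflag = 1.
EXECUTE.
LIST.
```

Under these rules the first case is flagged 0 and the third is flagged 1, while the second and fourth cases are left system-missing because a missing item could have been a positive.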
zmhcididmdiag1/2, zmhcididadiag1/2

Diagnosis flags derived from responses in the CIDID measure in the twin MHQ.
zmhcididmdiag1/2: MDD (Major Depressive Disorder) diagnosis
zmhcididadiag1/2: Atypical Depression diagnosis.
Both variables are coded 1=yes, 0=no.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire.
Atypical Depression cases are taken to be a subset of MDD cases, therefore if the latter diagnosis is negative then so is the former, or if the latter diagnosis is missing then so is the former. The criteria for both variables are complex and are described in detail in the syntax below.
Note that this derivation has changed from an earlier version of the dataset; one of the MDD criteria has been removed, reducing the number of required criteria from 6 to 5. This change is documented in the syntax below. As a result, the number of twins diagnosed with MDD by this algorithm increased by over 600, and the number diagnosed with atypical depression increased by over 60.

* Major depressive disorder (MDD) diagnosis: 1/0 flag.
* The essential criteria are.
*     (1) at least one of screening items 1 and 2 was 'yes'.
* and (2) depressive feelings last most/all day - item 3.
* and (3) depressive feelings occurred most/every day - item 4.
* and (4) at least 5 of 8 sub-conditions are met as follows.
*       (a) depressed - item 1.
*       (b) lost interest - item 2.
*       (c) significant weight or appetite change - items 6a, 6b, 8.
*       (d) tired out - item 5.
*       (e) sleep changed - item 7.
*       (f) concentration problems - item 13.
*       (g) felt no good - item 14.
*       (h) thought about death - item 15.  
* and (5) there was some or a lot of interference in everyday life - item 19.
* Do not incorporate the criteria relating to mania because this would involve other measures.

* Note that, in the questionnaire, condition (1) screens by branching.
* so if condition (1) is negative then the rest of the items are missing.

* NOTE: a sixth condition has been removed from the earlier version of this algorithm.
* Condition (6) was that the episodes did not mostly/all follow a traumatic event.
* coded as zmhcidid22 <= 2 (excluding those with response 3=most/all).
* This condition is removed partly because of the ambiguous wording of this response.
* and in line with the GLAD version of the algorithm.

* Convert each of the above conditions into a 1/0 flag.
* For the condition based on items 1/2, for a negative result we need both items to be non-missing.
* (if one is negative but the other is missing, diagnosis is not possible and this flag will be missing).
IF (SUM.2(zmhcidid01, zmhcidid02) = 0) zmhcidid012flag = 0.
* For a positive result, at least 1 item must be positive (missingness in the other does not matter).
IF (SUM(zmhcidid01, zmhcidid02) > 0) zmhcidid012flag = 1.
EXECUTE.
* The following can be converted to 1/0 flags by recoding.
RECODE zmhcidid03 (1 THRU 2=1) (3 THRU 4=0) INTO zmhcidid03flag.
RECODE zmhcidid04 (1 THRU 2=1) (3=0) INTO zmhcidid04flag.
RECODE zmhcidid06a (1 THRU 3=1) (4=0) INTO zmhcidid06aflag.
RECODE zmhcidid08 (2 THRU 4=1) (1=0) INTO zmhcidid08flag.
RECODE zmhcidid19 (2 THRU 3=1) (0 THRU 1=0) INTO zmhcidid19flag.
EXECUTE.
* (Note that items 5/6b/7/13/14/15 are already coded 1/0 and do not need recoding).
* For criterion (4)(c), start by combining items 6a and 6b.
* Item 6a screens for item 6b, so 6b is missing if 6a is negative or missing.
* For positive outcome, need a positive response in both items: neither missing.
IF (zmhcidid06aflag = 1 & zmhcidid06b = 1) zmhcidid06flag = 1.
* For negative outcome, could have positive 6a but negative 6b.
IF (zmhcidid06aflag = 1 & zmhcidid06b = 0) zmhcidid06flag = 0.
* Or negative if 6a is negative (when 6b will be missing).
IF (zmhcidid06aflag = 0) zmhcidid06flag = 0.
EXECUTE.
* Note that the item 6 flag (above) is missing if both 6a and 6b are missing.
* or if 6a is positive and 6b is missing.
* Now combine items 6 and 8: both flags must be non-missing for a negative.
* in case the missing one might have been positive.
IF (SUM.2(zmhcidid06flag, zmhcidid08flag) = 0) zmhcidid068flag = 0.
* For a positive result, one flag can be missing if the other is positive.
IF (SUM(zmhcidid06flag, zmhcidid08flag) > 0) zmhcidid068flag = 1.
EXECUTE.
* Now condition (4) is based on a score of 5 or more from 8 sub-conditions.
* Start by computing the sum (values 0-8) of the 1/0 flag variables.
COMPUTE zmhcididsubconditions = SUM(zmhcidid01, zmhcidid02, zmhcidid05, zmhcidid068flag, zmhcidid07,
    zmhcidid13, zmhcidid14, zmhcidid15).
EXECUTE.
* and count the number of non-missing sub-conditions.
COUNT zmhcididsubflags = zmhcidid01 zmhcidid02 zmhcidid05 zmhcidid068flag zmhcidid07
    zmhcidid13 zmhcidid14 zmhcidid15 (0,1).
EXECUTE.
* A positive outcome results if 5 or more sub-conditions are met.
IF (zmhcididsubconditions >= 5) zmhcididsubconditionsflag = 1.
* Frequencies show that 80-90% of twins respond positively in each of these 8 flags.
* except for item 15 (57%).
* Therefore, assume a very probable positive result if the score is 4 with the other 4 missing.
* or a score of 4 with one negative and the other 3 missing.
* on the basis that at least one of the missing 3 or 4 items is likely to be positive.
IF (zmhcididsubconditions = 4 & zmhcididsubflags <= 5) zmhcididsubconditionsflag = 1.
EXECUTE.
* For a negative result from this count, we need the sum of the flags to be 4 with none missing.
* or 3 with no more than 1 missing, or 2 with no more than 2 missing.
* or 1 with no more than 3 missing, or 0 with no more than 4 missing.
* in case the missing flags might indicate positives.
IF (zmhcididsubflags = 8 & zmhcididsubconditions = 4) zmhcididsubconditionsflag = 0.
IF (zmhcididsubflags >= 7 & zmhcididsubconditions = 3) zmhcididsubconditionsflag = 0.
IF (zmhcididsubflags >= 6 & zmhcididsubconditions = 2) zmhcididsubconditionsflag = 0.
IF (zmhcididsubflags >= 5 & zmhcididsubconditions = 1) zmhcididsubconditionsflag = 0.
IF (zmhcididsubflags >= 4 & zmhcididsubconditions = 0) zmhcididsubconditionsflag = 0.
EXECUTE.
* In the other cases, this flag will be missing, e.g. sum is 2 with 3 or more missing.

* Now sum the flags for the 4 non-branching criteria to get a score 0-4.
COMPUTE zmhcididmdiagscore = SUM(zmhcidid03flag, zmhcidid04flag, 
      zmhcididsubconditionsflag, zmhcidid19flag).
* and count the number of these 4 flags that are non-missing (if cidid data present).
DO IF (zmhdata1 = 1 & ANY(zmhcididstat, 1, 2)).
 COUNT zmhcididmdiagflags = zmhcidid03flag zmhcidid04flag 
    zmhcididsubconditionsflag zmhcidid19flag (0, 1).
END IF.
EXECUTE.

* We are now ready to derive the MDD diagnosis flag variable.
* Note that, for twins screening positive on items 1/2.
* around 70-90% have a positive result in each of the other 4 flagged conditions.
* Therefore allow a probable positive diagnosis if 3 of 4 other criteria are positive and the other one is missing.
* Diagnosis is positive if positively screened by item 1/2 and all 4 other criteria are met.
IF (zmhcidid012flag = 1 & zmhcididmdiagscore = 4) zmhcididmdiag = 1.
EXECUTE.
* Also allow positive diagnosis if 3 criteria are met and the other 1 is missing.
IF (zmhcidid012flag = 1 & zmhcididmdiagscore = 3 & zmhcididmdiagflags = 3) zmhcididmdiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by items 1/2.
IF (zmhcidid012flag = 0) zmhcididmdiag = 0.
EXECUTE.
* Diagnosis is also negative if item 1/2 screen is positive but.
* the other 4 criteria are all non-missing and the score is less than 4.
IF (zmhcidid012flag = 1 & zmhcididmdiagscore < 4 & zmhcididmdiagflags = 4) zmhcididmdiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 2 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the four criteria is non-missing and negative).
IF (zmhcidid012flag = 1 & zmhcididmdiagscore <= 2 
    & (zmhcididmdiagscore < zmhcididmdiagflags) ) zmhcididmdiag = 0.
EXECUTE.
* In all other cases, the diagnosis variable will be missing.
* including cases where item 1 is negative and item 2 is missing or vice versa (19 cases).
* and cases where item 1 and/or item 2 is positive.
* and (a) all 4 other criteria are missing (16 cases).
*  or (b) one is positive and 3 are missing (15 cases). 
*  or (c) two are positive and 2 are missing (39 cases).

* CIDID: Atypical depression.
* --------------------------.
* Defined as a subset of those with MDD as defined above: must have zmhcididmdiag = 1.
* The conditions for atypical depression are.
* (1) diagnosis of MDD as defined above - zmhcididmdiag=1.
* and (2) mood brightens - item 9.
* and (3) at least 2 of 3 sub-conditions are met as follows.
*         (a) weight change of at least 4kg - item 6b.
*         (b) heavy feelings in limbs - item 10.
*         (c) sensitivity to rejection causing impairment - item 11.
* Note that item 6b is branched from item 6a, so is missing in many cases.

* Item 9, condition (2), is already a 1/0 flag.
* as are items 6b and 10, but we need to recode item 11 this way.
RECODE zmhcidid11 (2=1) (1=0) (3=0) INTO zmhcidid11flag.
EXECUTE.
* Now make a 1/0 flag for condition (3).
* Condition is positive if at least 2 sub-conditions are met.
IF (SUM(zmhcidid06b, zmhcidid10, zmhcidid11flag) >= 2) zmhcidid061011flag = 1.
EXECUTE.
* Percentage positive is not very high in any of these 3 flags.
* so do not impute any positive result from missing data.
* Condition is negative if score is 0 with none or 1 missing.
IF (SUM.2(zmhcidid06b, zmhcidid10, zmhcidid11flag) = 0) zmhcidid061011flag = 0.
* and condition is negative if score is 1 with none missing.
IF (SUM.3(zmhcidid06b, zmhcidid10, zmhcidid11flag) = 1) zmhcidid061011flag = 0.
EXECUTE.
* Note that there are many cases of missing data in these three flags.

* Now set the 'atypical depression' diagnosis flag (0 or 1).
NUMERIC zmhcididadiag (F1.0).
VARIABLE LEVEL zmhcididadiag (NOMINAL).
* Diagnosis is positive if already diagnosed with MDD (zmhcididmdiag = 1).
* and both of conditions (2) and (3) are met.
IF (zmhcididmdiag = 1 & zmhcidid09 = 1 & zmhcidid061011flag = 1) zmhcididadiag = 1.
EXECUTE.
* Missing data are quite common, and % positives are not high in conditions (2) and (3).
* so do not impute any other diagnoses if data are missing.
* Diagnosis is negative if MDD diagnosis is negative.
IF (zmhcididmdiag = 0) zmhcididadiag = 0.
EXECUTE.
* Diagnosis is also negative if MDD but condition (2) is negative.
IF (zmhcididmdiag = 1 & zmhcidid09 = 0) zmhcididadiag = 0.
EXECUTE.
* And diagnosis is negative if MDD but condition (3) is negative.
IF (zmhcididmdiag = 1 & zmhcidid061011flag = 0) zmhcididadiag = 0.
EXECUTE.
* In all other cases, the atypical depression diagnosis is missing.
* including cases where MDD diagnosis is missing (around 40).
* and cases where MDD but item 9 is missing (nearly 300 cases).
* and cases where MDD but condition (3) is missing (a further 200+ cases).
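
The missing-data rules for MDD criterion (4), at least 5 of 8 sub-conditions, combine a score with a count of answered flags. The sketch below reproduces those rules on toy data under stated assumptions: the names s1 to s8, score, answered and met are hypothetical, and 9 is only a placeholder for a missing flag.

```spss
* Toy data: eight hypothetical 1/0 sub-condition flags (9 stands for missing).
DATA LIST FREE / s1 s2 s3 s4 s5 s6 s7 s8.
BEGIN DATA
1 1 1 1 1 9 9 9
1 1 1 1 9 9 9 9
1 1 1 1 0 0 0 0
1 1 1 0 0 9 9 9
END DATA.
RECODE s1 TO s8 (9=SYSMIS).
* Score is the sum of answered flags and answered is the number of non-missing flags.
COMPUTE score = SUM(s1 TO s8).
COUNT answered = s1 TO s8 (0, 1).
* Positive if 5 or more are met, or if 4 are met and at least 3 flags are missing.
IF (score >= 5) met = 1.
IF (score = 4 & answered <= 5) met = 1.
* Negative only if the missing flags could not lift the score to 5.
IF (answered = 8 & score = 4) met = 0.
IF (answered >= 7 & score = 3) met = 0.
IF (answered >= 6 & score = 2) met = 0.
IF (answered >= 5 & score = 1) met = 0.
IF (answered >= 4 & score = 0) met = 0.
EXECUTE.
LIST.
```

In this toy file the first two cases come out positive, the third negative, and the fourth stays missing (score 3 with three flags unanswered).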
zmhconnt1/2

Total scale, measuring the inattention trait, from all 11 items of the Conners measure in the twin MHQ. Each item has values 0/1/2/3, hence the scale values have range 0 to 33.

* Total Inattention scale from all 11 items.
COMPUTE zmhconnt = 11 * MEAN.6(zmhconn01, zmhconn02, zmhconn03, zmhconn04, 
 zmhconn05, zmhconn06, zmhconn07, zmhconn08, zmhconn09, zmhconn10, zmhconn11).
EXECUTE.
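
Multiplying MEAN.6 by 11 prorates the total when some items are unanswered, provided at least 6 of the 11 items are present. A small sketch on toy data may make this concrete; the item names c01 to c11 and the variable total are hypothetical, and 9 is a stand-in for a missing response.

```spss
* Toy data: eleven hypothetical items coded 0-3 (9 stands for missing).
DATA LIST FREE / c01 c02 c03 c04 c05 c06 c07 c08 c09 c10 c11.
BEGIN DATA
3 3 3 3 3 3 3 3 3 3 3
2 2 2 2 2 2 9 9 9 9 9
1 1 1 1 1 9 9 9 9 9 9
END DATA.
RECODE c01 TO c11 (9=SYSMIS).
* Prorated total: requires at least 6 non-missing items.
COMPUTE total = 11 * MEAN.6(c01 TO c11).
EXECUTE.
LIST.
```

The first case scores 33, the second is prorated to 22 from its six answered items, and the third is missing because only five items were answered.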
zmhctqt1/2

Total scale, from all 5 items of the CTS (or CTQ) measure in the twin MHQ. Items 1 and 5 are reversed for the purpose of this scale. Each item has values 0/1/2/3/4, hence the scale values have range 0 to 20.

* Total scale: sum of all 5 items (two reversed).
COMPUTE zmhctqt = 5 * MEAN.3(zmhctq1r, zmhctq2, zmhctq3, zmhctq4, zmhctq5r).
EXECUTE.
zmhduration1/2, zmhpaused1/2

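zmhduration1/2: duration in minutes of the entire twin MHQ questionnaire, if completed and if not paused (set to missing where zmhpaused1/2 = 1).
zmhpaused1/2: flag, coded 1 if the raw duration exceeded 150 minutes (2.5 hours), which is taken to indicate that the questionnaire was paused, and 0 otherwise.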

* The web server has provided a raw overall duration variable in seconds: convert to minutes.
COMPUTE zmhduration = duration / 60.
EXECUTE.
* Extreme high outliers are meaningless: often completed the questionnaire over 2+ days.
* Flag cases with extremely long durations (over 2.5 hours).
RECODE zmhduration (0 THRU 150=0) (150.001 THRU HIGHEST=1)
INTO zmhpaused.
EXECUTE.
* and recode overall duration to missing if paused by this measure.
IF (zmhpaused = 1) zmhduration = $SYSMIS.
EXECUTE.
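
The effect of the 150-minute cut-off can be illustrated with a few made-up durations. This is a sketch only: the variable names mins and paused and the data values are hypothetical.

```spss
* Toy data: hypothetical raw durations in seconds.
DATA LIST FREE / duration.
BEGIN DATA
1800 9000 9060 200000
END DATA.
* Convert seconds to minutes.
COMPUTE mins = duration / 60.
* Flag durations over 150 minutes as paused.
RECODE mins (0 THRU 150=0) (150.001 THRU HIGHEST=1) INTO paused.
* Remove the duration where the questionnaire is flagged as paused.
IF (paused = 1) mins = $SYSMIS.
EXECUTE.
LIST.
```

The first two cases (30 and 150 minutes) keep their durations with paused = 0; the last two (151 minutes and over 55 hours) are flagged as paused and lose their durations.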
zmheatdandiag1/2

Diagnosis categories for Anorexia, derived from responses about symptoms in the Eating Disorder measure in the twin MHQ. The variable is coded 0=not diagnosed, 1=diagnosed without subtype, 2=diagnosed with restricting subtype, 3=diagnosed with purging/binge-eating subtype.
Based on the derivation used for a similar measure by the GLAD/EDGI studies, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Anorexia Nervosa.
* With subtypes: coded 0=not diagnosed, 1=diagnosed without subtype.
* 2=restricting subtype, 3=binge-eating/purging subtype (or missing if insufficient data).
* The latter two subtypes are mutually exclusive.
* Main criteria, regardless of subtype, are as follows.
* (1) lowest BMI <= 18.55 (using zmheatd02bmi).
* (2) zmheatd02a = 1 (felt fat or afraid of gaining weight).
* (3) zmheatd10 > 0 (self-esteem dependent on body weight/shape).
* Start by coding conditions 1 and 3 temporarily as 1/0 flags (as for zmheatd02a).
* Lowest BMI <=18.55 for criterion (1).
RECODE zmheatd02bmi (LOWEST THRU 18.55=1) (18.5501 THRU HIGHEST=0)
INTO zmheatd02bmilow.
EXECUTE.
* Self-esteem flag from item 10, for criterion (3).
RECODE zmheatd10 (0=0) (1=1) (2=1)
INTO zmheatd10flag.
EXECUTE.
* AN diagnosed (without subtype) only if all three criteria are met.
IF (SUM(zmheatd02a, zmheatd02bmilow, zmheatd10flag) = 3) zmheatdandiag = 1.
EXECUTE.
* AN not diagnosed if any of the three conditions is not met.
IF (zmheatd02a = 0) zmheatdandiag = 0.
IF (zmheatd02bmilow = 0) zmheatdandiag = 0.
IF (zmheatd10flag = 0) zmheatdandiag = 0.
* AN also not diagnosed if branched out by item 1 (no low weight episodes).
IF (zmheatd01 = 0) zmheatdandiag = 0.
EXECUTE.

* Restricting subtype criteria.
* (1) zmheatdandiag = 1: meets the three essential criteria for AN of any type.
* (2) EITHER (a) zmheatd04=0: no binge-eating.
*     OR (b) zmheatd05a=0 and zmheatd05b=1: has binged but only outside low-weight.
* (3) zmheatd03b=0 and zmheatd03c=0: has not used vomiting or laxatives, etc.
* Use flag variables to simplify conditions 2 and 3.
* Convert item 4 to a 1/0 flag indicating at least some binge-eating.
RECODE zmheatd04 (0=0) (1=1) (2=1)
INTO zmheatd04bingeflag.
EXECUTE.
* Convert the item 5a and 5b criteria into a single flag.
IF (zmheatd05a = 0 & zmheatd05b = 1) zmheatd05abflag = 1.
IF (zmheatd05a = 1) zmheatd05abflag = 0.
IF (zmheatd05b = 0) zmheatd05abflag = 0.
EXECUTE.
* (this condition is apparently unaffected by missing data).
* Now combine the either/or of criterion (2) into a single flag.
IF (zmheatd04bingeflag = 0) zmheatd45abflag = 1.
IF (zmheatd05abflag = 1) zmheatd45abflag = 1.
* Negative result if bingeing (during low weight) is indicated by BOTH variables.
IF (zmheatd04bingeflag = 1 & zmheatd05abflag = 0) zmheatd45abflag = 0.
EXECUTE.
* (again, no issues of missing data here but note that if item 4 shows.
*  no binge-eating then branching causes items 5a/b to be missing).
* Now combine items 3b and 3c into a single flag for criterion (3).
IF (zmheatd03b = 0 & zmheatd03c = 0) zmheat3bcrestricting = 1.
IF (zmheatd03b = 1) zmheat3bcrestricting = 0.
IF (zmheatd03c = 1) zmheat3bcrestricting = 0.
EXECUTE.
* (again, these variables are unaffected by missing data).
* AN Restricting sub-type: change code of zmheatdandiag from 1 to 2.
* if the three main criteria are met.
IF (zmheatdandiag = 1 & SUM(zmheatd45abflag, zmheat3bcrestricting) = 2) zmheatdandiag = 2.
EXECUTE.
* Note that if zmheatdandiag = 0 then this remains unchanged (no AN diagnosis).
* and if zmheatdandiag = 1 but criteria (2) and (3) are not both met.
* then again this remains unchanged (AN diagnosed but without sub-type).

* Binge-eating or purging subtype criteria.
* Note that these are mutually exclusive with the restricting subtype criteria.
* so we can use the same diagnosis variable with a different code.
* (1) zmheatdandiag = 1: meets the three essential criteria for AN of any type.
* (2) zmheatd05a = 1: binge-eating at periods of low weight.
* (3) zmheatd05c = 1: feelings of no control during binge-eating.
* (4) EITHER (a) zmheatd03b = 1: vomiting.
*     OR (b) zmheatd03c = 1: used laxatives, etc.
* Items 5a and 5c, for criteria (2) and (3), are already 1/0 flags.
* We need a new 1/0 flag for criterion (4), combining items 3b and 3c.
* which is simply a reverse-coded version of zmheat3bcrestricting above.
RECODE zmheat3bcrestricting (0=1) (1=0)
INTO zmheat3bcpurging.
EXECUTE.
* AN Binge-eating/Purging sub-type: change code of zmheatdandiag from 1 to 3.
* if the 4 main criteria are met.
IF (zmheatdandiag = 1 & SUM(zmheatd05a, zmheatd05c, zmheat3bcpurging) = 3) zmheatdandiag = 3.
EXECUTE.
* Note that if zmheatdandiag = 0 then this remains unchanged (no AN diagnosis).
* and if zmheatdandiag = 1 but criteria (2) to (4) are not all met.
* then again this remains unchanged (AN diagnosed but without sub-type).
* Note also that there is no clash with the restricting sub-type (zmheatdandiag=2).
* because the criteria, e.g. items 3b, 3c, 5a, are mutually exclusive.

* There are some cases of missing data in the lowest-BMI and in item 2a.
* causing a missing diagnosis, but these variables are essential for the diagnosis.
* so do not attempt to deal with these in any way.
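
The way the two subtype steps recode an initial diagnosis of 1 into 2 or 3 can be followed on toy data. The sketch below assumes hypothetical names throughout: diag is the initial AN flag, nobinge stands for the combined item 4/5 flag, v3b and v3c for items 3b and 3c, and b5a and b5c for items 5a and 5c.

```spss
* Toy data with hypothetical names for the flags described above.
DATA LIST FREE / diag nobinge v3b v3c b5a b5c.
BEGIN DATA
1 1 0 0 0 0
1 0 1 0 1 1
1 1 1 0 0 0
0 1 0 0 0 0
END DATA.
* Restricting flag: neither vomiting nor laxatives were used.
IF (v3b = 0 & v3c = 0) restricting = 1.
IF (v3b = 1) restricting = 0.
IF (v3c = 1) restricting = 0.
* Purging flag is the reverse of the restricting flag.
RECODE restricting (0=1) (1=0) INTO purging.
* Restricting subtype: recode the diagnosis from 1 to 2.
IF (diag = 1 & SUM(nobinge, restricting) = 2) diag = 2.
* Binge-eating or purging subtype: recode the diagnosis from 1 to 3.
IF (diag = 1 & SUM(b5a, b5c, purging) = 3) diag = 3.
EXECUTE.
VALUE LABELS diag 0 'Not diagnosed' 1 'AN without subtype'
    2 'AN restricting' 3 'AN binge-eating or purging'.
LIST.
```

The four toy cases end with codes 2, 3, 1 and 0 respectively. Because the purging flag is the exact reverse of the restricting flag, no case can satisfy both subtype conditions.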
zmheatdbediag1/2

Diagnosis flag for Binge-Eating Disorder, coded 0=no 1=yes, derived from responses about symptoms in the Eating Disorder measure in the twin MHQ.
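Based on the derivation used for a similar measure by the GLAD/EDGI studies, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below. Note that the flag variables zmheatd04regularbingeflag and zmheatd09flag used in this syntax are derived in the syntax for zmheatdbndiag1/2, which is referred to as "above" in the comments but appears further down this page.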

* Binge-eating disorder.
* Criteria in the TEDS data are as follows.
* (1) zmheatd04 = 2: binge eating at least weekly for over 3 months.
* (2) zmheatd05c = 1: feeling of no control during binge eating.
* (3) At least 3 of item 6 parts a/b/c/d/e are selected: binge-eating associated.
*     with at least 3 of rapid eating, over-eating, etc.
* (4) zmheatd07 = 1: distressed about binge-eating.
* (5) None of item 9 parts a/b/c/d are selected: binge-eating not compensated.
*     for by fasting, vomiting, compulsive exercise, pills, etc.
* (6) zmheatd05b = 1: binge eating outside low-weight episodes.
* zmheatd04regularbingeflag and items 5b, 5c, 7 already exist as 1/0 flags.
* Convert criterion (3) into a flag variable; each part of item 6 is coded 0/1.
IF (SUM(zmheatd06a, zmheatd06b, zmheatd06c, zmheatd06d, zmheatd06e) >= 3) zmheatd06flag = 1.
IF (SUM(zmheatd06a, zmheatd06b, zmheatd06c, zmheatd06d, zmheatd06e) < 3) zmheatd06flag = 0.
EXECUTE.
* (there are no missing data issues here: either all are answered or none).
* The four parts of item 9 are flagged collectively in zmheatd09flag above.
* but here we need the reverse of this flag.
RECODE zmheatd09flag (0=1) (1=0)
INTO zmheatd09negativeflag.
EXECUTE.
* We can now derive the BED diagnosis.
* Positive diagnosis if all 6 criteria are met.
IF (SUM(zmheatd04regularbingeflag, zmheatd05b, zmheatd05c, zmheatd07,
    zmheatd06flag, zmheatd09negativeflag) = 6) zmheatdbediag = 1.
* Negative diagnosis if any one of these 6 criteria is negative.
* even if some of the others might be missing. 
IF (zmheatd04regularbingeflag = 0) zmheatdbediag = 0.
IF (zmheatd05b = 0) zmheatdbediag = 0.
IF (zmheatd05c = 0) zmheatdbediag = 0.
IF (zmheatd07 = 0) zmheatdbediag = 0.
IF (zmheatd06flag = 0) zmheatdbediag = 0.
IF (zmheatd09negativeflag = 0) zmheatdbediag = 0.
EXECUTE.
* Where all 5 other criteria are met, over 85% meet the item 7 condition.
* hence, if item 7 is missing it is quite likely that a positive diagnosis would be made.
* so assign diagnoses to these (very few) cases.
IF (SUM(zmheatd04regularbingeflag, zmheatd05b, zmheatd05c, 
    zmheatd06flag, zmheatd09negativeflag) = 5 & SYSMIS(zmheatd07)) zmheatdbediag = 1.
EXECUTE.
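
The combination of rules above (all criteria met, any criterion negative, and a positive assigned when only item 7 is missing) can be checked on toy data. In this sketch the names c1 to c5 are hypothetical stand-ins for the five criterion flags other than item 7, and 9 is a placeholder for missing.

```spss
* Toy data: five hypothetical criterion flags plus item 7 (9 stands for missing).
DATA LIST FREE / c1 c2 c3 c4 c5 item7.
BEGIN DATA
1 1 1 1 1 1
1 1 1 1 1 9
1 1 1 1 0 9
1 1 1 1 9 9
END DATA.
RECODE c1 TO item7 (9=SYSMIS).
* Positive if all 6 criteria are met.
IF (SUM(c1 TO item7) = 6) diag = 1.
* Negative if any one criterion is negative, even if others are missing.
DO REPEAT crit = c1 TO item7.
IF (crit = 0) diag = 0.
END REPEAT.
* Positive if the other 5 criteria are met and only item 7 is missing.
IF (SUM(c1 TO c5) = 5 & SYSMIS(item7)) diag = 1.
EXECUTE.
LIST.
```

Here the first two cases are positive, the third is negative, and the fourth stays missing because a criterion other than item 7 is unanswered.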
zmheatd02bmi1/2

See zmhbmi1/2, zmheatd02bmi1/2 above.

zmheatdbndiag1/2

Diagnosis flag for Bulimia Nervosa, coded 0=no 1=yes, derived from responses about symptoms in the Eating Disorder measure in the twin MHQ.
Based on the derivation used for a similar measure by the GLAD/EDGI studies, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Bulimia Nervosa.
* Criteria in the TEDS data are as follows.
* (1) zmheatd04 = 2: binge eating at least weekly for over 3 months.
* (2) zmheatd05c = 1: feeling of no control during binge eating.
* (3) one or more of zmheatd09a=1 (fasting), 09b=1 (vomiting), 09c=1 (pills, etc).
*     or 09d=1 (compulsive exercise) to compensate for over-eating.
* (4) one or more of zmheatd11a=1, 11b=1, 11c=1 or 11d=1 (same as above but.
*     done independently of binge eating or low weight).
* (5) zmheatd10 > 0 (self-esteem dependent on body weight/shape).
* (6) zmheatd05b = 1: binge eating outside low-weight episodes.
* Items 5b, 5c and zmheatd10flag already exist as 1/0 flags.
* Do the same for the other criteria.
* First item 4: only value 2 counts as positive.
RECODE zmheatd04 (0=0) (1=0) (2=1)
INTO zmheatd04regularbingeflag.
EXECUTE.
* Combine parts a/b/c/d of item 9 (each part is coded 0/1).
IF (SUM(zmheatd09a, zmheatd09b, zmheatd09c, zmheatd09d) > 0) zmheatd09flag = 1.
* Require all 4 parts to be non-missing for a negative result.
IF (SUM.4(zmheatd09a, zmheatd09b, zmheatd09c, zmheatd09d) = 0) zmheatd09flag = 0.
EXECUTE.
* (note that in fact these items are unaffected by missing data).
* Combine parts a/b/c/d of item 11 in the same way.
IF (SUM(zmheatd11a, zmheatd11b, zmheatd11c, zmheatd11d) > 0) zmheatd11flag = 1.
* Require all 4 parts to be non-missing for a negative result.
IF (SUM.4(zmheatd11a, zmheatd11b, zmheatd11c, zmheatd11d) = 0) zmheatd11flag = 0.
EXECUTE.
* (again, missing data is not an issue - items are either all present or all missing).
* We can now derive the BN diagnosis.
IF (SUM(zmheatd04regularbingeflag, zmheatd05c, zmheatd09flag, zmheatd11flag,
    zmheatd10flag, zmheatd05b) = 6) zmheatdbndiag = 1.
* Negative diagnosis if any one of these 6 criteria is negative.
* even if some of the others might be missing. 
IF (zmheatd04regularbingeflag = 0) zmheatdbndiag = 0.
IF (zmheatd05c = 0) zmheatdbndiag = 0.
IF (zmheatd09flag = 0) zmheatdbndiag = 0.
IF (zmheatd11flag = 0) zmheatdbndiag = 0.
IF (zmheatd10flag = 0) zmheatdbndiag = 0.
IF (zmheatd05b = 0) zmheatdbndiag = 0.
EXECUTE.
* Where all 5 other criteria are met, 80% of twins meet the item 11 condition.
* and over 85% meet the item 5c condition; hence, if either one of these is missing.
* it is quite likely that a positive diagnosis would be made.
* so assign diagnoses to these (very few) cases.
IF (SUM(zmheatd04regularbingeflag, zmheatd05c, zmheatd09flag, 
     zmheatd10flag, zmheatd05b) = 5 & SYSMIS(zmheatd11flag)) zmheatdbndiag = 1.
IF (SUM(zmheatd04regularbingeflag, zmheatd11flag, zmheatd09flag, 
    zmheatd10flag, zmheatd05b) = 5 & SYSMIS(zmheatd05c)) zmheatdbndiag = 1.
EXECUTE.
zmhecvul1/2, zmhneet1/2, zmhses1/2

Derived variables relating to demographics and SES.
zmhneet1/2 is a binary NEET flag (not in education, employment or training), obtained by recoding the employment status item.
zmhecvul1/2 is an ordinal score measuring economic vulnerability or instability, derived as the sum of several 1/0 flag variables (see syntax below).
zmhses1/2 is a standardised, continuous composite measuring twin SES. This is derived from three equally-weighted components: educational level, employment income level, and economic vulnerability.

* NEET Category.
* Recode employment status into NEET flag (Not in Education, Employment or Training).
RECODE zmhempst  (1 THRU 3=0) (4=1) (5=0) (6 THRU 7=1) (8=0)
INTO zmhneet.
EXECUTE.

* Economic vulnerability/instability score.
* First recode income level into temporary low-income flag (lowest two categories).
RECODE zmhempinc (1 THRU 2=1) (3 THRU HIGHEST=0)
INTO lowincome.
EXECUTE.
* Now derive vulnerability ordinal score as a sum of five 1/0 flags.
* for NEET (see above), low income, children, on benefits, in zero-hours employment.
* Require at least half of them to be non-missing. 
COMPUTE zmhecvul = SUM.2(zmhneet, lowincome, zmhbenf, lowincome, zmhchild).
EXECUTE.
* The range of scores is 0-4 because there is no overlap of NEET with income or zero-hours.
* hence require at least 2 of 4 possible components to be non-missing.

* SES composite.
* Use three components: educational level, income level and economic vulnerability (above).
* Make a temporary version of the latter without the low-income flag.
* so that income weighting is not increased by being included twice.
* and reverse its value (higher value = higher SES as in the other two components).
COMPUTE ecvulnoincome = 4 - SUM.2(zmhneet, zmhbenf, lowincome, zmhchild).
EXECUTE.
* Now standardise the three components.
DESCRIPTIVES VARIABLES=ecvulnoincome zmhempinc zmhhqual /SAVE.
* Take a mean, requiring at least 2 of the 3 to be non-missing.
COMPUTE ses = MEAN.2(Zecvulnoincome, Zzmhempinc, Zzmhhqual).
EXECUTE.
* Finally, re-standardise to ensure SD is 1.
DESCRIPTIVES VARIABLES=ses (zmhses) /SAVE.
* Unlike at age 21, there is no major cohort effect here.
* (mean cohort differences are less than 0.1).
* because the twins have generally all gone through higher education.
* and have apparently had time to settle into employment patterns.
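
The standardise, average, re-standardise sequence used for the SES composite can be reproduced on toy data. The following sketch uses hypothetical names (vuln, income, qual, comp, zcomp) and made-up values, with 9 as a placeholder for missing; it illustrates the technique rather than the TEDS data.

```spss
* Toy data: three hypothetical components (9 stands for missing).
DATA LIST FREE / vuln income qual.
BEGIN DATA
4 5 3
3 4 9
2 9 9
1 2 1
0 1 2
END DATA.
RECODE vuln income qual (9=SYSMIS).
* Standardise each component, saving z-scores as Zvuln, Zincome and Zqual.
DESCRIPTIVES VARIABLES=vuln income qual /SAVE.
* Mean of the z-scores, requiring at least 2 of the 3 to be non-missing.
COMPUTE comp = MEAN.2(Zvuln, Zincome, Zqual).
EXECUTE.
* Re-standardise the composite, naming the saved z-score zcomp.
DESCRIPTIVES VARIABLES=comp (zcomp) /SAVE.
LIST.
```

The third toy case has only one component present, so both comp and zcomp are missing for it. The second DESCRIPTIVES step is needed because a mean of z-scores does not itself have a standard deviation of exactly 1.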
zmhexerm1/2

Total scale, from the 3 items of the Exercise measure in the twin MHQ. Derived as a weighted mean, by the same method used for this measure in TEDS at age 21. The scale has the same value range (1-5) as the items.

* Use the same scale as used in TEDS21 and covid studies, from the same 3 items.
* Compute as a weighted mean, with weightings 3, 2 and 1 respectively.
* for items 1 (strenuous), 2 (moderate) and 3 (mild).
* To keep things simple, require all three items to be non-missing.
COMPUTE zmhexerm = (SUM.3((3 * zmhexer1), (2 * zmhexer2), zmhexer3)) / 6.
EXECUTE.
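
A worked example may help to show how the 3/2/1 weighting behaves. The sketch below uses hypothetical item names e1, e2 and e3 and made-up responses, with 9 as a placeholder for missing.

```spss
* Toy data: three hypothetical items coded 1-5 (9 stands for missing).
DATA LIST FREE / e1 e2 e3.
BEGIN DATA
5 3 1
1 1 1
5 5 9
END DATA.
RECODE e1 e2 e3 (9=SYSMIS).
* Weighted mean with weights 3, 2 and 1, requiring all three items.
COMPUTE wmean = (SUM.3((3 * e1), (2 * e2), e3)) / 6.
EXECUTE.
LIST.
```

The first case gives (15 + 6 + 1) / 6, or about 3.67; the second gives exactly 1, the minimum of the range; the third is missing because one item is unanswered.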
zmhgadt1/2

Total scale, from the 2 items of the GAD-2 measure in the twin MHQ. Each item has values 0/1/2/3, hence the scale values have range 0 to 6.

* Total scale: sum both items (there are only 2, so scale must have integer values).
COMPUTE zmhgadt = 2 * MEAN.1(zmhgad1, zmhgad2).
EXECUTE.
zmhganxt1/2

Total scale, from all 10 items of the General Anxiety (GAD10) measure in the twin MHQ. Each item has values 0/1/2/3/4, hence the scale values have range 0 to 40.

* Total scale: sum of all 10 items.
COMPUTE zmhganxt = 10 * MEAN.5(zmhganx01, zmhganx02, zmhganx03, zmhganx04, 
    zmhganx05, zmhganx06, zmhganx07, zmhganx08, zmhganx09, zmhganx10).
EXECUTE.
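
Many total scales on this page are computed as the number of items multiplied by MEAN.k of the items, which gives the plain sum when all items are present and a prorated sum when some are missing. A minimal Python sketch of that pattern follows; the function name is hypothetical and None stands for a missing item.

```python
def prorated_sum(items, min_valid):
    """Number of items times the mean of the non-missing items.

    Returns None if fewer than min_valid items are non-missing.
    """
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return len(items) * sum(valid) / len(valid)

# Ten items with two missing: the eight answers are scaled up to ten items.
print(prorated_sum([2] * 8 + [None] * 2, 5))
# Only four answers out of ten: below the threshold of 5.
print(prorated_sum([1] * 4 + [None] * 6, 5))
```

When items are missing the result is not necessarily an integer, because the mean of the answered items is substituted for each missing item.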
zmhicut1/2, zmhicuunct1/2, zmhicucalt1/2

Total scale and subscales for the ICU measure in the twin MHQ.
zmhicuunct1/2: uncaring subscale (3 items)
zmhicucalt1/2: callous subscale (3 items)
zmhicut1/2: overall total scale (all 7 items)
Every item is coded 0/1/2/3, hence the subscales each have value ranges 0-9 and the overall total scale has value range 0-21. Items 1, 3, 6 and 7 are reversed for the purpose of deriving these scales. Note that item 2 alone measures an 'unemotional' trait and therefore does not form part of either subscale.

* Total scale from all 7 items.
* Plus subscales for Callous and Uncaring (3 items each).
COMPUTE zmhicut = 7 * MEAN.4(zmhicu1r, zmhicu2, zmhicu3r, zmhicu4, zmhicu5, zmhicu6r, zmhicu7r).
COMPUTE zmhicuunct = 3 * MEAN.2(zmhicu1r, zmhicu6r, zmhicu7r).
COMPUTE zmhicucalt = 3 * MEAN.2(zmhicu3r, zmhicu4, zmhicu5).
EXECUTE.
zmhlifevnat1/2, zmhlifevnnt1/2, zmhlifevpat1/2, zmhlifevpnt1/2

These derived variables are counts of life events, not conventional scales.
Each life event item may be coded as negative (prefix zmhlifevn) or positive (zmhlifevp) according to the pattern of responses; a few items are coded in both ways. In all cases, the item coding is 0=did not happen, 1=happened but without any effect, 2/3=happened with moderate or serious effect. The derived variables are:
zmhlifevnnt1/2: number of negative events reported with no effect
zmhlifevnat1/2: number of negative events reported with some effect on the twin
zmhlifevpnt1/2: number of positive events reported with no effect
zmhlifevpat1/2: number of positive events reported with some effect on the twin.
See comments in syntax below for detailed treatment.

* Conventional scales based on means are inappropriate in this measure.
* because the items report largely independent events with little or no correlation.
* However, we can count the numbers of positive and negative life events that occurred.
* and the counts can be divided according to whether or not the events had a perceived effect.
* First, count the numbers of all responses, including 'no' responses.
* (counted separately for negative and positive).
* Note that such counts are zero by default even if data are missing.
COUNT zmhlifevnegcount =  zmhlifev01n zmhlifev02n zmhlifev04n zmhlifev05n zmhlifev07n 
 zmhlifev08n zmhlifev11n zmhlifev12n zmhlifev13n zmhlifev14n zmhlifev15n 
 zmhlifev16n zmhlifev17n zmhlifev18n zmhlifev19n (0 THRU HIGHEST).
COUNT zmhlifevposcount = zmhlifev02p zmhlifev03p zmhlifev04p zmhlifev05p zmhlifev06p 
 zmhlifev09p zmhlifev10p zmhlifev18p zmhlifev20p (0 THRU HIGHEST).
EXECUTE.
* Counts of events that occurred may be invalid if too many items are missing.
* so require at least two thirds of the respective items to be non-missing below.
* Note that items 2, 4, 5 and 18 may be recorded as either positive or negative events.
* To avoid counting responses twice, which may only occur if response is.
* 'yes but did not affect me' (value 1), only count this response for the.
* 'positive' category because this is the more common type of response in all 4 items.
* Negative events: require at least 10 of 15 possible negative event responses to be present.
DO IF (zmhlifevnegcount >= 10).
 * Count events that occurred but did not affect twin.
 COUNT zmhlifevnnt = zmhlifev01n zmhlifev07n 
 zmhlifev08n zmhlifev11n zmhlifev12n zmhlifev13n zmhlifev14n zmhlifev15n 
 zmhlifev16n zmhlifev17n zmhlifev19n (1).
 * Count events that affected the twin.
 COUNT zmhlifevnat = zmhlifev01n zmhlifev02n zmhlifev04n zmhlifev05n zmhlifev07n 
 zmhlifev08n zmhlifev11n zmhlifev12n zmhlifev13n zmhlifev14n zmhlifev15n 
 zmhlifev16n zmhlifev17n zmhlifev18n zmhlifev19n (2,3).
END IF.
EXECUTE.
* Positive events: require at least 6 out of 9 possible positive event responses to be present.
DO IF (zmhlifevposcount >= 6).
 * Count events that did not affect twin.
 COUNT zmhlifevpnt = zmhlifev02p zmhlifev03p zmhlifev04p zmhlifev05p zmhlifev06p 
 zmhlifev09p zmhlifev10p zmhlifev18p zmhlifev20p (1).
 * Count events that affected the twin.
 COUNT zmhlifevpat = zmhlifev02p zmhlifev03p zmhlifev04p zmhlifev05p zmhlifev06p 
 zmhlifev09p zmhlifev10p zmhlifev18p zmhlifev20p (2,3).
END IF.
EXECUTE.
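
The counting rules for the negative events can be illustrated with the following Python sketch. It is not the derivation script itself: the function name and the dictionary layout are assumptions, with item numbers as keys and None for a missing response.

```python
NEG_ITEMS = [1, 2, 4, 5, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19]
DUAL_ITEMS = {2, 4, 5, 18}  # may be recorded as positive or negative

def negative_event_counts(responses, min_present=10):
    """responses maps item number to 0/1/2/3, or None if missing.

    Returns (no_effect, affected), or (None, None) if too few responses.
    """
    present = sum(1 for i in NEG_ITEMS if responses.get(i) is not None)
    if present < min_present:
        return None, None
    # Value 1 in a dual item is counted only in the positive category.
    no_effect = sum(1 for i in NEG_ITEMS
                    if i not in DUAL_ITEMS and responses.get(i) == 1)
    affected = sum(1 for i in NEG_ITEMS if responses.get(i) in (2, 3))
    return no_effect, affected

answers = {i: 0 for i in NEG_ITEMS}
answers.update({1: 1, 2: 1, 7: 3, 18: 2, 19: None})
print(negative_event_counts(answers))
```

In this example item 1 is the only event counted as having no effect (item 2 is a dual item), while items 7 and 18 are counted as having affected the twin.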
zmhLLCage1/2, zmhLLCdate1/2

See zcLLCage1/2, etc above.

zmhmdq1count1/2, zmhmdq3count1/2, zmhmdqmanscr1/2, zmhmdqmandiag1/2, zmhmdqhypdiag1/2

Mania/hypomania diagnosis and related variables derived from responses in the MDQ measure of the twin MHQ.
zmhmdq1count1/2: count of symptoms reported in question 1 (0-13).
zmhmdq3count1/2: count of co-occurring symptoms reported in question 3 (2-13).
zmhmdqmanscr1/2: Mania screening flag (coded 0/1/2).
zmhmdqmandiag1/2: Mania diagnosis flag (coded 0/1/2).
zmhmdqhypdiag1/2: Hypomania diagnosis flag (coded 0/1/2).
The latter three variables are all coded with values 0=no, 1=yes with less stringent criteria, 2=yes with more stringent criteria. The difference between the less stringent and more stringent criteria, in all three variables, is that the more stringent conditions require all relevant symptoms to be co-occurring (as reported in question 3). For Mania, cases identified by the diagnosis variable (both stringent and less stringent) are subsets of cases identified by the screener variable. Full derivation details are explained in comments in the syntax below.

* Note MDQ branching from screening items 1 and 2 affecting subsequent items 3 to 5.
* (but not always items 6a/b, however these items are not used below).
* If at least 2 'yes' responses in the 13 parts of item 1, proceed to item 2.
* Then if item 2 response is 'yes', proceed to item 3.
* In all other cases, skip to item 6 (which has its own branch rules).
* Also, item 3 only displays those parts for which the item 1 response was 'yes'.

* MDQ initial screener and counts of symptoms.
* firstly reported in the opening question 1 and secondly (after branching).
* in question 3, the number of the same symptoms reported as co-occurring.
* Also, to help with diagnosis below, counts of 7 specific symptoms.
* in Q1 and Q3: parts c, d, e, f, g, i, l.
NUMERIC zmhmdq1count zmhmdq3count mdq1cdefgilcount mdq3cdefgilcount zmhmdqmanscr (F1.0).
* Count the number of mania symptoms reported in question 1.
* requiring responses in at least half the 13 items to be non-missing.
* (in fact partial responses practically never occur in TEDS).
COMPUTE zmhmdq1count = SUM.7(zmhmdq1a, zmhmdq1b, zmhmdq1c, zmhmdq1d, zmhmdq1e, 
 zmhmdq1f, zmhmdq1g, zmhmdq1h, zmhmdq1i, zmhmdq1j, zmhmdq1k, zmhmdq1l, zmhmdq1m).
* Now count the number of co-occurring symptoms reported in question 3 (after branching).
* Here, we logically require at least 2 responses to be non-missing.
* because the symptoms are co-occurring so 1 response would be unhelpful.
IF (zmhmdq2 = 1) zmhmdq3count = SUM.2(zmhmdq3a, zmhmdq3b, zmhmdq3c, zmhmdq3d, zmhmdq3e, 
 zmhmdq3f, zmhmdq3g, zmhmdq3h, zmhmdq3i, zmhmdq3j, zmhmdq3k, zmhmdq3l, zmhmdq3m).
EXECUTE.
* This already excludes some cases who decided to skip q3 entirely (all missing).
* We also need to exclude a few cases who gave only 1 'yes' response in q3.
* which is illogical because they were asked to indicate those that occurred simultaneously.
RECODE zmhmdq3count (1=SYSMIS) (ELSE=COPY).
EXECUTE.
* Now make a (temporary) count of 7 key symptoms, in both Q1 and Q3.
* namely parts c, d, e, f, g, i, l.
* Missing data isn't a problem here so do a simple sum.
COMPUTE mdq1cdefgilcount = SUM(zmhmdq1c, zmhmdq1d, zmhmdq1e, 
  zmhmdq1f, zmhmdq1g, zmhmdq1i, zmhmdq1l).
COMPUTE mdq3cdefgilcount = SUM(zmhmdq3c, zmhmdq3d, zmhmdq3e, 
  zmhmdq3f, zmhmdq3g, zmhmdq3i, zmhmdq3l).
EXECUTE.

* Now derive an MDQ Mania screener, at two levels, coded 1 and 2.
* The basic screener involves counting at least 7 reported symptoms.
* but this is adapted for better compatibility with the mania diagnosis criteria below.
* where only 4 or 5 symptoms may be sufficient if they are the key symptoms.
* as follows.
* (A) any 7 or more of the 13 symptoms.
* or (B) symptom a (hyper) plus at least 3 of symptoms c/d/e/f/g/i/l.
* or (C) symptom b (irritable) but not symptom a, plus at least 4 of symptoms c/d/e/f/g/i/l.
* The less stringent criteria (coded 1) are as follows.
*  (1) Q1 symptoms counted as described above.
*     (A) zmhmdq1count >=7.
*     or (B) zmhmdq1a = 1 & mdq1cdefgilcount >= 3.
*     or (C) zmhmdq1b = 1 & mdq1cdefgilcount >= 4.
*  (2) zmhmdq2 = 1: at least some were co-occurring.
*  (3) zmhmdq4b >= 2: severity was moderate or serious.
* Do not allow missing values in item 4b because only 30% have zmhmdq4b > = 2.
* when the other two conditions are met; plus 4b is the critical indicator of mania not hypomania.
* However, 97% have zmhmdq2 = 1 when the other two conditions are met.
* so can allow missing values in item 2 for a positive result (affects 29 twins).
IF (zmhmdq1count >= 7 & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) & zmhmdq4b >= 2) zmhmdqmanscr = 1.
IF (zmhmdq1a = 1 & mdq1cdefgilcount >= 3 & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) & zmhmdq4b >= 2) zmhmdqmanscr = 1.
IF (zmhmdq1b = 1 & mdq1cdefgilcount >= 4 & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) & zmhmdq4b >= 2) zmhmdqmanscr = 1.
* Negative result if any one of the 3 criteria is not met.
IF (zmhmdq1a = 0 & zmhmdq1b = 0 & zmhmdq1count < 7) zmhmdqmanscr = 0.
IF (zmhmdq1a = 1 & mdq1cdefgilcount < 3) zmhmdqmanscr = 0.
IF (zmhmdq1a = 0 & zmhmdq1b = 1 & mdq1cdefgilcount < 4) zmhmdqmanscr = 0.
IF (zmhmdq2 = 0) zmhmdqmanscr = 0.
IF (zmhmdq4b < 2) zmhmdqmanscr = 0.
EXECUTE.
* This screener will be missing if either zmhmdq1count or zmhmdq4b is missing.
* The more stringent criteria, coded 2, are similar but with all symptoms co-occurring.
*  (1) co-occurring symptoms counted as above, from question 3.
*  (2) zmhmdq4b > = 2: severity was moderate or serious.
* Note that criterion (1) necessarily implies that zmhmdq2 = 1.
* and that the symptoms reported in Q3 were also reported in Q1.
* because of branching rules in the questionnaire, hence, to simplify.
* the stringent criteria imply that zmhmdqmanscr = 1 as derived above.
* and zmhmdqmanscr=1 also implies that zmhmdq4b >= 2.
* Hence, for derivation, the stringent criteria can be applied as follows.
*  (1) zmhmdqmanscr = 1, as derived above.
*  (2) Q3 symptoms counted as described above.
*     (A) zmhmdq3count >=7.
*     or (B) zmhmdq3a = 1 & mdq3cdefgilcount >= 3.
*     or (C) zmhmdq3b = 1 & mdq3cdefgilcount >= 4.
IF (zmhmdqmanscr = 1 & zmhmdq3count >= 7) zmhmdqmanscr = 2.
IF (zmhmdqmanscr = 1 & zmhmdq3a = 1 & mdq3cdefgilcount >= 3) zmhmdqmanscr = 2.
IF (zmhmdqmanscr = 1 & zmhmdq3b = 1 & mdq3cdefgilcount >= 4) zmhmdqmanscr = 2.
EXECUTE.
* If the Q3 variables are missing, then the screener value remains at 1.
* Note that if Q3 variables are present then zmhmdq2 is necessarily non-missing because of branching.

* Mania and Hypomania diagnosis variables.
* Again with values 0/1/2, 2 being based on stricter criteria.
NUMERIC zmhmdqmandiag zmhmdqhypdiag (F1.0).
* For Mania (bipolar type 1), the less stringent criteria, for value 1, are.
*  (1) At least one of Q1a (hyper) and Q1b (irritable) is answered 1=yes.
*  (2) At least 3 (if Q1a is yes) or at least 4 (if Q1a is no/missing but Q1b is yes).
*      of the 7 Q1 parts c/d/e/f/g/i/l are answered 1=yes.
*  (3) Q2 is 1=yes (at least some symptoms co-occurred).
*  (4) Q4a (longest time) is 3=a week or more.
*  (5) Q4b (severity) is 2=moderate or 3=serious.
* For Hypomania, the less stringent criteria are identical EXCEPT FOR.
*  (5) Q4b (severity) is 0=none or 1=minor.
IF (zmhmdq1a = 1 & mdq1cdefgilcount >= 3 & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) 
    & zmhmdq4a = 3 & zmhmdq4b >= 2) zmhmdqmandiag = 1.
IF (zmhmdq1a = 1 & mdq1cdefgilcount >= 3 & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) 
    & zmhmdq4a = 3 & zmhmdq4b < 2) zmhmdqhypdiag = 1.
IF ((zmhmdq1a = 0 | SYSMIS(zmhmdq1a)) & zmhmdq1b = 1 & mdq1cdefgilcount >= 4 
    & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) & zmhmdq4a = 3 & zmhmdq4b >= 2) zmhmdqmandiag = 1.
IF ((zmhmdq1a = 0 | SYSMIS(zmhmdq1a)) & zmhmdq1b = 1 & mdq1cdefgilcount >= 4 
    & (zmhmdq2 = 1 | SYSMIS(zmhmdq2)) & zmhmdq4a = 3 & zmhmdq4b < 2) zmhmdqhypdiag = 1.
EXECUTE.
* In practice, different parts of Q1 are rarely if ever missing.
* The conditions as coded may be met if zmhmdq1a=1 but zmhmdq1b is missing.
* and also if zmhmdq1a is missing but zmhmdq1b=1 (explicitly coded).
* while other parts of Q1 may be missing as long as mdq1cdefgilcount reaches the required level.
* Hence, no need to cater for other unlikely permutations of missing data in Q1.
* Q4b must not be missing because this is critical for the distinction.
* between mania and hypomania.
* When other conditions are met, zmhmdq2 = 1 in over 95% (mania) or over 85% (hypomania).
* of cases, so we can allow zmhmdq2 to be missing if all other criteria are met.
* However, zmhmdq4a = 3 in well under 50% of cases when the other criteria are met.
* so do not allow Q4a to be missing.
* Negative diagnoses.
* Both diagnoses are negative if both Q1a and Q1b are 0=no.
IF (zmhmdq1a = 0 & zmhmdq1b = 0) zmhmdqmandiag = 0.
IF (zmhmdq1a = 0 & zmhmdq1b = 0) zmhmdqhypdiag = 0.
* Note: diagnoses will be missing if both 1a and 1b missing, or if one is 'no' and the other missing.
* Both diagnoses are negative if 1a is 1=yes and mdq1cdefgilcount < 3.
IF (zmhmdq1a = 1 & mdq1cdefgilcount < 3) zmhmdqmandiag = 0.
IF (zmhmdq1a = 1 & mdq1cdefgilcount < 3) zmhmdqhypdiag = 0.
* and both are negative if 1a is 0=no or missing, 1b is 1=yes, and mdq1cdefgilcount < 4.
IF ((zmhmdq1a = 0 | SYSMIS(zmhmdq1a)) & zmhmdq1b = 1 & mdq1cdefgilcount < 4) zmhmdqmandiag = 0.
IF ((zmhmdq1a = 0 | SYSMIS(zmhmdq1a)) & zmhmdq1b = 1 & mdq1cdefgilcount < 4) zmhmdqhypdiag = 0.
* and both are negative if Q2 is 0=no.
IF (zmhmdq2 = 0) zmhmdqmandiag = 0.
IF (zmhmdq2 = 0) zmhmdqhypdiag = 0.
* and both are negative if zmhmdq4a <  3 (duration less than a week).
IF (zmhmdq4a < 3) zmhmdqmandiag = 0.
IF (zmhmdq4a < 3) zmhmdqhypdiag = 0.
* Mania diagnosis is negative if zmhmdq4b <  2 (no problem or minor problem).
IF (zmhmdq4b < 2) zmhmdqmandiag = 0.
* while Hypomania diagnosis is negative if zmhmdq4b > = 2 (moderate or serious problem).
IF (zmhmdq4b >= 2) zmhmdqhypdiag = 0.
EXECUTE.
* Both diagnoses are missing if Q4b is missing (16 cases, with or without other missing data).
* and both are missing if all parts of Q1 were skipped so entire measure is missing (2 cases).
* Mania diagnosis is missing but Hypomania diagnosis is negative if Q4a is missing.
*   but Q4b is present and >=2, ruling out Hypomania (11 cases).
* Similarly, Hypomania diagnosis is missing but Mania diagnosis is negative.
*   if Q4a is missing but Q4b < 2, ruling out Mania (27 cases).
* This apparently accounts for all the missing diagnoses.

* The criteria for stringent diagnosis, coded with value 2, are similar.
* but based on co-occurring symptoms in Q3 rather than symptoms in Q1.
* For Mania.
*  (1) At least one of Q3a (hyper) and Q3b (irritable) is answered 1=yes.
*  (2) At least 3 (if Q3a is yes) or at least 4 (if Q3a is no/missing but Q3b is yes).
*      of the 7 Q3 parts c/d/e/f/g/i/l are answered 1=yes.
*  (3) Q2 is 1=yes (at least some symptoms co-occurred).
*  (4) Q4a (longest time) is 3=a week or more.
*  (5) Q4b (severity) is 2=moderate or 3=serious.
* For Hypomania, the only difference is that.
*  (5) Q4b (severity) is 0=none or 1=minor.
* Note that criteria (1) and (2) necessarily imply that Q2 is 1=yes because of branching rules.
* Also criteria (4) and (5) are identical to those used in the less stringent diagnosis.
* Note also that the parts of Q3 only appear if selected in Q1 (branching rule).
*   hence each part of criteria (1) and (2) can only apply if the corresponding.
*   criteria in Q1, for less stringent diagnosis, are already met.
* Hence the more stringent diagnosis can be derived more simply as follows.
*   with an upgrading of the coded value from 1 to 2.
*  (1) At least one of Q3a (hyper) and Q3b (irritable) is answered 1=yes.
*  (2) At least 3 (if Q3a is yes) or at least 4 (if Q3a is no/missing but Q3b is yes).
*      of the 7 Q3 parts c/d/e/f/g/i/l are answered 1=yes.
*  (3) The less stringent diagnosis has already been made (coded as 1).
IF (zmhmdqmandiag = 1 & zmhmdq3a = 1 & mdq3cdefgilcount >= 3) zmhmdqmandiag = 2.
IF (zmhmdqhypdiag = 1 & zmhmdq3a = 1 & mdq3cdefgilcount >= 3) zmhmdqhypdiag = 2.
IF (zmhmdqmandiag = 1 & (zmhmdq3a = 0 | SYSMIS(zmhmdq3a)) & zmhmdq3b = 1 
    & mdq3cdefgilcount >= 4) zmhmdqmandiag = 2.
IF (zmhmdqhypdiag = 1 & (zmhmdq3a = 0 | SYSMIS(zmhmdq3a)) & zmhmdq3b = 1 
    & mdq3cdefgilcount >= 4) zmhmdqhypdiag = 2.
EXECUTE.
* If Q3a or Q3b is 1=yes then the other is allowed to be missing.
* but zmhmdqmandiag and mdq3cdefgilcount must not be missing.
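
The less stringent Mania diagnosis (values 0 and 1 only) can be sketched in Python as below, applying the positive rules first and then the negative rules in the same order as the syntax above. The function and argument names are hypothetical; None stands for a missing value, and key_count stands for the number of 'yes' responses among Q1 parts c/d/e/f/g/i/l.

```python
def mania_diag(q1a, q1b, key_count, q2, q4a, q4b):
    """Less stringent Mania diagnosis: 1, 0, or None if undetermined."""
    def eq(v, x):
        return v is not None and v == x

    def ge(v, x):
        return v is not None and v >= x

    def lt(v, x):
        return v is not None and v < x

    q1a_no = eq(q1a, 0) or q1a is None
    q2_ok = eq(q2, 1) or q2 is None  # Q2 is allowed to be missing
    diag = None
    # Positive rules.
    if eq(q1a, 1) and ge(key_count, 3) and q2_ok and eq(q4a, 3) and ge(q4b, 2):
        diag = 1
    if (q1a_no and eq(q1b, 1) and ge(key_count, 4)
            and q2_ok and eq(q4a, 3) and ge(q4b, 2)):
        diag = 1
    # Negative rules: any single failed criterion gives a negative result.
    if eq(q1a, 0) and eq(q1b, 0):
        diag = 0
    if eq(q1a, 1) and lt(key_count, 3):
        diag = 0
    if q1a_no and eq(q1b, 1) and lt(key_count, 4):
        diag = 0
    if eq(q2, 0):
        diag = 0
    if lt(q4a, 3):
        diag = 0
    if lt(q4b, 2):
        diag = 0
    return diag

print(mania_diag(1, 0, 3, 1, 3, 2))     # all criteria met
print(mania_diag(1, 0, 4, 1, None, 2))  # Q4a missing, Q4b >= 2
print(mania_diag(1, 0, 4, 1, None, 1))  # Q4a missing, Q4b < 2
```

The last two calls reproduce the pattern noted in the comments: when Q4a is missing, the Mania diagnosis is missing if Q4b is 2 or more, but negative if Q4b is less than 2.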
zmhmfqt1/2

Total scale, from all 13 items of the MFQ measure in the twin MHQ. Each item has values 0/1/2, hence the scale values have range 0 to 26.

* Total scale: sum of all 13 items.
COMPUTE zmhmfqt = 13 * MEAN.7(zmhmfq01, zmhmfq02, zmhmfq03, zmhmfq04, zmhmfq05, 
 zmhmfq06, zmhmfq07, zmhmfq08, zmhmfq09, zmhmfq10, zmhmfq11, zmhmfq12, zmhmfq13).
EXECUTE.
zmhneet1/2

See zmhecvul1/2, zmhneet1/2, zmhses1/2 above.

zmhpanicdiag1/2

Diagnosis flag, coded 1=yes 0=no, for Panic Disorder, derived from responses in the measure of the same name in the twin MHQ.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Panic disorder diagnosis.
* The conditions for diagnosis are.
* (1) At least 4 of the 13 initial screening items (1a-1m) are 'yes' (coded 1).
* and (2) One or more of.
*        (a) worried about more panic attacks - item 3a.
*        (b) worried about losing control, etc - item 3b.
*        (c) avoided situations - item 3c.
* and (3) symptoms lasted more than 1 month - item 4.
* and (4) panic attacks not all caused by medical problem, drugs, etc - item 6.
* and (5) One or both of.
*        (a) panic unrelated to situations that cause fear - item 7.
*        (b) panic has occurred when not in such situations - item 8.

* Note that, in the questionnaire, condition (1) screens by branching.
* so if condition (1) is negative then the rest of the items are missing.
* Also, condition (2), item 3a/b/c, screens for condition (3), item 4.
* so if condition (2) is negative then item 4 and condition (3) is missing.
* but conditions (4) and (5), items 6/7/8, are unaffected.

* Convert each of the above conditions into a 1/0 flag.
* The first condition is based on a score of 4 or more from 13 sub-conditions.
* For a negative result, we need the sum of the flags to be 0 with no more than 3 missing.
* or 1 with no more than 2 missing, or 2 with no more than 1 missing, or 3 with none missing.
* in case the missing flags might indicate a positive.
IF (SUM.10(zmhpanic01a, zmhpanic01b, zmhpanic01c, zmhpanic01d, zmhpanic01e, 
     zmhpanic01f, zmhpanic01g, zmhpanic01h, zmhpanic01i, zmhpanic01j, 
     zmhpanic01k, zmhpanic01l, zmhpanic01m) = 0) zmhpanic01flag = 0.
IF (SUM.11(zmhpanic01a, zmhpanic01b, zmhpanic01c, zmhpanic01d, zmhpanic01e, 
     zmhpanic01f, zmhpanic01g, zmhpanic01h, zmhpanic01i, zmhpanic01j, 
     zmhpanic01k, zmhpanic01l, zmhpanic01m) = 1) zmhpanic01flag = 0.
IF (SUM.12(zmhpanic01a, zmhpanic01b, zmhpanic01c, zmhpanic01d, zmhpanic01e, 
     zmhpanic01f, zmhpanic01g, zmhpanic01h, zmhpanic01i, zmhpanic01j, 
     zmhpanic01k, zmhpanic01l, zmhpanic01m) = 2) zmhpanic01flag = 0.
IF (SUM.13(zmhpanic01a, zmhpanic01b, zmhpanic01c, zmhpanic01d, zmhpanic01e, 
     zmhpanic01f, zmhpanic01g, zmhpanic01h, zmhpanic01i, zmhpanic01j, 
     zmhpanic01k, zmhpanic01l, zmhpanic01m) = 3) zmhpanic01flag = 0.
* For a positive result, if at least 4 parts are positive then missingness in the others does not matter.
IF (SUM(zmhpanic01a, zmhpanic01b, zmhpanic01c, zmhpanic01d, zmhpanic01e, 
     zmhpanic01f, zmhpanic01g, zmhpanic01h, zmhpanic01i, zmhpanic01j, 
     zmhpanic01k, zmhpanic01l, zmhpanic01m) >= 4) zmhpanic01flag = 1.
EXECUTE.
* No need to check for cases such as 3 positive with 1 missing because this causes.
* a branch such that the rest of the measure is missing - the diagnosis also will be missing.
* In item 3 parts a-c, for a negative result we need all 3 parts to be non-missing.
* (if 1 or 2 are negative but others are missing, this flag will be missing).
IF (SUM.3(zmhpanic03a, zmhpanic03b, zmhpanic03c) = 0) zmhpanic03flag = 0.
IF (SUM(zmhpanic03a, zmhpanic03b, zmhpanic03c) > 0) zmhpanic03flag = 1.
EXECUTE.
RECODE zmhpanic04 (1=0) (2 THRU 7=1) INTO zmhpanic04flag.
EXECUTE.
RECODE zmhpanic06 (0 THRU 1=1) (2=0) INTO zmhpanic06flag.
EXECUTE.
* Combining items 7 and 8, for a negative result both items must be non-missing.
* (if one is negative and the other missing, this flag will be missing).
IF (zmhpanic07 = 1 & zmhpanic08 = 0) zmhpanic078flag = 0.
* If one is positive then it doesn't matter if the other is missing.
IF (zmhpanic07 = 0 | zmhpanic08 = 1) zmhpanic078flag = 1.
EXECUTE.
* Now sum the flags for the 4 post-screening criteria to get a score 0-4.
COMPUTE zmhpanicdiagscore = SUM(zmhpanic03flag, zmhpanic04flag, zmhpanic06flag, zmhpanic078flag).
EXECUTE.
* Also count the number of non-missing post-screening flag variables (if panic data present).
DO IF (ANY(zmhpanicstat, 1, 2)).
 COUNT zmhpanicdiagflags = zmhpanic03flag zmhpanic04flag zmhpanic06flag zmhpanic078flag (0, 1).
END IF.
EXECUTE.

* Now set the diagnosis flag (0 or 1).
* Note that, for twins screening positive on item 1.
* around 98% have a positive result in item 6, 80% in item 3, 73% in item 7/8 and 61% in item 4.
* Therefore allow a probable positive diagnosis if 3 of 4 criteria are positive and the other one missing.
* but make an exception to this rule if item 4 is the missing item.
* (item 3 must also not be missing because this screens for item 4 so two items would be missing).
* Diagnosis is positive if positively screened by item 1 and all 4 other criteria are met.
IF (zmhpanic01flag = 1 & zmhpanicdiagscore = 4) zmhpanicdiag = 1.
* Also allow positive diagnosis if 3 criteria are met and the other 1 is missing.
* unless the missing one is item 4.
IF (zmhpanic01flag = 1 & zmhpanic04flag = 1 & zmhpanicdiagscore = 3 & zmhpanicdiagflags = 3) zmhpanicdiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by item 1.
IF (zmhpanic01flag = 0) zmhpanicdiag = 0.
* Diagnosis is also negative if item 1 is positive but.
* the other 4 criteria are all non-missing and the score is less than 4.
IF (zmhpanic01flag = 1 & zmhpanicdiagscore < 4 & zmhpanicdiagflags = 4) zmhpanicdiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 2 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the four criteria is non-missing and negative).
IF (zmhpanic01flag = 1 & zmhpanicdiagscore <= 2 & (zmhpanicdiagscore < zmhpanicdiagflags)) zmhpanicdiag = 0.
EXECUTE.
* In all other (rare) cases, the diagnosis variable will be missing.
* including cases where many parts of item 1 are missing and fewer than 4 of those parts are positive.
* and cases where item 1 flags positive and (a) all 4 other criteria are missing.
* or (b) 1 is positive and the other 3 are missing, or (c) 2 are positive and the other 2 are missing.
* or (d) 3 are positive and item 4 is missing.
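
The decision rules above can be illustrated with a short Python sketch. It is an illustration only: the function names are hypothetical, None stands for a missing value, the item 1 parts are assumed to be coded 1=yes and 0=no, and the panic data are assumed to be present. The screening flag uses the observation that a negative result is safe only when the number of 'yes' responses plus the number of missing responses is at most 3.

```python
def screen_flag(parts):
    """Item 1 screening flag from the 13 parts (1=yes, 0=no, None=missing)."""
    yes = sum(1 for v in parts if v == 1)
    missing = sum(1 for v in parts if v is None)
    if yes >= 4:
        return 1
    if yes + missing <= 3:
        return 0  # even if every missing part were 'yes', the total is below 4
    return None

def panic_diag(screen, f3, f4, f6, f78):
    """Diagnosis from the screening flag and the 4 post-screening flags."""
    valid = [f for f in (f3, f4, f6, f78) if f is not None]
    n = len(valid)
    score = sum(valid) if valid else None
    diag = None
    if screen == 1 and score == 4:
        diag = 1
    # 3 criteria met and 1 missing, unless the missing one is item 4.
    if screen == 1 and f4 == 1 and score == 3 and n == 3:
        diag = 1
    if screen == 0:
        diag = 0
    if screen == 1 and score is not None and score < 4 and n == 4:
        diag = 0
    # At least one non-missing criterion is negative.
    if screen == 1 and score is not None and score <= 2 and score < n:
        diag = 0
    return diag

print(panic_diag(1, 1, 1, 1, None))  # 3 met, item 7/8 missing
print(panic_diag(1, 1, None, 1, 1))  # 3 met, item 4 missing
```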
zmhpaused1/2

See zmhduration1/2, zmhpaused1/2 above.

zmhpauses1/2

Estimated total number of pauses during the twin MHQ, estimated as the sum of the pauses flagged in individual blocks.

* Count the estimated number of pauses.
COMPUTE zmhpauses = SUM(zmhdemogpaus, zmhmedhispaus, zmhphqpaus, zmhmhdpaus, zmhmfqpaus, zmhcididpaus, zmhganxpaus, 
 zmhcidiapaus, zmhspephpaus, zmhsocphpaus, zmhpanicpaus, zmhagorapaus, zmhgadpaus, zmhphqdeppaus, 
 zmhwasaspaus, zmhctqpaus, zmhatspaus, zmhpclpaus, zmhlifevpaus, zmhslfhmpaus, zmhsdqpaus, zmhqolpaus, 
 zmhbddpaus, zmheatdpaus, zmhmctqpaus, zmhspeqpaus, zmhmdqpaus, zmhconnpaus, zmhraadspaus, zmhicupaus, 
 zmhcontrapaus, zmhalcopaus, zmhcannpaus, zmhsmopaus, zmhdietpaus, zmhexerpaus, zmhphypaus, 
 zmhsaspdpaus, zmhspeqhedpaus).
EXECUTE.
zmhpclt1/2

Total scale, from all 6 items of the PCL measure in the twin MHQ. Each item has values 0/1/2/3/4, hence the scale values have range 0 to 24.

* Total scale: sum of all 6 items.
COMPUTE zmhpclt = 6 * MEAN.3(zmhpcl1, zmhpcl2, zmhpcl3, zmhpcl4, zmhpcl5, zmhpcl6).
EXECUTE.
zmhphqdept1/2

Total scale, from the 2 items of the PHQ-2 measure (current depression) in the twin MHQ. Each item has values 0/1/2/3, hence the scale values have range 0 to 6.

* Total scale: sum both items (there are only 2, so scale must have integer values).
COMPUTE zmhphqdept = 2 * MEAN.1(zmhphqdep1, zmhphqdep2).
EXECUTE.
zmhphqt1/2

Total scale, from all 15 items of the PHQ-15 measure (physical health symptoms) in the twin MHQ. Each item has values 0/1/2, hence the scale values have range 0 to 30.

* Total scale: sum of all 15 items.
COMPUTE zmhphqt = 15 * MEAN.8(zmhphq01, zmhphq02, zmhphq03, zmhphq04, zmhphq05, zmhphq06, 
    zmhphq07, zmhphq08, zmhphq09, zmhphq10, zmhphq11, zmhphq12, zmhphq13, zmhphq14, zmhphq15).
EXECUTE.
zmhpmst1/2

Total scale, from all 6 items of the PMS measure (measuring frequency of symptoms) in the twin MHQ. Each item has values 0/1/2/3/4, hence the scale values have range 0 to 24.

* Total scale from all 6 items.
COMPUTE zmhpmst = 6 * MEAN.3(zmhpms1, zmhpms2, zmhpms3, zmhpms4, zmhpms5, zmhpms6).
EXECUTE.
zmhqolm1/2

Overall mean scale, from all 3 items of the Quality of Life measure (subjective wellbeing) in the twin MHQ. Items 1 and 2 have response codes 1-6; item 3 is recoded to the same range (from values 0-4) before deriving the mean, so the mean scale also has value range 1-6.

* Mean of all 3 items.
* Items 1 and 2 are on 6-point scales (1-6).
* while item 3 is on a 5-point scale (0-4).
* To ensure equal weighting, rescale item 3 to 1-6 values before deriving the mean.
COMPUTE zmhqolm = MEAN.2(zmhqol1, zmhqol2, (1 + (zmhqol3 * 5 / 4))).
EXECUTE.
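
The rescaling of item 3 is a linear map that sends 0 to 1 and 4 to 6. A minimal Python sketch follows; the function names are hypothetical and None stands for a missing item.

```python
def rescale_item3(value):
    """Map item 3 from its 0-4 coding onto the 1-6 range of items 1 and 2."""
    return 1 + value * 5 / 4

def qol_mean(item1, item2, item3):
    """Mean of the three items, requiring at least 2 to be non-missing."""
    rescaled = None if item3 is None else rescale_item3(item3)
    valid = [v for v in (item1, item2, rescaled) if v is not None]
    if len(valid) < 2:
        return None
    return sum(valid) / len(valid)

print([rescale_item3(v) for v in range(5)])  # 1.0, 2.25, 3.5, 4.75, 6.0
print(qol_mean(6, 6, 4))
print(qol_mean(2, 4, None))
```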
zmhraadst1/2, zmhraadssoct1/2, zmhraadsnont1/2

Total scale and subscales for the RAADS measure in the twin MHQ.
zmhraadssoct1/2: socio-communicative subscale (3 items)
zmhraadsnont1/2: non-social subscale (3 items)
zmhraadst1/2: overall total scale (all 6 items)
Every item is coded 0/1/2/3, hence the subscales each have value ranges 0-9 and the overall total scale has value range 0-18.

* Total scale from all 6 items.
* Plus subscales for socio-communicative and non-social (3 items each).
COMPUTE zmhraadst = 6 * MEAN.4(zmhraads1, zmhraads2, zmhraads3, zmhraads4, zmhraads5, zmhraads6).
COMPUTE zmhraadssoct = 3 * MEAN.2(zmhraads1, zmhraads3, zmhraads5).
COMPUTE zmhraadsnont = 3 * MEAN.2(zmhraads2, zmhraads4, zmhraads6).
EXECUTE.
zmhsaspdm1/2

Overall mean scale, from all 9 items of the SASPD measure in the twin MHQ. All items have value codes 1/2/3/4, hence the scale also has value range 1-4.

* Mean of all 9 items.
COMPUTE zmhsaspdm = MEAN.5(zmhsaspd1, zmhsaspd2, zmhsaspd3, 
    zmhsaspd4, zmhsaspd5, zmhsaspd6, zmhsaspd7, zmhsaspd8, zmhsaspd9).
EXECUTE.
zmhsdqbeht1/2, zmhsdqcont1/2, zmhsdqemot1/2, zmhsdqhypt1/2, zmhsdqpert1/2, zmhsdqprot1/2

Subscales for the SDQ measure in the twin MHQ.
zmhsdqemot1/2: emotion subscale (5 items)
zmhsdqpert1/2: peer problems subscale (5 items)
zmhsdqhypt1/2: hyperactivity subscale (5 items)
zmhsdqcont1/2: conduct subscale (5 items)
zmhsdqprot1/2: prosocial subscale (5 items)
zmhsdqbeht1/2: overall behaviour problems subscale (20 items: includes all the above except for the prosocial items).
Every item is coded 0/1/2, hence the first five subscales each have value ranges 0-10 and the overall behaviour problems subscale has value range 0-40. Some items have been reverse-coded for the purpose of deriving these scales.

* The usual 5 subscales (5 items each) plus the behaviour problems total (20 items).
* using reversed items where necessary.
* Emotional symptoms subscale (previously called Anxiety).
COMPUTE zmhsdqemot = 5 * MEAN.3(zmhsdqemo1, zmhsdqemo2, zmhsdqemo3, zmhsdqemo4, zmhsdqemo5).
* Peer problems subscale.
COMPUTE zmhsdqpert = 5 * MEAN.3(zmhsdqper1, zmhsdqper2r, zmhsdqper3r, zmhsdqper4, zmhsdqper5).
* Hyperactivity subscale.
COMPUTE zmhsdqhypt = 5 * MEAN.3(zmhsdqhyp1, zmhsdqhyp2, zmhsdqhyp3, zmhsdqhyp4r, zmhsdqhyp5r).
* Conduct subscale.
COMPUTE zmhsdqcont = 5 * MEAN.3(zmhsdqcon1, zmhsdqcon2r, zmhsdqcon3, zmhsdqcon4, zmhsdqcon5).
* Prosocial subscale.
COMPUTE zmhsdqprot = 5 * MEAN.3(zmhsdqpro1, zmhsdqpro2, zmhsdqpro3, zmhsdqpro4, zmhsdqpro5).
* Behaviour problems total (all items except prosocial).
COMPUTE zmhsdqbeht = 20 * MEAN.10(zmhsdqemo1, zmhsdqemo2, zmhsdqemo3, zmhsdqemo4, zmhsdqemo5,
    zmhsdqper1, zmhsdqper2r, zmhsdqper3r, zmhsdqper4, zmhsdqper5,
    zmhsdqhyp1, zmhsdqhyp2, zmhsdqhyp3, zmhsdqhyp4r, zmhsdqhyp5r,
    zmhsdqcon1, zmhsdqcon2r, zmhsdqcon3, zmhsdqcon4, zmhsdqcon5).
EXECUTE.
zmhses1/2

See zmhecvul1/2, zmhneet1/2, zmhses1/2 above.

zmhsocphdiag1/2

Diagnosis flag, coded 1=yes 0=no, for Social Phobias, derived from responses in the measure of the same name in the twin MHQ.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Social phobia diagnosis.
* The conditions for diagnosis are.
* (1) At least 1 of the 2 initial screening items (1a, 1b) is 'yes' (coded 1).
* and (2) Worried about what others think - item 2.
* and (3) Situations always or almost always cause fear - item 3.
* and (4) Situations avoided or endured with anxiety - items 4a, 4b.
* and (5) Fears out of proportion - item 5.
* and (6) Fears lasted over 6 months - item 6.
* and (7) Fears interfered with everyday life (some or a lot) - item 8.

* Note that, in the questionnaire, conditions (1) and (3) screen by branching.
* so if condition (1) is negative then the rest of the items are missing.
* while if condition (1) is positive but condition (3), item 3, is negative.
* then all the subsequent items are missing (but item 2 is unaffected).

* Convert each of the above conditions into a 1/0 flag (items 2, 5 are already coded this way).
* In item 1 parts a/b, for a negative (0) result require both parts to be non-missing.
* to be sure that there is not a missing positive.
* (if one is negative but the other is missing, this flag will be missing).
IF (SUM.2(zmhsocph01a, zmhsocph01b) = 0) zmhsocph01flag = 0.
* For a positive result, if one part is positive then missingness in the other does not matter.
IF (SUM(zmhsocph01a, zmhsocph01b) > 0) zmhsocph01flag = 1.
EXECUTE.
RECODE zmhsocph03 (3 THRU 4=1) (0 THRU 2=0) INTO zmhsocph03flag.
EXECUTE.
* In item 4 parts a/b, for a negative result we need both parts to be non-missing.
* (if one is negative but the other is missing, this flag will be missing).
IF (SUM.2(zmhsocph04a, zmhsocph04b) = 0) zmhsocph04flag = 0.
* For a positive result, if one part is positive then missingness in the other does not matter.
IF (SUM(zmhsocph04a, zmhsocph04b) > 0) zmhsocph04flag = 1.
EXECUTE.
RECODE zmhsocph06 (1=0) (2 THRU 5=1) INTO zmhsocph06flag.
EXECUTE.
RECODE zmhsocph08 (0 THRU 1=0) (2 THRU 3=1) INTO zmhsocph08flag.
EXECUTE.

* Now sum the flags for the 5 non-branching criteria (i.e. not items 1, 3) to get a score 0-5.
COMPUTE zmhsocphdiagscore = SUM(zmhsocph02, zmhsocph04flag, zmhsocph05, zmhsocph06flag, zmhsocph08flag).
EXECUTE.
* Also count the number of these same variables that are non-missing (if SOCPH data present).
DO IF (ANY(zmhsocphstat, 1, 2)).
 COUNT zmhsocphdiagflags = zmhsocph02 zmhsocph04flag zmhsocph05 zmhsocph06flag zmhsocph08flag (0, 1).
END IF.
EXECUTE.

* Now set the diagnosis flag (0 or 1).
* Note that, for twins screening positive on items 1 and 3.
* around 80% or more have positive diagnostic results in each of the other 5 criteria.
* so allow a probable positive diagnosis if 4 criteria are met and the other is missing.
* Diagnosis is positive if positively screened (items 1,3) and all 5 other criteria are met.
IF (zmhsocph01flag = 1 & zmhsocph03flag = 1 & zmhsocphdiagscore = 5) zmhsocphdiag = 1.
* Also allow positive diagnosis if 4 criteria met, and the other one is missing.
IF (zmhsocph01flag = 1 & zmhsocph03flag = 1 & zmhsocphdiagscore = 4 & zmhsocphdiagflags = 4) zmhsocphdiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by item 1 or item 3.
IF (zmhsocph01flag = 0) zmhsocphdiag = 0.
IF (zmhsocph01flag = 1 & zmhsocph03flag = 0) zmhsocphdiag = 0.
* Diagnosis is also negative if items 1 and 3 are positive but.
* the other 5 criteria are all non-missing and the score is less than 5.
IF (zmhsocph01flag = 1 & zmhsocph03flag = 1 & zmhsocphdiagscore < 5 & zmhsocphdiagflags = 5) zmhsocphdiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 3 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the five criteria is non-missing and negative).
IF (zmhsocph01flag = 1 & zmhsocph03flag = 1 & zmhsocphdiagscore <= 3 
    & (zmhsocphdiagscore < zmhsocphdiagflags) ) zmhsocphdiag = 0.
EXECUTE.
* In all other (rare if any) cases, the diagnosis variable will be missing.
* including cases where one part of item 1 is negative and the other part is missing.
* and cases where item 1 flags positive and item 3 is missing.
* and cases where items 1 and 3 flag positive and (a) the 5 other criteria are all missing.
* or (b) 1 is positive and the other 4 missing, or (c) 2 are positive and the other 3 missing.
* or (d) 3 are positive and the other 2 missing.
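
For readers who do not use SPSS, the decision rules above can be sketched in Python. This is an illustrative re-expression only, not the code used to build the dataset: the function names `two_part_flag` and `socph_diagnosis` are hypothetical, missing values are represented by `None`, and the two-part items are assumed to be coded 0/1.

```python
from typing import List, Optional


def two_part_flag(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Flag for a two-part item (e.g. items 1a/1b or 4a/4b), assuming 0/1 coding.

    Positive if either part is 1; negative only if both parts are present
    and 0; otherwise missing (None).
    """
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return None


def socph_diagnosis(screen1: Optional[int], screen3: Optional[int],
                    criteria: List[Optional[int]]) -> Optional[int]:
    """Diagnosis flag from the two screening flags (items 1, 3) and the
    five non-branching criterion flags (items 2, 4, 5, 6, 8)."""
    # Screened out by item 1, or by item 3 after a positive item 1.
    if screen1 == 0 or (screen1 == 1 and screen3 == 0):
        return 0
    present = [c for c in criteria if c is not None]
    # Missing screening flags, or no criteria answered: diagnosis is missing.
    if screen1 != 1 or screen3 != 1 or not present:
        return None
    score, n_flags = sum(present), len(present)
    # All 5 criteria met, or 4 met with the fifth missing.
    if score == 5 or (score == 4 and n_flags == 4):
        return 1
    # At least one criterion is non-missing and negative.
    if score < n_flags:
        return 0
    # Too few criteria present to decide either way.
    return None
```

For example, `socph_diagnosis(1, 1, [1, 1, None, 1, 1])` gives a probable positive diagnosis (1), whereas `socph_diagnosis(1, 1, [1, 1, 1, None, None])` is left missing.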
zmhspephdiag1/2

Diagnosis flag, coded 1=yes 0=no, for Specific Phobias, derived from responses in the measure of the same name in the twin MHQ.
Based on the derivation used for the same measure by the GLAD study, using DSM-5 criteria where these can be matched with items of the questionnaire. The derivation is explained in full in the syntax below.

* Specific phobia diagnosis.
* The conditions for diagnosis are.
* (1) At least 1 of the 5 initial screening items (1a to 1e) is 'yes' (coded 1).
* and (2) Situations avoided or endured with anxiety - items 2a, 2b.
* and (3) Situations always or almost always cause fear - item 3.
* and (4) Fears lasted over 6 months - item 5.
* and (5) Fears interfered with everyday life (some or a lot) - item 6.
* and (6) Fears out of proportion - item 7.

* Note that, in the questionnaire, conditions (1) and (3) screen by branching.
* so if condition (1) is negative then the rest of the items are missing.
* while if condition (1) is positive but condition (3), item 3, is negative.
* then all the subsequent items are missing (but item 2 is unaffected).

* Convert each of the above conditions into a 1/0 flag (item 7 is already coded like this).
* In item 1 parts a-e, for a negative (0) result require all 5 parts to be non-missing.
* to be sure that there is not one missing positive.
* (if up to 4 are negative but the others are missing, this flag will be missing).
IF (SUM.5(zmhspeph01a, zmhspeph01b, zmhspeph01c, zmhspeph01d, zmhspeph01e) = 0) zmhspeph01flag = 0.
* For a positive result, if at least 1 part is positive then missingness in the others does not matter.
IF (SUM(zmhspeph01a, zmhspeph01b, zmhspeph01c, zmhspeph01d, zmhspeph01e) > 0) zmhspeph01flag = 1.
EXECUTE.
* In item 2 parts a/b, for a negative result we need both parts to be non-missing.
* (if one is negative but the other is missing, this flag will be missing).
IF (SUM.2(zmhspeph02a, zmhspeph02b) = 0) zmhspeph02flag = 0.
* For a positive result, if one part is positive then missingness in the other does not matter.
IF (SUM(zmhspeph02a, zmhspeph02b) > 0) zmhspeph02flag = 1.
EXECUTE.
RECODE zmhspeph03 (3 THRU 4=1) (0 THRU 2=0) INTO zmhspeph03flag.
EXECUTE.
RECODE zmhspeph05 (2 THRU 5=1) (1=0) INTO zmhspeph05flag.
EXECUTE.
RECODE zmhspeph06 (2 THRU 3=1) (0 THRU 1=0) INTO zmhspeph06flag.
EXECUTE.

* Now sum the flags for the 4 non-branching criteria (i.e. not items 1, 3) to get a score 0-4.
COMPUTE zmhspephdiagscore = SUM(zmhspeph02flag, zmhspeph05flag, zmhspeph06flag, zmhspeph07).
EXECUTE.
* Also count the number of these same variables that are non-missing (if SPEPH data present).
DO IF (ANY(zmhspephstat, 1, 2)).
 COUNT zmhspephdiagflags = zmhspeph02flag zmhspeph05flag zmhspeph06flag zmhspeph07 (0, 1).
END IF.
EXECUTE.

* Now set the diagnosis flag (0 or 1).
* Note that, for twins screening positive on items 1 and 3.
* around 97% have a positive result in item 2, 75-80% in items 5 and 7, but only 48% in item 6.
* so allow a probable positive diagnosis if 3 criteria are met and the other is missing.
* as long as the missing item is not item 6.
* Diagnosis is positive if positively screened (items 1,3) and all 4 other criteria are met.
IF (zmhspeph01flag = 1 & zmhspeph03flag = 1 & zmhspephdiagscore = 4) zmhspephdiag = 1.
EXECUTE.
* Also give positive diagnosis if 3 criteria are met and the other 1 is missing.
* as long as the missing one is not item 6.
IF (zmhspeph01flag = 1 & zmhspeph03flag = 1 & zmhspeph06flag = 1 
    & zmhspephdiagscore = 3 & zmhspephdiagflags = 3) zmhspephdiag = 1.
EXECUTE.
* Diagnosis is negative if screened out by item 1 or item 3.
IF (zmhspeph01flag = 0) zmhspephdiag = 0.
IF (zmhspeph01flag = 1 & zmhspeph03flag = 0) zmhspephdiag = 0.
EXECUTE.
* Diagnosis is also negative if items 1 and 3 are positive but.
* the other 4 criteria are all non-missing and the score is less than 4.
IF (zmhspeph01flag = 1 & zmhspeph03flag = 1 & zmhspephdiagscore < 4 
    & zmhspephdiagflags = 4) zmhspephdiag = 0.
EXECUTE.
* Similarly, allowing for some missing criteria, diagnosis is negative if.
* the score is 2 or less and the number of non-missing criteria exceeds the score.
* (signifying that at least one of the four criteria is non-missing and negative).
IF (zmhspeph01flag = 1 & zmhspeph03flag = 1 & zmhspephdiagscore <= 2 
    & (zmhspephdiagscore < zmhspephdiagflags) ) zmhspephdiag = 0.
EXECUTE.
* In all other (rare if any) cases, the diagnosis variable will be missing.
* including cases where parts of item 1 are negative and the other parts are missing.
* and cases where item 1 flags positive and item 3 is missing.
* and cases where items 1 and 3 flag positive and (a) the 4 other criteria are all missing.
* or (b) 1 is positive and the other 3 missing, or (c) 2 are positive and the other 2 missing.
* or (d) 3 are positive but item 6 is missing.
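
The Specific Phobias rules differ from a simple "all criteria met" test mainly in how a single missing criterion is handled. The Python sketch below illustrates this under stated assumptions: the function name `speph_diagnosis` is hypothetical, each argument is a flag already reduced to 1, 0 or `None` (missing), and the code is a re-expression of the syntax above rather than the code used to build the dataset.

```python
from typing import Optional


def speph_diagnosis(screen1: Optional[int], screen3: Optional[int],
                    item2: Optional[int], item5: Optional[int],
                    item6: Optional[int], item7: Optional[int]) -> Optional[int]:
    """Diagnosis flag from the screening flags (items 1, 3) and the four
    non-branching criterion flags (items 2, 5, 6, 7)."""
    # Screened out by item 1, or by item 3 after a positive item 1.
    if screen1 == 0 or (screen1 == 1 and screen3 == 0):
        return 0
    present = [c for c in (item2, item5, item6, item7) if c is not None]
    if screen1 != 1 or screen3 != 1 or not present:
        return None
    score, n_flags = sum(present), len(present)
    # All 4 criteria met.
    if score == 4:
        return 1
    # 3 met and 1 missing: accepted only if item 6 is among those met,
    # because item 6 is the criterion least often positive.
    if score == 3 and n_flags == 3 and item6 == 1:
        return 1
    # At least one criterion is non-missing and negative.
    if score < n_flags:
        return 0
    return None
```

With items 2, 6 and 7 positive and item 5 missing the result is 1, but with items 2, 5 and 7 positive and item 6 missing the result stays missing.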
zmhspeqhalt1/2, zmhspeqhedm1/2, zmhspeqpart1/2

Subscales of the SPEQ measure in the twin MHQ.
zmhspeqpart1/2: paranoia subscale from 15 items
zmhspeqhalt1/2: hallucinations subscale from 9 items
zmhspeqhedm1/2: hedonia subscale from 10 items.
The paranoia and hallucinations items have value codes 0-5, hence these two scales have value ranges 0-75 and 0-45 respectively. The hedonia items have value codes 1-6 and this scale is derived as a mean, also having value range 1-6. Item 3 of hedonia has been reversed for the purpose of deriving this scale.

* 3 subscales: paranoia, hallucinations, hedonia (spread over 2 blocks in the qnr).
* Paranoia: total from 15 items, all coded 0-5.
COMPUTE zmhspeqpart = 15 * MEAN.8(zmhspeqpar01, zmhspeqpar02, zmhspeqpar03, zmhspeqpar04, 
    zmhspeqpar05, zmhspeqpar06, zmhspeqpar07, zmhspeqpar08, zmhspeqpar09, zmhspeqpar10, 
    zmhspeqpar11, zmhspeqpar12, zmhspeqpar13, zmhspeqpar14, zmhspeqpar15).
* Hallucinations: total from 9 items, all coded 0-5.
COMPUTE zmhspeqhalt = 9 * MEAN.5(zmhspeqhal1, zmhspeqhal2, zmhspeqhal3, zmhspeqhal4, 
    zmhspeqhal5, zmhspeqhal6, zmhspeqhal7, zmhspeqhal8, zmhspeqhal9).
* Hedonia: mean of 10 items, all coded 1-6 (one item reversed).
COMPUTE zmhspeqhedm = MEAN.5(zmhspeqhed01, zmhspeqhed02, zmhspeqhed03r, zmhspeqhed04, 
    zmhspeqhed05, zmhspeqhed06, zmhspeqhed07, zmhspeqhed08, zmhspeqhed09, zmhspeqhed10).
EXECUTE.
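
The syntax above computes the two totals as the number of items multiplied by the mean of the non-missing items, with a minimum number of non-missing items required (8 of 15 for paranoia, 5 of 9 for hallucinations, 5 of 10 for the hedonia mean). A minimal Python sketch of this prorating follows; the function names are hypothetical and `None` stands for a missing item response.

```python
from typing import Optional, Sequence


def prorated_mean(items: Sequence[Optional[float]],
                  min_valid: int) -> Optional[float]:
    """Mean of the non-missing items, or None if fewer than min_valid are present."""
    valid = [x for x in items if x is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


def prorated_total(items: Sequence[Optional[float]],
                   min_valid: int) -> Optional[float]:
    """Total rescaled to the full item count: number of items times the prorated mean."""
    mean = prorated_mean(items, min_valid)
    return None if mean is None else len(items) * mean


# Paranoia: 12 of 15 items answered, each with value 2.
paranoia = prorated_total([2] * 12 + [None] * 3, min_valid=8)
# Paranoia: only 7 of 15 items answered, so the scale is missing.
too_sparse = prorated_total([5] * 7 + [None] * 8, min_valid=8)
# Hedonia: mean of 6 answered items out of 10.
hedonia = prorated_mean([1, 2, 3, 4, 5, 6, None, None, None, None], min_valid=5)
```

Here `paranoia` is 30.0, the value expected if the three unanswered items had the same average as the answered ones, `too_sparse` is `None`, and `hedonia` is 3.5.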
zmhstatus1/2

Overall questionnaire status for the twin MHQ.
Coded 0=not started, 1=started but not finished, 2=successfully completed, 4=excluded as careless responder. At least some MHQ data are present if zmhstatus=1 or 2, but no data are present in the dataset if zmhstatus=0 or 4.
Derived from the status of individual blocks (sections) within the questionnaire. These block status flags are described elsewhere on this page.

* Derive an overall status variable, coded 0=not started, 1=started, 2=finished.
* Code as 0 (unstarted) if demographics block not started.
IF (zmhdemogstat = 0) zmhstatus = 0.
* Code as 2 (finished) if hedonia block finished.
IF (zmhspeqhedstat = 2) zmhstatus = 2.
* Code as 1 (started not finished) if demographics started but hedonia not finished.
IF (zmhdemogstat > 0 & zmhspeqhedstat < 2) zmhstatus = 1.
EXECUTE.

* Overall questionnaire exclusions.
* (1) For exclusion based solely on multiple QC errors, we must count these.
COMPUTE totalqcerrors = SUM(zmhphqqcer, zmhmfqqcer, zmhganxqcer, zmhlifevqcer, zmhsdqqcer, 
 zmhbddqcer, zmhspeqqcer, 
 zmhconnqcer, zmhicuqcer, zmhalco6qcer, zmhcann3qcer, zmhexerqcer, zmhspeqhedqcer).
EXECUTE.

* (2) For overall exclusion based on multiple rapid-response measure exclusions, also count these.
COMPUTE measuresexcluded = SUM(zmhphqexclude, zmhmfqexclude, zmhganxexclude, zmhlifevexclude, 
 zmhsdqexclude, zmhbddexclude, zmhspeqexclude, zmhconnexclude, zmhicuexclude, zmhalcoexclude, 
 zmhcannexclude, zmhexerexclude, zmhspeqhedexclude).
EXECUTE.

* Mark the questionnaire overall as excluded by changing status flag value to 4.
* Exclude if either or both of these rules apply to a twin.
* (1) 4 or more QC errors were made, regardless of time or anything else.
* (2) 2 or more measures were excluded by the rapid-response rule.
IF (totalqcerrors >=4 | measuresexcluded >=2) zmhstatus = 4.
EXECUTE.

* Where zmhstatus=4, all questionnaire data are deleted.
* and the data flag (zmhdata) is recoded from 1 to 0.
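
The status and exclusion rules above can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, `None` stands for a missing value, and the per-measure inputs are assumed to be QC error counts and 0/1 exclusion flags, which the syntax does not state explicitly.

```python
from typing import Optional, Sequence


def sum_ignoring_missing(values: Sequence[Optional[int]]) -> Optional[int]:
    """Sum of the non-missing values; None if every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) if valid else None


def mhq_status(demog_stat: Optional[int], hed_stat: Optional[int],
               qc_errors: Sequence[Optional[int]],
               measure_exclusions: Sequence[Optional[int]]) -> Optional[int]:
    """Overall status: 0=not started, 1=started, 2=finished, 4=excluded."""
    status = None
    if demog_stat == 0:
        status = 0  # first block (demographics) never started
    if hed_stat == 2:
        status = 2  # last block (hedonia) finished
    if (demog_stat is not None and hed_stat is not None
            and demog_stat > 0 and hed_stat < 2):
        status = 1  # started but not finished
    total_qc = sum_ignoring_missing(qc_errors)
    n_excluded = sum_ignoring_missing(measure_exclusions)
    # Exclude on 4 or more QC errors, or 2 or more rapid-response exclusions.
    if (total_qc is not None and total_qc >= 4) or \
            (n_excluded is not None and n_excluded >= 2):
        status = 4
    return status
```

For example, a twin who finished the questionnaire with QC error counts of 2, 1 and 1 across three measures is coded 4, because the total of 4 errors meets the first exclusion rule.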
zmhwasast1/2

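Total scale, from all 6 items of the WASAS measure in the twin MHQ. Each item has values 0-8, hence the scale values have range 0 to 48. The scale is computed as 6 times the mean of the non-missing items, requiring at least 3 of the 6 items to be non-missing.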

* Total scale: sum of all 6 items.
COMPUTE zmhwasast = 6 * MEAN.3(zmhwasas1, zmhwasas2, zmhwasas3, zmhwasas4, zmhwasas5, zmhwasas6).
EXECUTE.