Health related quality of life for young people receiving dialectical behaviour therapy (DBT): a routine outcome-monitoring pilot

Purpose: Adults presenting with borderline personality disorder (BPD) score poorly on measures of health-related quality of life (HRQoL). Little is known about HRQoL in adolescents with BPD-type presentations or about how treatment affects their quality of life. Our primary aim was to address this gap using quality-of-life outcome measures routinely collected before and after treatment in dialectical behaviour therapy (DBT) for adolescents. Secondary aims were to benchmark these data against EuroQol 5 dimensions (EQ-5D™) outcomes for clients treated in clinical trials and to assess the potential of the EQ-5D™ as a benchmarking tool.
Method: Four adolescent DBT teams, routinely collecting outcome data using a pseudonymised, secure, web-based system, supplied data from consecutive discharges.
Results: Young people in the DBT programmes (n = 43) had severely impaired HRQoL scores at programme admission, lower than those reported in published studies using the EQ-5D™ in adults with a BPD diagnosis and in one study of adolescents treated for depression. Forty per cent of the adolescents treated achieved Reliable Clinical Change. HRQoL improved between admission and discharge with a large effect size, although this improvement was not statistically significant once clustering of outcomes within programmes was accounted for.
Conclusion: Young people treated in NHS DBT programmes for BPD-type presentations had poorer HRQoL than adults with a BPD diagnosis and adolescents with depression treated in published clinical trials. The EQ-5D™ detected reliable change in this group of adolescents. Clustering of outcomes by programme suggests that both the measure and the web-based monitoring system provide a mechanism for benchmarking clinical programmes.


Background
Whilst borderline personality disorder (BPD) is most commonly diagnosed in adults, clinicians and researchers have more recently begun to consider the assessment and identification of personality disorders in adolescents (Miller et al. 2008; Sharp and Fonagy 2015). Studies investigating BPD traits in young people report that these presentations predict the presence of personality disorders in adulthood and are also linked to other psychiatric disorders, impaired long-term functioning and increased mortality (Cohen et al. 2005; Crawford et al. 2008; Winograd et al. 2008). Currently, there are no established effective treatments for young people with BPD-type presentations. Instead, adolescent research has typically focussed on interventions for repeated self-harm, one of the diagnostic criteria for BPD. A meta-analysis of psychological and social interventions for suicide attempts and self-harm (Ougrin et al. 2015), reviewing 19 trials comprising 2176 young people, concluded that the selected interventions appeared to be effective overall for self-harm. Dialectical behaviour therapy (DBT) (Miller and Rathus 2002; Mehlum et al. 2014), cognitive-behavioural therapy (CBT) (Esposito-Smythers et al. 2011) and mentalization-based therapy (MBT) (Rossouw and Fonagy 2012) demonstrated the largest effect sizes; however, these results require independent replication.
DBT is an effective treatment for BPD in adults, with robust evidence from randomised controlled trials demonstrating significant impacts on a number of important outcomes, including reductions in suicidal and self-harming behaviours and in service utilisation (Clarkin et al. 2007; Koons et al. 2001; Linehan et al. 1991, 1999, 2002, 2006; McMain et al. 2009; Verheul et al. 2003). The recent Cochrane Review concluded that DBT was the only psychological treatment for BPD with sufficient data to pool into a meta-analysis (Stoffers et al. 2012). Results demonstrated moderate to large, statistically significant effects for DBT over treatment as usual in reducing suicidal and self-harm behaviours (SMD −0.54, 95 % CI −0.92 to −0.16), improving mental health (SMD 0.65, 95 % CI 0.07 to 1.24) and decreasing anger (SMD −0.83, 95 % CI −1.43 to −0.22). The National Institute for Health and Care Excellence (NICE) guidelines for the treatment and management of BPD recommend DBT, particularly where reducing self-harm is a clinical priority. An earlier review reported that DBT had the potential to be cost-effective (Brazier et al. 2006).
As a direct consequence of its success in treating suicidal and self-harm behaviours in adults diagnosed with BPD, DBT was adapted for adolescents (DBT-A) presenting with suicidal and self-harm behaviours who also showed features of a developing BPD (Miller et al. 2007). A recent RCT of DBT-A conducted in Norway replicated the findings of earlier studies in adult populations. Mehlum et al. (2014) randomly allocated 77 adolescents presenting with suicidal and self-harming behaviour and at least two other BPD characteristics to either DBT-A or Enhanced Treatment as Usual (ETAU). After 16 weeks of treatment, DBT-A was significantly superior to ETAU in reducing self-harm behaviour and depression. Several non-randomised studies conducted in the UK and elsewhere also indicate that DBT may be a promising intervention for adolescents presenting with self-harm behaviour in the context of BPD (Fleischhaker et al. 2011; James et al. 2008, 2011; Katz et al. 2004).
Adults diagnosed with BPD have severe impairments in health-related quality of life (HRQoL) (Soeteman et al. 2008; Perseius et al. 2006). In addition to the significant personal burden of a BPD diagnosis (Heard 2000), clients with BPD are typically "treatment seeking", with associated high utilisation of health services (Bender et al. 2001; Benjamin 1993; Coid 2003; Lieb et al. 2004). This has led researchers and policy makers alike to highlight the importance of investigating and implementing clinically effective and cost-effective treatments for this population (Esposito-Smythers et al. 2011; Soeteman et al. 2008). Establishing cost-effectiveness (specifically, through cost-utility analysis) requires generic, preference-based measures that can be used to calculate the cost per additional quality-adjusted life year (QALY) gained. QALYs are a generic measure of disease burden that combines both the quality and the length of life, allowing comparison across conditions to help decision makers judge where best to invest scarce health care resources. NICE (NCCMH 2009) recommends the EuroQol 5 Dimensions (EQ-5D™) as the most appropriate measure for health economic evaluations of new technologies. This standardised and validated self-report measure describes an individual's current health status and can be used to identify changes occurring over time. The construct validity and test-retest reliability of the EQ-5D™ have also been supported (van Asselt et al. 1994).
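The QALY and cost-per-QALY arithmetic referred to above can be sketched in a few lines. This is an illustration only, not an analysis performed in this study: it assumes, as a common convention rather than anything specified here, that utility changes linearly between assessments, so that QALYs are the area under the utility-over-time curve, and it expresses cost-effectiveness as the incremental cost per additional QALY. The function names `qalys` and `icer` and all input values are hypothetical.

```python
def qalys(times_years, utilities):
    """Area under a utility-over-time curve by the trapezoidal rule.

    ASSUMPTION: utility changes linearly between assessment points.
    times_years: assessment times in years; utilities: EQ-5D utility at each time.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two paired (time, utility) points")
    total = 0.0
    points = list(zip(times_years, utilities))
    for (t0, u0), (t1, u1) in zip(points, points[1:]):
        total += (t1 - t0) * (u0 + u1) / 2.0  # area of one trapezoid
    return total


def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ZeroDivisionError("no QALY difference between the two options")
    return (cost_new - cost_old) / delta_qaly
```

For example, a hypothetical client whose utility rises linearly from 0.2 to 0.6 over six months accrues `qalys([0.0, 0.5], [0.2, 0.6])`, about 0.2 QALYs, over that period.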
Because the measure is generic rather than condition-specific, the EQ-5D™ provides a common denominator across evaluations, allowing new technologies to be compared with each other. Such generic measures assess broad levels of functioning, in contrast to symptomatic measures that may address a single clinical outcome, e.g. self-harm or depression. Whilst individual symptom measures are an important index of clinical outcome, clinical guidance recommends considering broader measures of functioning and quality of life rather than symptomatic improvement alone (NCCMH 2009). Generic measures may be of particular importance in BPD, where clients' problems impinge on a wide range of health domains.
Using the EQ-5D™ has an additional advantage beyond assessing functional outcomes more broadly: measures that enable QALYs to be calculated across a number of services will provide data on the cost-effectiveness of DBT as delivered in routine clinical practice and, given sufficient variation across programmes, allow a national benchmarking system to be developed. Transferring evidence-based treatments established as efficacious in randomised controlled trials to routine clinical practice is fraught with difficulty (Damschroder et al. 2009). Routinely measuring outcomes in clinical settings and benchmarking them against published efficacy data provides one method of assessing whether clients are benefiting from the treatments they receive. Once programmes that do not deliver good clinical outcomes can be identified, organisational interventions to address poor outcomes can be developed and implemented.
The primary aim of the pilot study was to evaluate the HRQoL outcomes of adolescents receiving DBT in routine clinical practice. Our secondary aim, given the absence of studies reporting on HRQoL in adolescents, was to compare the findings with published RCT data on adults with BPD and adolescents with other mental health conditions. Finally, we wished to assess the potential of the EQ-5D ™ as a benchmarking measure.

Procedure
All DBT programmes (N = 9) with a subscription to the DBT pseudonymised outcome benchmarking website at www.dbt.uk.net were invited to participate in the study. The pseudonymised outcome website was developed to help teams collect routine outcome data, given the potential value of such data for implementing DBT programmes (Swales et al. 2012). To reduce the administrative burden on participating programmes and maximise the likelihood of successful data collection, the amount of data required was kept to an absolute minimum, using a single outcome measure, the EQ-5D™. Consistent with keeping the demands on busy programmes low, only pre- and post-treatment data were required for entry on the website. Programmes were asked to assess clients on the EQ-5D™ at admission to the programme and on discharge, regardless of whether discharge was planned or unplanned. Assessing all entrants to the programme, whether or not they complete treatment, provides a more conservative test of a treatment programme's effectiveness; including only data from treatment completers may overestimate the effectiveness of treatment in routine practice. Pseudonyms were selected according to the gender of the client, so the number of male and female clients in the sample as a whole is known; otherwise, no demographic data are available at the individual client level.
Seven of the subscribed nine teams were working with adolescents. A census of DBT clients in treatment on a specified date (1 August 2011) established that only four of these programmes were using the website to routinely collect data with sufficient accuracy for benchmarking purposes, namely that the website was reporting the correct number of clients in treatment that day. Length of time programmes had used the website for routine data collection varied between 7 (January 2011) and 16 months (April 2009). These four programmes consented for their data to be downloaded from the system on the predefined census date and for their programme data to be included in the multi-site data analysis. All participating programmes had broadly similar admission criteria (ages 14-18; five or more BPD criteria of which one must be the recent occurrence of self-harm behaviour).
All programmes had permission from their information governance officers to upload pseudonymised routine outcomes to the website. Ethical approval was sought and received from the Ethics Committee of the School of Psychology, Bangor University. The local NHS Research Ethics Committee was asked whether the study required NHS approval and judged that the study qualified as a service evaluation. No patient-identifiable information was submitted to or held by the outcome benchmarking website.

EQ-5D™
The EQ-5D™ is a generic outcome measure that asks participants to rate their current general health status on five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The participant rates each dimension at one of three levels (no problems, some problems or extreme problems). The scoring system then classifies the individual into one of 243 (that is, 3^5) possible health states. Each health state summarises the participant's rating of their health status on each dimension. For example, the health state '11111' represents perfect health on all five dimensions, a state that 56 % of the UK population and 70 % of the Spanish population report on any one day (Feng et al. 2015; Gaminde and Roset 2001). In contrast, the health state '33333' indicates the worst possible health status, with extreme problems on all five dimensions. Health states can be transformed into a utility score ranging between −0.59 and 1 (with death anchored at 0 and 1 representing perfect health) by applying societal weights to each level; these weights are based on valuations of the health states by adult general population samples using a choice-based method such as time trade-off (TTO) (Szende et al. 2007). There are currently no weights derived from valuations of health states by children and adolescents. Negative scores, indicating health states judged to be 'worse than death', are possible. The utility scores can be used to calculate QALYs. The sole adolescent study using the EQ-5D™ provided tentative support for its use in this client group (Byford 2013).
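The conversion from a five-digit health state to a utility score can be illustrated with a small lookup. The sketch below assumes the UK time trade-off value set for the three-level EQ-5D published by Dolan (1997); the study does not name the value set it used, but this one reproduces the −0.594 floor and the −0.008 utility for state 11233 that appear elsewhere in the text. The constant names and the function `utility` are hypothetical.

```python
from itertools import product

# ASSUMPTION: UK time trade-off (MVH) value set for the three-level EQ-5D
# (Dolan 1997). Chosen because it reproduces the -0.594 floor and the
# -0.008 utility for state 11233 mentioned in the text.
CONSTANT = 0.081  # deducted from any state other than 11111
N3 = 0.269        # deducted once if any dimension is at level 3
DECREMENTS = [    # per-dimension deductions for levels 2 and 3
    {2: 0.069, 3: 0.314},  # mobility
    {2: 0.104, 3: 0.214},  # self-care
    {2: 0.036, 3: 0.094},  # usual activities
    {2: 0.123, 3: 0.386},  # pain/discomfort
    {2: 0.071, 3: 0.236},  # anxiety/depression
]


def utility(state: str) -> float:
    """Convert a five-digit EQ-5D-3L health state such as '11233' to a utility."""
    if len(state) != 5 or any(c not in "123" for c in state):
        raise ValueError(f"invalid EQ-5D-3L state: {state!r}")
    levels = [int(c) for c in state]
    if all(lv == 1 for lv in levels):
        return 1.0  # full health
    score = 1.0 - CONSTANT
    for dim, lv in enumerate(levels):
        score -= DECREMENTS[dim].get(lv, 0.0)  # level 1 deducts nothing
    if 3 in levels:
        score -= N3
    return round(score, 3)


# Three levels on each of five dimensions give 3**5 = 243 health states.
ALL_STATES = ["".join(p) for p in product("123", repeat=5)]
```

Under this value set, `utility("33333")` gives −0.594 and `utility("11233")` gives −0.008, both "worse than death".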

Data collection and entry
Each DBT team had its own protocol for administering the questionnaires and entering the data onto the website. In most cases young people completed the questionnaires themselves; on some occasions clinicians completed them on their behalf. Proxy completion of the EQ-5D™ is a recognised method of data collection: a recent community study of the youth version of the measure (EQ-5D-Y) in children and adolescents, with parents acting as proxies, found high levels of agreement between the self-report and proxy versions (Gusi et al. 2014). Each DBT team uploaded EQ-5D™ scores at admission and discharge for all patients treated in its DBT programme. An audit of a random selection of inputs from each team was conducted to check that data had been entered into the website accurately. To conduct the audit, one of the research team (LB) telephoned each DBT team and asked them to provide, for a randomly chosen set of patients and time points, the EQ-5D™ scores stored on paper in the clinical notes. These scores were checked against the data already entered into the website, and no inconsistencies between the data on the system and the original paper copies were detected.

Data analysis strategy
Utility scores at or below zero might be anticipated in clinical samples for whom achieving a 'life worth living' (Linehan 1993) remains a daily struggle. In a sample of adolescents in treatment for suicidal and self-harm behaviours, a measure that captures health states considered 'worse than death' displays considerable face and criterion validity, so these scores were retained. Difference scores were calculated by subtracting each client's EQ-5D™ utility score at admission from their score at discharge, so that positive values indicate improvement.
To establish whether the change in EQ-5D™ scores represented a robust change at the individual level, the Reliable Change Index (RCI) for the EQ-5D™ was calculated using the method of Jacobson and Truax (1991). This formula takes into account the reliability of the measure and the variance in scores to generate a threshold change score; a change exceeding this threshold is unlikely (at the 95 % level) to be attributable to measurement error alone and can therefore be regarded as a real, or reliable, change over time. Ideally, to exclude sources of systematic error from this calculation, one would use an estimate of test-retest reliability derived from a clinical sample whose clinical characteristics had not changed over the retest period. Because clinical samples are, by definition, in treatment and therefore on a change trajectory, and because the EQ-5D™ is not yet widely used in mental health settings, such estimates are hard to obtain. Hurst et al. (1997, Table IX), using an intraclass correlation coefficient (ICC) rather than a Pearson correlation because of concerns about over-estimating reliability, reported a reliability coefficient for the EQ-5D™ of 0.78 (0.6-0.96) over 2 weeks in a clinical sample of 31 people with rheumatoid arthritis whose condition had not changed. More recently, Sonntag et al. (2013, Table 4) tabulated ICCs for 106 people with social phobia at an interval of 6 months and for 60 at 12 months, anchored by no change on the Liebowitz Social Anxiety Scale; these ICCs were also all 0.78. Given the replication of this estimate and the clinical similarities, albeit limited, between the study populations, this value was used to calculate the RCI for the DBT programmes in this study.
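The Jacobson and Truax calculation can be sketched as follows: the standard error of measurement is the baseline SD multiplied by the square root of (1 − reliability), the standard error of a difference score is that value multiplied by the square root of 2, and the reliable change threshold is 1.96 times the latter. Which SD the authors entered is our inference rather than a stated fact: the admission SD of the 43 discharged clients (0.345) with r = 0.78 yields about 0.449, matching the threshold of 0.45 reported in the results, whereas the whole-sample SD (0.32) would give about 0.42. The function names are hypothetical.

```python
import math


def reliable_change_threshold(sd_baseline: float, reliability: float,
                              z: float = 1.96) -> float:
    """Smallest change score exceeding measurement error at the 5% level,
    following Jacobson and Truax (1991)."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se_measurement ** 2)  # standard error of a difference score
    return z * s_diff


def reliable_change_index(before: float, after: float,
                          sd_baseline: float, reliability: float) -> float:
    """RCI for one client; an absolute value above 1.96 indicates reliable change."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (after - before) / s_diff
```

With these inputs, `reliable_change_threshold(0.345, 0.78)` returns about 0.449.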

Description of the sample
On the census date, 76 sets of client data were downloaded from four programmes, of which 66 had female pseudonyms and 10 male. Of these 76 clients, 43 (38 female and 5 male) had been discharged from their respective programme and 33 were still in treatment. The mean admission EQ-5D™ utility score for the whole sample (n = 76) was 0.236 (SD 0.32, range −0.594 to 0.848). The admission EQ-5D™ scores of the 43 adolescents who had been discharged before the census date did not differ significantly from those of the 33 young people still in treatment (mean admission utility score: still in treatment, n = 33, 0.244; discharged, n = 43, 0.230; t(75) = 0.19, p = 0.85). The average length of stay of adolescents consecutively discharged from programmes (n = 43) was 177 days (SD 116, range 23-462).

Comparison of admission and discharge HRQoL scores
Admission and discharge scores for the 43 consecutive discharges in the dataset were compared. Mean admission utility scores were 0.230 (SD 0.345, range −0.590 to 0.883) and mean discharge utility scores were 0.554 (SD 0.376, range −0.008 to 1.000). Fourteen clients reported health states as worse than death at admission and nine at discharge (Table 1). These data were not normally distributed and were tested for significance using the Wilcoxon signed-rank test. Utility scores between admission and discharge were significantly different (z = −4.26, p < 0.001).
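To show what the Wilcoxon signed-rank test does with paired admission and discharge scores, here is a minimal pure-Python sketch of its large-sample normal approximation. It is not the authors' code and cannot reproduce the reported z = −4.26, because individual scores are not available. It assumes that zero differences are discarded (Wilcoxon's original convention), that tied absolute differences receive average ranks with the usual variance correction, and that no continuity correction is applied; statistical packages such as SciPy's `scipy.stats.wilcoxon` offer these and other options. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank_z(before, after):
    """Normal approximation to the Wilcoxon signed-rank test.

    Returns (W+, W-, z), with z based on the smaller rank sum, so z <= 0.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = n * (n + 1) / 2 - w_plus
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    return w_plus, w_minus, z
```

Because only ranks of the differences enter the statistic, the test does not depend on the non-normal, partly negative distribution of utility scores noted above.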
Variability between DBT programmes was apparent in the dataset, both in baseline health status on admission (p < 0.001) and in their ability to generate changes in EQ-5D™ scores between admission and discharge (p < 0.0001). This indicated clustering in the data that would exaggerate significance levels if not accounted for. An intra-programme correlation coefficient was therefore calculated from the difference scores (treating DBT programme as a random effect) and used to widen the estimated confidence interval for the mean difference to account for between-programme variability in outcomes.
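One standard way to obtain an intra-programme correlation with programme as a random effect is the one-way random-effects ANOVA estimator, sketched below. This is an assumption: the text does not state which estimator was used. For unequal programme sizes the sketch uses the usual adjusted average group size n0. The function name and the example groups are hypothetical.

```python
def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA with unequal group sizes.

    groups: list of lists, one list of difference scores per programme.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or any(n == 0 for n in sizes) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average group size for an unbalanced design.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

Values near 1 mean that most of the variation in change scores lies between programmes rather than between clients within a programme.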
The high level of clustering, indicated by an ICC of 0.71, increased the variance of the average difference score by a factor of approximately 12, according to the exact method of calculation for unequal cluster sizes given by Donner, Birkett and Buck (Donner et al. 1981). Once this adjustment was made, the average difference of 0.32 in EQ-5D™ utility scores between admission and discharge no longer attained statistical significance (t = 1.56, p = 0.13).
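The variance-inflation step can be sketched as follows. We assume the Donner et al. calculation takes the form 1 + (Σm_i²/Σm_i − 1)ρ, where m_i are the cluster sizes and ρ is the ICC; this form is exact for the unweighted overall mean under a common intracluster correlation. The programme sizes are not reported, so any sizes passed in are hypothetical and are not meant to reproduce the factor of 12. Because the standard error scales with the square root of the design effect, the adjusted t is the unadjusted t divided by that square root; a reported t of 1.56 after a factor of about 12 therefore implies an unadjusted t of roughly 1.56 × √12 ≈ 5.4. Function names are hypothetical.

```python
import math


def design_effect(cluster_sizes, icc):
    """Variance inflation for the overall mean of clustered observations
    under an exchangeable (common ICC) correlation model."""
    m_total = sum(cluster_sizes)
    m_bar_w = sum(m * m for m in cluster_sizes) / m_total  # size-weighted mean cluster size
    return 1.0 + (m_bar_w - 1.0) * icc


def adjusted_t(mean_diff, sd_diff, n, deff):
    """Paired-difference t statistic with its standard error inflated by sqrt(deff)."""
    se = sd_diff / math.sqrt(n) * math.sqrt(deff)
    return mean_diff / se
```

For instance, four hypothetical programmes of 10 clients each with an ICC of 0.71 would give a factor of 1 + 9 × 0.71 ≈ 7.4; more unequal programme sizes push the factor higher.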
The RCI calculation indicated that EQ-5D™ difference scores of at least 0.45 could be considered reliable in the present study. Seventeen of the 43 clients in the sample (40 %) experienced reliable change between admission and discharge. Clients' ability to achieve change on the EQ-5D™ was constrained by an obvious ceiling effect: over 25 % of the DBT sample had a utility score above 0.55 on admission and so could not achieve a change greater than 0.45. Ishak et al. (2013) identified only 10 studies of psychotherapeutic interventions for BPD incorporating HRQoL measures, of which only three report the EQ-5D™ as an outcome measure (McMain et al. (2009) analyse the EuroQol VAS thermometer but do not report EQ-5D™ utility scores); of these three, only two represent unique datasets (van Asselt et al. 2008; Nadort et al. 2009). Both are studies in adult BPD. There is currently limited but promising evidence to support the use of the EQ-5D™ instrument as an HRQoL measure in CAMHS settings (Byford 2013).

Comparison of HRQoL scores with other published data sets
Baseline and discharge EQ-5D™ scores for this clinical sample are presented in Table 1, alongside pre- and post-intervention EQ-5D™ scores from the few available RCTs in mental health for comparison. Because the present study is small and uncontrolled, and is primarily of clinical interest for benchmarking outcomes across DBT programmes, effect sizes (Cohen's d) are also reported. EQ-5D™ scores at treatment commencement are much lower in this study than in the studies of BPD in adult populations (Nadort et al. (2009), t(103) = 2.15, p < 0.05; van Asselt et al. (2008), t(89) = 2.70, p < 0.005), both of which evaluated Schema-Focussed therapy. The adolescents in this pilot study end treatment with an average EQ-5D™ score commensurate with the starting EQ-5D™ scores of adolescents in the RCT for depression (Byford (2013), t(241) = 0.88, p > 0.02, ns). The effect size in this pilot study is larger than in the two adult studies (Nadort et al. 2009; van Asselt et al. 2008) but similar to that in the adolescent depression study (Byford 2013).
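The comparisons above rest on two pieces of arithmetic that can be computed from summary statistics alone: a standardised mean change and a two-sample t statistic. The sketch assumes that the effect size divides the mean change by the baseline SD (the Table 1 footnote describes effect sizes relative to baseline variance) and that the between-study comparisons used a pooled-variance t-test with n1 + n2 − 2 degrees of freedom; neither assumption is confirmed by the text. Function names are hypothetical.

```python
import math


def effect_size_baseline_sd(mean_pre, mean_post, sd_pre):
    """Standardised mean change relative to the baseline SD."""
    return (mean_post - mean_pre) / sd_pre


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic from summary data.

    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df
```

Applied to the admission and discharge means and admission SD given in the results (0.230, 0.554 and 0.345), `effect_size_baseline_sd` gives roughly 0.94, conventionally a large effect; whether Table 1 used exactly this formula is an assumption.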

Discussion
This study is the first to report on the HRQoL of adolescents with BPD-type presentations in routine clinical practice. Participants in the DBT programmes in this study had significantly impaired HRQoL on admission, scoring significantly lower than data available from a published RCT using the EQ-5D™ with adult BPD participants and from a study reporting on the treatment of adolescents with depression (Byford 2013). These data potentially support the view, often expressed by clinicians, that clients in clinical services are more severe than those participating in clinical trials. Indeed, this view has been highlighted as a barrier to the implementation of evidence-based practice (Shafran et al. 2009).
Alternatively, the experience of BPD in adolescence, compared with adulthood, may have a particularly significant effect on HRQoL. Adolescents presenting with developing BPD have typically experienced high levels of adversity, in a context of genetic vulnerability, over many years (Sharp and Fonagy 2015). In addition to significant mental health difficulties with high rates of comorbidity (Chanen et al. 2007; Ha et al. 2014), they encounter major problems at school and in relationships with family and friends (Winograd et al. 2008; Zanarini et al. 2006; Crawford et al. 2008) that often persist into adulthood, even when some of the more impulsive behaviours may have subsided. In clinical practice, the broad-ranging impact on almost all aspects of development is striking. Poor HRQoL in this context is perhaps unsurprising.
Table 1 Comparison of EQ-5D™ scores across studies. a Includes 9 clients (21 %) admitted in state 11233 whose utility remained unchanged at −0.008. b Relative to baseline variance, which is the effect size commonly used in power calculations, Nadort et al. (2009).
Despite the severity of impairment in the patients treated in the DBT programmes, their HRQoL improved between admission and discharge, with 40 % of the sample achieving Reliable Clinical Change. This finding is particularly interesting given that patients were discharged for multiple reasons, including both planned discharges (e.g. they completed the DBT programme and no longer needed DBT) and unplanned discharges (e.g. the patient chose to stop attending treatment). This study reported on changes between admission and discharge for all patients who had received DBT, regardless of whether they completed the treatment programme. This finding also runs counter to the view expressed by clinicians (Shafran et al. 2009) that increased clinical severity might prevent significant and reliable clinical change.
Despite the change in HRQoL scores for patients in the DBT programmes, discharge scores remained considerably lower than the general population norm and comparable to the scores of adolescents commencing treatment in an RCT for depression (Byford 2013). These findings underscore the view, reported in the literature, that individuals with BPD have significantly impaired HRQoL (Soeteman et al. 2008; Perseius et al. 2006), even following an effective treatment. The results suggest that further research should be directed towards enhancing clinical outcomes. Adaptations of DBT for adolescents have typically shortened the programme duration from 1 year (typical in adult services) to 16 weeks (Mehlum et al. 2014; Brazier et al. 2006). The final HRQoL outcomes in this study argue against such shortening, given the low level of functioning of adolescents at the end of treatment despite an average treatment length longer than that described in the literature (Mehlum et al. 2014; Miller et al. 2007). In the shorter forms of DBT-A, adolescents entering the programme typically have fewer BPD symptoms (usually 2-3), whereas the clinical programmes studied here required 5 or more BPD criteria for inclusion, which may account for the longer treatment duration. These results suggest that longer treatment durations may be necessary for adolescents at this level of severity.
Not all of the programmes were equally successful in producing good HRQoL outcomes. When this variability was accounted for, the improvement between admission and discharge was no longer statistically significant. The level of clustering in the dataset (ICC of 0.71) is unusually high, beyond the range commonly encountered in naturally occurring biological and disease-related phenomena. Such variability is perhaps not uncommon in pilot data, especially for a team-based treatment delivered within highly variable organisational contexts. Whether the high level of clustering was driven by differences in programme fidelity or therapist competence, or is simply an artefact of small numbers of participants with a condition whose outcomes are highly variable, is unknown. Future studies with more teams and larger numbers of treated clients will be needed to tease out this finding. Paradoxically, the differential success of the DBT programmes in producing HRQoL outcomes, with some programmes performing better than others, augurs well for the intended use of the website for future benchmarking between programmes.
Whilst the findings of this pilot study are of interest, aspects of the data collection indicate that they should be interpreted with considerable caution. Firstly, only four of the seven CAMHS teams with a subscription to the website were successfully using the system to collect routine data. The teams that were collecting data may have been especially motivated and thus potentially more likely to produce improved outcomes. Resolving problems in routinely collecting outcomes using the system would be essential for future meaningful use of the system to collect national outcome data or to benchmark programmes systematically. Secondly, these data were collected under routine clinical practice conditions, so the research team did not control data collection and entry. Each DBT programme operated its own data administration, collection and entry procedures for its own clinical purposes, and this may have led to different data collection protocols. In some teams the proportion of EQ-5D™ questionnaires completed by clinicians as proxies may have been higher, and in some cases clinicians recorded the measure retrospectively, particularly where patients left the programme prematurely. Both practices may have biased the data. Thirdly, the absence of any additional data on the participants or on other outcomes limits generalisability. Fourthly, the absence of a control group means that change in the sample cannot be attributed to the treatment received. Together, the absence of a control group and the high level of clustering mean that these data cannot support any definitive statement about whether DBT is effective for adolescents with BPD treated in routine clinical practice.
In conclusion, although further research is necessary to unpick the findings of this pilot study, the DBT outcome monitoring website has demonstrated its potential to collect data in routine practice and in real time, and thus may be a promising tool to benchmark what is gained and lost following the implementation of DBT in clinical practice settings.

Acknowledgements
We would like to extend our thanks to the DBT teams and their patients who participated in this study. Thanks to Dr. Barbara Baragwanath for tracking down references and proofreading the manuscript.

Competing interests
MS is the Director of the British Isles DBT Training Team that trains practitioners in DBT with a licensed training programme and receives fees for training in DBT. She also receives royalties from books and training products in DBT. MS is married to the managing director and principal shareholder of Integral Business Support Ltd that delivers licensed training in DBT. RABH is the Managing Director and principal shareholder of Integral Business Support Ltd that delivers licensed training in DBT. RPH's spouse is a member of the British Isles DBT Training Team and receives income from DBT training.

Ethical approval
All procedures were in accordance with the ethical standards of the institutional and national research committees and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Funding
The study was funded by a grant to Bangor University from the Knowledge Economy Skills Scholarships (KESS) programme (Bangor University KESS Mini 016), and by Integral Business Support Ltd.

Informed consent
The ethics committees to which the study was submitted did not require informed consent from individuals, as the data were pseudonymised and collected as part of routine clinical data collection.