Thorough planning and implementation of clinical trials are key to demonstrating a test drug's therapeutic potential and to meeting regulatory requirements for new drug approval. Success of clinical development undoubtedly depends on the quality of the study design and implementation, and pharmaceutical companies choose critical components of design and execution aimed at achieving the highest possible levels of success within the relevant budgetary and technical constraints. In this research, we investigated possible links between components of the study design and the effect size (i.e., the standardized outcome of various clinical trials) and found significant associations between them. We also investigated whether study environment and the prior experience of drug companies were associated with effect size, and some of these variables showed statistically significant associations.
Among the variables related to study design, a negative correlation was found between the sample size and the effect size. No consistent correlations have been reported in other studies (Khan et al. 2004; Papakostas and Fava 2009; Yildiz et al. 2011a), and no persuasive explanation was apparent for the associations that were reported. One plausible explanation for our result is that, because our research targeted only approved drugs, most drug companies had achieved p values of approximately the same level (i.e., close to 0.05 or less) in clinical trials, which could lead to an inverse relationship between the effect size and the sample size. This is likely to reflect the statistical equation T = f(N) * g(ES) (see Methods), but it is difficult in this model to distinguish this spurious association from a substantive (i.e., causal) relationship of interest, if any. It is interesting to note that the negative association was observed even when recent increases in sample size were controlled for by the time-trend variable “approval year” in Model 1. As Figure 2 shows, the sample size was larger in more recent trials in our dataset (r = 0.49), and similar trends have been observed in trials submitted to the US Food and Drug Administration (Khin et al. 2011).
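The mechanical part of this argument can be illustrated with a minimal sketch. It assumes, purely for illustration, a two-arm trial with equal allocation and a normal approximation, so that f(N) = sqrt(n/2) for n subjects per arm and g(ES) = ES; the actual forms of f and g are those given in the Methods and may differ. The function name `effect_size_at_threshold` is hypothetical.

```python
import math
from statistics import NormalDist


def effect_size_at_threshold(n_per_arm: int, alpha: float = 0.05) -> float:
    """Standardized mean difference that gives exactly two-sided p = alpha.

    Assumption (not taken from the paper): two arms of equal size and a
    normal approximation, so T = sqrt(n_per_arm / 2) * ES.
    """
    # Critical value of the test statistic for a two-sided test at level alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve T = f(N) * g(ES) for ES with T fixed at the critical value.
    return z_crit / math.sqrt(n_per_arm / 2)


# If every trial lands near the same p value, ES must shrink as N grows.
for n in (25, 50, 100, 200, 400):
    es = effect_size_at_threshold(n)
    print(f"n per arm = {n:3d}  ES at p = 0.05: {es:.3f}")
```

Under these assumptions, quadrupling the sample size halves the effect size needed to reach the same p value, so a dataset restricted to trials clustered near p = 0.05 would show a negative association between sample size and effect size even without any causal link.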
The number of arms, another component of study design, was not associated with effect size. It was previously reported that a greater number of arms resulted in greater effect sizes in placebo-controlled antidepressant trials (Khan et al. 2004). In past decades, two-arm confirmatory trials with an active comparator were common in Japan (Ono et al. 2002). However, since the introduction of the International Conference on Harmonisation (ICH) E10 guideline (The International Conference on Harmonization 2000) and of recent clinical evaluation guidelines for each therapeutic field that require a concurrent placebo control group, comparative trials with both concurrent positive comparators and a placebo arm are expected to increase. Our results do not necessarily reflect these changes after ICH E10, and caution is required when extrapolating them.
The time-trend variable, approval year, has various implications, including changes in patient populations and background therapies, and it seems likely that various mechanisms influenced the observed correlation. Stricter inclusion and exclusion criteria in recent trials to improve homogeneity may yield more focused outcomes (Rief et al. 2009). Larger effect sizes in recent trials might also reflect drug companies' general preference for developing drug candidates that are more effective than existing drugs, both in response to stricter demands from the regulatory agency and healthcare professionals and in order to get ahead of strong market competition.
Using an active comparator with the same mode of action as the test drug seemed to yield smaller effect sizes, with the statistical purpose of the trial controlled for. In such trials the test drugs are generally less novel and innovative, so this result seems logical.
Regarding endpoints, clinical trials using subjective endpoints showed smaller effect sizes than those using objective endpoints, although we cannot rule out the possibility that the negative coefficient for CGI-I, a categorical variable, reflects some heterogeneity introduced by the difference in conversion to effect sizes. In this analysis we defined “subjective endpoints” as those evaluated via clinicians' subjective interpretations of patients' responses. A substantial difference between observer-rating scales and self-reporting scales has been reported in some therapeutic fields (Rief et al. 2009; Bullens et al. 2001). Such subjectivity may make it difficult to detect modest differences between the test drug and the comparator in patients' symptoms.
Primary endpoints predicted higher effect sizes than secondary endpoints. Since our dataset consisted of successfully approved NDAs, endpoints tagged as “primary” were generally expected to be more efficient than secondary endpoints.
The proportion of female subjects was negatively related to effect size. It has been reported that women and men can respond differently to drug treatments, particularly psychological agents (Khan et al. 2004). Some previous reports have suggested positive associations between the male proportion and study outcomes (Khan et al. 2004; Yildiz et al. 2011b), which is in line with our observation. It is still difficult to conclude, however, that increasing the proportion of male subjects would yield better outcomes, because the gender proportion in clinical trials seems to be confounded by several factors affecting both the gender proportion and clinical outcomes, and it is almost impossible to adjust for such potential confounders in a retrospective analysis. Although drug companies routinely examine possible differences in efficacy by sex in study reports, such differences rarely become a focal point of discussion in publications such as common technical documents for NDAs and review reports.
We were also interested in the possible influences of sponsors (i.e., drug companies), and of the environment of clinical development of the drug, on effect sizes. Companies can use their previous data and experience, which vary greatly between companies, when designing and conducting trials. Regarding study environment, a previous analysis has shown that the development lag between Japan and the US was positively associated with the probability of transition from Phase 2 to Phase 3 trials and from Phase 3 trials to registration (Hirai et al. 2012). It has been reported that the accumulation of experience also positively affects the success rate of clinical development (Danzon et al. 2005). We included two explanatory dummy variables in Model 2 and Model 4, "Precedent foreign clinical trial data" and "Companies’ domestic development experience with similar drugs." The former captured foreign clinical development experience with the drug prior to Japanese clinical development. The latter was included to explore possible roles of prior development experience with drugs in the same therapeutic class in Japan. We considered only successful experience (i.e., experience of approved NDAs) as the explanatory variable in this study because of practical difficulties in defining the experience of a drug company and in obtaining reliable data on unsuccessful development projects. It should be noted, however, that both successes and failures constitute a company's development experience.
The results of Model 2 in Table 2, which includes both phase 2 and phase 3 trials, show that companies' domestic development experience was positively associated with effect size, but use of foreign clinical trial data was not. In the subgroup analysis restricted to phase 3 trials (Model 4 in Table 2), however, use of foreign clinical data as well as domestic experience had a positive impact on effect size. These results suggest that domestic development experience might lead to the accumulation of skills and knowledge within drug companies. Improvement in study design, for example, would be the key to establishing unequivocal results. Another possibility is that the association reflects accumulated experience among the domestic clinical trial professionals with whom companies make clinical study contracts. Appropriate planning, design, and conduct of clinical trials largely depend on the skills of such professionals, and their skills could be improved by previous experience. Our results may support the latter possibility in that precedent development experience in foreign countries did not necessarily result in enhanced effect sizes in early exploratory phases. Difficulties in extrapolating foreign clinical data and experience to the Japanese environment, probably due to differences in intrinsic and extrinsic factors such as medical practice and therapeutic approach, might also confound the situation. In phase 3 confirmatory trials, drug companies can make the most of all the preceding evidence in both Japan and other countries. Even in cases where ethnic differences have a substantial impact on clinical development, companies at this last stage of development might be able to cope with such differences by optimizing the planning, design, and conduct of Japanese phase 3 trials, and thus achieve larger effect sizes based on previous domestic and foreign experience.
Our study suggests that we need to be cautious about trial design features and also drug companies’ experience when comparing the results of clinical trials. There have been examples in which those features could have played some role in explaining differences in trial outcomes. The re-evaluation of a drug class termed “cerebral circulation and metabolism improver” in Japan during the 1990s is a historical example. Four out of five drugs in this class were withdrawn in 1998 because they failed to establish superiority to placebo in clinical trials for the re-evaluation ordered by the Ministry of Health and Welfare (MHW) (Hayashi et al. 1998; The Ministry of Health and Welfare 1998). All of them had been approved based on equivalence studies in comparison with calcium hopantenate. The MHW justified its initial decisions of approval in the late 1980s and ascribed the results to changes in the healthcare environment such as advances in early diagnosis, surgical procedures, basic care, and rehabilitation. In addition to those changes in the healthcare environment, differences in study design between the trials for initial approval and those for re-evaluation were noted.
Limitations
Several limitations in our study need to be considered. First, our research focused on trials that were conducted in Japan and submitted for Japanese NDAs. We need to be cautious in extrapolating our current results to different regions. The homogeneity of the study populations of the trials has advantages, however, because it enables us to exclude possible confounders related to heterogeneity of race and ethnicity, while maintaining sufficient variety with regard to study design and developmental phase. Second, clinical trials submitted for NDAs are somewhat restricted in design and quality, compared to trials that are not conducted specifically for NDAs. Third, it should also be noted that the trials included in this study mostly constituted successful research, in that they were chosen on the basis of approval decisions. Fourth, the number of placebo-controlled trials investigating drugs for psychological disorders was small in our dataset because only a small number of trials of that kind were conducted in Japan during the observation period. Fifth, although we know agreements between the regulatory agency and the pharmaceutical companies could have significant impact on study design and outcomes, we were not able to collect data on meetings and agreements due to difficulties in accessing in-house development histories.