Proefschrift

2.3.3.4. Predictive validity of change scores

Beggs and Grace (2011) recommended that research into the predictive validity of change scores ideally control statistically for static risk scores and initial (pretreatment) dynamic risk scores, to rule out the effect of preexisting risk levels. Such control accounts for regression toward the mean: (greater) positive changes are more likely to be found in individuals at high risk (HR) of reoffending, and (greater) negative changes in individuals at low risk (LR) of reoffending. For that reason, we included two analyses of change scores, one with regular change scores and one with change scores controlled for static and initial dynamic risk scores.

For the evaluation of the predictive validity of change scores (Research Question 3), 12 papers were eligible for inclusion. Six of these papers reported studies with overlapping samples. These overlapping samples were selected as described previously, resulting in the final inclusion of nine unique samples in this component of the meta-analysis (see Appendices B3 and B4, available in the online supplemental materials). The studies on which these nine samples were based were conducted in Canada (5), New Zealand (1), and the United Kingdom (3), and were published between 2007 and 2014 (mode 2013, median 2012). All of these papers were published in English. For six samples, the effect sizes were taken directly from the publications. For the remaining three samples, effect sizes were obtained from the authors, who either conducted additional analyses or provided us with the raw data, allowing us to compute effect sizes. Seven samples included men with a history of sexual offenses who had offended against children as well as men who had offended against adults. Two samples included only men who had offended against children.
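As an illustration of what controlling change scores for static and initial dynamic risk can look like, the sketch below computes a residualized change score: the raw change is regressed on the static score and the pretreatment dynamic score, and the residual is kept. This is only one common approach and is not necessarily the procedure used in the included studies. The function names, the sign convention (pretreatment minus posttreatment, so that a reduction in risk is positive), and all data values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small square system a*x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def controlled_change(pre, post, static):
    """Return change scores with static and pretreatment dynamic risk
    partialled out (residuals of an ordinary least squares fit).

    Assumption: change = pre - post, so positive values mean reduced risk.
    """
    raw = [p - q for p, q in zip(pre, post)]
    # Design matrix rows: intercept, static score, pretreatment dynamic score.
    rows = [[1.0, float(s), float(p)] for s, p in zip(static, pre)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, raw)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    return [y - f for y, f in zip(raw, fitted)]


# Hypothetical scores for six individuals (not data from any study).
static_scores = [2, 5, 3, 7, 4, 6]
pre_scores = [10, 18, 12, 22, 15, 16]
post_scores = [9, 13, 12, 15, 13, 15]

residual_change = controlled_change(pre_scores, post_scores, static_scores)
print([round(v, 2) for v in residual_change])
```

By construction, the residuals are uncorrelated with both the static and the pretreatment scores, so any association they show with recidivism cannot be attributed to preexisting risk level.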
In seven of the samples, all men with a history of sexual offenses received some form of treatment; one sample included only untreated men; and one sample included both treated and untreated men.

2.3.4 INTER-RATER RELIABILITY

The first and third authors independently examined all 148 studies for eligibility and agreed on 98.6% of them (146 of the 148 studies). After discussion, the remaining two studies were included by consensus. The first and second authors independently examined 11 randomly chosen studies out of the 52 included studies to assess which reported effect sizes were most suitable for inclusion in the three components of the meta-analysis. While the included studies presented a number of effect sizes and reported them in various ways (e.g., Cohen’s d, AUC), the authors coded both the preferred unit of measurement for the meta-analysis and the effect size itself. Rater one coded 25 effect sizes and rater two coded 29. There was perfect agreement on the preferred unit of measurement (100%). There was agreement on the choice of 42 of the 54 actual effect sizes (77.8%). Most differences involved simple mistakes and omissions.
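The agreement figures reported above are simple percent agreement, which can be checked with a few lines of arithmetic. The sketch below reproduces the two percentages from the counts given in the text; the function name is hypothetical, and percent agreement does not correct for chance agreement.

```python
def percent_agreement(agreed, total):
    """Return simple percent agreement between two raters."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


# Eligibility screening: agreement on 146 of 148 studies.
eligibility = percent_agreement(146, 148)

# Effect size choice: 25 + 29 = 54 coded effect sizes, agreement on 42.
effect_size_choice = percent_agreement(42, 25 + 29)

print(f"{eligibility:.1f}%")         # 98.6%
print(f"{effect_size_choice:.1f}%")  # 77.8%
```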
