Inter‐rater reliability of ultrasound measurements of acromion–greater tuberosity distance between experienced and novice raters in healthy people

Acromion-greater tuberosity (AGT) distance is used to assess shoulder pathologies including rotator cuff tears. The aim of this pilot study was to provide a short course of training and assess the inter-rater reliability of ultrasonographic measurements of AGT distance between experienced and novice raters (physiotherapy students) in healthy individuals prior to testing on patient populations. Eleven healthy individuals with a mean age of 54 years (SD 5) who gave informed written consent were recruited. Three final-year physiotherapy students (novice raters) and one experienced rater (a physiotherapist) recorded ultrasound measurements of AGT distance on both shoulders using a standardised protocol. ICC values for inter-rater reliability between the experienced rater and novice raters 1, 2 and 3 for the left shoulder were 0.83 (95% CI 0.63-0.94), 0.87 (95% CI 0.73-0.96) and 0.61 (95% CI 0.37-0.85), respectively. Corresponding values for the right shoulder were 0.85 (95% CI 0.70-0.95), 0.75 (95% CI 0.55-0.91) and 0.62 (95% CI 0.38-0.85). The standard error of measurement (SEM) and minimum detectable change (MDC) were ≤0.2 cm and ≤0.3 cm, respectively, for all raters. Evidence from this study provides further support for the potential usefulness of ultrasound in both research and clinical practice for the management of shoulder pathologies.


| INTRODUCTION
A reduction in the acromion-humeral distance (AHD) is used as a predictive marker in identifying rotator cuff tears (Cholewinski, Kusz, Wojciechowski, Cielinski, & Zoladz, 2008). By contrast, an increase in AHD is suggestive of inferior shoulder subluxation following stroke, which is a common secondary musculoskeletal problem reported in up to 81% of patients (Ada & Foongchomcheay, 2002). Several studies have reported the usefulness of diagnostic ultrasound in the measurement of AHD, which is defined as the nearest distance between the lateral margin of the acromion and the head of the humerus (Azzoni, Cabitza, & Parrini, 2004; Desmeules, Minville, Riederer, Côté, & Frémont, 2004). Recently, the acromion-greater tuberosity (AGT) distance (measured between the acromion process of the scapula and the greater tuberosity of the humerus) has also been used in the diagnosis of rotator cuff tears (Cholewinski et al., 2008) and shoulder subluxation in post-stroke hemiplegia (Kumar, Bradley, Gray, & Swinkels, 2011a; Park, Kim, Sohn, Shin, & Lee, 2007).
Similarly, several other studies have reported the reliability of ultrasonographic measurements of AGT distance both in healthy participants (Kumar, Bradley, & Swinkels, 2010) and in patients with post-stroke hemiplegia (Kumar et al., 2011a; Park et al., 2007). Kumar et al. (2010) reported excellent within-day (ICC 0.98-0.99) and between-day (ICC 0.96-0.97) intrarater reliability in healthy participants when assessed by an experienced rater (a physiotherapist).
Similarly, another study reported excellent intrarater (ICC 0.84-0.91) and good inter-rater (ICC 0.79) reliability when assessed by three novice raters (physiotherapy students) (Kumar et al., 2011b). High reliability coefficients reported from these two studies suggest that ultrasonographic measurements are reliable when measured by novice (Kumar et al., 2011b) and experienced (Kumar et al., 2010) raters.
To the best of the authors' knowledge, no previous studies have investigated inter-rater reliability between experienced and novice raters for AGT measurements. The aim of the present pilot study was to provide a short course of training and assess the inter-rater reliability of ultrasonographic measurements of AGT distance between experienced and novice raters (physiotherapy students) in healthy individuals prior to testing on patient populations. The results of this study should inform future research studies and the clinical application of ultrasonographic measurements in the ongoing assessment of specific shoulder-related pathologies.
the West of England, Bristol, and each participant gave informed written consent to take part.

| Procedure
Baseline demographic data related to arm dominance (as stated by participants), gender, age and history of previous shoulder injury were recorded. Each participant was asked to perform a few simple arm movements to establish that the range of movement of both shoulders was equal, pain free and within normal parameters (Petty & Moore, 2004).
Ultrasound measurements of AGT distance were undertaken by a physiotherapist (P.K.), who acted as the experienced rater. The experienced rater had been involved in previous reliability studies. Three physiotherapy students acted as the novice raters and underwent shoulder ultrasound training specific to the present study. Each novice rater received 1 h of formal training on the ultrasound technique by the experienced physiotherapist (P.K.) and then practised on one another, unsupervised, for an additional hour, to become familiar with the protocol and measurement procedure.
A portable diagnostic ultrasound machine (TITAN model, M-Mode, Depth 3.9, L38/10-5 MHz broadband 38 mm linear array transducer, Sonosite Limited, Hitchin, UK) was used for scanning the shoulder and for recording the AGT distance. The equipment was tested and calibrated according to the manufacturer's guidelines prior to commencement of the data collection process. The precision of the linear measures based on manufacturer specifications was ±2%.
A standardized position was used for ultrasound scanning and for recording the AGT measurements. Participants were scanned while seated upright on an armless chair with their hips and knees flexed to 90° and feet resting flat on the ground. Once the participant had adopted the standardized position, three ultrasound images of the first shoulder were obtained and AGT distance was measured on each frozen image by the first rater. AGT distance was defined as the relative distance between the lateral edge of the acromion process of the scapula and the nearest margin of the superior part of the greater tuberosity of the humerus. This was repeated on the opposite shoulder. The participants were then encouraged to move out of the standardized position. The same procedure was then repeated by the other three raters, who ensured that participants were in the standardized position for ultrasound imaging. Therefore, a total of three measurements per shoulder were taken by each rater. All raters were given a number and the order in which they recorded measurements was randomized. All four raters were blind to their own measurements (values were obscured by placing a sticker on the ultrasound screen) and to each other's measurements.

| Data analysis
Data were analysed using the Statistical Package for Social Sciences (SPSS v23.0; IBM UK, Business Analytics, Middlesex, UK). Descriptive statistics were used to calculate the mean and standard deviation (SD) of the three AGT distance measurements for both shoulders undertaken by each rater.
To assess the inter-rater reliability of ultrasonographic measurements of AGT distance, intraclass correlation coefficients, ICC(3,3), with 95% confidence intervals were used. For calculation purposes, the mean of the three measurements recorded by the experienced rater was compared with the mean of the three measurements recorded by each of the three novice raters, for both the right and left shoulders. Reliability was considered excellent if the ICC value was ≥0.75, fair to good if the value was 0.40-0.74 and poor if the value was ≤0.39 (Shrout & Fleiss, 1979).
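As a rough illustration of the calculation, the sketch below computes an ICC from a participants-by-raters table using the two-way mean squares of Shrout and Fleiss (1979). It assumes the consistency form usually written ICC(3,1) for single ratings and ICC(3,k) for averaged ratings. The function name `icc3` and the example values are hypothetical and are not the study data, and the 95% confidence intervals reported by SPSS are not reproduced here.

```python
def icc3(scores):
    """Return (ICC(3,1), ICC(3,k)) for a participants-by-raters table."""
    n = len(scores)        # participants (rows)
    k = len(scores[0])     # raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    bms = ss_rows / (n - 1)                # between-participants mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square
    single = (bms - ems) / (bms + (k - 1) * ems)
    average = (bms - ems) / bms
    return single, average


# Hypothetical mean AGT distances in cm (NOT the study data):
# column 0 = experienced rater, column 1 = one novice rater.
example = [
    [2.1, 2.2],
    [1.8, 1.9],
    [2.5, 2.4],
    [2.0, 2.2],
    [2.3, 2.3],
]
single, average = icc3(example)
print(f"ICC(3,1) = {single:.2f}, ICC(3,k) = {average:.2f}")
```

Because this form treats raters as fixed and removes the between-rater mean difference, a novice who reads consistently higher or lower than the experienced rater can still obtain a high coefficient. A separate test of mean differences between raters is therefore still informative.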
The Standard Error of Measurement (SEM) was used to define 95% confidence limits around individual measurements. The minimum detectable change (MDC) was used to quantify the magnitude of change that was not likely to be a result of measurement error (Haley & Fragala-Pinkham, 2006). For MDC, a confidence interval of 90% (MDC90) is commonly recommended (Kolber, Saltzman, Beekhuizen, & Cheng, 2009).
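The text does not state the formulas used. A minimal sketch, assuming the commonly used definitions SEM = SD × √(1 − ICC) and MDC90 = 1.645 × √2 × SEM, is given below. The function names and input values are hypothetical and are not taken from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a pooled SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value, z=1.645):
    """Minimum detectable change; z = 1.645 gives the 90% level (MDC90)."""
    return z * math.sqrt(2.0) * sem_value


def confidence_limits(value, sem_value, z=1.96):
    """Approximate 95% limits around one obtained measurement."""
    return value - z * sem_value, value + z * sem_value


# Hypothetical inputs (NOT the study data): SD of 0.30 cm and ICC of 0.85.
sem_cm = sem_from_icc(0.30, 0.85)
low, high = confidence_limits(2.0, sem_cm)
print(f"SEM = {sem_cm:.2f} cm, MDC90 = {mdc(sem_cm):.2f} cm")
print(f"95% limits around 2.0 cm: {low:.2f} to {high:.2f} cm")
```

The √2 term reflects that a change score involves two measurements, each carrying its own measurement error.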
Repeated measures analysis of variance (ANOVA) was used to analyse the variability of repeated ultrasonographic measurements of AGT distance on each shoulder between raters. Significance was set at p = 0.05, and post-hoc testing was performed using pairwise comparisons with Bonferroni correction, with significance set at p = 0.008.
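For readers unfamiliar with the test, the sketch below shows how the F statistic and its degrees of freedom arise in a one-way repeated measures ANOVA with raters as the within-subject factor. It is an illustration only, with a hypothetical function name and made-up values. It does not compute the p-value or any sphericity correction that SPSS may apply.

```python
def rm_anova_f(scores):
    """Return (F, (df_raters, df_error)) for a participants-by-raters table."""
    n = len(scores)        # participants
    k = len(scores[0])     # raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    df_raters = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_raters / df_raters) / (ss_error / df_error)
    return f_value, (df_raters, df_error)


# Hypothetical mean AGT distances in cm (NOT the study data):
# columns = experienced rater, novice 1, novice 2, novice 3.
example = [
    [2.1, 2.2, 2.0, 1.8],
    [1.8, 1.9, 1.9, 1.6],
    [2.5, 2.4, 2.6, 2.1],
    [2.0, 2.2, 2.1, 1.9],
    [2.3, 2.3, 2.2, 2.0],
]
f_value, dfs = rm_anova_f(example)
print(f"F{dfs} = {f_value:.2f}")
```

With 11 participants and four raters, the degrees of freedom are k − 1 = 3 and (n − 1)(k − 1) = 30.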

| RESULTS
Eleven healthy individuals (nine female, two male) with a mean age of 54 years (SD 5) were recruited into the study. All participants were right-hand dominant. A summary of descriptive data for AGT distance measurements for all four raters is provided in Table 1. The ICC, standard error of measurement and MDC90 for inter-rater reliability for both right and left shoulders are presented in Table 2.
Repeated measures ANOVA indicated significant differences between AGT distance measurements when comparing novice raters with the experienced rater for the left shoulders (F(3, 30) = 10.147; p ≤ 0.001) and right shoulders (F(3, 30) = 3.394; p = 0.031). Post-hoc analysis with pairwise Bonferroni correction showed a significant difference between novice rater 3 and the experienced rater (p < 0.01) but not between the experienced rater and the other two novice raters.

| DISCUSSION
The primary aim of the present study was to assess the inter-rater reliability of ultrasonographic measurements of AGT distance between experienced and novice raters in healthy individuals. One experienced physiotherapist and three physiotherapy students acted as raters and recorded AGT distance measurements using portable ultrasound equipment. The study found good (ICC 0.61) to excellent (ICC 0.87) inter-rater reliability.
The findings were in agreement with previous studies in healthy individuals (Kumar et al., 2011a, 2011b) and in patients with shoulder-related problems such as SIS (Cholewinski et al., 2008). Kumar et al. (2011a) tested within-day and between-day intrarater reliability, and reported excellent between-day intrarater reliability (ICC 0.97-0.98) for AGT measurements in older healthy adults (mean age 64 years [SD 11]). In that study, however, only one rater, an experienced physiotherapist with modest training in shoulder ultrasound, was involved with the recording of measurements.
Another study, involving three physiotherapy students, reported excellent inter-rater reliability (ICC 0.79) of AGT measurements for the right shoulder in a relatively young age group (mean age 21 years [SD 2]) (Kumar et al., 2011b). However, the reliability was not compared with that of an experienced rater. By contrast, the current study involved three novice raters and assessed the inter-rater reliability of the AGT distance measurements by comparison with an experienced rater. To the authors' knowledge, this is the first report of the inter-rater reliability of AGT distance measurements taken by novice raters and compared with measurements taken by an experienced rater, using portable ultrasound equipment in healthy people. Good reliability of measurements suggests that novice raters with limited training in ultrasound are capable of undertaking reliable ultrasonographic measurements of AGT distance. These results with relatively inexperienced raters are encouraging, suggesting that this technique can be easily learned by clinical physiotherapists with no previous experience in ultrasound. This is because physiotherapists are generally considered to have a good basic knowledge of anatomy and therefore are able to produce reliable results with minimal training. This should be tested in a future study in a patient population.
In conjunction with ICC, this study used SEM, which provides an estimation of how repeated measures on a person are most likely to be distributed around the "true" value (Wyrwich, 2004). On successive testing, there is a 95% probability that repeated measurements on an individual would fall within ±2 SEM of the mean (Keating & Matyas, 1998). The standard error of measurement for both shoulders, across all raters, was ≤0.2 cm, which indicates that, for between-rater measurements, there is a 95% probability that the true measurement would lie within 0.2 cm of the obtained value. These findings were in agreement with those of previous studies which reported a low SEM (≤0.15 cm) when ultrasound measurements were undertaken by experienced (Kumar et al., 2010) and novice (Kumar et al., 2011b) raters.
In the present study, inter-rater reliability was good (ICC 0.61-0.62) for rater 3 but excellent (ICC 0.75-0.87) for the other two raters when compared with the experienced rater. The low reliability coefficients noted for rater 3 could have been due to individual variation in the identification of the bony point for measurement purposes.
The mean AGT measurements recorded by rater 3 were generally on the low side when compared with those of the other three raters, suggesting that rater 3 potentially selected different points for measurement purposes. For the purpose of standardization, it is critical that all raters measure the AGT distance using the same bony reference points.   (Keating and Matyas, 1998).