Evaluating outreach activities: overcoming challenges through a realist ‘small steps’ approach

ABSTRACT Practitioners are being placed under increasing pressure to evaluate the success of their outreach activities, both by government and by their own universities. Based in a reductionist doctrine of ‘evidence-based practice’, there is a desire to demonstrate the effectiveness and value-for-money across activities that now account for around £175 million per year across England. This article examines some of the difficulties in evaluating the complex social world of outreach and suggests a ‘small steps’ approach to overcome some of these. This uses the idea of a transformative ‘theory of change’ as a framework for understanding the particular contribution made by discrete activities within a wider portfolio, providing a more reliable form of inference than attempts to ‘prove’ impact over longer timeframes.


Introduction
For some time, there has been concern about the effectiveness of outreach activities (e.g. summer schools, university taster days and mentoring schemes) that are designed to encourage disadvantaged individuals to apply to higher education. This is perhaps inevitable for any high-profile and high-cost social policy, especially during a period of austerity; the latest figures show outreach activities totalling nearly £175 million in England (OFFA 2016). The recent national strategy document lays out a clear expectation:
It is essential to understand which approaches and activities have the greatest impact, and why. An improved evidence base, and a robust approach to evaluation, are critical in helping the sector and partners to understand which of their activities are most effective and have the greatest impact on access, student success and progression, so enabling effort to be focused on these areas. (BIS 2014, 9)
With slightly softer rhetoric, the recent OFFA strategic plan makes a similar point, signalling its intent to use
… an evidence-based approach to more actively challenge and engage with universities and colleges to make sustained and faster progress towards their targets across the student lifecycle. (OFFA 2015, 12)
This fits within a wider doctrine of 'evidence-based practice' in education, sometimes colloquially known as 'what works'. The idea is simple: to focus resources on activities that have strong evidence for effectiveness. The reality is significantly more problematic, as generating unequivocal evidence in complex social fields is notoriously difficult (Donaldson, Christie, and Mark 2009; Pawson 2013; Lingenfelter 2016). Outreach is clearly such a field, with its long timescales, diverse settings and myriad influences.
This complexity is exacerbated by the lack of a clear definition of 'effectiveness'. Current government policy aims 'to double the proportion of people from disadvantaged backgrounds entering university in 2020 compared to 2009' (BIS 2016, 54), additionally focusing on increasing the proportion entering elite universities. Conversely, individual universities are duty bound to direct their outreach activities towards meeting the requirements of the Access Agreements that they negotiate with OFFA. These are generally couched in competitive terms of meeting recruitment targets of disadvantaged students to that specific university (McCaig 2015; Rainford 2016). Senior university managers are inevitably keen to avoid censure (or worse) from OFFA and ensure that their admissions remain buoyant, alongside wider social justice motivations.
The tension here is obvious: a university can meet its targets (and ostensibly be effective) for recruiting disadvantaged students without impacting at all on the national targets if it is simply capturing a greater share of the existing applicant pool; a 'zero sum game' where outreach is conflated with recruitment and universities seek easy wins, leading to few additional students being encouraged into higher education. This 'confusion of successes' is an added challenge to practitioners: is it success against institutional or national targets that matters (effectiveness for one university or for society in general)?
Outreach activities are often focused on changing attitudes to higher education by, for example, making it appear desirable, achievable or 'normal'. However, other changes may also occur in terms of knowledge, behaviours or social relations. This article therefore focuses on evaluating whether and how activities lead to change. This change could be the explicit intention to enter higher education or, more likely, an intermediate state such as increased motivation at school, having a clear career goal or developing more self-confidence. 'Effectiveness' is used hereafter in an informal sense of judging the amount of change which can be ascribed to an activity. In other words, it is an assessment of what did happen with respect to participants relative to what would have happened otherwise, often described as the 'counterfactual' situation. Similarly, 'causality' is used informally to indicate the certainty that an activity is directly responsible for this change.
These complexities underline the need for a considered and critical approach to evaluation that generates credible claims to knowledge. This article is aimed at those expected to generate or assess such claims, including researchers, practitioners, university managers and experienced evaluators applying their expertise to outreach activities. It draws on the findings of the recent Assessing Impact and Measuring Success project led by the authors (see Harrison and Waller, forthcoming). Inter alia, this study found that 32% of university outreach managers had concerns about the quality of evidence available to them, while 91% were seeking to improve their evaluation processes.
This article is methodologically agnostic, concerning itself instead with broad principles that can usefully underpin all forms of evaluation. It might appropriately be positioned within the 'realist' tradition developed and advocated by Pawson (2006, 2013), which engages with the intricate realities of how human choices are made within complex social fields. In particular, it looks at how we might better understand the impact of outreach in terms of transformational changes that are reflected in the choices made by young people who have been subject to deeply ingrained educational inequalities. It also obliquely questions whether an evaluative focus on institutionally driven ideas of success is actually a distraction from the wider issues of social justice that outreach is intended to address.

Dominant approaches to outreach evaluation
At the time of writing, two approaches to evaluating outreach work are attaining a form of dominance in the field, yet both have shortcomings:
• The 'tracking' approach. This has widespread support among current practitioner-managers (Harrison and Waller, forthcoming) and is generally based on collecting data on individuals over time with respect to (a) their involvement in activities, (b) their changing attitudes and choices and (c) school outcomes including qualifications. These data are then used to explore the effectiveness of individual activities or a whole programme by identifying how attitudes and behaviours shift in step with outreach activities.
• The 'trials' approach. This is emerging as a 'borrowing' from medicine and seeks to use techniques like randomised controlled trials to isolate a direct causal effect of activities. This has historically strong support in the US, albeit that there are growing critical voices about its claims (e.g. Bickman and Reich 2009; Lingenfelter 2016; Scriven 2016). It is not currently widely used in the UK, although some practitioners believe it offers something of a 'gold standard' (Harrison and Waller, forthcoming).

Five challenges for evaluating outreach
This section briefly explores five key epistemological challenges that are inherent within evaluations of outreach work and which any successful approach needs to consider, mitigate as far as possible and preferably overcome. They are not intended to be exhaustive, but rather a starting point for critiquing any proposed approach, including the tracking and trials approaches outlined above; indeed, the former is likely to be susceptible to the first and third challenges, while the latter is more likely to be challenged by the second, fourth and fifth.

Selection and self-selection biases
A longstanding tenet of outreach is to target activities at individuals within identified disadvantaged groups who are felt likely to benefit from them (e.g. Department for Education and Employment 2000; BIS 2014). This is clearly appropriate in seeking to overcome structural educational inequalities by providing more support to those most in need, and this targeting is heightened further where resources are constrained. From the evaluation perspective, this creates a strong selection bias within any data collected. The participants are not representative of the school or area from which they are drawn, but form a rarefied subgroup that has been selected for a particular purpose, i.e. because they are deemed to be potentially 'in the market' for higher education. This is further complicated where targeted young people, their families or their schools are able to absent themselves from the activity, either through a choice to opt out (e.g. a refusal to participate) or a failure to opt in, whether active (e.g. not completing a form) or passive (e.g. not being aware of the activity). If an activity requires an opt-in or where there are significant numbers of opt-outs, then self-selection bias is layered on top of the selection bias outlined above.
Those families already positively predisposed towards education are likely to disproportionately take up opportunities compared to those 'hard-to-reach' families who might benefit more but who may be less likely to participate due to various forms of exclusion (Boag-Munroe and Evangelou 2012).
Evaluations which seek to compare the (self-)selected group with an unselected group as a counterfactual analysis are therefore likely to be fallacious and will usually over-estimate effectiveness as the two groups are likely to have different demographic profiles and pre-activity attitudes towards education.
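The over-estimation described above can be illustrated with a small simulation. The following is a minimal sketch and not part of the original study: the function name, the cohort size, the 'predisposition' variable and every probability in it are invented for illustration. It assumes that young people with a stronger prior predisposition towards higher education are both more likely to opt in to an activity and more likely to apply regardless, and it compares the resulting raw difference in application rates with the small 'true' effect built into the simulation. A coin-flip allocation is included purely as a benchmark.

```python
import random


def estimate_effect(self_select, n=20000, true_effect=0.05, seed=1):
    """Return the raw difference in application rates between participants
    and non-participants in a simulated cohort (all numbers are invented)."""
    rng = random.Random(seed)
    outcomes = {True: [], False: []}
    for _ in range(n):
        # Hypothetical latent predisposition towards higher education (0 to 1).
        predisposition = rng.random()
        if self_select:
            # Opt-in: more predisposed young people are more likely to take part.
            takes_part = rng.random() < predisposition
        else:
            # Benchmark: participation is allocated by coin flip.
            takes_part = rng.random() < 0.5
        # The chance of applying depends mostly on predisposition; the activity
        # adds only `true_effect` for those who take part.
        p_apply = 0.2 + 0.5 * predisposition + (true_effect if takes_part else 0.0)
        outcomes[takes_part].append(rng.random() < p_apply)

    def rate(group):
        # Proportion of the group who apply.
        return sum(group) / len(group)

    return rate(outcomes[True]) - rate(outcomes[False])


print("true effect built into the simulation: 0.05")
print("self-selected comparison:", round(estimate_effect(self_select=True), 3))
print("coin-flip comparison:", round(estimate_effect(self_select=False), 3))
```

Under these assumptions the self-selected comparison attributes to the activity a difference that largely reflects who chose to take part. The sketch addresses only selection effects; it says nothing about the priming, deadweight or confounding issues discussed in the following sections.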

Priming and social desirability effects
The challenge with evaluating activities designed to shift attitudes is that young people very quickly become attuned to the idea that there is a 'correct' collection of attitudes to express to practitioners, teachers and parents. This is a form of social desirability bias; a well-attested phenomenon in social research whereby the participant will reproduce what they understand to be the required responses in order to please, impress or reassure the researcher. This potentially compromises both the reliability and validity of the evaluation data collected from young people about their educational experiences (Bowman and Seifert 2011).
Similarly, if a young person has been engaged in an activity that is designed to impart certain knowledge about higher education, it is likely that they will reflect this back to evaluators and others in the short term, especially if they are also asked about what they will do in the future. In the context of outreach, a taster day is very likely to generate short-term results that suggest an increased likelihood of attending university as this has been the purpose of the day and the events are fresh in the young person's mind. Unless it is effectively internalised or regularly reinforced, this priming will fade over time as the information and experiences fall out of memory. As a result, the effectiveness of activities is likely to appear greater the closer in time to the activity the data are collected.

Deadweight and leakage
The linked phenomena of leakage and deadweight are relevant to any social policy which is predicated on targeting certain individuals, including participation in higher education (Harrison 2012; Harrison and Waller, forthcoming).
Leakage occurs when the targeting method fails and relatively advantaged individuals are erroneously included within the target group. Aside from the wastage of resources, this is a challenge for evaluation as it will tend to cause an over-estimation of an activity's effectiveness by capturing individuals who were always likely to apply to university. This might occur, for example, where relatively advantaged children in a school serving a disadvantaged area are included in general classroom-based activities.
Deadweight is a more complex idea. It relates to the targeting of individuals who meet the relevant criteria of disadvantage, but who would have followed the desired path without the activity; in other words, a disadvantaged young person who is already on the pathway to higher education without the need for outreach activities, even if they themselves are not aware of it at the time. Clearly, this is very difficult to assess from the practitioner's perspective, as it involves engaging with future decisions yet to be made by a young person who cannot know at, say, the age of 13 what their intentions will turn out to be at the age of 17.
In this instance, changes in reported intention can be erroneously assigned to activities that happen to coincide, without there being a causal relationship. In particular, this is a risk when evaluating lengthy programmes of activities that occur over several years, which can appear very effective simply by selecting high-achieving, but disadvantaged, young people who would almost certainly have progressed anyway (Croll and Attwood 2013). The inability to predict future choices makes the construction of a viable comparison group problematic; indeed, improving this prediction would be a useful goal for future research.

Complexity and bounded rationality
Social fields like participation in higher education are inherently complex. It is, however, easy to slip into a reductionist mindset of viewing outreach activities as quasi-scientific interventions, where a specific stimulus leads inexorably to a measurable result (Doyle and Griffin 2012; Pawson 2013). Within this mindset, the objective of the practitioner is to devise the 'right' portfolio of activities and the role of the evaluator is to confirm 'what works' by demonstrating unequivocal causal changes.
The reality is very different. The lives of all young people are 'messy' as they are buffeted by myriad experiences and influences, some planned, but many accidental. The beliefs and expectations of their families, schools and communities will shape their own attitudes and ambitions. The intersection of their gender and ethnicity will also play a role, as will other social factors like disability or sexuality. All of these elements are then mediated through the prism of personality, itself mutable in the process of becoming an adult.
Furthermore, Simon's (e.g. 1979, 1997) seminal work on bounded rationality compellingly asserts the limits of human decision-making. This is not to say that young people are inherently irrational, but that their decisions about higher education will be dictated by the information available to them, their own subjective priorities, the time available and judgements about likelihoods and risk. Humans also tend to make intuitive decisions that are grounded in emotions and a range of unconscious heuristics (Kahneman 2003).
Given this complexity of environment and decision-making, the idea of an outreach activity having a predictable causal outcome on a young person's decisions appears thoroughly misguided. Activities will affect certain groups or individuals more than others; indeed, they may be actively negative for some. Similarly, the impact may be positive from the perspective of the practitioner in one element (e.g. raising motivation for schoolwork), but negative in another (e.g. making apprenticeships seem more attractive than university). The same activity run twice with different individuals or in different places may well have markedly different outcomes.
This complexity means that the effectiveness of activities will never be static or predictable. An activity can only be judged to have been successful at one time and in one context, and probably only with some of the participants (Pawson 2013). This must limit the inferences that can be drawn about effectiveness and the life expectancy of those inferences. It also makes conclusions about certain types of activity in the abstract particularly problematic (for example, a claim that 'summer schools are effective'), especially as every university will provide its own 'flavour' of the activity with different staff and resources (Hoare and Mann 2012).

Confounding factors and non-linearity
From its inception, outreach has generally been conceived as a process rather than as a single event in time. It is assumed, probably rightly, that shifting the knowledge, beliefs, attitudes and behaviours of young people takes concerted effort over a series of encounters, especially where there are ingrained expectations from their families, their schools or their communities acting to prevent that change (Gorard et al. 2006). In its most extreme incarnations, it is a 10-year process spanning mid primary through to late secondary schooling.
Relatedly, one temptation may be to seek to evaluate changes in young people over this time period as if the efforts of practitioners are the only influence when, in reality, there are many confounding factors at work. In particular, the school and its teachers, with whom young people spend far more time than in outreach activities, are very likely to effect changes to knowledge about and attitudes towards higher education (Winterton and Irwin 2012; Fuller 2014). Within a long-term, but punctuated, programme of activities, there is a risk of erroneously ascribing changes to those activities rather than what might occur in between: is it the activities offered that are effective or the day-to-day influence of teachers? It may even be the ongoing partnership between a university and a school which impacts on the knowledge, expectations and ethos embodied in the latter, rather than any direct effect of activities.
Within a structured and long-term series of activities with a young person, there is also the risk of assuming that there is a linear and positive cumulative effect over time, i.e. that each activity goes a little way further to tipping them towards higher education. This is likely to be fallacious. As noted earlier, some activities may have negative effects from the perspective of higher education (e.g. by suggesting alternative routes) or may only have an effect months later when reflected upon, perhaps in conjunction with other experiences. Alternatively, two activities might only prove effective when offered several months apart, providing mutual reinforcement, with neither being effective in isolation. This non-linearity makes conclusions about causality and effectiveness problematic.

Realist evaluation
As a springboard, this article uses Pawson's (2006, 2013) idea of 'realist' evaluation. This approach places the individual's choices at the heart of the evaluation, 'recognising that the fate of social policy lies in the real choices of choice makers and [evaluation's] task is to explain the distribution and consequences of those choices' (Pawson 2013, 71). Human choice is seen as the driving force for changes in behaviour, so the purpose of an activity is to provide circumstances where changes can take place. The idea of direct causality between activity and change is dismissed as simplistic in a complex social field with multiple confounding influences. In particular, it rejects 'medicalised' approaches to evaluation that derive from a basic stimulus-effect model of human behaviour. Pawson (2006, 25) emphasises the 'messiness' of social fields and argues that the only appropriate question is 'What works for whom and in what circumstances?', rather than seeking authoritative statements about effectiveness that are decontextualised from people, setting or time: 'the ludicrous idea that evaluators and researchers are able to tell policy-makers and practitioners exactly what works in the world of policy interventions' (Pawson 2006, 170).
A key idea of realist evaluation is that a planned activity within a social field is the embodiment of a 'theory of change': it represents some conception of how an individual might be 'moved' from one state to another. This might be a deliberate process, based in the expertise of the practitioner or social theory, or a tacit one based on beliefs, prior experiences or borrowing from elsewhere. Pawson sees this transformational theory of change as being the focus of evaluation rather than the outcomes of the activity, with the purpose of evaluation being to interrogate and hone this theory. This approach embraces the inherent complexity of fields like outreach and the bounds on human rationality, with a desire to understand the complex web of factors at work and how to influence them, rather than seeking to ignore or eliminate them in pursuit of simplistic causal relationships and dubious measures of effectiveness. Realist evaluation rejects this as likely to create misleading results with overconfident conclusions, while remaining silent on how to improve practice:
There is […] no concealing the reality that the same intervention can trigger change in myriad ways, and no way of camouflaging the truth that the different contexts in which programmes are implemented are as wide as society is wide. (Pawson 2013, 29-30)

A 'small steps' approach
This section outlines a potential alternative approach, broadly within the realist tradition, to both conceiving and evaluating outreach activity based on 'small steps'. This is intended to signal a partial rejection both of long-term tracking (although this may have value for understanding key junctures at which change occurs) and of unwieldy and over-engineered trials (although they may have some value in evaluating short-term activities). It also denotes a conceptualisation of participation in higher education as a process with many intermediate steps which young people take and which evaluators must heed.
It attempts to provide a means of addressing issues of effectiveness while overcoming some of the challenges outlined above. It is methodologically neutral, in that it is compatible with a range of data collection methods (both quantitative and qualitative) which need to be designed around the intervention, the participant group and the practitioners involved. Rather, we suggest five principles to guide how evaluation is conceived and undertaken, relating to theories of change, causality, measurement, timescales and disadvantage:
• Articulation of a clear theory of change. Outreach activities are, at their heart, about causing change within individuals. If practitioners expect to cause change, then they need to have a clear articulation of the mechanisms by which they expect this to occur at the individual level: a theory of change. As well as attending to outcomes, which is an obvious concern of evaluation, the starting point of the individual needs to be recognised alongside a deep engagement with psychological, sociological and psychosocial processes; indeed, the first and last of these have been somewhat neglected within theorisations of participation in general. For example, practitioners wanting to 'raise aspirations' need to be clear what an aspiration is, how it is formed and how it is crystallised in reference to others. This clarity then provides a framework for evaluation which focuses on individual and group processes in sequence (i.e. a logic chain). The theory of change can then be evaluated in terms of its effectiveness in describing processes and predicting outcomes, to be further honed through reflective practice and empirical research.
• Criticality about causality. The complexity and non-linearity outlined above are problematic for drawing strong conclusions about an activity and its causal effects on individuals. The trial approach attempts to resolve this by focusing on the outcomes of participants relative to a (preferably randomised) group of non-participants. If well-executed, this can provide some evidence as to whether an intervention is effective (self-selection, priming and confounding issues notwithstanding), but it does not address the more important question of why it is effective. This sort of evaluation risks reducing activities to a form of 'magic box' where nothing is known about the processes within it. Indeed, it may fail to identify if the effective element is incidental to the activity rather than integral to it, for example, the personal relationships developed alongside the activity. Instead, we advocate evaluating the success of activities in terms of these intermediate processes, that is, the logic chain within the theory of change. In general, the research community knows surprisingly little about the effects of interventions on educational disadvantage (Gorard and See 2013). Instead, much is assumed by practitioners and it is in these areas of small change that evaluations focusing on causality might best be used.
• Criticality about measurability. While some of the measures used to understand widening participation are broadly reliable and valid (e.g. examination results or the submission of a university application), many are more subjective and readily contestable. In particular, evaluations often rely on easily collected self-reports of attitudes and future intentions from young people (or teachers and parents): measuring the measurable. Validity here is very uncertain, especially given priming and social desirability effects. In order to ensure evaluation through measures with strong reliability and internal validity, we suggest eschewing attitudinal measures in favour of those based on knowledge or behaviours, e.g. asking about the number of university websites visited rather than a possible future intention to apply. Future research may be able to reveal which of such measures are strongly correlated with future behaviours and can therefore be used as a proxy. Greater use of pre/post and quasi-experimental designs is also likely to support a more robust approach to the identification of changes.
• Using appropriate timescales. There is a tension between evaluating individual activities over a short time period and evaluating whole programmes over very long periods, potentially measured in years. While the desire for the latter is understandable, we suggest that it is probably unattainable due to complexity and the dominance of the confounding factors in young people's lives, as well as the difficulty in undertaking a counterfactual analysis. Where there certainly is value is in tracking young people's attitudes at regular points in time, with appropriate distance from major activities to mitigate social desirability and priming effects. Instead, we suggest that evaluative efforts are focused on individual activities. If a robust theory of change for each activity is evidenced and there is an overarching theory of change for the integrated programme, then there is unlikely to be a need to evaluate the programme as a whole, and efforts to do so are likely to be vexed for the reasons discussed above. It is more important to have confidence in each intervention in its own terms, relative to its theory of change, especially in universities which employ a 'pick and mix' approach where young people receive a varying portfolio of activities built around their unique needs.
• A focus on educational disadvantage. Evidence is building that differences in participation rates between socioeconomic groups result from the accumulated educational disadvantages faced by some young people, rather than being an issue around aspirations in the late-teenage years (e.g. Crawford 2014; Whitty, Hayton, and Tang 2015). As such, it is not only morally important that outreach should address itself more directly to these structural inequalities, but doing so also provides a useful distinction between effectiveness conceived as challenging disadvantage and success in recruitment for a specific university; we suggest that evaluation also needs to recognise this distinction. Inequalities in attainment are clearly key, but other areas that have been somewhat neglected include challenging negative expectations from adults surrounding young people, broadening career horizons and providing high-quality advice and guidance (Harrison and Waller, forthcoming).

Conclusion
This article does not seek to provide a toolkit for evaluation, but rather to identify challenges to be mitigated and principles that are likely to underpin effective evaluation practice. We do not claim that the 'small steps' approach we advocate provides a full solution to the vexed issues outlined in the first half of the article. However, we do feel that it provides a sounder basis than the existing and emerging orthodoxies with their focus on excessive timeframes or certifying 'the best' interventions. Perhaps the most important element of this is the focus on the social and individual theory of change embodied within an activity. In particular, our approach respects the role of practitioners as reflective professionals who, with help from evaluators, can refine their theories of change and the resulting practices. We have contextualised our small steps approach within a period of policy that is marked by what we have typified as a 'confusion of successes'. On the one hand, the recent White Paper (BIS 2016) commits to doubling participation rates for disadvantaged young people; on the other, the main policy levers used on individual universities instil a competitive market where targets can just as easily be met by targeted recruitment activities as by those designed to challenge educational inequalities. We believe this risks distorting evaluation activity to focus on attempts to demonstrate value-for-money against simplistic and inward-looking institutional recruitment outcomes (Harrison and Waller, forthcoming). Our final point, then, is to encourage practitioners to refocus their evaluative efforts back on effectiveness in addressing structural educational inequalities, in particular through improving young people's attainment, broadening the educational and occupational opportunities available to them and offering guidance to help them realise their ambitions.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was informed by the AIMS project which was supported by the Society for Research into Higher Education.

Notes on contributors
Dr Neil Harrison is a senior lecturer in the Department of Education and Childhood at the University of the West of England. His involvement in widening participation dates to the mid-1990s, first as a practitioner and then as a researcher. His co-edited book, Access to Higher Education: Theoretical Perspectives and Contemporary Challenges, was published by Routledge in November 2016.
Dr Richard Waller is an associate professor in the Department of Education and Childhood at the University of the West of England. He previously worked in widening participation, with a particular focus on mature access learners. His co-authored book, Higher Education, Social Class and Social Mobility: The Degree Generation, was published by Palgrave Macmillan in November 2016.