The use of Cronbach’s Alpha…

Taber, K. S. (2017). The Use of Cronbach’s Alpha When Developing and Reporting Research Instruments in Science Education. Research in Science Education, 1-24. doi:10.1007/s11165-016-9602-2

This article was published Open Access in Research in Science Education, so can be freely downloaded by anyone. Like most of the top science education journals, Research in Science Education charges a subscription or download fee to readers, but does not charge authors for publication. However, I found that my University had some existing agreement with the publisher (Springer) that allowed me to publish this article Open Access without paying a fee.

The abstract reads:

“Cronbach’s alpha is a statistic commonly quoted by authors to demonstrate that tests and scales that have been constructed or adopted for research projects are fit for purpose. Cronbach’s alpha is regularly adopted in studies in science education: it was referred to in 69 different papers published in 4 leading science education journals in a single year (2015)—usually as a measure of reliability. This article explores how this statistic is used in reporting science education research and what it represents. Authors often cite alpha values with little commentary to explain why they feel this statistic is relevant and seldom interpret the result for readers beyond citing an arbitrary threshold for an acceptable value. Those authors who do offer readers qualitative descriptors interpreting alpha values adopt a diverse and seemingly arbitrary terminology. More seriously, illustrative examples from the science education literature demonstrate that alpha may be acceptable even when there are recognised problems with the scales concerned. Alpha is also sometimes inappropriately used to claim an instrument is unidimensional. It is argued that a high value of alpha offers limited evidence of the reliability of a research instrument, and that indeed a very high value may actually be undesirable when developing a test of scientific knowledge or understanding. Guidance is offered to authors reporting, and readers evaluating, studies that present Cronbach’s alpha statistic as evidence of instrument quality.”

Why write a paper about a statistic?

Most of my research uses qualitative methods such as interviewing, and I do not see myself as an expert in quantitative methods. So writing about a statistic seems an odd choice (and perhaps even a very dry choice of topic). Indeed, I did not initially know much about Cronbach’s alpha, except I had noticed it was often used in studies.

My motivation to write about Cronbach’s alpha did not derive from my own research programme, or even from my teaching about research methodology. Rather, it came from work as a reviewer and editor. When asked to review studies submitted for publication, I found that Cronbach’s alpha was being used, with the values reported as justification for the quality of the instrument. These papers seldom offered an argument beyond – to paraphrase – “alpha was >0.70 so the instrument is reliable”. So to do my work as a reviewer I found I needed to go and read up on alpha, and indeed find out what Cronbach thought alpha was about.

So what does alpha measure?

Two of the key things I learned were that alpha measured internal consistency, rather than reliability as it is normally understood (seeing if one gets the same result when repeating a measurement), and so it was only meaningful when applied to an instrument with a range of items intended to measure a single construct. So where an instrument has several scales, it only makes sense to use alpha separately for the individual scales, not for all the items pooled together.

Another key point seemed to be that there was no strong logic for using 0.70 as a cut-off. The higher the value of alpha, the more the different items are eliciting the same responses from people completing the scale. A low value means that the items are not getting at the same thing. A very high value means people are responding to the different items in pretty much the same way (different people are responding differently, but each person is responding to the scale items in a consistent way) – and perhaps suggests there is too much redundancy, and some of the items could be dropped without undermining the instrument. (If one had a perfect item that elicited exactly the construct being tested from everyone, then there would be no need to use more than one item as a scale. Of course, that does not tend to be the case – but having more items than are needed {as Cronbach recognised} does not add anything other than requiring more time and effort in completing, and analysing, an instrument.)
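To make the idea of internal consistency concrete, here is a minimal sketch of how alpha can be computed from raw responses. It uses the usual textbook form of the statistic (the number of items, the sum of the item variances, and the variance of respondents' total scores); the function name `cronbach_alpha` and the tiny data sets are purely illustrative and are not taken from the article.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one scale on one administration.

    `scores` is a list of respondents; each respondent is a list of
    k item scores (same k for everyone). Illustrative helper only.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Different people respond differently, but each person answers all
# three items identically: alpha comes out at (very close to) 1.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]

# Two items whose responses are unrelated across respondents:
# alpha comes out at (very close to) 0.
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

The first data set is the 'too much redundancy' case: every item carries the same information. The value describes these particular responses, not the instrument in general.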

An important point is that alpha is very sensitive to the number of items in a scale. A high value of alpha for a scale with three items suggests these items are well aligned, but the same value for a scale with 25 items would be much less persuasive. It is fairly easy to increase the value of alpha by adding more items – as long as those items are perceived very similarly to some of those already in the scale.
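The dependence on scale length can be sketched with the standardized form of alpha, which expresses the statistic in terms of the number of items and the mean inter-item correlation. This is a simplification (it assumes standardized items), and the function names below are hypothetical, chosen only for this illustration.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def required_mean_r(k, target_alpha):
    """Mean inter-item correlation needed for k items to reach target_alpha."""
    return target_alpha / (k - target_alpha * (k - 1))


# Same modest inter-item correlation, different scale lengths.
for k in (3, 10, 25):
    print(k, round(standardized_alpha(k, 0.2), 3))

# How well aligned must the items be to reach alpha = 0.7?
for k in (3, 10, 25):
    print(k, round(required_mean_r(k, 0.7), 3))
```

Under these assumptions, a mean inter-item correlation of 0.2 gives an alpha of about 0.43 for three items but about 0.86 for 25 items. Reaching 0.7 needs a mean correlation of about 0.44 with three items, yet only about 0.09 with 25.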

To offer a facetious hypothetical example: imagine a scale of ten items where alpha was found on one administration (the value only applies to a specific administration to a particular sample, not to the scale itself) to be 0.69, and that this was considered just below the required level of ‘reliability’. If one of the items read “I very much enjoy using the Bunsen burner”, then adding another item that reads “I enjoy using the Bunsen burner very much” would almost certainly have led to the value of alpha instead being >0.7 and the scale being judged ‘reliable’. The statistic is very vulnerable to this kind of manipulation.
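The Bunsen burner example can be checked with a small idealised calculation. The sketch below assumes ten standardized items that all correlate equally with one another, with that correlation chosen so that alpha is 0.69. It then adds an eleventh item treated as a perfect duplicate of the first (correlation 1 with it, and the same correlation as the original with everything else). Real reworded items would not correlate perfectly, so this is an upper-bound illustration; all names are hypothetical.

```python
def alpha_from_matrix(cov):
    """Cronbach's alpha from an item covariance (or correlation) matrix."""
    k = len(cov)
    trace = sum(cov[i][i] for i in range(k))    # sum of item variances
    total = sum(sum(row) for row in cov)        # variance of the total score
    return (k / (k - 1)) * (1 - trace / total)


def equicorrelated(k, r):
    """k standardized items, every pair correlated at r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


def add_duplicate_of_first_item(corr):
    """Append an item that is an exact copy of item 0."""
    new = [row[:] + [row[0]] for row in corr]   # new column copies column 0
    new.append(corr[0][:] + [1.0])              # new row copies row 0
    return new


k, target = 10, 0.69
# Inter-item correlation that gives alpha = 0.69 for ten such items.
r = target / (k - target * (k - 1))

ten_items = equicorrelated(k, r)
before = alpha_from_matrix(ten_items)
after = alpha_from_matrix(add_duplicate_of_first_item(ten_items))

print(round(before, 3), round(after, 3))
```

Under these assumptions alpha moves from 0.69 to roughly 0.73, crossing the conventional 0.7 threshold, although the scale now measures nothing it did not measure before.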

Evidence of poor practice

I found that some of the papers I was asked to read as a reviewer, or sometimes those I saw as an editor, used alpha in ways that seemed inappropriate, and often without any apparent understanding of what alpha did and why it might be useful: simply calculating it, hoping it was more than 0.7, and reporting it in the paper seemed to be seen as important.

Authors were in effect saying “this is a good instrument because alpha >0.7” even when it was inappropriate to use alpha. Commonly alpha would be calculated across diverse scales in an instrument, or alpha would be calculated for a wide-ranging knowledge test where clearly the items (questions) were testing understanding of a range of concepts and principles.

So having done some reading to find out about alpha, it seemed that most of the examples I then met in science education papers used alpha without explaining what it was or why it was used, and often authors seemed to use alpha where it was not relevant or helpful. Moreover, this applied to examples I saw in published studies in good journals, as well as manuscripts submitted for review.

This motivated me to write an article explaining what alpha was, and why it was introduced, and when it should be used, and what we should read into the values calculated. I drew on examples from the published literature to explain appropriate use, and the limitations and flaws in many published studies.

Improvement of the article in peer review

The original manuscript submitted for publication drew upon specific published articles to make particular points about how alpha was being presented in science education studies. I felt the article worked well that way. However, one of the reviewers for Research in Science Education made the fair point that, in writing the paper, I was largely picking on published studies that showed flaws for my examples: this could be seen as relying on anecdotes rather than being systematic.

To address this I surveyed the four top science education journals over the most recent complete year of publication to identify every use of alpha. Whilst, as might be expected, this revealed some good practice, it also showed just how inconsistently authors described what alpha was testing, and how qualitative interpretations of the actual values calculated varied wildly. This lengthened and changed the feel of the paper (from originally being a kind of perspective article, to being arranged around a formal study offering more specific evidence of current practice). I was a little uneasy about this at the time, having felt the earlier version had more ‘punch’. However, on balance, I feel the reviewer offered good advice. I do think that responding to the recommendation strengthened the paper, so that it offered a more robust evidential basis for offering specific guidelines for others using Cronbach’s alpha in their studies.

Personal ignorance can motivate a move towards greater expertise

So that’s how a largely ‘qualitative researcher’ came to publish a paper about statistics. I did not initially understand the basis or purpose of Cronbach’s alpha, but the need to find out in order to effectively review manuscripts led me to realise that a lot of those researchers actually using Cronbach’s statistic do not seem to really understand it either.

Published by Prof. Keith S. Taber
