Determining Laboratory Reference Intervals: CLSI Guideline Makes the Task Manageable

Recently, a publication in the American Heart Journal 1 stated that the reference interval for creatine kinase (CK), a test commonly performed to monitor statin therapy, could be off by as much as a factor of 3. As a result, many patients are advised, incorrectly, to discontinue their medications, causing their cholesterol levels to return to their original abnormal levels and putting them at increased risk for coronary heart disease.

To be sure that your laboratory’s reference intervals for all tests, including CK, are appropriate for your patient population, refer to the Clinical and Laboratory Standards Institute’s (CLSI) newly revised document Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory; Approved Guideline—Third Edition (C28-A3), published in November 2008. The document provides the laboratory with guidance to define criteria for selecting a healthy reference population, determine how many subjects are needed, identify outliers, and perform the calculations necessary to generate a valid reference interval.

The 1,500 individuals tested in the CK reference interval study noted above were men and women of different ages and races, all of whom met multiple criteria, including not having exercised for three days and not taking any medications (specifically, statins). “For most laboratories, including mine, testing to this extent isn’t practical,” said Gary L. Horowitz, MD, a member of the CLSI subcommittee that completed the revision resulting in C28-A3.

The CK study, which followed the reference interval verification protocol exactly as defined in C28-A2, determined that the reference intervals recommended by the manufacturer in question were three to five times too low. For example, 60% of healthy black males would have been deemed abnormal if the package insert data provided by this manufacturer were used. “My laboratory did not know, but could have known based upon a study, that the package insert was incorrect,” Dr. Horowitz said.

Time for an Update

The previous edition, C28-A2, published in 2000, focused its reference interval calculations on laboratories that could collect samples from 120 reference individuals. “It allowed only one criterion, i.e., Dixon’s test, to determine if there were outliers in the data set,” said Amadeo J. Pesce, PhD, another member of the subcommittee. “This put limitations on the ability of laboratories to set reference intervals, as many laboratories are unable to collect the minimum number of samples.”
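For readers unfamiliar with Dixon’s test, the idea can be sketched in a few lines. This is a minimal illustration, assuming the form of the test usually quoted for reference interval work: divide the gap D between an extreme value and its nearest neighbor by the range R of all values, and treat the extreme value as an outlier when D/R is at least one-third. The function name and sample values are hypothetical, and the one-third cutoff should be checked against the guideline itself.

```python
def dixon_outliers(values, threshold=1/3):
    """Flag the lowest and/or highest result as a suspected outlier.

    For each extreme, D is the gap to its nearest neighbor and R is the
    range of all results; the extreme is flagged when D / R >= threshold.
    The one-third default is an assumption based on the commonly quoted rule.
    """
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least three results")
    r = data[-1] - data[0]  # range of the whole data set
    if r == 0:
        return []  # all results identical: nothing can stand out
    flagged = []
    if (data[1] - data[0]) / r >= threshold:
        flagged.append(data[0])  # lowest result is far from the rest
    if (data[-1] - data[-2]) / r >= threshold:
        flagged.append(data[-1])  # highest result is far from the rest
    return flagged


if __name__ == "__main__":
    # Hypothetical results with one obviously high value.
    print(dixon_outliers([4.1, 4.3, 4.4, 4.6, 4.7, 4.8, 5.0, 9.9]))
```

Because the cutoff is relative to the total range, a single wild value is caught easily, but two similar outliers at the same end can mask each other, which is one reason a single criterion was seen as limiting.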

“Most laboratories, including my own, use reference ranges published in manufacturers’ package inserts, even though they are supposed to establish their own,” Dr. Horowitz explained. As laboratories put new test systems into use, they are required by accreditation and certification bodies to verify or establish performance specifications, including reference ranges. A new protocol outlined in C28-A3 allows a laboratory to validate a reference interval with a smaller number of samples. Dr. Horowitz said, “It is now emphasized that, if a laboratory considers only 20 normal individuals, it can determine if reference intervals established elsewhere are valid for its method and population.”

When evaluating 20 samples, if no more than two results fall outside the proposed reference interval, it is statistically valid for the laboratory to adopt the proposed reference interval as its own. If three or four results fall outside the interval, another 20 samples must be collected. Dr. Horowitz believes strongly that “every laboratory, no matter how small, is capable of collecting 20 samples from reference (‘normal’) individuals.”
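The 20-sample rule is simple enough to express as a short decision function. The sketch below encodes only what is described here (two or fewer outside: accept; three or four: collect 20 more). The article does not say what to do when five or more fall outside, so the sketch flags that case instead of guessing. It also shows why the rule is reasonable, assuming the proposed interval covers the central 95% of healthy people (the usual convention, though not stated here): the binomial probability that a correct interval passes is about 92%. All function names are hypothetical.

```python
from math import comb


def count_outside(results, lower, upper):
    """Number of results below the lower limit or above the upper limit."""
    return sum(1 for x in results if x < lower or x > upper)


def verify_transferred_interval(results, lower, upper):
    """Apply the 20-sample verification rule summarized in this article."""
    if len(results) != 20:
        raise ValueError("this sketch expects exactly 20 reference results")
    n_out = count_outside(results, lower, upper)
    if n_out <= 2:
        return "accept"
    if n_out <= 4:
        return "collect another 20"
    # Not described in the article; consult the guideline for this case.
    return "not covered by this summary; consult C28-A3"


def prob_at_most(k, n=20, p_outside=0.05):
    """Binomial P(X <= k) for X ~ Binomial(n, p_outside).

    With p_outside = 0.05 (a correct central 95% interval, an assumption),
    this is the chance that a valid interval passes the rule.
    """
    return sum(comb(n, i) * p_outside**i * (1 - p_outside) ** (n - i)
               for i in range(k + 1))


if __name__ == "__main__":
    print(round(prob_at_most(2), 4))
```

In other words, a laboratory applying the rule to a truly appropriate interval will wrongly reject it only about 8% of the time on the first set of 20, which is why a second set of 20 is collected before drawing conclusions.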

A More Efficient, Effective Laboratory

By employing the strategies discussed in C28-A3, the laboratory can have more confidence in the reference intervals it reports with the test results. First, although 120 samples remains the recommended standard, reasonable estimates of reference intervals can be made with fewer samples. Second, the laboratory can adopt reference intervals transferred from elsewhere once they have been verified by the procedures described in the document.

In some respects, C28-A3 can be described as more practical than its predecessor. Although it strongly endorses 120 as the recommended number of reference samples, it recognizes that this is a high standard. Thus, it discusses in detail techniques to use fewer samples, both to establish and to verify reference intervals. “Although the latter (verifying reference intervals) was included in the previous version of the document, many readers may not have appreciated its simplicity, which is emphasized more in the new version,” Dr. Horowitz said.

Paul S. Horn, PhD, advisor to the subcommittee, added, “Anyone who changes instruments or testing methods must determine reference intervals. They must understand the basic principles of how these values are derived. Reading this document and following the guideline will help you derive appropriate reference intervals.”

Major Changes Made

Dr. Horowitz noted the revised guideline 2 introduces several concepts:

Decision limits. For a number of laboratory tests, accuracy is more important than a traditional reference interval. For example, hemoglobin A1c is a commonly ordered test for diabetic patients. National guidelines define a good value for someone whose diabetes is under control, so it is unnecessary to establish a reference interval. Other examples of tests that have defined decision limits are cholesterol, glucose, and neonatal bilirubin.
Multicentered trials. On occasion, data from multiple laboratories that follow strict quality assurance guidelines can be pooled. For example, if six laboratories participated, each would need to collect only 20 samples to reach the recommended 120, significantly reducing the burden on each laboratory.
New statistical methods, such as the “robust method.” “This method will enable you to generate reference intervals with a smaller number of individuals,” Dr. Horowitz said. This may be particularly helpful with specialized populations, such as pediatric and geriatric patients, where collecting 120 samples is much more challenging.
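To make the multicenter arithmetic concrete, the sketch below pools 20 results from each of six sites and then estimates a central 95% interval from the 120 pooled values. It assumes the rank-based nonparametric approach often used for reference intervals (lower and upper limits taken at ranks 0.025(n + 1) and 0.975(n + 1)), which this article does not itself describe, and it simply assumes, without checking, that the sites’ results are comparable. Site names and values are hypothetical, and this is not the “robust method” mentioned above.

```python
from itertools import chain


def pool_results(per_lab):
    """Combine reference results from several sites into one list.

    Assumes (does not check) that every site followed the same
    quality-assurance protocol, so the results are comparable.
    """
    return list(chain.from_iterable(per_lab.values()))


def nonparametric_interval(values, lower_pct=0.025, upper_pct=0.975):
    """Rank-based central 95% reference interval.

    The limits are the observations at ranks lower_pct*(n+1) and
    upper_pct*(n+1), rounded half up to the nearest integer.
    """
    data = sorted(values)
    n = len(data)
    if n < 39:
        raise ValueError("need at least 39 results for these percentile ranks")
    lo_rank = int(lower_pct * (n + 1) + 0.5)
    hi_rank = int(upper_pct * (n + 1) + 0.5)
    # Ranks are 1-based; Python lists are 0-based.
    return data[lo_rank - 1], data[hi_rank - 1]


if __name__ == "__main__":
    # Synthetic data: six hypothetical sites, 20 results each (values 1..120).
    sites = {f"site_{i}": [float(v) for v in range(i * 20 + 1, i * 20 + 21)]
             for i in range(6)}
    pooled = pool_results(sites)
    print(len(pooled), nonparametric_interval(pooled))  # 120 (3.0, 118.0)
```

With n = 120 the ranks work out to 3 and 118, so each limit rests on a handful of the most extreme results; no single site’s 20 samples could support that estimate on its own, which is the point of pooling.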

Importance of Participation

According to Dr. Horowitz, “Clinicians compare the values laboratory professionals report with the given reference intervals. Whether a test is normal or abnormal is perhaps the most important element of a laboratory test. But laboratories typically spend relatively little time making sure reference intervals given by manufacturers apply to their populations. Given the importance of reference intervals, I wanted to understand them better myself and help to make the process more understandable and more practical.”

Dr. Horowitz continued, “When I first got involved in creating the document, I found it somewhat intimidating. But, once I got to the sections with specific examples, the concepts were much easier to understand. My laboratory uses the techniques in the document all the time. I am now on a personal mission to get laboratory professionals to understand that reference interval studies are not only important, but also reasonably straightforward to do.”

Dr. Horn also noted that the guideline provides improved estimates of reference intervals for physicians. “It gives us more confidence in the information we provide to practicing physicians,” he said.

Improving Health Care

“Ultimately, better reference intervals should result in fewer false-positive and false-negative results,” Dr. Horn said. “This means fewer wrong or missed diagnoses. As a result, the entire medical care system will become more efficient.”

“If people reading this article would make the effort to collect samples from just 20 reference individuals,” Dr. Horowitz claimed, “I’ll guarantee that most of them will discover their CK reference intervals are too narrow.” He went on to say that “with those same 20 samples, the laboratory should be able to verify many of the other reference intervals it is using.”

For guidance on establishing or validating reference intervals, Dr. Horowitz recommends that a laboratory obtain a copy of C28-A3, which was developed by internationally recognized experts and scientists from regulatory bodies, diagnostic laboratories, and the in vitro diagnostic industry.

The Experts

Paul S. Horn, PhD, is a professor in the Department of Mathematical Sciences at the University of Cincinnati, Cincinnati, OH. He is also a visiting professor of neurology at Cincinnati Children’s Hospital Medical Center and a statistician with the Psychiatry Service at the Veterans Affairs Medical Center of Cincinnati.

Gary L. Horowitz, MD, is Director of Clinical Chemistry at Beth Israel Deaconess Medical Center and Associate Professor of Pathology at Harvard Medical School, Boston, MA.

Amadeo J. Pesce, PhD, is Professor Emeritus of the Department of Pathology and Laboratory Medicine, University of Cincinnati, Cincinnati, OH. He is also an adjunct professor with the Department of Pathology and Laboratory Medicine at UCSD School of Medicine, La Jolla, CA.