A survey of validity and utility of electronic patient records in a general practice
BMJ 2001; 322 doi: https://doi.org/10.1136/bmj.322.7299.1401 (Published 09 June 2001). Cite this as: BMJ 2001;322:1401
All rapid responses
In response to Dr Phil Hughes:
We chose five years of retrospective entries for Read codes because we had been paperless for six years at the time of the study, so the practice EPR system should have been valid for the whole of this period. We searched for Read coded entries for the test conditions during the study period and compared these with specific (drug) treatments, diagnostic tests, and procedure codes to validate the entry for each test condition. We included conditions with a matching entry made during the study period; conditions diagnosed more than five years earlier were therefore included only where an entry about that condition was made during the five year study period. The same criteria applied to the drug searches.
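For readers who wish to reproduce this kind of cross-check, the following minimal Python sketch illustrates the idea described above. The patient identifiers, set contents, and record total are invented for illustration; this is not the software used in the study.

```python
# Hypothetical illustration of cross-validating Read coded diagnoses against
# independent markers (drugs, tests, procedures). All values are invented.

# Patients with a Read code for the test condition in the 5 year window
read_coded = {"p01", "p02", "p03", "p04"}
# Patients with corroborating evidence (specific drugs, tests, or procedures)
corroborated = {"p01", "p02", "p03", "p05"}
total_records = 9000  # all active records in the EPR system (illustrative)

tp = len(read_coded & corroborated)   # coded and corroborated
fp = len(read_coded - corroborated)   # coded but not corroborated
fn = len(corroborated - read_coded)   # corroborated but never coded
tn = total_records - tp - fp - fn     # neither coded nor corroborated

print(f"2x2 table: TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```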
We think the contingency tables for breast and prostate cancer are correct, because entries about these two conditions could still be made in the wrong records (male or female), so validation must include all active records within the EPR system. We were also aware that 1% of breast cancers occur in men.
As a training practice ourselves, we are also interested in the
application of our study to training practice reapproval. It may well be
that some form of validation or audit of "paperless" training practice EPR
systems will become a necessary part of that process. However, at this
stage, a review of processes rather than outcomes might seem a reasonable
basis for training practice reapproval.
Competing interests: No competing interests
Hassey et al. have produced a thought-provoking paper. I have
attempted to repeat their work in a practice that has changed to EPR in
the last few years. A number of questions have arisen.
I note that they have used Read coded entries over the previous five years (or one year for asthma and ischaemic heart disease). I wonder about the rationale behind this: was it a pragmatic decision based on when they had gone paperless, or is there another reason? I presume they are searching for a consultation or coded entry with the Read code rather than the date of onset or diagnosis, as these may have been many years before. This may apply particularly to breast cancer and hyperthyroidism, where a patient may have had the disorder diagnosed, for example, 10 years ago and now be on no treatment.
I am also interested in the date range they applied to the drug searches. Are these current, recent (ie in the last six months or so), or ever prescribed? Again, this will affect the yields of the searches and the subsequent data.
In addition, I suspect that the 2x2 contingency tables are incorrect for breast cancer and prostate cancer. As these conditions are sex specific, should the population base not be female for breast cancer and male for prostate cancer? Otherwise the number of true negatives will be double what it should be (false true negatives?!).
One final thought is that this paper is of great relevance to training practices. Normally EPRs are compared with paper records to assess the quality and accuracy of the summaries; however, increasingly it is the paper records that are inaccurate. Might there come a time when practices would be expected to present these data, along with their protocols for note summarisation and data capture, as part of the validation procedure?
Phil Hughes
Competing interests: No competing interests
We are grateful to Newcombe, Altman and Bryant (1) for identifying an error in our calculation of confidence intervals in the EPR-Val toolkit (2). This error was corrected on 19 July 2001, and we recommend that users upgrade from EPR-Val to EPR-Val2. The upgrade is available from the BMJ website (3).
References
(1) http://bmj.com/cgi/eletters/322/7299/1401#EL5
(2) Hassey A, Gerrett D, Wilson A. A survey of validity and utility of electronic patient records in a general practice. BMJ 2001;322:1401-5.
(3) http://bmj.com/cgi/content/full/322/7299/1401/DC1
Alan Hassey, David Gerrett, Ali Wilson
Competing interests: No competing interests
Following my earlier response above to Newcombe et al, I am pleased to report that a new version of the EPR-Val calculator (EPR-Val2) has been tested and submitted to the BMJ for publication on their website in due course. EPR-Val2 now uses Wilson's method for the calculation of 95% confidence intervals for the sensitivity, specificity, PPV, and NPV.
Those users who prefer a professional/commercial statistical package should note that the latest version of StatsDirect(TM) now includes routines for the calculation of confidence intervals for proportions (eg sensitivity). Its method of calculation differs from Wilson's method (advocated by Newcombe et al) and returns slightly different confidence interval values from EPR-Val2. Both methods have a sound statistical basis.
Competing interests: No competing interests
We agree with Bayliss-Brown that sensitivity and positive predictive value are useful measures of the validity of electronic patient record systems. Indeed, we give several references in our paper (1) to other studies that have used these methods, and we believe that we have given due consideration to the existing literature.
We did consider the use of Cohen's kappa (K) (2), originally proposed as a measure of agreement between two assessors allocating to nominal level categories. However, we were concerned by its reliance on symmetric marginal distributions and by the major difficulty of interpreting the statistic. This was recognised by Cohen in the qualifying statistic Kmax, calculated by multiplying the marginal values of each column and row and dividing by the total number of observations. This value is the maximum that K could achieve in the given circumstances. Thus 1-Kmax is the proportion of possibilities, excluding chance, that cannot be achieved as a consequence of differing marginals. Indeed, Collis interprets 1-Kmax as indicating the extent to which judges are using different criteria to make their judgements (3). Our problem was that there are no acceptable standards for balancing and interpreting K, Kmax, and 1-Kmax. This is to say nothing of weighted kappa (4), partial kappa (2), or the proportion of agreement. Further difficulties have been noted in the literature, which increased our reluctance to use the statistic (5)(6).
It might have been possible to use a multiple category, multiple assessor extension of kappa, as described by Light (7). Unfortunately, the authors know of no probability density distribution calculation for this statistic.
Overall, we felt that use of kappa would have led us into interpretation difficulties at a time when our goal was to provide a simple, easily usable tool.
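For illustration only, here is a short Python sketch of kappa and Kmax using the standard textbook definitions for a 2x2 table. This is our reading of the standard formulas, not the calculation used in EPR-Val, and the cell counts are invented.

```python
# Cohen's kappa and its ceiling Kmax for a 2x2 agreement table, using the
# standard definitions. Counts are invented for illustration.
def kappa_and_kmax(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    row_pos, row_neg = tp + fp, fn + tn
    col_pos, col_neg = tp + fn, fp + tn
    p_e = (row_pos * col_pos + row_neg * col_neg) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Maximum achievable agreement given the fixed marginal totals
    p_max = (min(row_pos, col_pos) + min(row_neg, col_neg)) / n
    k_max = (p_max - p_e) / (1 - p_e)
    return kappa, k_max

k, k_max = kappa_and_kmax(tp=90, fp=5, fn=10, tn=895)
print(f"kappa={k:.3f}, Kmax={k_max:.3f}, 1-Kmax={1 - k_max:.3f}")
```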
1. Hassey A, Gerrett D, Wilson A. A survey of validity and utility of electronic patient records in a general practice. BMJ 2001;322:1401-5.
2. Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 1960;20(1):37-45.
3. Collis GM. Kappa, measures of marginal symmetry and intraclass correlations. Educational and Psychological Measurement 1985;45:55-61.
4. Fleiss JL, Cohen J, Everitt BS. Large sample standard errors of kappa and weighted kappa. Psychological Bulletin 1969;72(5):323-7.
5. Maclure M, Willett WC. Misinterpretation and misuse of the kappa statistic. American Journal of Epidemiology 1987;126(2):161-9.
6. Brennan RL, Prediger DJ. Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement 1981;41:687-99.
7. Light RJ. Measures of response agreement for qualitative data. Psychological Bulletin 1971;76(5):365-77.
Competing interests: No competing interests
We are grateful to Newcombe, Altman and Bryant for showing an error in our calculation of confidence intervals in the EPR-Val toolkit. Initial analysis of all our data was performed using the SPSS(TM) and StatsDirect(TM) commercial software packages. The EPR-Val toolkit was developed and tested against StatsDirect(TM) output for the practice data. We then added the confidence interval calculations and the TP:FN ratio and DBFind(10,000) derived statistics; these were not externally tested against a commercial package. We believe that all other statistics are correctly calculated in EPR-Val, and we see no need for the package to be withdrawn, though we will correct the CI calculations and undertake to do this as soon as possible.
The authors (1) believe that there is no single best measure of validity for electronic patient records (EPRs). The EPR-Val toolkit provides a range of statistics calculated from the 2x2 contingency tables so that users may describe exactly what statistics they have used to establish EPR validity. We agree that some of the measures displayed may seem misleading, particularly "accuracy" (the proportion of all tests that are correct), because EPR validity has previously been calculated using sensitivity and PPV as measures of "completeness" and "accuracy" respectively (2)(3)(4). We recommend that in future studies those measuring EPR validity should say exactly what they mean by validity and state what measures they have calculated from their data. We have provided the EPR-Val toolkit to facilitate this process.
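The standard 2x2 measures named above are straightforward to compute. A minimal Python sketch follows; the cell counts are illustrative values chosen to be roughly consistent with the diabetes figures quoted in this correspondence, not the study's actual table.

```python
# Standard measures derived from a 2x2 contingency table. Counts are
# illustrative, approximating the diabetes figures discussed here.
def two_by_two_stats(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # "completeness": coded among true cases
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),          # "correctness": true among coded cases
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # dominated by TN here
    }

for name, value in two_by_two_stats(tp=289, fp=2, fn=5, tn=13297).items():
    print(f"{name}: {100 * value:.2f}%")
```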
Finally, we make no claims that measures of EPR validity reflect the true prevalence of any diagnostic condition in the community. Nor do these results reflect the effectiveness of our clinical management for these conditions. Our survey was designed to measure only the validity of the data we hold in the clinical records. The derived statistics TP:FN ratio and DBFind(10,000) are included to help health workers understand how many true cases of the test condition (eg diabetes) remain undiagnosed in the database and to help quantify the benefits of validating a clinical database for those conditions. Time will tell whether future workers will find these measures useful.
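The exact definitions of these two derived statistics are given in the EPR-Val toolkit itself; the sketch below is one plausible reading of the description above and should be treated as an assumption, not the toolkit's code: the TP:FN ratio as true positives per false negative, and DBFind(10,000) as uncoded true cases expected per 10,000 database records.

```python
# Hedged sketch of the derived statistics; the formulas are our assumption
# from the prose above, not taken from the EPR-Val source.
def tp_fn_ratio(tp, fn):
    return tp / fn if fn else float("inf")

def db_find_per_10000(fn, total_records):
    return 10000 * fn / total_records

tp, fp, fn, tn = 289, 2, 5, 13297  # illustrative counts, as before
total = tp + fp + fn + tn
print(f"TP:FN ratio = {tp_fn_ratio(tp, fn):.1f}")
print(f"DBFind(10,000) = {db_find_per_10000(fn, total):.1f} per 10,000 records")
```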
References
1. Hassey A, Gerrett D, Wilson A. A survey of validity and utility of electronic patient records in a general practice. BMJ 2001;322:1401-5.
2. Whitelaw FG, Nevin SL, Milne RM, Taylor RJ, Taylor MW, Watt AH. Completeness and accuracy of morbidity and repeat prescribing records held on general practice computers in Scotland. Br J Gen Pract 1996;46:181-6.
3. Whitelaw F, Nevin S, Taylor R, Watt A. Morbidity and prescribing patterns for the middle-aged population of Scotland. Br J Gen Pract 1996;46:707-14.
4. Mant J, Mant F, Winner S. How good is routine information? Validation of coding for acute stroke in Oxford hospitals. Health Trends 1997/98;29(4):96-9.
Competing interests: No competing interests
Hassey, Gerrett and Wilson (1) indicate the need to validate electronic patient records in primary care. While findings are appropriately expressed as percentages in this article, their EPR-Val toolkit yields incorrect confidence intervals. For the diabetes data, the calculated 95% confidence intervals are incorrect on two counts. Incorrect use of the table total as the denominator in calculating standard errors results in intervals that are too narrow, indeed grossly so for sensitivity and positive predictive value. Furthermore, the traditional method is inferior, especially for proportions near 100%. Our table shows their results, recalculated using the traditional and the preferred Wilson methods (2,3):
Statistic                    Estimate (%)   95% confidence interval (%)
                                            EPR-Val          Correctly calculated   Wilson method
                                                             traditional method
Sensitivity                  98.3           98.1 to 98.5     96.8 to 99.8           96.0 to 99.3
Specificity                  100            100.0 to 100.0   100.0 to 100.0         99.9 to 100
Positive predictive value    99.3           99.3 to 99.4     98.4 to 100.3          97.5 to 99.8
Negative predictive value    100            99.9 to 100      99.9 to 100            99.9 to 100
Even with large samples the traditional method can give impossible
values exceeding 100%, as for the positive predictive value here. The
preferable Wilson method is available in Confidence Interval Analysis
software (4) and for Excel (5). We are disturbed by the dissemination of
the inadequately tested EPR-Val software, which should be withdrawn
immediately from the BMJ website. Potential users should check new
software using data with known answers, as errors are quite common. (6)
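To see the difference concretely, the following Python sketch compares the two methods. It is an independent illustration, not the CIA or Excel implementations cited above, and the counts 289/294 and 289/291 are illustrative guesses chosen to be close to the diabetes figures in our table, not values taken from the paper. Note that the Wald upper limit for the PPV-like proportion exceeds 100%, while Wilson's does not.

```python
from math import sqrt

def wald_ci(x, n, z=1.96):
    """Traditional interval: p +/- z*sqrt(p(1-p)/n); can exceed 100%."""
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def wilson_ci(x, n, z=1.96):
    """Wilson score interval; always lies within 0% to 100%."""
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

for label, x, n in (("sensitivity-like", 289, 294), ("PPV-like", 289, 291)):
    lo, hi = wald_ci(x, n)
    wlo, whi = wilson_ci(x, n)
    print(f"{label} {x}/{n}: Wald {100*lo:.1f} to {100*hi:.1f}%, "
          f"Wilson {100*wlo:.1f} to {100*whi:.1f}%")
```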
Furthermore, some of the measures displayed are redundant while
others, especially accuracy, are potentially misleading. The quoted
accuracy of 99.9% conceals the fact that about 1 in 60 diagnosed diabetics
is not coded as such on the database.
There is a danger in using terms such as sensitivity, specificity and
predictive value, familiar from the clinical or screening context, in data
validation. In the clinical or screening situation the implicit "gold standard" is whether the individual really has the disease; in the data validation context, these quantities measure how two parts of the record agree.
Clearly some of the 13302 patients whose records do not indicate
"diabetes" would have diagnosable disease, if sought using systematic
diagnostic criteria. We are concerned lest clinicians and managers
naively believe such figures indicate the practice has successfully
identified all prevalent diabetics and is managing them proactively.
The study usefully showed that many diagnosed cases of asthma, iron
deficiency anaemia, hypothyroidism and IHD are not adequately identifiable
within present standards of record keeping. It is helpful to demonstrate
such deficiencies, complete the audit cycle and correct them. But the
converse is false: high sensitivity and specificity do not imply all is
well. Certainly high "accuracy" does not. Even with improved consistency
of record keeping for asthma etc., there could still be many practice
patients with unidentified disease, just as for diabetes.
1 Hassey A, Gerrett D, Wilson A. A survey of validity and utility of
electronic patient records in a general practice. BMJ 2001; 322: 1401-5.
2 Wilson EB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 1927;22:209-12.
3 Newcombe RG, Altman DG. Proportions and their differences. In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with confidence. 2nd ed. London: BMJ Books, 2000:45-56.
4 Bryant TN. Computer software for calculating confidence intervals (CIA). In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with confidence. 2nd ed. London: BMJ Books, 2000:208-13.
5
http://www.uwcm.ac.uk/epidemiology_statistics/research/statistics/newcom...
6 Bland JM, Altman DG. Misleading statistics: errors in textbooks,
software and manuals. Int J Epidemiol 1988;17:245-7.
Dr. Robert G. Newcombe
Senior Lecturer in Medical Statistics
University of Wales College of Medicine,
Heath Park,
Cardiff CF14 4XN.
E-mail Newcombe@cf.ac.uk
Prof. Douglas G. Altman
Professor of Statistics in Medicine
ICRF Medical Statistics Group,
Centre for Statistics in Medicine,
Institute of Health Sciences,
Old Road, Headington,
Oxford OX3 7LF.
E-mail d.altman@icrf.icnet.uk
Dr. Trevor N. Bryant
Medical Computing
University of Southampton,
MailPoint 820,
Southampton General Hospital,
Southampton SO16 6YD
E-mail T.N.Bryant@soton.ac.uk
Competing interests: No competing interests
EDITOR – Hassey and colleagues have rightly highlighted the importance of ensuring that electronic records are accurate. (1) The study explored a method of measuring the validity and utility of electronic records in general practice, including whether the coding of 15 marker diagnoses was a true reflection of the actual prevalence. However, they are wrong in their assertion that no published accounts of measuring the validity of electronic record contents exist. Hogan performed a literature review and compared 20 articles that met certain quality criteria. (2) He recommended, as used in Hassey's paper, that measures of completeness (sensitivity or detection rate) and correctness (positive predictive value) were valuable. These measures have also been shown to be valuable in measuring the quality of data retrieval. (3) Other measures derived from 2×2 contingency tables are less likely to be helpful because of the combination of a large total number of records and true negatives. To compensate for this, Hassey proposes two new descriptive statistics. Previous reports have, however, used Cohen's kappa (4), a measure of the strength of agreement between the observed retrieval and the gold standard against that which might be expected by chance. Cohen's kappa has the advantage of being a well validated single index and has been shown to be a useful index for measuring data retrieval from electronic records, where performances of >0.9 can be achieved. (3) When Cohen's kappa is applied to Hassey's data it highlights similar priority areas of data concern where the value is <0.9 (obesity = 0.04, hypothyroidism = 0.89, iron deficiency anaemia = 0.86, asthma = 0.86).
Prescriptions generated were also compared with those dispensed by a local pharmacy. As they were computer generated, it is unsurprising that 99.7% were reported to be valid; however, of the 10 handwritten prescriptions only 80% were accurately recorded. Perhaps a more suitable design would have been to check, in a sample, how many of the prescriptions reflected the correct dose and frequency.
Hassey claims that the principal innovation of the study was the use of Read codes as the test for the true presence of a diagnosis, despite Gray's earlier account of identifying patients with ischaemic heart disease using a similar technique and reporting exactly the same sensitivity (96%). (5) The approach used by Hassey of triangulating disease codes with treatments and other findings has merit, but due consideration should have been given to the existing literature.
1 Hassey A, Gerrett D, Wilson A. A survey of validity and utility of electronic patient records in a general practice. BMJ 2001;322:1401-5. (9 June.)
2 Hogan WR, Wagner MM. Accuracy of data in computer-based patient records. JAMIA 1997;4:342-55.
3 Brown PJB, Sönksen P. Evaluation of the quality of information retrieval of clinical findings from a computerised patient database using a semantic terminological model. JAMIA 2000;7:401-12.
4 Brown PJB, Sönksen P, Price C, Young P. A standard for evaluating the retrieval performance of clinical terminologies. In: Lorenzi N, ed. Proceedings of the 1999 AMIA Fall Symposium. Philadelphia: Hanley & Belfus, 1999:1031.
5 Gray J, Majeed A, Kerry S, Rowlands G. Identifying patients with ischaemic heart disease in general practice: cross sectional study of paper and computerised medical records. BMJ 2000;321:548-50. (2 September.)
Competing interests: No competing interests
Validity of electronic patient records depends on the quality of the
electronic system as well as the user.
EDITOR - Electronic patient record (EPR) systems in general practice were, until recently, used largely as data repositories. The situation is now changing rapidly with the introduction of clinical governance. As there is a need to derive information from the data in electronic records, the validity of those data becomes important. The survey by Hassey et al (1) is therefore timely, and much more work is needed in this area. It is of interest to note that two men in their survey were recorded as having had cervical smears. This may appear trivial, but it reflects a design fault of the EPR. Recording information on paper is a simple process with few steps, all of which are visible. The same process on a computer involves many more steps, some of which are invisible. There is a potential for error at each step, hence an increased chance of error in the EPR. The data recorded can be invalid simply because of a wrong diagnosis or the accidental selection of an incorrect Read code from a pick list. Both of these are possible in the paper system too; I have seen "anxiety t.d.s" written in notes as a prescription. More worryingly, we have seen a cytotoxic drug incorrectly recorded for a child because of a mapping error in the system, and an invisible record that could be seen only at the back end of the database.
An EPR cannot simply be compared with the paper record, as some features are unique to it. The automated search facility and the ability to include an audio or video clip as part of the record are not available in the paper record. Different standards are also expected of an EPR, such as error trapping at the user interface level and design of the interface to minimise error (2). Which actions are allocated to the user and which to the computer is a system design issue. Building systems from tested and 'certified' components and using object oriented software technology for development may address these issues.
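As a simple illustration of the front-end error trapping suggested here, a rule table could reject sex-specific codes filed in an incompatible record, which would have caught the men recorded as having had cervical smears. The Python sketch below is illustrative only; the code values are invented, not taken from any real Read code release.

```python
# Hypothetical sex-specific code rules; values invented for illustration.
SEX_SPECIFIC_CODES = {
    "685..": "F",  # hypothetical cervical smear code: female records only
    "43F9.": "M",  # hypothetical PSA test code: male records only
}

def validate_entry(read_code: str, patient_sex: str) -> bool:
    """Return True if the code may be filed in this patient's record."""
    required = SEX_SPECIFIC_CODES.get(read_code)
    return required is None or required == patient_sex

assert validate_entry("685..", "F")
assert not validate_entry("685..", "M")  # the "men with cervical smears" case
```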
Bernard Fernando
General practitioner
Thames Avenue Surgery, Rainham, Kent ME8 9BW
(1) Hassey A, Gerrett D, Wilson A. A survey of validity and utility of electronic patient records in a general practice. BMJ 2001;322:1401-5. (9 June.)
(2) Wyatt J. Same information, different decisions: format counts. BMJ
1999; 318:1501-2.
Competing interests: No competing interests
Guidance for GPs
The latest guidance on how electronic records should be used in GP surgeries has been published by the DoH and the General Practice Committee. This is not simply a voluntary activity but one about which all GPs are obliged to inform themselves and their service users. Those who have access to the net can find it on the NHSIA Stakeholder Bulletin site. Point 3, 'Information Governance', is particularly useful, but all of it is accessible. Even this minimal information in the surgery would be useful, and it would take only a few minutes to download for those who have no access to computers.
As for the tagging of children with their NHS number from birth, parents need to be aware that mistakes and slip-ups are already being identified and advice is being given to NHS staff to double check; parents may wish to treble check.
Competing interests: None declared