Precision (Reliability)
In most articles, precision is treated as synonymous with reliability.
Precision is the degree to which a test result is free from random
measurement error 14. Therefore, the better the precision of an
instrument, the more similar the results it produces when used
repeatedly under the same conditions 9,15. Two types of precision are
considered crucial, particularly for QoL instruments: test-retest and
internal consistency (reliability) 5,9,15.
When assessing QoL, it is important to remember that many factors
besides the patient’s experience can influence the responses. Such
factors may include the assessment setting (a laboratory or a clinic),
the person who administers the instrument (an unknown researcher, the
patient’s own doctor or even a family member), other subjective
experiences and feelings at the time (feeling more or less fatigued,
tired or bored), motivational factors (a desire to appear stronger) or a
history of prior learning (for example, previous experience of reporting
higher or lower levels of itching). The variability in the score (the
“variance”) that is associated with all these factors, rather than with
a specific dimension, is considered error variance 14.
Internal consistency (the reliability itself) reflects the extent to
which all items in a questionnaire address the same theoretical
construct, that is, the characteristic, attitude or quality that the
instrument is intended to measure 9,22. A questionnaire is considered
internally consistent when there is a high intercorrelation between the
item scores. This intercorrelation is usually expressed by Cronbach’s α
coefficient 5,9,14. The coefficient typically ranges from 0.0 to 1.0 and
represents how well a set of items measures the same dimension or
construct 14. If the items on a scale that are supposed to measure the
same topic are unreliable, they will show weak associations among
themselves and the coefficient will be low. In contrast, if the items in
an instrument address the same objective, Cronbach’s α will be high
14,23. The closer its value is to 1, the more internally consistent the
scale 5,18,23. A coefficient <0.70 suggests that the items within a
given domain evaluate different constructs 5. In practice, it is very
unlikely that the items in a questionnaire will yield exactly the same
results, which would correspond to 100% consistency; nevertheless, a
value close to this is desirable 9,15. If the questionnaire under study
is formed by different dimensions, Cronbach’s α can be calculated for
each dimension and for the questionnaire overall.
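As an illustration of how Cronbach’s α can be obtained per dimension and
overall, the following is a minimal sketch in Python using the standard
formula based on item variances and the variance of the total score. The
response matrix, the dimension names ("symptoms", "emotions") and the
function name cronbach_alpha are hypothetical and serve only as an
example; they do not come from any specific instrument.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    # Sample variance of each item across respondents.
    item_variances = scores.var(axis=0, ddof=1)
    # Sample variance of each respondent's total (summed) score.
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)


# Hypothetical data: 6 respondents (rows), 4 items (columns) scored 0-3.
responses = np.array([
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [0, 1, 0, 0],
    [1, 1, 2, 1],
    [3, 2, 3, 3],
    [0, 0, 1, 0],
])

# Hypothetical mapping of dimension names to item (column) indices.
dimensions = {"symptoms": [0, 1], "emotions": [2, 3]}

alpha_overall = cronbach_alpha(responses)
alpha_by_dimension = {
    name: cronbach_alpha(responses[:, cols]) for name, cols in dimensions.items()
}

print(f"overall: {alpha_overall:.2f}")
for name, value in alpha_by_dimension.items():
    print(f"{name}: {value:.2f}")
```

With real data, each resulting value would then be compared with the
0.70 threshold mentioned above. Note that with poorly related items the
formula can return values below 0, which is usually read as a sign that
the items do not belong on the same scale.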
Test-retest is the method used to determine whether an instrument
produces stable scores over time 5,9,14,15. To assess test-retest, the
instrument under study must be administered on two separate occasions,
with an interval short enough to assume that the underlying condition is
unlikely to have changed, but long enough for patients not to remember
their previous responses 5–7. However, the use of test-retest stability
as an estimate of reliability also assumes that the construct being
evaluated is stable over time. This may hold for some disease
characteristics, such as pruritus, but not for others, such as pain,
which can be at level 8 one day and at level 4 the next. The test-retest
of each dimension, or of the overall instrument, can be evaluated by
calculating the Intraclass Correlation Coefficient (ICC) between the
scores from the first and second participations 15,21. The ICC measures
the strength of the relationship between the two sets of scores and
expresses the proportion of the between-subject variance in relation to
the total variance 24. The ICC ranges from 0 to 1 5,9,15,18,22. The
closer the coefficient is to 1, the greater the reliability of the
instrument 5.
Preferably, the coefficient should be above 0.80. Nonetheless, a
correlation coefficient above 0.70 is considered adequate 5,6. The kappa
coefficient of agreement may also be used for test-retest; in that case,
instead of comparing the results from two different questionnaires, it
compares the results from two different participations in the same
questionnaire 15,21.
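The two test-retest statistics described above can be sketched as
follows. Several forms of the ICC exist and the text does not specify
one, so this example assumes the one-way random-effects, single-measure
form, which follows directly from the ratio of between-subject variance
to total variance. The kappa shown is the unweighted Cohen’s kappa for a
categorical item. All scores and the function names icc_oneway and
cohen_kappa are hypothetical.

```python
import numpy as np


def icc_oneway(scores):
    """One-way random-effects, single-measure ICC (an assumed form).

    scores: rows = patients, columns = administrations (test, retest).
    """
    scores = np.asarray(scores, dtype=float)
    n_subjects, n_occasions = scores.shape
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)
    # Mean square between subjects.
    ms_between = (
        n_occasions * ((subject_means - grand_mean) ** 2).sum() / (n_subjects - 1)
    )
    # Mean square within subjects (disagreement between occasions).
    ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (
        n_subjects * (n_occasions - 1)
    )
    return (ms_between - ms_within) / (ms_between + (n_occasions - 1) * ms_within)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa between two sets of categorical answers.

    Undefined (division by zero) if chance agreement equals 1.
    """
    first = np.asarray(first)
    second = np.asarray(second)
    categories = np.union1d(first, second)
    observed = np.mean(first == second)
    # Agreement expected by chance from the marginal proportions.
    expected = sum(np.mean(first == c) * np.mean(second == c) for c in categories)
    return (observed - expected) / (1 - expected)


# Hypothetical total scores of 6 patients at test (column 0) and retest (column 1).
test_retest_scores = np.array([[8, 7], [5, 5], [3, 4], [7, 7], [2, 2], [6, 5]])
icc = icc_oneway(test_retest_scores)

# Hypothetical answers of the same patients to one categorical item.
item_first = ["yes", "no", "no", "yes", "no", "yes"]
item_second = ["yes", "no", "yes", "yes", "no", "yes"]
kappa = cohen_kappa(item_first, item_second)

print(f"ICC: {icc:.2f}, adequate (> 0.70): {icc > 0.70}")
print(f"kappa: {kappa:.2f}")
```

In small samples this ICC estimate can fall slightly below 0; such
values are generally interpreted as an absence of reliability.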