Precision (Reliability)
In most articles, precision is referred to as reliability. Precision is the degree to which a test result is free from random measurement error 14. Therefore, the more precise an instrument is, the more similar the results it produces when used repeatedly under the same conditions 9,15. Two types of precision are considered crucial, particularly for QoL instruments: test-retest reliability and internal consistency (reliability)5,9,15.
When assessing QoL, it is important to remember that, in addition to the patient’s experience, many factors can influence the patient’s responses. These may include the assessment setting (for example, a laboratory or a clinic), the person who administers the instrument (an unknown researcher, the patient’s own doctor or even a family member), other subjective experiences and feelings at the time (feeling more or less fatigued, tired or bored), motivational factors (the desire to appear stronger) or prior learning (for example, previous experience of reporting higher or lower levels of itching). The variability in scores (the “variance”) that is associated with all these factors, rather than with the dimension being measured, is considered error variance14.
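One common way to formalize this idea is classical test theory, which the section does not name explicitly. It treats each observed score X as a true score T plus a random error E, with E assumed uncorrelated with T. Reliability is then the share of observed-score variance that is not error variance:

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

The more the factors listed above inflate the error variance σ²_E, the lower this ratio.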
Internal consistency (reliability in the strict sense) concerns the characteristics, attitudes or qualities that the instrument is intended to measure, reflecting the extent to which all items in a questionnaire address the same theoretical construct9,22. A questionnaire is considered internally consistent when the scores of its items are highly intercorrelated. This intercorrelation is usually expressed by Cronbach’s α coefficient 5,9,14. The coefficient normally ranges from 0.0 to 1.0 and represents how well a set of items measures the same dimension or construct 14. If the items on a scale that are supposed to measure the same topic are unreliable, they will show weak associations among themselves and the coefficient will be low. Conversely, if the items of an instrument measure the same construct, Cronbach’s α will be high 14,23. The closer its value is to 1, the more internally consistent the scale5,18,23. A coefficient below 0.70 suggests that, within a given domain, the items evaluate different constructs 5. In practice, it is very difficult for the items of a questionnaire to yield exactly the same results, which would correspond to 100% consistency; nevertheless, values close to 1 are desirable 9,15. If the questionnaire under study comprises several dimensions, Cronbach’s α can be calculated for each dimension and for the scale overall.
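A minimal Python sketch of the calculation follows, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), where k is the number of items. The helpers `cronbach_alpha` and `alpha_by_dimension` are hypothetical names; the sketch assumes a complete respondents-by-items matrix (no missing answers) and sample variances, and it is an illustration rather than a validated psychometric tool.

```python
from statistics import variance
from typing import Mapping, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (hypothetical helper).

    scores[r][i] is respondent r's answer to item i; no missing answers assumed.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total (summed) scores
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def alpha_by_dimension(
    scores: Sequence[Sequence[float]], dimensions: Mapping[str, Sequence[int]]
) -> dict:
    """Alpha per dimension (item column indices) plus the whole scale."""
    result = {
        name: cronbach_alpha([[row[i] for i in items] for row in scores])
        for name, items in dimensions.items()
    }
    # Overall alpha uses every item of the questionnaire
    result["overall"] = cronbach_alpha(scores)
    return result
```

For example, `alpha_by_dimension(scores, {"symptoms": [0, 1, 2], "emotions": [3, 4]})` would return one α per (hypothetical) dimension plus an `"overall"` entry, each of which can be compared against the 0.70 threshold mentioned above.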
Test-retest reliability indicates whether an instrument produces stable scores over time 5,9,14,15. To assess it, the instrument under study must be administered on two separate occasions, with an interval short enough to assume that the underlying condition is unlikely to have changed, but long enough that patients do not remember their previous responses5–7. However, using test-retest stability as an estimate of reliability also assumes that the construct being evaluated is stable over time. This holds for some disease characteristics, such as pruritus, but not for others, such as pain, which may be rated 8 on one day and 4 on the next. Test-retest reliability can be evaluated for each dimension or for the overall score by calculating the intraclass correlation coefficient (ICC) between the scores of the first and second administrations 15,21. The ICC expresses the proportion of the total variance that is attributable to between-subject variance 24. It ranges from 0 to 1 5,9,15,18,22, and the closer it is to 1, the greater the reliability of the instrument 5. Preferably, it should be above 0.80, although a coefficient above 0.70 is considered adequate5,6. The kappa coefficient of agreement may also be used for test-retest with categorical responses; in this case, instead of comparing the results of two different questionnaires, it compares the results of the two administrations15,21.
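The sketch below computes both coefficients in Python. The section does not specify which ICC form is meant, so it assumes ICC(2,1) in Shrout and Fleiss’s notation (two-way random effects, absolute agreement, single measurement), estimated from a two-way ANOVA of subjects by administrations; other forms, such as consistency rather than agreement, give different values. Kappa is computed as Cohen’s kappa, that is, chance-corrected agreement between the two administrations of a categorical item. `icc_2_1` and `cohen_kappa` are hypothetical names, and the code assumes complete paired data.

```python
from collections import Counter
from typing import Hashable, Sequence


def icc_2_1(test: Sequence[float], retest: Sequence[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    test[i] and retest[i] are subject i's scores at the two administrations.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores for at least two subjects")
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    # Between-subjects sum of squares (differences among patients)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)
    # Between-administrations sum of squares (systematic shift from test to retest)
    ss_cols = n * sum((sum(col) / n - grand) ** 2 for col in (test, retest))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("no variance in the scores; ICC is undefined")
    return (ms_rows - ms_error) / denom


def cohen_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Cohen's kappa between two administrations of a categorical item."""
    if len(first) != len(second) or not first:
        raise ValueError("need paired, non-empty responses")
    n = len(first)
    # Observed proportion of identical answers
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from each administration's category frequencies
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[c] * c2[c] for c in c1) / n**2
    if expected == 1:
        raise ValueError("kappa undefined: both administrations use a single category")
    return (observed - expected) / (1 - expected)
```

Because the absolute-agreement form penalizes systematic shifts, a group whose retest scores are all exactly 5 points higher than at the first administration gets an ICC below 1 even though the ranking of patients is unchanged (for scores 10, 20, 30 versus 15, 25, 35 it is 8/9). The result can then be checked against the 0.70 (adequate) and 0.80 (preferable) thresholds.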