Accuracy (Validity)
Validity is the characteristic that determines whether a questionnaire measures what it is actually supposed to measure1,5,13. Validity refers to the suitability, significance and usefulness of an instrument for a specific objective and is generally seen as the most important consideration when evaluating an instrument. It does not refer to any inherent characteristic of the instrument: an instrument is never simply "valid" or "not valid"14. Validity is also particularly important with regard to the language, culture and clinical situations for which an instrument was developed; an instrument validated in a specific language or population may not be valid in other populations or settings1,5,9,14.
Validity itself is also, and more correctly, called accuracy. Although there are several forms (or designations) of validation, those most commonly used to test psychometric properties are construct validity, convergent validity1,5,9,14 and discriminant validity1,5,6,9,12,14. Note that a mixture of these terms can be found in the literature, so attention must be paid to the definitions and statistical methods used.
Construct validity (the designation 'content validity' can also be found in the literature) assesses whether the content of an instrument is appropriate for its intended use. It involves a critical evaluation of the design and development of the instrument, testing its scope, relevance and comprehensibility among experts, such as specialists in Dermatology and Allergology, and among patients5,9,15,16. The items must adequately represent the entire construct being measured, and the questions must be clear and free of redundant items. For example, generic instruments in dermatology generally have lower content validity than dermatology-specific instruments, since the former contain items that are not specific to dermatological patients9. Construct validity is usually determined through expert advice or statistical analysis, such as factor analysis and principal component analysis. These methods are applied to a group of variables (such as the items on a multi-item scale) to determine whether the variables span a single dimension or more than one dimension5,14. Subsequently, a pre-test is necessary, that is, the questionnaire is applied to patients with the characteristics to be studied15. A small sample, of around 30 participants, is sufficient to identify any item that is less accessible to individuals in the population and to adapt it subsequently15.
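As a minimal illustration of the dimensionality check described above, the following Python sketch applies principal component analysis to simulated item responses; the sample size, number of items and data are hypothetical stand-ins for the responses collected during a pre-test.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical pre-test data: 30 participants answer 10 items that are all
# driven by a single underlying trait plus noise (values are arbitrary).
latent = rng.normal(0.0, 1.0, size=30)
items = latent[:, None] + rng.normal(0.0, 0.5, size=(30, 10))

pca = PCA()
pca.fit(items)

# If the first component explains most of the variance, the items are likely
# to span a single dimension (one underlying construct).
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"Component {i}: {ratio:.1%} of variance explained")
```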
Once the questionnaire items have been validated by experts, it is important to correlate the individual results of the questionnaire under validation with a gold standard definition independent of the questionnaire15, that is, to assess its association with other scales. This step is called convergent validity15. To compare the association between two (or more) questionnaires, a Pearson or Spearman correlation may be computed. A coefficient greater than 0.80 represents an excellent correlation, a coefficient between 0.40 and 0.70 a good correlation, and a coefficient below 0.40 a weak correlation9,17,18. In addition, to measure the convergent validity of the questionnaire, the Kappa coefficient of agreement between the questionnaire being validated and the standard can be used, as well as a comparison of means and the Receiver Operating Characteristic (ROC) curve over the score of the questionnaire under study15. The Kappa coefficient is intended to answer two questions: "How much better is the agreement between the two questionnaires than what would be expected by chance alone?" and "What is the maximum improvement in agreement that the two questionnaires could achieve relative to the agreement expected by mere chance?". Effectively, the maximum that can be expected is 100% (or 1)17. Thus, Kappa quantifies the extent to which the observed agreement between the two questionnaires exceeds the agreement expected by chance17. According to Landis and Koch (1977), a coefficient between 0.01 and 0.20 indicates weak agreement; between 0.21 and 0.40, reasonable; between 0.41 and 0.60, moderate; between 0.61 and 0.80, substantial; and between 0.81 and 1.00, almost perfect19,20.
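The statistics mentioned for convergent validity can be computed along the following lines; the scores, the cut-off and the reference diagnosis below are hypothetical placeholders for the questionnaire under validation and an established gold-standard instrument.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(1)
ref_score = rng.normal(50, 10, size=60)              # hypothetical gold-standard scores
new_score = ref_score + rng.normal(0, 5, size=60)    # correlated by construction
diagnosis = (ref_score > 50).astype(int)             # hypothetical reference diagnosis

# Pearson (linear) and Spearman (rank) correlations between the two scales
r_pearson, p_pearson = pearsonr(new_score, ref_score)
r_spearman, p_spearman = spearmanr(new_score, ref_score)
print(f"Pearson r = {r_pearson:.2f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {r_spearman:.2f} (p = {p_spearman:.3f})")

# Cohen's kappa requires categorical ratings, e.g. both scores dichotomised
# at a clinically meaningful cut-off (here an arbitrary cut-off of 50).
kappa = cohen_kappa_score(new_score > 50, ref_score > 50)
print(f"Cohen's kappa = {kappa:.2f}")

# Area under the ROC curve of the new score against the reference diagnosis
auc = roc_auc_score(diagnosis, new_score)
print(f"ROC AUC = {auc:.2f}")
```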
Discriminant validity indicates whether the questionnaire is actually measuring what it is supposed to measure15; it determines whether the instrument is able to discriminate between different groups of individuals5, for example, subjects with a clinical diagnosis of AD versus subjects without the disease. To assess discriminant validity, the Mann-Whitney test or Student's t-test may be used to compare the two populations15,21.
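A sketch of this group comparison, again using hypothetical scores for an AD group and a control group:

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

rng = np.random.default_rng(2)
scores_ad = rng.normal(60, 12, size=40)       # hypothetical scores, AD group
scores_control = rng.normal(40, 12, size=40)  # hypothetical scores, controls

# Mann-Whitney U test (non-parametric) and Student's t-test (parametric);
# a significant difference supports the scale's ability to discriminate.
u_stat, u_p = mannwhitneyu(scores_ad, scores_control, alternative="two-sided")
t_stat, t_p = ttest_ind(scores_ad, scores_control)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Student's t = {t_stat:.2f}, p = {t_p:.4f}")
```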
At this point, different scales may arise with the same objectives. The importance of all types of validity lies in addressing whether the items on a scale adequately cover what the scale was designed to assess (are all the players' positions occupied and in place?), as well as the suitability of the items selected to assess the construct of interest (i.e., how talented are the players in each position?)14. In other words, both "itch" and "pruritus" describe symptoms of AD; however, one of them may be judged more appropriate by a specialist to assess the disease. Therefore, a scale designed to assess pruritus should show a strong association with another scale with the same objective (pruritus and itching), such as the 5-D itch scale or the Dynamic Pruritus Score (DPS).