Accuracy (Validity)
Validity is the characteristic that determines whether a questionnaire
measures what it is actually supposed to measure1,5,13. Validity refers to the suitability,
significance and usefulness of an instrument for a specific objective
and is generally regarded as the most important consideration when
evaluating an instrument. It does not refer to any inherent
characteristic of the instrument; an instrument is never simply "valid" or "not valid" in itself14. Validity is also particularly important with regard to
the language, culture and clinical setting for which an instrument
was developed: an instrument validated in a specific language or
population may not be valid in other populations or settings1,5,9,14.
Validity itself is also, and perhaps more precisely, called accuracy.
Although there are several forms (or designations) of validity, those
most commonly used in psychometric testing are construct validity,
convergent validity1,5,9,14 and discriminant validity1,5,6,9,12,14. Note that these terms are used
inconsistently in the literature, so attention must be paid to the
definitions and the statistical methods employed.
Construct validity (the designation "content validity" can also be found
in the literature) assesses whether the content of an instrument is
appropriate for its intended use. It involves a critical evaluation of
the design and development of the instrument, testing its scope,
relevance and comprehensibility among experts, such as
specialists in Dermatology and Allergology, and among patients5,9,15,16. The items must adequately represent the
entire construct being measured, and the questions must be clear and free of
redundancy. For example, generic instruments used in dermatology
generally have lower content validity than dermatology-specific
instruments, since the former contain items that are not specific to
dermatological patients9. Construct validity is
usually established through expert judgement or statistical analysis, such
as factor analysis and principal component analysis. These methods are
applied to a group of variables (such as the items of a multi-item scale)
to determine whether the variables span a single dimension or more than
one dimension5,14. Subsequently, a pre-test is required, that is, the
instrument is applied to patients with the characteristics to
be studied15. A small sample, of roughly 30
participants, is essential to identify any item that is less accessible to
individuals in the population and to adapt it subsequently15.
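To make the dimensionality analysis concrete, the sketch below (in Python, using hypothetical placeholder data; the array items and its values are assumptions for illustration, not responses from any published instrument) applies principal component analysis to a matrix of item responses and reports the variance explained by each component, where a single dominant component suggests that the items span one dimension.

# Minimal sketch of a dimensionality check with principal component analysis.
# The matrix `items` is hypothetical: one row per respondent, one column per item.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
items = rng.integers(0, 5, size=(120, 10)).astype(float)  # placeholder item scores

pca = PCA()
pca.fit(items)

# Proportion of variance explained by each component; one dominant component
# suggests that the items measure a single underlying dimension.
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"Component {i}: {ratio:.2%} of the variance")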
Once the questionnaire items have been validated by experts,
it is important to correlate the individual scores of the questionnaire
being validated with a gold standard that is independent of that
questionnaire15, that is, to establish its
association with other scales. This step is called convergent validity15. To compare the association between the two (or
more) questionnaires, a Pearson or Spearman correlation may be computed.
A coefficient greater than 0.80 represents an excellent correlation, a
coefficient between 0.40 and 0.70 a good correlation, and a coefficient
below 0.40 a weak correlation9,17,18. In addition, to measure the
convergent validity of the questionnaire, the Kappa coefficient of
agreement between the questionnaire being validated and the standard can
be used, as well as a comparison of means and the Receiver Operating
Characteristic (ROC) curve over the score of the questionnaire under study15. The Kappa coefficient is intended to answer two
questions: "how much better is the agreement between the two questionnaires
than would be expected by chance alone?" and "what is the maximum
improvement in agreement that the two instruments could achieve relative
to the agreement expected by chance?".
In effect, the maximum that can be expected is 100% (or 1)17. Thus, Kappa quantifies the extent to which the
agreement observed between the two questionnaires exceeds the agreement
that would be expected by chance17. According to Landis and Koch (1977), a
coefficient between 0.01 and 0.20 indicates slight agreement; between
0.21 and 0.40, fair agreement; between 0.41 and 0.60, moderate agreement;
between 0.61 and 0.80, substantial agreement; and between 0.81 and 1.00,
almost perfect agreement19,20.
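As a concrete illustration of these agreement statistics, the sketch below (in Python, using hypothetical dichotomised classifications and raw scores that are assumptions for illustration only) computes the Kappa coefficient between a new questionnaire and a gold standard, interprets it following Landis and Koch, and derives the area under the ROC curve for the raw score of the questionnaire.

# Minimal sketch of convergent validity statistics; all data are hypothetical.
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

gold_standard = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])  # dichotomised gold standard
new_positive  = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])  # dichotomised new questionnaire
new_score     = np.array([8, 2, 7, 4, 1, 6, 9, 3, 7, 8])  # raw score of the new questionnaire

# Cohen's Kappa: agreement between the two classifications beyond chance.
kappa = cohen_kappa_score(gold_standard, new_positive)

def landis_koch(k):
    # Verbal interpretation of Kappa following Landis and Koch (1977).
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

# Area under the ROC curve of the raw score against the gold standard.
auc = roc_auc_score(gold_standard, new_score)

print(f"Kappa = {kappa:.2f} ({landis_koch(kappa)} agreement), ROC AUC = {auc:.2f}")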
Discriminant validity indicates whether the questionnaire is
actually measuring what it is supposed to measure15; it
determines whether the instrument is able to discriminate between
different groups of individuals5, for example,
subjects with a clinical diagnosis of AD versus subjects without the disease.
To assess discriminant validity, the Mann-Whitney test or Student's
t-test may be used to compare the two populations15,21.
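A minimal sketch of such a group comparison is given below (in Python, with hypothetical questionnaire scores for an AD group and a control group; the values are assumptions for illustration only).

# Minimal sketch of a discriminant validity check; the scores are hypothetical.
from scipy.stats import mannwhitneyu, ttest_ind

scores_ad      = [14, 18, 11, 20, 16, 13, 17, 19]  # questionnaire scores, AD group
scores_control = [5, 8, 4, 9, 6, 7, 3, 10]         # questionnaire scores, control group

# Non-parametric comparison of the two groups.
u_stat, p_mw = mannwhitneyu(scores_ad, scores_control, alternative="two-sided")
# Parametric comparison (Welch's variant, not assuming equal variances).
t_stat, p_t = ttest_ind(scores_ad, scores_control, equal_var=False)

# A small p-value suggests that the instrument separates the two groups.
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p_mw:.4f}")
print(f"t-test:       t = {t_stat:.2f}, p = {p_t:.4f}")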
At this point, different scales arise, albeit with the same
objectives. The importance of all types of validity is to address the
question of whether the items on a scale adequately cover what the scale
was designed to assess (are all the players' positions occupied and in
place?), as well as the suitability of the items selected to
assess the construct of interest (i.e., how talented are the players in
each position?)14. In other words, either of the
words "itch" or "pruritus" describes a symptom of AD,
yet one of them may be judged by a specialist as more appropriate for
assessing the disease. Therefore, a scale designed to assess pruritus should show a
strong association with another scale with the same objective (pruritus
and itching), such as the 5-D itch scale and the Dynamic Pruritus Score
(DPS).
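By way of illustration, the sketch below (in Python, with hypothetical 5-D itch and DPS values that are assumptions, not real patient data) computes the Spearman correlation between two pruritus scales and grades its strength using the thresholds cited above.

# Minimal sketch of a convergent validity check between two pruritus scales;
# the 5-D itch and DPS values are hypothetical placeholders.
from scipy.stats import spearmanr

five_d_itch = [8, 12, 15, 10, 20, 18, 9, 14]    # hypothetical 5-D itch scale totals
dps         = [30, 45, 60, 35, 80, 70, 32, 55]  # hypothetical Dynamic Pruritus Score values

rho, p_value = spearmanr(five_d_itch, dps)

# Strength graded with the cited thresholds (>0.80 excellent, 0.40-0.70 good,
# <0.40 weak); values between 0.70 and 0.80 are treated as good here.
if rho > 0.80:
    strength = "excellent"
elif rho >= 0.40:
    strength = "good"
else:
    strength = "weak"

print(f"Spearman rho = {rho:.2f} ({strength} correlation), p = {p_value:.4f}")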