Question:
I have an interesting statistical problem concerning a circumplex model
in multidimensional scaling of 55 alcoholism symptoms (binary) in a very
large data set (42,000 cases, most of which have no alcoholism
symptoms). Based on the Jaccard S3 similarity matrix (percent having
other symptom when one symptom is present), I get an almost perfect
circumplex (donut or circle) in the 3-D solution, which is most evident
when plotting only the first two dimensions (X and Y, but not Z). The
question is whether this circumplex is fact or statistical artifact, and
if fact, how to interpret it. The scree plot of Stress by
Dimensionality has a sharp elbow at 2-D.
Answer:
There's surely a risk that the Jaccard similarity coefficient, which I too
would have chosen, will discount differences in the prevalence of the symptoms.
You are only measuring the differences in kind and that is what a circumplex
shows.
Perhaps another coefficient might be different.
You could try one which effectively would have some combination of
indication: not just "if symptom a is present how often is symptom b
present", but also some idea of how usual it is that cases have neither!
Aren't there some weighted ones?
Dice, RR, SS2, K1, K2 and Ochiai, plus some of the distance coefficients
that don't include the "neither have got it" cell of the table, would be
sensible candidates for the "try another coefficient" exercise, and would
give a wider choice to try to include more information.
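
To make the point about the "neither" cell concrete, here is a minimal sketch in Python. It is not taken from any particular package: the function names, the toy data and the choice of coefficients are my own assumptions for illustration. In the usual 2x2 table for two symptoms, a counts cases with both, b and c count cases with only one, and d counts cases with neither. I have added simple matching, which is not in the list above, only because it is the plainest coefficient that uses d.

```python
from math import sqrt

def two_by_two(x, y):
    """Count the cells of the 2x2 table for two binary symptom vectors."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi and yi)          # both present
    b = sum(1 for xi, yi in pairs if xi and not yi)      # only first present
    c = sum(1 for xi, yi in pairs if not xi and yi)      # only second present
    d = sum(1 for xi, yi in pairs if not xi and not yi)  # neither present
    return a, b, c, d

def coefficients(x, y):
    """A few common binary similarity coefficients (hypothetical helper).

    Assumes each symptom occurs at least once, so no denominator is zero.
    """
    a, b, c, d = two_by_two(x, y)
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),               # ignores d
        "dice": 2 * a / (2 * a + b + c),          # ignores d
        "ochiai": a / sqrt((a + b) * (a + c)),    # ignores d
        "russell_rao": a / n,                     # d in denominator only
        "simple_matching": (a + d) / n,           # d counts as agreement
    }

# Toy data: six cases, two symptoms.
x = [1, 1, 1, 0, 0, 0]
y = [1, 1, 0, 1, 0, 0]

# Same data plus 94 cases with neither symptom, mimicking a sample
# in which most people have no symptoms at all.
x_padded = x + [0] * 94
y_padded = y + [0] * 94

small = coefficients(x, y)
large = coefficients(x_padded, y_padded)
for name in small:
    print(f"{name:16s} {small[name]:.3f} -> {large[name]:.3f}")
```

In this toy case Jaccard, Dice and Ochiai do not change when the symptom-free cases are added, while Russell and Rao drops sharply and simple matching rises towards 1. Whether either behaviour is desirable with 42,000 mostly symptom-free cases is a judgment call, but it shows what "including the neither cell" does to the similarities that go into the scaling.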