Better and better. The data for their second study depended on Kiesler's 2010 study. They used Kite & Deaux's (1986) HAS test. Unfortunately, test-retest reliability was determined using Pearson's r. Although this is done fairly frequently in the social sciences, the problems with treating Likert-scale variables as raw numbers were the reason for the creation of a whole new system of techniques within logic, statistics, computer science, probability, mathematics, etc. (fuzzy set techniques based on the development of fuzzy logic by Zadeh). Even before that, however, there's a reason we have different correlation coefficients for ordinal data (and a Likert scale is ordinal data). Pearson's r assumes a bivariate normal distribution of two continuous variables. However, most discrete sets are appropriate as well (SAT scores, IQ tests, etc., all have sufficient variability and range to approximate continuous data). Likert-scale data do not approximate continuous sets, and they are frequently not normally distributed.
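To make the Pearson-versus-ordinal point concrete, here is a minimal sketch in plain Python. It computes Pearson's r and Spearman's rho (Pearson's r applied to average ranks, one of the standard coefficients for ordinal data) on two sets of 5-point responses. The response values and the names `pearson_r`, `average_ranks`, `spearman_rho`, `test_scores`, and `retest_scores` are all made up for illustration; none of this is the study's data or the authors' procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (sqrt(ss_x) * sqrt(ss_y))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical 5-point Likert responses at two time points (invented data).
test_scores = [1, 1, 2, 2, 3, 4, 5, 5, 5, 5]
retest_scores = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]

print("Pearson r:   ", round(pearson_r(test_scores, retest_scores), 3))
print("Spearman rho:", round(spearman_rho(test_scores, retest_scores), 3))
```

The two coefficients can land close together on a given sample. The point is that rho only uses the ordering of the responses, whereas r treats the gap between "1" and "2" as the same size as the gap between "4" and "5", which is exactly the assumption an ordinal scale does not license.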
The authors then used an intelligence test which they claim is no longer "experimental." However, the correlations they point to in order to demonstrate this (the correlations of their test with SAT scores and with grades in various classes) were often fairly low. The most recent validity study they cite showed a decent correlation with Math SAT scores but no correlation with Verbal scores. But they used it anyway. Fine. There are always issues with measures. However, their entire subject pool consisted of college students. Getting GPA and SAT data would not have been difficult, and would have provided additional measurements of abstract reasoning. After all, these were the measures used to "demonstrate" the validity of the test they did use.
Also, although they obtained additional data about their subjects (race, income bracket, and gender), they only factored gender into their analysis. And once again, they used a correlation matrix of r scores when dealing with Likert data.
The study the OP refers to then massaged these data into a path analysis.