# Calculate corrected item-total correlations
calculate_corrected_item_total <- function(data) {
  n_items <- ncol(data)
  correlations <- numeric(n_items)
  for (i in seq_len(n_items)) {
    item_score <- data[, i]
    # Total score with the focal item removed
    total_minus_item <- rowSums(data[, -i, drop = FALSE])
    correlations[i] <- cor(item_score, total_minus_item, use = "complete.obs")
  }
  names(correlations) <- colnames(data)
  return(correlations)
}
# Example usage
# corrected_correlations <- calculate_corrected_item_total(scale_data)
Notes
Scale Item Correlation Equations
Item-Total Correlations
Corrected Item-Total Correlation
The corrected item-total correlation removes the focal item from the total score to avoid spurious inflation:
\[r_{i, T_{-i}} = \frac{\text{Cov}(X_i, T_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T_{-i})}}\]
Where:
- \(X_i\) = score on item \(i\)
- \(T_{-i}\) = total score excluding item \(i\), calculated as \(T_{-i} = \sum_{j \neq i} X_j\)
- \(\text{Cov}(X_i, T_{-i})\) = covariance between item \(i\) and the corrected total
- \(\text{Var}(X_i)\) and \(\text{Var}(T_{-i})\) = variances of item \(i\) and the corrected total
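As a quick check on the formula above, the following sketch applies it term by term to a small made-up data set and compares the result with R's built-in `cor()`. The object `toy_data` and its values are purely illustrative assumptions, not data from any real scale.

```r
# Hypothetical toy data: 6 respondents, 4 items (values invented for illustration)
toy_data <- data.frame(
  item1 = c(1, 2, 3, 4, 5, 3),
  item2 = c(2, 2, 3, 5, 4, 3),
  item3 = c(1, 3, 3, 4, 5, 2),
  item4 = c(2, 1, 4, 4, 5, 3)
)

i <- 1                                              # focal item (column index)
x_i <- toy_data[, i]                                # X_i
t_minus_i <- rowSums(toy_data[, -i, drop = FALSE])  # T_{-i}

# Formula applied term by term: Cov / sqrt(Var * Var)
r_manual <- cov(x_i, t_minus_i) / sqrt(var(x_i) * var(t_minus_i))

# Built-in Pearson correlation for comparison
r_builtin <- cor(x_i, t_minus_i)

# TRUE when the two agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```

Both `cov()` and `var()` use the same \(n - 1\) denominator, so it cancels in the ratio and the hand calculation matches `cor()`.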
Uncorrected Item-Total Correlation
The uncorrected item-total correlation includes the focal item in the total score:
\[r_{i,T} = \frac{\text{Cov}(X_i, T)}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T)}}\]
Where \(T = \sum_{j=1}^{k} X_j\) is the total score including all \(k\) items.
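The link between the two coefficients can be made explicit. The following is a standard algebraic identity, added here as a sketch rather than taken from the notes; \(\sigma_i\) and \(\sigma_T\) are shorthand introduced for this derivation, denoting the standard deviations of \(X_i\) and \(T\). Because \(T = T_{-i} + X_i\):

\[\text{Cov}(X_i, T_{-i}) = \text{Cov}(X_i, T) - \text{Var}(X_i)\]
\[\text{Var}(T_{-i}) = \text{Var}(T) + \text{Var}(X_i) - 2\,\text{Cov}(X_i, T)\]
\[r_{i, T_{-i}} = \frac{r_{i,T}\,\sigma_T - \sigma_i}{\sqrt{\sigma_T^2 + \sigma_i^2 - 2\,r_{i,T}\,\sigma_i\,\sigma_T}}\]

The \(\text{Var}(X_i)\) term in the first line is the item's covariance with itself. It is present in the uncorrected coefficient and absent from the corrected one, which is where the inflation comes from.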
# Calculate uncorrected item-total correlations
calculate_uncorrected_item_total <- function(data) {
  # Total score across all items, including the focal item
  total_score <- rowSums(data)
  correlations <- numeric(ncol(data))
  for (i in seq_len(ncol(data))) {
    correlations[i] <- cor(data[, i], total_score, use = "complete.obs")
  }
  names(correlations) <- colnames(data)
  return(correlations)
}
# Example usage
# uncorrected_correlations <- calculate_uncorrected_item_total(scale_data)
Sub-Domain Correlations
Inter-Item Correlations Within Sub-Domains
For correlations between items within the same sub-domain or factor:
\[r_{jk} = \frac{\text{Cov}(X_j, X_k)}{\sqrt{\text{Var}(X_j) \cdot \text{Var}(X_k)}}\]
Where \(X_j\) and \(X_k\) are items within the same sub-domain.
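A common way to summarise these pairwise values is the mean inter-item correlation for the sub-domain. The sketch below assumes a small invented data frame, `subdomain_data`, and uses only base R; the summary statistics chosen (mean and range) are one reasonable option, not a prescribed method.

```r
# Hypothetical sub-domain with three items (values invented for illustration)
subdomain_data <- data.frame(
  item1 = c(1, 2, 3, 4, 5, 3),
  item2 = c(2, 2, 3, 5, 4, 3),
  item3 = c(1, 3, 3, 4, 5, 2)
)

# Full symmetric correlation matrix with 1s on the diagonal
r_matrix <- cor(subdomain_data, use = "complete.obs")

# Each pair (j, k) appears twice in the matrix, so keep the lower triangle only;
# this also drops the diagonal
pairwise_r <- r_matrix[lower.tri(r_matrix)]

mean_inter_item_r <- mean(pairwise_r)
range_inter_item_r <- range(pairwise_r)
```

With \(k\) items there are \(k(k-1)/2\) unique pairs, so three items give three correlations.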
Corrected Item-Subdomain Correlation
For correlation between an item and its sub-domain total (excluding the item itself):
\[r_{i, S_{-i}} = \frac{\text{Cov}(X_i, S_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(S_{-i})}}\]
Where:
- \(S_{-i}\) = sub-domain total excluding item \(i\)
- \(S_{-i} = \sum_{j \in \text{subdomain}, j \neq i} X_j\)
# Calculate item-subdomain correlations
calculate_item_subdomain_correlations <- function(data, subdomain_items) {
  correlations <- list()
  for (subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    subdomain_correlations <- numeric(length(items))
    for (i in seq_along(items)) {
      item_score <- subdomain_data[, i]
      # Sub-domain total with the focal item removed
      subdomain_total_minus_item <- rowSums(subdomain_data[, -i, drop = FALSE])
      subdomain_correlations[i] <- cor(item_score, subdomain_total_minus_item,
                                       use = "complete.obs")
    }
    names(subdomain_correlations) <- items
    correlations[[subdomain_name]] <- subdomain_correlations
  }
  return(correlations)
}
# Example usage
# Define subdomain structure
# subdomain_items <- list(
#   "Factor1" = c("item1", "item2", "item3"),
#   "Factor2" = c("item4", "item5", "item6")
# )
#
# subdomain_correlations <- calculate_item_subdomain_correlations(scale_data, subdomain_items)
Inter-Item Correlation Matrix Within Sub-Domains
# Calculate correlation matrix for items within each subdomain
calculate_subdomain_correlation_matrix <- function(data, subdomain_items) {
  correlation_matrices <- list()
  for (subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    # Calculate correlation matrix
    correlation_matrices[[subdomain_name]] <- cor(subdomain_data, use = "complete.obs")
  }
  return(correlation_matrices)
}
# Example usage
# subdomain_cor_matrices <- calculate_subdomain_correlation_matrix(scale_data, subdomain_items)
Comprehensive Analysis Function
# Comprehensive correlation analysis
analyze_scale_correlations <- function(data, subdomain_items = NULL) {
  results <- list()
  # Overall scale correlations
  results$corrected_item_total <- calculate_corrected_item_total(data)
  results$uncorrected_item_total <- calculate_uncorrected_item_total(data)
  # Sub-domain analyses (if specified)
  if (!is.null(subdomain_items)) {
    results$item_subdomain_correlations <- calculate_item_subdomain_correlations(data, subdomain_items)
    results$subdomain_correlation_matrices <- calculate_subdomain_correlation_matrix(data, subdomain_items)
  }
  return(results)
}
# Example comprehensive analysis
# scale_analysis <- analyze_scale_correlations(scale_data, subdomain_items)
Interpretation Guidelines
- Corrected item-total correlations should typically be > 0.30 for acceptable items
- Inter-item correlations within sub-domains should be moderate (0.30-0.70)
- Very high correlations (> 0.85) may indicate item redundancy
- Very low correlations (< 0.20) may indicate poor item fit
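The cut-offs above can be applied mechanically as a first screening pass. The helper below, `flag_item_correlations()`, is a hypothetical function written for this sketch (it is not part of the functions defined earlier), and the "borderline" label for values between 0.20 and 0.30 is an assumption, since the guidelines do not name that band. The input vector is invented for illustration.

```r
# Hypothetical helper: label each correlation using the rule-of-thumb cut-offs.
# The cut-offs are arguments so they can be adjusted for a given scale.
flag_item_correlations <- function(correlations,
                                   poor_cutoff = 0.20,
                                   acceptable_cutoff = 0.30,
                                   redundant_cutoff = 0.85) {
  flag <- ifelse(is.na(correlations), "not computable",
          ifelse(correlations < poor_cutoff, "poor fit",
          ifelse(correlations <= acceptable_cutoff, "borderline",  # assumed label
          ifelse(correlations > redundant_cutoff, "possible redundancy",
                 "acceptable"))))
  data.frame(item = names(correlations),
             correlation = unname(correlations),
             flag = unname(flag),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# Invented correlations, one per item
example_r <- c(item1 = 0.12, item2 = 0.25, item3 = 0.55, item4 = 0.91)
flags <- flag_item_correlations(example_r)
print(flags)
```

The flags are prompts for a closer look at item content rather than automatic deletion rules, in keeping with the hedged wording ("typically", "may indicate") of the guidelines.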
Subscales
How should items in a sub-scale correlate with each other and with the general factor?
Clark and Watson (2019) describe a two-subscale model of self-harm. The general factor was self-harm, and the two subscales were suicide potential and low self-esteem. There were 16 items across both subscales.
Key findings:
- General factor dominance: all 16 items loaded most strongly onto a single factor (self-harm), with an average loading of .52 (range .41-.66).
- Clean sub-factor structure: when two factors were extracted and rotated:
  - Items loaded strongly on their intended sub-factor (.57 average)
  - Items loaded weakly on the other sub-factor (.17 average)
  - This shows good discriminant validity between subscales.
- Inter-factor correlation: the two subscales correlated .48 with each other.
Relevance to General vs. Sub-Factor Correlations
This example demonstrates several important principles:
Hierarchical Structure Pattern
- Items can simultaneously contribute to both a general factor (self-harm broadly) and specific factors (suicide vs. low self-esteem)
- This is exactly what you’d see in a bifactor model or higher-order factor model
Expected Correlation Patterns
- Sub-factor correlation (.48): Falls in the ideal range of .30-.70 for related but distinct constructs
- General factor loadings (.52 average): Shows all items tap into the overarching construct
- Specific factor loadings (.57 primary, .17 non-primary): Shows items have both shared and unique variance
Practical Implications
This pattern suggests:
- A total score is justified (because of the strong general factor)
- Subscale scores are also meaningful (because of clean factor separation)
- The subscales are related but distinct (moderate .48 correlation)
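To make the "related but distinct" pattern concrete, the sketch below simulates items that share a general factor and also carry a subscale-specific factor, then correlates the two subscale scores. Everything here is an assumption for illustration: the data are simulated, not Clark and Watson's, and the weights, sample size, and item names are arbitrary choices.

```r
set.seed(123)  # arbitrary seed so the simulated example is reproducible
n <- 500       # arbitrary number of simulated respondents

# Latent variables: one general factor plus one specific factor per subscale
general    <- rnorm(n)
specific_a <- rnorm(n)
specific_b <- rnorm(n)

# Each item = general part + subscale-specific part + random noise
# (weights of 0.5 and noise SD of 0.7 are arbitrary illustration values)
make_item <- function(specific) {
  0.5 * general + 0.5 * specific + rnorm(n, sd = 0.7)
}

sim_data <- data.frame(
  a1 = make_item(specific_a), a2 = make_item(specific_a), a3 = make_item(specific_a),
  b1 = make_item(specific_b), b2 = make_item(specific_b), b3 = make_item(specific_b)
)

# Subscale scores as simple sums of their items
subscale_a <- rowSums(sim_data[, c("a1", "a2", "a3")])
subscale_b <- rowSums(sim_data[, c("b1", "b2", "b3")])

# Correlation between the two subscale scores
subscale_r <- cor(subscale_a, subscale_b)
subscale_r
```

Because the only thing the two subscales share in this simulation is the general factor, the correlation between their scores is positive but well below 1. Raising the weight on `general` relative to the specific factors would push the subscales toward redundancy.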
Why This Matters
This example illustrates the “ideal” correlation pattern that Clark and Watson (2019) recommend for hierarchically dimensional scales:
- Strong general factor: All items should load meaningfully on the overall construct
- Distinct subfactors: Items should show higher loadings on their intended subfactor
- Moderate inter-factor correlations: Subfactors should be related but not redundant
This is exactly the kind of evidence you’d want to present when justifying both total scale scores and subscale scores in a multidimensional measure.