Notes

Scale Item Correlation Equations

Item-Total Correlations

Corrected Item-Total Correlation

The corrected item-total correlation removes the focal item from the total score to avoid spurious inflation:

\[r_{i, T_{-i}} = \frac{\text{Cov}(X_i, T_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T_{-i})}}\]

Where:
- \(X_i\) = score on item \(i\)
- \(T_{-i}\) = total score excluding item \(i\), calculated as \(T_{-i} = \sum_{j \neq i} X_j\)
- \(\text{Cov}(X_i, T_{-i})\) = covariance between item \(i\) and the corrected total
- \(\text{Var}(X_i)\) and \(\text{Var}(T_{-i})\) = variances of item \(i\) and the corrected total

# Calculate corrected item-total correlations
calculate_corrected_item_total <- function(data) {
  n_items <- ncol(data)
  correlations <- numeric(n_items)
  
  for(i in seq_len(n_items)) {
    item_score <- data[, i]
    total_minus_item <- rowSums(data[, -i, drop = FALSE])
    correlations[i] <- cor(item_score, total_minus_item, use = "complete.obs")
  }
  
  names(correlations) <- colnames(data)
  return(correlations)
}

# Example usage
# corrected_correlations <- calculate_corrected_item_total(scale_data)

Uncorrected Item-Total Correlation

The uncorrected item-total correlation includes the focal item in the total score:

\[r_{i,T} = \frac{\text{Cov}(X_i, T)}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T)}}\]

Where \(T = \sum_{j=1}^{k} X_j\) is the total score including all \(k\) items.

# Calculate uncorrected item-total correlations
calculate_uncorrected_item_total <- function(data) {
  total_score <- rowSums(data)
  correlations <- numeric(ncol(data))
  
  for(i in seq_len(ncol(data))) {
    correlations[i] <- cor(data[, i], total_score, use = "complete.obs")
  }
  
  names(correlations) <- colnames(data)
  return(correlations)
}

# Example usage
# uncorrected_correlations <- calculate_uncorrected_item_total(scale_data)
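
The source of the inflation can be made explicit with a short derivation, sketched here using only the definitions above. Since \(T = X_i + T_{-i}\):

\[\text{Cov}(X_i, T) = \text{Var}(X_i) + \text{Cov}(X_i, T_{-i})\]

The \(\text{Var}(X_i)\) term is the item's covariance with itself, so \(r_{i,T}\) is positive even when the item is unrelated to every other item. As an illustrative special case (an assumption made for this example, not a property of real data), take \(k\) mutually uncorrelated items with equal variance \(\sigma^2\), so that \(\text{Var}(T) = k\sigma^2\):

\[r_{i,T} = \frac{\sigma^2}{\sqrt{\sigma^2 \cdot k\sigma^2}} = \frac{1}{\sqrt{k}}\]

With \(k = 10\) this gives \(r_{i,T} \approx 0.32\), while the corrected value \(r_{i, T_{-i}}\) is 0.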

Sub-Domain Correlations

Inter-Item Correlations Within Sub-Domains

For correlations between items within the same sub-domain or factor:

\[r_{jk} = \frac{\text{Cov}(X_j, X_k)}{\sqrt{\text{Var}(X_j) \cdot \text{Var}(X_k)}}\]

Where \(X_j\) and \(X_k\) are items within the same sub-domain.

Corrected Item-Subdomain Correlation

For correlation between an item and its sub-domain total (excluding the item itself):

\[r_{i, S_{-i}} = \frac{\text{Cov}(X_i, S_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(S_{-i})}}\]

Where \(S_{-i} = \sum_{j \in \text{subdomain}, j \neq i} X_j\) is the sub-domain total excluding item \(i\).

# Calculate item-subdomain correlations
calculate_item_subdomain_correlations <- function(data, subdomain_items) {
  correlations <- list()
  
  for(subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    
    subdomain_correlations <- numeric(length(items))
    
    for(i in seq_along(items)) {
      item_score <- subdomain_data[, i]
      subdomain_total_minus_item <- rowSums(subdomain_data[, -i, drop = FALSE])
      subdomain_correlations[i] <- cor(item_score, subdomain_total_minus_item, 
                                     use = "complete.obs")
    }
    
    names(subdomain_correlations) <- items
    correlations[[subdomain_name]] <- subdomain_correlations
  }
  
  return(correlations)
}

# Example usage
# Define subdomain structure
# subdomain_items <- list(
#   "Factor1" = c("item1", "item2", "item3"),
#   "Factor2" = c("item4", "item5", "item6")
# )
# 
# subdomain_correlations <- calculate_item_subdomain_correlations(scale_data, subdomain_items)

Inter-Item Correlation Matrix Within Sub-Domains

# Calculate correlation matrix for items within each subdomain
calculate_subdomain_correlation_matrix <- function(data, subdomain_items) {
  correlation_matrices <- list()
  
  for(subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    
    # Calculate correlation matrix
    correlation_matrices[[subdomain_name]] <- cor(subdomain_data, use = "complete.obs")
  }
  
  return(correlation_matrices)
}

# Example usage
# subdomain_cor_matrices <- calculate_subdomain_correlation_matrix(scale_data, subdomain_items)
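
A full correlation matrix can be hard to read at a glance, so it may help to reduce each one to a few summary values. The following is a minimal sketch; the helper name `summarize_inter_item` is hypothetical and not part of the notes above. It takes one correlation matrix and reports the mean, minimum, and maximum of the off-diagonal correlations.

```r
# Summarize the off-diagonal entries of an inter-item correlation matrix.
# `cor_matrix` is assumed to be a square, symmetric correlation matrix.
summarize_inter_item <- function(cor_matrix) {
  # The upper triangle holds each item pair exactly once (diagonal excluded)
  r <- cor_matrix[upper.tri(cor_matrix)]
  c(mean = mean(r), min = min(r), max = max(r))
}

# Small hand-made matrix for illustration only
example_matrix <- matrix(
  c(1.0, 0.9, 0.1,
    0.9, 1.0, 0.5,
    0.1, 0.5, 1.0),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
inter_item_summary <- summarize_inter_item(example_matrix)
```

Applied to the list returned by `calculate_subdomain_correlation_matrix()`, something like `lapply(subdomain_cor_matrices, summarize_inter_item)` would give one summary per sub-domain.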

Comprehensive Analysis Function

# Comprehensive correlation analysis
analyze_scale_correlations <- function(data, subdomain_items = NULL) {
  results <- list()
  
  # Overall scale correlations
  results$corrected_item_total <- calculate_corrected_item_total(data)
  results$uncorrected_item_total <- calculate_uncorrected_item_total(data)
  
  # Sub-domain analyses (if specified)
  if(!is.null(subdomain_items)) {
    results$item_subdomain_correlations <- calculate_item_subdomain_correlations(data, subdomain_items)
    results$subdomain_correlation_matrices <- calculate_subdomain_correlation_matrix(data, subdomain_items)
  }
  
  return(results)
}

# Example comprehensive analysis
# scale_analysis <- analyze_scale_correlations(scale_data, subdomain_items)
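
The example-usage lines refer to `scale_data` and `subdomain_items` without defining them. To try the functions out, one option is to simulate a small data set. The sketch below does that; the data-generating model (two correlated latent factors with three continuous indicators each) and all numeric values are assumptions chosen purely for illustration.

```r
# Simulate a toy data set so the example-usage lines can be run.
# The model and all coefficients below are illustrative assumptions.
set.seed(42)
n <- 300

# Two latent factors that correlate because they share a common component
common <- rnorm(n)
factor1 <- 0.6 * common + 0.8 * rnorm(n)
factor2 <- 0.6 * common + 0.8 * rnorm(n)

# Each item is a noisy indicator of its latent factor
make_item <- function(latent) 0.7 * latent + rnorm(n, sd = 0.7)

scale_data <- data.frame(
  item1 = make_item(factor1),
  item2 = make_item(factor1),
  item3 = make_item(factor1),
  item4 = make_item(factor2),
  item5 = make_item(factor2),
  item6 = make_item(factor2)
)

# Sub-domain structure matching the commented example earlier in the notes
subdomain_items <- list(
  Factor1 = c("item1", "item2", "item3"),
  Factor2 = c("item4", "item5", "item6")
)
```

Real questionnaire data would usually be ordinal (for example, Likert responses) and may contain missing values, which this toy data set does not.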

Interpretation Guidelines

Corrected item-total correlations should typically be > 0.30 for acceptable items
Inter-item correlations within sub-domains should be moderate (0.30-0.70)
Very high inter-item correlations (> 0.85) may indicate item redundancy
Very low correlations (< 0.20) may indicate poor item fit
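
These rules of thumb can be applied mechanically as a first screening pass. The sketch below is one way to do that; `flag_items` and its argument names are hypothetical, and it assumes the 0.85 and 0.20 cut-offs are applied to pairwise inter-item correlations, which the guidelines do not state explicitly.

```r
# Flag items against the rule-of-thumb cut-offs listed above.
# `item_total`: named numeric vector of corrected item-total correlations.
# `cor_matrix`: inter-item correlation matrix with item names as dimnames.
# Assumption: the 0.85 and 0.20 cut-offs refer to pairwise inter-item values.
flag_items <- function(item_total, cor_matrix,
                       min_item_total = 0.30,
                       redundancy_cutoff = 0.85,
                       low_cutoff = 0.20) {
  weak_items <- names(item_total)[which(item_total < min_item_total)]

  # Visit each item pair once via the upper triangle
  pair_index <- which(upper.tri(cor_matrix), arr.ind = TRUE)
  pairs <- data.frame(
    item_a = rownames(cor_matrix)[pair_index[, "row"]],
    item_b = colnames(cor_matrix)[pair_index[, "col"]],
    r = cor_matrix[pair_index],
    stringsAsFactors = FALSE
  )

  list(
    weak_item_total = weak_items,
    redundant_pairs = pairs[which(pairs$r > redundancy_cutoff), ],
    low_pairs = pairs[which(pairs$r < low_cutoff), ]
  )
}

# Hand-made inputs for illustration only
item_total <- c(a = 0.50, b = 0.25, c = 0.40)
cor_matrix <- matrix(
  c(1.0, 0.9, 0.1,
    0.9, 1.0, 0.5,
    0.1, 0.5, 1.0),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
flags <- flag_items(item_total, cor_matrix)
```

Flagged items are candidates for closer inspection, not automatic removal; the cut-offs are defaults that can be overridden through the function arguments.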

Subscales

How should items in a sub-scale correlate to each other and the general factor?

Clark and Watson (2019) describe a two-subscale model of self-harm. The general factor was self-harm, and the two subscales were suicide potential and low self-esteem. There were 16 items across both subscales.

Key findings:

1. General Factor Dominance - All 16 items loaded most strongly onto a single factor (self-harm) (average loading = .52, range .41-.66).
2. Clean Sub-Factor Structure - When two factors were extracted and rotated:
   - Items loaded strongly on their intended sub-factor (.57 average)
   - Items loaded weakly on the other sub-factor (.17 average)
   - This shows good discriminant validity between subscales.
3. Inter-Factor Correlation - The two subscales correlated .48 with each other.
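
The same three checks can be run on one's own data. The sketch below uses simulated data and base R's `factanal()`. The simulation, the maximum-likelihood extraction, and the promax rotation are all assumptions made for illustration; the notes do not say which extraction or rotation method Clark and Watson used, and the numbers produced here are not theirs.

```r
# Simulated 16-item scale: one general factor plus two specific factors.
# All coefficients are illustrative assumptions.
set.seed(123)
n <- 1000
general <- rnorm(n)
specific_a <- rnorm(n)
specific_b <- rnorm(n)

# Build k items that load on the general factor and one specific factor
make_items <- function(specific, k) {
  sapply(seq_len(k), function(j) {
    0.5 * general + 0.5 * specific + rnorm(n, sd = 0.7)
  })
}

items <- cbind(make_items(specific_a, 8), make_items(specific_b, 8))
colnames(items) <- paste0("item", 1:16)
intended <- rep(c(1, 2), each = 8)  # intended subscale of each item

# Check 1: loadings on a single (general) factor
one_factor <- factanal(items, factors = 1)
general_loadings <- abs(unclass(one_factor$loadings)[, 1])

# Check 2: primary vs. non-primary loadings after an oblique rotation
two_factor <- factanal(items, factors = 2, rotation = "promax")
loadings_2f <- abs(unclass(two_factor$loadings))

# Factor order is arbitrary, so match each factor to an item block first
f_block1 <- which.max(colMeans(loadings_2f[intended == 1, ]))
f_block2 <- setdiff(1:2, f_block1)
primary <- c(loadings_2f[intended == 1, f_block1],
             loadings_2f[intended == 2, f_block2])
non_primary <- c(loadings_2f[intended == 1, f_block2],
                 loadings_2f[intended == 2, f_block1])

# Check 3: correlation between the two subscale sum scores
subscale_r <- cor(rowSums(items[, intended == 1]),
                  rowSums(items[, intended == 2]))

summary_values <- c(
  mean_general = mean(general_loadings),
  mean_primary = mean(primary),
  mean_non_primary = mean(non_primary),
  subscale_r = subscale_r
)
```

Note that the correlation between sum scores is not the same quantity as the correlation between latent factors, so it is only a rough stand-in for the inter-factor correlation reported above.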

Relevance to General vs. Sub-Factor Correlations

This example demonstrates several important principles:

Hierarchical Structure Pattern

Items can simultaneously contribute to both a general factor (self-harm broadly) and specific factors (suicide vs. low self-esteem)
This is exactly what you’d see in a bifactor model or higher-order factor model

Expected Correlation Patterns

Sub-factor correlation (.48): Falls in the ideal range of .30-.70 for related but distinct constructs
General factor loadings (.52 average): Shows all items tap into the overarching construct
Specific factor loadings (.57 primary, .17 non-primary): Shows items have both shared and unique variance

Practical Implications

This pattern suggests:

A total score is justified (because of the strong general factor)
Subscale scores are also meaningful (because of clean factor separation)
The subscales are related but distinct (moderate .48 correlation)

Why This Matters

This example illustrates the “ideal” correlation pattern Clark and Watson (2019) recommend for hierarchically dimensional scales:

Strong general factor: All items should load meaningfully on the overall construct
Distinct subfactors: Items should show higher loadings on their intended subfactor
Moderate inter-factor correlations: Subfactors should be related but not redundant

This is exactly the kind of evidence you’d want to present when justifying both total scale scores and subscale scores in a multidimensional measure.