Notes

Scale Item Correlation Equations

Item-Total Correlations

Corrected Item-Total Correlation

The corrected item-total correlation removes the focal item from the total score to avoid spurious inflation:

\[r_{i, T_{-i}} = \frac{\text{Cov}(X_i, T_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T_{-i})}}\]

Where:

  • \(X_i\) = score on item \(i\)
  • \(T_{-i}\) = total score excluding item \(i\), calculated as \(T_{-i} = \sum_{j \neq i} X_j\)
  • \(\text{Cov}(X_i, T_{-i})\) = covariance between item \(i\) and the corrected total
  • \(\text{Var}(X_i)\) and \(\text{Var}(T_{-i})\) = variances of item \(i\) and the corrected total

# Calculate corrected item-total correlations
calculate_corrected_item_total <- function(data) {
  n_items <- ncol(data)
  correlations <- numeric(n_items)
  
  for(i in seq_len(n_items)) {
    item_score <- data[, i]
    total_minus_item <- rowSums(data[, -i, drop = FALSE])
    correlations[i] <- cor(item_score, total_minus_item, use = "complete.obs")
  }
  
  names(correlations) <- colnames(data)
  return(correlations)
}

# Example usage
# corrected_correlations <- calculate_corrected_item_total(scale_data)

Uncorrected Item-Total Correlation

The uncorrected item-total correlation includes the focal item in the total score:

\[r_{i,T} = \frac{\text{Cov}(X_i, T)}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(T)}}\]

Where \(T = \sum_{j=1}^{k} X_j\) is the total score including all \(k\) items.
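
The inflation follows directly from the definitions. Because \(T = X_i + T_{-i}\), the covariance with the full total contains the item's own variance:

\[\text{Cov}(X_i, T) = \text{Cov}(X_i, X_i + T_{-i}) = \text{Var}(X_i) + \text{Cov}(X_i, T_{-i})\]

The extra \(\text{Var}(X_i)\) term is present even when item \(i\) is unrelated to every other item, so the uncorrected coefficient overstates how well an item fits. The effect is largest on short scales, where each item makes up a large share of \(T\).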

# Calculate uncorrected item-total correlations
calculate_uncorrected_item_total <- function(data) {
  total_score <- rowSums(data)
  correlations <- numeric(ncol(data))
  
  for(i in seq_len(ncol(data))) {
    correlations[i] <- cor(data[, i], total_score, use = "complete.obs")
  }
  
  names(correlations) <- colnames(data)
  return(correlations)
}

# Example usage
# uncorrected_correlations <- calculate_uncorrected_item_total(scale_data)
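
As a rough illustration of how far the two coefficients can diverge, the sketch below simulates a short five-item scale and computes both versions directly with `cor()`. The data-generating model (one shared latent trait plus independent noise), the sample size, and the object names `latent` and `sim_data` are assumptions made for illustration only; they are not part of the notes above.

```r
# Simulated 5-item scale: each item = shared latent trait + independent noise.
# All names and parameter values are illustrative assumptions.
set.seed(123)
n <- 500
latent <- rnorm(n)
sim_data <- as.data.frame(sapply(1:5, function(j) latent + rnorm(n)))
colnames(sim_data) <- paste0("item", 1:5)

total <- rowSums(sim_data)

# Uncorrected: the focal item is still part of the total
uncorrected <- sapply(sim_data, function(x) cor(x, total))

# Corrected: subtract the focal item from the total before correlating
corrected <- sapply(names(sim_data), function(nm) {
  cor(sim_data[[nm]], total - sim_data[[nm]])
})

# Side-by-side comparison, one row per item
round(cbind(uncorrected, corrected), 2)
```

With only five items, each item contributes a sizeable share of the total, so the uncorrected column should come out noticeably higher than the corrected one for every item.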

Sub-Domain Correlations

Inter-Item Correlations Within Sub-Domains

For correlations between items within the same sub-domain or factor:

\[r_{jk} = \frac{\text{Cov}(X_j, X_k)}{\sqrt{\text{Var}(X_j) \cdot \text{Var}(X_k)}}\]

Where \(X_j\) and \(X_k\) are items within the same sub-domain.

Corrected Item-Subdomain Correlation

For correlation between an item and its sub-domain total (excluding the item itself):

\[r_{i, S_{-i}} = \frac{\text{Cov}(X_i, S_{-i})}{\sqrt{\text{Var}(X_i) \cdot \text{Var}(S_{-i})}}\]

Where \(S_{-i}\) is the sub-domain total excluding item \(i\), calculated as \(S_{-i} = \sum_{j \in \text{subdomain}, j \neq i} X_j\).

# Calculate item-subdomain correlations
calculate_item_subdomain_correlations <- function(data, subdomain_items) {
  correlations <- list()
  
  for(subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    
    subdomain_correlations <- numeric(length(items))
    
    for(i in seq_along(items)) {
      item_score <- subdomain_data[, i]
      subdomain_total_minus_item <- rowSums(subdomain_data[, -i, drop = FALSE])
      subdomain_correlations[i] <- cor(item_score, subdomain_total_minus_item, 
                                     use = "complete.obs")
    }
    
    names(subdomain_correlations) <- items
    correlations[[subdomain_name]] <- subdomain_correlations
  }
  
  return(correlations)
}

# Example usage
# Define subdomain structure
# subdomain_items <- list(
#   "Factor1" = c("item1", "item2", "item3"),
#   "Factor2" = c("item4", "item5", "item6")
# )
# 
# subdomain_correlations <- calculate_item_subdomain_correlations(scale_data, subdomain_items)

Inter-Item Correlation Matrix Within Sub-Domains

# Calculate correlation matrix for items within each subdomain
calculate_subdomain_correlation_matrix <- function(data, subdomain_items) {
  correlation_matrices <- list()
  
  for(subdomain_name in names(subdomain_items)) {
    items <- subdomain_items[[subdomain_name]]
    subdomain_data <- data[, items, drop = FALSE]
    
    # Calculate correlation matrix
    correlation_matrices[[subdomain_name]] <- cor(subdomain_data, use = "complete.obs")
  }
  
  return(correlation_matrices)
}

# Example usage
# subdomain_cor_matrices <- calculate_subdomain_correlation_matrix(scale_data, subdomain_items)

Comprehensive Analysis Function

# Comprehensive correlation analysis
analyze_scale_correlations <- function(data, subdomain_items = NULL) {
  results <- list()
  
  # Overall scale correlations
  results$corrected_item_total <- calculate_corrected_item_total(data)
  results$uncorrected_item_total <- calculate_uncorrected_item_total(data)
  
  # Sub-domain analyses (if specified)
  if(!is.null(subdomain_items)) {
    results$item_subdomain_correlations <- calculate_item_subdomain_correlations(data, subdomain_items)
    results$subdomain_correlation_matrices <- calculate_subdomain_correlation_matrix(data, subdomain_items)
  }
  
  return(results)
}

# Example comprehensive analysis
# scale_analysis <- analyze_scale_correlations(scale_data, subdomain_items)

Interpretation Guidelines

  • Corrected item-total correlations should typically be > 0.30 for acceptable items
  • Inter-item correlations within sub-domains should be moderate (0.30-0.70)
  • Very high correlations (> 0.85) may indicate item redundancy
  • Very low correlations (< 0.20) may indicate poor item fit
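
The cutoffs above can be applied mechanically to a named vector of correlations. The following is a minimal sketch, not a standard routine: `flag_item_total` is a hypothetical helper, the "borderline" label for values between 0.20 and 0.30 is an assumption, and so is the choice to apply the 0.85 redundancy cutoff to the same vector of coefficients.

```r
# Hypothetical helper: label each correlation using the rule-of-thumb cutoffs.
# The "borderline" band (0.20 to 0.30) is an assumed label, not from the notes.
flag_item_total <- function(r, low = 0.20, acceptable = 0.30, redundant = 0.85) {
  status <- ifelse(r < low, "poor fit",
            ifelse(r < acceptable, "borderline",
            ifelse(r > redundant, "possibly redundant", "acceptable")))
  data.frame(item = names(r), r = unname(r), status = unname(status),
             stringsAsFactors = FALSE)
}

# Made-up values chosen to hit each band once
example_r <- c(item1 = 0.12, item2 = 0.27, item3 = 0.45, item4 = 0.91)
flagged <- flag_item_total(example_r)
flagged
```

These thresholds are conventions rather than tests, so flagged items are best treated as candidates for closer review, not automatic deletions.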

Subscales

How should items in a subscale correlate with each other and with the general factor?

Clark and Watson (2019) describe a two-subscale model of self-harm. The general factor was self-harm, and the two subscales were suicide potential and low self-esteem. There were 16 items across both subscales.

Key findings:

    1. General Factor Dominance - All 16 items loaded most strongly onto a single factor (self-harm), with an average loading of .52 (range .41-.66).
    2. Clean Sub-Factor Structure - When two factors were extracted and rotated:
      • Items loaded strongly on their intended sub-factor (.57 average)
      • Items loaded weakly on the other sub-factor (.17 average)
      • This shows good discriminant validity between subscales.
    3. Inter-Factor Correlation - The two subscales correlated .48 with each other.

Relevance to General vs. Sub-Factor Correlations

This example demonstrates several important principles:

Hierarchical Structure Pattern

  • Items can simultaneously contribute to both a general factor (self-harm broadly) and specific factors (suicide vs. low self-esteem)
  • This is exactly what you’d see in a bifactor model or higher-order factor model

Expected Correlation Patterns

  • Sub-factor correlation (.48): Falls in the ideal range of .30-.70 for related but distinct constructs
  • General factor loadings (.52 average): Shows all items tap into the overarching construct
  • Specific factor loadings (.57 primary, .17 non-primary): Shows items have both shared and unique variance

Practical Implications

This pattern suggests:

  • A total score is justified (because of the strong general factor)
  • Subscale scores are also meaningful (because of clean factor separation)
  • The subscales are related but distinct (moderate .48 correlation)

Why This Matters

This example illustrates the “ideal” correlation pattern that Clark and Watson (2019) recommend for hierarchically dimensional scales:

  1. Strong general factor: All items should load meaningfully on the overall construct
  2. Distinct subfactors: Items should show higher loadings on their intended subfactor
  3. Moderate inter-factor correlations: Subfactors should be related but not redundant

This is exactly the kind of evidence you’d want to present when justifying both total scale scores and subscale scores in a multidimensional measure.
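
To see what this pattern looks like in raw correlations, the sketch below simulates 16 items driven by one general factor plus one of two specific factors, then compares within-subscale and between-subscale correlations. Everything here is an assumption for illustration: the even 8/8 split of items, the loadings of 0.5, the noise level, and the names `make_items`, `sub_a`, and `sub_b`. It is not a reconstruction of the Clark and Watson (2019) data.

```r
# Simulated hierarchical structure: general factor + two specific factors.
# Loadings, noise level, and the 8/8 item split are illustrative assumptions.
set.seed(42)
n  <- 2000
g  <- rnorm(n)   # general factor shared by all items
s1 <- rnorm(n)   # specific factor for subscale A
s2 <- rnorm(n)   # specific factor for subscale B

# Each item = 0.5 * general + 0.5 * specific + unique noise
make_items <- function(specific, k) {
  sapply(seq_len(k), function(j) 0.5 * g + 0.5 * specific + rnorm(n, sd = 0.7))
}
sub_a <- make_items(s1, 8); colnames(sub_a) <- paste0("a", 1:8)
sub_b <- make_items(s2, 8); colnames(sub_b) <- paste0("b", 1:8)

# Mean inter-item correlation within each subscale (off-diagonal only)
r_a <- cor(sub_a)
r_b <- cor(sub_b)
within_a <- mean(r_a[upper.tri(r_a)])
within_b <- mean(r_b[upper.tri(r_b)])

# Mean correlation between items from different subscales
between <- mean(cor(sub_a, sub_b))

# Correlation between the two subscale total scores
subscale_r <- cor(rowSums(sub_a), rowSums(sub_b))

round(c(within_a = within_a, within_b = within_b,
        between = between, subscale_r = subscale_r), 2)
```

Under these assumptions, items correlate more strongly within their own subscale than across subscales, while the cross-subscale correlations stay clearly positive because of the shared general factor. That combination is what supports reporting both a total score and subscale scores.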