5. Guidelines in Scale Development

This chapter provides a systematic approach to developing reliable and valid measurement scales. We’ll explore all nine critical steps in the scale development process, with interactive examples and R code to reinforce key concepts.

1 Step 1: Determine Clearly What It Is You Want to Measure

The foundation of any good scale is a clear understanding of what you’re trying to measure. Ambiguity at this stage will cascade through the entire development process.

1.1 Theory as an Aid to Clarity

Theoretical frameworks provide the conceptual foundation for your scale by:
- Defining the construct in relation to other variables
- Specifying the boundaries of what should and shouldn’t be included
- Guiding predictions about how the scale should behave

# Example: Job Satisfaction construct mapping
library(tidyverse)
library(psych)
library(ggplot2)

# Theoretical components of job satisfaction
job_sat_theory <- data.frame(
  Component = c("Work Content", "Supervision", "Pay", "Colleagues", "Promotion"),
  Definition = c("Tasks and responsibilities", "Quality of management", 
                "Compensation adequacy", "Peer relationships", "Growth opportunities"),
  Related_Constructs = c("Job Characteristics", "Leadership", "Equity Theory", 
                        "Social Support", "Career Development")
)

knitr::kable(job_sat_theory, caption = "Theoretical Framework for Job Satisfaction Scale")

Theoretical Framework for Job Satisfaction Scale
Component	Definition	Related_Constructs
Work Content	Tasks and responsibilities	Job Characteristics
Supervision	Quality of management	Leadership
Pay	Compensation adequacy	Equity Theory
Colleagues	Peer relationships	Social Support
Promotion	Growth opportunities	Career Development

1.2 Specificity as an Aid to Clarity

Moving from broad constructs to specific, measurable components:

Specificity Hierarchy

Broad: “Personality”
Intermediate: “Extraversion”
Specific: “Comfort in social situations with strangers”

1.3 Being Clear About What to Include in a Measure

Inclusion Criteria Checklist:
- Does this aspect directly relate to the theoretical definition?
- Can this be reliably observed or reported?
- Is this distinct from related but different constructs?

# Interactive exercise: Categorize potential items
potential_items <- c(
  "I enjoy meeting new people",           # Core extraversion
  "I am physically healthy",              # Not extraversion
  "I feel energized in groups",           # Core extraversion  
  "I have many friends",                  # Outcome of extraversion
  "I speak loudly in conversations"       # Behavioral indicator
)

# Students can categorize these as: Core, Related, Irrelevant

2 Step 2: Generate an Item Pool

2.1 Choose Items That Reflect the Scale’s Purpose

Items should be directly aligned with your theoretical definition and collectively comprehensive of the construct domain.

# Example item pool generation for Academic Self-Efficacy
set.seed(123)

# Generate example item pool
item_pool <- data.frame(
  Item_ID = paste0("ASE_", 1:20),
  Item_Text = c(
    "I can learn difficult academic material",
    "I am confident in my study abilities", 
    "I can complete challenging assignments",
    "I believe I can succeed in my courses",
    "I can understand complex concepts when I try",
    "I am capable of getting good grades",
    "I can handle academic pressure well",
    "I trust my academic judgment",
    "I can solve difficult problems in my field",
    "I am confident in my test-taking abilities",
    "I can manage my study time effectively",
    "I believe in my academic capabilities",
    "I can overcome academic setbacks",
    "I am sure I can learn skills in my major",
    "I can adapt to different teaching styles",
    "I am confident presenting academic work",
    "I can critically evaluate information",
    "I believe I can meet academic standards",
    "I can persist through difficult coursework",
    "I am confident in my academic decisions"
  ),
  # Domains are assigned cyclically here for illustration, not by item content
  Domain = rep(c("Ability Beliefs", "Outcome Expectations", "Persistence", "Judgment"), 5)
)

# Display item pool summary
item_pool %>% 
  count(Domain) %>%
  knitr::kable(caption = "Item Pool Distribution by Domain")

Item Pool Distribution by Domain
Domain	n
Ability Beliefs	5
Judgment	5
Outcome Expectations	5
Persistence	5

2.2 Redundancy

Planned redundancy is essential in the early stages because it:
- Protects against item loss during validation
- Allows selection of the best-performing items
- Ensures comprehensive domain coverage

# Demonstrate redundancy with correlation matrix
# Simulate responses to redundant items measuring same facet
n_participants <- 200
true_score <- rnorm(n_participants, mean = 0, sd = 1)

# Three redundant items with different amounts of error
item1 <- true_score + rnorm(n_participants, 0, 0.3)  # Less error
item2 <- true_score + rnorm(n_participants, 0, 0.5)  # Moderate error  
item3 <- true_score + rnorm(n_participants, 0, 0.7)  # More error

redundant_items <- data.frame(
  Confidence_1 = item1,
  Confidence_2 = item2, 
  Confidence_3 = item3
)

# Show correlations
cor_matrix <- cor(redundant_items)
print(round(cor_matrix, 2))

             Confidence_1 Confidence_2 Confidence_3
Confidence_1         1.00         0.84         0.76
Confidence_2         0.84         1.00         0.71
Confidence_3         0.76         0.71         1.00

2.3 Number of Items

The initial item pool should be 3-4 times larger than the intended final scale:

Final Scale Length	Initial Item Pool
5 items	15-20 items
10 items	30-40 items
20 items	60-80 items
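
The 3-4x rule of thumb can be turned into a small planning helper. The sketch below is only an illustration: `pool_size_range()` is a hypothetical name, not a function from any package, and the multipliers simply restate the guideline above.

```r
# Hypothetical helper: initial pool range for a target final scale length,
# using the 3-4x rule of thumb stated above
pool_size_range <- function(final_items, low = 3, high = 4) {
  stopifnot(is.numeric(final_items), all(final_items > 0))
  data.frame(
    Final_Items = final_items,
    Pool_Min = low * final_items,
    Pool_Max = high * final_items
  )
}

pool_sizes <- pool_size_range(c(5, 10, 20))
print(pool_sizes)
```

The multipliers are arguments so a more conservative ratio can be tried without rewriting the function.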

2.4 Beginning the Process of Writing Items

Item Writing Guidelines:
1. Use simple, clear language
2. Avoid double-barreled questions
3. Match reading level to target population
4. Ensure items are answerable by all respondents

2.5 Characteristics of Good and Bad Items

# Examples of good vs. poor items
item_examples <- data.frame(
  Quality = c("Good", "Poor", "Good", "Poor"),
  Item = c(
    "I feel confident when speaking in public",
    "I feel confident when speaking in public and also when writing reports", # Double-barreled
    "I enjoy social gatherings",
    "Don't you think that social gatherings are usually enjoyable?" # Leading
  ),
  Problem = c("Clear, single concept", "Double-barreled", "Direct statement", "Leading question")
)

knitr::kable(item_examples, caption = "Examples of Good vs. Poor Item Writing")

Examples of Good vs. Poor Item Writing
Quality	Item	Problem
Good	I feel confident when speaking in public	Clear, single concept
Poor	I feel confident when speaking in public and also when writing reports	Double-barreled
Good	I enjoy social gatherings	Direct statement
Poor	Don’t you think that social gatherings are usually enjoyable?	Leading question

2.6 Positively and Negatively Worded Items

Reverse-Coded Items Caution

Reverse-coded items can control for acquiescence bias, but they often:
- Create method factors unrelated to the construct
- Reduce scale reliability
- Confuse respondents

Use them sparingly and with careful consideration.

# Simulate effect of reverse-coded items on factor structure
library(lavaan)

# Generate data with acquiescence bias
set.seed(456)
n <- 300
true_construct <- rnorm(n)
acquiescence <- rnorm(n, 0, 0.3)

# Forward items influenced by both construct and acquiescence
forward1 <- true_construct + acquiescence + rnorm(n, 0, 0.4)
forward2 <- true_construct + acquiescence + rnorm(n, 0, 0.4)

# Reverse items: construct effect reversed, but acquiescence still positive
reverse1 <- -true_construct + acquiescence + rnorm(n, 0, 0.4)
reverse2 <- -true_construct + acquiescence + rnorm(n, 0, 0.4)

mixed_scale <- data.frame(
  Forward1 = forward1,
  Forward2 = forward2, 
  Reverse1 = reverse1,
  Reverse2 = reverse2
)

# Show how reverse items can create artificial factors
fa_result <- fa(mixed_scale, nfactors = 2)
print(fa_result$loadings, cutoff = 0.3)


Loadings:
         MR1    MR2   
Forward1         1.031
Forward2         0.673
Reverse1  0.866       
Reverse2  0.957       

                 MR1   MR2
SS loadings    1.725 1.523
Proportion Var 0.431 0.381
Cumulative Var 0.431 0.812

3 Step 3: Determine the Format for Measurement

3.1 Thurstone Scaling

Historical approach where items are pre-scaled by judges to represent different levels of the attribute. Rarely used today due to: - Labor-intensive development process - Assumption of equal intervals - Limited flexibility

3.2 Guttman Scaling

Cumulative scaling where items form a hierarchy - endorsing a higher item implies endorsing all lower items.

# Example: Math ability Guttman scale
guttman_items <- data.frame(
  Level = 1:5,
  Item = c(
    "I can add single-digit numbers",
    "I can multiply two-digit numbers", 
    "I can solve linear equations",
    "I can work with quadratic equations",
    "I can solve calculus problems"
  ),
  # Expected_Pattern: whether respondents at ability levels 1-5 (left to right) endorse the item
  Expected_Pattern = c("11111", "01111", "00111", "00011", "00001")
)

knitr::kable(guttman_items, caption = "Guttman Scale Example: Math Ability")

Guttman Scale Example: Math Ability
Level	Item	Expected_Pattern
1	I can add single-digit numbers	11111
2	I can multiply two-digit numbers	01111
3	I can solve linear equations	00111
4	I can work with quadratic equations	00011
5	I can solve calculus problems	00001
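
How closely observed responses follow the cumulative pattern can be summarized with a coefficient of reproducibility. The sketch below is a minimal illustration under stated assumptions: the function name and the four response rows are hypothetical, items are assumed to be ordered from easiest to hardest, and errors are counted against the ideal pattern implied by each person's total score.

```r
# Coefficient of reproducibility for dichotomous items ordered easiest -> hardest.
# Errors are counted against the ideal pattern implied by each person's total score.
guttman_reproducibility <- function(responses) {
  responses <- as.matrix(responses)
  n_items <- ncol(responses)
  totals <- rowSums(responses)
  # Ideal pattern: a person with total k endorses exactly the k easiest items
  ideal <- t(sapply(totals, function(k) as.integer(seq_len(n_items) <= k)))
  errors <- sum(responses != ideal)
  1 - errors / (nrow(responses) * n_items)
}

# Hypothetical responses: rows are people, columns are the five math items
math_responses <- rbind(
  c(1, 0, 0, 0, 0),  # perfect cumulative pattern
  c(1, 1, 1, 0, 0),  # perfect cumulative pattern
  c(1, 1, 1, 1, 1),  # perfect cumulative pattern
  c(1, 0, 1, 0, 0)   # skips item 2 but endorses item 3
)
cr <- guttman_reproducibility(math_responses)
print(cr)
```

Only the last respondent departs from the hierarchy, contributing 2 errors out of 20 responses, so the coefficient is 0.90. A value of about 0.90 or higher is the conventional benchmark for treating a set of items as cumulative.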

3.3 Scales With Equally Weighted Items

The most common approach: items receive equal weight and are summed or averaged.
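
As a minimal sketch of equal weighting, the example below scores a small set of hypothetical responses both ways. The data frame and its values are invented for illustration; the point is how the two approaches differ when a response is missing.

```r
# Hypothetical responses: 3 respondents x 4 items on a 1-5 scale
responses <- data.frame(
  item1 = c(4, 2, 5),
  item2 = c(5, 1, 4),
  item3 = c(4, 2, NA),  # one missing response
  item4 = c(3, 2, 5)
)

sum_score  <- rowSums(responses)                 # NA if any item is missing
mean_score <- rowMeans(responses, na.rm = TRUE)  # averages the answered items

print(data.frame(sum_score, mean_score))
```

Mean scores stay on the original response metric and tolerate occasional missing items, whereas sum scores are undefined for a respondent who skipped an item unless missing data are handled first.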

3.4 How Many Response Categories?

# Simulate reliability across different numbers of response options
simulate_reliability <- function(n_categories, n_items = 10, n_people = 200) {
  true_scores <- rnorm(n_people)
  
  responses <- matrix(NA, n_people, n_items)
  for(i in 1:n_items) {
    # Convert continuous to categorical
    continuous_response <- true_scores + rnorm(n_people, 0, 0.5)
    responses[,i] <- cut(continuous_response, 
                        breaks = n_categories, 
                        labels = FALSE)
  }
  
  # psych:: prefix avoids a clash with ggplot2::alpha()
  psych::alpha(responses)$total$raw_alpha
}

# Test different numbers of categories
categories <- 2:7
reliabilities <- sapply(categories, simulate_reliability)

reliability_data <- data.frame(
  Categories = categories,
  Alpha = reliabilities
)

# Plot results
ggplot(reliability_data, aes(x = Categories, y = Alpha)) +
  geom_line() + geom_point() +
  labs(title = "Scale Reliability by Number of Response Categories",
       x = "Number of Response Categories",
       y = "Cronbach's Alpha") +
  theme_minimal()

3.5 Specific Types of Response Formats

Likert Scale

The most popular format: statements rated on levels of agreement.

# Example Likert items and response options
likert_example <- data.frame(
  Statement = c(
    "I enjoy challenging myself with difficult tasks",
    "I prefer to avoid situations where I might fail",
    "I am motivated by competition with others"
  ),
  Response_Options = rep("1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree", 3)
)

knitr::kable(likert_example, caption = "Likert Scale Example")

Likert Scale Example
Statement	Response_Options
I enjoy challenging myself with difficult tasks	1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree
I prefer to avoid situations where I might fail	1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree
I am motivated by competition with others	1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree

Semantic Differential

Bipolar adjectives with rating scales between them:

Friendly  ___:___:___:___:___:___:___  Unfriendly
           1   2   3   4   5   6   7

Fast      ___:___:___:___:___:___:___  Slow
           1   2   3   4   5   6   7

Visual Analog

Continuous line where respondents mark their position:

Not at all confident |________________| Extremely confident
                     0                100

4 Step 4: Have Initial Item Pool Reviewed by Experts

Expert review is crucial for establishing content validity and identifying potential problems before data collection.

4.1 Expert Selection Criteria

Subject matter expertise in the construct domain
Experience with scale development or psychometrics
Representation of key stakeholder perspectives
Typically 3-10 experts depending on construct complexity

# Expert review evaluation framework
expert_criteria <- data.frame(
  Criterion = c("Relevance", "Clarity", "Comprehensiveness", "Redundancy", "Bias"),
  Description = c(
    "Does item measure the intended construct?",
    "Is the item clearly worded and unambiguous?", 
    "Do items cover all aspects of the construct?",
    "Are there unnecessary duplicate items?",
    "Are items free from cultural/demographic bias?"
  ),
  Rating_Scale = rep("1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent", 5)
)

knitr::kable(expert_criteria, caption = "Expert Review Evaluation Framework")

Expert Review Evaluation Framework
Criterion	Description	Rating_Scale
Relevance	Does item measure the intended construct?	1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent
Clarity	Is the item clearly worded and unambiguous?	1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent
Comprehensiveness	Do items cover all aspects of the construct?	1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent
Redundancy	Are there unnecessary duplicate items?	1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent
Bias	Are items free from cultural/demographic bias?	1-4 scale: 1=Poor, 2=Fair, 3=Good, 4=Excellent

4.2 Content Validity Ratio (CVR)

Quantify expert agreement on item necessity:

# Calculate Content Validity Ratio
calculate_cvr <- function(n_essential, n_experts) {
  cvr <- (n_essential - (n_experts/2)) / (n_experts/2)
  return(cvr)
}

# Example: 8 experts, varying levels of agreement
expert_data <- data.frame(
  Item = paste0("Item_", 1:6),
  N_Essential = c(8, 7, 6, 5, 4, 2),
  N_Experts = rep(8, 6)
)

expert_data$CVR <- calculate_cvr(expert_data$N_Essential, expert_data$N_Experts)
expert_data$Decision <- ifelse(expert_data$CVR >= 0.75, "Retain", 
                              ifelse(expert_data$CVR >= 0.5, "Revise", "Remove"))

knitr::kable(expert_data, caption = "Content Validity Ratio Results", digits = 2)

Content Validity Ratio Results
Item	N_Essential	N_Experts	CVR	Decision
Item_1	8	8	1.00	Retain
Item_2	7	8	0.75	Retain
Item_3	6	8	0.50	Revise
Item_4	5	8	0.25	Remove
Item_5	4	8	0.00	Remove
Item_6	2	8	-0.50	Remove

5 Step 5: Cognitive Interviewing

Cognitive interviewing uses think-aloud protocols to understand how respondents interpret and process items.

5.1 Cognitive Interview Protocol

Comprehension: “What does this question mean to you?”
Retrieval: “How do you go about answering this?”
Judgment: “How confident are you in your answer?”
Response: “Why did you choose that response option?”

# Common issues identified in cognitive interviews
cognitive_issues <- data.frame(
  Issue_Type = c("Comprehension", "Retrieval", "Judgment", "Response", "Other"),
  Example = c(
    "Unclear technical terms or jargon",
    "Difficulty recalling relevant experiences", 
    "Uncertain about appropriate reference period",
    "Response options don't match experience",
    "Leading or socially desirable responding"
  ),
  Solution = c(
    "Simplify language, add definitions",
    "Provide memory aids or examples",
    "Clarify time frame explicitly", 
    "Expand or modify response options",
    "Reword to reduce bias"
  )
)

knitr::kable(cognitive_issues, caption = "Common Cognitive Interview Findings")

Common Cognitive Interview Findings
Issue_Type	Example	Solution
Comprehension	Unclear technical terms or jargon	Simplify language, add definitions
Retrieval	Difficulty recalling relevant experiences	Provide memory aids or examples
Judgment	Uncertain about appropriate reference period	Clarify time frame explicitly
Response	Response options don’t match experience	Expand or modify response options
Other	Leading or socially desirable responding	Reword to reduce bias

5.2 Sample Size for Cognitive Interviews

5-15 participants typically sufficient
Continue until saturation (no new issues emerge)
Include diverse demographic representation
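
Saturation can be tracked by counting how many previously unseen issues each new interview contributes. The sketch below is only an illustration: the issue codes are hypothetical, and the stopping rule of three consecutive interviews with no new issues is an assumption, not a standard.

```r
# Hypothetical issue codes noted in each successive interview
interview_issues <- list(
  c("jargon_item3", "timeframe_item7"),
  c("jargon_item3", "options_item12"),
  c("timeframe_item7"),
  c("recall_item9"),
  c("jargon_item3"),
  c("options_item12"),
  character(0)
)

# Count issues not seen in any earlier interview
seen <- character(0)
new_per_interview <- integer(length(interview_issues))
for (i in seq_along(interview_issues)) {
  new_issues <- setdiff(interview_issues[[i]], seen)
  new_per_interview[i] <- length(new_issues)
  seen <- union(seen, new_issues)
}

# Saturation rule (an assumption): three consecutive interviews with no new issues
run_needed <- 3
no_new <- new_per_interview == 0
saturated_at <- NA
for (i in seq_along(no_new)) {
  if (i >= run_needed && all(no_new[(i - run_needed + 1):i])) {
    saturated_at <- i
    break
  }
}
print(new_per_interview)
print(saturated_at)
```

In this invented sequence the last new issue appears in the fourth interview, and the rule is met at the seventh.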

6 Step 6: Consider Inclusion of Validation Items

Strategic inclusion of items to assess convergent and discriminant validity.

6.1 Types of Validation Items

validation_types <- data.frame(
  Type = c("Convergent", "Discriminant", "Known Groups", "Criterion"),
  Purpose = c(
    "Should correlate highly with your scale",
    "Should correlate minimally with your scale",
    "Should differentiate between relevant groups", 
    "Should predict important outcomes"
  ),
  Example = c(
    "Existing validated scale measuring same construct",
    "Scale measuring theoretically unrelated construct",
    "Expert vs. novice groups on expertise scale",
    "Job performance for job satisfaction scale"
  )
)

knitr::kable(validation_types, caption = "Types of Validation Items")

Types of Validation Items
Type	Purpose	Example
Convergent	Should correlate highly with your scale	Existing validated scale measuring same construct
Discriminant	Should correlate minimally with your scale	Scale measuring theoretically unrelated construct
Known Groups	Should differentiate between relevant groups	Expert vs. novice groups on expertise scale
Criterion	Should predict important outcomes	Job performance for job satisfaction scale

6.2 Planning Validation Strategy

# Example validation matrix for Academic Self-Efficacy scale
validation_matrix <- data.frame(
  Validation_Scale = c("General Self-Efficacy", "Academic Achievement", 
                      "Test Anxiety", "Social Desirability"),
  Expected_Correlation = c("High Positive (.6-.8)", "Moderate Positive (.3-.5)",
                          "Moderate Negative (-.3 to -.5)", "Low (.0-.3)"),
  Validity_Type = c("Convergent", "Criterion", "Discriminant", "Response Bias")
)

knitr::kable(validation_matrix, caption = "Validation Strategy for Academic Self-Efficacy")

Validation Strategy for Academic Self-Efficacy
Validation_Scale	Expected_Correlation	Validity_Type
General Self-Efficacy	High Positive (.6-.8)	Convergent
Academic Achievement	Moderate Positive (.3-.5)	Criterion
Test Anxiety	Moderate Negative (-.3 to -.5)	Discriminant
Social Desirability	Low (.0-.3)	Response Bias
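
Once data are in hand, the observed correlations can be compared against the planned ranges. The sketch below uses simulated scores, so every variable and effect size in it is hypothetical; the ranges are copied from the validation matrix, with the "Low" range for social desirability treated as an absolute value, which is an assumption.

```r
set.seed(42)
n <- 250
ase <- rnorm(n)  # hypothetical Academic Self-Efficacy scores

# Hypothetical validation measures built to relate to `ase` in the planned directions
scores <- data.frame(
  General_SE = 0.7 * ase + rnorm(n, 0, 0.7),
  Achievement = 0.4 * ase + rnorm(n, 0, 0.9),
  Test_Anxiety = -0.4 * ase + rnorm(n, 0, 0.9),
  Social_Desirability = rnorm(n)
)

# Expected ranges from the validation matrix; the last row is read as |r| <= .3
expected <- data.frame(
  Scale = names(scores),
  Lower = c(0.6, 0.3, -0.5, -0.3),
  Upper = c(0.8, 0.5, -0.3, 0.3)
)
expected$Observed_r <- round(sapply(scores, function(x) cor(ase, x)), 2)
expected$In_Range <- expected$Observed_r >= expected$Lower &
  expected$Observed_r <= expected$Upper
print(expected)
```

Writing the expected ranges down before looking at the data keeps the validation plan a genuine prediction rather than a post hoc description.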

7 Step 7: Administer Items to a Development Sample

7.1 Sample Size Considerations

# Sample size guidelines for scale development
sample_guidelines <- data.frame(
  Analysis_Type = c("Item Analysis", "Exploratory FA", "Confirmatory FA", "IRT Analysis"),
  Minimum_N = c("5-10 per item", "5-10 per item", "10-20 per item", "500-1000+"),
  Recommended_N = c("200+", "300+", "400+", "1000+"),
  Considerations = c(
    "More stable item statistics",
    "Stable factor structure", 
    "Adequate power for fit indices",
    "Stable item parameters"
  )
)

knitr::kable(sample_guidelines, caption = "Sample Size Guidelines by Analysis Type")

Sample Size Guidelines by Analysis Type
Analysis_Type	Minimum_N	Recommended_N	Considerations
Item Analysis	5-10 per item	200+	More stable item statistics
Exploratory FA	5-10 per item	300+	Stable factor structure
Confirmatory FA	10-20 per item	400+	Adequate power for fit indices
IRT Analysis	500-1000+	1000+	Stable item parameters
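
The per-item ratios in the table translate into concrete sample sizes once the size of the item pool is known. A minimal sketch for a 20-item pool follows; the data frame layout is an illustrative assumption, and the ratios are the ones listed above.

```r
# Participants-per-item ranges from the table above, applied to a 20-item pool
n_items <- 20
per_item <- data.frame(
  Analysis = c("Item Analysis", "Exploratory FA", "Confirmatory FA"),
  Ratio_Low = c(5, 5, 10),
  Ratio_High = c(10, 10, 20)
)
per_item$N_Low <- per_item$Ratio_Low * n_items
per_item$N_High <- per_item$Ratio_High * n_items
print(per_item)
```

For 20 items the per-item rule gives 100 to 200 participants for exploratory factor analysis, which falls below the 300+ recommendation in the table, so the larger figure is the safer planning target.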

7.2 Data Collection Best Practices

Data Collection Checklist

✅ Randomize item order (where theoretically appropriate)
✅ Include attention checks to identify careless responding
✅ Balance response options to avoid order effects
✅ Pilot test administration procedures
✅ Plan for missing data handling strategies
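
Alongside attention checks, careless responding can be screened after the fact. One simple index is the longest run of identical consecutive answers per respondent. The sketch below is a minimal illustration: the response matrix is hypothetical and the flagging threshold is an assumption that should be set from your own data.

```r
# Longest run of identical consecutive answers per respondent ("longstring")
longstring <- function(x) max(rle(as.vector(x))$lengths)

# Hypothetical 1-5 responses: rows are respondents, columns are items
# in the order they were presented
responses <- rbind(
  c(4, 2, 5, 3, 4, 1, 2, 4),
  c(3, 3, 3, 3, 3, 3, 3, 3),  # straight-lining
  c(5, 4, 4, 4, 2, 1, 3, 5)
)

run_lengths <- apply(responses, 1, longstring)
# Flagging threshold is an assumption; choose it from your own data
flagged <- which(run_lengths >= 6)
print(run_lengths)
print(flagged)
```

Here only the second respondent is flagged. The index depends on presentation order, so with randomized item order it should be computed on the order each respondent actually saw.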

# Simulate development sample data
set.seed(2024)
n_participants <- 300
n_items <- 20

# Simulate true factor structure
factor1 <- rnorm(n_participants, 0, 1)  # Primary factor
factor2 <- rnorm(n_participants, 0, 0.3)  # Minor method factor

# Create item responses with varying quality
item_loadings <- c(0.8, 0.7, 0.6, 0.75, 0.65,  # Good items
                   0.4, 0.45, 0.35,              # Marginal items  
                   0.2, 0.15,                    # Poor items
                   0.7, 0.8, 0.6, 0.65, 0.7,    # More good items
                   0.3, 0.25, 0.4, 0.45, 0.5)   # Mixed quality

development_data <- matrix(NA, n_participants, n_items)
for(i in 1:n_items) {
  true_score <- item_loadings[i] * factor1 + 0.2 * factor2 + rnorm(n_participants, 0, 0.5)
  # Convert to 5-point scale
  development_data[,i] <- pmax(1, pmin(5, round(true_score + 3)))
}

colnames(development_data) <- paste0("Item_", 1:n_items)
development_df <- as.data.frame(development_data)

# Save simulated data
write.csv(development_df, "development_sample.csv", row.names = FALSE)

# Preview data
head(development_df[,1:6])

8 Step 8: Evaluate the Items

8.1 Initial Examination of Items’ Performance

Start with basic descriptive statistics to identify obvious problems.

# Calculate basic item statistics
item_stats <- describe(development_df)
item_stats$item <- rownames(item_stats)

# Flag potential problems
item_stats$floor_effect <- item_stats$mean <= 1.5
item_stats$ceiling_effect <- item_stats$mean >= 4.5  
item_stats$low_variance <- item_stats$sd < 0.8
item_stats$high_skew <- abs(item_stats$skew) > 2

# Display problematic items
problem_items <- item_stats[item_stats$floor_effect | item_stats$ceiling_effect | 
                           item_stats$low_variance | item_stats$high_skew, 
                           c("item", "mean", "sd", "skew", "floor_effect", 
                             "ceiling_effect", "low_variance", "high_skew")]

if(nrow(problem_items) > 0) {
  knitr::kable(problem_items, caption = "Items with Potential Problems")
} else {
  cat("No items flagged for major distributional problems")
}

Items with Potential Problems
	item	mean	sd	skew	floor_effect	ceiling_effect	low_variance	high_skew
Item_3	Item_3	3.083333	0.7560606	0.0470263	FALSE	FALSE	TRUE	FALSE
Item_6	Item_6	3.046667	0.6526167	-0.1182328	FALSE	FALSE	TRUE	FALSE
Item_7	Item_7	2.993333	0.7268999	-0.0420065	FALSE	FALSE	TRUE	FALSE
Item_8	Item_8	2.960000	0.6479192	-0.1092957	FALSE	FALSE	TRUE	FALSE
Item_9	Item_9	2.983333	0.6359273	0.0135736	FALSE	FALSE	TRUE	FALSE
Item_10	Item_10	3.050000	0.5790374	-0.0012877	FALSE	FALSE	TRUE	FALSE
Item_13	Item_13	3.006667	0.7540008	-0.0575419	FALSE	FALSE	TRUE	FALSE
Item_16	Item_16	3.013333	0.6592452	-0.1535553	FALSE	FALSE	TRUE	FALSE
Item_17	Item_17	3.046667	0.5466427	0.0298134	FALSE	FALSE	TRUE	FALSE
Item_18	Item_18	3.036667	0.6954142	0.0701706	FALSE	FALSE	TRUE	FALSE
Item_19	Item_19	3.056667	0.7270149	-0.0860590	FALSE	FALSE	TRUE	FALSE
Item_20	Item_20	3.026667	0.7171727	0.1237242	FALSE	FALSE	TRUE	FALSE

8.2 Reverse Scoring

Handle reverse-coded items before further analysis:

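# Identify reverse-coded items (for demonstration, assume items 6, 8, 10 are reverse-coded).
# Note: all 20 simulated items were generated with positive loadings, so reversing these
# three makes them run against the rest of the scale in the analyses that follow.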
reverse_items <- c("Item_6", "Item_8", "Item_10")

development_df_scored <- development_df
# Reverse score (for 5-point scale: 6 - original score)
development_df_scored[reverse_items] <- 6 - development_df_scored[reverse_items]

# Verify reversal worked
original_means <- colMeans(development_df[reverse_items])
reversed_means <- colMeans(development_df_scored[reverse_items])

comparison <- data.frame(
  Item = reverse_items,
  Original_Mean = original_means,
  Reversed_Mean = reversed_means,
  Sum_Check = original_means + reversed_means  # Should equal 6
)

knitr::kable(comparison, caption = "Reverse Scoring Verification", digits = 2)

Reverse Scoring Verification
	Item	Original_Mean	Reversed_Mean	Sum_Check
Item_6	Item_6	3.05	2.95	6
Item_8	Item_8	2.96	3.04	6
Item_10	Item_10	3.05	2.95	6
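
The `6 - score` rule is specific to a 1-5 scale. A general version subtracts each response from the sum of the scale's minimum and maximum. The helper below is a minimal sketch; `reverse_score()` is a hypothetical name, not a function from psych.

```r
# Reverse-score any response scale: new = (min + max) - old
reverse_score <- function(x, scale_min = 1, scale_max = 5) {
  stopifnot(all(x >= scale_min & x <= scale_max, na.rm = TRUE))
  (scale_min + scale_max) - x
}

reverse_score(c(1, 2, 3, 4, 5))            # 5-point scale: 5 4 3 2 1
reverse_score(c(1, 4, 7), scale_max = 7)   # 7-point scale: 7 4 1
reverse_score(c(0, 25, 100), 0, 100)       # 0-100 visual analog: 100 75 0
```

The range check stops the function from silently reversing values that fall outside the stated scale, a common sign that the wrong scale bounds were supplied.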

8.3 Item-Scale Correlations

Examine how well each item correlates with the total scale:

# Calculate corrected item-total correlations
total_score <- rowSums(development_df_scored)

item_total_cors <- sapply(development_df_scored, function(x) {
  # Corrected item-total correlation (remove item from total)
  corrected_total <- total_score - x
  cor(x, corrected_total)
})

# Create summary table
item_analysis <- data.frame(
  Item = names(item_total_cors),
  Item_Total_r = item_total_cors,
  Quality = ifelse(item_total_cors >= 0.7, "Excellent",
                  ifelse(item_total_cors >= 0.5, "Good", 
                        ifelse(item_total_cors >= 0.3, "Acceptable", "Poor")))
)

# Sort by correlation
item_analysis <- item_analysis[order(item_analysis$Item_Total_r, decreasing = TRUE),]

knitr::kable(item_analysis, caption = "Item-Total Correlations", digits = 3)

Item-Total Correlations
	Item	Item_Total_r	Quality
Item_12	Item_12	0.792	Excellent
Item_1	Item_1	0.773	Excellent
Item_15	Item_15	0.753	Excellent
Item_4	Item_4	0.745	Excellent
Item_2	Item_2	0.739	Excellent
Item_5	Item_5	0.725	Excellent
Item_11	Item_11	0.704	Excellent
Item_13	Item_13	0.701	Excellent
Item_14	Item_14	0.700	Good
Item_3	Item_3	0.613	Good
Item_7	Item_7	0.610	Good
Item_20	Item_20	0.593	Good
Item_18	Item_18	0.505	Good
Item_19	Item_19	0.496	Acceptable
Item_16	Item_16	0.450	Acceptable
Item_17	Item_17	0.403	Acceptable
Item_9	Item_9	0.344	Acceptable
Item_10	Item_10	-0.158	Poor
Item_8	Item_8	-0.499	Poor
Item_6	Item_6	-0.533	Poor

8.4 Item Variances

# Examine item variances
variances <- sapply(development_df_scored, var)
variance_summary <- data.frame(
  Statistic = c("Mean", "Median", "Min", "Max", "SD"),
  Value = c(mean(variances), median(variances), min(variances), 
           max(variances), sd(variances))
)

knitr::kable(variance_summary, caption = "Item Variance Summary", digits = 3)

Item Variance Summary
Statistic	Value
Mean	0.601
Median	0.549
Min	0.299
Max	1.009
SD	0.199

# Flag low variance items
low_var_threshold <- 0.8
low_variance_items <- names(variances[variances < low_var_threshold])
if(length(low_variance_items) > 0) {
  cat("Items with low variance (<", low_var_threshold, "):", paste(low_variance_items, collapse = ", "))
}

Items with low variance (< 0.8 ): Item_3, Item_4, Item_5, Item_6, Item_7, Item_8, Item_9, Item_10, Item_13, Item_14, Item_15, Item_16, Item_17, Item_18, Item_19, Item_20

8.5 Item Means

# Analyze item means for response bias
means <- colMeans(development_df_scored)
means_summary <- data.frame(
  Statistic = c("Mean", "Median", "Min", "Max", "SD"),
  Value = c(mean(means), median(means), min(means), max(means), sd(means))
)

knitr::kable(means_summary, caption = "Item Means Summary", digits = 3)

Item Means Summary
Statistic	Value
Mean	3.021
Median	3.028
Min	2.950
Max	3.083
SD	0.037

# Visualize item means
means_df <- data.frame(Item = factor(names(means), levels = names(means)), Mean = means)

ggplot(means_df, aes(x = Item, y = Mean)) +
  geom_point() +
  geom_hline(yintercept = 3, linetype = "dashed", color = "red") +  # Scale midpoint
  labs(title = "Item Means Distribution", 
       y = "Mean Response", x = "Item") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

8.6 Dimensionality

Explore the factor structure of your items:

# Parallel analysis to determine number of factors
pa_result <- fa.parallel(development_df_scored, fa = "fa", n.iter = 100)

Parallel analysis suggests that the number of factors =  1  and the number of components =  NA

# Extract eigenvalues
eigenvalues <- pa_result$fa.values
n_factors_pa <- pa_result$nfact

cat("Parallel analysis suggests", n_factors_pa, "factors\n")

Parallel analysis suggests 1 factors

# Exploratory factor analysis
efa_result <- fa(development_df_scored, nfactors = n_factors_pa, rotate = "oblimin")

# Display factor loadings
print(efa_result$loadings, cutoff = 0.3)


Loadings:
        MR1   
Item_1   0.812
Item_2   0.784
Item_3   0.645
Item_4   0.789
Item_5   0.760
Item_6  -0.556
Item_7   0.620
Item_8  -0.518
Item_9   0.339
Item_10       
Item_11  0.751
Item_12  0.822
Item_13  0.730
Item_14  0.738
Item_15  0.803
Item_16  0.467
Item_17  0.413
Item_18  0.537
Item_19  0.529
Item_20  0.621

                 MR1
SS loadings    8.303
Proportion Var 0.415

# Factor analysis summary
fa_summary <- data.frame(
  Factor = paste0("Factor_", 1:n_factors_pa),
  Eigenvalue = efa_result$values[1:n_factors_pa],
  Proportion_Var = efa_result$values[1:n_factors_pa] / ncol(development_df_scored),
  Cumulative_Var = cumsum(efa_result$values[1:n_factors_pa]) / ncol(development_df_scored)
)

knitr::kable(fa_summary, caption = "Factor Analysis Summary", digits = 3)

Factor Analysis Summary
Factor	Eigenvalue	Proportion_Var	Cumulative_Var
Factor_1	8.303	0.415	0.415
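
The logic behind parallel analysis can be seen in a few lines of base R. The sketch below is a simplified version that works on principal-component eigenvalues with simulated one-factor data; it is meant to illustrate the idea, not to reproduce `fa.parallel()`. The data, the 0.7 loadings, and the 95th-percentile benchmark are all assumptions.

```r
set.seed(1)
n <- 300
k <- 8
# Hypothetical one-factor data: every item loads 0.7 on a single factor
f <- rnorm(n)
items <- sapply(1:k, function(j) 0.7 * f + rnorm(n, 0, sqrt(1 - 0.49)))

observed <- eigen(cor(items))$values

# Eigenvalues from random normal data of the same dimensions
n_sims <- 200
random_eigs <- replicate(n_sims, eigen(cor(matrix(rnorm(n * k), n, k)))$values)
threshold <- apply(random_eigs, 1, quantile, probs = 0.95)

# Retain leading components whose eigenvalue exceeds the random benchmark
exceeds <- observed > threshold
n_retain <- if (all(exceeds)) k else which(!exceeds)[1] - 1
print(round(observed, 2))
print(n_retain)
```

Only eigenvalues larger than those expected from pure noise count as evidence of a dimension, which is why this approach is generally preferred to the eigenvalue-greater-than-one rule.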

8.7 Reliability

Calculate internal consistency reliability:

# Cronbach's alpha for full scale
alpha_full <- alpha(development_df_scored)

Some items ( Item_6 Item_8 Item_10 ) were negatively correlated with the total scale and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option

alpha_value <- alpha_full$total$raw_alpha

# McDonald's omega
omega_result <- omega(development_df_scored, plot = FALSE)
omega_value <- omega_result$omega.tot

# Alpha if item deleted
alpha_if_deleted <- alpha_full$alpha.drop$raw_alpha

reliability_summary <- data.frame(
  Measure = c("Cronbach's Alpha", "McDonald's Omega"),
  Value = c(alpha_value, omega_value),
  Interpretation = c(
    ifelse(alpha_value >= 0.9, "Excellent", 
           ifelse(alpha_value >= 0.8, "Good", 
                  ifelse(alpha_value >= 0.7, "Acceptable", "Poor"))),
    ifelse(omega_value >= 0.9, "Excellent", 
           ifelse(omega_value >= 0.8, "Good", 
                  ifelse(omega_value >= 0.7, "Acceptable", "Poor")))
  )
)

knitr::kable(reliability_summary, caption = "Reliability Analysis", digits = 3)

Reliability Analysis
Measure	Value	Interpretation
Cronbach’s Alpha	0.884	Good
McDonald’s Omega	0.935	Excellent

# Split-half estimates and Guttman's lambdas are reported by splitHalf().
# Caution: splitHalf()$raw is empty unless raw = TRUE, and alpha()$total has no
# G2 column. Both return NULL, and data.frame() then silently recycles the
# alpha and omega values into any extra rows.
print(splitHalf(development_df_scored))

# Items that would improve alpha if deleted
# (alpha.drop$raw_alpha is an unnamed vector, so take item names from the row names)
improve_alpha <- rownames(alpha_full$alpha.drop)[alpha_if_deleted > alpha_value]
if(length(improve_alpha) > 0) {
  cat("\nItems that would improve alpha if deleted:", paste(improve_alpha, collapse = ", "))
}
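
To see what Cronbach's alpha measures, it helps to compute it directly from item and total-score variances. The sketch below is a minimal base-R illustration on a small hypothetical data set; `cronbach_alpha()` is a hypothetical helper, not a psych function, and it assumes complete data with all items scored in the same direction.

```r
# Cronbach's alpha from its definition:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical data: 5 respondents x 3 items
toy <- cbind(c(1, 2, 3, 4, 5), c(2, 2, 3, 5, 5), c(1, 3, 3, 4, 4))
print(cronbach_alpha(toy))
```

The formula shows why an item scored in the wrong direction lowers alpha: it shrinks the variance of the total score relative to the sum of the item variances.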

9 Step 9: Optimize Scale Length

9.1 Effect of Scale Length on Reliability

Demonstrate the relationship between scale length and reliability:

# Spearman-Brown formula for reliability prediction
spearman_brown <- function(reliability, length_multiplier) {
  (length_multiplier * reliability) / (1 + (length_multiplier - 1) * reliability)
}

# Test different scale lengths
current_alpha <- alpha_value
length_multipliers <- c(0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3)
predicted_alphas <- sapply(length_multipliers, function(x) spearman_brown(current_alpha, x))

length_analysis <- data.frame(
  Scale_Length = round(ncol(development_df_scored) * length_multipliers),
  Length_Ratio = length_multipliers,
  Predicted_Alpha = predicted_alphas,
  Alpha_Change = predicted_alphas - current_alpha
)

knitr::kable(length_analysis, caption = "Scale Length vs. Reliability", digits = 3)

Scale Length vs. Reliability
Scale_Length	Length_Ratio	Predicted_Alpha	Alpha_Change
5	0.25	0.656	-0.228
10	0.50	0.793	-0.092
15	0.75	0.851	-0.033
20	1.00	0.884	0.000
25	1.25	0.905	0.021
30	1.50	0.920	0.035
40	2.00	0.939	0.054
60	3.00	0.958	0.074
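
The Spearman-Brown formula can also be solved for the length multiplier, which answers the planning question of how many items are needed to reach a target reliability. The sketch below is an illustration under the formula's usual assumption that added or removed items are of comparable quality to the existing ones; `required_multiplier()` is a hypothetical helper, and the target of 0.80 is an arbitrary example.

```r
# Length multiplier needed to reach a target reliability
# (Spearman-Brown formula solved for the multiplier)
required_multiplier <- function(current_rel, target_rel) {
  (target_rel * (1 - current_rel)) / (current_rel * (1 - target_rel))
}

current_items <- 20
current_rel <- 0.884   # alpha reported for the 20-item pool above
k <- required_multiplier(current_rel, target_rel = 0.80)
items_needed <- ceiling(k * current_items)
print(items_needed)
```

This gives 11 items, which agrees with the table: 10 items are predicted to reach 0.793, just short of 0.80.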

# Visualize relationship
ggplot(length_analysis, aes(x = Scale_Length, y = Predicted_Alpha)) +
  geom_line() + geom_point() +
  geom_hline(yintercept = current_alpha, linetype = "dashed", color = "red") +
  labs(title = "Predicted Reliability by Scale Length",
       x = "Number of Items", y = "Predicted Cronbach's Alpha") +
  theme_minimal()

9.2 Effects of Dropping “Bad” Items

Systematically evaluate which items to remove:

# Identify worst performing items
worst_items <- item_analysis[item_analysis$Item_Total_r < 0.3, "Item"]

if(length(worst_items) > 0) {
  # Calculate alpha without worst items
  remaining_items <- setdiff(names(development_df_scored), worst_items)
  alpha_without_worst <- alpha(development_df_scored[remaining_items])$total$raw_alpha
  
  cat("Alpha without worst items (", paste(worst_items, collapse = ", "), "): ", 
      round(alpha_without_worst, 3), "\n")
  cat("Alpha improvement: ", round(alpha_without_worst - alpha_value, 3), "\n")
}

Alpha without worst items ( Item_10, Item_8, Item_6 ):  0.932 
Alpha improvement:  0.047
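A negative or near-zero item-total correlation does not always mean an item is bad. Negatively worded items that were never reverse-scored behave the same way, and the `check.keys` warnings that `alpha()` prints elsewhere in this section suggest that may be the case for Item_6, Item_8, and Item_10. The sketch below shows one way to reverse-score before deciding to drop anything. It assumes a 1-5 response format and uses a small made-up data frame (`toy`), not the chapter's data; `reverse_score()` is a hypothetical helper.

```r
# Reverse-score an item on a bounded response scale (assumed here: 1 to 5).
reverse_score <- function(x, min_value = 1, max_value = 5) {
  (min_value + max_value) - x
}

# Made-up responses for illustration; Item_C is negatively worded
toy <- data.frame(
  Item_A = c(5, 4, 4, 2, 1, 3),
  Item_B = c(4, 5, 3, 2, 1, 3),
  Item_C = c(1, 2, 2, 4, 5, 3)
)

# Correlation of Item_C with the sum of the other items, before and after
rest_score <- toy$Item_A + toy$Item_B
r_before <- cor(toy$Item_C, rest_score)
toy$Item_C <- reverse_score(toy$Item_C)
r_after <- cor(toy$Item_C, rest_score)

cat("Item-rest r before reversing:", round(r_before, 3), "\n")
cat("Item-rest r after reversing: ", round(r_after, 3), "\n")
```

Reversing flips the sign of the correlation without changing its size. The `psych` package's `reverse.code()` function, or the `check.keys = TRUE` option mentioned in the warning, serve the same purpose on real data.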

# Sequential item removal analysis
# Greedy backward elimination: at each step, drop the item whose removal
# raises alpha the most; stop when no removal helps or min_items is reached.
sequential_removal <- function(data, min_items = 5) {
  remaining_items <- names(data)
  removal_history <- data.frame()
  
  while(length(remaining_items) > min_items) {
    current_alpha <- alpha(data[remaining_items])$total$raw_alpha
    
    # Alpha of the scale with each item left out in turn
    alphas_without <- sapply(remaining_items, function(item) {
      alpha(data[setdiff(remaining_items, item)])$total$raw_alpha
    })
    
    # Find item whose removal most improves alpha
    best_removal <- names(which.max(alphas_without))
    best_alpha <- max(alphas_without, na.rm = TRUE)
    alpha_improvement <- best_alpha - current_alpha
    
    if(alpha_improvement <= 0) break  # Stop if no improvement
    
    # Record this step
    removal_history <- rbind(removal_history, data.frame(
      Step = nrow(removal_history) + 1,
      Removed_Item = best_removal,
      Items_Remaining = length(remaining_items) - 1,
      Alpha_Before = current_alpha,
      Alpha_After = best_alpha,
      Improvement = alpha_improvement
    ))
    
    # Remove the item
    remaining_items <- setdiff(remaining_items, best_removal)
  }
  
  return(list(history = removal_history, final_items = remaining_items))
}

# Run sequential removal
removal_results <- sequential_removal(development_df_scored)

Some items ( Item_6 Item_8 Item_10 ) were negatively correlated with the total scale and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option

if(nrow(removal_results$history) > 0) {
  # Inside if(), only the last expression is auto-printed, so print the table explicitly
  print(knitr::kable(removal_results$history, caption = "Sequential Item Removal Analysis", digits = 3))
  
  cat("\nFinal recommended items:", paste(removal_results$final_items, collapse = ", "))
  final_alpha <- alpha(development_df_scored[removal_results$final_items])$total$raw_alpha
  cat("\nFinal scale alpha:", round(final_alpha, 3))
}


Final recommended items: Item_1, Item_2, Item_3, Item_4, Item_5, Item_7, Item_11, Item_12, Item_13, Item_14, Item_15, Item_20
Final scale alpha: 0.936

9.3 Tinkering With Scale Length

Explore optimal scale length through systematic testing:

# Test different combinations of top-performing items
top_items <- item_analysis[order(item_analysis$Item_Total_r, decreasing = TRUE), "Item"]

# Test scales of different lengths using best items
scale_lengths <- c(5, 8, 10, 12, 15)
optimal_analysis <- data.frame()

# Loop variable is n_items rather than length, to avoid shadowing base::length()
for(n_items in scale_lengths) {
  if(n_items <= length(top_items)) {
    selected_items <- top_items[1:n_items]
    scale_alpha <- alpha(development_df_scored[selected_items])$total$raw_alpha
    mean_item_total <- mean(item_analysis[item_analysis$Item %in% selected_items, "Item_Total_r"])
    
    optimal_analysis <- rbind(optimal_analysis, data.frame(
      Scale_Length = n_items,
      Alpha = scale_alpha,
      Mean_Item_Total_r = mean_item_total,
      Alpha_per_Item = scale_alpha / n_items
    ))
  }
}

knitr::kable(optimal_analysis, caption = "Optimal Scale Length Analysis", digits = 3)

Optimal Scale Length Analysis
Scale_Length	Alpha	Mean_Item_Total_r	Alpha_per_Item
5	0.898	0.760	0.180
8	0.925	0.742	0.116
10	0.933	0.724	0.093
12	0.936	0.704	0.078
15	0.935	0.660	0.062

# Recommend optimal length
if(nrow(optimal_analysis) > 0) {
  # Pick the length with the highest alpha; which.max() returns the first maximum,
  # so ties go to the shorter scale (parsimony is not otherwise weighed here)
  optimal_row <- which.max(optimal_analysis$Alpha)
  recommended_length <- optimal_analysis$Scale_Length[optimal_row]
  recommended_alpha <- optimal_analysis$Alpha[optimal_row]
  
  cat("\nRecommended scale length:", recommended_length, "items")
  cat("\nExpected alpha:", round(recommended_alpha, 3))
}


Recommended scale length: 12 items
Expected alpha: 0.936
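Picking the single highest alpha ignores parsimony: in the table above, the 10-item scale is within 0.003 of the 12-item maximum. One possible rule, sketched below, is to take the shortest scale whose alpha falls within a tolerance of the best value. The 0.01 tolerance and the helper name `shortest_within_tolerance()` are assumptions for illustration, and the inputs are copied from the table above.

```r
# Shortest scale whose alpha is within `tolerance` of the maximum alpha.
# The tolerance is a judgment call, not an established cutoff.
shortest_within_tolerance <- function(lengths, alphas, tolerance = 0.01) {
  acceptable <- alphas >= max(alphas) - tolerance
  min(lengths[acceptable])
}

# Values copied from the "Optimal Scale Length Analysis" table
lengths <- c(5, 8, 10, 12, 15)
alphas  <- c(0.898, 0.925, 0.933, 0.936, 0.935)

parsimonious_length <- shortest_within_tolerance(lengths, alphas)
cat("Shortest scale within 0.01 of the best alpha:", parsimonious_length, "items\n")
```

A wider tolerance favors shorter scales and a tolerance of zero reproduces the maximum-alpha choice, so the value should reflect how much reliability you are willing to trade for respondent burden.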

9.4 Split Samples

Split the sample at random into two halves and check that reliability and item statistics replicate across them:

# Split sample for cross-validation
set.seed(123)
sample_size <- nrow(development_df_scored)
split_index <- sample(1:sample_size, size = floor(sample_size * 0.5))

# Development sample (50%)
dev_sample <- development_df_scored[split_index, ]
# Validation sample (50%) 
val_sample <- development_df_scored[-split_index, ]

# Develop scale on first half
dev_alpha <- alpha(dev_sample)$total$raw_alpha

Some items ( Item_6 Item_8 Item_10 ) were negatively correlated with the total scale and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option

dev_item_stats <- describe(dev_sample)

# Validate on second half
val_alpha <- alpha(val_sample)$total$raw_alpha

Some items ( Item_6 Item_8 Item_10 ) were negatively correlated with the total scale and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option

val_item_stats <- describe(val_sample)

# Compare samples
split_comparison <- data.frame(
  Sample = c("Development", "Validation"),
  N = c(nrow(dev_sample), nrow(val_sample)),
  Alpha = c(dev_alpha, val_alpha),
  Mean_Item_Mean = c(mean(dev_item_stats$mean), mean(val_item_stats$mean)),
  Mean_Item_SD = c(mean(dev_item_stats$sd), mean(val_item_stats$sd))
)

knitr::kable(split_comparison, caption = "Split-Sample Cross-Validation", digits = 3)

Split-Sample Cross-Validation
Sample	N	Alpha	Mean_Item_Mean	Mean_Item_SD
Development	150	0.870	3.007	0.768
Validation	150	0.897	3.036	0.762

# Compare alphas descriptively (the 0.05 cutoff below is a rule of thumb, not a significance test)
alpha_difference <- abs(dev_alpha - val_alpha)
cat("\nAlpha difference between samples:", round(alpha_difference, 3))


Alpha difference between samples: 0.027

if(alpha_difference < 0.05) {
  cat("\nGood cross-validation: Alpha values are similar")
} else {
  cat("\nCaution: Large difference in alpha between samples")
}


Good cross-validation: Alpha values are similar
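The comparison above uses all items in both halves. A stricter use of split samples is to make item-selection decisions in the development half only and then check how the selected items perform in the untouched validation half. The sketch below illustrates that idea with simulated data, not the chapter's data: the data-generating setup, the 0.30 item-rest cutoff, and the helper `cronbach_alpha()` are assumptions made so the example runs with base R alone.

```r
# Cronbach's alpha from item variances and total-score variance (base R only)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

# Simulated data (an assumption for illustration): 8 items driven by one
# latent trait plus 4 pure-noise items
set.seed(42)
n <- 300
trait <- rnorm(n)
signal_items <- sapply(1:8, function(i) trait + rnorm(n))
noise_items  <- sapply(1:4, function(i) rnorm(n))
toy <- as.data.frame(cbind(signal_items, noise_items))
names(toy) <- paste0("Item_", 1:12)

# Random 50/50 split
dev_rows <- sample(seq_len(n), size = n / 2)
dev <- toy[dev_rows, ]
val <- toy[-dev_rows, ]

# Select items using the development half only (item-rest correlation >= 0.30)
item_rest_r <- sapply(names(dev), function(item) {
  cor(dev[[item]], rowSums(dev[setdiff(names(dev), item)]))
})
kept_items <- names(item_rest_r)[item_rest_r >= 0.30]

# Evaluate the selected items in both halves
alpha_dev <- cronbach_alpha(dev[kept_items])
alpha_val <- cronbach_alpha(val[kept_items])
cat("Items kept:", paste(kept_items, collapse = ", "), "\n")
cat("Alpha (development):", round(alpha_dev, 3), "\n")
cat("Alpha (validation): ", round(alpha_val, 3), "\n")
```

Because the validation half played no part in choosing the items, its alpha is the less optimistic estimate of how the shortened scale is likely to perform in a new sample.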

10 Knowledge Check Exercises

10.1 Exercise 1: Item Quality Assessment

# Evaluate these items - identify problems
exercise_items <- c(
  "I am happy and satisfied with my life",
  "How often do you feel anxious?",
  "I don't not feel uncomfortable in social situations", 
  "My supervisor provides clear guidance and is also fair in evaluations",
  "I am confident in my abilities"
)

# Students should identify: double-barreled, double negative, etc.

10.2 Exercise 2: Response Format Selection

For each construct, recommend the most appropriate response format and justify your choice:

Pain intensity (medical setting)
Brand preference (consumer research)
Frequency of behaviors (behavioral assessment)
Attitude toward policy (political research)

10.3 Exercise 3: Expert Review Analysis

# Given expert ratings, calculate CVR and make decisions
expert_ratings <- data.frame(
  Item = paste0("Item_", 1:8),
  Essential = c(7, 6, 8, 4, 5, 3, 7, 8),
  Useful = c(1, 2, 0, 3, 2, 4, 1, 0), 
  Unnecessary = c(0, 0, 0, 1, 1, 1, 0, 0)
)

# Calculate CVR for each item (N_experts = 8)
# Make retention decisions
# Which items need revision?
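As a reminder of the mechanics, Lawshe's content validity ratio is CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item "essential" and N is the panel size. The sketch below applies it to a separate made-up panel of 10 experts, so the exercise itself is left to the reader; `cvr()` is a hypothetical helper name.

```r
# Lawshe's content validity ratio: ranges from -1 to 1
cvr <- function(n_essential, n_experts) {
  (n_essential - n_experts / 2) / (n_experts / 2)
}

# Made-up example: 10 experts, three items
example_essential <- c(Item_X = 10, Item_Y = 8, Item_Z = 5)
example_cvr <- cvr(example_essential, n_experts = 10)
print(round(example_cvr, 2))
```

A CVR of 0 means exactly half the panel rated the item essential; retention cutoffs depend on panel size and should be taken from a published critical-value table rather than from this sketch.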

10.4 Exercise 4: Item Analysis Interpretation

# Interpret these item statistics and make recommendations
mystery_items <- data.frame(
  Item = c("A", "B", "C", "D", "E"),
  Mean = c(4.2, 2.1, 3.0, 1.3, 4.8),
  SD = c(0.6, 1.1, 1.2, 0.5, 0.4),
  Item_Total_r = c(0.65, 0.72, 0.43, 0.15, 0.28),
  Alpha_if_deleted = c(0.82, 0.81, 0.83, 0.87, 0.85)
)

# Current alpha = 0.84
# Which items would you: Retain, Revise, or Remove?

10.5 Exercise 5: Scale Optimization

Design a strategy for optimizing a 25-item scale where:

Current alpha = 0.78
8 items have item-total correlations < 0.30
Target alpha = 0.85
Minimum acceptable length = 10 items

10.6 Exercise 6: Item Pool Development

Task: Create a 15-item pool for measuring “Digital Learning Self-Efficacy” - confidence in one’s ability to learn using digital technologies.

Consider:

Theoretical domains (technical skills, learning strategies, troubleshooting)
Target population (college students)
Planned redundancy
Item quality guidelines

11 Summary: Complete Scale Development Process

Steps 1-9 Checklist

✅ Step 1: Construct clearly defined with theoretical foundation
✅ Step 2: Comprehensive item pool generated (3-4x final scale length)
✅ Step 3: Response format selected based on construct and population
✅ Step 4: Expert review completed with content validity assessment
✅ Step 5: Cognitive interviews conducted to refine item wording
✅ Step 6: Validation items strategically included
✅ Step 7: Development sample collected with adequate size
✅ Step 8: Items evaluated through comprehensive psychometric analysis
✅ Step 9: Scale length optimized for reliability and parsimony

Next Steps: Final validation study with independent sample

12 Final Recommendations

Based on the analyses above, here are the key recommendations for your scale:

# Generate final recommendations based on analyses
if(exists("removal_results") && nrow(removal_results$history) > 0) {
  final_items <- removal_results$final_items
  final_alpha <- alpha(development_df_scored[final_items])$total$raw_alpha
  
  cat("FINAL SCALE RECOMMENDATIONS:\n")
  cat("==========================\n")
  cat("Recommended items:", length(final_items), "\n")
  cat("Expected reliability:", round(final_alpha, 3), "\n")
  cat("Items to retain:", paste(final_items, collapse = ", "), "\n")
  
  removed_items <- setdiff(names(development_df_scored), final_items)
  if(length(removed_items) > 0) {
    cat("Items to remove:", paste(removed_items, collapse = ", "), "\n")
  }
} else {
  cat("All items performed adequately. Consider minor refinements based on expert feedback.")
}

FINAL SCALE RECOMMENDATIONS:
==========================
Recommended items: 12 
Expected reliability: 0.936 
Items to retain: Item_1, Item_2, Item_3, Item_4, Item_5, Item_7, Item_11, Item_12, Item_13, Item_14, Item_15, Item_20 
Items to remove: Item_6, Item_8, Item_9, Item_10, Item_16, Item_17, Item_18, Item_19

13 Additional Practice Datasets

# Generate practice dataset for students to work with
set.seed(789)
practice_responses <- matrix(
  sample(1:5, 500, replace = TRUE, prob = c(0.1, 0.2, 0.4, 0.2, 0.1)), 
  nrow = 100, ncol = 5
)
colnames(practice_responses) <- paste0("Practice_Item_", 1:5)

# Save for later exercises
write.csv(practice_responses, "practice_scale_data.csv", row.names = FALSE)

# Basic descriptive statistics
describe(practice_responses)
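Because each response above is drawn independently, the five practice items are uncorrelated by construction, and reliability analyses on them should return an alpha near zero. For exercises that need a scale-like structure, one option is to simulate items that share a latent trait. The sketch below is an illustration under assumed parameters (noise SD of 0.8, fixed cut points); `to_likert()` and `correlated_practice` are hypothetical names.

```r
# Simulate 5 correlated Likert-type items from one latent trait
set.seed(789)
n <- 100
trait <- rnorm(n)

# Cut a continuous score into 1-5 response categories (cut points are assumptions)
to_likert <- function(x) {
  as.integer(cut(x, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)))
}

# Each item = shared trait + item-specific noise, then discretized
correlated_practice <- sapply(1:5, function(i) to_likert(trait + rnorm(n, sd = 0.8)))
colnames(correlated_practice) <- paste0("Practice_Item_", 1:5)

# Average inter-item correlation as a quick check that the items hang together
r_mat <- cor(correlated_practice)
mean_inter_item_r <- mean(r_mat[upper.tri(r_mat)])
cat("Mean inter-item correlation:", round(mean_inter_item_r, 3), "\n")
```

Lowering the noise SD produces more strongly correlated items and therefore a higher alpha, which makes it easy to generate practice sets of varying quality.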

14 Key Takeaways

Critical Success Factors

Theoretical Foundation: Always start with clear construct definition
Iterative Process: Scale development requires multiple rounds of refinement
Sample Size: Invest in adequate sample sizes for stable results
Multiple Indicators: Use various psychometric indices, not just Cronbach’s alpha
Cross-Validation: Always validate findings in independent samples
Practical Considerations: Balance psychometric quality with usability

15 Further Reading

For deeper understanding of scale development principles:

DeVellis & Thorpe (2021): Comprehensive guide to all aspects of scale development
Nunnally & Bernstein (1994): Classic text on psychometric theory
Fabrigar et al. (1999): Guidelines for factor analysis in scale development
Sijtsma (2009): Critical perspective on reliability assessment

DeVellis, R. F., & Thorpe, C. T. (2021). Scale development: Theory and applications (5th ed.). SAGE Publications.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120.

This completes the comprehensive guide to all nine steps of scale development as outlined by DeVellis & Thorpe (2021). The notebook provides both theoretical understanding and hands-on practice with real R code examples.