RAND-12 Datavisualizations
Overview
This document demonstrates how to create various visualizations of
the RAND12 instrument using ggplot2 in
R.
It also serves as a user manual for an accompanying clean R
script that you can download separately.
Each section includes an explanation followed by the corresponding
code.
Click the Code button in the bottom right corner to
display the code for each step/plot we are making.
1. Setup
Installing and Loading Packages
Before we start, we need to install and load the required packages to
run the script.
The code below creates a list of the names of all packages we
need.
The next part checks if you have these packages installed, and if not,
installs them—before they are loaded into R.
packages <- c(
"ggplot2", "dplyr", "tidyr", "ggsci", "ggrain",
"viridis", "hrbrthemes", "ggsignif"
)
new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if(length(new.packages)) install.packages(new.packages)
# Load all the packages without printing the output
invisible(lapply(packages, library, character.only = TRUE))Defining a Custom Theme
The default theme of ggplot has a gray background designed to make graphical elements “pop” out. For a cleaner look, we define our own custom theme that we will use for all figures. By defining theme_nice(), we can use this function in all our plots by adding + theme_nice(), saving time and ensuring consistency.
Defining Dimension and Response Level Labels
To easily add dimension and response level labels/names to our plots, we create vectors of dimension labels and response level labels. Below we specify both English and Norwegian versions, so you can switch between them as needed.
# English labels
english_labels <- c(
PF = "Physical Functioning",
RP = "Role-Physical",
BP = "Bodily Pain",
GH = "General Health",
VT = "Vitality",
SF = "Social Functioning",
RE = "Role-Emotional",
MH = "Mental Health",
physical_sum = "PSC-12 (Physical Total Score)",
mental_sum = "MSC-12 (Mental Total Score)"
)
# Norwegian labels
norwegian_labels <- c(
PF = "Fysisk funksjon",
RP = "Fysisk rollefunksjon",
BP = "Kroppssmerter",
GH = "Generell helse",
VT = "Vitalitet",
SF = "Sosial funksjon",
RE = "Emosjonell rollefunksjon",
MH = "Mental helse",
physical_sum = "PSC-12 (Fysisk Totalskåre)",
mental_sum = "MSC-12 (Mental Totalskåre)"
)2. Loading Your Own Data (Optional)
If you have your own RAND12 dataset, you can use the code below to load it here. Your data should be in standard wide format—each RAND-12 dimension must be stored as its own variable (column). The dataset you upload must already include correctly scored RAND-12 variables. The summary scores should be pre-scored according to the official RAND-12 algorithm (e.g., standardized T-scores with mean = 50, SD = 10). This script does not perform scoring – it only visualizes existing scores.
| Variable | Description | Scale (numeric, theoretical range) | Note |
|---|---|---|---|
PF |
Physical Functioning | 0–100 | Higher scores = better physical functioning |
RP |
Role-Physical | 0–100 | Higher scores = fewer limitations in physical roles |
BP |
Bodily Pain | 0–100 | Higher scores = less pain |
GH |
General Health | 0–100 | Higher scores = better perceived general health |
VT |
Vitality | 0–100 | Higher scores = more energy, less fatigue |
SF |
Social Functioning | 0–100 | Higher scores = better ability to engage socially |
RE |
Role-Emotional | 0–100 | Higher scores = fewer emotional role limitations |
MH |
Mental Health | 0–100 | Higher scores = better psychological well-being |
physical_sum |
Physical Component Summary (PSC-12) | Typically ~0–100 (T-score) | Higher scores = better overall physical health; must be pre-scored |
mental_sum |
Mental Component Summary (MSC-12) | Typically ~0–100 (T-score) | Higher scores = better overall mental health; must be pre-scored |
Below are commented code examples for loading data from different file formats. Uncomment the relevant lines and modify the file path as needed.
# For SPSS (.sav):
# library(haven)
# df <- read_sav("your_data.sav") %>%
# mutate(across(where(is.labelled), as_factor))
# For Stata (.dta):
# library(haven)
# df <- read_dta("your_data.dta")
# For Excel (.xls/.xlsx):
# library(readxl)
# df <- read_excel("your_data.xlsx")
# For CSV (.csv):
# library(readr)
# df <- read_csv("your_data.csv")3. Data Simulation (skip if using own data)
In order to have some data to showcase the various ways to visualize the RAND-12 instrument, we simulate a dataset of 3,500 responses across seven years (2019–2025). The dataset includes demographic variables such as gender, age group, education, treatment status, and diagnosis. We also include linear year effects and predefined group-level effects to mimic realistic variation in physical and mental health domains. If you’re using your own RAND-12 data, you can skip this section.
set.seed(123)
n_per_year <- 500
years <- 2019:2025
n_total <- n_per_year * length(years)
df <- data.frame(
year = rep(years, each = n_per_year),
gender = sample(c("Male", "Female"), n_total, replace = TRUE, prob = c(0.48, 0.52)),
age_group = sample(c("<30", "30-39", "40-49", "50-59", "60-69", "70+"), n_total, replace = TRUE,
prob = c(0.15, 0.2, 0.2, 0.2, 0.15, 0.1)),
education = sample(c("Low", "Medium", "High"), n_total, replace = TRUE, prob = c(0.3, 0.4, 0.3)),
treatment_status = sample(c("In treatment", "No treatment", "Unknown"), n_total, replace = TRUE,
prob = c(0.6, 0.3, 0.1))
)
diagnosis_levels <- paste("diagnosis", 1:15)
df$diagnosis <- sample(diagnosis_levels, n_total, replace = TRUE)
df$diagnosis <- factor(df$diagnosis, levels = diagnosis_levels)
diagnosis_effect_physical <- setNames(seq(10, -10, length.out = 15), diagnosis_levels)
diagnosis_effect_mental <- setNames(seq(8, -8, length.out = 15), diagnosis_levels)
df <- df %>%
rowwise() %>%
mutate(age_num = case_when(
age_group == "<30" ~ runif(1, 18, 29),
age_group == "30-39" ~ runif(1, 30, 39),
age_group == "40-49" ~ runif(1, 40, 49),
age_group == "50-59" ~ runif(1, 50, 59),
age_group == "60-69" ~ runif(1, 60, 69),
age_group == "70+" ~ runif(1, 70, 85)
)) %>%
ungroup()
df$age_group <- factor(df$age_group, levels = c("<30", "30-39", "40-49", "50-59", "60-69", "70+"))
df$education <- factor(df$education, levels = c("Low", "Medium", "High"))
df$treatment_status <- factor(df$treatment_status, levels = c("In treatment", "No treatment", "Unknown"))
df$gender <- factor(df$gender, levels = c("Male", "Female"))
age_effect_physical <- c("<30" = 0, "30-39" = -2, "40-49" = -4, "50-59" = -6, "60-69" = -8, "70+" = -10)
gender_effect_physical <- c("Male" = 0, "Female" = -10)
edu_effect <- c("Low" = -5, "Medium" = 0, "High" = 10)
treatment_effect_physical <- c("In treatment" = 7, "No treatment" = -4, "Unknown" = 0)
df <- df %>% mutate(year_effect = (year - 2019) * 1.5)
df <- df %>%
mutate(
PF = 60 + age_effect_physical[age_group] + gender_effect_physical[gender] +
edu_effect[education] + treatment_effect_physical[treatment_status] +
diagnosis_effect_physical[diagnosis] + year_effect + rnorm(n_total, 0, 5),
RP = 55 + age_effect_physical[age_group] + gender_effect_physical[gender] +
edu_effect[education] + treatment_effect_physical[treatment_status] +
diagnosis_effect_physical[diagnosis] + year_effect + rnorm(n_total, 0, 5),
BP = 35 + age_effect_physical[age_group] + gender_effect_physical[gender] +
edu_effect[education] + treatment_effect_physical[treatment_status] +
diagnosis_effect_physical[diagnosis] + year_effect + rnorm(n_total, 0, 5),
GH = 40 + age_effect_physical[age_group] + gender_effect_physical[gender] +
edu_effect[education] + treatment_effect_physical[treatment_status] +
diagnosis_effect_physical[diagnosis] + year_effect + rnorm(n_total, 0, 5)
)
gender_effect_mental <- c("Male" = 0, "Female" = -10)
treatment_effect_mental <- c("In treatment" = 5, "No treatment" = -6, "Unknown" = 0)
df <- df %>%
mutate(
VT = 56 + gender_effect_mental[gender] + edu_effect[education] +
treatment_effect_mental[treatment_status] + diagnosis_effect_mental[diagnosis] +
year_effect + rnorm(n_total, 0, 5),
SF = 40 + gender_effect_mental[gender] + edu_effect[education] +
treatment_effect_mental[treatment_status] + diagnosis_effect_mental[diagnosis] +
year_effect + rnorm(n_total, 0, 5),
RE = 35 + gender_effect_mental[gender] + edu_effect[education] +
treatment_effect_mental[treatment_status] + diagnosis_effect_mental[diagnosis] +
year_effect + rnorm(n_total, 0, 5),
MH = 60 + gender_effect_mental[gender] + edu_effect[education] +
treatment_effect_mental[treatment_status] + diagnosis_effect_mental[diagnosis] +
year_effect + rnorm(n_total, 0, 5)
)
df <- df %>%
mutate(across(c(PF, RP, BP, GH, VT, SF, RE, MH), ~ pmax(pmin(., 100), 0)))
df <- df %>%
mutate(
physical_sum = rowMeans(select(., PF, RP, BP, GH), na.rm = TRUE),
mental_sum = rowMeans(select(., VT, SF, RE, MH), na.rm = TRUE)
)
df$year_effect <- NULL4. Data Preparation
Here, we convert our dataset from a wide format (where each RAND-12 subscale is stored in its own column) to a long format. This transformation is necessary to efficiently create faceted plots in ggplot2, allowing us to loop over subscales in a tidy, structured way.
Below, you can preview how the data looks before and after pivoting:
df_long <- df %>%
pivot_longer(
cols = c(PF, RP, BP, GH, VT, SF, RE, MH, physical_sum, mental_sum),
names_to = "subscale",
values_to = "score"
)
# View data before pivot
head(df)## # A tibble: 6 × 17
## year gender age_group education treatment_status diagnosis age_num PF
## <int> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl>
## 1 2019 Female 30-39 Low In treatment diagnosis 7 34.0 58.7
## 2 2019 Male 50-59 High In treatment diagnosis 8 56.7 74.7
## 3 2019 Female 60-69 Medium In treatment diagnosis 2 61.6 64.9
## 4 2019 Male 30-39 Low In treatment diagnosis 8 37.8 70.2
## 5 2019 Male 50-59 Medium In treatment diagnosis 6 58.6 74.6
## 6 2019 Female 50-59 Low No treatment diagnosis 4 52.8 46.4
## # ℹ 9 more variables: RP <dbl>, BP <dbl>, GH <dbl>, VT <dbl>, SF <dbl>,
## # RE <dbl>, MH <dbl>, physical_sum <dbl>, mental_sum <dbl>
## # A tibble: 6 × 9
## year gender age_group education treatment_status diagnosis age_num subscale
## <int> <fct> <fct> <fct> <fct> <fct> <dbl> <chr>
## 1 2019 Female 30-39 Low In treatment diagnosis 7 34.0 PF
## 2 2019 Female 30-39 Low In treatment diagnosis 7 34.0 RP
## 3 2019 Female 30-39 Low In treatment diagnosis 7 34.0 BP
## 4 2019 Female 30-39 Low In treatment diagnosis 7 34.0 GH
## 5 2019 Female 30-39 Low In treatment diagnosis 7 34.0 VT
## 6 2019 Female 30-39 Low In treatment diagnosis 7 34.0 SF
## # ℹ 1 more variable: score <dbl>
5. The PSC-12 and MSC-12 summary scores
Having followed the steps above, we are now ready to create data visualizations. We will start by plotting the PSC-12 and MSC-12 summary dimensions. Due to the transformation of the data into long format above, we can now easily plot these side by side using facets - meaning that the plotting area is subdivided into two panels representing each summary dimension.
PLOT 1: Density plots
How it works:
filter(subscale %in% c("physical_sum", "mental_sum"))selects only the PSC-12 and MSC-12 summary scales for plottinggeom_densityadds the density curve for each scalefill = "#A4A9AD"manually sets colour - change as you like- for Norwegian labels, change
as_labeller(english_labels)toas_labeller(norwegian_labels)
p1 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = score)) +
geom_density(fill = "#A4A9AD", alpha = 0.8, color = "white") +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) uncomment to get Norwegian labels
labs(title = "Distribution of Summary Scores",
x = "Score", y = "Density") +
theme_nice()
# ggsave("p1_summary_density.pdf", p1, width = 13, height = 8)
p1PLOT 2: Density plots with group comparisons.
Plot 1 can easily be adapted to add group comparisons. Note that all comparisons respect the faceting, and are thus performed within each facet
How it works:
- Within the global aesthetic mapping
aes(), we addfill = educationto compare groups by using colour of the curve. scale_fill_uchicago()defines the fill palette. writescale_fill_to see other options based on the packages loaded.
p2 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = score, fill = education)) +
geom_density(alpha = 0.7, color = NA) +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
scale_fill_uchicago() +
labs(title = "Summary Scores by Education",
x = "Score", y = "Density", fill = "Education") +
theme_nice()
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) # optional
# ggsave("p2_summary_density_education.pdf", p2, width = 13, height = 8)
p2PLOT 3: Violin + Boxplot
Boxplots can be handy as they provide some summary statistics (median, IQR, outliers), but may miss important aspects of the data distribution—such as skewness or uneven spread. Combining boxplots with violin plots adds a density curve, helping to visualize the overall shape and variability more clearly. Notice that ggplot works like a blank canvas, allowing us to layer multiple geometries.
How it works:
geom_violin(alpha = 0.5, color = NA)draws a smooth density shape for each group; alpha = 0.5 makes the fill semi-transparent so elements beneath (like the boxplot) remain visiblegeom_boxplot(width = 0.2, color = "black", alpha = 0.8) overlays a narrower boxplot;width = 0.2controls how wide the box is (smaller width keeps it centered inside the violin), andalpha = 0.8makes it slightly transparent for better visual layering without losing clarity
p3 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = education, y = score, fill = education)) +
geom_violin(alpha = 0.5, color = NA) +
geom_boxplot(width = 0.2, color = "black", alpha = 0.8) +
scale_fill_uchicago() +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Summary Scores by Education",
x = "Education", y = "Score", fill = "Education") +
theme_nice()
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) # optional
# ggsave("p3_summary_box_violin.pdf", p3, width = 13, height = 8)
p3Plot 3.2: Group Comparisons with Significance Tests
Plots like the one above, where the focus is on comparing groups, can
also incorporate formal significance testing directly within
ggplot2().
The ggsignif package is an add-on for ggplot that enables
the addition of significance brackets. In the example below, we specify
pairwise comparisons between all levels of education. Due to the
faceting, each set of comparisons is performed separately within each
summary scale.
How it works:
geom_signif()adds statistical comparison brackets directly to the plot
comparisons = list(...)defines the specific group pairs to test (e.g., Low vs Medium)
test = "t.test"runs a Student’s t-test for each comparison; you can also use"wilcox.test"for non-parametric data or other options like"anova"or"kruskal.test"depending on the context
map_signif_level = TRUEcontrols how p-values are displayed (e.g., , , )
step_increase = 0.1staggers the brackets vertically to prevent overlap
Tip: type ?geom_signif in the console to learn more
about customization options.
p3_2 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = education, y = score, fill = education)) +
geom_violin(alpha = 0.5, color = NA) +
geom_boxplot(width = 0.2, color = "black", alpha = 0.8) +
geom_signif(comparisons = list(c("Low", "Medium"),
c("Low", "High"),
c("Medium", "High")),
test = "t.test",
map_signif_level = T,
step_increase = 0.1) +
scale_fill_uchicago() +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Summary Scores by Education",
x = "Education", y = "Score", fill = "Education") +
theme_nice()
p3_2Plot 4: Histogram of MSC-12
While the previous plots compared both summary scores side by side,
we can also narrow the focus to a single dimension.
Here, we filter the dataset to include only the MSC-12 (mental summary
score), and use a histogram to show the distribution of raw values.
How it works:
filter(subscale == "mental_sum")selects only the MSC-12 scores for plotting
geom_histogram()creates a histogram showing how many observations fall into each bin
binwidth = 2controls the size of each bin on the x-axis — smaller values give more detail, larger ones smooth the shape
fill = "#A4A9AD"sets the bar color, andcolor = "white"outlines each bar for clarity
p4 <- ggplot(df_long %>% filter(subscale == "mental_sum"),
aes(x = score)) +
geom_histogram(binwidth = 2, fill = "#A4A9AD", color = "white") +
labs(title = "Distribution of MSC-12",
x = "Score", y = "Count") +
theme_nice()
# ggsave("p4_hist_mental_sum.pdf", p4, width = 13, height = 8)
p4Plot 5: Raincloud Plot by Gender
After briefly focusing on the MSC-12 score alone, we now return to
comparing both summary scores—this time grouped by gender.
To do so, we use a raincloud plot, which combines a density
curve, boxplot, and raw data points into one compact and informative
visualization.
How it works:
geom_rain()creates the raincloud plot by layering three elements: a density curve, jittered points, and a boxplot
- The density curve is estimated from the data using
kernel density estimation, much like
geom_violin(). You can adjust its appearance with arguments such as:adjustto control smoothing (higher = smoother curve)scaleto control width scaling (e.g., by area or count)sideto specify which side of the axis the curve appears on
point.argscontrols the appearance of the raw data points; with large datasets, a lowalpha(e.g., 0.054) avoids overplotting. For smaller samples, increasealphaso points remain visible
boxplot.argsmodifies the boxplot layer — e.g.,width = 0.1keeps it compact, andalpha = 0.9ensures it’s clearly visible
facet_wrap(~ subscale)separates PSC-12 and MSC-12 into their own panels
- Both
fillandcolorare mapped togender, and styled using theuchicagocolor palette
Raincloud plots are especially useful when you want to show
both summary statistics and the full distribution,
including the raw data points.
They are ideal for medium to large samples where standard boxplots or
violins might hide important details such as spread, outliers, or uneven
distributions.
Compared to boxplots alone, rainclouds give a richer and more
transparent view of the data—particularly helpful in reports and
presentations.
Note: geom_rain() comes from the ggdist
package, which includes many other useful geoms for uncertainty and
distribution visualization. Type ?ggdist or visit https://mjskay.github.io/ggdist
to learn more.
p5 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = gender, y = score, fill = gender, color = gender)) +
geom_rain(alpha = 0.7,
point.args = list(alpha = 0.05, size = 3),
boxplot.args = list(color = "black", alpha = 0.6, width = 0.1)) +
scale_fill_uchicago() +
scale_color_uchicago() +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Summary Scores by Gender",
x = "Gender", y = "Score") +
theme_nice()
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) # optional Norwegian labels
# ggsave("p5_raincloud_summary_gender_facet.pdf", p5, width = 13, height = 8)
p5Plot 6: Raincloud Plot by Treatment Status with Group Means
We now apply the same raincloud approach to a different grouping
variable: treatment status.
In addition to visualizing the full distribution, we enhance
interpretability by overlaying a reference line for each group’s mean
score.
This makes it easier to visually compare average scores across groups
within each summary scale.
How it works:
geom_rain()works as before, combining raw points, boxplots, and density curves
boxplot.args.posnudges the boxplots slightly to the side for better separation between overlapping elements
stat_summary()adds dashed reference lines for the group means usinggeom = "hline"
yintercept = ..y..extracts the mean value from each group and applies it within each facet
theme(legend.position = "none")removes the legend — this can be useful when group labels are already shown on the x-axis and removing the legend gives a cleaner visual appearance
facet_wrap(~ subscale)again splits the plot into PSC-12 and MSC-12 panels
Tip: If your data is skewed or contains outliers, you can replace
fun = mean with fun = median to show group
medians instead.
p6 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = treatment_status, y = score,
fill = treatment_status, color = treatment_status)) +
geom_rain(
point.args = list(alpha = 0.04, size = 3),
boxplot.args = list(color = "black", outlier.shape = NA, alpha = 0.7, width = 0.1),
boxplot.args.pos = list(position = ggpp::position_dodgenudge(x = 0.08, width = 0.12)),
violin.args = list(alpha = 0.6)
) +
stat_summary(fun = mean, geom = "hline",
aes(yintercept = ..y.., group = treatment_status, color = treatment_status),
linewidth = 0.4, linetype = "dashed", show.legend = FALSE) +
scale_fill_uchicago() +
scale_color_uchicago() +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Summary Scores by Treatment Status (with Group Means)",
x = "Treatment Status", y = "Score", colour = "", fill = "") +
theme_nice() +
theme(legend.position = "none")
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) # optional Norwegian labels
# ggsave("p6_treatment_raincloud_ref.pdf", p6, width = 13, height = 8)
p6Plot 7: Boxplot by Age Group
While raincloud plots offer rich visual detail, there are situations
where a more compact summary is preferred—especially when the number of
groups increases.
Here, we use a standard boxplot to compare summary scores across age
groups, focusing on medians and variability within each group.
Boxplots are space-efficient and easy to interpret, great for getting a quick overview of score distributions across multiple categories.
How it works:
geom_boxplot()displays the distribution of scores within each age group using medians, quartiles, and potential outliers
alpha = 0.7makes the fill semi-transparent for a softer appearance
scale_fill_viridis_d(option = "cividis")applies a perceptually uniform, colorblind-friendly palette
p7 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = age_group, y = score, fill = age_group)) +
geom_boxplot(color = "black", alpha = 0.7) +
scale_fill_viridis_d(option = "cividis") +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Summary Scores by Age Group",
x = "Age Group", y = "Score") +
theme_nice()
# facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels)) # optional
# ggsave("p7_boxplot_age_summary.pdf", p7, width = 13, height = 8)
p7Plot 8: Trend Over Time by Gender (Observed Means)
In the next set of plots, we shift focus from cross-sectional group
differences to changes over time.
Visualizing trends can reveal whether health scores improve, decline, or
remain stable across years for different groups.
We start with a simple approach by plotting the observed group means for each year. Here, the PSC-12 and MSC-12 scores are shown over time for males and females.
How it works:
stat_summary(fun = mean, geom = "line")draws a line connecting the yearly group means
stat_summary(fun = mean, geom = "point")adds larger points to mark each mean explicitly
color = gendermaps group membership to color, using theuchicagopalette for clarity
facet_wrap(~ subscale)splits the plot by summary scale
p8 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = year, y = score, color = gender)) +
stat_summary(fun = mean, geom = "line", linewidth = 1) +
stat_summary(fun = mean, geom = "point", size = 5) +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
scale_color_uchicago() +
labs(title = "Summary Scores Over Time by Gender",
x = "Year", y = "Score") +
theme_nice()
# ggsave("p8_trend_gender_summary.pdf", p8, width = 13, height = 8)
p8Plot 9: Loess Trend by Age Group and Gender (Summary Scores)
This plot extends our trend analysis by showing how summary scores
evolve over time across both age group and
gender.
By faceting by gender and using color to distinguish age groups, we can
observe interaction patterns—such as whether trends differ by age within
each gender.
How it works:
filter(subscale %in% c("physical_sum", "mental_sum"))limits the plot to the two summary scores
geom_smooth(method = "loess", se = TRUE)draws smooth, non-linear trends with shaded confidence intervals
color = age_groupassigns different colors to age groups
facet_grid(rows = vars(gender), cols = vars(subscale))creates a two-dimensional panel layout—one column per summary score, and one row per gender
labeller = labeller(subscale = english_labels)applies clean labels to the facet strips
coord_cartesian(ylim = c(20, 75))constrains the y-axis to a practical range
guides(colour = guide_legend(override.aes = list(fill = NA)))overrides the default legend appearance by removing any fill color in legend keys associated with color. This prevents the legend from showing a gray background, resulting in a clean, transparent background for legend items.
p9 <- ggplot(df_long %>% filter(subscale %in% c("physical_sum", "mental_sum")),
aes(x = year, y = score, color = age_group)) +
geom_smooth(method = "loess", se = TRUE, alpha = 0.4, linewidth = 0.8) +
facet_grid(rows = vars(gender), cols = vars(subscale),
labeller = labeller(subscale = english_labels)) +
scale_color_uchicago() +
labs(title = "Summary Scores Over Time by Age Group and Gender (Loess)",
x = "Year", y = "Score", colour = "Age groups") +
coord_cartesian(ylim = c(20, 75)) +
theme_nice() +
guides(colour = guide_legend(override.aes = list(fill = NA)))
# facet_grid(rows = vars(gender), cols = vars(subscale),
# labeller = labeller(subscale = norwegian_labels)) # optional Norwegian labels
# ggsave("p9_trend_loess_age_gender_summary.pdf", p9, width = 13, height = 8)
p9## `geom_smooth()` using formula = 'y ~ x'
Subdomain-Level Visualizations
Up to this point, we’ve focused on the two summary scores of the
RAND-12 instrument: PSC-12 and MSC-12.
In the following plots, we explore selected subdomains that make up the
summary scores, allowing for a more detailed understanding of patterns
across specific health dimensions.
We start with two mental health-related subdomains and compare their
distributions across diagnostic groups, using both boxplots and
raincloud plots.
We then close the guide with a loess-based trend visualization covering
all eight subdomains.
Plot 10: Boxplot by Diagnosis (Mental Subdomains)
This plot shows how two specific mental health subdomains—Vitality
(VT) and Social Functioning (SF)—vary across diagnostic groups.
We use a standard boxplot for each group and subdomain, with flipped
coordinates for easier reading.
How it works:
filter(subscale %in% c("VT", "SF"))selects only the relevant mental health subdomains for plotting
aes(x = diagnosis, y = score)maps the diagnostic categories to the x-axis and score to the y-axis
geom_boxplot()visualizes the distribution within each group, including median, quartiles, and potential outliers
coord_flip()rotates the plot for better readability of long diagnosis labels
theme(legend.position = "none")removes the legend, as the groupings are already shown on the x-axis
p10 <- ggplot(df_long %>% filter(subscale %in% c("VT", "SF")),
aes(x = diagnosis, y = score, fill = diagnosis)) +
geom_boxplot(color = "black", alpha = 0.8) +
scale_fill_viridis_d(option = "cividis") +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Mental Health Subdomains by Diagnosis",
x = "Diagnosis", y = "Score") +
theme_nice() +
coord_flip() +
theme(legend.position = "none")
# ggsave("p10_boxplot_diagnosis_mental.pdf", p9, width = 13, height = 8)
p10Plot 11: Raincloud Plot by Diagnosis (Mental Subdomains)
To complement the boxplot above, we now use a raincloud plot to show
the same subdomains—Vitality and Social Functioning—but with greater
detail.
This visualization combines individual data points, smoothed density
shapes, and boxplots, offering a richer look at how scores are
distributed within each diagnostic group.
How it works:
filter(subscale %in% c("VT", "SF"))again restricts the data to the two mental health subdomains
geom_rain()overlays three layers: raw points (point.args), boxplots (boxplot.args), and density curves
point.args = list(alpha = 0.05)ensures that individual points are shown without overwhelming the plot
geom_hline(yintercept = 50)adds a dashed reference line at 50, useful if scores are standardized or normed
coord_flip()rotates the plot horizontally to improve label readability
theme(legend.position = "none")hides the legend since the x-axis already shows diagnostic groups
p11 <- ggplot(df_long %>% filter(subscale %in% c("VT", "SF")),
aes(x = diagnosis, y = score, fill = diagnosis)) +
geom_rain(alpha = 0.6,
point.args = list(alpha = 0.05, shape = 21, size = 1.6),
boxplot.args = list(color = "black", alpha = 0.8)) +
scale_fill_viridis_d(option = "cividis") +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
labs(title = "Mental Subdomains by Diagnosis (Raincloud)",
x = "Diagnosis", y = "Score") +
theme_nice() +
coord_flip() +
geom_hline(yintercept = 50, linetype = "dashed") +
theme(legend.position = "none")
p11Plot 12: Loess Trend by Education (All Subdomains)
To conclude the guide, we plot trends over time for all eight RAND-12
subdomains.
This visualization helps reveal how physical and mental health domains
vary by education level across survey years.
We use a loess smoother to highlight overall patterns without assuming a linear trend, and we restrict the y-axis to focus on the range where most scores are expected to fall.
How it works:
filter(subscale %in% ...)selects all eight subdomains for plotting
(This step isn’t strictly necessary if your dataset already contains only these subscales, but it’s useful for clarity)
geom_smooth(method = "loess", se = TRUE)adds a nonparametric trend line for each education group, with a shaded confidence band
color = educationdistinguishes groups visually using theuchicagopalette
facet_wrap(~ subscale)shows one panel per subdomain, making it easy to compare patterns across dimensions
coord_cartesian(ylim = c(0, 80))keeps the y-axis fixed between 0 and 80 for all panels
p12 <- ggplot(df_long %>% filter(subscale %in% c("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")),
aes(x = year, y = score, color = education)) +
geom_smooth(method = "loess", se = TRUE, alpha = 0.5, linewidth = 0.8) +
facet_wrap(~ subscale, labeller = as_labeller(english_labels)) +
scale_color_jama() +
labs(title = "Subdomain Scores Over Time by Education",
x = "Year", y = "Score", colour = "Education") +
coord_cartesian(ylim = c(20, 80)) +
theme_nice() +
guides(colour = guide_legend(override.aes = list(fill = NA)))
# ggsave("p12_trend_loess_subdomains.pdf", p12, width = 13, height = 8)
p12## `geom_smooth()` using formula = 'y ~ x'
Final Notes: Customizing Your ggplot Visualizations
All plots in this guide use ggplot2
with a consistent clean style via the custom theme
theme_nice().
Grouped comparisons are shown using color fills or outlines, and many
aesthetic choices are fully customizable.
Below are some helpful tips and resources for adapting the
visualizations to your own needs.
Themes
We use theme_nice() throughout the guide for a clean and
readable look.
To switch to another built-in ggplot theme, simply replace it with:
theme_bw()– basic black and white theme
theme_classic()– minimal axes with no grid
theme_minimal()– clean layout with light grid lines
theme_gray()– default ggplot theme
theme_void()– completely blank (good for custom plots or maps)
Color Palettes
Most plots use the UChicago palette from the
ggsci package for readability.
You can change the palette depending on your audience or style
preferences.
From ggsci
scale_fill_uchicago()
scale_color_uchicago()
Other options include:
scale_fill_npg()scale_fill_lancet()
scale_fill_jco()
scale_fill_nejm()
scale_fill_d3(),scale_fill_simpsons(),scale_fill_tron(),scale_fill_futurama(), etc.
(Use the corresponding scale_color_...() versions for
outlines)
From viridis (colorblind-friendly)
scale_fill_viridis_d(option = \"cividis\")
scale_fill_viridis_d(option = \"magma\")
scale_fill_viridis_d(option = \"plasma\")
scale_fill_viridis_d(option = \"inferno\")
scale_fill_viridis_d(option = \"viridis\")
Manually Setting Colors
To set exact colors, use scale_fill_manual() and
scale_color_manual().
Examples:
scale_fill_manual(values = c("Low" = "#E41A1C", "Medium" = "#377EB8", "High" = "#4DAF4A"))
scale_color_manual(values = c("Male" = "#984EA3", "Female" = "#FF7F00"))Color Resources
- colorbrewer2.org – for
palettes by data type
- coolors.co – for custom palette
building
- r-graph-gallery.com – ggplot palette examples
Facet Labels Language
All plots use English labels by default.
To switch to Norwegian, update the labeller line like this:
facet_wrap(~ subscale, labeller = as_labeller(norwegian_labels))
Saving Figures
All plots are saved as named objects (p1,
p2, …, p13).
To export them:
ggsave("plotname.pdf", p1, width = 13, height = 8)ggsave("plotname.png", p1, width = 13, height = 8, dpi = 1000)
Tip: All export lines are already included and commented out in the code blocks above.