EQ5D Data visualizations
Overview
This document demonstrates how to create various visualizations of
the EQ5D instrument using ggplot2 in
R.
It also serves as a user manual for an accompanying clean R
script that you can download separately.
Each section includes an explanation followed by the corresponding
code.
Click the Code button in the bottom right corner to
display the code for each step/plot we are making.
1. Setup
Installing and Loading Packages
Before we start, we need to install and load the required packages to
run the script.
The code below creates a list of the names of all packages we
need.
The next part checks if you have these packages installed, and if not,
installs them—before they are loaded into R.
# List of required packages
packages <- c(
"ggplot2","dplyr","tidyr","forcats","scales",
"viridis","RColorBrewer","ggsci","ggridges","ggdist"
)
# Install any packages that are not already installed
new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
# Load all the packages without printing the output
invisible(lapply(packages, library, character.only = TRUE))Defining a Custom Theme
The default theme of ggplot has a gray background designed to make graphical elements “pop” out. For a cleaner look, we define our own custom theme that we will use for all figures. By defining theme_nice(), we can use this function in all our plots by adding + theme_nice(), saving time and ensuring consistency.
Defining Dimension and Response Level Labels
To easily add dimension and response level labels/names to our plots, we create vectors of dimension labels and response level labels. Below we specify both English and Norwegian versions, so you can switch between them as needed.
# Dimension labels (English & Norwegian)
eq5d_dim_eng <- c(
MO = "Mobility (MO)",
SC = "Self-Care (SC)",
UA = "Usual Activities (UA)",
PD = "Pain/Discomfort (PD)",
AD = "Anxiety/Depression (AD)"
)
eq5d_dim_nor <- c(
MO = "Mobilitet (MO)",
SC = "Egenomsorg (SC)",
UA = "Vanlige aktiviteter (UA)",
PD = "Smerte/ubehag (PD)",
AD = "Angst/Depresjon (AD)"
)
# Response level labels
eq5d_levels <- c("1","2","3","4","5")
eq5d_labels_eng <- c(
"No problem","Slight problem","Moderate problems",
"Severe problems","Extreme problems"
)
eq5d_labels_nor <- c(
"Ingen problemer","Litt problemer","Moderat problemer",
"Alvorlige problemer","Ekstreme problemer"
)2. Loading Your Own Data (Optional)
If you have your own EQ-5D dataset, you can use the code below to load it here. Your data should be in standard wide format—each EQ-5D dimension must be stored as its own variable (column). Each dimension should be numeric, with levels ranging from 1 = No problems to 5 = Extreme problems. The EQ_VAS variable should be numeric, with a theoretical range from 0–100.
| Variable Name | Full Domain Name |
|---|---|
MO |
Mobility |
SC |
Self-Care |
UA |
Usual Activities |
PD |
Pain/Discomfort |
AD |
Anxiety/Depression |
EQ_VAS |
Visual Analogue Scale (0-100) |
Below are commented code examples for loading data from different file formats. Uncomment the relevant lines and modify the file path as needed.
# For SPSS (.sav):
# library(haven)
# df <- read_sav("your_data.sav") %>%
# mutate(across(where(is.labelled), as_factor))
# For Stata (.dta):
# library(haven)
# df <- read_dta("your_data.dta")
# For Excel (.xls/.xlsx):
# library(readxl)
# df <- read_excel("your_data.xlsx")
# For CSV (.csv):
# library(readr)
# df <- read_csv("your_data.csv")3. Data Simulation (skip if using own data)
In order to have some data to showcase the various ways to visualize the EQ-5D instrument, we simulate a dataset of 2,000 responses. This includes demographic variables, hospital effects, and a sine-wave year effect. If you’re using your own data, skip this section.
set.seed(123)
n <- 2000
# Demographics & groups
gender <- sample(c("male","female"), n, replace = TRUE)
age_numeric <- sample(18:100, n, replace = TRUE)
age_breaks <- c(18,30,40,50,60,70,80,90,101)
age_labels <- c("18-29","30-39","40-49","50-59","60-69","70-79","80-89","90+")
age <- cut(age_numeric, breaks = age_breaks, labels = age_labels, right = FALSE)
treatment <- factor(sample(c("pre","post"), n, replace = TRUE),
levels = c("pre","post"), ordered = TRUE)
marital_status <- factor(sample(c("married","divorced/separated","single","widowed"),
n, replace = TRUE, prob = c(0.5,0.2,0.2,0.1)),
levels = c("married","divorced/separated","single","widowed"),
ordered = TRUE)
smoking <- sample(c("yes","no"), n, replace = TRUE, prob = c(0.3,0.7))
education <- factor(sample(c("Low","Medium","High"), n, replace = TRUE, prob = c(0.3,0.4,0.3)),
levels = c("Low","Medium","High"), ordered = TRUE)
income <- factor(sample(c("Low","Medium","High"), n, replace = TRUE, prob = c(0.3,0.4,0.3)),
levels = c("Low","Medium","High"), ordered = TRUE)
# Hospital effect
hospital_levels <- paste0("hospital ", 1:15)
hospital <- factor(sample(hospital_levels, n, replace = TRUE),
levels = hospital_levels, ordered = TRUE)
hospital_offsets <- seq(-25,25,length.out=15) + rnorm(15,0,3)
names(hospital_offsets) <- hospital_levels
# Simulate EQ-5D dimensions (1–5)
simulate_dim <- function(mean_target, smoker, age, education) {
b <- round(rnorm(1, mean = mean_target, sd = 1))
b <- b + ifelse(smoker=="yes", 2, 0)
b <- b + ifelse(age %in% c("60-69","70-79","80-89","90+"), 1,
ifelse(age %in% c("18-29","30-39"), -1, 0))
b <- b + ifelse(education=="Low", 1, ifelse(education=="High", -1, 0))
round(min(max(b, 1), 5))
}
MO <- mapply(simulate_dim, 2.5, smoking, age, education)
SC <- mapply(simulate_dim, 2.8, smoking, age, education)
UA <- mapply(simulate_dim, 3.1, smoking, age, education)
PD <- mapply(simulate_dim, 3.4, smoking, age, education)
AD <- mapply(simulate_dim, 3.8, smoking, age, education)
# Simulate EQ VAS (0–100)
simulate_vas <- function(baseline, gender, income, education,
marital_status, treatment, smoker, age, hospital) {
b <- qnorm(runif(1, pnorm(-1.2), pnorm(1.2))) * 8 + baseline
b <- b + ifelse(gender=="male", 3, -3)
b <- pmin(pmax(b, 0), 100)
b <- b + ifelse(income=="Low",-15, ifelse(income=="High",15,0))
b <- b + ifelse(education=="Low",-10, ifelse(education=="High",7,0))
b <- b + ifelse(marital_status=="married",8,
ifelse(marital_status=="divorced/separated",-7,
ifelse(marital_status=="widowed",-10,0)))
b <- b + ifelse(treatment=="post",10,0)
b <- b + ifelse(smoker=="yes",-10,0)
b <- b - ifelse(age %in% c("60-69","70-79","80-89","90+"), 8, 0)
b <- b + hospital_offsets[as.character(hospital)]
round(min(max(b, 0),100))
}
EQ_VAS <- mapply(simulate_vas, baseline = 60,
gender, income, education, marital_status,
treatment, smoking, age, hospital)
# Assemble and add year + sine-wave effect
df <- data.frame(
MO, SC, UA, PD, AD,
gender, age, EQ_VAS, treatment,
marital_status, smoking, education, income,
hospital
) %>%
mutate(
year = rep(2018:2025, each = n()/8),
EQ_VAS = round(pmin(100, pmax(0,
EQ_VAS + sin(2*pi*(year-2018)/7)*10)))
)4. Data Preparation
Here, we convert our dataset from a wide format (where each EQ-5D dimension is in its own column) to a long format. This transformation makes it easier to create faceted plots in ggplot2.
Below you will see how the data looks before and after pivoting:
df_long <- df %>%
pivot_longer(
cols = c(MO,SC,UA,PD,AD),
names_to = "Dimension",
values_to = "Response"
)
# View data before pivot
head(df)## MO SC UA PD AD gender age EQ_VAS treatment marital_status smoking
## 1 5 5 5 5 5 male 60-69 56 pre married yes
## 2 5 5 5 5 5 male 60-69 42 pre divorced/separated yes
## 3 3 4 4 5 4 male 50-59 46 pre single yes
## 4 3 4 4 5 3 female 90+ 53 pre married no
## 5 2 1 1 5 2 male 18-29 52 post single no
## 6 5 5 5 5 5 female 80-89 25 pre married no
## education income hospital year
## 1 Low Medium hospital 11 2018
## 2 Low High hospital 5 2018
## 3 High Low hospital 10 2018
## 4 Medium High hospital 3 2018
## 5 High Low hospital 2 2018
## 6 Low Medium hospital 3 2018
## # A tibble: 6 × 12
## gender age EQ_VAS treatment marital_status smoking education income hospital
## <chr> <fct> <dbl> <ord> <ord> <chr> <ord> <ord> <ord>
## 1 male 60-69 56 pre married yes Low Medium hospita…
## 2 male 60-69 56 pre married yes Low Medium hospita…
## 3 male 60-69 56 pre married yes Low Medium hospita…
## 4 male 60-69 56 pre married yes Low Medium hospita…
## 5 male 60-69 56 pre married yes Low Medium hospita…
## 6 male 60-69 42 pre divorced/sepa… yes Low High hospita…
## # ℹ 3 more variables: year <int>, Dimension <chr>, Response <dbl>
5. EQ5D Dimension Profiles
Having followed the steps above, we are now ready to create data visualizations. We will start by plotting the EQ-5D dimensions to investigate distributions of responses across:
- Mobility
- Self-Care
- Usual Activities
- Pain/Discomfort
- Anxiety/Depression
Most of the plots we will create are facet plots, meaning each figure is subdivided into panels—one per dimension—so we can easily compare response patterns.
Bar Charts of EQ5D Dimensions
This section presents bar charts that visualize the proportion of respondents selecting each category.
Faceted Bar Chart for All Dimensions
How this works:
- The x-axis labels use
eq5d_labels_eng, and the facets are labeled viaeq5d_dim_eng. - To switch to Norwegian, change to
eq5d_labels_norandeq5d_dim_nor. - In order to create the bar chart, we need to calculate number of observations for each level within each dimension.
group_by(Dimension, Response) andsummarise(count = n())tally the number of observations in each response category for each dimension.group_by(Dimension)+mutate(proportion = count / sum(count))converts counts into proportions, so that within each facet the bars sum to 1 (i.e., 100%).- We ungroup before plotting because the summary columns are now fixed, and geom_bar(stat=“identity”) uses our precomputed proportion for bar heights.
scale_x_discrete()is used to split the x variable labels, to avoid overplotting
df_bar <- df_long %>%
group_by(Dimension, Response) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(Dimension) %>%
mutate(proportion = count / sum(count)) %>%
ungroup()
p_bar_all <- ggplot(df_bar,
aes(
x = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng),
y = proportion,
fill = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng)
)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste0(round(proportion*100,1), "%")),
vjust = -0.3, size = 3) +
facet_wrap(~ Dimension, labeller = labeller(Dimension = eq5d_dim_eng)) +
scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0,0.05))) +
scale_fill_brewer(palette = "Blues", name = "", labels = eq5d_labels_eng) +
labs(title = "Distribution of EQ-5D responses", x = "", y = "Percent") +
scale_x_discrete(
labels = function(x) stringr::str_wrap(x, width = 8)
) +
theme_nice() +
theme(
axis.text.x = element_text(size = 10)
)
p_bar_allBar Chart with Group Comparison by Gender
To compare distributions across gender, we add gender to the grouping and map it to the fill aesthetic.
How this works:
- Including gender in
group_by()ensures that proportions are calculated separately for males and females within each dimension. position = position_dodge()places bars side-by-side instead of stacking.- The mapping
fill = genderautomatically assigns each gender a distinct color from the Chicago palette.
df_bar_gender <- df_long %>%
group_by(Dimension, Response, gender) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(Dimension, gender) %>%
mutate(proportion = count / sum(count)) %>%
ungroup()
p_bar_gender <- ggplot(df_bar_gender,
aes(
x = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng),
y = proportion,
fill = gender
)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_text(aes(label = paste0(round(proportion*100,1), "%")),
position = position_dodge(width = 0.9), vjust = -0.3, size = 3) +
facet_wrap(~ Dimension, labeller = labeller(Dimension = eq5d_dim_eng)) +
scale_fill_uchicago() +
labs(title = "EQ-5D responses by gender", x = "", y = "Percent", fill = "Gender") +
scale_x_discrete(
labels = function(x) stringr::str_wrap(x, width = 8)
) +
scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0,0.05))) +
theme_nice() +
theme(
axis.text.x = element_text(size = 10)
)
p_bar_genderBar Chart for a Single Dimension (Mobility)
How this works:
- We start from
df_bar, which already contains the precomputed proportion for each (Dimension, Response) pair. filter(Dimension == "MO")isolates only the Mobility rows. Adapt to choose the dimension of interest- In
aes(),x = factor(Response…)ensures the five response levels are ordered and labeled correctly,y = proportionuses our precomputed proportions, andfillcolors each bar by its response level. geom_bar(stat = "identity") draws bars with heights equal to our proportion values (instead of letting ggplot count the data).geom_text()adds the percentage labels just above each bar.
df_bar_mo <- df_bar %>% filter(Dimension == "MO")
p_bar_mo <- ggplot(df_bar_mo,
aes(
x = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng),
y = proportion,
fill = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng)
)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste0(round(proportion*100,1), "%")),
vjust = -0.3, size = 3) +
scale_fill_brewer(palette = "Blues", name = "", labels = eq5d_labels_eng) +
labs(title = "Mobility (MO) response distribution", x = "", y = "Percent") +
scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0,0.05))) +
theme_nice()
p_bar_moBar Chart for Mobility by Gender
How this works:
- We begin with
df_long, filter toDimension == "MO", thengroup_by(Response, gender)to get counts per response per gender. - Re-grouping by gender and computing
proportion = count / sum(count) gives us the share of each response within each gender. - In the plot,
fill = genderassigns a distinct color to each gender, andposition = position_dodge()places their bars side-by-side. geom_text()is also dodged so the percentage labels align correctly above each gender’s bar
df_bar_mo_g <- df_long %>%
filter(Dimension == "MO") %>%
group_by(Response, gender) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(gender) %>%
mutate(proportion = count / sum(count)) %>%
ungroup()
p_bar_mo_g <- ggplot(df_bar_mo_g,
aes(
x = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng),
y = proportion,
fill = gender
)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_text(aes(label = paste0(round(proportion*100,1), "%")),
position = position_dodge(width = 0.9), vjust = -0.3, size = 3) +
scale_fill_uchicago() +
labs(title = "MO response distribution by gender", x = "", y = "Percent", fill = "Gender") +
scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0,0.05))) +
theme_nice()
p_bar_mo_gStacked Bar Charts
Stacked bar charts are a handy way to visualize the distribution of responses across all five dimensions, and to compare groups at the same time. They show the composition of each dimension’s responses as 100% stacks, and can be faceted and/or grouped.
How this works:
- We recode
Responseto a factor with proper labels so the stacks appear in the correct order. group_by(Dimension, Response)(and +smokingfor the grouped version) aggregates counts.mutate(pct = count / sum(count) * 100)converts counts into percentages for each Dimension (or Dimension + smoking).- With
position = "stack",geom_bar()layers the categories on top of each other to reach 100%. facet_wrap(~ Dimension)splits the plot into panels by dimension.
# Prepare data for the stacked bar chart
df_stack <- df_long %>%
mutate(Response = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng)) %>%
group_by(Dimension, Response) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(Dimension) %>%
mutate(pct = count / sum(count) * 100) %>%
ungroup()
# Create the stacked bar chart
p_stack_all <- ggplot(df_stack, aes(x = Dimension, y = pct, fill = Response)) +
geom_bar(stat = "identity", position = "stack", colour = "white") +
geom_text(aes(label = paste0(round(pct,1), "%")),
position = position_stack(vjust = 0.5), size = 3) +
scale_y_continuous(labels = function(x) paste0(x, "%"),
expand = expansion(mult = c(0,0.02))) +
scale_fill_brewer(palette = "Blues", name = "", labels = eq5d_labels_eng) +
labs(title = "Stacked EQ-5D Responses (no group)",
x = "Dimension",
y = "Percent") +
theme_nice()
p_stack_allStacked Bar Chart by Smoking
Here we break down the stacked bars by smoking status, faceted by dimension.
Only change from previous plot:
- add
smokingto thegroup_by()arguments - add
smokingto thex =argument
# Prepare data for the grouped stacked bar chart
df_stack_smoke <- df_long %>%
mutate(Response = factor(Response, levels = eq5d_levels, labels = eq5d_labels_eng)) %>%
group_by(Dimension, smoking, Response) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(Dimension, smoking) %>%
mutate(pct = count / sum(count) * 100) %>%
ungroup()
# Create the facet-wrapped stacked bar chart
p_stack_smoke <- ggplot(df_stack_smoke, aes(x = smoking, y = pct, fill = Response)) +
geom_bar(stat = "identity", position = "stack", colour = "white") +
#uncomment to add percentages as text
#geom_text(aes(label = paste0(round(pct,1), "%")),
# position = position_stack(vjust = 0.5), size = 3) +
facet_wrap(~ Dimension, labeller = labeller(Dimension = eq5d_dim_eng)) +
scale_y_continuous(labels = function(x) paste0(x, "%"),
expand = expansion(mult = c(0,0.02))) +
scale_fill_brewer(palette = "Blues", name = "", labels = eq5d_labels_eng) +
labs(title = "Stacked EQ-5D Responses by Smoking",
x = "Smoking Status",
y = "Percent") +
theme_nice()
p_stack_smoke6. Creating EQ VAS Plots
The EQ VAS score allows for creating a variety of plots, depending on interest. Here, the script will showcase how to look at distirbutions across different groups (histograms, densityplots, boxplots, ridgline plot), and how to examine linear (and curvelinear) relationships between variables of interest and EQ VAS
EQ VAS Histogram (No Group)
This histogram shows the overall distribution of EQ VAS scores.
How this works:
aes(x = EQ_VAS)maps the VAS scores to the x-axis.geom_histogram(binwidth = 5)groups data into bins of width 5; adjust binwidth to refine granularity.fill = "grey"sets the colour of the bars, whereascolour = "whitecolours the gaps between. Set the colour you like.
p_vas_hist <- ggplot(df, aes(x = EQ_VAS)) +
geom_histogram(binwidth = 5, fill = "grey", colour = "white") +
labs(title = "EQ VAS Histogram",
x = "EQ VAS Score",
y = "Count") +
theme_nice()
p_vas_histEQ VAS Histogram by Education
Compare the VAS distribution across education levels with side-by-side bars.
How this works:
fill = educationassigns each level a color.position = "dodge"arranges bars side-by-side.- The Chicago palette is applied via
scale_fill_uchicago().
p_vas_hist_ed <- ggplot(df, aes(x = EQ_VAS, fill = education)) +
geom_histogram(position = "dodge", binwidth = 5, colour = "white") +
scale_fill_uchicago() +
labs(title = "EQ VAS Histogram by Education",
x = "EQ VAS",
y = "Count",
fill = "Education") +
theme_nice()
p_vas_hist_edEQ VAS Density by Education
Overlay density curves for each education level, with group-specific mean lines. How this works:
geom_density()sets the plot to a density plot, and requires only an x variable specified.- Mapping
fillandcolourtoeducationdraws separate density curves by levels of education. - Transparency (
alpha) helps visualize overlaps. - Mean lines added with
geom_vline()highlight group means
p_vas_den_ed <- ggplot(df, aes(x = EQ_VAS, fill = education, colour = education)) +
geom_density(alpha = 0.5) +
scale_fill_uchicago() +
scale_colour_uchicago() +
labs(title = "EQ VAS Density by Education",
x = "EQ VAS",
y = "Density") +
theme_nice()
# Add group-specific mean lines
group_vas_means <- df %>%
group_by(education) %>%
summarise(mean_vas = mean(EQ_VAS, na.rm = TRUE), .groups = "drop")
p_vas_den_ed <- p_vas_den_ed +
geom_vline(data = group_vas_means,
aes(xintercept = mean_vas, colour = education),
linetype = "dashed", linewidth = 0.5)
p_vas_den_edEQ VAS Boxplot by Hospital
Boxplots are especially handy when you need to compare many groups at once—they show medians, IQRs, and outliers compactly.
How this works:
aes(x = EQ_VAS, y = hospital)rotates the usual orientation so hospitals run vertically.- Boxes span the 25th–75th percentiles; whiskers indicate variability, and points beyond are outliers.
coord_cartesian(xlim = c(0,100))forces the x-axis to span the entire theoretical range of the EQVAS score.
p_vas_box <- ggplot(df, aes(x = EQ_VAS, y = hospital)) +
geom_boxplot(colour = "black", fill = "#6495ED") +
coord_cartesian(xlim = c(0,100)) +
labs(title = "EQ VAS Boxplot by Hospital",
x = "EQ VAS Score",
y = "Hospital") +
theme_nice()
p_vas_boxEQ VAS Ridgeline by Hospital
Ridgeline plots overlay multiple density estimates, making it easy to compare distribution shapes across many groups.
How this works:
aes(y = hospital)creates a separate density for each hospital.fill = after_stat(x)applies a gradient based on the x-value (VAS score).rel_min_heightandscalecontrol overlap and vertical spacing.scale_fill_viridis_c(option = "viridis")creates a nice colour gradient (enter?scale_fill_viridisin console to learn more)
p_vas_ridge <- ggplot(df, aes(x = EQ_VAS, y = hospital, fill = after_stat(x))) +
geom_density_ridges_gradient(scale = 1, rel_min_height = 0.01) +
scale_fill_viridis_c(option = "viridis", name = "EQ VAS") +
labs(title = "EQ VAS Ridgeline by Hospital",
x = "EQ VAS Score",
y = "Hospital") +
theme_nice()
p_vas_ridge## Picking joint bandwidth of 6.11
EQ VAS Trend Plots
We now examine trends over time using both linear and smooth fits.
Observed Mean EQ VAS by Year & Gender
Plot the annual mean VAS for each gender with lines and points.
How this works:
stat_summary(fun = mean, geom = "line")connects yearly means.stat_summary(fun = mean, geom = "point")overlays mean points.Colorandfillmapping to gender differentiates the series.
p_trend_obs <- ggplot(df, aes(x = year, y = EQ_VAS, colour = gender, fill = gender)) +
stat_summary(fun = mean, geom = "line", linewidth = 1) +
stat_summary(fun = mean, geom = "point", shape = 21, size = 5) +
scale_fill_uchicago() +
scale_colour_uchicago() +
labs(title = "Observed Mean EQ VAS by Year & Gender",
x = "Year",
y = "Mean EQ VAS") +
theme_nice()
p_trend_obsLinear Trend (LM) EQ VAS by Gender
Add linear regression lines and confidence bands; you can swap or customize the smoother.
How this works & Tweaks:
method = "lm"fits OLS regression lines; use method = “loess” for local smoothing.- To fit polynomials, add
formula = y ~ poly(x, 2)orpoly(x,3)etc. inside geom_smooth(). - Set
se = FALSEto hide confidence intervals.
p_trend_lm <- ggplot(df, aes(x = year, y = EQ_VAS, colour = gender, fill = gender)) +
geom_smooth(method = "lm", se = TRUE) +
scale_fill_uchicago() +
scale_colour_uchicago() +
labs(title = "Linear Trend EQ VAS by Gender",
x = "Year",
y = "EQ VAS") +
theme_nice()
p_trend_lm## `geom_smooth()` using formula = 'y ~ x'
LOESS Trend EQ VAS by Gender
Capture non-linear patterns over time with LOESS smoothing; adjust span to control wiggliness.
How this works & tweaks:
- span controls the degree of smoothing: smaller = more sensitive, larger = smoother.
- For large datasets, LOESS may switch internally to GAM—explicitly forcing LOESS ensures consistency.
p_trend_loess <- ggplot(df, aes(x = year, y = EQ_VAS, colour = gender, fill = gender)) +
geom_smooth(method = "loess", se = TRUE, span = 0.75) +
scale_fill_uchicago() +
scale_colour_uchicago() +
labs(title = "LOESS Trend EQ VAS by Gender",
x = "Year",
y = "EQ VAS") +
coord_cartesian(ylim = c(0,100)) +
theme_nice()
p_trend_loess## `geom_smooth()` using formula = 'y ~ x'
LOESS Trend of EQ VAS by Education & Age
Facet the LOESS trend by age group and color by education level. How this works:
facet_wrap(~ age)splits panels by age group.Colorandfillmapped to education differentiate trend lines within each pane
p_trend_loess_age <- ggplot(df, aes(x = year, y = EQ_VAS, colour = education, fill = education)) +
geom_smooth(method = "loess", se = TRUE, alpha = 0.2) +
facet_wrap(~ age) +
coord_cartesian(ylim = c(0,100)) +
scale_fill_uchicago() +
scale_colour_uchicago() +
labs(title = "LOESS Trend of EQ VAS by Education & Age",
x = "Year",
y = "EQ VAS") +
theme_nice()
p_trend_loess_age## `geom_smooth()` using formula = 'y ~ x'
Tips & Tweaks
- Aesthetic mappings
(aes()): map variables tox,y,fill, andcolour. - Data prep: use
group_by(),summarise(), andmutate()to compute counts, proportions, or means before plotting. - Bar charts: switch
stat="identity"↔︎stat="count", adjust position ("dodge","stack","fill"), or recode factors for order. - Histograms: tweak
binwidth,breaks, orboundaryto refine. Useposition="fill"for relative frequencies. - Density plots: control bandwidth via
bw, setalphafor transparency. - Smoothers: in
geom_smooth(), change method (e.g.,"lm","loess","gam"), tweak span, or supply formula for polynomial fits. - Faceting: use
facet_wrap(~ variable)orfacet_grid(rows ~ cols)for multi-panel layouts. - Color palettes: experiment with palettes such as
scale_fill_brewer(),scale_fill_viridis_d(), orscale_fill_uchicago(), or define custom palettes withscale_fill_manual(). - Saving: uncomment
ggsave()calls to export figures; adjust width, height, and dpi to match your publication requirements.