Practical 3 - Medical Imaging: Detecting Diabetic Retinopathy with CNNs

Submission. Knit your Rmd file into HTML and submit it via Brightspace. Include a link to your git repository (GitHub or git.cs.dal.ca) as a comment on your submission.

0 Setup

Unlike the tabular and text modules, deep learning on images leans on libraries written in C++/CUDA. These can be challenging to install and use from R, but the mlverse stack is a modern, (supposedly) self-contained and cross-platform set of R bindings for some of the main deep learning/image libraries:

torch is one of the two main current neural-network libraries (the other being TensorFlow). It provides lots of functions for defining neural-network layers and converting them into efficient code for training and running on GPUs/CPUs. The R package is really just a set of R functions that let you run the underlying C++/CUDA code from within R (these are known as bindings). When you install torch in R, it downloads its own copy of LibTorch on first use.
torchvision is a set of efficient helper functions for handling images in neural networks (e.g., image transforms, dataset helpers, and access to pre-trained networks).
magick is a tool for viewing and manipulating images that works on Mac, Windows and Linux. These tasks are often surprisingly operating-system specific.

Copy the following into the console to install dependencies (make sure it finishes correctly):

install.packages(c(
  "torch", "torchvision", "luz",   # deep learning stack
  "tidyverse", "here", "fs",       # data wrangling + reproducible paths
  "magick", "gridExtra",           # image display
  "yardstick"                      # evaluation metrics (also used in Lab 1)
))

Then copy the following into the console to finish the torch install (it will take a few minutes to complete):

torch::install_torch()

If that all worked, the following should run without an error as a normal code-block:

library(torch)        # tensors, autograd, neural-network layers
library(torchvision)  # image transforms, datasets, pretrained models
library(luz)          # high-level training loop (fit/predict/metrics)
library(tidyverse)    # dplyr, ggplot2, readr, purrr, ...
library(here)         # reproducible file paths
library(fs)           # filesystem helpers
library(magick)       # image display only
library(gridExtra)    # arranging plots in a grid
library(yardstick)    # ROC-AUC, sensitivity, specificity, confusion matrix

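GLOBAL_SEED <- 42               # any fixed integer works; the value itself is arbitrary
torch_manual_seed(GLOBAL_SEED)  # seed torch's random number generator (e.g., weight initialisation)
set.seed(GLOBAL_SEED)           # seed R's random number generator (e.g., slice_sample())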

1 Introduction

Medical imaging is one of the most data-rich corners of medicine, with a single trip to the emergency room able to generate hundreds of gigabytes across X-ray, CT, MRI, PET and ultrasound. As covered in the lecture, the shape of that data ranges from a single number (e.g., optical density) up to 4D tensors (a PET-CT time-series). The questions we ask range from preprocessing and segmentation through to detection, classification and outcome prediction.

This practical focuses on 2D colour fundus photographs (CFPs) of the retina and the task of detecting diabetic retinopathy (DR), i.e., damage to the retinal blood vessels caused by diabetes. DR is one of the leading global causes of preventable blindness. Ophthalmologists screen for it accurately, but manual screening does not easily scale (especially in resource-limited settings) to the ~500 million people living with diabetes. Predictive models may help fill this scaling gap.

We will very loosely mirror the analyses in the two assigned papers and refer back to them in the questions:

Gulshan et al. (2016, JAMA) trained an Inception-v3 CNN (with ImageNet transfer learning) to detect referable DR, reaching an AUC of ~0.99 and reporting two clinical operating points (one tuned for high sensitivity, one for high specificity).
Arcadu et al. (2019, npj Digital Medicine) went further and tried to predict future DR progression from a single visit, using field-specific Inception-v3 networks aggregated by a random forest, and interpreted them with attribution maps and SHAP.

Learning objectives: by the end of this practical you should be able to:

Load, inspect and normalise medical image data as torch tensors
Build, train and evaluate a convolutional neural network (CNN) from scratch with luz
Apply data augmentation and explain its role as a prior on image variability
Use transfer learning to apply an ImageNet-pretrained network to the problem
Evaluate a clinical classifier with ROC-AUC, sensitivity/specificity and operating points
Connect model performance to the real-world hurdles of clinical deployment

2 The Data

2.1 Download and cache

We use a curated subset of fundus photographs graded for DR severity on the 5-point International Clinical Diabetic Retinopathy (ICDR) scale (the clinical cousin of the ETDRS scale used by Arcadu et al.):

Grade	Severity
0	No DR
1	Mild non-proliferative DR
2	Moderate non-proliferative DR
3	Severe non-proliferative DR
4	Proliferative DR

As in Lab 1, we download a prepared archive once and cache it locally, so the download and unzip steps only re-run if the files are missing.

data_dir  <- here()
cache_dir <- fs::path(data_dir, "cache")
fs::dir_create(cache_dir)

data_url <- "https://maguire-lab.github.io/health_data_science_research_2026/static_files/practicals/dr_fundus.zip"
zip_path <- fs::path(cache_dir, "dr_fundus.zip")
img_root <- fs::path(data_dir, "dr_fundus")

if (!fs::dir_exists(img_root)) {
  if (!fs::file_exists(zip_path)) {
    download.file(data_url, zip_path, mode = "wb", quiet = TRUE)
  }
  unzip(zip_path, exdir = data_dir)
}

# The archive contains:
#   dr_fundus/images/<id>.jpg   -- the fundus photographs
#   dr_fundus/labels.csv        -- columns: image, split, grade
fs::dir_ls(img_root)

## /home/fin/Documents/teaching/2025_2026/health_data_science_research_2026/static_files/practicals/redevelop/image_lab/dr_fundus/images
## /home/fin/Documents/teaching/2025_2026/health_data_science_research_2026/static_files/practicals/redevelop/image_lab/dr_fundus/labels.csv

2.2 Load the labels

labels <- read_csv(fs::path(img_root, "labels.csv")) |>
  mutate(
    path  = fs::path(img_root, "images", image),
    grade = as.integer(grade)
  ) |>
  # Simplify to the two extremes of the ICDR scale: grade 0 vs grade 4
  filter(grade %in% c(0L, 4L)) |>
  mutate(
    grade     = factor(grade, levels = c(0, 4)),
    # binary target: 1 = severe (grade 4, certainly referable), 0 = healthy (grade 0)
    referable = as.integer(grade == 4)
  )

glimpse(labels)

## Rows: 230
## Columns: 5
## $ image     <chr> "a_IDRiD_005.jpg", "a_IDRiD_006.jpg", "a_IDRiD_007.jpg", "a_…
## $ split     <chr> "train", "train", "train", "train", "train", "train", "train…
## $ grade     <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ path      <fs::path> "/home/fin/Documents/teaching/2025_2026/health_data_sci…
## $ referable <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …

To keep this teaching example clean and the dataset as small as possible, we frame the task as the easiest possible version of the clinical problem. We only tell the two extremes apart: a clearly healthy eye (grade 0, no DR) versus the most severe (grade 4, proliferative DR). We will NOT use cross-validation this time; instead we use separate validation and test sets. This reduces the number of times we have to train these models.

Q1 (2 marks): In your own words, what is diabetic retinopathy, and what is the clinical benefit of being able to flag referable DR automatically from a photograph? Why might a screening tool prefer to err towards over-referral? (Hint: see the opening paragraphs of Gulshan et al.)

Q2 (2 marks): Based on glimpse() and the data dictionary above, what information about how these images were captured is missing that could matter for an analysis? Give two examples and say why each matters.

2.3 Class balance

labels |>
  mutate(label = if_else(referable == 1, "Severe DR", "Healthy")) |>
  count(split, label) |>
  ggplot(aes(label, n, fill = label)) +
  geom_col() +
  facet_wrap(~split) +
  scale_fill_manual(values = c("Healthy" = "grey60",
                               "Severe DR"  = "firebrick")) +
  labs(x = NULL, y = "Number of Images",
       title = "Healthy vs Severe DR Balance per Split") +
  theme(legend.position = "none")

Q3 (1 mark): What sort of ML issues may arise with a dataset that has this sort of label/class balance?

2.4 Look at the images

It is always worth looking directly at the raw data before modelling. Each time you re-run this block you will get a different random sample of images.

samples <- labels |>
  filter(split == "train") |>
  slice_sample(n = 6)

plots <- map2(samples$path, samples$grade, \(p, g) {
  magick::image_ggplot(magick::image_read(p)) +
    ggtitle(paste("Grade", g)) +
    theme(plot.title = element_text(size = 9))
})

gridExtra::grid.arrange(grobs = plots, nrow = 2, ncol = 3)

Q4 (3 marks): Looking at these images, identify two properties that vary between photographs of the same class that we may want to normalise or otherwise process before training. Can you see any consistent differences between Grade 4 and Grade 0 photographs? (Rerun the code block a few times to get different images.)

3 Pre-processing

3.1 From image file to tensor

A colour image is a 3D tensor of shape channels × height × width. torch expects a 4D batch of shape images × channels × height × width. We build a small, lazy pipeline that reads each file only when needed (so we never hold the whole dataset in memory at once - although this specific dataset is small enough to do that on most computers) and:

Loads the JPEG into a [0, 1] tensor,
Resizes it to 224 × 224 (most neural networks require a single consistent input size),
Normalises each channel using the mean colour intensity and standard deviation across the entire ImageNet dataset.

We use ImageNet numbers in step 3 because in Section 6 we reuse a network pretrained on ImageNet, and a pretrained network expects its inputs scaled the same way as the training data.

imagenet_mean <- c(0.485, 0.456, 0.406) # manually pulled from the internet... "magic numbers" that you probably shouldn't trust normally
imagenet_std  <- c(0.229, 0.224, 0.225)

# transform for evaluation / training-from-scratch: deterministic resize + normalise
eval_transform <- function(path) {
  base_loader(path) |>            # H x W x C array in [0, 1]
    transform_to_tensor() |>      # -> C x H x W float tensor
    transform_resize(c(224, 224)) |>
    transform_normalize(mean = imagenet_mean, std = imagenet_std)
}
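The tensor conversion and normalisation steps above are ordinary array arithmetic. transform_to_tensor() essentially moves the channel axis to the front (H × W × C to C × H × W). transform_normalize() applies (x - mean) / std separately to each channel. The base-R sketch below shows just that arithmetic. It uses a made-up, deterministic 4 × 5 pixel "image" in place of a real photo, and its helper normalise_chw() is hypothetical, not part of torchvision:

```r
# ImageNet channel statistics (same values as in the pipeline above)
imagenet_mean <- c(0.485, 0.456, 0.406)
imagenet_std  <- c(0.229, 0.224, 0.225)

# A made-up 4 x 5 pixel RGB "image", laid out H x W x C with values in [0, 1]
img_hwc <- array(seq(0, 1, length.out = 4 * 5 * 3), dim = c(4, 5, 3))

# H x W x C -> C x H x W: move the channel axis to the front, as torch expects
img_chw <- aperm(img_hwc, c(3, 1, 2))

# Per-channel normalisation: (x - ch_mean[ch]) / ch_std[ch] for each channel ch
normalise_chw <- function(x, ch_mean, ch_std) {
  for (ch in seq_len(dim(x)[1])) {
    x[ch, , ] <- (x[ch, , ] - ch_mean[ch]) / ch_std[ch]
  }
  x
}

img_norm <- normalise_chw(img_chw, imagenet_mean, imagenet_std)
dim(img_norm)   # 3 4 5: channels first, then height and width
```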

3.2 A torch dataset and dataloaders

We need a little extra code to feed our transformed images into a torch network for training. For efficiency, neural networks are trained on “batches” of training data. The network makes predictions for a whole batch of images together, and then all of its weights are updated at once based on how well it did across the batch as a whole.

fundus_dataset <- dataset(
  name = "FundusDataset",
  initialize = function(df, transform) {
    self$paths     <- df$path
    # torch uses 1-based class indices, so referable {0,1} -> {1,2}
    self$y         <- torch_tensor(df$referable + 1L, dtype = torch_long())
    self$transform <- transform
  },
  .getitem = function(i) {
    list(x = self$transform(self$paths[i]), y = self$y[i])
  },
  .length = function() {
    length(self$paths)
  }
)

train_df <- labels |> filter(split == "train")
valid_df <- labels |> filter(split == "valid")
test_df  <- labels |> filter(split == "test")

train_ds <- fundus_dataset(train_df, eval_transform)
valid_ds <- fundus_dataset(valid_df, eval_transform)
test_ds  <- fundus_dataset(test_df,  eval_transform)

batch_size   <- 16
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE)
valid_dl <- dataloader(valid_ds, batch_size = batch_size, shuffle = FALSE)
test_dl  <- dataloader(test_ds,  batch_size = batch_size, shuffle = FALSE)
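Conceptually, in each epoch a shuffled dataloader draws a fresh random ordering of the dataset indices and cuts it into consecutive chunks of batch_size. The last batch is smaller when the dataset size is not a multiple of batch_size; torch's dataloader keeps that partial batch by default (drop_last = FALSE). Below is a base-R sketch of the idea, not torch's actual implementation. The helper make_batches() and the dataset size of 100 are hypothetical:

```r
# Split dataset indices 1..n into batches, optionally in a random order
make_batches <- function(n, batch_size, shuffle = TRUE) {
  idx <- if (shuffle) sample.int(n) else seq_len(n)  # a fresh permutation on each call (epoch)
  split(idx, ceiling(seq_along(idx) / batch_size))   # consecutive chunks of batch_size
}

batches <- make_batches(n = 100, batch_size = 16, shuffle = FALSE)
length(batches)    # 7 batches
lengths(batches)   # six batches of 16 and a final partial batch of 4
```

Because the optimiser takes one step per batch, an epoch here consists of ceiling(n / batch_size) weight updates.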

Let’s confirm the shape of one batch:

one_batch <- train_dl |> dataloader_make_iter() |> dataloader_next()
dim(one_batch$x)   # expect: batch_size x 3 x 224 x 224

## [1]  16   3 224 224

Q5 (2 marks): Explain what each of the four numbers in the batch's dimensions means. Why is the channel dimension 3 here? If, instead of CFP images, we were using a 3D MRI grayscale image that was 1000 pixels/voxels in each dimension, what would be the dimensions output by the above command?

Q6 (1 mark): If we didn’t want to use pre-trained ImageNet networks, we would still probably want to normalise the intensities in our dataset. What issue might we cause if we normalised using the mean and standard deviation of the pixel intensities across the whole dataset (i.e., train, test, and validation combined)?

4 A CNN from Scratch

We first train a small CNN from scratch, with no outside knowledge. Convolutions slide learnable filters across the image to capture local spatial patterns (see lecture); stacking convolutional layers, non-linearities (sigmoid/ReLUs), and pooling blocks builds up “increasingly higher-order learnt representations.”

Why convolutions?

A fully-connected layer would give every input pixel its own weight to every output unit. For a 224×224×3 image that is ~150,000 inputs (and weights) per output unit, far too many parameters to learn from the few hundred images we have. It would also ignore the important fact that pixels near each other in an image are related to one another. A convolutional layer enforces two structural assumptions that align with how images actually work:

Locality. Each output looks at only a small patch of the input - here a kernel_size = 3 filter sees a 3×3 neighbourhood at a time. A learnable filter is just a small grid of weights that is multiplied element-wise with the patch and summed to a single number; sliding it across the whole image produces a feature map highlighting wherever the pattern encoded in the convolutional filter (an edge, a blob, a particular colour gradient) occurs.
Weight sharing. The same filter is reused at every position. So a filter that has learned to detect, say, a microaneurysm-like dark spot will fire wherever that spot appears - the network does not have to relearn it for every corner of the image. The fancy term for this is translation equivariance.
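Both ideas fit in a few lines of base R. The sketch below makes several assumptions. It works on a single-channel image and uses a hand-picked 3 × 3 "spot detector" filter (in a real CNN the weights are learnt). Like nn_conv2d(..., kernel_size = 3, padding = 1), it zero-pads by one pixel so the output keeps the input's height and width. Strictly, it computes cross-correlation, which is what deep-learning "convolution" layers compute. The helper conv2d_single() is hypothetical. Moving the spot moves the response by exactly the same amount, which is translation equivariance in action:

```r
# Slide one square filter over a single-channel image (stride 1, zero padding)
conv2d_single <- function(img, kernel) {
  k   <- nrow(kernel)                                # assumes a square, odd-sized kernel
  pad <- (k - 1) %/% 2
  padded <- matrix(0, nrow(img) + 2 * pad, ncol(img) + 2 * pad)
  padded[pad + seq_len(nrow(img)), pad + seq_len(ncol(img))] <- img
  out <- matrix(0, nrow(img), ncol(img))
  for (i in seq_len(nrow(img))) {
    for (j in seq_len(ncol(img))) {
      patch <- padded[i:(i + k - 1), j:(j + k - 1)]  # locality: only a k x k neighbourhood
      out[i, j] <- sum(patch * kernel)               # weight sharing: same kernel everywhere
    }
  }
  out
}

# Hand-picked "spot detector": strongly positive centre, negative surround
spot_kernel <- matrix(-1, 3, 3)
spot_kernel[2, 2] <- 8

# An 8 x 8 dark background with one bright spot, and the same spot shifted
img1 <- matrix(0, 8, 8); img1[3, 3] <- 1
img2 <- matrix(0, 8, 8); img2[6, 5] <- 1   # 3 rows down, 2 columns right

resp1 <- conv2d_single(img1, spot_kernel)
resp2 <- conv2d_single(img2, spot_kernel)

which(resp1 == max(resp1), arr.ind = TRUE)  # peak at row 3, column 3
which(resp2 == max(resp2), arr.ind = TRUE)  # peak at row 6, column 5
```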

Each conv layer learns many filters at once: the second number in nn_conv2d(3, 16, …) is the number of output filters, so this layer turns the 3 colour channels into 16 feature maps. As we go deeper we ask for more filters (16 to 32 to 64) because later layers compose the simple early patterns into rarer, higher-order ones (e.g., edges to textures to specific lesion-shaped motifs). Each conv is followed by nn_relu(), the simple max(0, x) nonlinearity used in most modern neural networks.
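To make the "too many parameters" argument concrete, the arithmetic below counts weights and biases. A conv layer has out_channels × in_channels × k × k weights plus one bias per output channel, independent of image size. A fully-connected layer needs one weight per input value per output unit. The 16-unit fully-connected layer in the last line is a hypothetical comparison and is not part of the network used here:

```r
conv_params   <- function(c_in, c_out, k) c_out * c_in * k * k + c_out  # weights + biases
linear_params <- function(n_in, n_out)    n_out * n_in + n_out          # weights + biases

# The three conv layers and the final linear layer of SimpleCNN
# (ReLU, pooling and dropout layers have no parameters)
simple_cnn_params <- c(
  conv1 = conv_params(3, 16, 3),
  conv2 = conv_params(16, 32, 3),
  conv3 = conv_params(32, 64, 3),
  head  = linear_params(64, 2)
)
simple_cnn_params        # conv1 = 448, conv2 = 4640, conv3 = 18496, head = 130
sum(simple_cnn_params)   # 23714 in total

# Hypothetical alternative: connect every value of a 224 x 224 x 3 image to just 16 units
linear_params(224 * 224 * 3, 16)   # 2408464 parameters for this single layer
```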

Why pooling?

nn_max_pool2d(2) slides a 2×2 window across each feature map and keeps only the maximum value in each window, halving the height and width (so 224 to 112 to 56 …). This does three useful things:

Translation invariance. Once a filter has reported “an edge is present somewhere in this patch,” we often do not care about its exact pixel position. Keeping the strongest response and discarding precise location makes the network robust to the lesion being shifted by a few pixels. This is useful, since fundus images are rarely perfectly aligned to one another (i.e., “registered”).
A growing receptive field. After pooling, each value summarises a larger region of the original image, so the next 3×3 filter effectively “sees” a bigger area. This is how a stack of small filters can end up responding to large structures.
Cheaper computation. Halving the resolution quarters the number of values the next layer must process, which is what lets us afford more filters as we go deeper.
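Both the shrinking resolution and the growing receptive field follow two standard recurrences. A layer with kernel size k, stride s and padding p maps a spatial size n to floor((n + 2p - k) / s) + 1. It also widens the receptive field by (k - 1) times the current spacing, in input pixels, between neighbouring units. The base-R sketch below traces SimpleCNN's spatial layers with these recurrences. ReLUs are omitted because they don't change shape, and nn_max_pool2d(2) uses a stride equal to its kernel size by default. The helper trace_layers() is hypothetical:

```r
# One row per spatial layer of SimpleCNN: kernel size k, stride s, padding p
simple_cnn_layers <- data.frame(
  name = c("conv1", "pool1", "conv2", "pool2", "conv3"),
  k    = c(3, 2, 3, 2, 3),
  s    = c(1, 2, 1, 2, 1),
  p    = c(1, 0, 1, 0, 1)
)

trace_layers <- function(layers, input_size = 224) {
  size <- input_size  # spatial height/width of the current feature map
  rf   <- 1           # receptive field (in input pixels) of one unit
  jump <- 1           # spacing (in input pixels) between neighbouring units
  out  <- data.frame(name = layers$name, size = NA_real_, receptive_field = NA_real_)
  for (i in seq_len(nrow(layers))) {
    k <- layers$k[i]; s <- layers$s[i]; p <- layers$p[i]
    size <- floor((size + 2 * p - k) / s) + 1   # output-size formula for conv/pool
    rf   <- rf + (k - 1) * jump                  # each layer widens the window
    jump <- jump * s                             # strides compound the spacing
    out$size[i]            <- size
    out$receptive_field[i] <- rf
  }
  out
}

trace_layers(simple_cnn_layers)
# sizes: 224, 112, 112, 56, 56; receptive fields: 3, 4, 8, 10, 18
```

Each unit after the third convolution therefore "sees" only an 18 × 18 pixel patch of the 224 × 224 input. The global average pool is what combines evidence from across the whole image.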

The final nn_adaptive_avg_pool2d(c(1, 1)) is global average pooling: it collapses each of the 64 feature maps to a single number, its average activation over the whole image. This length-64 summary vector is what finally feeds into the classifier part of the network.
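In array terms, global average pooling is just a mean over the height and width axes of each channel. The base-R sketch below uses made-up activations for one image, shaped 64 × 56 × 56 (56 × 56 is what SimpleCNN's two pooling layers leave from a 224 × 224 input):

```r
# Made-up activations: 64 feature maps (channels), each 56 x 56
feature_maps <- array(seq_len(64 * 56 * 56) %% 7, dim = c(64, 56, 56))

# Global average pooling: one mean per channel, over all spatial positions
gap <- apply(feature_maps, 1, mean)

length(gap)   # 64: the length of the vector passed to the classifier
```

In the network itself, nn_adaptive_avg_pool2d(c(1, 1)) keeps a 64 × 1 × 1 shape per image. The torch_flatten(x, start_dim = 2) call in forward() then turns a batch of these into a batch × 64 matrix for nn_linear(64, 2).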

We define this “simple” CNN using the following code:

simple_cnn <- nn_module(
  "SimpleCNN",
  initialize = function() {
    self$features <- nn_sequential(
      nn_conv2d(3, 16, kernel_size = 3, padding = 1), nn_relu(), nn_max_pool2d(2),
      nn_conv2d(16, 32, kernel_size = 3, padding = 1), nn_relu(), nn_max_pool2d(2),
      nn_conv2d(32, 64, kernel_size = 3, padding = 1), nn_relu(),
      nn_adaptive_avg_pool2d(c(1, 1))   # global average pool -> 64 x 1 x 1
    )
    self$classifier <- nn_sequential(
      nn_dropout(p = 0.5),
      nn_linear(64, 2)                  # 2 output logits (healthy, severe); softmax converts them to probabilities
    )
  },
  forward = function(x) {
    x <- self$features(x)
    x <- torch_flatten(x, start_dim = 2)
    self$classifier(x)
  }
)

We then use luz to wrap the training loop. It minimises the cross-entropy loss and, after each full pass over all of the training batches (an epoch), records the loss and accuracy on both the training data and the separate validation set.

fitted_scratch <- simple_cnn |>
  setup(
    loss      = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics   = list(luz_metric_accuracy())
  ) |>
  fit(train_dl, epochs = 5, valid_data = valid_dl, verbose = FALSE)

luz::get_metrics(fitted_scratch) |>
  ggplot(aes(epoch, value, colour = set)) +
  geom_line() + geom_point() +
  facet_wrap(~metric, scales = "free_y") +
  labs(title = "Training-from-scratch learning curves",
       x = "Epoch", y = NULL, colour = NULL)
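For a single image, cross-entropy is the negative log of the probability the model assigns to the true class. That probability comes from a softmax, which turns the network's two raw outputs (logits) into probabilities. nn_cross_entropy_loss() takes the raw logits and applies the softmax internally, averaging the loss over the batch by default. That is why SimpleCNN has no softmax layer of its own. The base-R sketch below uses made-up logits; softmax() and cross_entropy() are illustrative helpers, not torch functions:

```r
# Softmax: exponentiate and normalise so the outputs sum to 1.
# Subtracting max(z) first avoids overflow and does not change the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Cross-entropy for one image: -log(probability of the true class).
# Classes are 1-based, matching the dataset's {1 = healthy, 2 = severe} coding.
cross_entropy <- function(logits, target) {
  -log(softmax(logits)[target])
}

logits <- c(0.5, 2.0)        # made-up outputs for (healthy, severe)
softmax(logits)              # about 0.18 and 0.82
cross_entropy(logits, 2L)    # small loss: confident and correct
cross_entropy(logits, 1L)    # large loss: confident and wrong
```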

Q7 (2 marks): Look at the validation accuracy and loss across epochs. Do you think the small network is actually learning to separate referable from non-referable eyes? Why or why not?

5 Data Augmentation

When real training data is scarce, augmentation synthesises plausible new examples by applying random transformations (i.e., inducing a prior on how we expect images to vary).

demo_path <- train_df$path[1]
orig <- base_loader(demo_path) |> transform_to_tensor()

show_tensor <- function(t, title) {
  arr <- as.array(t$permute(c(2, 3, 1)))           # C,H,W -> H,W,C for plotting
  arr <- (arr - min(arr)) / (max(arr) - min(arr))  # rescale to [0,1] for display
  magick::image_ggplot(magick::image_read(as.raster(arr))) +
    ggtitle(title) + theme(plot.title = element_text(size = 9))
}

aug1 <- orig |> transform_random_horizontal_flip(p = 1)
aug2 <- orig |> transform_color_jitter(brightness = 0.4, contrast = 0.4)

gridExtra::grid.arrange(
  show_tensor(orig, "Original"),
  show_tensor(aug1, "Horizontal flip"),
  show_tensor(aug2, "Colour jitter"),
  nrow = 1
)
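Under the hood, these augmentations are simple operations on the C × H × W tensor. A horizontal flip reverses the width axis. A brightness jitter multiplies every value by a random factor and clips the result back to [0, 1]. The base-R sketch below works on a made-up array and mirrors only the general idea behind transform_random_horizontal_flip() and the brightness part of transform_color_jitter(), not their exact code. The factor range [max(0, 1 - b), 1 + b] is an assumption modelled on how the brightness argument is described, and hflip() and brightness_jitter() are hypothetical helpers:

```r
# Made-up 3-channel, 2 x 4 "image" (C x H x W) with values in [0, 1]
img <- array(seq(0, 1, length.out = 24), dim = c(3, 2, 4))

# Horizontal flip: reverse the order of the width (last) axis
hflip <- function(x) x[, , dim(x)[3]:1, drop = FALSE]

# Brightness jitter: scale by a factor from [max(0, 1 - b), 1 + b], then clip to [0, 1]
brightness_jitter <- function(x, b = 0.4, factor = runif(1, max(0, 1 - b), 1 + b)) {
  y <- x * factor
  y[y > 1] <- 1
  y[y < 0] <- 0
  y
}

flipped  <- hflip(img)
brighter <- brightness_jitter(img, factor = 1.3)  # fixed factor so the example is repeatable
```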

We add augmentation only to the training pipeline, never to validation/test. We want to evaluate on untouched images like those the model would meet in practice, not on artificially perturbed ones:

train_transform <- function(path) {
  base_loader(path) |>
    transform_to_tensor() |>
    transform_random_resized_crop(size = c(224, 224), scale = c(0.8, 1.0)) |>
    transform_random_horizontal_flip(p = 0.5) |>
    transform_color_jitter(brightness = 0.2, contrast = 0.2) |>
    transform_normalize(mean = imagenet_mean, std = imagenet_std)
}

Q8a (2 marks): These augmentations have some real potential drawbacks, especially with images of eyes! Explain why flipping an eye horizontally may be misleading in this context. Then use the template below to redefine train_transform so that it removes the horizontal flip and adds a small random rotation (transform_random_rotation(degrees = ...)). (Hint: use ?transform_random_rotation to figure out the parameters for this function.)

## Uncomment, complete the rotation, and rebuild the augmented training dataloader.
# train_transform <- function(path) {
#   base_loader(path) |>
#     transform_to_tensor() |>
#     transform_random_resized_crop(size = c(224, 224), scale = c(0.8, 1.0)) |>
#     transform_random_horizontal_flip(p = 0.5) |>
#     transform_random_rotation(degrees = ___) |>
#     transform_color_jitter(brightness = 0.2, contrast = 0.2) |>
#     transform_normalize(mean = imagenet_mean, std = imagenet_std)
# }

Now apply your new augmentations - this will still use the original, flawed augmentations if you haven’t completed Q8a!

train_ds_aug <- fundus_dataset(train_df, train_transform)
train_dl_aug <- dataloader(train_ds_aug, batch_size = batch_size, shuffle = TRUE)

Now we re-train the exact same simple_cnn from Section 4, changing only the data it sees: the augmented train_dl_aug in place of the un-augmented train_dl. Everything else (architecture, loss, optimiser, epochs) is held fixed so the comparison is like-for-like.

fitted_scratch_aug <- simple_cnn |>
  setup(
    loss      = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics   = list(luz_metric_accuracy())
  ) |>
  fit(train_dl_aug, epochs = 5, valid_data = valid_dl, verbose = FALSE)

bind_rows(
  luz::get_metrics(fitted_scratch)     |> mutate(run = "No Augmentation"),
  luz::get_metrics(fitted_scratch_aug) |> mutate(run = "Augmentation")
) |>
  filter(set == "valid") |>
  ggplot(aes(epoch, value, colour = run)) +
  geom_line() + geom_point() +
  facet_wrap(~metric, scales = "free_y") +
  labs(title = "Simple CNN: validation curves with vs without augmentation",
       x = "Epoch", y = NULL, colour = NULL)

Q8b (2 marks): Compare the validation curves of the augmented and un-augmented runs of the same network. Did augmentation help, hurt, or make little difference here, and why might that be the case for a dataset and network of this size?

6 Transfer Learning

Training from scratch on a few hundred images is hopeless. The breakthrough in both assigned papers was transfer learning: start from a network already trained on the 1.2-million-image, 1000-class ImageNet dataset, keep its learned edge/texture/shape detectors, and only re-train the classification part of the network for our task.

Here we load a pretrained ResNet-18, freeze its convolutional backbone, and replace its final fully-connected layer with a fresh 2-class head.

train_ds_aug <- fundus_dataset(train_df, train_transform)
train_dl_aug <- dataloader(train_ds_aug, batch_size = batch_size, shuffle = TRUE)


resnet_transfer <- nn_module(
  "ResNetTransfer",
  initialize = function() {
    self$model <- model_resnet18(pretrained = TRUE)
    # freeze the backbone: only the new head will be trained
    for (par in self$model$parameters) par$requires_grad_(FALSE)
    n_in <- self$model$fc$in_features
    self$model$fc <- nn_linear(n_in, 2)   # new, trainable classification head
  },
  forward = function(x) {
    self$model(x)
  }
)

fitted_tl <- resnet_transfer |>
  setup(
    loss      = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics   = list(luz_metric_accuracy())
  ) |>
  fit(train_dl_aug, epochs = 5, valid_data = valid_dl, verbose = FALSE)

luz::get_metrics(fitted_tl) |>
  ggplot(aes(epoch, value, colour = set)) +
  geom_line() + geom_point() +
  facet_wrap(~metric, scales = "free_y") +
  labs(title = "Transfer-learning learning curves",
       x = "Epoch", y = NULL, colour = NULL)

Q9 (2 marks): What is ImageNet, and why would convolution and pooling layers learnt on photos of cats, cars and mushrooms improve our ability to predict DR from CFP images (none of which are in the ImageNet dataset)?

7 Evaluating Like a Clinician

Accuracy can be misleading under class imbalance (see Lab 1; this may also help with an earlier question in this lab). Clinically, and in both papers, models are judged on the ROC curve, the AUC, and a choice of operating point trading off sensitivity (catching disease) against specificity (avoiding false alarms).

We get predicted probabilities of grade 4 DR on the held-out test set:

test_logits <- predict(fitted_tl, test_dl)
test_probs  <- as.numeric(as.array(nnf_softmax(test_logits, dim = 2)[, 2]))

eval_df <- test_df |>
  mutate(
    .pred_referable = test_probs,
    truth = factor(referable, levels = c(1, 0),
                   labels = c("Referable", "Not referable"))
  )

yardstick::roc_auc(eval_df, truth, .pred_referable, event_level = "first")

yardstick::roc_curve(eval_df, truth, .pred_referable, event_level = "first") |>
  autoplot() +
  labs(title = "ROC curve: referable DR (test set)")
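The AUC has a useful interpretation. It is the probability that a randomly chosen positive (here, severe-DR) image receives a higher predicted probability than a randomly chosen negative (healthy) image, with ties counting one half. This is the Mann-Whitney formulation. The base-R sketch below uses made-up scores rather than the model's predictions, and auc_pairwise() is a hypothetical helper. On real data, yardstick::roc_auc() as used above is the tool to reach for:

```r
# Pairwise (Mann-Whitney) definition of ROC-AUC
auc_pairwise <- function(scores, is_positive) {
  pos <- scores[is_positive]
  neg <- scores[!is_positive]
  # compare every positive with every negative: 1 if ranked higher, 0.5 if tied
  comparisons <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
  mean(comparisons)
}

scores      <- c(0.95, 0.80, 0.60, 0.40, 0.30, 0.10)  # made-up predicted probabilities
is_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

auc_pairwise(scores, is_positive)  # 8/9, about 0.889: one positive is outranked by one negative
```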

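Because the model outputs a continuous probability rather than a hard yes/no label, every choice of decision threshold gives a different operating point, and together these points trace out the ROC curve. Gulshan et al. reported one operating point tuned for high specificity and one for high sensitivity.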
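Picking an operating point means choosing a probability threshold and flagging every image that scores at or above it. The base-R sketch below uses made-up scores and a hypothetical helper, operating_points(). It sweeps every candidate threshold, then picks the highest threshold that still reaches a target sensitivity of 95%, in the spirit of a high-sensitivity operating point:

```r
# Sensitivity and specificity when flagging every score >= threshold
operating_points <- function(scores, is_positive) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  data.frame(
    threshold   = thresholds,
    sensitivity = sapply(thresholds, function(t) mean(scores[is_positive] >= t)),
    specificity = sapply(thresholds, function(t) mean(scores[!is_positive] < t))
  )
}

scores      <- c(0.95, 0.80, 0.60, 0.40, 0.30, 0.10)  # made-up predicted probabilities
is_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

pts <- operating_points(scores, is_positive)
pts

# High-sensitivity operating point: the highest threshold catching >= 95% of positives
high_sens <- pts[pts$sensitivity >= 0.95, ][1, ]
high_sens   # threshold 0.40, sensitivity 1, specificity 2/3
```

In practice the threshold should be chosen on validation data and then fixed before looking at the test set; otherwise the reported test-set performance at that operating point is optimistically biased.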

Q10 (2 marks): Define sensitivity and specificity and explain why there is usually a trade-off between them. Which is more important in a clinical screening programme? Using this ROC curve (the dark line in the plot above; just eyeball the values), approximately how much specificity would you lose if you wanted perfect sensitivity? Similarly, how poor would the sensitivity be with perfect specificity?

8 Interpretability and Clinical Translation

Good performance metrics are necessary but nowhere near sufficient for clinical use.

Interpretability. Arcadu et al. did not stop at a number: they produced attribution maps (via guided back-propagation) showing where each network looked, and used SHAP to see which retinal fields drove the random-forest aggregation. Reassuringly, the maps lit up on microaneurysms, haemorrhages and exudates - the same lesions clinicians use.
Unfortunately, these sorts of analysis methods proved a little too challenging to get working in a robust, cross-platform way for this lab!

Q11 (1 mark): Why is an attribution/saliency map more convincing evidence than test accuracy/sensitivity/specificity alone when arguing that a model is clinically trustworthy? Give one way a model could achieve high performance metrics for the wrong reason that an attribution map might expose.

Q12 (1 mark): Gulshan et al. validated on two datasets, including the publicly available Messidor-2 (a different country, cameras and protocol). Why is this kind of external validation so much stronger evidence than a held-out split of the same data?

9 Reproducibility

sessionInfo()

## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas.so.3.12.0 
## LAPACK: /usr/lib/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/Halifax
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] yardstick_1.4.0   gridExtra_2.3     magick_2.9.1      fs_2.1.0         
##  [5] here_1.0.2        lubridate_1.9.5   forcats_1.0.1     stringr_1.6.0    
##  [9] dplyr_1.2.1       purrr_1.2.2       readr_2.2.0       tidyr_1.3.2      
## [13] tibble_3.3.1      ggplot2_4.0.3     tidyverse_2.0.0   luz_0.5.2        
## [17] torchvision_0.9.0 torch_0.17.0     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1    timeDate_4052.112   farver_2.1.2       
##  [4] S7_0.2.2            fastmap_1.2.0       digest_0.6.39      
##  [7] rpart_4.1.27        timechange_0.4.0    lifecycle_1.0.5    
## [10] survival_3.8-6      processx_3.9.0      magrittr_2.0.5     
## [13] compiler_4.6.0      rlang_1.2.0         sass_0.4.10        
## [16] progress_1.2.3      tools_4.6.0         yaml_2.3.12        
## [19] data.table_1.18.4   knitr_1.51          prettyunits_1.2.0  
## [22] labeling_0.4.3      bit_4.6.0           DiceDesign_1.10    
## [25] RColorBrewer_1.1-3  parsnip_1.6.0       withr_3.0.2        
## [28] workflows_1.3.0     nnet_7.3-20         grid_4.6.0         
## [31] tune_2.1.0          future_1.70.0       globals_0.19.1     
## [34] scales_1.4.0        MASS_7.3-65         zeallot_0.2.0      
## [37] cli_3.6.6           rmarkdown_2.31      crayon_1.5.3       
## [40] generics_0.1.4      future.apply_1.20.2 rstudioapi_0.18.0  
## [43] tzdb_0.5.0          cachem_1.1.0        splines_4.6.0      
## [46] dials_1.4.3         parallel_4.6.0      coro_1.1.0         
## [49] vctrs_0.7.3         hardhat_1.4.3       Matrix_1.7-5       
## [52] jsonlite_2.0.0      callr_3.7.6         hms_1.1.4          
## [55] bit64_4.8.0         listenv_0.10.1      jpeg_0.1-11        
## [58] gower_1.0.2         jquerylib_0.1.4     recipes_1.3.2      
## [61] parallelly_1.47.0   glue_1.8.1          codetools_0.2-20   
## [64] ps_1.9.3            rsample_1.3.2       stringi_1.8.7      
## [67] gtable_0.3.6        furrr_0.4.0         pillar_1.11.1      
## [70] rappdirs_0.3.4      htmltools_0.5.9     ipred_0.9-15       
## [73] lava_1.9.1          R6_2.6.1            rprojroot_2.1.1    
## [76] vroom_1.7.1         evaluate_1.0.5      lattice_0.22-9     
## [79] bslib_0.10.0        class_7.3-23        Rcpp_1.1.1-1.1     
## [82] prodlim_2026.03.11  xfun_0.57           pkgconfig_2.0.3

Practical 3 - Medical Imaging: Detecting Diabetic Retinopathy with CNNs

REPLACE WITH YOUR NAME

2026-06-09

