HRD Review - Biomarker Heatmap

github.com/insilica/bawto...fig3-heatmap.R

In the Homologous Recombination Deficiency (HRD) Biomarker review, biomarkers were extracted from prostate cancer clinical trials. sysrev.com/p/81395 provides an open access repository of every review decision. This vignette shows how to build the heatmap generated for the publication.

Get the data

get_answers(81395) gets raw data from project 81395, list_answers organizes that into a list of tables.

tbls <- rsr::get_answers(81395) |> rsr::list_answers()
#> list(basic=<tbl>,biomarker=<tbl>,...)

The basic table links articles (eg. aid 1781738) to basic label data (logical, string, category). In this project, a basic label describes article trial identifiers (eg NCT01682772):

tbls$basic |> 
  mutate(nct=unlist(`NCT Trial ID`)) |> 
  select(aid,nct)
#>        aid nct        
#> 1 11781750 NCT01682772
#> 2 11781738 NCT02854436

The other named values in get_answer_list are group labels, which are themselves tables. Here, the biomarker table describes which trials evaluated which genes.

tbls$biomarker |> 
  mutate(biomarker.name=unlist(biomarker.name)) |> 
  select(aid, biomarker.name)
#>        aid biomarker.name
#> 1 11781738 BRCA1 
#> 2 11781738 BRCA2 

Extracted tables can be joined by article aid. Here, joined basic/biomarker tables link trials and biomarkers. ggplot2::geom_tile can then quickly visualize which trials use which biomarkers.

join.tb = tbls$basic |> 
  inner_join(tbls$biomarker,by="aid")

# ggplot(join.tb,
#   aes(x=bmkr, y=study, fill=elig)) + 
#   geom_tile() + …

Build the heatmap

A bit more processing of the above tables and ggplot can generate a heatmap. Unfold the code to see how it is done.

joint = tbls$basic |> 
  inner_join(tbls$biomarker,by="aid") |> 
  select(aid, study=short_name, bmkr=biomarker.name, eli=eligibility) |> 
  purrr::modify(~ unlist(.,recursive = T))

top10 = joint |> group_by(bmkr) |> summarize(s = n_distinct(study)) |> 
  slice_max(n=13,order_by=s,with_ties = F)

ptb = joint |> 
  inner_join(top10,by="bmkr") |> 
  mutate(bmkr  = fct_rev(fct_infreq(bmkr)))  |> 
  mutate(study = fct_infreq(study)) |>
  complete(bmkr, study, fill=list(eli="none")) |> 
  mutate(eli = ifelse(eli=="sufficient","measured",eli)) |> 
  mutate(eli = factor(eli,levels=c("none","required_negative","measured"))) |> 
  mutate(eli = fct_recode(eli,
                          exclude          ="required_negative",
                          `measure/include`="measured"))
  
ggplot(ptb,aes(x=study, y=bmkr, fill=eli)) +
  geom_tile(col="white",size=0.5) + 
  scale_fill_manual(values=c("#161616","#4C9605","#CC2C11")) + 
  theme(text = element_text(size=12),
        panel.background = element_blank(),
        axis.text.x = element_text(angle=90,hjust=1), 
        legend.title = element_blank(),
        legend.position = "top") + 
  ylab("") + xlab("")

This is 90% of the way there, visit github.com/insilica/bawto to learn the rest.