BioBricks & ToxRefDB // Insilica.co

The ToxRefDB brick is a SQLite asset that contains mammalian toxicity information that can aid chemical risk assessments. The data is pulled from the ToxRefDB Clowder Archive.

To use the ToxRefDB brick, install biobricks and install the brick:

bash

pip install biobricks
biobricks configure # follow the prompts
biobricks install toxrefdb # installs ~400 MB toxrefdb.sqlite

The toxrefdb brick is installed under the brickpath, which is created during the configuration step. The brick has a single asset:

bash

biobricks assets toxrefdb
# toxrefdb_sqlite: [brickpath]/brick/toxrefdb.sqlite

To load the database in R:

library(biobricks)
library(RSQLite)
toxref_assets <- biobricks::bbassets("toxrefdb")
toxref <- dbConnect(RSQLite::SQLite(), toxref_assets$toxrefdb_sqlite)
RSQLite::dbListTables(toxref)
# [1] "pod" "chemical" "study" "guideline" "endpoint" ...

There are 26 tables in this database, many of which have useful information. For this post we’ll focus on the pod or “point of departure” table and some related tables.

ToxRefDB has many tables; we focus on a subset related to 'point of departure'

In toxicology, point of departure metrics describe a dose at which a chemical starts to show some kind of observable or measurable effect. The NOAEL, or No Observed Adverse Effect Level, is the highest tested dose at which no adverse effect is observed; it sits directly below the lowest dose with an observed adverse effect.
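
As a rough illustration of that definition, the sketch below derives a NOAEL from a made-up set of dose groups. The `doses` data frame and its values are hypothetical and are not taken from ToxRefDB; the point is only to show how the NOAEL relates to the lowest adverse dose and to the maximum tested dose.

```r
# Hypothetical dose groups for a single study (mg/kg/day); not ToxRefDB data.
doses <- data.frame(
  dose           = c(0, 10, 50, 250),
  adverse_effect = c(FALSE, FALSE, TRUE, TRUE)
)

# Lowest tested dose with an observed adverse effect (often called the LOAEL).
loael <- min(doses$dose[doses$adverse_effect])

# NOAEL: highest tested dose below the LOAEL with no observed adverse effect.
noael <- max(doses$dose[!doses$adverse_effect & doses$dose < loael])

# Largest dose tested in the study.
max_dose <- max(doses$dose)

# TRUE when the NOAEL is below the top dose, i.e. an adverse effect was seen
# somewhere in the tested range.
noael < max_dose
```

In this toy study the NOAEL is 10 and the top dose is 250, so the comparison is `TRUE`. A study with no adverse effect at any dose would instead have its NOAEL at the maximum tested dose.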

The point of departure table records 58,447 NOAELs for 735 chemicals from 3,633 studies. We can collect all that information into a single table:

library(tidyverse)
# get the basic noael data
# a chemical has an adverse effect when the NOAEL `dose_level` < `max_dose_level`,
# i.e. the NOAEL falls below the highest dose level tested in the study
adverse_pod <- tbl(toxref, "pod") |> 
    filter(pod_type=="noael") |>
    mutate(adverse = dose_level < max_dose_level) |>
    select(chemical_id, study_id, adverse) |> 
    collect() 

# get the chemical name
chemical <- tbl(toxref, "chemical") |> 
    rename(chemical_name=preferred_name) |>
    collect()

# get guideline information - each study has a single guideline
study_guideline <- tbl(toxref, "study") |> 
    inner_join(tbl(toxref,"guideline"), by="guideline_id") |> 
    filter(!is.na(guideline_number)) |> # ignore rows w/out guideline
    select(study_id, guideline_number, guideline_name=name) |>
    collect()

# put it all together
pod <- adverse_pod |> 
    inner_join(chemical,by="chemical_id") |> 
    inner_join(study_guideline,by="study_id") |> 
    select(chemical_name, guideline_name, adverse) 

# chemical_name    guideline_name                        adverse
# Diquat dibromide Prenatal Developmental Toxicity Study 1
# Fludioxonil      90-day Oral Toxicity in Rodents       1
# Difenoconazole   90-day Oral Toxicity in Nonrodents    0
# Clomazone        Reproduction and Fertility Effects    0
# Tepraloxydim     Chronic Toxicity                      1
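
The `adverse` column prints as 1 and 0 rather than `TRUE` and `FALSE` because the comparison is evaluated inside SQLite, which has no dedicated boolean type. The sketch below reproduces the same filter and mutate pattern against a small in-memory database so it can be tried without installing the brick. The toy rows are invented for illustration, and the example assumes the `DBI`, `RSQLite`, `dplyr`, and `dbplyr` packages are installed.

```r
library(DBI)
library(dplyr)

# Toy in-memory database; the rows below are made up, not ToxRefDB records.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "pod", data.frame(
  chemical_id    = c(1L, 2L, 3L),
  study_id       = c(10L, 20L, 30L),
  pod_type       = c("noael", "noael", "loael"),
  dose_level     = c(1L, 3L, 2L),
  max_dose_level = c(3L, 3L, 3L)
))

# Build the query lazily; nothing runs in the database yet.
lazy_pod <- tbl(con, "pod") |>
  filter(pod_type == "noael") |>
  mutate(adverse = dose_level < max_dose_level) |>
  select(chemical_id, study_id, adverse) |>
  arrange(chemical_id)

show_query(lazy_pod)            # print the SQL that dbplyr generates
toy_result <- collect(lazy_pod) # run the query and pull the rows into R
dbDisconnect(con)

toy_result
```

Only the two `noael` rows survive the filter. The first has a NOAEL below the top dose level and is flagged adverse; the second has its NOAEL at the top dose level and is not. If a logical column is preferred, `mutate(adverse = adverse == 1)` after `collect()` is one way to convert it.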

We can do a simple count of the number of hazardous and non-hazardous compounds for each guideline. Here we define a compound as hazardous when it has a no observed adverse effect level below the maximum tested dose. In other words, ‘hazardous’ compounds are those with an observed adverse effect at some tested dose.

Counts of (non)hazardous compounds for each guideline in the toxrefdb `pod` table
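
One way to compute counts like these is sketched below. It runs on a small hypothetical stand-in for the `pod` table built earlier (the chemical names and values are invented), and it makes one assumption the post does not spell out: a compound is treated as hazardous for a guideline if any of its NOAEL rows under that guideline is flagged adverse.

```r
library(dplyr)

# Hypothetical stand-in for the joined `pod` table; values are made up.
pod <- data.frame(
  chemical_name  = c("A", "A", "B", "D", "A", "C"),
  guideline_name = c("Chronic Toxicity", "Chronic Toxicity",
                     "Chronic Toxicity", "Chronic Toxicity",
                     "Prenatal Developmental Toxicity Study",
                     "Prenatal Developmental Toxicity Study"),
  adverse        = c(1L, 0L, 0L, 1L, 0L, 1L)
)

hazard_counts <- pod |>
  # collapse multiple NOAEL rows to one row per chemical and guideline
  group_by(guideline_name, chemical_name) |>
  # assumption: hazardous if any NOAEL for this pair is flagged adverse
  summarise(hazardous = any(adverse == 1), .groups = "drop") |>
  # count chemicals per guideline and hazard class
  count(guideline_name, hazardous, name = "n_chemicals")

hazard_counts
```

Collapsing to one row per chemical and guideline first keeps a compound with many studies from being counted more than once. Counting rows directly with `count(guideline_name, adverse)` would instead tally NOAEL records rather than compounds.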