Using machine learning to personalize CRISPR-Cas9 applications

Wilson L1, O'Brien A1, Reti D1, Horlbacher M1, Dunne R2 and Bauer D1

  1. CSIRO, Sydney, NSW, Australia.
  2. CSIRO, Data61, NSW, Australia.

Numerous studies have sought to build machine learning models that predict general CRISPR-Cas9 activity, and while great progress has been made, these approaches are still limited. Small sequence variations can have a dramatic effect on the CRISPR-Cas9 system, changing on-target activity or increasing the number of off-targets. Despite this risk, current tools do not account for genetic variation within a population. To address this, we developed VARiant-aware detection and SCoring of Off-Targets (VARSCOT), which allows researchers to design personalized CRISPR-Cas9 applications for specific individuals or populations. VARSCOT uses variant information to identify CRISPR-Cas9 target sites unique to a specific individual or population. We find our tool to be the most sensitive detection method for off-targets, finding 40% to 70% more experimentally verified off-targets than other popular software tools. VARSCOT uses a machine learning model to score off-target activity, leading to a 98% reduction in false positives when predicting which off-targets are active. As off-target activity varies with CRISPR-Cas9 concentration, VARSCOT's model provides a probabilistic score that accounts for different conditions.
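
To illustrate why variant information matters for off-target detection, the following minimal sketch shows how a single nucleotide variant can turn a region that a reference-only search would ignore into a candidate off-target. It is not VARSCOT's algorithm: the function names, the toy sequences, the forward-strand-only scan, the restriction to substitutions, and the NGG PAM check are all simplifying assumptions made for illustration.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with substitutions applied.

    `snvs` maps a 0-based position to the alternate base (hypothetical format).
    """
    seq = list(reference)
    for pos, alt in snvs.items():
        seq[pos] = alt
    return "".join(seq)


def count_mismatches(guide, site):
    """Number of positions at which the guide and the candidate site differ."""
    return sum(a != b for a, b in zip(guide, site))


def find_sites(sequence, guide, max_mismatches):
    """Scan the forward strand for sites followed by an NGG PAM.

    Returns (start, mismatches) for every site within the mismatch budget.
    """
    n = len(guide)
    hits = []
    for start in range(len(sequence) - n - 2):
        pam = sequence[start + n:start + n + 3]
        if pam[1:] != "GG":  # assumed PAM: NGG
            continue
        mismatches = count_mismatches(guide, sequence[start:start + n])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy data (invented for this example, not from the study).
guide = "GACGTTAGCCTAGGATCCAA"
# The reference site differs from the guide at two positions.
reference = "TTTT" + "AACGTTAGCCCAGGATCCAA" + "TGG" + "TTTT"
# One individual carries an A>G substitution at position 4,
# which removes one of the two mismatches.
personal = apply_snvs(reference, {4: "G"})

print(find_sites(reference, guide, max_mismatches=1))  # []
print(find_sites(personal, guide, max_mismatches=1))   # [(4, 1)]
```

With a budget of one mismatch, the reference search reports nothing, while the same search on the individual's sequence reports a site at position 4. A practical tool would also need to handle the reverse strand, insertions and deletions, and genome-scale indexing, none of which this sketch attempts.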