Invited Talk
Veridical Data Science: The Practice of Responsible Data Analysis and Decision-Making
Bin Yu
Moderator: Pradeep Ravikumar
"A.I. is like nuclear energy -- both promising and dangerous" -- Bill Gates, 2019.
Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the "dangers" of A.I. To mitigate these dangers, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allow one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, bringing us a step closer to veridical Data Science. We will illustrate the PCS framework in the modeling stage through the development of DeepTune images for the characterization of neurons in the difficult V4 area of the visual cortex.