Welcome to the Bump Hunting Project by Patient Rule Induction Method. This website briefly describes the goal of the project and its software, PRIMsrc. It explains why and how you can use the software and provides some general remarks and links about it.
Overview
"Bump Hunting" refers to the procedure of mapping out local regions of the input (attribute/feature/predictor) space where a target function of interest, usually unknown, assumes larger (or smaller) values than its average over the entire space. These sought-after extreme values of the target function are also known as local/global extrema.
The picture below illustrates the idea. The sunshine over the mountain range shows how light can uncover peaks, highlands and valleys, just as "Bump Hunting" aims to uncover structures in the target function.
(Bill Wight Photography, Copyright 2015, with permission)
"Bump Hunting" applies to mathematical / statistical problems such as:
- Mode(s) Hunting
- Local/Global Extremum(a) Finding
- Subgroup(s) Identification
- Outlier(s) Detection
- …
PRIMsrc
PRIMsrc implements a unified treatment of the "Bump Hunting" task in high-dimensional space. It uses a generic rule-induction algorithm by recursive peelings derived from the Patient Rule Induction Method (PRIM), initially introduced by Friedman & Fisher in 1999 (see Wiki "References"). It generates simple decision rules delineating a region in the multi-dimensional input space where the target function is unusually large (or small).
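To make the idea of recursive peeling concrete, here is a minimal Python sketch of the top-down peeling step of a PRIM-style search for a plain regression response. It is an illustration only and not the PRIMsrc implementation or its API: the function name `peel`, the parameters `alpha` and `min_support`, and the toy data are all assumptions made for this example, and the sketch omits the other steps a full analysis needs (for instance cross-validation and survival-specific statistics).

```python
def peel(X, y, alpha=0.1, min_support=0.25):
    """Hypothetical PRIM-style peeling sketch (not the PRIMsrc API).

    X: list of rows (lists of floats), y: list of floats.
    Repeatedly trims about a fraction `alpha` of the points still in the
    box from one side of one variable, picking the trim that yields the
    highest mean response, until no trim keeps at least `min_support`
    of the original sample.
    Returns (box, support, box_mean), where box is a list of (low, high).
    """
    n, p = len(X), len(X[0])
    inside = list(range(n))
    box = [(min(r[j] for r in X), max(r[j] for r in X)) for j in range(p)]
    while True:
        k = max(1, int(alpha * len(inside)))  # points to trim per step
        best = None  # (mean response, variable, side, kept indices)
        for j in range(p):
            vals = sorted(X[i][j] for i in inside)
            lo_cut, hi_cut = vals[k - 1], vals[-k]
            # Strict inequalities: tied values at the cut are trimmed too.
            candidates = (
                ("lower", [i for i in inside if X[i][j] > lo_cut]),
                ("upper", [i for i in inside if X[i][j] < hi_cut]),
            )
            for side, kept in candidates:
                if not kept or len(kept) < min_support * n:
                    continue  # trim would make the box too small
                m = sum(y[i] for i in kept) / len(kept)
                if best is None or m > best[0]:
                    best = (m, j, side, kept)
        if best is None:
            break  # no admissible trim left
        _, j, side, inside = best
        col = [X[i][j] for i in inside]
        if side == "lower":
            box[j] = (min(col), box[j][1])
        else:
            box[j] = (box[j][0], max(col))
    box_mean = sum(y[i] for i in inside) / len(inside)
    return box, len(inside) / n, box_mean


# Toy data (made up): a 10 x 10 grid where the response is high
# only in the corner where both variables exceed 0.5.
X = [[(i + 0.5) / 10, (j + 0.5) / 10] for i in range(10) for j in range(10)]
y = [1.0 if a > 0.5 and b > 0.5 else 0.0 for a, b in X]
box, support, box_mean = peel(X, y)
print(box, support, box_mean)
```

The returned box reads directly as a decision rule (one interval per variable), which is what makes this family of methods easy to interpret. The "patient" part is the small `alpha`: each step removes only a little data, so later steps can still correct for earlier ones.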
Why Use PRIMsrc?
The method (i) makes minimal assumptions about the data, (ii) gives easily interpretable rules with estimated variance and (iii) can target any desired response, being supervised for Survival, Regression and Classification (SRC) settings. Together, these properties make it highly attractive to the user.
Unlike classical regression, classification and clustering problems, "Bump Hunting" is interested in:
- Understanding and characterizing newly identified sub-groups of samples and homogeneous sub-populations
- Discovering and describing sub-groups of samples and sub-populations with extreme behaviors
- Identifying and predicting future sub-groups of samples and sub-populations with extreme behaviors
- Customizing and/or targeting sub-groups of samples and sub-populations with extreme behaviors
- …
Applications exist in a growing range of fields, including medicine, engineering, marketing, business analysis and materials research:
- subgroup finding
- disparity subtyping
- alternative drug/treatment indication (re-purposing)
- personalized medicine (improved accuracy of diagnosis and/or prognosis)
- economical medicine (hot spotting)
- system reliability analysis in engineering
- duration analysis/modeling in economics
- event history analysis in sociology
- financial securities return
- insurance risk assessment
- …
Readme
Visit the software Readme webpage to learn about License, Downloads, Branches, Requirements, Installation and Usage.
Wiki
Visit the project Wiki webpage for Roadmap, Documentation, Examples, Publications, Case Studies, Support and How to Contribute (code and documentation).
Authors/Contributors
Jean-Eudes Dazard, PhD.
Center for Proteomics and Bioinformatics
Case Western Reserve University
Cleveland, Ohio, USA
J. Sunil Rao, PhD.
Division of Biostatistics
Department of Epidemiology and Public Health
The University of Miami
Miami, Florida, USA
Michael LeBlanc, PhD.
Fred Hutchinson Cancer Research Center
Public Health Sciences.
Department of Biostatistics, School of Public Health
The University of Washington
Seattle, Washington, USA
Michael Choe, MD.
Case Western Reserve University
Cleveland, Ohio, USA
Tarn Duong, PhD.
Research scientist
Computer Science Laboratory (LIPN)
University of Paris 13
Paris, France
Acknowledgements
Project funded in part by the National Institutes of Health - National Cancer Institute, Grant R01-CA160593, awarded to J. Sunil Rao and J-E. Dazard (co-PIs). This work was also made possible thanks to the help of Alberto Santana, MBA (Analyst Programmer, CWRU) and the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University. Thanks also to professional photographer Bill Wight CA for the illustration picture above.