Sequential BATTing Algorithm (Bootstrapping and Aggregating of Thresholds from Trees)
Stat Med. 2017;36(9):1414–1428. doi:10.1002/sim.7236
SubgrpID R Package
PHASE 1: INPUT & SETUP
Input Data
- Predictors (biomarkers)
- Response variable
- Treatment indicator
- Censoring info (survival)
Parameters
- n.boot (default = 50)
- min.sigp.prcnt (default = 20%)
- type: c (continuous) / s (survival) / b (binary)
- des.res (desired response direction): larger / smaller
Pre-filtering (optional)
- Univariate / glmnet / CART
- Reduce predictor dimensionality
Legend (flowchart box types): Input / Output Data; Pre-processing; Sequential Loop Steps; Bootstrap Steps (B1–B6); Aggregation; Decision / Stopping; Final Output; Loop Back
PHASE 2: SEQUENTIAL ITERATION LOOP
Initialize: Empty Signature (k = 1)
Evaluate All Remaining Predictor Variables
BOOTSTRAP THRESHOLD DISCOVERY (repeated n.boot = 50 times per variable)
B1: Draw bootstrap sample from training data
B2: Fit statistical model: Cox (survival), GLM (binary), or linear (continuous)
B3: Extract direction from interaction / main effect coefficients
B4: Generate cutoff candidates (5th–95th percentile, step = 5%)
B5: Score each cutoff via p-values (interaction or main effect)
B6: Select best cutoff (lowest p-value) for this bootstrap
Repeat n.boot times
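Steps B4 to B6 can be illustrated with a minimal Python sketch. It is not the SubgrpID implementation: the helper names (`candidate_cutoffs`, `z_test_pvalue`, `best_cutoff`) are hypothetical, and a simple large-sample z-test on a continuous response stands in for the model-based interaction or main-effect p-values of step B5.

```python
import math
import statistics

def candidate_cutoffs(values, lo=5, hi=95, step=5):
    """B4: empirical percentiles of `values` from lo to hi in `step` increments."""
    # quantiles(n=100) returns the 1st..99th percentiles, so index p-1 is the p-th.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    return sorted({pct[p - 1] for p in range(lo, hi + 1, step)})

def z_test_pvalue(group_a, group_b):
    """Two-sided z-test for a difference in means (ASSUMED stand-in for B5)."""
    if len(group_a) < 2 or len(group_b) < 2:
        return 1.0  # too few observations on one side to score this cutoff
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    if se == 0:
        return 0.0 if diff != 0 else 1.0  # both groups are constant
    # 2 * (1 - Phi(|z|)) written with erfc to stay accurate in the tail
    return math.erfc(abs(diff / se) / math.sqrt(2))

def best_cutoff(x, y):
    """B5 + B6: score every candidate and keep the one with the lowest p-value."""
    best = None
    for c in candidate_cutoffs(x):
        above = [yi for xi, yi in zip(x, y) if xi > c]
        below = [yi for xi, yi in zip(x, y) if xi <= c]
        p = z_test_pvalue(above, below)
        if best is None or p < best[1]:
            best = (c, p)
    return best  # (cutoff, p-value)
```

In the actual algorithm the score in B5 would come from the model fitted in B2 (Cox, GLM, or linear), so `z_test_pvalue` would be replaced by the corresponding coefficient test.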
Aggregate Cutoffs: compute the MEDIAN threshold across all bootstrap samples
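The bootstrap-and-aggregate idea can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `bootstrap_sample`, `batting_threshold`, and `toy_select_cutoff` are hypothetical names, and the toy selector picks the split with the largest gap in mean response rather than the lowest model p-value.

```python
import random
import statistics

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (step B1)."""
    return [rng.choice(rows) for _ in rows]

def toy_select_cutoff(sample):
    """ASSUMED stand-in for B2-B6: cutoff with the largest gap in mean response."""
    xs = sorted({x for x, _ in sample})
    best_c, best_gap = statistics.median(xs), -1.0
    for c in xs[:-1]:  # the largest x would leave the "above" group empty
        above = [y for x, y in sample if x > c]
        below = [y for x, y in sample if x <= c]
        gap = abs(statistics.fmean(above) - statistics.fmean(below))
        if gap > best_gap:
            best_c, best_gap = c, gap
    return best_c

def batting_threshold(rows, select_cutoff, n_boot=50, seed=0):
    """Repeat cutoff selection on n_boot bootstrap samples and take the median."""
    rng = random.Random(seed)
    cutoffs = [select_cutoff(bootstrap_sample(rows, rng)) for _ in range(n_boot)]
    return statistics.median(cutoffs), cutoffs

# Usage: rows are (biomarker value, response) pairs with a step at x = 50.
rows = [(x, 1.0 if x > 50 else 0.0) for x in range(101)]
threshold, per_bootstrap = batting_threshold(rows, toy_select_cutoff)
```

Taking the median rather than a single best cutoff makes the threshold less sensitive to any one resampled data set.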
Select Variable with Most Significant Relationship (lowest p-value)
Add New Rule to Signature: (variable, direction, threshold)
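Stopping Criteria Met?
Stopping Criteria:
1. Likelihood-ratio (LR) test p-value > 0.05
2. Signature-positive (Sig+) population < min.sigp.prcnt (20%)
No: k = k+1, loop back to evaluate the remaining predictor variables
Yes: proceed to Phase 3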
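The stopping check can be written as a small Python sketch. The function names are hypothetical, and the likelihood-ratio test is assumed here to compare the models with and without the new rule on 1 degree of freedom; the source does not state the degrees of freedom.

```python
import math

def lr_test_pvalue(loglik_old, loglik_new):
    """Likelihood-ratio test p-value, ASSUMING a chi-square with 1 df."""
    stat = max(0.0, 2.0 * (loglik_new - loglik_old))
    # For 1 df, P(chi2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2.0))

def should_stop(lr_pvalue, n_sig_pos, n_total, alpha=0.05, min_sigp_prcnt=0.20):
    """Stop if the new rule is not significant or the Sig+ group is too small."""
    not_significant = lr_pvalue > alpha
    too_small = (n_sig_pos / n_total) < min_sigp_prcnt
    return not_significant or too_small
```

For example, `should_stop(0.01, 10, 100)` returns `True` because only 10% of patients remain signature-positive, even though the rule itself is significant.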
PHASE 3: EVALUATION & OUTPUT
Final Biomarker Signature
Matrix of rules: [variable | direction (< or >) | threshold | log-LL]
Predictive: Treatment x Subgroup interaction p-values
Prognostic: Main effect significance
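Applying a signature of this form to patient data can be sketched as below. The rule layout follows the (variable, direction, threshold) triple above; combining rules with a logical AND, the biomarker names `bm1` and `bm2`, and the function names are assumptions made for illustration.

```python
import operator

# Map the direction symbol of a rule to a comparison function.
OPS = {">": operator.gt, "<": operator.lt}

def is_signature_positive(patient, rules):
    """True if the patient satisfies every rule (ASSUMED AND-combination)."""
    return all(OPS[direction](patient[var], thr) for var, direction, thr in rules)

def sig_pos_fraction(patients, rules):
    """Share of patients classified as signature-positive (Sig+)."""
    flags = [is_signature_positive(p, rules) for p in patients]
    return sum(flags) / len(flags)

# Usage with two hypothetical biomarkers.
rules = [("bm1", ">", 2.5), ("bm2", "<", 10.0)]
patients = [{"bm1": 3.0, "bm2": 4.0}, {"bm1": 3.0, "bm2": 12.0}]
fraction = sig_pos_fraction(patients, rules)
```

The resulting Sig+ fraction is the quantity compared against min.sigp.prcnt in the stopping criteria.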
Training Set Evaluation: apply signature, compute subgroup stats
Nested Cross-Validation: assess model stability and robustness
Test Set Evaluation: apply signature to held-out data
Interaction Plot: Treatment vs Control across subgroups
Results: p-values, subgroup ratios, group metrics, CSV/RData output