What this shows: for every setup that has ever produced an FF (original Fail + recovery Fail), the next N filled outcomes for that same setup are aggregated.
Each FF event is treated independently. NoFill / Expired rows are skipped and do not count toward N (the model didn't fire, so no trade); the window rolls forward to the next filled outcome.
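Score = (Post-FF WR − Baseline WR) × √Occurrences, where WR is the win rate. The √Occurrences factor shrinks the score of setups with few occurrences, so tiny samples are penalised.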
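
A minimal sketch of the aggregation and the score, for illustration only. The row encoding (one label per row: "Win", "Fail", "FF", "NoFill", "Expired") and the function names are assumptions, not the tool's actual data model. What exactly Occurrences counts is not stated above, so it is passed in as a plain number.

```python
from math import sqrt

# Assumed labels for rows where the model did not fire (no trade taken).
SKIPPED = {"NoFill", "Expired"}


def post_ff_outcomes(rows, n):
    """Collect the next n filled outcomes after each FF row.

    rows: chronological outcome labels for ONE setup (hypothetical encoding).
    Returns one list per FF event. In this sketch each FF event is scanned
    on its own, so windows from nearby FF events can overlap.
    """
    windows = []
    for i, label in enumerate(rows):
        if label != "FF":
            continue
        # NoFill / Expired rows are dropped, so they never use up a slot in N.
        filled = [r for r in rows[i + 1:] if r not in SKIPPED]
        windows.append(filled[:n])
    return windows


def win_rate(outcomes):
    """Fraction of outcomes labelled "Win"; None when there is nothing to count."""
    if not outcomes:
        return None
    return sum(o == "Win" for o in outcomes) / len(outcomes)


def score(post_ff_wr, baseline_wr, occurrences):
    """(Post-FF WR - Baseline WR) * sqrt(Occurrences)."""
    return (post_ff_wr - baseline_wr) * sqrt(occurrences)


# Made-up rows for one setup, oldest first.
rows = ["Win", "FF", "NoFill", "Win", "Expired", "Win", "Fail", "FF", "Win"]
windows = post_ff_outcomes(rows, n=2)
pooled = [o for w in windows for o in w]
print(windows, win_rate(pooled))

# Same 25-point edge over baseline, different sample sizes.
print(score(0.75, 0.5, 4), score(0.75, 0.5, 100))
```

With the same edge over baseline, the setup with 100 occurrences scores five times higher than the one with 4, which is how small samples end up ranked lower.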