This study evaluates the techniques used to validate default probability (DP) models. By generating simulated stress data, we build ideal conditions for assessing the adequacy of the validation metrics under different stress scenarios. In addition, we analyze the evaluation metrics empirically using data on 30,686 delisted US public companies, with delisting serving as a proxy for default. Using simulated data, we find that entropy-based metrics, such as measure M, are more sensitive to changes in the characteristics of credit score distributions. Stress tests on empirical sub-samples show that AUROC is the metric most sensitive to changes in market conditions, followed by measure M. Our results can help risk managers make rapid decisions when validating risk models under different scenarios. (C) 2016 Elsevier Inc. All rights reserved.
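The sketch below illustrates how a discrimination metric such as AUROC reacts when simulated credit score distributions change. It is a minimal illustration under stated assumptions and is not the study's simulation design. The scores of defaulters and non-defaulters are assumed to be normally distributed with unit variance, and the stress scenario is modeled as a hypothetical `shift` in the defaulters' mean score. The function names `auroc` and `simulate` are illustrative only. AUROC is computed in its Mann-Whitney form: the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter. Measure M is not reproduced here.

```python
import random


def auroc(default_scores, survivor_scores):
    """AUROC as the probability that a random defaulter scores higher than a
    random non-defaulter (ties count as 1/2). Mann-Whitney form, O(n*m)."""
    wins = 0.0
    for d in default_scores:
        for s in survivor_scores:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(default_scores) * len(survivor_scores))


def simulate(shift, n_default=100, n_survivor=1000, seed=0):
    """Hypothetical stress scenario: non-defaulter scores ~ N(0, 1) and
    defaulter scores ~ N(shift, 1). A smaller shift means more overlap
    between the two distributions, so the model discriminates less well."""
    rng = random.Random(seed)
    defaults = [rng.gauss(shift, 1.0) for _ in range(n_default)]
    survivors = [rng.gauss(0.0, 1.0) for _ in range(n_survivor)]
    return auroc(defaults, survivors)


# Shrink the separation between the two score distributions step by step.
for shift in (2.0, 1.0, 0.5, 0.0):
    print(f"shift={shift}: AUROC={simulate(shift):.3f}")
```

In this equal-variance binormal setting, the population AUROC equals Φ(shift/√2), where Φ is the standard normal CDF. The printed sample values should therefore fall from roughly 0.92 toward 0.5 as the distributions overlap more. The quadratic pairwise loop keeps the definition explicit. For large samples, a rank-based computation such as scikit-learn's `roc_auc_score` would be preferable.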