Comparing Models in Python

At work, I’ve been building predictive models. As I began to iterate on them (Linear, DNN, Wide and Deep), I discovered that I needed a framework for comparing models. After doing some reading, I landed on running k-fold cross validation for each model and then comparing the resulting test-set mean squared error distributions of the 2 models using a paired t-test for statistical significance.

Here are some useful snippets:

Calculating mean squared error with numpy

mean_squared_error = ((A - B) ** 2).mean(axis=0)
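
A minimal usage sketch, assuming A holds the true labels and B holds a model's predictions as same-shaped NumPy arrays (the values here are made up):

import numpy as np

A = np.array([3.0, 2.5, 4.0])  # hypothetical true labels
B = np.array([2.8, 2.7, 3.6])  # hypothetical predictions
mean_squared_error = ((A - B) ** 2).mean(axis=0)
print(mean_squared_error)  # 0.08

With 1-D arrays, axis=0 simply averages over the samples; for a 2-D array it would give one MSE per column.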

A convenience wrapper around Scikit-learn’s KFold.split

from sklearn.model_selection import KFold

def split(pandas_dataframe, n_splits=10):
    k_fold = KFold(n_splits=n_splits)
    for train_indices, test_indices in k_fold.split(pandas_dataframe):
        print("train sz {}, test_sz {}".format(len(train_indices), len(test_indices)))
        yield pandas_dataframe.iloc[train_indices], pandas_dataframe.iloc[test_indices]
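
To turn those folds into the per-fold MSE lists that the t-test below expects, you can wrap it in something like this sketch. fit_and_predict and the 'label' column are hypothetical stand-ins for your own training code and target column:

def fold_mses(pandas_dataframe, fit_and_predict, n_splits=10):
    # fit_and_predict(train_df, test_df) is a hypothetical callable that
    # trains a model on train_df and returns predictions for test_df.
    mses = []
    for train_df, test_df in split(pandas_dataframe, n_splits=n_splits):
        predictions = fit_and_predict(train_df, test_df)
        mses.append(((test_df['label'].values - predictions) ** 2).mean())
    return mses

Since KFold with the default shuffle=False produces identical splits on every call, running fold_mses once per model yields lists that are paired fold-by-fold, which is exactly what the paired t-test below requires.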

The paired t-test for comparing the 2 sets of per-fold mean squared errors you get from running k-fold cross validation on each of your 2 models

import math

import numpy as np
from scipy import stats

def print_stat_sig(old_mses, new_mses, old_total_mse=None, new_total_mse=None, label_mean=None):
    # Paired t-test: the i-th entry of old_mses and new_mses must come
    # from the same fold, i.e. both models were evaluated on the same splits.
    statistic, pvalue = stats.ttest_rel(old_mses, new_mses)
    print(statistic, pvalue)
    if pvalue < 0.01:
        # Small p-values are associated with large t-statistics.
        print('Significant: reject null hypothesis, i.e. there is a statistically significant difference')
        print('old mse mean {:.3E}, new mse mean {:.3E}'.format(np.mean(old_mses), np.mean(new_mses)))
        if label_mean:
            # print_diff_and_percent_diff is a small formatting helper (not shown here).
            # math.sqrt converts each MSE to an RMSE, which is on the same scale as the labels.
            print_diff_and_percent_diff("{:d}-fold ".format(len(old_mses)), label_mean, math.sqrt(np.mean(old_mses)), math.sqrt(np.mean(new_mses)))
            if old_total_mse:
                print_diff_and_percent_diff("total", label_mean, math.sqrt(old_total_mse), math.sqrt(new_total_mse))
    else:
        print('Not Significant: null hypothesis cannot be rejected, i.e. these 2 sets of values may have come from the same distribution')
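
Putting it all together, a hypothetical end-to-end comparison might look like this, reusing the fold_mses sketch above (linear_fit_and_predict and dnn_fit_and_predict are stand-ins for your own models):

old_mses = fold_mses(df, linear_fit_and_predict)
new_mses = fold_mses(df, dnn_fit_and_predict)
print_stat_sig(old_mses, new_mses, label_mean=df['label'].mean())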