Comparing Models in Python

At work, I’ve been building predictive models. As I iterated on them (Linear, DNN, Wide and Deep), I found that I needed a framework for comparing one model against another. After some reading, I settled on k-fold cross-validation: score both models on the same folds, then compare the two distributions of test-set mean squared error with a paired t-test for statistical significance.

Here are some useful snippets:

Calculating mean squared error with NumPy

mean_squared_error = ((A - B) ** 2).mean(axis=0)
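As a quick check of that one-liner, here is a minimal sketch with made-up numbers (the array names and values are only for illustration). With 1-D arrays it gives a single number; with 2-D arrays, axis=0 gives one MSE per column.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only.
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.0, 5.0, 4.0, 8.0])

# 1-D input: a single mean squared error.
mse = ((actual - predicted) ** 2).mean(axis=0)
print(mse)

# 2-D input (rows are examples, columns are outputs): one MSE per column.
actual_2d = np.array([[1.0, 10.0], [3.0, 14.0]])
predicted_2d = np.array([[2.0, 10.0], [3.0, 12.0]])
per_column_mse = ((actual_2d - predicted_2d) ** 2).mean(axis=0)
print(per_column_mse)
```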

A convenience wrapper around Scikit-learn’s KFold.split

from sklearn.model_selection import KFold

def split(pandas_dataframe, n_splits=10):
    k_fold = KFold(n_splits=n_splits)
    for train_indices, test_indices in k_fold.split(pandas_dataframe):
        print("train sz {}, test_sz {}".format(len(train_indices), len(test_indices)))
        yield pandas_dataframe.iloc[train_indices], pandas_dataframe.iloc[test_indices]
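Here is a small usage sketch of the wrapper on a toy DataFrame (the column names and sizes are invented for the example, and the wrapper is repeated without the print so the snippet runs on its own).

```python
import pandas as pd
from sklearn.model_selection import KFold


def split(pandas_dataframe, n_splits=10):
    # Same wrapper as above, minus the size print.
    k_fold = KFold(n_splits=n_splits)
    for train_indices, test_indices in k_fold.split(pandas_dataframe):
        yield pandas_dataframe.iloc[train_indices], pandas_dataframe.iloc[test_indices]


# Toy data, for illustration only: 10 rows, so 5 folds of 2 test rows each.
frame = pd.DataFrame({"x": range(10), "y": range(10, 20)})

fold_sizes = []
for train, test in split(frame, n_splits=5):
    fold_sizes.append((len(train), len(test)))

print(fold_sizes)
```

One thing to keep in mind: KFold does not shuffle by default, so each test fold is a contiguous block of rows. If the DataFrame is sorted in some meaningful way, it may be worth passing shuffle=True (with a fixed random_state) to KFold.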

The paired t-test for comparing the two sets of mean squared errors you get from your two k-fold runs

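import math

import numpy as np
from scipy import stats

def print_diff_and_percent_diff(label, label_mean, old_rmse, new_rmse):
    # Stand-in for a helper this post does not show (assumed behavior):
    # report the RMSE change and each RMSE as a fraction of the label mean.
    print('{} rmse diff {:.3E}, old {:.2%} of label mean, new {:.2%} of label mean'.format(
        label.strip(), new_rmse - old_rmse, old_rmse / label_mean, new_rmse / label_mean))

def print_stat_sig(old_mses, new_mses, old_total_mse=None, new_total_mse=None, label_mean=None):
    # Paired t-test: both lists must come from the same folds, in the same order.
    statistic, pvalue = stats.ttest_rel(old_mses, new_mses)
    print(statistic, pvalue)
    if pvalue < 0.01:
        # Small p-values are associated with large t-statistics.
        print('Significant: reject null hypothesis, i.e. there is a statistically significant difference')
        print('old mse mean {:.3E}, new mse mean {:.3E}'.format(np.mean(old_mses), np.mean(new_mses)))
        if label_mean:
            # Compare root mean squared errors so they are in the same units as the label.
            print_diff_and_percent_diff("{:d}-fold ".format(len(old_mses)), label_mean, math.sqrt(np.mean(old_mses)), math.sqrt(np.mean(new_mses)))
            if old_total_mse is not None and new_total_mse is not None:
                print_diff_and_percent_diff("total", label_mean, math.sqrt(old_total_mse), math.sqrt(new_total_mse))
    else:
        print('Not Significant: null hypothesis cannot be rejected, i.e. these 2 sets of values may have come from the same place')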
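To show where the two lists of per-fold MSEs come from, here is a rough end-to-end sketch. Everything in it is an assumption made for illustration: the data is synthetic, and a mean-predicting DummyRegressor and a LinearRegression stand in for the "old" and "new" models rather than the models I actually use at work. The important part is that both models are scored on identical folds, which is what makes the paired test appropriate.

```python
import numpy as np
from scipy import stats
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)


def fold_mses(model, X, y, n_splits=10):
    """Return the test-set MSE of `model` on each of the k folds."""
    mses = []
    # KFold without shuffling yields the same folds on every call,
    # so two models scored this way are paired fold by fold.
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])
        mses.append(((y[test_idx] - predictions) ** 2).mean(axis=0))
    return np.array(mses)


old_mses = fold_mses(DummyRegressor(strategy="mean"), X, y)
new_mses = fold_mses(LinearRegression(), X, y)

# Paired t-test on the per-fold errors.
statistic, pvalue = stats.ttest_rel(old_mses, new_mses)
print(statistic, pvalue)
```

With only 10 folds the test has limited power, and the folds share training data, so I treat the p-value as a guide rather than a guarantee.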

 

Hadoop finally hits home

Big Data has finally produced results for my team at work. Other parts of the business have been working on transforming our company-wide data lake into a structured cache in Hadoop with business rules already applied. Our team can now run terabyte-scale transformations on this dataset in hours instead of weeks. We’re also able to delete code and reimagine processes. It’s been a long time coming, but we’re finally there. Thanks to the open source community for delivering the goods: Hadoop, Cassandra, and MongoDB.

 

A meme is a picture with words on it?

ES: Our quality of life depends on recognizing that we are agents of cultural adaptation. What we say and do matters. Looks can be deceiving; we are stronger than we think.

I recently overheard a coworker say that “A meme is a picture with words on it.” That is an understatement. Every idea we have, word we speak, and action we perform has the potential to go viral just as this media format has. A meme loosely refers to concepts that live in the mind, or behaviors that spread, for whatever reason.

The word meme was patterned after gene, the basic unit of inheritance on the biological level. I guess changing the ‘g’ to an ‘m’ signaled that this was about the mind. Ask Richard Dawkins, who coined the term. Genes and memes are similar, but there is one hugely important difference that is at the heart of my mission here.

Genes and memes spread. Some ideas spread with wild success: Malcolm Gladwell documents this in his book The Tipping Point, and Chip and Dan Heath offer a framework for what makes ideas memorable in their book Made to Stick. As members of a 7-billion-strong human population, our genes must match our environment.

Genes and memes affect our behavior. Genes are our nature. Memes are our nurture. There is no sense arguing over which one has more of an effect on our behavior. They both do. My point is that we have conscious control over one: memes.

Self-reflection and self-direction, consciousness and conscientiousness, mindfulness and right action: these are the stuff of memes. More than just funny pictures, they make this world a prison or a paradise.

 

Getting under the skin

In this blog, I hope to open the book of my opinions to the world. I have always been opinionated and righteous, but the time has come for me to be public about it. Join me in sharing what gets under our skin and how we can do better. Let’s level up on our cultural adaptation.

While I aim to be entertaining and thought-provoking, my ultimate goal is positive social change. Perhaps you will broaden your view by seeing mine. If my ideas don’t square with your experiences, let me know. I’m making them public so they can be vetted, politely and constructively.