Understanding software behavior

Note: cross-posted from my website, where I recently jotted down brief descriptions of the problem domain I’m currently focused on, as well as some tools and techniques I am particularly interested in. Cloud-based software systems should be considered among the most fascinating artifacts of human civilization. They...

Word count is a monoid homomorphism, and who cares?

Why does MapReduce work? Counting word frequencies in a collection of documents is the “Hello World” of Hadoop, with good reason. It is a not-too-contrived task whose underlying structure is a natural fit for distributed computation. In this post we focus on better understanding that underlying structure using some tools from abstract algebr...
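
As a rough illustration of the idea in the title (not code from the post itself), the sketch below treats lists of words as a monoid under concatenation and frequency tables as a monoid under per-key addition. The names `word_count` and `merge` are hypothetical helpers chosen for this example, and Python's `collections.Counter` stands in for the frequency table. The homomorphism property says that counting a concatenation gives the same result as merging the separate counts, which is what lets each chunk be counted independently.

```python
from collections import Counter
from functools import reduce


def word_count(words):
    """Map a list of words to its word-frequency table."""
    return Counter(words)


def merge(a, b):
    """Combine two frequency tables by adding counts key by key."""
    return a + b


doc_a = "the quick brown fox".split()
doc_b = "the lazy dog and the fox".split()

# Homomorphism property: counting the concatenation equals
# merging the counts of the two pieces.
whole = word_count(doc_a + doc_b)
parts = merge(word_count(doc_a), word_count(doc_b))
assert whole == parts

# The identity is preserved too: the empty list maps to the empty table.
assert word_count([]) == Counter()

# A map-then-reduce over independent chunks, in the spirit of MapReduce.
chunks = [doc_a, doc_b, "fox".split()]
total = reduce(merge, map(word_count, chunks), Counter())
```

Because `merge` is associative and has the empty `Counter` as its identity, the chunks can be grouped and combined in any order without changing `total`.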

Notes from BayLearn 2012 - Bay Area Machine Learning Symposium

One of the benefits of working in the San Francisco Bay Area is access to tons of interesting tech talks and other related events. Last week I attended the Bay Area Machine Learning Symposium, which was graciously hosted by Google. Below are brief notes on some of the talks (any errors or misunderstandings are of course mine). From my per...

Practical machine learning tricks from the KDD 2011 best industry paper "Detecting Adversarial Advertisements in the Wild" by D. Sculley et al.

A machine learning research paper tends to present a newly proposed method or algorithm in relative isolation. Problem context, data preparation, and feature engineering are hopefully discussed to the extent required for reader understanding and scientific reproducibility, but are usually not the primary focus. Given the goals and constraints of...