Understanding Cuisines Using a New Dataset from Yummly
What makes a recipe Italian? Thai? Our content ingestion pipeline uses machine learning methods to determine a recipe's cuisine, which facilitates search and personalization.
Want to try out your own algorithms? Now you can! Yummly provided a dataset for a Kaggle playground competition to predict the cuisine of a recipe given its ingredients. Whose cuisine classifier will reign supreme?
Let's take a brief look at the dataset. A cuisine can often be identified by its distinctive ingredients. The ingredients most associated with each cuisine (using normalized pointwise mutual information) in the training set are:
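If you want to compute this kind of association score on your own copy of the data, here is a minimal sketch of normalized pointwise mutual information between a cuisine and an ingredient. It is an illustration, not Yummly's implementation: it assumes each recipe is a `(cuisine, ingredients)` pair, treats "recipe has cuisine c" and "recipe contains ingredient i" as the two events, and the function names `npmi_by_cuisine` and `top_ingredients` are made up for this example.

```python
import math
from collections import Counter


def npmi_by_cuisine(recipes):
    """Return {cuisine: {ingredient: npmi}} for (cuisine, ingredients) pairs.

    npmi(c, i) = log(p(c, i) / (p(c) * p(i))) / -log(p(c, i)),
    with probabilities estimated as fractions of recipes.
    Only (cuisine, ingredient) pairs that co-occur at least once are scored.
    """
    n = 0
    cuisine_counts = Counter()
    ingredient_counts = Counter()
    joint_counts = Counter()
    for cuisine, ingredients in recipes:
        n += 1
        cuisine_counts[cuisine] += 1
        # Count each ingredient once per recipe, even if it is listed twice.
        for ingredient in set(ingredients):
            ingredient_counts[ingredient] += 1
            joint_counts[(cuisine, ingredient)] += 1

    scores = {}
    for (cuisine, ingredient), joint in joint_counts.items():
        p_joint = joint / n
        if p_joint == 1.0:
            # Every recipe has this cuisine and ingredient; the normalizer
            # -log(1) is zero, so use the limiting value of 1.
            value = 1.0
        else:
            p_cuisine = cuisine_counts[cuisine] / n
            p_ingredient = ingredient_counts[ingredient] / n
            pmi = math.log(p_joint / (p_cuisine * p_ingredient))
            value = pmi / -math.log(p_joint)
        scores.setdefault(cuisine, {})[ingredient] = value
    return scores


def top_ingredients(scores, cuisine, k=5):
    """Return the k ingredients with the highest NPMI for one cuisine."""
    ranked = sorted(scores[cuisine].items(), key=lambda item: item[1], reverse=True)
    return [ingredient for ingredient, _ in ranked[:k]]


# Toy data for illustration only; these are not recipes from the dataset.
toy_recipes = [
    ("italian", ["basil", "olive oil", "garlic"]),
    ("italian", ["parmesan", "olive oil"]),
    ("thai", ["fish sauce", "garlic", "lime"]),
    ("thai", ["fish sauce", "coconut milk"]),
]
toy_scores = npmi_by_cuisine(toy_recipes)
```

Scores range from -1 to 1. An ingredient that appears only in one cuisine, and in every recipe of that cuisine, scores 1, while an ingredient spread evenly across cuisines (garlic in the toy data) scores about 0.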
Piece of cake, right? Not quite: recipes from different cuisines often use very similar ingredients. To illustrate this, we used t-SNE (details below) to visualize the training set for the competition in 2D. Each point represents a recipe. Click a cuisine to hide or show its data.
Asian recipes appear together in the upper left part of the plot, and there are clear Indian, Japanese, Mexican, and Cajun clusters, among others. Many other cuisines, however, are highly overlapping, which makes classification more challenging. For example, the center contains a mixture of European (French, Italian, British, Irish, Spanish) and Southern US cuisine.
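A 2D map like the one described above can be approximated with off-the-shelf tools. The sketch below is an assumption about one reasonable setup, not the configuration used for the plot: it encodes each recipe as a binary ingredient vector with scikit-learn's `MultiLabelBinarizer` and projects the vectors with `sklearn.manifold.TSNE`. The helper name `embed_recipes`, the toy recipes, and the perplexity value are all chosen for illustration.

```python
from sklearn.manifold import TSNE
from sklearn.preprocessing import MultiLabelBinarizer


def embed_recipes(ingredient_lists, perplexity=2.0, seed=0):
    """Project recipes to 2D with t-SNE.

    ingredient_lists: one list of ingredient names per recipe.
    perplexity must be smaller than the number of recipes.
    Returns an array of shape (n_recipes, 2).
    """
    # One column per distinct ingredient; 1 if the recipe uses it, else 0.
    features = MultiLabelBinarizer().fit_transform(ingredient_lists)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=seed,
    )
    return tsne.fit_transform(features.astype(float))


# Toy data for illustration only; these are not recipes from the dataset.
toy_ingredients = [
    ["basil", "olive oil", "garlic"],
    ["parmesan", "olive oil", "garlic"],
    ["fish sauce", "garlic", "lime"],
    ["fish sauce", "coconut milk", "lime"],
    ["cumin", "turmeric", "garlic"],
    ["cumin", "turmeric", "ginger"],
]
coords = embed_recipes(toy_ingredients)
```

Each row of `coords` is the 2D position of one recipe, which can then be scattered and colored by cuisine. On a full training set a larger perplexity (values around 30 are a common starting point) is usually more appropriate than the tiny value used for this toy input.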
At Yummly we augment these simple ingredient representations with additional features to improve classification, such as relationships among ingredients, details of the preparation, and the recipe name. We also solve a more challenging version of this problem: most recipes are not associated with any cuisine, and some fusion recipes have multiple cuisines.
Hungry for more? Join our growing team.
Gregory Druck, Head of Research
Greg develops algorithms for extracting, structuring, searching, and recommending food-related content. He also analyzes behavioral data to understand the food world.