Cupping and Context – New Tools for Coffee Evaluation

When the “Coffee Flavor Wheel” and the “Coffee Lexicon” were brand new!

(March 19, 2010) This week marked the release of two new resources for describing coffee flavor. In coordination, the SCAA and World Coffee Research released a Coffee Lexicon, a reference guide to 110 defined coffee flavor terms, and a somewhat matching update to the Coffee Flavor Wheel. While the new Flavor Wheel is a bit of the same old stuff, the Lexicon is really a different beast, and sadly you have to read all this junk below until I explain why I think so …

Also, good news: both are available free online and here are the links: For the Flavor Wheel it seems you must right-click or command-click on the image in the blog post. For the Lexicon, there is a link on this WCR page.

Background of the Flavor Wheelie

To really understand these tools, they need to be placed in the context of their historical or alternative versions, as well as the discussion of what they are not.

The SCAA Flavor Wheel was published about 20 years ago, based on versions for other beverages and foods. It corresponded to terms in the Coffee Cupping Handbook by Ted Lingle. Both seemed to open up the mysteries of coffee tasting for me, and both seemed to describe a clear and systematic approach that could be reproduced by anyone who took the time to study them.

But for different reasons, each of these documents wore out its welcome at some point. For me that was around 10 years ago. The Cupping Basics book started to seem like an overly dry and rigid codification of some kind of foreign ritual. I remembered the battered hand-me-down textbooks issued in Jr. High where “The World Around You” was, at best, describing the world around someone 10 years older than you. In 1980, the Apollo Missions and Nixon were not cutting-edge.

The flavor wheel was dated by its excessive focus on tainted flavors, mostly from bad processing, with an entire separate “wheel” of its own. And on the “positive” attribute side, there was a confusing and artificial split between “Taste” and “Aroma”. I have a black and white image of it here. There are similar criticisms of the wine wheel, developed at UC Davis in the 1980s, in that case for an emphasis on taste defects that are no longer prevalent in the current, more advanced world of small-batch wine processing.

Coffee is not wine, and a vocabulary for defects is still needed. Even the most elite farms with the highest levels of control in picking and processing produce some bad coffee. For example, a defect vocabulary is needed to speak to the grassy/herbaceous/astringent flavors that come from under-ripe coffee cherries. And no matter how selective a farm may be, these unripes are accidentally taken by pickers along with the red cherry.

But the old iteration of the SCAA wheel describes taints that belong more to the bulk commercial trade in coffee, where selective picking is not the norm, and where coffees rejected by machines in the wet mill and the dry mill for top grade coffees are included in the shipments. The tolerance for the permissible number of defects in green coffee grading for these categories is much looser. Yet these types of coffees, and these types of defects, are simply not the focus of the “specialty” coffee purveyor; they just aren’t germane to the tasting experience.

The Counter Culture Adaptation

Ever the innovators, Tim Hill and the other cuppers at Counter Culture Coffee (CCC) developed their own version of the Flavor Wheel for their internal use, and shared it with all. This wheel was an expression of the language they used most around the cupping table. In this sense it bridges the basics of the established Coffee Flavor Wheel with a more personalized vocabulary, and focuses more intently on positive descriptors found in better qualities of coffee. It can be found for free download on the Counter Culture site.

(Sadly, it seems the chaos of the Sweet Maria’s Pizza Wheel made it rather unpopular with serious-minded coffee people. Lack of coherent instructions on how to use the Pizza Wheel might have undermined it, and lack of descriptors, and any references to coffee …or perhaps the fact it is just a picture of a pizza painted on a wall. We’ll try harder next time.)

While the CCC wheel has some issues with legibility (the background colors have a brush patina to them), it is clearly much more expansive than the newly-released version of the SCAA/WCR wheel. They have a consumer-oriented version that includes only the “positive-focused” wheel, and the “lab-version” that includes a separate wheel of negative taint terms. Tim noted that “some vegetal and woody descriptors are on the positive side” because they are found in some varieties or processes, and might indicate a lesser coffee but do not necessarily detract from the cup.

As for the new version of the SCAA/WCR wheel, gone are the arbitrary Taste and Aroma base divisions, and there are 3 levels of specificity to define each of the outer, most-refined terms. So Malt is grouped under Cereal, which belongs to the base category Roasted, and Orange belongs to Citrus fruit which is based on Fruity. Some descriptors effectively have 2 levels of specificity, such as Rose filed under Floral, which belongs to … uh, Floral again. I think it was a way to bring graphic symmetry to the chart.
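For the data-minded, the three-level grouping described above can be sketched as a small nested structure. This is only an illustration: it holds just the three examples named in the text, not the full wheel, and the names `WHEEL`, `path_to`, and `distinct_levels` are made up for this sketch.

```python
# Hypothetical fragment of the wheel: only the three examples named in the text.
# Layout assumed here: base category -> middle category -> outer-ring terms.
WHEEL = {
    "Roasted": {"Cereal": ["Malt"]},
    "Fruity": {"Citrus fruit": ["Orange"]},
    "Floral": {"Floral": ["Rose"]},
}


def path_to(term, wheel=WHEEL):
    """Return (base, middle, outer) for an outer-ring term, or None if absent."""
    for base, middles in wheel.items():
        for middle, outers in middles.items():
            if term in outers:
                return (base, middle, term)
    return None


def distinct_levels(term, wheel=WHEEL):
    """Count how many different names appear on the path to a term."""
    path = path_to(term, wheel)
    return None if path is None else len(set(path))


print(path_to("Malt"))
print(distinct_levels("Rose"))
```

Looking up Rose this way shows the repeated Floral level: the path has three slots but only two distinct names.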

CCC did away with much of the hierarchy on their wheel. They have one level of grouping, except for the meta category Fruit. While it is clear the CCC wheel has many more terms, it’s odd that 2 of the 3 cited above are absent. There are 7 terms under Floral on the CCC wheel, but no Rose (there is Rose Hips but that’s a different thing in my opinion). And there is no Malt, neither under Grain & Cereal nor under Sweet & Sugary. That seems fine since the CCC wheel is only expressive of terms they use. (I checked with Tim, and he noted 3 terms they accidentally omitted that will be in the next version: Blackberry, Malt, and Cardamom).

Limits of Experience

But this brings up an interesting conundrum that pervades sensory analysis systems. If the most relevant expression of taste is the most experiential, it is also most authentic in the domain of the personal, and perhaps less universal in that way. It becomes a tool built for one’s own world, and possibly less useful when transferred outside of that realm of experience. Consider the borderline coffee descriptors, the vegetal or woody for example, that can be part of the positive or the negative side of the wheel, or consider that one cupper, one company, or even one culture might argue that a particular term is positive, while another would say it is negative. The CCC wheel is beautiful because it never claims to be anything other than a tool created for their own use, shared with the public. What of a Flavor Wheel issued by a coffee association though?

In contrast to the CCC wheel (let’s not bring up the Pizza Wheel again), the origin of the Wine Flavor Wheel was more broad and ambitious:  “The {Wine} Aroma Wheel was developed at the University of California at Davis in the early Eighties as a standard used to describe wine in uniform, non-judgmental terms”. Yet it seems that even the suggestion of taste and aroma qualities in a graphic form is a tool used in the process to judge. It’s impossible to completely decontextualize the nature of what one is testing. If it is wine or coffee, it’s probably understood that a repugnant term like “Sewer Gas” would be the basis for a negative judgement. (As far as I know, Sewer Gas does not appear on any Flavor Wheels).

To create a truly empirical tool, it must be abstracted as much as possible from the judgement of quality, and from the personal assessment of qualities.

So it’s important to note that a non-judgemental sensory analysis system is, at its base, at odds with most every instance of cupping in the coffee trade, or by consumers. In these cases, from the more experienced buyer with an extensive history of tasting, to the more casual coffee drinker, the entire point is to leverage an experience of tasting to decide the goodness of the beverage. This goodness might simply be to say that the cup in front of you is delicious. Or it might be to make a judgment whether to buy a green coffee lot, or what that green coffee lot is worth. In fact, within the whole basket of tastings that are qualitative are many heterogeneous activities. A cupping is not a singular thing. There are many “cup tests” performed for many reasons, just as there might be many English Tests performed at many levels, in many ways, evaluating many different aspects of a language.

The Bias of Context

Here in our own tasting room, we try to be clear on exactly what test we are performing, what we are looking for. Even in qualitative analysis, with the same method, the idea of what is “good” in a coffee presents us with a moving target. In that way, we could be endlessly specific. We could have a separate flavor wheel for each origin, or a separate language for each coffee process. Pulp-natural, wet-hulled, dry-process, and wet-process coffees can be completely different beasts, and what is “good” in one evaluation would be a bad defect in another, such as the presence of a Sumatra-like foresty/earthy note in a wet-processed Ethiopia coffee. Not good.

It’s also important, within limits, to admit bias on a cost or need basis. I am sure if I sat down at a “blind” cupping with coffee buyers and said, “all these coffees are available for purchase, and they start at $50 per pound,” that the result would be extremely conservative scores. Or if a buyer just landed a container of Colombias that will last 6 months, how will they score a new set of sample offers? An excellent book, Wine Tasting, A Professional Handbook, outlines these and many more categories of structural and psychological interferences with sensory testing.

This brings us to the newly released WCR Coffee Lexicon. For me, the Lexicon pushes the dialogue on coffee flavor in a new direction. First of all, it is a limited set of terms, 110 in all, with a stronger emphasis on core coffee flavor terms.

What does that mean? For me, there has been an intensification of focus, exponential in the last 5 years or so, on the most extrinsic, often far-flung aroma/flavor terms that might be applied to coffee. Part of this comes from that search for a personal and authentic connection to the coffee-flavor experience that each new taster and roaster desires. It leads us to the ubiquitous 3 term, comma-separated description on a roasted retail bag of coffee, describing exotic fruits or super-specific varieties, references to retro candy brands, flowers of all types, etc, etc. We too have used many of these terms, but at its worst it represents a competition to out-perform another brand that nears parody.

The problem, aside from testing the credibility we have with clients and customers, is that it’s nearly impossible to build a clear consensus among a set of coffee cuppers. When I used to participate on judging panels for Cup of Excellence, this became increasingly frustrating. There was an effort to contain the language. “This coffee is a red Porsche” and such was subject to criticism, as fanciful analogies like this are not something directly smelled or tasted … well, I guess you can, but licking a Porsche likely won’t remind you of a coffee flavor.

Yet even when there was a great consensus on the “goodness” of a coffee, and therefore its high value in a CoE auction, the key descriptive characteristics offered by the panel of “experts” were all over the map, and contradictory too. It wasn’t too bad when we could agree on, perhaps, berry-type fruit. But when some found floral as well, and someone else had cane sugar, and someone else molasses, and someone else insisted on melon, the picture of this coffee was just schizo. It wasn’t anyone’s fault; the process just seemed frustratingly invalid. The idea to limit the vocabulary of judges or to make them acquiesce to an alternate term, to compromise their sensory experience, seemed like a poor solution.

The problem is in the test: the fanciful “search for the best” is a kind of uber-fantasy of qualitative judgement, of looking for the highest form of goodness. A paper from Kansas State that compared the usefulness of responses from trained coffee tasters to non-coffee-trade (but trained) sensory panelists underscored the problems with qualitative judgments on value: lack of consensus, and therefore an inability to act on cup test results in coffee research programs.

Beyond Personal Judgment?

In order to make an analysis of coffee that is more broad, and more useful in systematic and repeatable tests, the Lexicon is opposed at its core to the valuation of quality in coffee tasting. The question format the Lexicon employs is essentially a checkbox: is a particular attribute present, or is it not present? It’s not about the goodness or desirability of that aroma/flavor at all.
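To make the checkbox idea concrete, here is a minimal sketch of how present/not-present sheets from a panel could be tallied. It is not the WCR procedure: the panel data is invented, the function names are hypothetical, and the 75% agreement cutoff (`min_share`) is an assumption picked only for illustration.

```python
from collections import Counter


def tally_presence(panel_sheets):
    """Count, for each attribute, how many panelists checked it as present."""
    counts = Counter()
    for sheet in panel_sheets:
        counts.update(attr for attr, present in sheet.items() if present)
    return counts


def agreed_attributes(panel_sheets, min_share=0.75):
    """Attributes checked by at least min_share of the panel (assumed cutoff)."""
    n = len(panel_sheets)
    counts = tally_presence(panel_sheets)
    return sorted(attr for attr, c in counts.items() if c / n >= min_share)


# Invented sheets from four panelists; True means "present", nothing more.
sheets = [
    {"Blueberry": True, "Roasted": True, "Floral": False},
    {"Blueberry": True, "Roasted": True, "Floral": True},
    {"Blueberry": True, "Roasted": False, "Floral": False},
    {"Blueberry": True, "Roasted": True, "Floral": False},
]

print(agreed_attributes(sheets))
```

Nothing in the tally says whether blueberry is good or bad in the cup; it only records how many panelists found it.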

In order to do this, the Lexicon had to be limited to a range of terms that describes as best as possible the full coffee experience, but is brief enough to enforce consensus. And it also needed to replace the figure of the experienced coffee taster who draws on a “flavor memory” to evaluate the quality of coffees, with a self-contained reference sample. The tests the Lexicon is needed for are not ones where “that’s earthy like Sumatras used to be in the ’70s” or “that’s the right floral character for a Yirgacheffe” are valuable responses. So those immeasurable, subjective, personal experiences are replaced by an actual substance the taster has at hand, and can check.

And rather than being the usual sets of seasonal fruits and flowers brought in for “palate improvement” training sessions in cupping labs, these are national brand items widely available from supermarket stores. It’s less classy, but it’s nice.

So if you want to read the definition for “light roasted”, as in any dictionary you have a bunch of words to describe that singular term or terms. And if you want to define the smell or taste of “light roasted”, you buy raw, blanched peanuts and you roast them in an oven at 425 °F for 7 minutes, and serve them in a 1 ounce plastic cup with a lid for your smelling reference.

If one judge drinks only tea and cola, and has no coffee taste experience, no extensive “flavor memory” to draw on, no clue in the world what “blueberry” in a coffee might be like, you can strain 1 ounce of syrup from canned blueberries into a sniffer and cover. In fact, this opens up some terms to a better global understanding, as in many parts of the world, local coffee tasters have no idea what blueberry taste means, as it is a northern hemisphere item. These specific reference samples make that slightly more feasible.

What the Lexicon is Not

It’s admirable that the PDF document has instructions to use the Lexicon, and it outlines the limits of its use: The Lexicon

  • is not a replacement for cupping or other sensory tools. Specifically, this yes/no, present/not-present approach to attributes is a different test, with different results than qualitative cup testing.
  • is not truly global. While, as in the blueberry example, it can extend some understanding, the WCR admits it is developed in the US market, and can’t reflect the taste references of other cultures.
  • is not finished. It’s a work in progress.
  • is not focused on defect cupping terms. Because the WCR mission is chiefly to develop new coffee varieties that have good value in the market, there is no use focusing on terms to describe tainted coffees, or robusta, for example.

The Lexicon is an odd beast, as is any sensory science system that imposes necessary limits on responses as a trade-off for finding “useful” information. In some areas of science, results would seem invalid by restricting responses to an abbreviated range.

But in previous coffee variety research, evaluating quality was an afterthought, often limited to the terms “Fair” or “Good” to describe how a set of new hybrids tasted. This WCR Lexicon is a much better tool.

An Addendum: Is The Lexicon Useful to Coffee Cuppers?

For those in the coffee trade, it seems the Lexicon could be adapted to our form of cupping, with some aspects borrowed, but can it be used as it is intended? My answer is no.

Blind cupping is a quasi-empirical test, but in most instances it is not well defined: What precisely are we testing for? In our lab, we test to screen our interest in buying and sense of value, we test how well the coffee meets our preconceived expectations of what it “should” be, and we test to write reviews to communicate the quality we have found in lots we chose to buy.

Often we conduct these cuppings quite differently, some a quick look at many samples with less rigor, and others with multiple roast levels in-depth, to write our reviews. But in all cases, the reason to cup is because it is the most expedient way to compare coffees side-by-side.

While there is plenty of ritualized fetishism around the practice of cupping, the method reveals nothing about coffee in itself. From its inception in the late 19th century, grinding coffee into a bunch of cups and pouring hot water on them was simply the fastest way to judge many coffees at once. And the fact that the 40 or so “beans” ground into each vessel were not mixed meant that inconsistencies and defects would be intensified from one cup to the next.

Employing the Lexicon as intended is part of a totally different process, and not expedient at all. Whereas we might be able to rate 20 samples in 90 minutes, the system implicit in the Lexicon would look at 2 to 3 samples in that time. In our lab, we actually do keep reference samples for some key descriptors we use (for example, we have many samples of sugars, syrups and such to represent sweetness). But we do not reference them regularly in our cupping day to calibrate on type of sweetness, or intensity. By contrast, employing the use of all the reference samples in the Lexicon would be a huge job, and incredibly slow to calibrate upon.

Yet there are two things we will adopt from the Lexicon. The first is the ingredient set and directions for reference sample preparation. Since I participated a bit in discussion sessions for the Lexicon, I was able to experience some of these references, which are often not as literal as they may seem. For example, a peanut sample to represent “roasted” is asking you to ignore the peanutty aspect and focus on the roasted scent. A sample for “ash” found in a powdered dark cocoa is asking you to ignore the chocolate aspects to find the distinct ash component. There is still interpretation at hand, but calibrated interpretation. After all, to taste blueberry in an Ethiopian coffee, you cannot recreate it by actually adding blueberries to some other coffee. We are still discussing “blueberry-esque” quality.

The second aspect has already been much on my mind, and I hope the Lexicon will underscore it. We in coffee (I am very much including myself in this we) have focused too much and for too long on extraneous “top notes” in coffee, while we have failed to form a strong language to describe the core coffee flavors. We have done this because, in our minds, we act as if all coffee, even supermarket coffee, has those core flavors, those base bitter-sweet coffee notes. We think our coffees have to be “more” than that.

So to elevate them to being super-duper special, we have sold people on the presence of candy and flowers in all our coffees, of fresh local fruits, or exotic tropical ones. We have promised all kinds of rainbows and butterflies, but been very clumsy to describe the base flavors that make a coffee really good. Yes, I know, I find many many flavors in coffee too, I get really excited about it all, and I do believe those top notes are there because I have tasted them for many years now (okay, butterflies no).

But under that is the essential coffee base flavor, bitterness that is (in a good coffee) balanced with sweetness. Roasted notes and ash notes largely are the axis for the bittering, and reducing sugars in all the various forms are the sweet. We have a lot of terms for that, but for the bitterness, I feel powerless and mute. That’s where I hope the Lexicon, with its reference samples for these core coffee aromas/flavors, can help us focus more.  -t.o.

PS: Sorry this piece is so long and loopy. It feels like I should have been able to make my point in brief, but I guess I couldn’t. The actual practices of cupping in both the more personally-relevant forms of making value judgment, and the less self-referential form of the Lexicon sensory panel, oriented toward forming consensus, seem to collapse into each other under the weight of a real “scientific” method. Ultimately I don’t think either is, but defining the method by what actionable result must come from it seems to distinguish them. In CoE juries, consensus for descriptors in the final round always came in the form of the person holding a pad of paper, writing down what terms the jury offered. This must have involved heavy on-the-fly editing, essentially the formation of a consistent, non-conflicting narrative, the “story” of the cup flavor: In the end, poetics came to the rescue of the process. You can email with suggestions about how to be more brief, or where to buy a copy of The Handbook on Arabica Coffee in Tanganyika, 1959, or where to get a medium-sized Sprudge “Kittens Cupping” shirt.