All those away goal graphics are misleading you. [Updated]

They also told me that you look like you got fat.

Updated November 5, 7:57pm

Keeping track of all the possible outcomes of a 2-leg playoff series can be cumbersome. Luckily, over the past week a series of visualizations has popped up across the soccer internet to help us all out.
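Here is a minimal Python sketch of what each of these grids encodes. It assumes the simplest version of the tiebreak discussed here: aggregate goals first, then away goals, with anything still level labeled "ET" (extra time or penalties, which the sketch does not model). The names `tie_winner` and `outcome_grid` and the team labels "A" (second-leg host) and "B" (first-leg host) are hypothetical. The 1-0 first-leg score in the example is also an assumption, chosen for illustration.

```python
from collections import Counter


def tie_winner(first_leg, second_leg):
    """Decide a two-leg tie on aggregate goals, then away goals.

    first_leg:  (b_home, a_away), leg 1 is played at team B's ground
    second_leg: (a_home, b_away), leg 2 is played at team A's ground
    Returns "A", "B", or "ET" if still level (extra time/penalties, not modeled).
    """
    b_home, a_away = first_leg
    a_home, b_away = second_leg
    agg_a, agg_b = a_away + a_home, b_home + b_away
    if agg_a != agg_b:
        return "A" if agg_a > agg_b else "B"
    if a_away != b_away:  # away-goals tiebreak
        return "A" if a_away > b_away else "B"
    return "ET"


def outcome_grid(first_leg, max_goals=4):
    """One cell per second-leg score line: rows are B's away goals, columns A's home goals."""
    return [[tie_winner(first_leg, (a, b)) for a in range(max_goals + 1)]
            for b in range(max_goals + 1)]


if __name__ == "__main__":
    grid = outcome_grid(first_leg=(1, 0))  # assumed: B won leg 1 1-0 at home
    for b_goals, row in enumerate(grid):
        print(b_goals, " ".join(f"{cell:>2}" for cell in row))
    counts = Counter(cell for row in grid for cell in row)
    print(counts["B"] / 25)  # share of cells "colored" for team B
```

With that assumed first-leg score, 18 of the 25 cells go to team B, or 72%, which happens to match the figure given for Montreal further down.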

However, you are being misled. Very lightly and subtly, but misled nonetheless. The trouble is that these are great reference tables but bad data visualizations, and it's the inclusion of color that makes your eyes read these graphics as both.

By coloring in each entry according to the winning team, the grid moves beyond being a dry reference table and takes on a new life as a kind of area map. We’re not just looking at this to see how many goals Dallas needs to score if Seattle scores one, we’re looking at that huge wave of rave green and that tiny dot of red and thinking “wow Dallas are screwed.”

Which, as it happens, they are. But these graphics are still sloppy in communicating by how much, especially for the other, closer match-ups. More space communicates better chances, so the shape and size of each entry in the grid matter. There are two ways in which I think the existing graphics fall short on this front:

  1. uneven axes and rectangular grids

  2. over-representing low-frequency results

Ultimately, adjusting for both of these aspects would leave us with something more like this:

 

Numbers in each grid cell represent the % of matches ending in that score in select European leagues from 2005-2010


Uneven axes and rectangular grids

These things are made of rectangles. This is probably just because people didn't fiddle with the Excel defaults and didn't mind having the extra padding for text. But the result is that one team's axis is significantly shorter than the other's. This is not the worst sin, as ultimately each team's rectangles have the same area. But our eyes interpret height and width differently. There's no reason not to control for this by making every cell a square, so that one team doesn't command more of our attention through unconscious bias.

Additionally, the graphics originally posted featured a 5x6 grid. I'm not sure why, but the result is that the home team gets an entire additional row of real estate, and sometimes low-information real estate at that (more on that later). In this case it was the y-axis that received the extra row, which somewhat compensates for the "wider" x-axis. Maybe this was an intentional evening-out of sorts; I don't know. Either way, it's imprecise and presents an uneven universe of possibilities, and there's no reason not to make the entire graphic a square.

Over-representing low-frequency results

Even once the grids and axes are evened out, we’re still over-representing certain data: specifically, very high-scoring games and blowouts. Many more games end 0-0 or 1-1 than 4-0 or 4-4, but the graphic gives these score lines the same visual weight. For reference, here is a distribution of scores in a similar grid:

Consider that while the 5x5 grid (goals from 0-4) represents 96.7% of all scenarios, a 4x4 grid (goals from 0-3) still represents 89.5% of all scenarios. That final row and column represent just 7.2% of all actual outcomes, yet they are taking up 36% of the space. That’s a disproportionate amount of area representing pretty unlikely events.
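The arithmetic behind those figures can be checked in a few lines. This Python sketch takes the 96.7% and 89.5% coverage numbers straight from the paragraph above; the variable names are hypothetical.

```python
# Coverage figures quoted above: share of all matches whose score fits each grid.
COVERAGE_5x5 = 96.7  # goals 0-4 for each team
COVERAGE_4x4 = 89.5  # goals 0-3 for each team

cells_5x5, cells_4x4 = 5 * 5, 4 * 4
edge_cells = cells_5x5 - cells_4x4              # the 4-goal row plus the 4-goal column
edge_area_share = edge_cells / cells_5x5        # fraction of the graphic they fill
edge_outcome_pct = COVERAGE_5x5 - COVERAGE_4x4  # percent of real results they hold

print(f"{edge_cells} cells, {edge_area_share:.0%} of the area, "
      f"{edge_outcome_pct:.1f}% of outcomes")
# prints: 9 cells, 36% of the area, 7.2% of outcomes
```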

By shading each entry according to its frequency, we can avoid some of these pitfalls. In the straight color version, Montreal occupies 72% of the graphic, even though they get a score line that sees them advance only 64.3% of the time. The gradient helps adjust for this.
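One way to express the difference between the flat-color and gradient versions: the flat version gives each team its share of cells, while the gradient effectively weights each cell by how often that score occurs. The Python sketch below assumes a winner grid (like the ones in these graphics) and a matching grid of score-line frequencies. The function names and the tiny 2x2 example are hypothetical, and normalizing by the grid's own total (rather than by all matches) is an assumption about how a figure like 64.3% would be computed.

```python
def area_share(winners, team):
    """Fraction of cells colored for `team`: what a flat-color grid shows."""
    cells = [w for row in winners for w in row]
    return cells.count(team) / len(cells)


def weighted_share(winners, freqs, team):
    """Fraction of the grid's total score-line frequency that falls in `team`'s cells."""
    total = sum(f for row in freqs for f in row)
    mine = sum(f
               for w_row, f_row in zip(winners, freqs)
               for w, f in zip(w_row, f_row)
               if w == team)
    return mine / total


def opacities(freqs):
    """Per-cell opacity for the gradient: the most common score line is fully opaque."""
    peak = max(f for row in freqs for f in row)
    return [[f / peak for f in row] for row in freqs]


if __name__ == "__main__":
    # Hypothetical toy data, not real frequencies.
    winners = [["A", "B"],
               ["B", "B"]]
    freqs = [[40, 20],
             [20, 20]]
    print(area_share(winners, "B"))             # 0.75
    print(weighted_share(winners, freqs, "B"))  # 0.6
    print(opacities(freqs))                     # [[1.0, 0.5], [0.5, 0.5]]
```

In this toy grid, team B holds 75% of the cells but only 60% of the weighted outcomes: the same kind of gap as Montreal's 72% versus 64.3%.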

When I raised this stink on Twitter, the creator of the original graphics somewhat agreed, but said that, given the unusual scoring incentives of the playoffs, he was uncomfortable making the score-line distribution too central. (He also provided the very handy graphic of score-line distributions that I have used to fill out these tables.) While it’s not ideal to apply regular-season data to the playoffs, I think it is certainly better than nothing. A 4-0 result is still rare and remarkable in the playoffs, and a 4-4 result especially so. And the playoff incentives are more likely to distort the distribution at the lower scores, as teams might bunker more than they would in a regular-season game (as you would expect of Montreal on Sunday). I’m more interested in diluting the outer fringes of the board, where currently a score line with 0.1% frequency occupies 4% of the space. Even if that 0.1% figure is off by an order of magnitude, it’s still a worthy change.
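The "0.1% frequency, 4% of the space" point works out like this in a short Python sketch. `overrepresentation` is a hypothetical helper, and the 25-cell grid is the 5x5 layout discussed above.

```python
def overrepresentation(freq, n_cells=25):
    """How many times more area a score line gets than its frequency warrants.

    Every cell in an n_cells grid gets 1 / n_cells of the area, regardless of
    how often that score actually happens.
    """
    return (1 / n_cells) / freq


for freq in (0.001, 0.01):  # the quoted 0.1%, and the same figure 10x higher
    print(f"{freq:.1%} frequency -> {overrepresentation(freq):.0f}x its fair share of area")
```

Even if the true frequency were 1% rather than 0.1%, that cell would still get about four times the area its frequency justifies.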

Also worth considering is how much information we’re really getting from some of these 4-0 squares anyway. The four-goal row is most useful when it’s telling us that the Red Bulls need to score four when Montreal scores just two. And Montreal scoring two is not a crazy scenario; it happens 19.3% of the time. But a little common sense tells us that at the more extreme score lines, the grid is not very informative at all. Obviously the Red Bulls cannot afford to give up four goals, and obviously they cannot tie 4-4, as they are already behind. You know this without referencing the table or away goals. This part of the grid is only there as an extension of the more interesting part of the array, so we can de-emphasize it appropriately.
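The "need four when Montreal scores two" reading can be computed rather than looked up. This Python sketch assumes team A hosts the second leg after losing the first leg 1-0 away (an assumed score that reproduces the case above) and that ties on aggregate are broken by away goals; `goals_needed` is a hypothetical name.

```python
def goals_needed(first_leg, opp_away_goals, cap=10):
    """Fewest second-leg home goals that send team A through outright.

    first_leg: (b_home, a_away), where team B hosted leg 1.
    Advancing means winning on aggregate, or being level on aggregate with
    more away goals. Returns None if no tally up to `cap` is enough.
    """
    b_home, a_away = first_leg
    for a_home in range(cap + 1):
        agg_a = a_away + a_home
        agg_b = b_home + opp_away_goals
        if agg_a > agg_b or (agg_a == agg_b and a_away > opp_away_goals):
            return a_home
    return None


for opp in range(5):
    print(opp, goals_needed((1, 0), opp))
```

Under these assumptions a 1-0 home win only forces extra time, so the sketch reports 2 goals needed when the opponent is held scoreless.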

 

These 4-0 squares are kind of like if FiveThirtyEight somewhat prominently displayed one of their 10,000 simulations in which Hillary Clinton wins every single electoral vote. Sure, it could happen, but it’s very unlikely, and in any case very obvious. The Red Bulls cannot afford to lose 0-4, just like Donald Trump cannot afford to lose every state. Of course your brain knows this, but your eye doesn't; at a glance, it first picks up that this square takes up the same amount of space as the most likely score, a 1-1 draw.

If this were a more robust, respectable, and generally better publication, we might have a nuanced model à la FiveThirtyEight to generate a distribution of likely scores, taking into account match-up-specific Elo ratings rather than frequencies imported from other competitions. Alas, this is a pedantic rant conducted over a lunch hour, so this is what you get. But we can at least avoid representations that we know to be bluntly misleading (and indeed, imperfect, low-information analytical efforts can be better than doing nothing at all).

UPDATE: I've done the graphics for the other three match-ups.

The Toronto - NYCFC graphic was actually fairly close in its original form: it overstated Toronto's chances by only 4%. The gradient may be a little harsh on NYCFC's primary color of light blue. Even at full opacity it can look weak next to Toronto's strong red, and at the bottom of the spectrum it barely registers at all, even though the 4-0 and 4-1 squares make up half of their entries.

LA - Colorado is, as you'd expect, the exact same thing as RBNY - Montreal. The original graphic is over-representing both teams' chances, but the gradient redirects our attention back to the more meaningful part of the graphic.

For Seattle - Dallas, the original graphic was actually understating how screwed FC Dallas are. Even though Seattle is given all that low-information but eye-catching 4-goal territory, the score lines Dallas needs are so rare that they were still getting the better of the graphic. I've reintroduced the 5-goal row for Dallas here, as I think it effectively communicates the daunting task ahead of them, whereas this row just wasn't as relevant for the other match-ups. Ideally I would have thrown in a 5-goal column for the away team as well, but alas, the data was not available.

***

comments, questions, and general snark can reach the author at brit.byrd@gmail.com

Targeted Points vs. Actual Points [Updated]

On this page, you will find our best-case-scenario points projection (from Episode 019) along with an automatically updated sheet and chart on how we're doing for the season in relation to our targets.

An important distinction: the results in the "target" column are not predicted or expected results. As discussed in Episode 019, predicting 7 wins out of 12 and only one loss for the remainder of the season would be quite bold. Rather, they are targets that would deliver us 56 points for the whole season, and a solid chance at a first round bye in the playoffs. They can be considered as a "best case scenario" of sorts that, while very optimistic, acknowledges we're not going to win out.
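The target arithmetic is simple enough to sketch. This Python snippet assumes the standard 3-1-0 points system and that the remaining games neither won nor lost are drawn; it then backs out how many points the 56-point target implies were already on the board. The names are hypothetical.

```python
def points(wins, draws, losses):
    """Standard league scoring: 3 points per win, 1 per draw; losses earn nothing."""
    return 3 * wins + draws


REMAINING_GAMES = 12
SEASON_TARGET = 56

target_wins, target_losses = 7, 1
target_draws = REMAINING_GAMES - target_wins - target_losses  # assumed: the rest are draws

points_from_run = points(target_wins, target_draws, target_losses)
points_already_banked = SEASON_TARGET - points_from_run
print(target_draws, points_from_run, points_already_banked)
```

Under those assumptions, a 7-win, 4-draw, 1-loss run is worth 25 points, which implies 31 points were already banked when the target was set.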

RBNY vs. Orlando Predictions

Hey you, we see you. You're on the PATH train, it just came out of the underground at Journal Square, and you've got about ten minutes until Harrison and don't know what to do with your newly resuscitated data service.

How about read our predictions?

Peaches

Orlando are not in a good place right now, with Adrian Heath recently departed and Kaka nursing a calf injury. I don't expect Larin to be able to carry Orlando to a W at RBA. With Perrinelle not fully fit, and Zizzo seemingly unable to last a full 90, I expect to see Duvall and Collin return to the back 4 tonight. Veron looked dangerous against Portland, even if nobody could finish, but I expect us to be a bit sharper tonight.

2-0 RBNY.

Lineup: 4-2-2-2

 

Brit

Orlando is a team in turmoil, and we seem to be finding our legs. Now, this game isn't literally a "must win," in that it is still July and, if we don't win, there's still plenty of time for things to end up okay. Not great, but okay. But Dax is still right to say that "anything less than 3 points is unacceptable." This is the kind of game where elite (or even just good) teams prove themselves by relying on their individual quality and team organization, even if the ball hasn't been bouncing their way as of late.

I'm predicting a win, and I will admit this is not 100% anchored in reason (insofar as any of these predictions are). It's based on the premise that we are a top-3 team, the kind that resoundingly beat FC Dallas and Toronto at home. If we don't bag all 3 points tonight, I will settle more comfortably into the frame of mind that we're somewhere between 3rd and 6th in the East, scrapping together points with an eye toward a good playoff run.

2-0 RBNY.

Lineup: 4-2-2-2

 

Sam

With Heath gone and Kaka injured, this could be the game to put wind back in our sails. Major concern right now is the back line. Highly anticipating DP/Collin in the back, although Duvall will suffice if need be. Key matchup is Shea/Zizzo, due entirely to Zizzo's mediocre run of form. I expect the offense to click more and for Veron to play well in the 4-2-2-2.

2-1 RBNY.

Lineup: 4-2-2-2

RBNY at RSL Predictions

Due to this week's crazy schedule congestion, we will be recording after Wednesday's game in Utah, and including it with discussion of the Seattle game, the US Open Cup, as well as the latest USMNT action.

Before we kick off, we've supplemented our usual Twitter-prediction posts with some short blurbs and preferred starting 11s:


Sam: RSL are without Beckerman, and looked unconvincing against NYCFC and Portland. With two fully rested center backs, RBNY should take this one 2-1.


Peaches: It's going to be a tough time on the road, and we might be without Grelladinho. I think that RSL has enough offensive power to put 2 past us, and I think we'll rally back for a hard-earned road point. It seems like we're going to have some rotation (and we're dealing with nagging injuries to the likes of Zubar), and we may even see young guns like Derrick Etienne, Jr. Draw, 2-2.


Brit: Marsch's gamble to rest Collin and Baah on Sunday paid off -- so far. Hopefully their fresh legs see us through another solid defensive game, if not a clean sheet. I also don't see the team losing its eye for goal quite yet. A big question is whether Jesse rotates the midfield, and how someone like Davis might perform. Draw, 1-1.