Learning

Understand advanced NBA stats from the ground up

Tier 1: Core Concepts

The foundational ideas that unlock everything else


Per-100 Possession Thinking

The foundation of meaningful basketball statistics

Think of per-100 possession stats like converting currencies to a common denomination. Raw counting stats are distorted by two things: team pace (faster teams create more possessions per game) and minutes played (a starter logging 35 minutes simply has more opportunities than a bench player logging 22). Per-100 possession rates strip away both of these distortions.

The key is that per-100 stats normalize by the player's own possessions played, not the team's total. If a player was on the floor for 75 possessions and recorded 2 steals, their rate is 2.0 / 75 × 100 = 2.67 steals per 100 possessions.

This matters because per-game stats hide both pace and playing time. Consider two players who each average 1.5 steals per game. Player A plays 34 minutes on a 102-pace team — roughly 72 possessions. Player B plays 26 minutes on a 96-pace team — roughly 52 possessions. Player A's rate is 1.5 / 72 × 100 = 2.08 per 100. Player B's rate is 1.5 / 52 × 100 = 2.88 per 100. Same per-game number, but Player B is a far more disruptive defender per opportunity.

This logic extends across the entire box score. Turnover rate (TOV%) measures the percentage of a player's possessions that end in turnovers, not raw turnover counts. Offensive rebound percentage (ORB%) measures the share of available offensive rebounds a player grabs, not the raw total. Usage rate measures the share of team possessions a player "uses" (via a shot, turnover, or free throw trip) while on the court. All of these are possession-based rates, and all of them are more useful than the raw counts.

The practical takeaway is simple: whenever you see a per-game counting stat, ask yourself whether pace and minutes are inflating or deflating the number. Per-100 possession stats are not perfect — they do not account for quality of competition or lineup context — but they remove the two largest sources of noise in raw box score data.

Per-100 rate
Stat per 100 = (Stat / Player Poss) × 100
Player A: 1.5 STL, 34 min, 102-pace team → ~72 poss
(1.5 / 72) × 100 = 2.08 per 100
Player B: 1.5 STL, 26 min, 96-pace team → ~52 poss
(1.5 / 52) × 100 = 2.88 per 100
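The conversion can be sketched as a small Python helper. This is illustrative, not a DataBallr function; possessions are approximated from minutes and pace (team possessions per 48 minutes):

```python
def per_100(stat: float, minutes: float, pace: float) -> float:
    """Convert a per-game counting stat to a per-100-possession rate.

    Possessions are approximated as pace * minutes / 48, where pace is
    team possessions per 48 minutes.
    """
    possessions = pace * minutes / 48
    return stat / possessions * 100

# Player A: 1.5 steals, 34 minutes, pace 102 -> ~2.08 per 100
print(round(per_100(1.5, 34, 102), 2))
# Player B: 1.5 steals, 26 minutes, pace 96 -> ~2.88 per 100
print(round(per_100(1.5, 26, 96), 2))
```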
Key Takeaways
  • Raw counting stats are inflated by both team pace and minutes played — a starter on a fast team racks up more stats by default.
  • Per-100 possession rates normalize by the player's own possessions played, removing both pace and playing time distortion.
  • Most advanced rate stats (TOV%, ORB%, Usage) are already possession-adjusted by design.
  • Always check whether a stat is pace- or minutes-dependent before comparing players across different roles and teams.

Relative Stats (rTS%)

Why context matters more than raw numbers

League-average True Shooting percentage has risen steadily over the past two decades, from roughly 52% in the early 2000s to around 58% in recent seasons. This shift is driven by the three-point revolution, rule changes favoring offense, and improved shot selection. The consequence: a raw TS% number means something entirely different depending on when it was recorded.

Relative True Shooting (rTS%) solves this by subtracting the league-average TS% for that season from the player's TS%. The result tells you how far above or below average the player was in their specific context. A player shooting 56% TS in the 2004-05 season, when league average was ~52%, had an rTS% of +4.0 — an elite scorer. That same 56% TS in the 2024-25 season, when league average is ~58%, yields an rTS% of -2.0 — below average.

This distinction matters enormously for historical comparisons. Allen Iverson posted a 54.3% TS in 2005-06 — slightly above the ~53.5% league average that season, giving him an rTS% of +0.8. By modern standards, 54.3% would yield an rTS% of roughly -3.7, making him look like an inefficient scorer. But in his era, he was a tick above average. Meanwhile, a current player at 58% TS might appear more efficient than Iverson ever was, but they are simply average for their era. rTS% captures this context and makes cross-era comparisons meaningful.

The DataBallr stats table and PvP comparison views both show rTS%, and it is one of the most informative single numbers for evaluating scoring efficiency. When you see a player's rTS%, you are seeing their efficiency relative to the competition they actually faced, not an absolute number divorced from context.

rTS% formula
rTS% = Player TS% - League Average TS%
2005 example: 56% TS, league avg 52%
rTS% = 56.0 - 52.0 = +4.0 (elite)
2025 example: 56% TS, league avg 58%
rTS% = 56.0 - 58.0 = -2.0 (below average)
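As code, the subtraction is trivial; the point is to always pair a TS% with its season's league average. A minimal sketch (function name ours):

```python
def rts(player_ts: float, league_avg_ts: float) -> float:
    """Relative True Shooting: player TS% minus that season's league average."""
    return player_ts - league_avg_ts

print(rts(56.0, 52.0))  # 4.0  -> elite in a ~52% league (2004-05)
print(rts(56.0, 58.0))  # -2.0 -> below average in a ~58% league (2024-25)
```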
Key Takeaways
  • League-average TS% has risen significantly over time, making raw TS% comparisons across eras unreliable.
  • rTS% = Player TS% minus league average TS% for that season.
  • Positive rTS% means above-average efficiency; negative means below-average — regardless of era.
  • rTS% is one of the best single-number efficiency indicators available.

Three Factors (from Four)

The framework that explains almost all of basketball

Dean Oliver's Four Factors of basketball success — effective field goal percentage (eFG%), turnover rate (TOV%), offensive rebound percentage (ORB%), and free throw rate (FTA/FGA) — account for roughly 90-95% of the variance in offensive and defensive efficiency. Everything else in basketball is downstream of these four things. If your team shoots well, takes care of the ball, grabs offensive boards, and gets to the line, you win most of the time.

DataBallr collapses the four factors into three by combining eFG% and free throw rate into a single "shooting efficiency" factor represented by True Shooting percentage. TS% already captures both field goal efficiency and free throw value in a single number, so separating them would be redundant. The result is a cleaner three-factor framework: Shooting (TS%), Turnovers (TOV%), and Rebounding (ORB%/DRB%).

This three-factor model maps directly to the Six-Factor RAPM decomposition used on the ShotQuality page. Six-Factor RAPM breaks a player's total impact into six components: offensive and defensive contributions to each of the three factors. oTS measures a player's impact on team shooting efficiency. oTOV measures their impact on team turnover rate. oREB measures their impact on team offensive rebounding. The defensive counterparts (dTS, dTOV, dREB) measure how much the player helps or hurts the team on the other end.

The practical value of the three-factor framework is prioritization. When evaluating a lineup or a player, start with the three factors. If a lineup has a great net rating but terrible ORB%, you know the rebounding glass is a vulnerability even if the overall numbers look good. If a player has a strong overall RAPM but it is driven entirely by oTOV (they do not turn it over), you have a more specific picture of what they actually contribute.
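A hypothetical six-factor line makes the prioritization concrete: the six components sum to the player's total impact, and sorting them shows what drives it. The numbers below are invented for illustration:

```python
# Hypothetical six-factor decomposition for one player, in points per
# 100 possessions (values invented for illustration).
six_factors = {
    "oTS": 2.1, "oTOV": 0.8, "oREB": 0.3,
    "dTS": -0.4, "dTOV": 0.5, "dREB": 0.2,
}

# The components sum to total impact; the largest one tells you what
# the player actually contributes.
total = sum(six_factors.values())
driver = max(six_factors, key=six_factors.get)
print(f"total: {total:+.1f}, largest driver: {driver}")
```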

Key Takeaways
  • Dean Oliver's Four Factors (eFG%, TOV%, ORB%, FT Rate) explain ~90-95% of offensive/defensive variance.
  • DataBallr uses three factors — Shooting (TS%), Turnovers, Rebounding — since TS% already captures both shooting and free throw value.
  • Six-Factor RAPM decomposes player impact into offense and defense for each factor: oTS/dTS, oTOV/dTOV, oREB/dREB.
  • Start with the three factors when diagnosing why a lineup is working or failing.

Shot Value & the Midrange Story

Why long twos died and short midrange survived

The core math of shot selection is points per shot (PPS). A two-point field goal made at 50% yields 2 × 0.50 = 1.00 PPS. A three-point field goal made at 33.3% also yields 3 × 0.333 = 1.00 PPS. So a 50% two-pointer and a 33.3% three-pointer produce identical expected value on the make-or-miss level. Since the league-average three-point percentage is roughly 36%, the average three is worth about 1.08 PPS. That means a two-point shot needs to be made at 54% to match the average three — a high bar.

But make-or-miss PPS is not the full picture. When a shot misses, the offense has a chance to grab the offensive rebound and score again. Offensive rebound rates vary dramatically by shot zone: rim misses are recovered at ~39%, short midrange at ~32%, above-the-break threes at ~27%, corner threes at ~28%, and long midrange at just ~24% — the lowest of any zone. This means the true expected value of a rim attempt is higher than raw PPS suggests (misses frequently lead to second chances), while long midrange misses are the least likely to generate additional scoring opportunities.

Long two-point shots — those from 16 to 23 feet, the classic midrange jumper — are where this combined math is most devastating. Most players shoot approximately 40% from that range, yielding just 0.80 PPS. That is 26% less efficient than an average three-pointer on a make-or-miss basis, and long midrange misses produce the fewest offensive rebounds of any zone. When the analytics revolution reached NBA front offices in the early 2010s, this was one of the clearest signals: the long two is a bad shot for most players.

But "most" is not "all." The short midrange (10-16 feet) survived because skilled shot creators shoot 48-52% or higher from there, especially off the dribble — and when they miss, short midrange misses are offensive rebounded at a meaningfully higher rate (~32%) than long twos (~24%). Certain elite midrange shooters — DeMar DeRozan, Kevin Durant, Chris Paul — sustain long midrange percentages well above 50%, clearing the efficiency threshold outright. The analytical case killed the lazy midrange, not the midrange itself. A Durant pull-up 18-footer at 52% is 1.04 PPS, competitive with an average three.

There is also a creation dimension that raw PPS misses. A player who can credibly score from the midrange forces the defense to extend, which opens driving lanes and creates kick-out threes for teammates. Shot quality and shot value are not the same thing — a shot that generates gravity has indirect value beyond its own conversion rate. The complete analytical picture accounts for direct efficiency, offensive rebounding probability on misses, and the ecosystem effects of a player's shot diet.

In the playoffs, where defensive intensity rises and three-point percentages typically drop, the short midrange becomes even more valuable. Half-court offense in tight games often runs through the midrange, because these shots are harder for defenses to take away than threes and less reliant on getting all the way to the rim against set defenses. The midrange is a pressure release valve when the easy shots dry up.

Points per shot: 2-pointers
50% × 2 = 1.00 PPS
Points per shot: 3-pointers
33.3% × 3 = 1.00 PPS (break-even) | 36% × 3 = 1.08 PPS (league avg)
Long two threshold
To match league-avg 3PT: need 54% on 2s (1.08 ÷ 2 = 54%)
OReb% by zone
Rim: 39% | Short Mid: 32% | Long Mid: 24% | Above-Break 3: 27% | Corner 3: 28%
Myth

Stats say the midrange is bad.

Reality

Stats say the average midrange is inefficient — around 0.80 PPS for long twos, with the lowest offensive rebound rate of any zone. But elite midrange shooters (50%+) beat that threshold, and shot creation value (pulling defenders, collapsing help) adds value beyond raw PPS. The data killed the lazy midrange, not the midrange itself.

Key Takeaways
  • Long 2s died because ~40% FG = 0.80 PPS, well below league-avg 3PT efficiency, with the lowest offensive rebound rate of any zone (~24%).
  • Short midrange survived because elite players beat the efficiency threshold, and misses are rebounded at a higher rate (~32%) than long twos.
  • Shot creation value exists beyond raw PPS — midrange gravity opens driving lanes and creates threes.
  • True shot value = PPS + (miss probability × OReb% × second-chance value). Rim shots and short midrange benefit most from this adjustment.
Tier 2: Impact Measurement

How to measure what a player actually contributes


On/Off Differentials

The right question with the wrong method

The most fundamental question in player evaluation is: "How does the team perform with this player on the court versus without them?" On/off differentials attempt to answer this directly. Take a team's offensive and defensive rating when a player is on the court, subtract the ratings when the player sits, and the difference is the player's on/off split. A player whose team is +5 per 100 possessions better when they play seems valuable.

The fatal flaw is that on/off splits do not control for who else is on the court. When a star sits, their minutes are typically filled by the weakest bench players. When a bench player sits, the starters take over. So a star's "off" numbers are dragged down by bad teammates, inflating the differential — and a bench player's "off" numbers benefit from the starters, deflating or even inverting the signal.

This creates well-documented absurdities. A mediocre player on a team with a deep bench rotation can have terrible on/off numbers because the team is good even without them. A bench player who only appears in garbage-time blowouts alongside starters can post elite on/off numbers. In the 2016-17 season, several role players on the Warriors had better on/off numbers than Kevin Durant and Stephen Curry because those role players shared all their minutes with stars while the stars' "off" minutes included each other's rest.

On/off data is still useful — it reveals how the team performs in different configurations, and it is the raw material from which better metrics are built. But it is not an individual player evaluation tool. It measures lineup quality differences, not individual contribution. Treating on/off differentials as player grades is one of the most common analytical mistakes in basketball discourse.
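A minimal sketch of the computation, using hypothetical stint records of (players on court, possessions, points for, points against). Note what it cannot do: nothing here controls for who fills the "off" minutes:

```python
# Hypothetical stints: (players on court, possessions, pts for, pts against)
stints = [
    ({"A", "B", "C", "D", "E"}, 40, 48, 42),
    ({"A", "B", "C", "F", "G"}, 30, 33, 34),
    ({"F", "G", "H", "I", "J"}, 30, 28, 35),
]

def on_off_net(player, rows):
    """Net rating (per 100 poss) with the player on court minus off court."""
    def net(subset):
        poss = sum(p for _, p, _, _ in subset)
        margin = sum(f - a for _, _, f, a in subset)
        return margin / poss * 100 if poss else 0.0
    on = [s for s in rows if player in s[0]]
    off = [s for s in rows if player not in s[0]]
    return net(on) - net(off)

# ~+30.5 per 100: looks huge, but the "off" sample is one weak bench lineup
print(round(on_off_net("A", stints), 1))
```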

Key Takeaways
  • On/off splits measure the difference in team performance with a player on vs off the court.
  • The critical flaw: they do not control for who else is playing in those minutes.
  • Stars' on/off numbers are inflated because their "off" minutes feature weaker bench lineups.
  • On/off data is useful as raw material but should not be treated as individual player evaluation.

RAPM (Regularized Adjusted Plus-Minus)

The gold standard of player impact measurement

RAPM addresses the fundamental flaw of on/off differentials: confounding teammates. The insight is that every stint — the stretch of play between substitutions — has exactly 10 players on the court simultaneously. If you collect enough stints, you can use regression to disentangle each player's individual contribution from their teammates' and opponents' contributions.

The setup is a massive regression. Each stint becomes a row of data. For every player in the league, there is a column with +1 if the player is on offense, -1 if on defense, and 0 if not on the court during that stint. The dependent variable is the point differential per 100 possessions for that stint. The regression simultaneously estimates each player's contribution while controlling for all nine other players on the court. This is a dramatic improvement over raw on/off splits.

Without regularization, this regression would overfit badly. Many players share the court so frequently that their individual effects are nearly impossible to separate (multicollinearity). Players who only played 200 minutes might get extreme ratings — plus or minus 10 per 100 possessions — based on tiny samples. Ridge regression fixes this by adding a penalty that shrinks extreme estimates toward zero. The principle is sound: extraordinary claims require extraordinary evidence. A player rated +8 on 200 minutes of data should be pulled toward zero because we simply do not have enough information to trust that estimate.

The result is that RAPM values are typically smaller in magnitude than raw on/off numbers. A star player might have a +6 raw on/off differential but a +3 RAPM. That compression is not a bug — it is the regularization removing noise and teammate confounds. The remaining signal is cleaner and more predictive of future performance.

RAPM is the backbone of nearly every modern impact metric. RPM, EPM, LEBRON, DARKO, and other publicly available metrics all build on RAPM or use similar adjusted plus-minus frameworks as their foundation. Some add box score priors (using counting stats to improve estimates for low-minute players), but the core engine is always some form of regularized regression on lineup data.
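The core engine can be sketched in a few lines with closed-form ridge regression. This toy uses four players and four stints with invented margins; real RAPM operates on tens of thousands of stints and every player in the league, with the sign convention described above (+1 offense, -1 defense, 0 off the floor):

```python
import numpy as np

# Toy design matrix: rows are stints, columns are players.
# +1 = on offense, -1 = on defense, 0 = not on the court.
X = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [-1,  1,  1, -1],
    [ 1, -1, -1,  1],
], dtype=float)
y = np.array([8.0, 4.0, -2.0, 6.0])  # stint point margin per 100 poss

def rapm(X, y, lam=1.0):
    """Ridge regression: (X'X + lam*I)^-1 X'y, shrinking estimates to zero."""
    n_players = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)

print(rapm(X, y, lam=1.0))   # lightly regularized estimates
print(rapm(X, y, lam=10.0))  # heavier shrinkage -> pulled toward zero
```

Raising the penalty pulls every estimate toward zero, which is exactly the shrinkage behavior that makes RAPM values smaller in magnitude than raw on/off splits.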

Key Takeaways
  • RAPM uses regression on every stint to isolate individual player impact while controlling for all nine other players on court.
  • Ridge regularization shrinks noisy estimates toward zero, preventing overfitting on small samples.
  • RAPM values are smaller than raw on/off numbers — that is the regularization working correctly.
  • Nearly every modern impact metric (RPM, EPM, LEBRON, DARKO) is built on RAPM or similar frameworks.

Stabilization & Sample Size

How much data you need before a stat means something

Different basketball statistics require vastly different amounts of data before they become reliable. Three-point percentage needs approximately 750 attempts — roughly two to three full seasons for most rotation players — before the signal outweighs the noise. Free throw percentage stabilizes faster, at around 300 attempts. Steal rate needs 2,000 or more defensive possessions. These thresholds are called stabilization points: the sample size at which a stat is roughly half signal and half noise.

Single-game plus-minus is perhaps the noisiest commonly cited stat in basketball. A player can be +20 in a game where they played well and -15 the next game while playing almost identically. The variance in single-game point differential is enormous relative to any individual player's contribution. Over a 48-minute game with 100 possessions, random variation in shooting, turnovers, and opponent quality can easily swing the score by 10+ points in either direction. Individual plus-minus absorbs all of this noise.

Season-level stats are substantially more reliable, but even full-season samples can be noisy for low-frequency events. A player who blocks 2 shots per game over 82 games has 164 blocks — a reasonable sample for per-game rate, but blocks are highly variable game-to-game. Three-point percentage at 5 attempts per game for 82 games gives 410 attempts, still short of the ~750 needed for stabilization. Multi-year samples are most reliable but introduce a different problem: players change over time. The player you are measuring over 3 seasons may not be the same player at the end as they were at the beginning.

The practical lesson is to calibrate your confidence to the sample size. Strong claims require strong evidence. A 10-game hot streak from three does not mean a player has "figured it out." A 200-minute RAPM sample does not reliably identify a player's true impact. The question is not whether data is useful at small samples — it always provides some information — but how much weight to put on it relative to priors.
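A quick simulation shows why attempt counts matter. For a shooter whose true talent is fixed at 36%, the spread of observed percentages narrows as attempts grow; the trial count and attempt levels below are arbitrary choices for illustration:

```python
import random

random.seed(42)

def simulate_3p_spread(true_pct=0.36, n_attempts=100, trials=2000):
    """Middle-90% range of observed 3P% for a fixed-talent shooter."""
    observed = []
    for _ in range(trials):
        makes = sum(random.random() < true_pct for _ in range(n_attempts))
        observed.append(makes / n_attempts)
    observed.sort()
    lo = observed[int(0.05 * trials)]
    hi = observed[int(0.95 * trials)]
    return lo, hi

for n in (50, 200, 750):
    lo, hi = simulate_3p_spread(n_attempts=n)
    print(f"{n:4d} attempts: 90% of samples fall in {lo:.0%}-{hi:.0%}")
```

At 50 attempts, a true 36% shooter routinely looks like a 25% or a 46% shooter; at 750, the range is far tighter.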

Myth

This player has a bad +/-.

Reality

Single-game or even short-stretch raw +/- tells you almost nothing about individual quality. The variance is enormous relative to the signal. A player's +/- in any given game is dominated by teammate performance, opponent quality, and random variation — not their individual impact.

Key Takeaways
  • Three-point percentage needs ~750 attempts (~2-3 seasons) to stabilize; free throw percentage needs ~300 attempts.
  • Single-game plus-minus is almost entirely noise — do not use it to evaluate individual players.
  • Even full-season samples are noisy for low-frequency events like blocks, steals, and three-point shooting.
  • Calibrate your confidence to the sample size: strong claims need large samples.

Padding / Regression to the Mean

Why extreme numbers in small samples usually aren't real

Suppose a lineup plays 50 possessions together and scores at a 120 offensive rating. Is that lineup truly a 120 ORTG group? Almost certainly not. With only 50 possessions, random variation — a couple of lucky threes, an opponent turnover streak, a favorable whistle run — can easily push a 112-quality lineup to 120 or above. Regression to the mean says the true value is somewhere between the observed 120 and the league average (~112), with the exact blend depending on the sample size.

The DataBallr WOWY page implements this principle through its padding feature. When you toggle padding on, the site blends a lineup's actual performance with league-average performance based on the number of possessions played. The formula is conceptually simple: Padded Rating = (Actual Rating × Possessions + League Average × Prior Weight) / (Possessions + Prior Weight). More possessions means the actual data dominates. Fewer possessions means the league average pulls harder.

This is not arbitrary smoothing — it is mathematically principled. Padded (regressed) ratings are better predictors of future performance than raw ratings because they filter out small-sample noise. A lineup with a 130 offensive rating over 30 possessions is almost certainly not a 130 ORTG lineup, and padding correctly discounts that extreme observation. A lineup with a 115 offensive rating over 500 possessions is probably close to a true 115 ORTG group, and padding barely adjusts it.

The intuition is straightforward: extraordinary claims require extraordinary evidence. An 80% three-point shooter on 5 attempts is not an 80% three-point shooter — they are a small-sample anomaly that should be regressed heavily toward the ~36% league average. A 42% three-point shooter on 500 attempts is probably a genuinely good three-point shooter, and regression barely budges their estimate. Padding applies this same logic to lineup ratings, player stats, and any other metric built on limited observations.

Key Takeaways
  • Small samples produce extreme observed values that overstate true quality in both directions.
  • Regression to the mean blends observed performance with league average, weighted by sample size.
  • Padded ratings are better predictors of future performance than raw ratings.
  • The WOWY page's padding toggle implements regression to the mean for lineup data.
Tier 3: Shooting Deep Dive

Understanding efficiency beyond basic percentages


TS% vs eFG% vs FG%

The hierarchy of shooting efficiency metrics

Field goal percentage is the oldest and simplest shooting metric: made shots divided by attempted shots. It has two critical blind spots. First, it treats all made field goals equally — a two-point layup and a three-point bomb both count as one make out of one attempt, even though the three is worth 50% more. Second, it completely ignores free throws, which are one of the most efficient scoring methods in the game. FG% can rank a high-volume free throw merchant below a player who never gets to the line.

Effective field goal percentage (eFG%) fixes the first problem. The formula adds a 50% bonus for made three-pointers: eFG% = (FGM + 0.5 × 3PM) / FGA. This accounts for the extra point a three-pointer provides. A player who goes 4-for-10 with all four makes being threes has an eFG% of (4 + 0.5 × 4) / 10 = 60%, reflecting the 12 points scored on 10 attempts. Without the adjustment, their raw FG% would be 40%, which dramatically understates their efficiency.

True Shooting percentage (TS%) fixes both problems. It captures scoring efficiency from all three sources — two-point field goals, three-point field goals, and free throws — in a single number. The formula is TS% = Points / (2 × TSA), where TSA (True Shooting Attempts) = FGA + 0.44 × FTA. The 0.44 multiplier estimates the number of possessions consumed by free throw trips. It is not 0.5 because not all free throws use a full possession: and-one free throws, technical free throws, and flagrant free throws all come without costing a possession.

TS% is the gold standard for single-number scoring efficiency and is what DataBallr uses throughout the platform. When you see rTS% (relative True Shooting), it is TS% adjusted for league context. The gap between a player's eFG% and TS% reveals how much of their scoring comes from free throw generation — players like James Harden and Joel Embiid historically show large gaps because they get to the line at elite rates.

eFG% formula
eFG% = (FGM + 0.5 × 3PM) / FGA
Example: 4-for-10, all 3s
eFG% = (4 + 0.5 × 4) / 10 = 60%
TS% formula
TS% = Points / (2 × (FGA + 0.44 × FTA))
Example: 28 pts, 20 FGA, 8 FTA
TS% = 28 / (2 × (20 + 0.44 × 8)) = 28 / 47.04 = 59.5%
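All three metrics are one-liners, which makes the hierarchy easy to verify directly:

```python
def fg_pct(fgm, fga):
    """Raw field goal percentage: ignores 3-point value and free throws."""
    return fgm / fga

def efg_pct(fgm, fg3m, fga):
    """Effective FG%: 50% bonus for made threes, still ignores free throws."""
    return (fgm + 0.5 * fg3m) / fga

def ts_pct(points, fga, fta):
    """True Shooting: all points over estimated scoring possessions."""
    return points / (2 * (fga + 0.44 * fta))

# 4-for-10, all threes: FG% badly understates 12 points on 10 shots
print(f"{fg_pct(4, 10):.1%}")      # 40.0%
print(f"{efg_pct(4, 4, 10):.1%}")  # 60.0%
# 28 points on 20 FGA and 8 FTA
print(f"{ts_pct(28, 20, 8):.1%}")  # 59.5%
```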
Key Takeaways
  • FG% ignores three-point value and free throws — it is the weakest efficiency metric.
  • eFG% adds credit for three-pointers but still ignores free throws.
  • TS% captures all scoring sources and is the best single-number efficiency measure.
  • The gap between eFG% and TS% reveals how much a player benefits from free throw generation.

Shot Quality vs Shot Making

Separating shot creation from shot conversion

ShotQuality (SQ) measures the expected value of a shot based on everything known before the ball reaches the rim. It considers shot location (distance from the basket, angle), defender proximity, shot type (catch-and-shoot vs off-the-dribble), whether the shooter is in transition or half-court, and game situation factors. A wide-open corner three has a higher SQ than a contested pull-up from 22 feet, because historically the first shot goes in far more often than the second.

SQ Expected tells you about shot creation — is the player or lineup generating good looks? A team with a high SQ Expected is creating open shots, getting to the rim, and running effective offense. A team with a low SQ Expected is settling for tough shots, failing to create separation, or operating against elite defenses. This is a measure of process, not outcomes.

The gap between SQ Actual and SQ Expected reveals shot making — conversion skill beyond what the shot quality would predict. A player whose SQ Actual exceeds their SQ Expected is making shots at a rate better than the quality of their looks would suggest. They are beating the model. Conversely, a player whose SQ Actual trails their SQ Expected is missing makeable shots — perhaps due to a slump, fatigue, or simply being a below-average shooter.

Over large samples, shot making (the SQ Actual minus Expected gap) tends to regress toward zero for most players. The average NBA player converts shots at roughly the rate their shot quality predicts. But some elite shooters — Stephen Curry, Kevin Durant, Kyrie Irving — sustain positive shot-making margins over multiple seasons, suggesting genuine skill beyond what location and context predict. Identifying whether a player's hot streak is shot quality (they are getting better looks) or shot making (they are hitting tough shots) is critical for projecting whether the performance will continue.
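A toy version of the decomposition: assign each shot type an expected PPS (the values here are invented, not the proprietary ShotQuality model), average over a player's shot diet to get SQ Expected, then subtract from actual PPS to get the shot-making margin:

```python
# Expected points per shot by shot type (illustrative values only).
EXPECTED_PPS = {
    "open_corner_3":      1.20,
    "contested_pullup_3": 0.90,
    "rim_halfcourt":      1.25,
    "long_mid_pullup":    0.82,
}

def sq_expected(shot_log):
    """SQ Expected: average expected PPS over the shots a player took."""
    return sum(EXPECTED_PPS[s] for s in shot_log) / len(shot_log)

def shot_making_margin(actual_pps, shot_log):
    """SQ Actual minus SQ Expected: conversion skill beyond shot quality."""
    return actual_pps - sq_expected(shot_log)

shots = ["open_corner_3", "rim_halfcourt", "contested_pullup_3", "long_mid_pullup"]
print(round(sq_expected(shots), 3))               # quality of the looks
print(round(shot_making_margin(1.12, shots), 3))  # positive: beating the model
```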

Key Takeaways
  • SQ Expected measures shot creation quality — location, defender distance, shot type — before the ball reaches the rim.
  • The gap between SQ Actual and SQ Expected measures shot making — conversion skill beyond expected value.
  • Most players' shot-making margins regress toward zero over time, but elite shooters can sustain positive margins.
  • Separating shot quality from shot making helps predict whether hot or cold streaks will persist.

Volume-Efficiency Tradeoff

Why comparing a star's efficiency to a role player's is misleading

Comparing a 30-point scorer's True Shooting percentage to a 10-point role player's is fundamentally misleading. The role player takes the easiest shots available — catch-and-shoot corner threes created by someone else, open layups off cuts, putbacks on offensive rebounds. They never have to create their own shot against a set defense or take a tough pull-up jumper with the shot clock winding down. Their shot diet is curated for efficiency.

The star scorer operates in a completely different universe. They face the best defenders, draw double teams, and must create offense when the play breaks down. As usage increases — meaning a larger share of team possessions end with that player shooting, turning it over, or going to the line — the available shots get progressively harder. The easy looks are already claimed. What remains are contested jumpers, drives into help defense, and isolation possessions against primary defenders.

This is the volume-efficiency curve: as shot volume increases, efficiency typically decreases. A player who scores 20 points per game on 60% True Shooting would almost certainly not maintain 60% TS at 30 points per game. The additional 10 points would come from harder shots against more defensive attention. This is why "just give them more shots" is usually not the answer — and why a role player's efficiency numbers are not evidence that they could be a star if given the opportunity.

Context-aware evaluation means weighting both volume and efficiency together. A star maintaining 58% TS on 30 points per game is providing more offensive value than a role player at 62% TS on 10 points per game, because the star is producing far more total scoring output on only slightly lower efficiency — and they are doing it against much tougher defensive coverage. The DataBallr PvP page lets you compare players side by side with both usage and efficiency visible, making this tradeoff explicit.

Key Takeaways
  • Role players take easy, curated shots; stars must create against elite defensive attention.
  • As usage (shot volume) increases, efficiency typically decreases — the easy shots are already taken.
  • A role player's high efficiency does not mean they would maintain it at higher usage.
  • Evaluate scoring by weighing volume and efficiency together, not efficiency alone.

Free Throw Generation

The most underappreciated efficiency lever in basketball

Getting to the free throw line is one of the most reliable and consistently undervalued offensive skills in basketball. Free throws are worth approximately 1.5 points per trip — with league-average free throw shooting at roughly 78%, and most foul trips sending a player to the line for two attempts, the expected value is 2 × 0.78 = 1.56 points. That alone exceeds league-average scoring per possession (~1.12), making every standard foul trip an efficient outcome.

The TS% formula uses a 0.44 multiplier per free throw attempt (TSA = FGA + 0.44 × FTA) to estimate the possession cost of free throws averaged across all situations. This factor is less than 0.5 because not all free throws consume a full possession: and-one free throws come after a made basket (the possession was already used by the field goal attempt), technical and flagrant free throws are bonus possessions that cost nothing, and three-shot fouls cost only one possession despite three attempts. These scenarios pull the league-wide average below 0.5 per FTA. Using this average, each free throw attempt produces ~0.78 points while consuming ~0.44 possessions, yielding roughly 1.77 points per possession equivalent — one of the most efficient scoring methods in basketball. This is why free throw generation inflates TS% well above eFG%.

Free throw rate, measured as FTA/FGA, quantifies how often a player draws fouls relative to their field goal attempts. A player with a 0.40 FT rate takes 40 free throws for every 100 field goal attempts — they are either aggressive driving to the rim, skilled at drawing contact, or both. This metric is tied to playing style and physicality rather than shot-making variance, which makes it one of the most stable offensive statistics from season to season.

The TS%-to-eFG% gap is the clearest indicator of free throw generation value. A player shooting 50% from the field with heavy free throw generation might have a 50% eFG% but a 58% TS%. The 8-point gap is entirely free throw value. Players like James Harden, Joel Embiid, and Giannis Antetokounmpo have historically shown massive TS%-eFG% gaps, reflecting their dominance at drawing fouls. When evaluating offensive efficiency, ignoring free throw generation misses a major piece of the picture.
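To make the gap concrete, here is a minimal sketch computing both metrics for a hypothetical foul-drawing scorer. The stat line is invented for illustration:

```python
# eFG% = (FGM + 0.5 * 3PM) / FGA
# TS%  = PTS / (2 * (FGA + 0.44 * FTA))
# Hypothetical stat line for a scorer who gets to the line often.
fgm, fga, tpm = 10, 20, 2    # field goals made/attempted, threes made
ftm, fta = 8, 10             # free throws made/attempted
pts = 2 * (fgm - tpm) + 3 * tpm + ftm   # 30 points

efg = (fgm + 0.5 * tpm) / fga
ts = pts / (2 * (fga + 0.44 * fta))

print(f"eFG%: {efg:.1%}")                       # 55.0%
print(f"TS%:  {ts:.1%}")                        # 61.5%
print(f"gap:  {(ts - efg) * 100:.1f} points")   # entirely free throw value
```

The 6.5-point gap here is smaller than the 8-point example in the text, but the mechanism is the same: free throws add points without adding field goal attempts.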

Free throw efficiency (league-wide average)
~78% FT% × 1 FTA = 0.78 pts, using ~0.44 possessions per FTA = ~1.77 pts/possession equivalent
Key Takeaways
  • Free throw attempts average ~0.78 points per FTA while consuming ~0.44 possessions each — roughly 1.77 points per possession equivalent, well above league average.
  • FT Rate (FTA/FGA) measures foul-drawing ability and is one of the most stable offensive stats.
  • The gap between TS% and eFG% directly reveals how much free throw generation boosts a player's efficiency.
  • Free throw generation is tied to playing style and physicality, not shooting luck.
Tier 4: Lineup & Context

Why basketball stats can't ignore who's on the court

Tier 4 — Lineup & Context

WOWY (With Or Without You)

Why individual stats always need lineup context

Individual stats are always entangled with teammates. A player's assist numbers depend on whether they share the court with shooters who convert their passes. A player's defensive rating depends on the team scheme and the quality of their co-defenders. A player's offensive rating depends on whether their team has floor spacing or clogs the paint. Evaluating any player in isolation from their context is analytically incomplete.

WOWY (With Or Without You) analysis addresses this by comparing how a team performs across different player combinations. At its simplest, WOWY asks: what is the team's offensive and defensive rating when Player A is on the court, and what is it when Player A sits? But it goes deeper than basic on/off splits. You can layer the analysis: how does the team perform when Player A AND Player B are both on the court? What about when Player A is on but Player B is off? This reveals whether two players are better together or apart.

These comparisons expose synergies and redundancies that raw individual stats cannot capture. If a team's offensive rating jumps 6 points when two specific players share the court but drops when either plays alone, those players have a measurable synergy — their skills amplify each other. Conversely, if a team's offense is no better with both players than with either individually, their skills may be redundant — they are competing for the same possessions or occupying the same role.
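A layered WOWY split is, at bottom, just grouping possessions by who was on the floor. This is a toy sketch with invented players and point values, not DataBallr's implementation:

```python
# Toy sketch of layered WOWY splits: group possessions by who is on the floor.
# Players and point values are invented for illustration.
possessions = [
    ({"A", "B", "C", "D", "E"}, 2),   # (players on court, points scored)
    ({"A", "B", "C", "D", "F"}, 3),
    ({"A", "C", "D", "E", "F"}, 0),
    ({"B", "C", "D", "E", "F"}, 2),
    ({"C", "D", "E", "F", "G"}, 1),
]

def ortg(subset):
    """Points per 100 possessions for a list of (lineup, points) records."""
    return 100 * sum(p for _, p in subset) / len(subset) if subset else None

both_on = [(s, p) for s, p in possessions if "A" in s and "B" in s]
a_without_b = [(s, p) for s, p in possessions if "A" in s and "B" not in s]
a_off = [(s, p) for s, p in possessions if "A" not in s]

print(ortg(both_on), ortg(a_without_b), ortg(a_off))  # 250.0 0.0 150.0
```

With real data each bucket would hold thousands of possessions and small samples would be padded, but the grouping logic is the same.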

The DataBallr WOWY page lets you toggle individual players on and off for any team and see how the team's offensive and defensive ratings shift across different lineup configurations. You can filter by season, season type, and leverage situations. Combined with the padding feature (which regresses small-sample lineup data toward league average), the WOWY tool provides a granular look at how players interact within lineup ecosystems — far more informative than evaluating any player in a vacuum.

Key Takeaways
  • Individual stats are always entangled with teammate quality, scheme, and lineup context.
  • WOWY compares team performance across different player combinations to reveal synergies and redundancies.
  • Layered WOWY analysis (Player A with vs without Player B) is more informative than simple on/off splits.
  • The DataBallr WOWY page provides interactive lineup filtering with regression padding for small samples.
Tier 4 — Lineup & Context

Lineup Sample Size

Why most 5-man lineup data is too sparse to trust

Five-man lineup analysis has a fundamental data problem: combinatorial sparsity. An NBA roster carries roughly 15 players, yielding over 3,000 possible five-player combinations. In an 82-game season with about 100 possessions per team per game split across multiple lineup configurations, most five-man combinations play fewer than 100 possessions together. Many play fewer than 50. Some of the most interesting combinations — rare pairings during injury absences or late-game situations — might share only 20 possessions.

One hundred possessions is not nearly enough to draw reliable conclusions about a lineup's true quality. The standard deviation of point differential per 100 possessions is roughly 12-15 points for a single 100-possession sample. That means a true +3 lineup could easily register as -10 or +16 in any given 100-possession stretch. When the noise band is 5-10 times wider than the signal you are trying to detect, individual observations are close to meaningless.
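A quick simulation makes the noise band tangible. The only assumption beyond the text is modeling a 100-possession sample's net rating as a normal draw with a 13-point standard deviation (the midpoint of the 12-15 range above):

```python
# Simulating 100-possession samples for a lineup whose true net rating is +3.
# Assumption: per-100 net rating of one sample ~ Normal(mean=+3, sd=13).
import random

random.seed(42)
TRUE_NET = 3.0    # the lineup's true quality, per 100 possessions
NOISE_SD = 13.0   # s.d. of a single 100-possession observation

samples = [random.gauss(TRUE_NET, NOISE_SD) for _ in range(10_000)]
outside = sum(1 for s in samples if s < -10 or s > 16) / len(samples)
print(f"share of samples below -10 or above +16: {outside:.1%}")
```

Under this model roughly a third of samples land outside even that wide band, which is why a single 100-possession observation says almost nothing.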

This is precisely why RAPM uses player-level analysis rather than five-man lineup data. Instead of trying to evaluate thousands of five-man combinations with tiny sample sizes each, RAPM pools information across all lineups a player appears in. A player who appears in 15 different five-man lineups over a season contributes data from all of them to a single player-level estimate. This dramatically increases the effective sample size and produces much more stable ratings.

The practical implication is clear: when you encounter a five-man lineup showing a +15 net rating over 80 possessions, resist the urge to declare it elite. That rating is well within the range of random variation. Use padding (regression to the mean) to get a better estimate, and focus on player-level or two-man combination data where sample sizes are substantially larger and more reliable.
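Padding itself is simple arithmetic: blend the observed rating with a league-average prior, weighted by a pad size. The 300-possession pad below is an illustrative constant, not DataBallr's actual parameter:

```python
# Padding (regression to the mean) for a small-sample lineup rating.
# pad_poss=300 is a made-up illustrative constant, not DataBallr's value.
def padded_net_rating(observed_net, possessions, pad_poss=300, prior_net=0.0):
    """Weighted average of the observed rating and a league-average prior."""
    return (observed_net * possessions + prior_net * pad_poss) / (possessions + pad_poss)

print(round(padded_net_rating(15.0, 80), 1))  # 3.2
```

The +15 lineup from the example shrinks to about +3.2 once 300 possessions of average play are mixed in — a far more sober estimate.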

Key Takeaways
  • With ~15 players and 3,000+ possible five-man combinations, most lineups play far too few possessions for reliable evaluation.
  • 100 possessions has a noise band of roughly plus or minus 12-15 points — dwarfing the actual signal.
  • RAPM pools information across all lineups a player appears in, dramatically improving signal quality.
  • Always use padding/regression when evaluating five-man lineup data, and prefer player-level or two-man combination analysis.
Tier 4 — Lineup & Context

Complementary vs Redundant Skills

The best lineup isn't always the five best players

Roster construction is not a fantasy basketball exercise where you stack the five highest-rated players. The best lineups are composed of players whose skills complement each other — each player provides something the others do not, and together they cover all the skills a lineup needs. Floor spacing, rim pressure, ball-handling, switchable defense, rebounding, and playmaking all need to be present, but they do not all need to come from the same player.

Spacing is the most visible example. A lineup with five non-shooters clogs the paint, eliminates driving lanes, and makes every half-court possession harder. A lineup with five shooters may stretch defenses but lacks rim pressure and offensive rebounding. The best lineups balance shooters and drivers — typically 3-4 capable shooters around 1-2 players who attack the basket. This is not a theory; it is directly observable in lineup-level offensive rating data across the league.

Shot diet interactions create subtler redundancies. When two high-usage ball-dominant guards share the court, each handles the ball less and takes fewer shots. Their individual production may decline even if the lineup performs adequately. This is not because either player got worse — it is because they compete for the same opportunities. One player's usage is partly at the expense of the other's. The lineup might be good, but it might be less than the sum of its parts.

WOWY data reveals these dynamics directly. If Player A's TS% drops 3 points when Player B is also on the court, that could signal skill redundancy — both players need the ball in the same situations, forcing one into a less comfortable role. If Player A's TS% rises when Player C joins the lineup, that suggests complementary skills — perhaps Player C's spacing opens driving lanes for Player A. The DataBallr WOWY and Stat Line Shift pages let you explore these interactions systematically.

Key Takeaways
  • The best lineups balance complementary skills — shooting, rim pressure, playmaking, defense — rather than stacking talent.
  • Floor spacing is the most critical lineup construction variable: 3-4 shooters around 1-2 rim attackers is the standard template.
  • Two high-usage players sharing the court may be redundant — they compete for the same opportunities.
  • WOWY data reveals synergies and redundancies by showing how individual stats shift with different lineup partners.
Tier 4 — Lineup & Context

Offensive Rebounding Tradeoffs

Second chances come with hidden costs

Offensive rebounding has clear, quantifiable value. Each offensive rebound extends a possession, providing a second-chance scoring opportunity worth roughly 1.0-1.1 points per second-chance possession. Teams that dominate the offensive glass — securing 30%+ of available offensive rebounds — generate several extra possessions per game. Over a season, that can be worth 3-5 additional wins.

But offensive rebounding comes with a transition defense cost. Players who crash the offensive glass are, by definition, not getting back on defense. When the defending team secures the rebound and pushes in transition, the offensive rebounding team is outnumbered. Transition possessions are among the most efficient in basketball — typically worth 1.10-1.15 points per possession — so giving up transition opportunities is a real price.

Shot diet complicates the picture further. Three-point misses produce longer, more unpredictable rebounds that scatter farther from the basket. Two-point misses, especially from close range, produce shorter rebounds that are easier to secure. A team that shoots a high volume of threes will naturally have a lower offensive rebound percentage even with the same effort and personnel, because the rebound opportunities are fundamentally harder. This means comparing ORB% across teams or eras without accounting for shot mix is misleading.

The RAPM decomposition makes these tradeoffs concrete. Consider two players. Steven Adams has historically shown an oREB impact of roughly +4.0 — his offensive rebounding is elite — but his oTS impact is about -1.0, reflecting the spacing cost of having a non-shooter on the floor. Joel Embiid shows the inverse: oTS of roughly +3.0 (he is a shooting big who spaces the floor) but oREB of about -1.5 (he does not crash the glass as aggressively). Neither profile is inherently better; they are different tradeoffs. A -6.0 relative ORB% is not necessarily bad if the lineup's shot mix is weighted toward threes and long midrange shots — the rebound opportunities simply are not there.

Key Takeaways
  • Offensive rebounds extend possessions and are worth ~1.0-1.1 points per second-chance opportunity.
  • Crashing the offensive glass sacrifices transition defense — and transition possessions are highly efficient for the opponent.
  • Three-point-heavy shot diets naturally depress ORB% because long misses scatter farther and are harder to secure.
  • RAPM decomposition (oREB vs oTS tradeoff) makes the spacing-vs-rebounding tradeoff concrete and player-specific.
Tier 5: Box Score & Impact Metrics

What the common metrics get right and wrong

Tier 5 — Box Score & Impact Metrics

BPM, PER, and Win Shares

The hierarchy of box-score-based all-in-one metrics, and why some are far better than others.

Player Efficiency Rating (PER), created by John Hollinger, was one of the first widely adopted all-in-one metrics. It weights box score stats — points, rebounds, assists, steals, blocks, turnovers, missed shots — with a set of coefficients, then normalizes to a league average of 15.0. The appeal is obvious: one number that captures everything. The problem is that the coefficients are essentially arbitrary. They don't come from any model of how basketball works; they're Hollinger's subjective judgments about how much a rebound is "worth" relative to an assist.

This produces predictable distortions. PER overweights volume scoring (high-usage players look better even if they're inefficient), penalizes missed shots more harshly than it rewards made ones (creating a bias against shot creators), and largely ignores defense beyond steals and blocks. A player who scores 22 points on 20 shots with 3 turnovers can rate higher than a player who scores 14 points on 8 shots with elite defense. The metric tells you who fills up the box score, not necessarily who helps their team win.

Win Shares, developed by Dean Oliver and refined by Basketball Reference, takes a different approach. It starts from team wins and distributes credit to individual players based on their box score contributions. Offensive Win Shares and Defensive Win Shares are calculated separately, then summed. The offensive side works reasonably well — it tracks scoring efficiency and volume in a way that correlates with actual offensive value. Defensive Win Shares, however, are heavily influenced by team defense and playing time rather than individual defensive skill, making them unreliable for evaluating individual defenders.

Box Plus-Minus (BPM) represents the most principled of the three. Created by Daniel Myers for Basketball Reference, BPM uses box score stats to estimate a player's per-100-possession impact relative to league average. Crucially, the weights in BPM aren't arbitrary — they come from regression against Regularized Adjusted Plus-Minus (RAPM), meaning BPM is trained to approximate the output of a more rigorous metric. This makes it meaningfully better than PER, though it still inherits the fundamental limitation that box score stats can only approximate true impact.

The hierarchy for single-number box-score metrics is clear: BPM > Win Shares > PER. But all three share a ceiling — they can only use what appears in the box score. A player's gravity (drawing defenders without the ball), screen quality, defensive positioning, and communication are invisible to all of them. For truly measuring impact, you need RAPM-based metrics or tracking data.

PER league average
PER is normalized so league average = 15.0 each season
BPM interpretation
BPM = 0.0 means league average; BPM = +5.0 means approximately +5 points per 100 possessions above average
Win Shares
Win Shares = OWS + DWS (roughly: 1 WS = 1 marginal win contributed)
Myth

PER is the best all-in-one stat.

Reality

PER is actually one of the weakest all-in-one metrics. Its arbitrary weightings significantly overrate high-usage inefficient scorers. BPM and RAPM-based metrics are substantially better at capturing actual player impact.

Key Takeaways
  • PER uses arbitrary weightings and overrates high-usage scorers — it is one of the weakest all-in-one metrics despite its popularity.
  • Win Shares distributes team wins to players but Defensive Win Shares are particularly unreliable for individual evaluation.
  • BPM is the best box-score all-in-one because its weights come from regression against RAPM rather than arbitrary assignment.
  • All box-score metrics share a ceiling: they cannot capture impact that doesn't appear in the box score.
Tier 5 — Box Score & Impact Metrics

The Defense Problem

Why box score metrics systematically undervalue elite defenders.

Box score metrics have a defense problem. Most defensive impact is invisible in traditional statistics, which means any metric built on box score data will systematically undervalue elite defenders. This isn't a minor oversight — it's a structural bias that distorts player evaluation across the league.

What box scores capture about defense is limited to blocks, steals, and defensive rebounds. These are real contributions, but they represent a tiny fraction of overall defensive value. A player who averages 2 steals per game is doing something measurable, but a player who positions himself so well that opponents avoid passing to his assignment entirely is doing something potentially more valuable — and it shows up nowhere in the box score.

The invisible majority of defense includes: positioning that deters shots from even being attempted (deterrence), rotational awareness and help defense timing, communication that organizes teammates, screen navigation that stays attached to shooters, closeout quality that contests without fouling, and post defense technique. None of these have box score entries. A defender who forces a bad pass by being in the right help position gets no statistical credit. A defender who navigates three screens to contest a three-pointer gets no credit unless the shot is blocked.

This creates a systematic bias in player evaluation. A player who scores 25 points per game on good efficiency has clear, quantifiable evidence of their value in every box score metric. A player who holds opponents to 5% below their expected effective field goal percentage through positioning and basketball IQ has almost no box-score evidence. PER, Win Shares, and even BPM will rate the scorer higher, even if the defender's total impact is equal or greater.

This is precisely why RAPM and tracking-based metrics were revolutionary. RAPM measures outcomes — points scored and allowed per 100 possessions — regardless of whether the mechanism is visible in box scores. If a player makes the team's defense better when he's on the court, RAPM captures it, even if the mechanism is "opponents take worse shots because they're afraid to drive against him." Players like Draymond Green, Rudy Gobert, and Marcus Smart had outsized defensive impact that box-score metrics consistently underestimated but RAPM accurately measured.

Key Takeaways
  • Blocks, steals, and defensive rebounds represent only a small fraction of actual defensive value.
  • Deterrence, rotations, communication, and positioning are invisible in box scores but often more valuable than counting stats.
  • This creates a structural bias: offensive players look better in box-score metrics than equivalent-impact defenders.
  • RAPM-based metrics solved this by measuring on/off outcomes rather than counting individual events.
Tier 5 — Box Score & Impact Metrics

Counting Stats vs Rate Stats

When totals matter, when rates matter, and the traps of comparing across different volumes.

Counting stats — total points, rebounds, assists, steals — measure cumulative production over a game or season. Rate stats — points per possession, True Shooting percentage, assist percentage, rebound rate — measure efficiency or frequency independent of playing time. Both have legitimate uses, and choosing the wrong one for a given question leads to misleading conclusions.

Counting stats answer the question: "How much total value did this player produce?" A player who plays 36 minutes per game at average efficiency produces more total output than a player who plays 20 minutes at high efficiency. Availability and durability are real forms of value. When you're asking about a team's total output or a player's cumulative contribution over a season, counting stats are appropriate. The player who played 75 games contributed more total production than the one who played 45 games, all else being equal.

Rate stats answer the question: "How good is this player when they play?" They isolate skill level and efficiency from playing time and opportunity. A player averaging 18 points per 75 possessions in 22 minutes might be the same caliber player as one averaging 18 points per 75 possessions in 34 minutes — the first is just getting fewer opportunities. Rate stats help identify underutilized players and separate skill from situation.

The critical trap is comparing rates across very different volumes. A player who shoots 50% from three on 2 attempts per game is not a better shooter than one who shoots 38% on 10 attempts per game. The high-volume shooter is taking — and making — harder shots at greater frequency. As volume increases, difficulty increases and efficiency naturally drops. Similarly, a player's per-36-minute stats can look absurd if they only play 12 minutes per game, because the sample is small and the competition faced during those minutes may be weaker (bench units).

The best approach is to use both in context. Rate stats tell you about player quality; counting stats tell you about total production. A player who leads the league in points per possession but plays 18 minutes per game is highly efficient but not the league's most valuable scorer. A player who leads in total points but does so on below-average efficiency is producing volume, but at a cost to team offense. The full picture requires both dimensions.

Per-100 possessions
Per-100 stat = (Raw stat / Player possessions) * 100
Volume matters
50% on 2 three-point attempts/game = 1.0 made; 38% on 10 attempts = 3.8 made
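A tiny script spells out the volume comparison above, using the numbers from the passage:

```python
# The volume trap: per-game production at different three-point volumes.
low_volume = {"pct": 0.50, "attempts": 2}    # 50% on 2 attempts per game
high_volume = {"pct": 0.38, "attempts": 10}  # 38% on 10 attempts per game

results = []
for shooter in (low_volume, high_volume):
    makes = shooter["pct"] * shooter["attempts"]
    results.append((makes, 3 * makes))
    print(f'{shooter["pct"]:.0%} on {shooter["attempts"]} 3PA -> '
          f'{makes:.1f} makes, {3 * makes:.1f} pts')
# 50% on 2 3PA -> 1.0 makes, 3.0 pts
# 38% on 10 3PA -> 3.8 makes, 11.4 pts
```

The lower-percentage shooter produces nearly four times the scoring from three — the rate alone tells you nothing about total value.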
Key Takeaways
  • Counting stats measure total production; rate stats measure efficiency or frequency per opportunity.
  • Use counting stats to assess cumulative impact and rate stats to assess skill level independent of minutes.
  • Never compare rates across dramatically different volumes — small-sample rates are unreliable and inflated.
  • The best analysis uses both: rate stats for player quality, counting stats for total value contributed.
Tier 6: Draft & Projection

Projecting college players to the NBA

Tier 6 — Draft & Projection

College-to-NBA Translation

Why college stats don't translate 1:1 to the NBA, and the three adjustments that matter.

A player averaging 20 points per game in college will not average 20 points per game in the NBA. This is obvious, but the reasons go beyond "the NBA is harder." There are at least three systematic differences between college and NBA basketball that require explicit statistical adjustments: pace, competition quality, and role changes.

Pace adjustment is the most mechanical. College basketball uses a 30-second shot clock (changed from 35 seconds in 2015), while the NBA uses 24 seconds. College games are also shorter overall: 40 minutes (two 20-minute halves) versus the NBA's 48 (four 12-minute quarters). These differences mean college games have fewer possessions per game than NBA games, and the pace varies enormously across conferences and teams. A player scoring 18 points per game in a slow-paced Big Ten system might be producing at a higher per-possession rate than a player scoring 22 points per game in a fast-paced system. Per-100-possession rates are essential for cross-context comparison.

Competition adjustment is harder but equally important. Not all college conferences are equal in talent density. A dominant scorer in a mid-major conference is facing weaker defenders, less sophisticated schemes, and less athletic opponents than a dominant scorer in the ACC or Big 12. Conference strength-of-schedule metrics and opponent-adjusted efficiency help normalize performance, but imperfectly -- the gap between college and NBA competition is so large that even conference adjustments have wide error bars.

Role adjustment is often the most underappreciated factor. Many top college players are the clear primary option on their team -- they have the ball constantly, run pick-and-roll as the lead handler, and take the highest volume of shots. In the NBA, most rookies enter as secondary or tertiary options alongside established stars. Their usage rate drops significantly, which means their counting stats drop. The question is whether their rates hold up: can they still score efficiently on fewer touches? Can they create for others in a more structured offense? Can they defend NBA-level athletes?

The best draft projection models weight rate stats over counting stats, prioritize physical tools (height, wingspan, athleticism) that project to the NBA level, and heavily weight age -- younger players at the same production level have more projection room. A 19-year-old freshman averaging 16 points per game on good efficiency is generally a better prospect than a 22-year-old senior averaging 22 points per game, because the younger player is producing at a high level with more physical and skill development ahead.

Pace normalization
Per-100 = (Stat / Team possessions) * 100 -- normalizes across different game speeds
Usage drop
Typical college star: 28-32% USG. Typical NBA rookie: 18-22% USG
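The pace normalization above can be sketched directly; the per-game and pace figures below are hypothetical, chosen to mirror the slow-system-vs-fast-system example in the text:

```python
# Pace normalization (formula from the text): per-100 = stat / team poss * 100.
def per_100(stat_per_game, team_poss_per_game):
    return stat_per_game / team_poss_per_game * 100

slow_system = per_100(18, 63)   # 18 ppg in a 63-possession-per-game system
fast_system = per_100(22, 78)   # 22 ppg in a 78-possession-per-game system
print(round(slow_system, 1), round(fast_system, 1))  # 28.6 28.2
```

The 18-point-per-game scorer in the slow system is actually the more productive player per possession — exactly the inversion raw per-game stats hide.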
Key Takeaways
  • College and NBA basketball differ systematically in pace, competition, and player roles -- raw stats don't transfer directly.
  • Per-100-possession rates are essential for comparing players across different college systems and to the NBA.
  • Role compression is the most underappreciated adjustment: most college stars become secondary options as rookies.
  • The best draft models emphasize rate stats, physical tools, and age over raw counting stats.
Tier 6 — Draft & Projection

Physical Measurables

How height, wingspan, and combine data inform player projection -- and where they fall short.

NBA Draft Combine measurements -- height (with and without shoes), wingspan, standing reach, hand size, body fat percentage, lane agility, and vertical leap -- provide a physical baseline for prospect evaluation. These numbers don't predict careers, but they provide essential context for understanding what a player can and cannot do at the NBA level.

Wingspan relative to height is particularly predictive for defensive projection. The average NBA player has a wingspan roughly 4-5 inches longer than their height. Players who significantly exceed this -- a +6" or greater wingspan differential -- have a measurable advantage in contesting shots, generating deflections, finishing at the rim through traffic, and rebounding outside their positional norm. A 6'5" guard with a 7'0" wingspan can effectively contest shots like a player several inches taller.

Height alone is frequently misleading because it doesn't capture functional reach. A 6'7" player with a 7'2" wingspan has a longer standing reach -- and therefore a more effective contest radius -- than a 6'9" player with a 6'9" wingspan. When evaluating whether a player can guard a position or finish over length, standing reach and wingspan matter more than listed height. This is why the NBA shifted from "height in shoes" to "height without shoes" measurements, and why wingspan data has become central to draft evaluation.
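The functional-reach comparison reduces to simple arithmetic. Heights are in inches and both players are hypothetical, matching the example above:

```python
# Wingspan differential = wingspan - height (both in inches, without shoes).
def wingspan_diff(wingspan_in, height_in):
    return wingspan_in - height_in

long_wing = wingspan_diff(86, 79)   # 6'7" player with a 7'2" wingspan
neutral = wingspan_diff(81, 81)     # 6'9" player with a 6'9" wingspan
print(long_wing, neutral)           # 7 0
```

Despite giving up two inches of listed height, the first player's +7 differential (well above the +4 to +5 league norm) gives him the longer functional reach.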

These measurements are most useful as context and probability modifiers, not destiny. A shorter wingspan doesn't mean a player can't defend -- Chris Paul and Jrue Holiday are elite defenders with average-to-modest wingspans for their positions, compensating with strength, anticipation, and relentless effort. But shorter wingspan does mean a player needs those other tools to compensate, which narrows the path to defensive impact.

Measurements also inform positional versatility, which is increasingly valuable in a switch-heavy NBA. Can a wing guard centers in a switch? Wingspan, standing reach, and lateral quickness data all feed that assessment. A player with elite length and agility has a wider defensive range across positions, making them more valuable in modern defensive schemes that prioritize switching.

Wingspan differential
Wingspan differential = Wingspan - Height (without shoes); NBA average is roughly +4 to +5 inches
Standing reach
Standing reach correlates with shot-contest effectiveness and rebounding range
Key Takeaways
  • Wingspan relative to height is more predictive than height alone, especially for defensive projection.
  • Standing reach determines effective contest range -- it matters more than listed height for shot-blocking and rebounding.
  • Measurables are probability modifiers, not destiny: players with average wingspans can still be elite defenders through strength, anticipation, and effort.
  • Positional versatility (can a player guard multiple positions?) depends on the combination of length, agility, and strength.
Tier 6 — Draft & Projection

Comps & Similarity

How statistical similarity works, what it tells you, and the critical mistake of taking comps literally.

Statistical similarity -- commonly called "comps" -- finds historical players whose measurable profiles (stats, physical measurements, age) most closely match a current prospect at the same point in their career. The idea is straightforward: if we can find players who looked similar at age 20, their career trajectories give us a plausible range of outcomes for the current prospect.

The mechanics vary by implementation, but most similarity algorithms compute distance across a set of features: per-possession stats (scoring, rebounding, playmaking, defense), physical measurements (height, wingspan, weight), and contextual factors (conference strength, team quality, role). Players with the smallest statistical distance form the comp pool. Some models weight certain features more heavily -- physical tools might matter more than counting stats, for instance.
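One common form of the distance computation is: standardize each feature so no single stat dominates by scale, then rank by Euclidean distance. The players, feature set, and numbers below are entirely hypothetical:

```python
# Sketch of a similarity ("comp") ranking: z-score features, then rank
# historical players by Euclidean distance to the prospect. All data invented.
import math

# feature vectors: [pts/100, reb/100, ast/100, wingspan in inches]
history = {
    "Player A": [28.0, 9.0, 4.0, 82.0],
    "Player B": [22.0, 12.0, 2.5, 86.0],
    "Player C": [30.0, 6.0, 7.0, 79.0],
}
prospect = [27.0, 8.5, 4.5, 81.0]

# standardize each feature using the pooled mean and s.d.
all_rows = list(history.values()) + [prospect]
cols = list(zip(*all_rows))
means = [sum(c) / len(c) for c in cols]
stds = [max(1e-9, (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5)
        for c, m in zip(cols, means)]

def z(vec):
    return [(x - m) / s for x, m, s in zip(vec, means, stds)]

def distance(a, b):
    return math.dist(z(a), z(b))  # Euclidean distance in z-score space

comps = sorted(history, key=lambda name: distance(history[name], prospect))
print(comps[0])  # closest statistical comp
```

Real implementations add feature weights (e.g., physical tools counted more heavily than counting stats) and age-matching, but the core is this distance ranking.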

Comps are genuinely useful as a heuristic for establishing outcome ranges. If a prospect's closest historical matches are 60% productive NBA players and 40% out of the league within three years, that tells you something real about the probability distribution. A prospect whose comp pool is dominated by All-Stars is a safer bet than one whose comp pool is evenly split between stars and busts, even if both have the same "best case" comparison.

The critical mistake is taking individual comps literally. "Player X's closest comp is Kevin Durant" does not mean Player X will be like Kevin Durant. It means their measurable profile at that age was statistically similar along the dimensions being compared. Kevin Durant's career was shaped by factors that don't appear in the similarity calculation: his specific skill development path, coaching, team context, health, and the irreducible element of individual talent that no model fully captures.

Base rates are the essential reality check. Most draft picks, regardless of how good their comps look, don't become stars. The base rate for a lottery pick becoming an All-Star is roughly 20-25% averaged across all 14 slots, though it varies enormously by position in the draft (top-3 picks hit at roughly 40-50%, while picks 8-14 are closer to 10-15%). Outside the lottery, it drops to low single digits. Comps can shift these probabilities -- a prospect whose top-10 comps include 4 All-Stars has better odds than base rate -- but they can't override the fundamental uncertainty of projection. The best use of comps is establishing a distribution of reasonable outcomes, not predicting a specific career path.

Key Takeaways
  • Statistical similarity identifies historical players with comparable profiles at the same age, providing an outcome distribution.
  • Comps are useful for establishing probability ranges -- not for predicting that a player will follow a specific career path.
  • Base rates matter: even lottery picks average only ~20-25% All-Star probability, so comps shift probabilities, not certainties.
  • The best use of comps is asking "what happened to players who looked like this?" not "who will this player become?"
Tier 6 — Draft & Projection

Draft Position Value

The steep decay curve of draft pick value and its implications for trade decisions.

Draft pick value does not decline linearly. It follows a steep decay curve where the difference between adjacent picks is largest at the top of the draft and shrinks as you move down. The expected value of the first overall pick is dramatically higher than the fifth pick, which is dramatically higher than the tenth, which is only modestly higher than the fifteenth. This shape has been remarkably consistent across decades of draft data.

The approximate probabilities tell the story. Picks 1-3 have historically produced an All-Star caliber player roughly 40-50% of the time. Picks 4-7 drop to approximately 20-25%. Picks 8-14 land around 10-15%. Late first-round picks (15-30) are in the 5-8% range. Second-round picks have roughly a 2-5% chance of even becoming significant rotation players, let alone stars. These are historical averages and vary by draft class strength, but the shape of the curve is consistent.

This nonlinear decay has major implications for trade valuation. The gap in expected value between pick 1 and pick 5 is much larger than the gap between pick 20 and pick 25, even though both are separated by four positions. This means trading down from the top of the draft -- swapping pick 3 for picks 8 and 15, say -- typically means giving up more expected value than you receive, even though you're getting two picks for one.

The mathematical case for trading up in the draft is generally strong: concentrating draft capital into higher picks produces more total expected value than spreading it across lower picks. Two picks, each with a 10% chance of producing a star, are worth less in expected value than one pick with a 35% chance. This is why teams often package multiple assets to move up in the lottery -- the math supports it.
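
As a sanity check, the concentration math can be sketched in a few lines of Python. The pick probabilities below are illustrative midpoints of the historical ranges quoted here (not exact figures), and picks are treated as independent:

```python
# Probability that a package of picks yields at least one star,
# assuming each pick hits or misses independently.

def p_at_least_one_star(pick_probs):
    """1 minus the probability that every pick in the package misses."""
    p_none = 1.0
    for p in pick_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

# One high pick vs. two lower picks (the concentration principle)
single_high = p_at_least_one_star([0.35])        # 0.35
two_lower = p_at_least_one_star([0.10, 0.10])    # 1 - 0.9 * 0.9 = 0.19

# Trading down from pick 3 (~45%) for picks 8 (~12%) and 15 (~6%)
keep_pick_3 = p_at_least_one_star([0.45])
trade_down = p_at_least_one_star([0.12, 0.06])

print(f"1 pick @ 35%: {single_high:.2f} vs 2 picks @ 10%: {two_lower:.2f}")
print(f"keep pick 3: {keep_pick_3:.2f} vs picks 8+15: {trade_down:.3f}")
```

Even with two chances instead of one, the lower-pick package trails the single high pick, which is the whole case for consolidating draft capital.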

However, these are base rates, not destiny. Individual team scouting conviction can rationally override aggregate probabilities. If a team's evaluation process identifies a specific player at pick 8 whom they believe is a future star with, say, 50% confidence, then the base rate for "pick 8 in general" is less relevant than their specific assessment. The question is whether the team's scouting is actually better than base rates -- and historically, most teams' scouting accuracy is closer to base rates than they believe.

All-Star probability by range
Picks 1-3: ~40-50% | Picks 4-7: ~20-25% | Picks 8-14: ~10-15% | Picks 15-30: ~5-8%
Concentration principle
1 pick at 35% star probability > 2 picks at 10% each (35% vs 19% chance of at least one star)
Key Takeaways
  • Draft pick value follows a steep decay curve -- the gap between picks 1 and 5 is far larger than between picks 20 and 25.
  • Picks 1-3 produce All-Star caliber players roughly 40-50% of the time; this drops steeply through the first round.
  • Trading up concentrates expected value and is generally mathematically sound; trading down typically loses net expected value.
  • Individual scouting conviction can override base rates, but most teams' accuracy is closer to base rates than they think.
Tier 7: Meta: How to Think

Analytical traps and how to avoid them

Tier 7 — Meta: How to Think

Correlation vs Causation

The most common analytical trap in basketball: confusing what happens together with what causes what.

"Teams that shoot more threes win more games" is a real correlation in modern NBA data. But "shoot more threes to win more" doesn't necessarily follow. Better teams tend to have better shooters, who naturally take and make more threes. Better teams also tend to play with a lead more often, which means opponents pack the paint and concede perimeter looks. The three-point shooting is partly a consequence of being good, not just a cause of it.

This trap appears constantly in basketball analysis. "Players with higher usage have lower efficiency" could mean that high usage hurts efficiency (a reasonable mechanical argument -- more shots means harder shots). But it could also mean that teams give high usage to players who can handle it, and even those players are being asked to create beyond their optimal efficiency point. The causal direction is ambiguous, and both mechanisms are probably operating simultaneously.

How to think about it: look for mechanisms, not just correlations. Does the proposed causal direction make physical and tactical sense? Are there confounding variables that could explain both sides of the correlation? Would an intervention -- actually changing the behavior -- produce the predicted result? If a team that currently shoots 30 threes per game started shooting 40, would they win more? Not necessarily, because the additional 10 threes might be worse shots than the ones they're already taking.

A concrete example: pace and winning. In certain eras, faster-paced teams won more games. The conclusion "play faster to win" was popular. But the teams that played fast often did so because they had athletic, versatile rosters capable of running in transition -- and that athleticism caused both the fast pace and the winning. A slow, unathletic team adopting a faster pace wouldn't gain the same advantage because they lack the underlying talent that makes pace effective.
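
The pace-and-winning confounder can be made concrete with a toy simulation (all numbers here are synthetic, not real NBA data): a hidden "athleticism" variable drives both pace and wins, producing a strong observed correlation that disappears the moment pace is assigned independently of roster talent:

```python
# Toy confounding simulation: pace never causes wins in this model,
# yet pace and wins correlate because athleticism drives both.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

athleticism = rng.normal(size=n)          # hidden common cause
pace = athleticism + rng.normal(size=n)   # athletic rosters run more
wins = athleticism + rng.normal(size=n)   # athletic rosters win more

# Observed correlation is strong despite zero causal effect of pace
observed = np.corrcoef(pace, wins)[0, 1]

# Intervention: force pace at random, independent of talent
forced_pace = rng.normal(size=n)
intervened = np.corrcoef(forced_pace, wins)[0, 1]

print(f"observed r = {observed:.2f}, after intervention r = {intervened:.2f}")
```

The observed correlation comes out strongly positive while the intervened one hovers near zero -- exactly the gap between "correlates with winning" and "causes winning."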

The antidote is thinking in terms of causal models rather than correlations. Ask: what would happen if we intervened and changed X while holding everything else constant? If the answer is unclear, the correlation alone isn't enough to guide strategy. This is why controlled experiments (rare in basketball) and careful observational methods (like matching similar situations) are more valuable than simple correlation analysis.

Key Takeaways
  • Correlation between two basketball variables doesn't mean one causes the other -- both might be caused by a third factor.
  • Always look for the mechanism: does the proposed causal direction make tactical and physical sense?
  • Think in terms of interventions: if you changed only X, would the outcome actually change as predicted?
  • Many popular basketball "insights" are correlations dressed up as causal claims.
Tier 7 — Meta: How to Think

Survivorship Bias

The invisible filter that makes remaining players look better than the full population.

"Midrange shooters in today's NBA are actually efficient" is a statement you'll hear in debates about the three-point revolution. And it's true -- the midrange shooters who remain in the modern NBA are, on average, pretty good at it. But this is survivorship bias in action. We're only observing the players who were good enough at midrange shooting to justify keeping it in their game. The bad midrange shooters were either coached out of taking those shots or filtered out of the league entirely.

The full picture looks different. If you include all the midrange shots that would have been taken by players who no longer take them (or no longer play), the average midrange shot was inefficient. The current sample is pre-selected for success. It's like concluding that skydiving is safe by only surveying people who landed successfully.
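
A toy simulation makes the filter visible (the skill distribution and cutoff here are invented for illustration): start with a full population of midrange skill levels, then keep only the players good enough to retain the shot:

```python
# Survivorship filter in miniature: the surviving sample looks
# efficient even though the underlying population is mediocre.
import numpy as np

rng = np.random.default_rng(42)

# True midrange FG% for 10,000 hypothetical players
population_fg = rng.normal(0.40, 0.05, size=10_000)

# Coaches and roster churn filter out everyone below the cutoff
survivors = population_fg[population_fg > 0.43]

print(f"population mean FG%: {population_fg.mean():.3f}")  # ~0.40
print(f"survivor mean FG%:   {survivors.mean():.3f}")      # ~0.46
```

Measuring only the survivors overstates the population mean by several percentage points -- the same gap that makes "today's midrange shooters are efficient" misleading as a claim about midrange shooting in general.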

Survivorship bias appears throughout basketball analysis. "Tall players succeed in the NBA" seems true when you look at current rosters, but you're only seeing the tall players who made it. The many tall players who failed -- who lacked coordination, skill, or basketball IQ -- aren't in the sample. The correlation between height and success is real but much weaker than the surviving sample suggests.

The same bias distorts our view of career longevity and veteran performance. "Veterans are more efficient than young players" might be true in raw data, but it's partly because inefficient players don't survive long enough to become veterans. The sample of 10-year veterans is pre-selected for quality. A more honest statement would be: "Players who were good enough to play 10 years are efficient," which is nearly tautological.

How to guard against survivorship bias: always ask "what's missing from this sample?" and "who was filtered out before I observed this data?" If you're analyzing a group that has already been through a selection process (making the NBA, keeping a roster spot, maintaining a role), recognize that the group's averages don't represent the full population. The conclusions you draw apply only to survivors, not to the broader category.

Key Takeaways
  • Survivorship bias occurs when we only observe the successes and not the failures that were filtered out.
  • Current NBA midrange shooters look efficient because the inefficient ones stopped shooting midrange -- the sample is pre-selected.
  • Always ask: "what's missing from this sample?" and "who was filtered out before I observed this?"
  • Conclusions drawn from survivor samples apply only to survivors, not to the full population they came from.
Tier 7 — Meta: How to Think

Eye Test vs Stats

A false dichotomy -- the best analysis combines both, and each covers the other's blind spots.

The "eye test versus analytics" debate is a false dichotomy. The eye test and statistical analysis answer different questions, and both have systematic blind spots that the other can cover. Treating them as opposing approaches rather than complementary tools leads to worse analysis from both sides.

What stats capture that the eye misses: accumulated impact over hundreds or thousands of possessions. The human eye cannot track the difference between 58% True Shooting and 55% True Shooting in real time -- both look like "makes some, misses some." But over a full season (~1,500 true shooting attempts), that 3-percentage-point gap represents roughly 90 additional points of scoring value. Stats are patient pattern detectors that aggregate signal across samples too large for human memory.
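
The arithmetic behind that ~90-point figure follows directly from the definition of true shooting (points = 2 × TS% × true shooting attempts):

```python
# Points produced at a given true shooting percentage, since
# TS% = PTS / (2 * TSA) implies PTS = 2 * TS% * TSA.
def ts_points(ts_pct, attempts):
    return 2 * ts_pct * attempts

attempts = 1500  # roughly a full season of true shooting attempts
gap = ts_points(0.58, attempts) - ts_points(0.55, attempts)
print(round(gap))  # 90
```

Three percentage points that the eye reads as "makes some, misses some" compound into roughly 90 points of scoring value over a season.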

What the eye captures that stats miss: defensive communication and effort variation from game to game. Gravity effects -- a player standing in the corner pulling a defender away from the paint without ever touching the ball. Scheme context -- a player's poor shooting numbers might stem from a poorly designed offense rather than individual skill. Movement quality changes that signal injury before the player reports one. The eye detects context and mechanism; stats detect outcomes and magnitude.

The best analysts integrate both. Stats identify what to investigate: "this player's defensive RAPM is elite -- why?" Film then explains the mechanism: positioning, rotations, communication, or some other specific behavior. Conversely, the eye test generates hypotheses: "this player seems to make everyone around him better." Stats then test the hypothesis: do the player's on-court lineups actually outperform expectations? WOWY data provides the answer.
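
A minimal sketch of that WOWY check, using a hypothetical ten-possession log (real WOWY analysis uses thousands of possessions and adjusts for teammates and opponents; this only shows the mechanics):

```python
# Each entry: (player_on_court, team point differential on that possession)
possessions = [
    (True, 1), (True, 0), (True, 2), (True, -1), (True, 1),
    (False, -1), (False, 0), (False, -2), (False, 1), (False, -1),
]

def net_per_100(log, on_court):
    """Team net rating per 100 possessions, split by on/off status."""
    diffs = [d for on, d in log if on == on_court]
    return 100 * sum(diffs) / len(diffs)

on_rating = net_per_100(possessions, True)
off_rating = net_per_100(possessions, False)
print(f"on: {on_rating:+.0f}, off: {off_rating:+.0f} per 100 possessions")
```

The on/off gap is the raw WOWY signal; the eye test supplies the hypothesis, and this kind of split tests it.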

"I don't need stats, I watch the games" and "I don't need to watch, I have the numbers" are both incomplete approaches that leave value on the table. The first misses accumulated effects invisible to human observation. The second misses context and mechanism that numbers alone can't reveal. The goal is not to choose a side but to use each tool where it's strongest.

Key Takeaways
  • Stats detect accumulated impact over large samples that the eye cannot track; the eye detects context and mechanism that stats cannot capture.
  • The best analysis uses stats to identify what to investigate and film to explain why it's happening.
  • Small efficiency differences (e.g., a 3-percentage-point TS% gap ≈ ~90 extra points over a season) are invisible to the eye but compound into meaningful impact.
  • Neither "pure eye test" nor "pure stats" is sufficient -- the two approaches cover each other's blind spots.
Tier 7 — Meta: How to Think

Context Collapse

Why reducing a player to a single number loses critical information -- and what to do instead.

Reducing a player to a single number is convenient but lossy. No all-in-one metric captures everything about a player's value, and the information that gets compressed away is often exactly what matters for real basketball decisions: lineup construction, matchup strategy, and role assignment.

Consider a player who is elite offensively (+4.0 offensive RAPM) and negative defensively (-2.0 defensive RAPM). Their overall RAPM of +2.0 looks "good but not great" -- roughly 30th-50th in the league. But that single number hides the extreme offensive creation and the defensive liability. This player's actual value depends entirely on context: paired with elite defenders who cover his weaknesses, he might be a top-15 player. Paired with other defensive liabilities, he might make the team worse. The single number tells you none of this.

This is why DataBallr offers multiple decomposed views rather than a single ranking. Six-Factor RAPM breaks impact into offensive and defensive components across scoring, playmaking, and efficiency. ShotQuality separates creation value from shot-making value from free throw generation from contest impact. WOWY lineup data shows how a player's impact changes depending on who else is on the court. Head-to-head matchup data reveals specific strengths and weaknesses against particular opponents.

When someone says "Player X is ranked 15th" by some all-in-one metric, the relevant follow-up question is "15th at what?" Are they 15th because they're solidly good at everything? Or because they're elite at one thing and weak at another, and those happen to average out to 15th? The answer matters enormously for how you build a team around them. A player who is average at everything fits anywhere. A player who is extreme in both directions requires specific teammates.

The instinct to reduce complexity to a single number is understandable -- it makes comparison easy and conversation simple. But basketball value is genuinely multidimensional. A player's worth depends on what they do well, what they do poorly, who they play with, and who they play against. Collapsing all of that into one number isn't simplification; it's information destruction.

Myth

Advanced stats can't capture intangibles.

Reality

Many so-called "intangibles" -- gravity, screen quality, defensive positioning, team chemistry effects -- actually show up in properly constructed metrics like RAPM. If a player makes teammates better through "intangible" means, that impact appears in the team's on-court performance and RAPM captures it. What RAPM can't identify is the mechanism, but the impact itself is measured.

Key Takeaways
  • All-in-one metrics compress away the dimensional information that matters most for real basketball decisions.
  • A player with +4.0 offense and -2.0 defense is fundamentally different from a player who is +2.0 on both ends, even though they have the same overall number.
  • Always ask "15th at what?" -- the components matter more than the aggregate for team-building.
  • Use decomposed views (RAPM splits, ShotQuality breakdown, WOWY lineups) to understand the full picture.