Aubrey Clark

Why Shannon Entropy Is the Unique Cost of Information

From Principal-Agent Problems and Experimentation (Harvard, 2017), Chapter 1.

Setup

A principal hires an agent to acquire information about an unknown state $\theta \in \Theta$ and take a decision $d \in D$. The agent chooses an experiment—a joint distribution $p(d, \theta)$ over decisions and states—subject to a cost function $c(p)$ and a capacity constraint.

The contract $b(d, \theta)$ specifies payments from principal to agent. Both parties are risk-neutral. The question is: what form does the optimal contract take, and how does it depend on the cost function?

The general result

For a general cost function, every Pareto optimal contract can be written as:

$$b(d, \theta) = \alpha^* y(d, \theta) - \beta(\theta) - \gamma(d, \theta)$$

where $\alpha^*$ is a constant share of the outcome $y(d, \theta)$, $\beta(\theta)$ is a state-dependent transfer, and $\gamma(d, \theta)$ is a distortion term.

The distortion $\gamma$ captures how complementarities in the cost of acquiring different signals in different states shape incentives. For a general cost function, $\gamma$ depends on both the decision $d$ and the state $\theta$.

The Shannon entropy result

Suppose the agent's cost function is expected reduction in Shannon entropy:

$$c(p) = H_S(\pi) - \sum_{d \in D} p(d)\, H_S(p(\cdot | d))$$

where $H_S(q) = -\sum_{\theta} q(\theta) \log q(\theta)$ is Shannon entropy and $\pi$ is the prior.
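This cost is just the mutual information between the decision and the state, so it can be computed directly from the joint distribution. A minimal sketch (the joint distribution below is purely illustrative):

```python
import numpy as np

def entropy_cost(joint, prior):
    """Expected reduction in Shannon entropy for an experiment p(d, theta).

    joint: array of shape (|D|, |Theta|) holding p(d, theta); its
    theta-marginal should equal the prior.
    """
    def H(q):
        q = q[q > 0]                  # 0 log 0 = 0 convention
        return -np.sum(q * np.log(q))

    p_d = joint.sum(axis=1)           # marginal over decisions, p(d)
    posteriors = joint / p_d[:, None] # p(theta | d), one row per decision
    return H(prior) - np.sum(p_d * np.array([H(q) for q in posteriors]))

# Illustrative two-state, two-decision experiment with a uniform prior.
prior = np.array([0.5, 0.5])
joint = np.array([[0.4, 0.1],   # d = 0
                  [0.1, 0.4]])  # d = 1
c = entropy_cost(joint, prior)  # mutual information I(d; theta), in nats
```

Here each decision induces a posterior of $(0.8, 0.2)$ or $(0.2, 0.8)$, so the cost is the prior entropy $\log 2$ minus the expected posterior entropy.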

Then the distortion term simplifies dramatically. It no longer depends on the state:

$$\gamma(d, \theta) \;\longrightarrow\; \hat{\gamma}(d) = \sum_{\bar\theta} \frac{\lambda[d, \bar\theta]}{p(d)}$$

where the $\lambda[d, \bar\theta]$ are the multipliers on the parties' liability constraints.

The optimal contract becomes:

$$b(d, \theta) = \alpha^* y(d, \theta) - \beta(\theta) - \hat{\gamma}(d)$$

The decision-dependent transfer $\hat{\gamma}(d)$ punishes decisions likely to make the agent's liability limits bind and rewards decisions likely to make the principal's liability limits bind.
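To see why $\hat{\gamma}$ is a pure decision-dependent transfer, note that summing the multipliers over $\bar\theta$ eliminates the state index entirely. A toy computation, with made-up multipliers $\lambda[d, \bar\theta]$ (illustrative numbers only):

```python
import numpy as np

# Hypothetical multipliers lam[d, theta] on the liability constraints
# and the decision marginal p(d); all values are made up.
lam = np.array([[0.02, 0.05],
                [0.01, 0.00]])
p_d = np.array([0.5, 0.5])

# gamma_hat(d) = sum over theta of lam[d, theta] / p(d):
# one number per decision, with no remaining theta index.
gamma_hat = lam.sum(axis=1) / p_d
```

The resulting vector has one entry per decision, which is exactly what makes the transfer implementable without observing the state.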

The uniqueness theorem

This is the key result. Start from the general distortion term $\gamma(d, \theta)$. For it to collapse to a decision-dependent transfer $\hat{\gamma}(d)$ (independent of $\theta$), the information cost matrix must take the form:

$$k(\theta, \theta', p(\cdot|d)) = p(\theta|d)\, g(\theta', p(\cdot|d)) + \mathbf{1}_{\{\theta' = \theta\}}\, h(\theta, p(\cdot|d))$$

for some functions $g$ and $h$.

Uniqueness: Using the symmetry of the information cost matrix and the constraint that its rows sum to zero, one can show that the cost function is necessarily proportional to the expected reduction in Shannon entropy.
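The entropy cost does satisfy a decomposition of this shape. As a numerical sketch, take the information cost matrix to be the Hessian of the per-decision cost term $\sum_\theta q(\theta) \log q(\theta)$ (an assumption made here for illustration): that Hessian is $\mathrm{diag}(1/q)$, an instance of the decomposition with $g \equiv 0$ and $h(\theta, q) = 1/q(\theta)$. A finite-difference check:

```python
import numpy as np

def neg_entropy(q):
    """Negative Shannon entropy, the per-decision term of the entropy cost."""
    return np.sum(q * np.log(q))

q = np.array([0.2, 0.3, 0.5])
n, eps = len(q), 1e-5

# Central second differences approximate the Hessian of neg_entropy at q.
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        qpp = q.copy(); qpp[i] += eps; qpp[j] += eps
        qpm = q.copy(); qpm[i] += eps; qpm[j] -= eps
        qmp = q.copy(); qmp[i] -= eps; qmp[j] += eps
        qmm = q.copy(); qmm[i] -= eps; qmm[j] -= eps
        H[i, j] = (neg_entropy(qpp) - neg_entropy(qpm)
                   - neg_entropy(qmp) + neg_entropy(qmm)) / (4 * eps**2)

# Diagonal with entries 1/q(theta): the claimed form with g = 0, h = 1/q.
assert np.allclose(H, np.diag(1.0 / q), atol=1e-4)
```

The off-diagonal entries vanish because the entropy term is additively separable across states; the diagonal part supplies the $h$ piece of the decomposition.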

In other words: Shannon entropy is not just a convenient cost function for information acquisition. It is the only cost function that produces the clean contract form above. Any other cost function introduces state-dependent distortions that complicate the optimal contract.

Why this matters

Shannon entropy is ubiquitous in information theory, statistical mechanics, and models of rational inattention (Sims, 2003). But its use is often justified by tractability or tradition. This result provides a structural justification: Shannon entropy is the unique cost function under which optimal incentive contracts take an intuitive, implementable form.

The result connects information theory to mechanism design: the mathematical structure that makes Shannon entropy special in coding theory (it is the unique measure satisfying certain axioms) is the same structure that makes it special in contract design.

Further reading