Introduction to Statistical Decision Theory
Homework due: M, in class. Grading policy: 70% homework + 30% final. Class website: http://math.binghamton.edu/qyu
Chapter 1. Decision Theory
§1.1 Introduction
Definition: A decision problem consists of
data X from the sample space X (X = x ∈ X), with density function fX(x; θ),
the parameter θ from the parameter space Θ,
an action a from the action space A,
a loss function L(θ, a) on Θ × A, e.g. |a − θ|, (a − θ)², (a − θ)²/(θ(1 − θ)),
a (nonrandomized) decision rule d: X → A.
E(L(θ, d(X))) is called the risk function of d, denoted by R(θ, d).
Example 1. Suppose that X1, ..., Xn are iid from N(µ, σ²). Consider estimation of σ². This can be viewed as a decision problem:
X = (X1, ..., Xn), from X = R^n,
θ = (µ, σ²), in Θ = (−∞, ∞) × (0, ∞),
A = (0, ∞); an action is an estimate of σ², say a ∈ A,
a decision rule is an estimator of σ², say d: R^n → (0, ∞),
a loss function L is the squared error L(θ, a) = (a − σ²)².
Recall that under this set-up:
1. R(θ, d) = MSE(d) = E(L(θ, d(X))) = E((d(X) − σ²)²).
2. There are two common decision rules (estimators), σ̂² and S², where σ̂² = ((n − 1)/n) S² and S² = (1/(n − 1)) Σ_i (Xi − X̄)².
3. S² is the UMVUE of σ².
4. σ̂² is biased.
Questions: R(θ, σ̂²) = ? R(θ, S²) = ?
Sol.
R(θ, S²) = Var(S²) = 2(n − 1) σ⁴/(n − 1)² = 2σ⁴/(n − 1),
as (n − 1)S²/σ² ∼ χ²(n − 1).
R(θ, σ̂²) = Var(σ̂²) + (bias(σ̂²))² = ((n − 1)/n)² Var(S²) + (−σ²/n)²
= ((n − 1)/n)² · 2σ⁴/(n − 1) + (−σ²/n)² = (σ²/n)² [2n − 2 + 1]
= (σ⁴/n)[2 − 1/n] < 2σ⁴/(n − 1).
Thus R(θ, σ̂²) < R(θ, S²). Hence the MLE σ̂² is preferable over S² in terms of the MSE.
Question: Can we find an estimator that is the best w.r.t. the MSE?
Answer: No! In Example 1, for each estimator d, there exists a θo such that R(θo, d) > 0. Set σ̃²(x) = σo² for all x; then R(θo, σ̃²) = 0 < R(θo, d), so d cannot be best at θo, and no estimator is uniformly best.
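As a quick numerical check of the two risk formulas above (this sketch is not part of the original notes; n = 10, µ = 0, σ = 2 are arbitrary choices), one can compare the closed forms with Monte Carlo estimates in R:
n <- 10; mu <- 0; sigma <- 2
2 * sigma^4 / (n - 1)                           # closed-form risk of S^2
sigma^4 * (2 - 1/n) / n                         # closed-form risk of the MLE sigma-hat^2
B <- 1e5
S2  <- replicate(B, var(rnorm(n, mu, sigma)))   # simulated values of S^2
mle <- (n - 1) / n * S2                         # corresponding values of sigma-hat^2
mean((S2  - sigma^2)^2)                         # Monte Carlo estimate of R(theta, S^2)
mean((mle - sigma^2)^2)                         # Monte Carlo estimate of R(theta, sigma-hat^2)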
Remark. Decision theory can be applied to estimation as well as to hypothesis testing.
Example 2. Suppose that X ∼ N(µ, 1). Consider the testing problem H0: µ = 1 vs H1: µ > 1. This can be viewed as a decision problem:
X from X = R,
θ = µ in Θ = [1, ∞),
A = {0, 1}, corresponding to {H0, H1}; an action is an element of A,
a decision rule is a function d: X → A,
a loss function is the 0-1 loss L(θ, a) = 1((a, θ) = (1, 1), or a = 0 but θ > 1).
In particular, a decision rule is d(x) = 1(x − 1 > 1.645).
R(θ, d) = E(L(θ, d(X))) = P(X − 1 > 1.645) if θ = 1, and P(X ≤ 1 + 1.645) if θ > 1
= 0.05 if θ = 1, and Φ(2.645 − θ) if θ > 1
= probability of making an error.
In fact, the 0-1 loss function = 1(making a mistake).
Another loss function is
L(θ, a) = 0 if (θ, a) = (1, 0) or (θ > 1, a = 1); cI if a = 1 and θ = 1; cII if a = 0 and θ > 1,
where cI and cII are the costs of type I and type II errors. Then
R(θ, d(X)) = cI · 0.05 if θ = 1, and cII · Φ(2.645 − θ) if θ > 1.
Remark. The decision-theoretic formulation can also be applied to confidence intervals (CIs).
Example 3. Suppose that X1, ..., Xn are iid from N(µ, σ²). Consider interval estimation of µ. This can be viewed as a decision problem:
X = (X1, ..., Xn) from X = R^n,
θ = (µ, σ²) in Θ = R × (0, ∞),
A is the collection of all intervals (l, u),
a decision rule is a function d: X → A,
a loss function is L(θ, I) = b · length(I) − 1(µ ∈ I), where b > 0; 1(µ ∈ I) measures the correctness of the action I, and length(I) is the length of the interval I.
One decision rule is the CI d(X) = (X̄ − 1.96/√n, X̄ + 1.96/√n). Its risk is
R(θ, d) = (2 × 1.96/√n) b − (2Φ(1.96/σ) − 1).
Another decision rule is the CI δ(X) = X̄ ± 1.96 S/√n. Then its risk function is
R(θ, δ) ≈ b E(2 × 1.96 S/√n) − 0.95
= b E(√(χ²(n − 1))) (σ/√(n − 1)) (2 × 1.96/√n) − 0.95
= b √2 [Γ((n − 1)/2 + 1/2)/Γ((n − 1)/2)] (σ/√(n − 1)) (2 × 1.96/√n) − 0.95
(the coverage probability is only approximately 0.95 here, since 1.96 is the normal rather than the t quantile).
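These two risk expressions can be checked by simulation; the sketch below is my own illustration (not in the notes), with the arbitrary choices n = 25, µ = 0, σ = 2 and b = 0.5.
n <- 25; mu <- 0; sigma <- 2; b <- 0.5
B <- 1e5
loss.d <- loss.delta <- numeric(B)
for (i in 1:B) {
  x  <- rnorm(n, mu, sigma)
  xb <- mean(x); s <- sd(x)
  loss.d[i]     <- b * 2 * 1.96 / sqrt(n) - (abs(xb - mu) < 1.96 / sqrt(n))         # rule d
  loss.delta[i] <- b * 2 * 1.96 * s / sqrt(n) - (abs(xb - mu) < 1.96 * s / sqrt(n)) # rule delta
}
mean(loss.d)       # compare with 2*1.96*b/sqrt(n) - (2*pnorm(1.96/sigma) - 1)
mean(loss.delta)   # compare with the Gamma-function expression above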
§2. Bayes approach
In this section, we consider the Bayesian approach: X1, ..., Xn are iid from f(x|θ), θ is a random variable with df π(θ), and f(x|θ) is the conditional df of X|θ. A Bayesian decision rule of θ is the one that minimizes the Bayes risk
r(π, d) = Eπ(R(θ, d)) = Eπ(E(L(θ, d(X))|θ)),
where Eπ denotes the expectation with respect to θ under π.
Remark. Under certain regularity conditions (those in the Fubini Theorem),
(1) If E(L(θ, δ)|X) is finite, then the Bayes rule is δB(x) = arg min_a E(L(θ, a)|X = x).
(2) If L = (a − θ)², then δB = E(θ|X).
Proof: Note that both X and θ are random.
r(π, δ) = E(E(L(θ, δ(X))|θ)) = E(E(L(θ, δ(X))|X))   (by the Fubini Theorem)
is minimized by minimizing E(L(θ, δ(X))|X = x) for each x, or minimizing E(L(θ, a)|X = x) over all a ∈ A for each x.
(2) If L = (a − θ)², then E(L(θ, a)|X = x) = E((a − θ)²|X = x),
∂/∂a E(L(θ, a)|X = x) = 2E(a − θ|X = x) = 2a − 2E(θ|X = x),
∂²/∂a² E(L(θ, a)|X = x) = 2 > 0.
Thus a = E(θ|X = x) is the minimum point. That is, δB = E(θ|X) is the Bayes estimator w.r.t. L and π.
§2.2 In particular, if L(θ, a) = (a − θ)², then the Bayes estimator is θ̂ = E(θ|X). Recall the formula fX|Y(x|y) = f(x, y)/fY(y). Now
f(x, θ) is the joint df of (X, θ),
fX(x) is the marginal df of X,
π(θ) is the marginal df of θ, now called the prior df,
f(x|θ) is the conditional df of X|θ,
π(θ|x) is the conditional df of θ|X, now called the posterior df,
fX(x) = ∫ f(x, θ) dθ if θ is continuous, and Σ_θ f(x, θ) if θ is discrete,
π(θ) = ∫ f(x, θ) dx if X is continuous, and Σ_x f(x, θ) if X is discrete,
f(x|θ) = f(x, θ)/π(θ),  π(θ|x) = f(x, θ)/fX(x),
E(θ|X = x) = ∫ θ π(θ|x) dθ if θ is continuous, and Σ_θ θ π(θ|x) if θ is discrete.
Remark. Two ways to compute the Bayes estimator: 1. E(θ|X); 2. E(θ|T(X)), where T is a MSS (minimal sufficient statistic). The second method is often simpler in derivation, but they lead to the same estimator.
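As a small numerical illustration of the remark above (my own check, not part of the notes, using an arbitrarily chosen beta(3, 5) posterior), the action minimizing the posterior expected squared error agrees with the posterior mean:
post.loss <- function(a)                      # E((a - theta)^2 | X = x) under a beta(3, 5) posterior
  integrate(function(th) (a - th)^2 * dbeta(th, 3, 5), 0, 1)$value
optimize(post.loss, c(0, 1))$minimum          # numerical minimizer
3 / (3 + 5)                                   # posterior mean, = 0.375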
Example 1. Let X1, ..., Xn be iid from bin(k, θ), θ ∼ beta(α, β) with π(t) = t^(α−1)(1 − t)^(β−1)/B(α, β), where B(α, β) = Γ(α)Γ(β)/Γ(α + β), α, β > 0 and (k, α, β) is known. Bayes estimator of θ?
Sol. Method 1.
f(x|θ) = Π_{i=1}^n C(k, xi) θ^(xi)(1 − θ)^(k−xi) = (Π_{i=1}^n C(k, xi)) θ^(Σ_i xi) (1 − θ)^(nk − Σ_i xi),
where C(·, ·) denotes a binomial coefficient.
π(θ|x) = f(x, θ)/fX(x) = f(x|θ)π(θ)/fX(x)
∝ θ^(Σ_i xi)(1 − θ)^(nk − Σ_i xi) θ^(α−1)(1 − θ)^(β−1)   (main trick!!)
= θ^(Σ_i xi + α − 1)(1 − θ)^(nk − Σ_i xi + β − 1).
Thus θ|(X = x) ∼ beta(Σ_i xi + α, nk − Σ_i xi + β) (= beta(a, b)), and the Bayes estimator is
θ̂ = E(θ|X) = a/(a + b) = (Σ_i Xi + α)/(nk + α + β).
Method 2. The MSS is T = Σ_i Xi, and T|θ ∼ bin(nk, θ), so fT|θ(t|θ) = C(nk, t) θ^t (1 − θ)^(nk−t) and
π(θ|t) = C(nk, t) θ^t (1 − θ)^(nk−t) θ^(α−1)(1 − θ)^(β−1)/(B(α, β) fT(t)) ∝ θ^(t+α−1)(1 − θ)^(nk−t+β−1),
i.e., θ|(T = t) ∼ beta(t + α, nk − t + β), which yields the same estimator θ̂ = (T + α)/(nk + α + β).
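A short numerical check of this conjugacy (my own sketch, with made-up values k = 2, n = 5, α = β = 1 and hypothetical data x): the grid-based posterior mean matches (Σ xi + α)/(nk + α + β).
k <- 2; n <- 5; a0 <- 1; b0 <- 1               # prior beta(a0, b0); all values are made up
x <- c(1, 0, 2, 1, 1)                          # hypothetical data, each xi in {0, ..., k}
(sum(x) + a0) / (n * k + a0 + b0)              # closed-form Bayes estimate
th   <- seq(0.001, 0.999, by = 0.001)          # grid approximation of the posterior
post <- th^sum(x) * (1 - th)^(n * k - sum(x)) * dbeta(th, a0, b0)
sum(th * post) / sum(post)                     # numerical posterior mean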
Example 2. Suppose that X1, ..., Xn are iid from N(θ, σ²), θ ∼ N(µ, τ²), where (σ, µ, τ) is known. Bayes estimator of θ?
Sol. We only use the MSS method rather than X itself. The MSS is T = X̄, and T|θ ∼ N(θ, σ²/n).
f(t|θ)π(θ) ∝ exp(−(1/2)(t − θ)²/(σ²/n) − (1/2)(θ − µ)²/τ²)
∝ exp(−(1/2)(θ² − 2tθ)/(σ²/n) − (1/2)(θ² − 2θµ)/τ²)
= exp(−(1/2){θ²[1/(σ²/n) + 1/τ²] − 2θ[t/(σ²/n) + µ/τ²]}),
so
π(θ|t) = f(t|θ)π(θ)/fT(t) ∝ exp(−(1/2){θ²[1/(σ²/n) + 1/τ²] − 2θ[t/(σ²/n) + µ/τ²]}).
Matching this with a N(µ*, σ*²) density, exp(−(1/2)(x − µ*)²/σ*²) = exp(−(1/2)[x²/σ*² − 2x µ*/σ*² + µ*²/σ*²]), gives
1/σ*² = 1/(σ²/n) + 1/τ²  and  µ*/σ*² = t/(σ²/n) + µ/τ²,
i.e.,
µ* = [t/(σ²/n) + µ/τ²]/[1/(σ²/n) + 1/τ²]  and  σ*² = 1/[1/(σ²/n) + 1/τ²].
Thus θ|(T = t) ∼ N(µ*, σ*²) and the Bayes estimator is
θ̂ = µ* = [X̄/(σ²/n) + µ/τ²]/[1/(σ²/n) + 1/τ²].
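Again a minimal numerical sketch (mine, not from the notes; n = 20, σ = 1, µ = 0, τ = 2 and the true θ = 1.5 are arbitrary) showing the posterior mean as the precision-weighted average just derived:
n <- 20; sigma <- 1; mu0 <- 0; tau <- 2
x  <- rnorm(n, 1.5, sigma)                          # data generated with true theta = 1.5
xb <- mean(x)
r  <- (1/(sigma^2/n)) / (1/(sigma^2/n) + 1/tau^2)   # weight on the MLE xbar
theta.hat <- r * xb + (1 - r) * mu0                 # Bayes estimate = posterior mean mu*
c(xb, theta.hat)                                    # xbar is shrunk toward the prior mean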
Remark. It is interesting to notice the following facts. In Example 1, the Bayes estimator is
θ̂ = (Σ_i Xi + α)/(nk + α + β)
= [nk/(nk + α + β)] (Σ_i Xi/(nk)) + [(α + β)/(nk + α + β)] (α/(α + β))
= r (X̄/k) + (1 − r) α/(α + β),
a weighted average of the MLE X̄/k and the prior mean α/(α + β).
In Example 2, the Bayes estimator is
θ̂ = [X̄/(σ²/n) + µ/τ²]/[1/(σ²/n) + 1/τ²]
= {[1/(σ²/n)]/[1/(σ²/n) + 1/τ²]} X̄ + {[1/τ²]/[1/(σ²/n) + 1/τ²]} µ
= r X̄ + (1 − r) µ,
a weighted average of the MLE X̄ and the prior mean µ.
Homework 2.2
1. Consider testing H0: p ≤ 1/3 versus H1: p > 1/3, where X ∼ bin(5, p), with the 0-1 loss function. Graph and compare the risk functions of the two tests: φ1 = 1(X≤1) and φ2 = 1(X≥4).
2. Consider the estimation of p with L(p, a) = |a − p|, where X ∼ bin(10, p). Graph and compare the risk functions of the two estimators: p̃ = 1/3 and p̂ = X/10.
3. Let X ∼ N(µ, 1). Let δ^π be the Bayes estimator of µ for squared error loss. Compute
and graph the risk functions R(µ, δ π ) for π(µ) ∼ N (0, 1) and π(µ) ∼ N (0, 10). Comment on how the prior affects the risk functions of the Bayes estimator. 4. Let X1 , , Xn be a random sample from N (θ, σ 2 ), where σ is known Consider estimation of θ using squared error loss. Let π be a N (µ, τ 2 ) prior on θ and δ π the Bayes estimator of θ Compute the risk functions of δ = aX + b and δ π , as well as their Bayes risks. Remark. It is worth mentioning that if L 6= (θ − a)2 , then the Bayes estimator is no longer E(θ|X) Some examples of other loss functions: L(θ, a) = |a − θ|, 2 L(θ, a) = (a−θ) θ(1−θ) , where Θ = (0, 1), 2 √ X+α Example 3. If X ∼ bin(n, p), L(p, a) = (a−p) n/2, π ∼ beta(α, β). Then p(1−p) , p̂1 = X/n, p̂2 = n+α+β , α = β = R(p, p̂i ) = ? r(π, p̂i ) = ? Sol. (X/n − p)2 p(1 − p)/n R(p, p̂1 ) = E( )= = 1/n p(1 − p) p(1 − p) r(π, p̂1 ) = E(1/n) = 1/n. R(p, p̂2 ) = E( r(π, p̂2 ) = X+α (
n+α+β − p)2 p(1 − p) X+α E( n+α+β − p)2 )= p(1 − p) = n 1 √ 2· 4(n + n) p(1 − p) n 1 1 B(α − 1, β − 1) √ 2 · E( ) = cE( )=c p(1 − p) p(1 − p) B(α, β) 4(n + n) making use of M SE(d) = V ar(d) + (Bias(d))2 . √ ( n − 1)2 n (α − 1)(β − 1) √ 2· √ 2 √ = =c (α + β − 1)(α + β − 2) 4(n + n) ( n − 1)( n − 2) making use of B(a, b) = Γ(a)Γ(b) Γ(a+b) and Γ(a + 1) = aΓ(a). Example 4. If X ∼ bin(n, p), L = Sol. (a−p)2 p(1−p) , π ∼ U (0, 1), then Bayes estimator p̂= ? p̂ = δB (x) = arg min E(L(p, a)|X = x) a π(p|x) =?E(L(p, a)|X = x)= ? The joint distribution of (X, θ) is f (x, p) = f (x|p)π(p) = µ ¶ n x p (1 − p)n−x 1(p∈(0,1)) ∝ px (1 − p)n−x 1(p∈(0,1)) . x R1 Thus π(p|x) ∼ beta(x + 1, n − x + 1). Recall that 0 xα−1 (1 − x)β−1 dx = B(α, β) = g(a) = E(L(p, a)|X = x) = Z 1 0 g(a) = c Z Γ(α)Γ(β) Γ(α+β) (a − p)2 x cp (1 − p)n−x dp p(1 − p) 1 0 (a
− p)2 px−1 (1 − p)n−x−1 dp g(a) = E(L(p, a)|X = x) = c Z 1 0 (a2 − 2ap + p2 )px−1 (1 − p)n−x−1 dp 1. Notice that if x 6= 0 or n, g(a) is finite for all a ∈ [0, 1] g ′ (a) = 2c Z 1 0 (a − p)px−1 (1 − p)n−x−1 dp 5 and Γ(α + 1) = αΓ(α). Source: http://www.doksinet g ′′ (a) = 2c Z 1 0 px−1 (1 − p)n−x−1 dp > 0 R1 x p (1−p)n−x−1 dp = Thus g(a) is minimized by a = δB (x) = R 10 x−1 n−x−1 0 p dp (1−p) B(x+1,n−x) B(x,n−x) = Γ(x+1)Γ(n−x)Γ(n) Γ(x)Γ(n−x)Γ(n+1) = x/n 2. Notice that if x = 0, g(a) is finite only when a = 0, as g(0) = c Z 0 1 px+1 (1 − p)n−x−1 dp = cB(2, n) R1 R 1/2 Otherwise, g(a) ≈ c 0 a2 p−1 (1 − p)n−1 dp≥ c 0 a2 p−1 (0.5)n−1 dp = limp↓0 ca2 (05)n−1 (ln(1/2) − lnp) = ∞Thus g(a) is minimized by a = δB (0) = 0 = 0/n. 3. Notice that if x = n, g(a) is finite only when a = 1, as g(1) = c Z 0 1 pn−1 (1 − p)n−n−1+2 dp = cB(n, 2) R1 R1
Otherwise, g(a) ≈ c 0 a2 pn−1 (1 − p)−1 dp≥ c 1/2 a2 (0.5)n−1 (1 − p)−1 dp = ∞Thus g(a) is minimized by a = δB (n) = 1 = n/n. Answer The Bayes estimator w.rt π and L is δB (X) = X/n §3. Admissibility Definition. We say a decision rule δ is as good as another decision rule d if R(θ, d) ≥ R(θ, δ) for all θ We say that δ is better than d if ½ R(θ, d) ≥ R(θ, δ) ∀ θ (1) R(θ, d) > R(θ, δ) for at least one θ, If there exists another rule δ such that Inequality (1) holds, then the decision rule d is said to be inadmissible. If an decision rule is not inadmissible, we say that it is admissible. We have studied several optimal properties for an estimator: 1. unbiasedness, 2 UMVUE, 3 consistency, 4 efficiency, 5. admissibility Example 1 Suppose that X1 , ., Xn are iid from N (µ, σ 2 ) Consider estimation of σ 2 This can be viewed as a decision problem. θ = (µ, σ 2 ), in Θ = (−∞, ∞) × (0, ∞), a loss function L is the squared error
L(θ, a) = (a − σ 2 )2 . P 1 2 2 2 2. There are two common decision rules (estimators) σ̂ 2 and S 2 , where σ̂ 2 = n−1 i (Xi − X) . n S and S = n−1 2 2 3. S is the UMVUE of σ 4. σ̂ 2 is biased R(θ, σ̂ 2 ) = V ar(σ̂ 2 ) + (bias(σ̂ 2 ))2 = ( =( n−1 2 ) V ar(S 2 ) + (−σ 2 /n)2 n n − 1 2 σ4 ) 2(n − 1) + (−σ 2 /n)2 = (σ 2 /n)2 [2n − 2 + 1] n (n − 1)2 = σ4 σ4 [2 − 1/n] < 2. n n−1 Thus R(θ, σ̂ 2 ) < R(θ, S 2 ). Hence the MLE σ̂ 2 is preferable over S 2 in terms of the MSE. In Example 1, S 2 is inadmissible, even though it is UMVUE. σ̂ 2 is biased, but it is better than S 2 in terms of the MSE. In general, we do not expect global optimality When we discuss optimality, it is under certain restriction Example 2. Suppose that X ∼ bin(n, p), L(p, a) = |a − p| δ(x) = 1/3 Show that δ is admissible Proof. Notice that n µ ¶ X n x R(p, d) = E|d(X) − p| = p (1 − p)n−x |d(x) − p|. x x=0 Thus R(p, δ) = 0 if p = 1/3. If d is
as good as δ, then R(p, d) ≤ R(p, δ) for all p ∈ [0, 1] In particular, R(1/3, d) = 0 It follows that fX|p (x|1/3)|d(x) − 1/3| ≤ R(1/3, d) ≤ 0 = R(1/3, δ) for x ∈ {0, 1, ., n} Thus d = δ. It implies that there is no estimator that is better than δ Thus δ is admissible 6 Source: http://www.doksinet The example suggests that admissibility may not be an appealing property, but it is clear that inadmissible estimators are definitely not desirable. Question: How to determine that an estimator is admissible ? Answer: (1) By definition as in Example 2, (2) by the following theorem. Theorem 1. Suppose that the following conditions hold: 1. Θ is a subset of the real line; 2. R(θ, d) is continuous in θ for each decision rule d; R θ +ǫ 3. π is a prior density on Θ such that θoo−ǫ π(θ)dθ > 0 ∀ ǫ > 0, ∀ θo ∈ Θ; 4. δ π is the Bayes estimator wrt π and has a finite Bayes risk r(π, δ π ) Then δ π is admissible. R Remark. If π(θ) is a
non-negative function of θ and π(θ)dθ = ∞, it is called an improper prior density of θ. Theorem 1 is still applicable if π is an improper prior density. x+α is admissible for all α, β > 0. Example 3. Suppose that X ∼ bin(n, p) and L(p, a) = (a − p)2 Show that d(x) = n+α+β Sol. Let p ∼ beta(α, β), then π(p|x) ∼ beta(x + α, n − x + β), as fX|p (x|p)π(p) ∝ px+α−1 (1 − p)n−x+β−1 . X+α n+α+β . Since the conditions X+α n+α+β is admissible for α, β > 0. The Bayes estimator is δ = E(θ|X) = in Theorem 1 hold, δ is admissible. In Example 3, we show that Notice that one is more interested in whether X/n X+α is admissible. In fact, n+α+β is admissible for α, β ≥ 0. By one cannot use the approach E(p|X) with α or β = 0, as π(p|X = 0) is not a proper prior and B(0, n) is not finite if α = 0. Example 4. Suppose that X ∼ bin(n, p) and L(p, a) = (a − p)2 Show that p̂ = X n is admissible. 1 Sol. Let p̂ = X/n, then R(p, p̂)
= p(1 − p)/n Notice that if π(p) = p(1−p) , p ∈ (0, 1), π is not a proper prior. However, the “Bayes estimator ” w.rt π exists fX|p (x|p)π(p) ∝ px−1 (1 − p)n−x−1 . π(p|x) ∝ px−1 (1 − p)n−x−1 . If x ∈ {1, ., n − 1}, π(p|x) can be viewed as a beta(x, n − x) density To obtaine the Bayes estimator, it suffice to minimize E(L(p, a)|X = x). R1 E(L(p, a)|X = x) = 0 (a − p)2 cpx−1 (1 − p)n−x−1 dpis minimized by a = E(p|X = x) = x/(x + n − x) = x/n. If x = 0 then E(L(p, a)|X = x) is finite iff a = 0 = x/n. If x = n then E(L(p, a)|X = x) is finite iff a = 1 = x/n. Thus p̂ = x/n is the Bayes estimator w.rt to the improper prior π Consequently, it is admissible. Remark. Thus p̂ is admissible wrt the squared error loss, UMVUE, consistent, efficient Notice that if X1 , ., Xn are iid from N (θ, σ 2 ), then X ∼ N (θ, σ 2 /n) X is UMVUE, consistent and efficient Is it admissible ? It suffices to ask whether X is admissible if X ∼ N (θ,
σ 2 ), by setting n = 1. Example 5. Suppose that X ∼ N (θ, σ 2 ) and L = (a − θ)2 Show that 1. If σ is known, θ̂ = X is admissible 2. If σ is unknown, θ̂ = X is admissible Proof. Case 1 Suppose that σ 2 is known and thus the parameter is θ We shall assume that θ̂ is inadmissible and show that it leads to a contradiction. If θ̂ is inadmissible, then there is a d such that R(θ, θ̂) − R(θ, d) ½ ≥0 = 2c > 0 ∀θ for θ = θo (1) (x−θ)2 R − 1 2σ 2 e Notice that R(θ, d) = (θ − d(x))2 √2πσ dx is continuous in θ ∀ estimator d, thus R(θ, d) − R(θ, θ̂) is continuous 2 in θ too. Then by Eq. (1), there is a b > 0 such that 7 Source: http://www.doksinet R(θ, θ̂) − R(θ, d) > c if |θ − θo | < b. Suppose that π(θ) ∼ N (µ, τ 2 ), Z r(π, θ̂) − r(π, d) = (R(θ, θ̂) − R(θ, d))π(θ)dθ Z θo −b Z θo +b Z ∞ =( + + )(R(θ, θ̂) − R(θ, d))π(θ)dθ ≥ = Z −∞ θo +b θo −b Z θo +b θo
−b θo −b θo +b cπ(θ)dθ √ c θ2 2πτ 2 e− 2τ 2 dθ with µ = 0. τ (r(π, θ̂) − r(π, d)) ≥ Let π(θ|x) be N (µ∗ , σ∗2 ), where µ∗ = Z θo +b θo −b θ2 c √ e− 2τ 2 dθ. 2π (2) (3) σ2 τ2 def x + µ = (1 − η)x + ηµ τ 2 + σ2 τ 2 + σ2 and σ∗2 = τ 2 η. The Bayes estimator of θ is δ π = E(θ|X), that is, δ π (x) = µ∗ = (1 − η)x + ηµIf we let µ = 0 then δ π (X) = (1 − η)X X if η 0, that is, τ ∞. Moreover, r(π, δ π ) = E(R(θ, δ π )) = E(E(((1 − η)X − θ)2 |θ)) = E(E(((1 − η)X − θ)2 |X)) = E(E((θ − µ∗ )2 |X)) = E(V ar(θ|X)) Since R(θ, θ̂) = E(θ − X)2 = σ 2 , = E(τ 2 η) = τ 2 η. (4) r(π, θ̂) = σ 2 . (5) (4) and (5) yield r(π, δ π ) − r(π, θ̂) = τ 2 η − σ 2 = τ 2 σ2 σ2 τ 2 − σ2 − τ 2 −σ 4 − σ2 = σ2 = 2 2 2 2 +τ σ +τ σ + τ2 τ (r(π, δ π ) − r(π, θ̂)) = τ τ σ2 −σ 4 σ2 + τ 2 σ4 =τ (r(π, θ̂) −
r(π, δ π )) + τ2 =τ (r(π, θ̂) − r(π, d) + r(π, d) − r(π, δ π ))) Z θo +b θ2 c √ e− 2τ 2 dθ + 0 ≥ 2π θo −b (6) by τ (r(π, θ̂) − r(π, d)) ≥ Z θo +b θo −b θ2 c √ e− 2τ 2 dθ. 2π (3) √ Letting τ ∞ in inequality (6) yields 0 ≥ 2bc/ 2π > 0. The contradiction implies that θ̂ is not inadmissible. Case 2. Now assume σ is unknown, then the parameters are γ = (θ, σ) Again we shall suppose that θ̂ is inadmissible and show that it leads to a contradiction. If θ̂ is inadmissible in such case, there exists an estimator d such that ½ ≤0 ∀γ R(γ, d) − R(γ, θ̂) < 0 for γ = (θo , σ o ) 8 Source: http://www.doksinet The risk becomes R(γ, d) = Eγ ((d(X) − θ)2 ). It implies that Eθ,σ ((d(X) − θ)2 ) − Eθ,σ ((X − θ)2 ) ½ ≤0 <0 ∀ (θ, σ) = (θ, σo ) ∀ (θ, σ) = (θo , σo ) That is, θ̂ is inadmissible when σ = σo is fixed. It corresponds to the assumption in part one It
contradicts the result in part one. Thus θ̂ is admissible when σ is unknown.
3.2 Hypothesis testing problem
Example 1. Suppose that X ∼ N(µ, 1), H0: µ = 0 vs H1: µ > 0. A test is φo = 1(X > 1.645). Is the test admissible under the 0-1 loss function?
Sol. Notice that R(µ, φo) = 0.05 · 1(µ=0) + Φ(1.645 − µ) 1(µ>0). If φ is another test that is as good as φo, then
R(µ, φ) ≤ R(µ, φo) ∀ µ ≥ 0.
Thus R(µ, φ) ≤ R(µ, φo) = 0.05 if µ = 0. That is, φ is a level α = 0.05 test. Since φo is the UMP level 0.05 test,
R(µ, φ) ≥ R(µ, φo) ∀ µ > 0.
It follows that R(µ, φ) = R(µ, φo) ∀ µ > 0. Thus φ is also a level α UMP test. Since φ is the UMP level 0.05 test for testing H0: µ = 0 vs H2: µ = 1, by the Neyman-Pearson Lemma, φ = 1(f(x; 1)/f(x; 0) > k) a.e. for some k ≥ 0. But φo = 1(f(x; 1)/f(x; 0) > k′) for some k′ ≥ 0. It follows that φ = φo a.e. Consequently, R(µ, φ) = R(µ, φo) ∀ µ ≥ 0. Thus φo is admissible.
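A quick plot of the risk function in this example (my own illustration, not part of the notes) shows the two pieces: the size 0.05 at µ = 0 and the type II error probability Φ(1.645 − µ) for µ > 0.
mu <- seq(0.01, 4, by = 0.01)
plot(mu, pnorm(1.645 - mu), type = "l", ylim = c(0, 1),
     ylab = "R(mu, phi_o)")                    # type II error probability for mu > 0
points(0, 0.05)                                # size of the test at mu = 0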
4. Minimaxity
Definition. A decision rule δ is called a minimax decision rule if
sup_{θ∈Θ} R(θ, δ) = inf_{d∈D} sup_{θ∈Θ} R(θ, d),
where D is the collection of all nonrandomized decision rules. A typical method for determining a minimax decision rule is as follows.
Theorem 2. If δ is a Bayes rule and is an equalizer rule (that is, R(θ, δ) is constant in θ), then it is minimax.
Theorem 3. If δ is admissible and is an equalizer rule, then it is minimax.
Example 1. Suppose that X1, ..., Xn are iid from N(θ, σ²); show that the MLE of θ is minimax under the loss L = (a − θ)².
Proof. The MLE is θ̂ = X̄. It is admissible and has constant risk σ²/n. Thus it is minimax.
Remark. Under the set-up in Example 1, the MLE θ̂ = X̄ is UMVUE, consistent, efficient, admissible and minimax (w.r.t. squared error loss).
Proof of Theorem 2. We shall suppose that the equalizer rule δ is Bayes w.r.t. the prior π and is not minimax, and show that it leads to a contradiction. Let d be a
decision rule such that sup_θ R(θ, d) < sup_θ R(θ, δ). Then
R(θ, d) ≤ sup_θ R(θ, d) < sup_θ R(θ, δ) = R(θ, δ) ∀ θ.
Then r(π, d) < r(π, δ), contradicting the assumption that δ is the Bayes estimator w.r.t. π. The contradiction implies that δ is minimax.
Proof of Theorem 3. We shall suppose that the equalizer rule δ is admissible and is not minimax, and show that it leads to a contradiction. Then there is another rule d such that
R(θ, d) ≤ sup_θ R(θ, d) < sup_θ R(θ, δ) = R(θ, δ) ∀ θ.
That is, R(θ, d) < R(θ, δ) ∀ θ. Then δ is inadmissible, which contradicts the assumption that it is admissible. The contradiction implies that δ is minimax.
Example 2. Suppose that X ∼ bin(n, p) and the loss function is L = (a − p)²; find an equalizer rule.
Sol. Consider rules of the form d = a p̂ + b, where p̂ = X/n. Then
R(p, d) = E((a p̂ + b − p)²) = E((a(p̂ − p) + b − p(1 − a))²) = Var(a(p̂ − p)) + (b − p(1 − a))²
= (a²/n)p(1 − p) + b² − 2pb(1 − a) + p²(1 − a)²
= p²(−(a²/n) + (1 − a)²) + p((a²/n) − 2b(1 − a)) + b²
= b² (an equalizer rule) if −(a²/n) + (1 − a)² = 0 and (a²/n) − 2b(1 − a) = 0.
The first equation, a²(1 − 1/n) − 2a + 1 = 0, yields
a = [2 ± √(4 − 4(1 − 1/n))]/(2(1 − 1/n)) = (1 ± √(1/n))/(1 − 1/n), i.e., a = 1/(1 − √(1/n)) or a = 1/(1 + √(1/n)),
and the second gives b = (a²/n)/(2(1 − a)) = (1 − a)/2, so b = ±√(1/n)/(2(1 ± √(1/n))).
By choosing a = 1/(1 + √(1/n)) and b = √(1/n)/(2(1 + √(1/n))) (= 1/(2(√n + 1))), we obtain
R(p, d) = b² = 1/(4(√n + 1)²) < 1/(4n) = sup_{p∈[0,1]} p(1 − p)/n = sup_{p∈[0,1]} R(p, p̂).
Thus p̂ is not minimax.
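The comparison can be visualized with a short R sketch (not in the notes; n = 20 is an arbitrary choice): the equalizer rule has flat risk lying below the maximum risk 1/(4n) of the MLE.
n <- 20
p <- seq(0, 1, by = 0.01)
plot(p, p * (1 - p) / n, type = "l", ylab = "risk")   # R(p, X/n) = p(1-p)/n
abline(h = 1 / (4 * (sqrt(n) + 1)^2), lty = 2)        # constant risk b^2 of the equalizer rule
legend("topright", c("MLE X/n", "equalizer rule"), lty = 1:2)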
Example 3. Suppose that X ∼ bin(n, p) and the loss function is L = (a − p)²/(p(1 − p)). Show that the MLE is minimax.
Proof. p̂ = X/n is Bayes w.r.t. this loss and the uniform prior (see Example 4 of §2). Moreover, R(p, p̂) = [p(1 − p)/n]/[p(1 − p)] = 1/n, and thus it is an equalizer rule. Consequently, it is minimax.
4.2 Homework
1. Suppose that the parameter space in a decision problem is finite, say Θ = {θ1, ..., θm}. Suppose that δ is a Bayes rule with respect to a prior π such that π(θi) > 0 ∀ i.
Show that δ is admissible 2. Let X ∼ N (µ, 1) Consider estimation of µ with squared error loss Let δ(x) = 2 for all x Show δ is admissible 3. Consider the estimation p with X ∼ bin(n, p) under the squared error loss Show that δ = 1/3 is a Bayes rule with respect to some prior π. 4. Let X1 , , Xn be iid from N (θ, σ 2 ) Consider estimation of θ using squared error loss 4.a Show that X + b is inadmissible if b 6= 0 4.b Show that aX + b is admissible if b 6= 0 and a < 1 5. Let X ∼ N (µ, 1) Consider testing H0 µ ≥ 0 vs H1 : µ < 0 with 0-1 loss Let φ be the UMP test with the size α = E(φ). Find the size α such that φ is minimax 6. Under the set-up in Example 2 of §4, show that the obtained equalizer rule is minimax §5. Invariant Decision Problems There are two ways of data reduction: 1. Sufficiency 2. Invariancy Definition. Let G be a group of transformations of the sample space X The model F = {f (x; θ) : θ ∈ Θ} is invariant under the group G if ∀
(θ, g) ∈ Θ × G, ∃! θ′ ∈ Θ (say θ′ = g(θ)) such that Y = g(X) ∼ f (x; g(θ)) if X ∼ f (x; θ). The decision problem is invariant under the group G if the model F is invariant under G and if ∀ (g, a) ∈ G × A, ∃! a′ ∈ A (say a′ = g̃(a)) such that L(θ, a) = L(g(θ), g̃(a)) ∀ θ ∈ Θ. In an invariant decision problem, a decision rule d is invariant if d(g(x)) = g̃(d(x)) ∀ (g, x) ∈ G × X . Denote G = {g : g ∈ G} and G̃ = {g̃ : g ∈ G}. Recall that G is a group of transformations of X is 1. if g, h ∈ G then g ◦ h ∈ G; 2. if g, h, f ∈ G then (g ◦ h) ◦ f = g ◦ (h ◦ f ); 3. ∃ e ∈ G such that e ◦ g = g ◦ e = g ∀ g ∈ G; 4. if g ∈ G then ∃ h ∈ G such that h ◦ g = g ◦ h = e Theorem 1. Let d be an invariant decision rule in an invariant decision problem If ∀ θ and θ′ ∈ Θ, ∃ g ∈ G such that θ′ = g(θ), then R(θ, d) is constant in θ. 10 Source: http://www.doksinet Proof. R(θ′ , d) =Eθ′
(L(θ′ , d(X)) =Eg(θ) (L(g(θ), d(X))) Z = L(g(θ), d(y))dFX (y; g(θ)) = Z (definition) (θ′ = g(θ)) g(X) ∼ FX (; g(θ)) <= X ∼ FX (; θ) L(g(θ), d(g(x)))dFX (x; θ) =Eθ (L(g(θ), d(g(X)))) =Eθ (L(g(θ), g̃(d(X)))) (as d is invariante) =Eθ (L(θ, d(X)) =R(θ, d). Since θ′ is arbitrary, R(θ, d) is constant. Example 1. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider estimating µ using squared error loss Let G = {gc : c ∈ R}, where gc (x) = x + c ∀ x ∈ Rn . (1) Show that the model {N (µ, σ 2 ) : µ ∈ R, σ > 0} is invariant under G. (2) g c =? (3) Show that an invariant estimator d satisfies d(gc (x)) = d(x) + c. Sol. (1) In fact, gc (X) = (X1 + c, , Xn + c) is again independent normal, E(Xi + c) = µ + c and V ar(Xi + c) = V ar(Xi ) = σ 2 . Thus gc (X) = (X1 + c, ., Xn + c) is again independent N (µ + c, σ 2 ) distributed That is, ∀ c ∈ R, if X ∼ fX (x; θ) where θ = (µ, σ 2 ), then gc (X) ∼ fX (x; g c (θ)), where g
c (θ) = g c (µ, σ 2 ) = (µ + c, σ 2 ). (2) The decision problem is invariant if L(θ, a) = L(g c (θ), g̃c (a)) That is (a − µ)2 = (g̃c (a) − g c (θ))2 = (g̃c (a) − (µ + c))2 ⇒ g̃c (a) = a + c (3) An invariant estimator satisfies d(gc (x)) = g̃c (d(x)) = d(x) + c. Example 2. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider estimating σ 2 using squared error loss (a − σ 2 )2 . Let G = {gc : c > 0}, where gc (x) = cx ∀ x ∈ Rn (1) Show that the model {N (µ, σ 2 ) : µ ∈ R, σ > 0} is invariant under G. (2) g c =? (3) Is the decision problem invariant ? Sol. (1) and (2) If X ∼ N (µ, σ 2 ) then Xc ∼ N (µc, σ 2 c2 ) That is, cX1 , , cXn are iid from N (cµ, c2 σ 2 ) and g c (θ) = (µc, σ 2 c2 ). The model {fX (x; θ) : θ ∈ R × (0, ∞)} is invariant under the group G However, the decision problem is not invariant under G. We shall prove by contradiction If the decision problem is invariant under G then ∀ a > 0 ∃!
a′ > 0 such that L(θ, a) = (a − σ 2 )2 = L(g c (θ), a′ ) = (a′ − c2 σ 2 )2 ∀ c > 0 and ∀ θ. Setting a = σ = µ = 1 yields a′ = c2 . Setting σ 2 = 2 = µ yields (1 − 2)2 = (c2 − c2 × 2)2 ∀ c > 0, 11 Source: http://www.doksinet a contradiction.Thus the decision problem is not invariant Example 3. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider estimating σ 2 using the weighted squared 2 2 ) error loss (a−σ . Let G = {gc : c > 0}, where gc (x) = cx ∀ x ∈ Rn Show that an invariant estimator d satisfies σ4 2 d(gc (x)) = d(x)c . Sol. The model is invariant under G with g c (θ) = (cµ, c2 σ 2 ) by discussion in Example 2 Let g̃c (a) = c2 a L(g c (θ), g̃c (a)) = ( a − σ2 2 c2 a − c2 σ 2 2 ) =( ) = L(θ, a). 2 2 c σ σ2 Thus the decision problem is invariant under G. An invariant rule d(x) satisfies d(cx) = d(gc (x)) = g̃c (d(x)) = c2 d(x) Invariance in hypothesis testing. Consider the problem of testing Ho : θ
∈ Θ0 v.s H1 : θ ∈ Θ1 Denote A = {0, 1} corresponding to {rejecting H1 , rejecting H0 }. Let the loss function be L(θ, a) = 1(making mistake) Example 4. Let X1 , , Xn be iid from a cdf G H0 : G(x) = G0 (x − θ), v.s H1 : G(x) = G1 (x − θ), where G0 and G1 are given cdf and θ ∈ (−∞, ∞) Θ0 = {(θ, G0 ) : θ ∈ (−∞, ∞)} and Θ1 = {(θ, G1 ) : θ ∈ (−∞, ∞)}. The parameter can be written as γ = (θ, Gi ) Let G = {gc : gc (x) = x + c}, with g c (θ) = θ + c. Abusing notation, denote gc (x) = x + c) If X ∼ Gi (x − θ), then P (gc (X) ≤ t) = P (X + c ≤ t) = P (X ≤ t − c) = G(t − c) = Gi (t − c − θ) = Gi (t − (θ + c)) The model is invariant under G. Let g̃c (a) = a, then L(γ, a) = L(g c (γ), g̃c (a)) ∀ γ ∈ Θ0 ∪ Θ1 . That is, the decision problem is invariant. The invariant decision rule satisfies that d(gc (X)) = g̃c (d(X)) = d(X). Under the invariance principle, it reduce to testing H0∗ : G = G0 , v.s H1∗ : G = G1
Thus, there exists the UMP invariant test. Homework solution. 2.2 4. Let X1 , , Xn be a random sample from N (θ, σ 2 ), where σ is known Consider estimation of θ using squared error loss. Let π be a N (µ, τ 2 ) prior on θ and δ π the Bayes estimator of θ Compute the risk function of δ = aX + b and δ π , as well as their Bayes risks. Solution Let T = X. θ|(T = t) ∼ N (µ∗ , σ∗2 ) where µ∗ = t σ 2 /n 1 σ 2 /n and σ∗2 = [ σ21/n + + 1 + µ τ2 1 τ2 1 . τ2 ] The Bayes estimator is derived in §2 and is δ π (X) = µ∗ = 12 X σ 2 /n 1 σ 2 /n + + µ τ2 1 τ2 . Source: http://www.doksinet R(θ, δ π ) = E((E(θ|X) − θ)2 )? R(θ, δ π ) = E((E(θ|X) − θ)2 |θ)? R(θ, δ π ) = E(( π R(θ, δ ) = E(( X σ 2 /n 1 σ 2 /n + + = µ τ2 1 τ2 X σ 2 /n 1 σ 2 /n + + µ τ2 1 τ2 − θ)2 )? − θ)2 ) = M SE(δ π ) = V ar(δ π ) + (Bias(δ π ))2 σ 2 /n (σ 2 /n)2 ( σ21/n + τ12 )2 +( θ−µ τ2 1 σ 2 /n + 1 τ2 )2
r(π, δ π ) = E(E((E(θ|X) − θ)2 |θ)) = E(E((E(θ|X) − θ)2 |X)) = E(V ar(θ|X)) = σ∗2 R(θ, δ) = E((aX + b − θ)2 ) = a2 V ar(X) + (aθ + b − θ)2 = a2 σ 2 /n + ((a − 1)θ + b)2 r(π, δ)= a2 σ 2 /n + (a − 1)2 E((θ − c)2 ) (c = b/(1 − a) ) = a σ /n + (a − 1)2 [V ar(θ) + (E(θ) − c)2 ]= a2 σ 2 /n + (a − 1)2 [τ 2 + (µ − c)2 ] 5.2 Homework 1. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider estimation of µ using squared error loss and consider the translation group G = {gc : gc (x) = x + c}. For what values of (a, b) is Ta,b (x) = ax + b an invariant estimator ? 2. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider estimation of σ 2 using the loss 2 2 L((µ, σ 2 ), a) = (1 − a 2 ) . σ2 Show that the estimation problem is invaraint under the group G = {gc : gc (x) = cx, c > 0}. Show that the sample variance S 2 is invariant. 3. Let X1 , , Xn be a random sample from N (µ, σ 2 ) Consider testing H0 : µ ≤ 0 vs
H1: µ > 0 using the loss function
L(θ, a) = 0 if (µ ≤ 0 and a = 0) or (µ > 0 and a = 1); µ/σ if µ > 0 (and a = 0); |µ|/σ if µ ≤ 0 (and a = 1).
Consider the group G = {gc: gc(x) = cx, c > 0}. Show that the testing problem is invariant under G and any test based on the statistic X̄/√(S²/n) is an invariant test.
§6. Nonparametric estimation
Common targets of estimation are: 1. the mean µ, 2. the SD σ, 3. the cdf F(t).
In elementary statistics, we make the assumption that X1, ..., Xn are iid with the cdf Fo(t; θ), where θ ∈ Θ. Then µ = µ(θ) and σ = σ(θ). We either derive the MLEs or derive other types of estimators: θ̂, µ̂ = µ(θ̂), σ̂ = σ(θ̂) and F̂(t) = Fo(t; θ̂). This is called the parametric analysis.
Question: How do we know that the assumption F(t) = Fo(t; θ) is correct?
Answer: We can use the empirical distribution function (edf) to compare with the parametric one:
F̂(t) = (1/n) Σ_{i=1}^n 1(Xi ≤ t).
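(Side note, not in the original notes: in R the edf is available directly as ecdf(), which implements exactly this formula.)
x  <- rnorm(8)            # any sample
Fn <- ecdf(x)             # Fn is a step function with Fn(t) = (1/n) * sum(x <= t)
c(Fn(0), mean(x <= 0))    # the two values coincide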
Example 1. Here is a simulation study for checking the normality assumption. Given a data set, assume that it is normal, then compute the MLE and estimate the cdf by Fo(t; µ̂, σ̂).
par(mfrow=c(2,2))                # 2 x 2 panel of plots
x=rexp(100)                      # the data are actually generated from an exponential
u=mean(x)
s=sd(x)
x=sort(x)
plot(x,ppoints(x),type="S")      # (roughly) the edf of the data
lines(x,pnorm(x,u,s),type="l")   # fitted normal cdf Fo(t; u, s)
y=rnorm(100,u,s)
qqplot(y,x)                      # Q-Q plot against a fitted normal sample
qqnorm(x)
Check for linearity.
y=rexp(100,1/u)
qqplot(x,y)                      # Q-Q plot against a fitted exponential sample
• • • • • ••••••• 0 Quantiles of Standard Normal 1 2 3 y Properties of the edf: 1. F̂ is a cdf and for each t, F̂ (t) can be viewed as Y /n, where Y is a binomial random variable bin(n, p) and p = F (t). 15 4 Source: http://www.doksinet Pn 2. E(F̂ (t)) = E( n1 i=1 1(Xi ≤t) ) = E(1(Xi ≤t) ) = P (Xi ≤ t) = F (t) 3. V ar(F̂ (t)) = npq/n2 = F (t)(1 − F (t))/n 4. cov(F̂ (t), F̂ (s)) = n1 cov(1(X1 ≤t) , 1(X1 ≤s) ) (t)F (s) = n1 (E(1X1 ≤s) 1(X1 ≤t) ) − E(1X1 ≤s) )E(1(X1 ≤t) )) = F (t∧s)−F . n a.s 5. F̂ (t)−F (t) by the SLLN √ D 6. n(F̂ (t) − F (t))−N (0, F (t)(1 − F (t))) by the CLT 7. F̂ (t) is admissible wrt the squared error loss and the weighted squared error loss L(p, a) = (a − p)2 . p(1 − p) 8. F̂ (t) is minimax wrt the weighted squared error loss Remark. Since F̂ is a functional, the functional properties are as follows a.s 9. supt |F̂ (t) − F (t)|−0 (uniform strong consistency)
(Glivenko-Cantelli Theorem) √ D 10. n(F̂ − F )−W where W is a Gaussian process with the covariance specified in Part 4 The decision problem of nonparametric estimation of F Suppose that X1 , ., Xn are iid with a continuous cdf F The estimation of F is a decision problem The data is X = (X1 , ., Xn ) from the sample space X = Rn The cdf F is treated as a “parameter” and the “parameter space is Θ, the collection of all continuous cdf. The action space A is the A = F, the collection of all right continuous non-decreasing function bounded by 0 and 1. A decision rule is a map from X to A, say d(x, t). The loss function is Z L(F, a) = (a(t) − F (t))2 h(F (t))dF (t). Let Y be the order statistic of X, which is a sufficient statistic of X. Y0 = −∞ and Yn+1 = ∞Let G be the collection of all continuous and strictly increasing transformations from R onto R. g(Y) = (g(Y1 ), , g(Yn )) If Yj is strictly increasing in j so is g(Yj ). Claim: The model {FX : F ∈ Θ} is
invariant under G. Notice that FX (x) = (F (x1 ), ., F (xn )), F ∈ Θ Fg(X) (y) = (P (g(X1 ) ≤ y1 ), ., P (g(Xn ) ≤ yn )) = (P (X1 ≤ g −1 (y1 )), , P (Xn ≤ g −1 (yn ))) = F ◦ g −1 (y1 , , yn ) g(F ) = F ◦ g −1 . Thus the model is invariant under G. Claim: The decision problem is invariant under G. L(F, a) = = Z Z (F (t) − a(t))2 h(F (t))dF (t) (F ◦ g −1 (t) − a ◦ g −1 (t))2 h(F ◦ g −1 (t))dF ◦ g −1 (t) = L(F ◦ g −1 , a ◦ g −1 ) = L((g(F ), g̃(a)), where g̃(a) = a ◦ g −1 . Question: The form of invariant rules ? Notice that d(g(Y), t) = g̃(d(Y, t)) (or d(g(Y)) = g̃(d(Y)) where d(Y, t) = d(Y)(t)) ∀ g ∈ G and ∀ t. ⇒ d(g(Y)) = d(Y) ◦ g −1 or d(g(Y), t) = d(Y, g −1 (t)) ∀ g ∈ G and ∀ t. For each j, let g ∈ G be such that (a) g(Y) = Y; (b) Yj < x1 < x2 < x3 < Yj+1 ; (c) g(x1 ) = x2 and g(x2 ) = x3 . d(g(Y), x3 ) − d(g(Y), x2 ) = d(Y, x2 ) − d(Y, x1 ) It follows that (1) d(Y, x3 ) − d(Y, x2 )
= d(Y, x2 ) − d(Y, x1 ) by (a) and EQ. (1) 16 (1) Source: http://www.doksinet (2) d(Y, t) is constant for t ∈ (x2 , Yj+1 ), as x3 is arbitrary between x2 and Yj+1 ; (3) Moreover, d(Y, t) = d(Y, Yj ) if t ∈ (Yj , Yj+1 ), in view of (1), as limx2 Yj g(x2 ) = g(Yj ). Answer: Since j is also arbitrary, an invariant rule d satisfies d(Y, t) = n X aj 1(t∈[Yj ,Yj+1 )) j=0 If d is an invariant rule, then R(F, d) = E(L(F, d(Y, ·)) Z = E( (F (t) − d(Y, t))2 h(F (t))dF (t)) Z n X = E( (F (t) − aj )2 1[t∈(Yj ,Yj+1 )) h(F (t))dF (t)) j=0 = n X j=0 Yj+1 Z E( Yj (F (t) − aj )2 h(F (t))dF (t)) By setting U = F (X), WLOG, we can assume that X ∼ U (0, 1). n X R(F, d) = j=0 n X = j=0 n X = j=0 Z E( Yj+1 Z E( Uj+1 Z E( 1 Yj Uj 0 (F (t) − aj )2 h(F (t))dF (t)) (t − aj )2 h(t)dt) 1(Uj ≤t<Uj+1 ) (t − aj )2 h(t)dt) n Z X = µ ¶ n j t (1 − t)n−j (t − aj )2 h(t)dt j 0≤t≤1 j=0 That is, the invariant rule has constant risk. Thus,
there is a best invariant rule, namely the one that minimizes the constant risk R(F, d). In fact, if R(F, d) is finite, taking the derivative with respect to aj yields
−2 ∫_0^1 C(n, j) t^j (1 − t)^(n−j) (t − aj) h(t) dt = 0.
The solution is
aj = ∫_0^1 h(t) t^(j+1) (1 − t)^(n−j) dt / ∫_0^1 t^j (1 − t)^(n−j) h(t) dt.
Taking the second derivative of R(F, d) with respect to aj yields
2 ∫_0^1 C(n, j) t^j (1 − t)^(n−j) h(t) dt > 0.
If h(t) = 1, then the risk is always finite and
aj = B(j + 2, n − j + 1)/B(j + 1, n − j + 1) = [Γ(j + 2)Γ(n − j + 1)/Γ(n + 3)] [Γ(n + 2)/(Γ(j + 1)Γ(n − j + 1))] = (j + 1)/(n + 2),
and
d(t) = Σ_{j=0}^n [(j + 1)/(n + 2)] 1(Yj ≤ t < Yj+1) = [1 + Σ_{j=1}^n 1(Xj ≤ t)]/(n + 2).
If h(t) = 1/(t(1 − t)), then
aj = ∫_0^1 t^j (1 − t)^(n−j−1) dt / ∫_0^1 t^(j−1) (1 − t)^(n−j−1) dt = B(j + 1, n − j)/B(j, n − j) = j/n.
Thus the best invariant estimator is
d(t) = Σ_{j=0}^n (j/n) 1(Yj ≤ t < Yj+1) = (1/n) Σ_{i=1}^n 1(Xi ≤ t) = F̂(t).
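As a numerical sanity check (my own, not in the notes), for the case h ≡ 1 with, say, n = 5, one can minimize the j-th term of the constant risk directly and compare with the closed form (j + 1)/(n + 2):
n <- 5
for (j in 0:n) {
  term <- function(a) integrate(function(t)
            choose(n, j) * t^j * (1 - t)^(n - j) * (t - a)^2, 0, 1)$value
  cat(j, optimize(term, c(0, 1))$minimum, (j + 1) / (n + 2), "\n")
}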
11. F̂ is inadmissible w.r.t. the loss function
L(F, a) = ∫ (F(t) − a(t))² dF(t)
and the parameter space being the collection of all continuous cdfs (Aggarwal (1955)).
Proof. Since F̂ has a constant risk and is not the best invariant rule with h(t) = 1, F̂ is inadmissible.
12. F̂ is admissible w.r.t. the loss function
L(F, a) = ∫ (F(t) − a(t))² dF(t)
and the parameter space being the collection of all cdfs (Brown (1985)).
Remark. It is difficult to compute R(F, d) for an arbitrary estimator d with an arbitrary n. In order to get an idea of how to proceed, consider n = 1. An estimator of F is of the form d(X, t) such that d(x, t) ↑ in t for each x; here F̂(t) = 1(t ≥ X). Let d be an estimator that is as good as F̂. That is,
R(F, d) ≤ R(F, F̂) ∀ F.   (1)
We shall show that R(F, d) = R(F, F̂) ∀ F.
Claim: d(x, t) = 1 if t = x. Otherwise, say d(x, t) < 1 for some t = x = θ. Since F is arbitrary, it can be discrete, say F(t) = 1(t ≥ θ).
L(F, F̂) = ∫ (F̂(t) − F(t))² dF
(t) = (1(t≥X) − 1(t≥θ) )2 2 R(F, F̂ ) = E((1(θ≥X) − 1(θ≥θ) ) ) = Z Z (1(t≥x) − 1(t≥θ) )2 dF (t)dF (x) = (1(θ≥θ) − 1(θ≥θ) )2 = 0 ∀ θ. L(F, d(x, ·)) = (d(x, t) − F (t))2 dF (t) = (d(x, θ) − 1)2 , RR R(F, d) = (d(x, t) − F (t))2 dF (t)dF (x) = (d(θ, θ) − 1)2 > 0 . (2) R Thus it contradicts (1) and (2). It implies that d(t, t) = 1 for all t and thus d(x, t) = 1 for t ≥ x, as d(x, t) ↑ in t and d(x, t) ∈ [0, 1]. Claim: d(x, t) = 0 if t < x. Otherwise, say d(y, x) = a > 0 for some x < y. Again, (1) holds for F (t) = p1(t≥x) + q1(t≥y) , where x < y, p + q = 1 and p, q ≥ 0. Z R(F, F̂ ) = E( (1(t≥X) − F (t))2 dF (t)) = p(p(1 − p)2 + q(1 − 1)2 ) + q(p(0 − p)2 + q(1 − 1)2 ) 18 Source: http://www.doksinet Z R(F, d) = E( (d(X, t) − F (t))2 dF (t)) = p(p(1 − p)2 + q(1 − 1)2 ) + q(p(a − p)2 + q(1 − 1)2 ) If p < a/2, then R(F, F̂ ) < R(F, d), a contradiction to Eq. (1) Thus d(x, t) = 0
for t < x The proof under n = 1 indicates that it suffices to show that F̂ is admissible within the collection of all discrete cdf. 14. Under the loss function Z (F (t) − a(t))2 dF (t) L(F, a) = F (t)(1 − F (t)) and the parameter space being the collection of all continuous cdfs, the edf F̂ is the best invariant estimator. Whether F̂ is admissible w.rt the loss function and the parameter space being the collection of all continuous cdfs was an open question between 1950’s and 1980’s. Yu (1989) shows that it is admissible if n = 1 or 2, and is inadmissible if n ≥ 3. Proof of admissibility when n = 1. By making a transformation y = (arctan(x) + π/2)/π, WLOG, we can assume that Θ is the collection of all continuous cdf with support in (0, 1). We shall assume that F̂ is inadmissible and show that it leads to contradiction. Then there exists a d such that ½ ≤ R(F, F̂ ) ∀ F ∈ Θ R(F, d) < R(F, F̂ ) for some F = Fo ∈ Θ. Since d(x, t) ↑ in t for each x,
d(x, x−) and d(x, x+) exist at each x. Of course, d(x, t) 6= F̂ (x, t) and m(Qd ) < 1, where Qd = {x : lim d(x, t) = 0 and lim d(x, t) = 1}. t↑x t↓x It follows that ∃ a subset A of (0, 1) and a number c > 0 such that m(A) > 0 and d(x, x−) > 2c or d(x, x+) < 1 − 2c ∀ x ∈ A. WLOG, assume d(x, x−) > 2c ∀ x ∈ A. Since d is bounded and A is a bounded set, by taking a subset of A, WLOG, we can assume that d(x, t) converges to d(x, x−) uniformly for x ∈ A and d(x, t) converges to d(x, x+) uniformly for x ∈ A. Then there exists a sequence of subsets Aj of A and a constant a0 > c and a1 such that (1) m(Aj ) > 0 and (2) |d(x, x−) − a0 | < 1/j and |d(x, x+) − a1 | < 1/j if x ∈ Aj . Moreover due to the uniform convergence assumption on A, Aj satisfies that |d(x, t) − a0 | < uj if t < x and t, x ∈ Aj ; |d(x, t) − a1 | < uj if t > x and t, x ∈ Aj , where uj 0. Let da = a0 1(t<x) + a1 1(t≥x) . Let Fj be a
cdf that gives mass 1 uniformly to Aj Then |R(Fj , d) − R(Fj , da )| = ǫj 0 as j ∞. Since R(F, d1 ) is constant in F ∈ Θ if d1 is invariant, R(F, F̂ ) < R(F, d1 ) for each invariant estimator d1 6= F̂ . Since da is also an invariant estimator that is different from F̂ , R(F, da ) − R(F, F̂ ) = 2c0 > 0 ∀ F ∈ Θ. If j is large enough, then ǫj < c0 and R(Fj , d) ≤ R(Fj , F̂ ) (due to inadmissibility assumption) = R(Fj , da ) − 2c0 ≤ R(Fj , d) − 2c0 + ǫj ≤ R(Fj , d) − c0 19 (due to best invariancy) Source: http://www.doksinet which leads to a contradiction. The contradiction implies that F̂ is admissible Solution to # 3 of Homework 5.2 1. If Xi ’s are iid ∼ N (µ, σ 2 ), then gc (X1 ), ., gc (Xn ) are iid ∼ N (cµ, c2 σ 2 ) The model is invariant under G. 2. Denote g c (µ, σ) = c(µ, σ), g̃c (a) = a and θ = (µ, σ), then if µ ≤ 0 and a = 0, or µ > 0 and a = 1, 0 if µ > 0 L(θ, a) = µ/σ |µ|/σ if µ ≤
0. if cµ ≤ 0 and a = 0, or cµ > 0 and a = 1, 0 if cµ > 0 = cµ/(cσ) |cµ|/(cσ) if cµ ≤ 0. =L(g c (θ), g̃c (a)) Thus the decison problem is invariant under G. 3. Let d(X) = φ(h( √ X2 )) be a test function based on √ X2 S /n S /n . Since cX X d(gc (X)) = φ(h( p )) = φ(h( p )) = d(X) = g̃c (d(X)) 2 (cS) /n S 2 /n d is an invarinate rule. 6.2 Homework 1. Mimic the proof in §12 and give a proof of the following statement for n = 2: F̂ is admissible w.rt the loss function Z L(F, a) = (F (t) − a(t))2 dF (t) and the parameter space being the collection of all cdfs. Proof of inadmissibility when n = 3. Recall that Y1 < Y2 < Y3 are order statistics of X1 , X2 and X3 Consider estimator of the form n+1 X I ) d2 = aIj 1(YjI ≤t<Yj+1 j=1 where I = max{j ≥ 0 : Yj ≤ 0}, Y0 = −∞ and Yn+1 = ∞, if 0 ≤ j ≤ I, Yj I 0 if j = I + 1, Yj = Yj−1 if I + 1 < j ≤ n + 1 Abusing notation, write d2 = (aij )4×5 . Then 0 0 0
1/3 F̂ = 0 1/3 0 1/3 1/3 2/3 1 1/3 2/3 1 2/3 2/3 1 2/3 3/3 1 Consider a simpler form: 0 0 dQ = 0 0 In particular, let 0 0 dQ = 0 0 0 − c1 − c2 1/3 1 3 1 3 2 3 2 3 0 1 1 3 − 6 1 1 3 − 12 1 3 1 3 1 3 2/3 + c3 + c4 3/3 1 1 1 1 1/3 2/3 1 3 2 3 2 3 2 1 3 + 12 2 1 3 + 6 1 1 1 1 2 3 2 3 3/3
Let p = F(0) and D(p) = R(F, dQ) − R(F, F̂). Then
D(p) = −(1/12)(p − p²) + (1/24)[−p² log p − (1 − p)² log(1 − p)].   (8.4)
[Figure: plot of D(p) against p on (0, 1).]
Note that
(1) D(p) = D(1 − p);
(2) D(p) < 0 near 0 and 1;
(3) D(1/2) = (−1 + log 2)/48 < 0;
(4) (d³/dp³) D(p) = (2p − 1)/[12p(1 − p)] = 0 only at p = 1/2;
(5) D′′(p) has at most two zero points due to (4);
(6) D′(p) has at most three zero points due to (5);
(7) D(p) has at most four zero points due to (6);
(8) D(p) has
at most two zero points in (0,1/2) due to (1), (2), (3) and (7). Since D′ (0+) < 0 and D′ (0.5−) < 0, D(p) cannot has only one zero point in (0,1/2) If D(p) has two zero points, then D′ (p) has at least two zero points in (0,1/2), and has at least 4 zero points in (0,1) due to (1), contradicting (1) and (6). Thus ½ < 0 if F (0) = 1/2, R(F, dQ ) − R(F, F̂ ) ≤ 0 otherwise. Proof of (8.4) Z R(F, d2 ) = E( (d2 (t) − F (t))2 h(F (t))dF (t)) Z 1 = E( (d2 (t) − t)2 h(t)dt) 0 Z 1 n+1 X 2 I ) − t) h(t)dt) = E( ( aIj 1(YjI ≤t<Yj+1 0 Z = E( 0 Z = E( 0 = n X i=0 (Xj ’s are now from U(0,1)) j=0 1 n+1 X j=0 n 1X i=0 Z E( 0 1 2 I ) (aIj − t) h(t)dt) 1(YjI ≤t<Yj+1 1(I=i) n+1 X j=0 2 I ) (aIj − t) h(t)dt) 1(YjI ≤t<Yj+1 i−1 X 1(Yi ≤p<Yi+1 ) ( 1(Yj ≤t<Yj+1 ) (aij − t)2 j=0 21 (Y0 = 0 and Yn+1 = 1) Source: http://www.doksinet + 1(Yi ≤t<p) (aii − t)2 + 1(p≤t<Yi+1 ) (ai,i+1 − t)2 + n+1 X j>i+1 = +
1(Yj−1 ≤t<Yj ) )(aij − t)2 )h(t)dt) µ ¶µ ¶ n X i Z p X n n−j j { (t − aij )2 h(t) t (p − t)i−j (1 − p)n−i dt j i − j i=0 j=0 0 n+1 XZ 1 j>i p µ ¶µ ¶ n j−1 (t − aij )2 h(t) (1 − t)n−j+1 (t − p)j−1−i pi dt} j−1 i 15. F̂ is minimax wrt the loss function L(F, a) = Z (F (t) − a(t))2 dF (t) F (t)(1 − F (t)) and the parameter space being the collection of all continuous cdfs was an longstanding conjecture between 1950’s and 1980’s. Yu and Chow (1991) shows that it is indeed minimax Consider n = 1. Let d be an arbitrary estimator Since d is bounded, by taking a subset A of the real line, WLOG, we can assume that (1) d(x, t) converges to d(x, x−) uniformly for x ∈ A and (2) d(x, t) converges to d(x, x+) uniformly for x ∈ A. In fact, letting gn (x) = d(x, x − 1/n), then limn∞ gn (x) = d(x, x−). Let Bk,n = {x ∈ (0, 1) : |gn (x) − d(x, x−)| > 1/k}. Then m(Bk,n ) 0 as n ∞. Thus ∃ nk such that m(Bk,nk )
< 4−k . ⇒ m(∪k≥1 Bk,nk ) ≤ 1/2 and m(B) > 0, where B = ∩k≥1 ((0, 1) Bk,nk ). That is, gn converges uniformly on B or (1) holds. Likewise, we can show that (2) holds. Then ∃ a sequence of subsets Aj of A, a constant a0 and a1 such that (a) m(Aj ) > 0 and (b) |d(x, x−) − a0 | < 1/j and |d(x, x+) − a1 | < 1/j if x ∈ Aj . Moreover due to the uniform convergence assumption on A, Aj satisfies ½ |d(x, t) − a0 | < uj if t < x and t, x ∈ Aj ; (15.1) |d(x, t) − a1 | < uj if t > x and t, x ∈ Aj , where uj 0. Let da = a0 1(t<x) + a1 1(t≥x) .Let Fj be a cdf that gives mass 1 uniformly to Aj Z 1 1 X R(F, da ) = E( (t − aj )2 1(t∈[Yj ,Yj+1 )) h(t)dt) 0 j=0 R(F, d) = 1 X j=0 Z 1 E( (t − d(Y1 , t))2 1(t∈[Yj ,Yj+1 )) h(t)dt) 0 If R(F, d) is finite for all F , then |R(Fj , d) − R(Fj , da )| = ǫj 0 as j ∞. (15.2) Note that supF ∈Θ R(F, F̂ ) = R(F, F̂ ) for each F ∈ Θ. sup R(F, d) ≥ R(Fj , d) ≥ R(Fj , da )
− ǫj ≥ R(Fj , F̂ ) − ǫj = sup R(F, F̂ ) − ǫj F ∈Θ F ∈Θ Taking limits yields sup R(F, d) ≥ sup R(F, F̂ ). F ∈Θ If R(F, d) = ∞ for some F ∈ Θ, then F ∈Θ sup R(F, d) = ∞ ≥ sup R(F, F̂ ). F ∈Θ F ∈Θ Since d is arbitrary, it follows that F̂ is minimax. 22 Source: http://www.doksinet Remark 1. In fact, for n = 1, if a0 > 0, then F (F, da ) = ∞ as R(F, da ) = Z 1 0 (t − a0 )2 t−1 dt + Z 0 1 (t − a1 )2 (1 − t)−1 dt. The way stated in the proof of n=1 is for the generaliztion to n ≥ 2. Yu and Chow (1991) show that (15.1) can be generalized to the following theorem Theorem 1. (Yu and Chow (1991)) Suppose that the sample size n ≥ 1 and d = d(Y, t) is a nonrandomized estimator with finite risk and is a (measurable) function of the order statistic Y . For any ǫ, δ > 0, there exist a uniform distribution function F on a positive-Lebesgue-measure subset J, and an invariant estimator d1 such that (dF )n+1 ({(Y1 ,
., Yn , t) : |d(Y, t) − d1 (Y, t)| ≥ ǫ}) ≤ δ The proof for n ≥ 2 then is similar to those after (15.2) In particular, if {(Y, t) : d > 0 for t < Y1 , or d < 1 for t > Yn } has positive measure w.rt some (dF )n+1 , then R(F, d) = ∞ 16. F̂ is admissible wrt the loss function Z L(F, a) = (F (t) − a(t))2 dW (t) where W is a finite measure, and the parameter space being the collection of all cdfs (Cohen and Kuo (1985)). 17. F̂ is not minimax wrt the loss function Z L(F, a) = (F (t) − a(t))2 dW (t) but is minimax w.rt the loss function L(F, a) = Z (F (t) − a(t))2 dW (t) F (t)(1 − F (t)) where W is a finite measure, and the parameter space being the collection of all cdfs (Phadia (1973)). §7. Invariance in Hypothesis testing §7. Invariance in Hypothesis testing 1. Introduction One of the basic problems of statistics is the two-sample problem of testing the equality of two distributions. A typical example is the comparison of a treatment with a control,
where the hypothesis of no treatment effect is tested against the alternative of a beneficial effect. When normality of samples distributions is in doubt, people may make no specific assumption on distribution functions other than continuity in the two-sample testing problem. Let X1 , ., Xm be a random sample from a continuous cdf F and Y1 , , Yn be another random sample from a continuous cdf G. Assume Xi s and Yj s are independent One wishes to test H0 : F = G (F (x) = G(x) for all x) against the alternative H1 : F ≤ G (F (x) ≤ G(x) for all x) yet F (x) < G(x) for some x. Another alternative is H2 : F ≥ G (F (x) ≥ G(x) for all x) yet F (x) > G(x) for some x. The testing problem is invariant under the group of all continuous and strictly monotone transformations. For this problem, the terms “invariant test” and “rank test” are synonymous It is known that there is no uniformly most powerful rank test of H0 against H1 . The Wilcoxon rank sum test and the
Fisher-Yates test are two most commonly applied procedures for this two-sample problem (see Ferguson (1967)). The admissibility of these two tests in the class of all tests has been a difficult and unsolved problem (Lehmann (1959, p. 240, and 1986, p. 322), and Ferguson (1967)) One two-sided alternative hypothesis is H3 : either H1 or H2 is true. The testing problem is still invariant under the group of continuous and strictly monotone transformations. However, “The theory here is still less satisfactory than in the one-sided case” (Lehmann (1959, p. 240)) Another two-sided alternative is H4 : F (x) 6= G(x) for some x. Note that the only invariant test in this testing problem is a constant, i.e, ψα (·) ≡ α However, ψα is inadmissible at every significant level α (∈ (0, 1)) (Lehmann (1986)). In this note, we establish a sufficient condition that a member from a class of linear rank tests, including the Wilcoxon test, the Fisher-Yates test, the Savage test and the median
test, is admissible for testing H0 against Hi under the continuous set-up within the class of all tests or within the class of all rank tests, where i = 1, 2, 3, 4. In particular, we apply the result to show that for some special cases, the Wilcoxon test and the Fisher-Yates test are admissible. The results partially answer the two longstanding open questions in Lehmann (1959) for the first time. 2. Notations Let Θ be a class of distribution functions under consideration Denote by F the class of all continuous distribution functions. Let X = (X1 , , Xm )′ and Y = (Y1 , , Yn )′ be two independent random samples from two 23 Source: http://www.doksinet populations with distribution functions F and G ∈ Θ, respectively, where X′ is the transpose of the vector X. Let X(1) < · · · < X(m) and Y(1) < · · · < Y(n) be order statistics of X and Y, respectively, and let N = m + n.Denote the pooled sample Z′ = (Z1 , ., ZN ) = (X′ , Y′ ), denote R(Zj ) the rank of
Zj in the pooled sample, and denote Ri = R(X(i) ), Rm+j = R(Y(j) ) and R = (R1 , ., RN ) It is obvious that the joint distribution function of Z is Qm Qn+m FZ (z) = i=1 F (zi ) j=m+1 G(zj ). Recall that a test ψ satisfies ψ(·) ∈ [0, 1]. Let EF,G (ψ(Z)) (or simply EF,G ψ) denote the expectation of ψ(Z) with respect to F and G. For testing against Hi , where i = 1, , 4, a test ψ is said to be admissible if there is no test φ0 such that EF,F φ0 ≤ EF,F ψ and EF,G φ0 ≥ EF,G ψ for each pair of F and G where F and G satisfy either H0 or Hi and F, G ∈ Θ, and at least one strict inequality holds. A test φ0 is said to be as good as φ, if EF,F φ0 ≤ EF,F φ and EF,G φ0 ≥ EF,G φ for each pair of F and G where F and G satisfy either H0 or Hi and F, G ∈ Θ. Given a subset A of the real line, denote AN the product set A × · · · × A of N factors. The Wilcoxon test is of the following form: φ(Z) = 1[L(Z)∈[l,r]] + γ(Z)1[L(Z)=l / or r] , where (2.1) l < r,
1[·] is an indicator function, γ(·) ∈ [0, 1) on {z : L(z) = l or r}, L(Z) = m X i=1 c1 S(Ri ) + n X j=1 c2 S(Rm+j ), c1 and c2 are distinct constants, c1 < c2 , and S(·) is a real-valued strictly increasing function. A common treatment is to set γ(·) ≡ c, a constant. By properly defining γ(·), we can obtain a test with a desirable size α When S(r) = r for all r and (c1 , c2 ) = (0, 1), φ is the Wilcoxon test. That is L(Z) = n X Rm+j j=1 When S(r) is the expected value of the rth order statistic of a sample of size N from a standard normal distribution and (c1 , c2 ) = (0, 1), φ is the Fisher-Yates test. The test given in (21) actually includes both one-sided and two-sided tests by allowing l = −∞ or r = ∞. For example, if l = −∞ and γ = 0, (21) becomes φ(Z) = 1[L(Z)>r] 3. A Continuous Parametric Two-sample Problem In order to attack the more difficult classical problem, we first formulate an appropriate problem for testing two continuous
distribution functions from the regular exponential family. Let ξ be a N × 1 vector with coordinates ξ1 < · · · < ξN . Let Fq and Gp be two continuous distribution functions with density functions fq = fq,ξ = gp = gp,ξ N kX qi 1[x∈(ξi −1/k,ξi +1/k)] , 2 i=1 N kX = pi 1[x∈(ξi −1/k,ξi +1/k)] , 2 i=1 where k > 2/ min{|ξi − ξj |, i 6= j} (Why k > ?) (disjoint uniform on (ξi − 1/k, ξi + 1/k)). and q = (q1 , , qN ) and p = (p1 , ., pN ) are probability vectors Let Fk (ξ) be the collection of distribution functions with the above forms and with fixed k and ξ.Note that Fk (ξ) belongs to the regular exponential family.Let t, s and u be (N − 1) × 1 vectors (multinomial random vector), with coordinates m n X X ti = ti (Z) = 1[Yj ∈(ξi −1/k,ξi +1/k)] , si = si (Z) = 1[Xj ∈(ξi −1/k,ξi +1/k)] j=1 j=1 and ui = ti + si , respectively. The ti ’s and sj ’s are jointly complete and sufficient for Fq and Gp (or (q, p)) and the joint
probability density function f of Z is N N k Y ti Y sj pi qj . f = 1[z∈Ω] ( )N 2 i=1 j=1 24 Source: http://www.doksinet n−t1 −···−tN −1 m−s1 −···−sN −1 qN f ∝ pN N −1 Y ptii i=1 N −1 Y s qj j j=1 N where Ω = (∪N i=1 (ξi − 1/k, ξi + 1/k)) . ie, f (z) ∝ 1[z∈Ω] et(z)·w+u(z)·θ µ ¶µ ¶ N N fu,t ∝ 1[z∈Ω] et(z)·w+u(z)·θ = h(u, t)et(z)·w+u(z)·θ , t u−t ¡ ¢ ± QN PN −1 PN −1 where t · s is the inner product of the vectors t and s, Nt = N ! i=1 ti !, tN = n − i=1 ti , sN = m − i=1 si , ¡N ¢¡ N ¢ h(u, t) = t u−t 1[z∈Ω] , θ and w are all N − 1 dimensional vectors with coordinates θj = log(qj /qN ) and wj = log((pj /qj )/(pN /qN )), respectively. It is ready to see that H0 is the same as w = 0 (ie, wi = 0 ∀ i), H4 is corresponding to w 6= 0, and w ≥ 0 (i.e wj ≥ 0 ∀ j) implies (but is not equivalent to) H1 H1 <=> w1 ≥ 0 w1 + w2 ≥ 0 PN −1· · · i=1
wi ≥ 0 Given a test ψ which is a function of (u, t), let Cu,ψ = {t : ψ(u, t) < 1}. Let µFq ,Gp (u, ·) denote the measure induced by the conditional distribution function of t given u. We say Cu,φ1 is convex asµFp ,Gq (u, ·), in the sense that if ti ∈ Cu,φ1 , where µFp ,Gq (u, ti ) > 0, i = 1, 2, and if t3 = at1 + (1 − a)t2 , where a ∈ (0, 1) and µFp ,Gq (u, t3 ) > 0, then t3 ∈ Cu,φ1 . It can be shown that φ given in (21) is not a function of the complete and sufficient statistic (u, t) Thus we define a new test φ1 = EFq ,Gp (φ(Z)|u, t). (3.1) Verify that φ1 is a function of (u, t) and φ1 6= φ, but EFq ,Gp φ1 = EFq ,Gp φ ∀ Fq , Gp ∈ Fk (ξ). (3.2) By Fk (ξ), we convert the continuous nonparametric two-sample problems to a continuous parametric two-sample problem. By φ1 , we convert the continuous parametric two-sample problems to a discrete parametric two-sample problem, as u and t are discrete random variables. In fact, while the density
function of Z of Fk (ξ) is N N k Y ti Y sj pi qj , fc = 1[z∈Ω] ( )N 2 i=1 j=1 N where Ω = (∪N i=1 (ξi − 1/k, ξi + 1/k)) . The fc corresponds to the density of a discrete random variable, taking values N on Ω1 = {ξ1 , ., ξN } and N N Y Y s ti fd = 1[z∈Ω1 ] pi qj j , i=1 j=1 where u and t are defined as the same as in fc . Moreover, (u, t) are sufficient and complete statistics in both fc and fd , with density function fu,t = h(u, t)et(z)·w+u(z)·θ , Let Fck and Fd be the cdf’s of the two random variables, then it can be shown that Fck (t) Fd (t) pointwisely. One may have problem to show that EFq ,Gp (φ) EFd (φ) However, (3.1) is a good procedure to convert the continuous set-up to the discreet set-up. Xi m Fq si soi θi Notations: Yi n Gp ti toi wi + Zh N f ui 1 25 Source: http://www.doksinet Lemma 3.1 When F is replaced by Fk (ξ), for i = 1, 2, 3, 4, the test φ1 = φ1 (u, t) for testing w = 0 against Hi is admissible, if for all u, (1)
Cu,φ1 is convex a.sµFq ,Gp (u, ·), (2) φ1 (u, t0 ) > 0 implies that ∃ a vector b such that b · (t − t0 ) < 0 ∀ t ∈Cu,φ1 and t 6= t0 , where b ≥ 0 for i = 1, b ≤ 0 for i = 2, b ≥ 0 or b ≤ 0 for i = 3, and b is arbitrary for i = 4. The proof of the lemma is a minor modification of Theorem 4.1 in Yu (2000) For the convenience of readers, we include the proof in the appendix. The following lemma points out that one only needs to verify condition (2) of Lemma 3.1 for φ1 Lemma 3.2 Consider the problem of testing H0 against Hi with F is replaced by Fk (ξ), i = 1 2, 3, 4 Let φ and φ1 be given by (2.1) and (31), respectively Then condition (1) of Lemma 31 holds for φ1 Proof. We only give the proof for the test against H1 as the proofs for the rest of the cases are similar We shall give the proof in 4 steps. Step 1 (Notations). The coordinates of the random vector Z, Zi ’s, are distinct as Let Z(1) ≤ · · · ≤ Z(N ) be order statistics of Z1 , ., ZN ,
and let Pn Pm toi = j=1 1[Yj =Z(i) ] and soi = j=1 1[Xj =Z(i) ] , i = 1, ., N (3.3) Then toi + soi = 1 for all i a.s Let to = (to1 , , toN )′ Verify that t and to satisfy tj = σj X i>σj−1 toi , j = 1, ., N − 1, where σj = P k≤j uk , u0 = 0 and σ0 = 0. (3.4) Step 2 (Linearity of t as a function in to ). Given u, let T u be the set of all the possible values of t, and for each t ∈ T u , let Tt = {to : to satisfies (3.4)} Eq (34) can be viewed as a linear map from ∪t∈T u Tt to T u , say t = Au to for each to ∈ Tt and for each t ∈ T u . (3.5) The entries of the (N − 1) × N matrix Au can easily be identified by (3.4) Verify that Au does not depend on t for each fixed u. PN PN Step 3 (Linearity of L in to ). Verify that i=1 toi = n, j=1 soj = m and for each Z whose coordinates are all distinct, L(Z) = m X i=1 = N X h=1 c1 S(Ri ) N X 1[X(i) =Z(h) ] + j=1 h=1 S(h)(c2 − c1 )toh + c1 =a · to + c, n X N X h=1 c2 S(Rm+j ) N X 1[Y(j) =Z(h)
] h=1 S(h) (3.6) where a = (a1 , ., aN ) and c are defined in an obvious way Verify that a and c are not functions of u and (36) holds w.p1 as Z has distinct coordinates wp1 Furthermore, a1 < · · · < aN as S is strictly increasing and c1 < c2 Note that φ is a function of to a.s, thus abusing notations, we write φ = φ(Z) = φ(u, t) = φ(to ). Step 4 (Conclusion). Fix a Z with distinct coordinates and fix (u, t) = (u(Z), t(Z)) Let |Tt | be the cardinality of the set Tt . Then (31) yields X 1 φ(to ). (3.7) φ1 (u, t) = |Tt | o t ∈Tt In view of (3.6), we can write φ = 1[a·to +c∈[l,r]] + γ(to )1[a·to +c=l or / r] . (3.8) We say that {to : φ(to ) < 1} is convex in to , if t3 = at1 + (1 − a)t2 , where a ∈ (0, 1), ti ∈ {to : φ(to ) < 1}, i = 1 and 2, and t3 ∈ {to : φ(to ) ≤ 1}, then t3 ∈ {to : φ(to ) < 1}. It is trivially true that {to : φ(to ) < 1} is convex in to , as 26 Source: http://www.doksinet there do not exist t1 6= t2
Step 4 (Conclusion). Fix a Z with distinct coordinates and fix (u, t) = (u(Z), t(Z)). Let |T_t| be the cardinality of the set T_t. Then (3.1) yields

    φ1(u, t) = (1/|T_t|) Σ_{t^o ∈ T_t} φ(t^o).      (3.7)

In view of (3.6), we can write

    φ = 1[a·t^o + c ∉ [l, r]] + γ(t^o) 1[a·t^o + c = l or r].      (3.8)

We say that {t^o : φ(t^o) < 1} is convex in t^o if, whenever t_3 = a t_1 + (1 − a) t_2, where a ∈ (0, 1), t_i ∈ {t^o : φ(t^o) < 1} for i = 1 and 2, and t_3 ∈ {t^o : φ(t^o) ≤ 1}, then t_3 ∈ {t^o : φ(t^o) < 1}. It is trivially true that {t^o : φ(t^o) < 1} is convex in t^o, as there do not exist t_1 ≠ t_2 such that t_3 = a t_1 + (1 − a) t_2 with a ∈ (0, 1) and t_1, t_2, t_3 ∈ {t^o : φ(t^o) ≤ 1}; the last statement is due to the fact that the coordinates of the t_i are either 0 or 1. In view of (3.7),

    C_{u,φ1} = {t ∈ T^u : t = A_u t^o_0, t^o_0 ∈ {t^o : φ(t^o) < 1}},

and thus C_{u,φ1} is also convex in t.

4. The main result

We present a sufficient condition for determining whether a linear rank test is admissible in the classical two-sample problem.

Theorem 4.1. Consider the problem of testing H0 against Hi (i = 1, 2, 3 or 4). Let φ be a test of form (2.1). Then φ is admissible within the class of all tests when F, G ∈ F if, for every k and ξ, condition (2) of Lemma 3.1 holds for φ1 defined in (3.1) when F, G ∈ Fk(ξ).

Proof. Let i be an arbitrary integer among 1, 2, 3 and 4. Suppose φ1 satisfies condition (2) of Lemma 3.1. Then, by Lemmas 3.1 and 3.2, φ1 is admissible for testing H0 against Hi when F, G ∈ Fk(ξ).
Given a measure ν, by ν^N we mean the product measure. We say that a measurable function d(z) (z = (z_1, ..., z_N)) is approximately continuous at a point z_0 with respect to a measure ν if, for all ε, δ > 0, there exists a neighborhood O(z_0, r) of z_0 with radius r such that

    ν^N({z ∈ O(z_0, r) : |d(z) − d(z_0)| > ε}) / ν^N(O(z_0, r)) ≤ δ.

Denote by µ_F the measure induced by a distribution function F. To prove the theorem, it suffices to show the following statement: given a measure ν induced by a continuous distribution function, if E_{F,F}φ_0 ≤ E_{F,F}φ and E_{F,G}φ_0 ≥ E_{F,G}φ for each pair F, G ∈ F such that µ_F and µ_G are absolutely continuous with respect to ν and F ≠ G, then φ_0 = φ a.s. ν^N. Without loss of generality, we only need to consider the case that ν^N is the Lebesgue measure µ^N on the N-dimensional Euclidean space. We shall show that if φ = φ_0 a.s. is not true, then this leads to a contradiction.
By (3.8), φ is constant in a neighborhood of every point z whose coordinates are all distinct. Since φ_0 is measurable, it is approximately continuous almost everywhere (Munroe (1953)). If φ_0 = φ a.e. is not true, then there is a point η = (η_1, ..., η_N) such that its coordinates are all distinct, φ_0 is approximately continuous at η, and

    φ_0(η) ≠ φ(η).      (4.1)

Let ξ_1 < ··· < ξ_N be the order statistics of η_1, ..., η_N. Since (u, t) is a sufficient and complete statistic for (F_p, G_q), or (p, q), we can define a test

    φ2(u, t) = E_{Fp,Gq}(φ_0(Z) | u, t).      (4.2)

By definition and (3.1), φ1 and φ2 are both constant in the neighborhood O(η, 1/k) of η. Verify that φ(η) = φ1(η), as φ is constant in a neighborhood of every point z whose coordinates are all distinct and η is such a point. Then, by (4.1) and by taking k large enough,

    φ2 ≠ φ1 for each z ∈ O(η, 1/k).      (4.3)

Furthermore, E_{Fq,Gp}(φ1 − φ2) = E_{Fq,Gp}(φ − φ_0). Since φ_0 is as good as φ and φ is admissible when F, G ∈ Fk(ξ),
we have

    E_{Fq,Gp}(φ1 − φ2) = E_{Fq,Gp}(φ − φ_0) = 0   for all F_q, G_p ∈ Fk(ξ).

It follows that φ1 = φ2 a.s. on Ω with respect to the measure induced by (F_q, G_p), as φ1 and φ2 are both functions of (u, t), which is complete and sufficient for (F_q, G_p). Thus φ1 = φ2 a.s. on O(η, 1/k) (⊂ Ω), which contradicts (4.3). Since φ_0 is arbitrary, the contradiction shows that φ is admissible within the class of all tests. We can also assume that φ_0 is a rank test; then the above contradiction shows that φ is admissible within the class of all rank tests. This completes the proof of the theorem.

5. Applications

In this section, we apply the theorem to show that the tests of form (2.1) are admissible in some special cases. In particular, in Theorem 5.1 we assume max{m, n} ≤ 2 but the size of the test is arbitrary, and in Theorem 5.2 we assume that the size of the test is ≤ 4/(N choose n) but m and n are arbitrary.
Theorem 5.1. Consider the problem of testing H0 against Hi with F, G ∈ F, where i = 1, 2, 3, 4. Suppose either (1) min{n, m} = 1, or (2) max{n, m} = 2 but γ(·) = 0. Let φ be a test of form (2.1). Then φ is admissible within the class of all tests.

Proof. We only give the proof for testing against H1, as the others are very similar. Replacing c + l by l in (3.6), (2.1) becomes φ = 1[a·t^o < l] + γ(t^o) 1[a·t^o = l], where γ(·) ∈ [0, 1). In view of Theorem 4.1, in order to prove the admissibility of φ, it suffices to verify condition (2) of Lemma 3.1. Hereafter, we fix u and assume that t0 satisfies φ1(u, t0) > 0.

Case n = 1. If all the coordinates of t0 are zero, then there exists t^o_0 ∈ T_{t0} such that a·t^o_0 ≤ l, as φ1(u, t0) > 0. For each t, every t^o ∈ T_t satisfies a·t^o < a·t^o_0 ≤ l. Consequently, φ(t^o) = 1 and thus φ(u, t) = 1 in view of (3.6). Thus, t ∉ C_{u,φ1}. Without loss of generality, we can assume that there is only one coordinate of t0, say the i0-th coordinate, that is not zero, and it must be 1. Let b be an
(N − 1)-dimensional vector whose i0-th coordinate is 2 and the rest are 0. Note that b ≥ 0. Then b·(t − t0) = 2·(0 − 1) < 0 for all t ≠ t0; thus condition (2) holds.

Case m = n = 2. There are at most two nonzero coordinates of t0: either (2.a) exactly two coordinates of t0 equal 1, say with indexes k0 and j0; or (2.b) the k0-th coordinate of t0 is 2; or (2.c) the k0-th coordinate of t0 is 1 and the remaining coordinates are all zero; or (2.d) all the coordinates of t0 are 0.

Consider case (2.a). Since Σ_{i=1}^{N−1} u_i ≤ N = 4, either (2.a1) at least one of the k0-th and j0-th coordinates of u (say the k0-th) is 1, or (2.a2) both the k0-th and j0-th coordinates of u are 2. In case (2.a1), let b = (b_1, b_2, b_3) be such that b_{k0} = 2, b_{j0} = 1, and the other b_h = 0. Verify that b satisfies condition (2). In fact, in case (2.a1), every point t satisfies b·(t − t0) < 0 if t ≠ t0, as the k0-th coordinate of t is at most 1
and the j0-th coordinate of t is at most 2. In case (2.a2), assume k0 < j0 and define b as above; then the only point t ≠ t0 satisfying b·(t − t0) ≥ 0 has its k0-th coordinate equal to 2. We shall show next that the latter point t ∉ C_{u,φ1}. Since φ1(u, t0) > 0, by (3.7) there is a t^o_0 ∈ T_{t0} such that φ(t^o_0) > 0. Verify that a·t^o < a·t^o_0 for all t^o ∈ T_t, as k0 < j0 and a_1 < ··· < a_N, where a = (a_1, ..., a_N). As a consequence, φ(t^o) = 1 by (3.8) and φ1(u, t) = 1 by (3.7). It follows that t ∉ C_{u,φ1}. Thus condition (2) holds.

Consider case (2.b) or (2.c). Let b be such that its k0-th coordinate is 1 and the rest are zero. Then condition (2) holds.

Consider case (2.d). By the assumption in the theorem, φ is a non-randomized test. Since φ(u, t0) > 0, there is t^o_0 ∈ T_{t0} such that φ(t^o_0) = 1. Verify that for each t ∈ T^u, if t^o ∈ T_t then a·t^o ≤ a·t^o_0. Consequently, φ1(u, t) = 1. Thus condition (2) holds.

The proof for the case m
= 1 is similar. This concludes the proof of the theorem.

Theorem 5.2. Consider the problem of testing H0 against H3 (or H4). Let

    φ = 1[a·t^o ∉ (l, r)] + γ 1[a·t^o = l or r],

where l = n(n + 1)/2 + 1, r = n((m + 1) + (m + n))/2 − 1, and a = (1, 2, ..., N). Then φ is admissible within the class of all tests.

Proof. It suffices to show that for each u,

(B) if φ(u, t0) > 0, then there is a vector b such that (1) b ≥ 0 or b ≤ 0, and (2) for each t, b·(t − t0) ≥ 0 and t ≠ t0 imply that φ(t^o) = 1 for all t^o ∈ T_t.

By (3.7), statement (B) implies that φ1(u, t) = 1 and thus t ∉ C_{u,φ1}. It follows that condition (2) of Lemma 3.1 holds. As a consequence, φ is admissible by Theorem 4.1.

If φ(u, t0) > 0, then by (3.7) there exists a t^o_0 ∈ T_{t0} such that φ(t^o_0) > 0. By the assumption on φ in the theorem, one of the following must be true:
1. The first n coordinates of t^o_0 are 1 and the rest are zero;
2. The first n − 1 coordinates and the (n + 1)-st coordinate of t^o_0 are 1
and the rest are zero;
3. The last n coordinates of t^o_0 are 1 and the rest are zero;
4. The last n − 1 coordinates and the m-th coordinate of t^o_0 are 1 and the rest are zero.

Let t_{0,i1} and t_{0,i1+j} be the first and the last nonzero coordinates of t0, respectively. In the first two cases, let b be the vector whose (i1, ..., i1 + j)-th coordinates are (j + 1, j, ..., 1) and whose remaining coordinates are zero. It is obvious that b ≥ 0. Verify that if case 1 is true, then there is no t ≠ t0 such that b·(t − t0) ≥ 0. The reason is as follows. By (3.3), (3.4) and (3.5), the (1, ..., i1 + j − 1)-th coordinates of u and t0 are the same, and u_{i1+j} ≥ the (i1 + j)-th coordinate of t0, where u_i is the i-th coordinate of u. Then b·(t − t0) ≥ 0 implies t = t0. Now if case 2 is true and if b·(t − t0) ≥ 0 and t ≠ t0, then (3.3), (3.4), (3.5) and the structure of t^o_0 in case 2 imply that T_t consists of only one element t^o, and the first n coordinates of t^o are 1 and the rest are zero. Verify
that this point satisfies a·t^o < a·t^o_0 ≤ l; thus φ(t^o) = 1 and, consequently, φ1(u, t) = 1. So statement (B) holds.

On the other hand, if either case 3 or case 4 is true, let b = −(N − 1, ..., 2, 1). Thus b ≤ 0. Verify that if case 3 is true, then there is no t ≠ t0 such that b·(t − t0) ≥ 0; the argument is somewhat similar to that for case 1. So statement (B) holds. Now if case 4 is true and if b·(t − t0) ≥ 0 and t ≠ t0, then (3.3), (3.4) and (3.5) imply that T_t consists of only one element t^o, and the last n coordinates of t^o are 1 and the rest are zero. Verify that the point t^o satisfies a·t^o > a·t^o_0 ≥ r; thus φ(t^o) = 1 and, consequently, φ1(u, t) = 1. So statement (B) holds.
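For concreteness, the following sketch evaluates the test φ of Theorem 5.2 on data. It assumes no ties and Wilcoxon scores a = (1, ..., N); the boundary cases a·t^o = l or r are handled according to the description in Remark 1 below, and the function name is hypothetical.

```python
import numpy as np

def wilcoxon_two_sided_phi(x, y, gamma=0.5):
    """Test of Theorem 5.2: reject H0 when the Y rank sum a . t^o leaves (l, r).

    A sketch under the paper's assumptions: scores a = (1, ..., N), no ties,
    l = n(n+1)/2 + 1, r = n((m+1)+(m+n))/2 - 1; gamma is the rejection
    probability on the boundary {l, r} (cf. Remark 1).
    """
    m, n = len(x), len(y)
    z = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(z)) + 1     # pooled ranks 1..N (no ties assumed)
    W = ranks[m:].sum()                       # a . t^o = rank sum of the Y sample
    l = n * (n + 1) // 2 + 1
    r = n * ((m + 1) + (m + n)) // 2 - 1
    if W < l or W > r:
        return 1.0                            # reject
    if W == l or W == r:
        return gamma                          # reject with probability gamma
    return 0.0                                # accept

phi = wilcoxon_two_sided_phi(np.array([1.2, 0.4, 2.5]), np.array([0.1, 3.3]))
```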
Remark 1. Unlike in Theorem 5.1, the test φ stated in Theorem 5.2 is the Wilcoxon test. It is worth mentioning that, for the cases considered in Theorem 5.2, the Wilcoxon test is a representative of the other tests of form (2.1). In fact, as pointed out in the proof, there are only four cases in which φ > 0, and the test φ = 1 if case 1 or 3 is true and φ = γ if case 2 or 4 is true. In the form of (2.1), given c_1 < c_2 and b ≠ (1, ..., N) but b_1 < b_2 < ··· < b_N, it is easy to find l_b and r_b such that the test φ in Theorem 5.2 is the same as 1[b·t^o ∉ (l_b, r_b)] + γ 1[b·t^o = l_b or r_b]. Consequently, the Wilcoxon test, the Fisher-Yates test and the median test are admissible in the cases mentioned in Theorems 5.1 and 5.2.

Remark 2. It is easy to modify the proof of Theorem 5.2 to show the following results (see the sketch after this remark for a numerical check of the size bounds):
1. For testing against H1, φ = 1[a·t^o < l] + γ 1[a·t^o = l] is admissible within the class of all tests when F, G ∈ Fk if the size of φ is ≤ 2/(N choose n).
2. For testing against H2, φ = 1[a·t^o > r] + γ 1[a·t^o = r] is admissible within the class of all tests when F, G ∈ Fk if the size of φ is ≤ 2/(N choose n).
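The size bounds 4/(N choose n) and 2/(N choose n) quoted in this section can be verified by exhaustive enumeration, since under H0 all (N choose n) assignments of ranks to the Y sample are equally likely. The sketch below is a hypothetical helper: it reuses the cutoffs l and r of Theorem 5.2 (with the boundary convention of Remark 1) and assumes no ties.

```python
from itertools import combinations
from math import comb

def exact_sizes(m, n, gamma):
    """Exact null sizes of the Wilcoxon tests by enumerating rank assignments.

    Under H0 every choice of the n ranks occupied by the Y sample is equally
    likely, so each assignment has probability 1 / C(N, n).
    """
    N = m + n
    l = n * (n + 1) // 2 + 1
    r = n * ((m + 1) + (m + n)) // 2 - 1
    two_sided = one_sided_lower = 0.0
    for ranks in combinations(range(1, N + 1), n):   # ranks of the Y observations
        W = sum(ranks)
        # Two-sided test of Theorem 5.2 (boundary convention of Remark 1)
        if W < l or W > r:
            two_sided += 1.0
        elif W == l or W == r:
            two_sided += gamma
        # One-sided test of Remark 2 (against H1), with the same cutoff l
        if W < l:
            one_sided_lower += 1.0
        elif W == l:
            one_sided_lower += gamma
    total = comb(N, n)
    return two_sided / total, one_sided_lower / total

size2, size1 = exact_sizes(m=5, n=4, gamma=0.5)
assert size2 <= 4 / comb(9, 4) and size1 <= 2 / comb(9, 4)
```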
Remark 3. Admissibility within the class of all rank tests is different from admissibility within the class of all tests. A standard t-test, for instance, is not a rank test. If a test is admissible within the class of all tests, it is admissible within the class of all rank tests, but not vice versa.

6. Comments

The paper makes progress in attacking two well-known open questions in Lehmann's famous textbook "Testing Statistical Hypotheses" (1959). To the best of the author's knowledge, the open questions had not been settled in any special case before. The main difficulty is that the problems were not tractable. The significance of the current paper is Theorem 4.1, not Theorems 5.1 and 5.2. Theorem 4.1 provides a sufficient condition for admissibility of the Wilcoxon test and makes the open problems solvable. Theorems 5.1 and 5.2 demonstrate that Theorem 4.1 can indeed be used to solve the open questions for some special sample sizes, or for some special significance levels α. At this moment, the author believes that Theorem 4.1 can be used to show that the non-randomized Wilcoxon test is
admissible for each sample size n and each attainable significance level α on a case-by-case basis, but is unable to produce a unified proof. Note that the sufficient condition in Theorem 4.1 is applicable to a wide class of linear rank tests, including the Fisher-Yates test and the median test. In Section 5, we apply the theorem and show that the Wilcoxon test is admissible in some special cases. The cases considered in Section 5 are special cases arising in practice. They were chosen because their proofs are relatively easy to follow, which makes the paper more readable. It is possible to establish admissibility results for the Wilcoxon test in cases other than those listed in Theorems 5.1 and 5.2 and Remark 2; thus the results in Section 5 are not the only cases in which Theorem 4.1 is applicable. We further point out that Theorem 4.1 is a sufficient condition and may not be a necessary condition. In view of the inadmissibility results on the randomized Wilcoxon tests, it is
possible that the tests of form (2.1) may be inadmissible within the class of all tests when F, G ∈ Fk(ξ) in some special cases. However, it is still not clear whether the tests of form (2.1) are admissible within the class of all tests when F, G ∈ F. Indeed, Theorem 5.2 already presents a result on the continuous two-sample problem that differs from the one in the discrete set-up. If n, m ≥ 4 and the assumption of Theorem 5.2 holds, each randomized Wilcoxon test with γ = c ∈ (0, 1) is inadmissible in the discrete set-up (see Yu (2000)) but is admissible in the continuous set-up. The test of form (2.1) is a rank test if γ is a constant. However, if γ is not a constant, it may not be a rank test. The test of form (2.1) may not be of practical importance, as probably nobody would use a randomized test in applications. However, it is important in statistical theory, as it is well known that without the concept of randomized tests the Neyman-Pearson Lemma would not exist. The
latter lemma is a foundation of the theory of testing statistical hypotheses.

Acknowledgement. Helpful discussions on the classical nonparametric two-sample problem with Professor Larry D. Brown are acknowledged. He kindly pointed out an error in an earlier version of the paper. The author also thanks two referees and the editor for invaluable comments.

References.
* Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York, 250-255.
* Lehmann, E. L. (1959). Testing Statistical Hypotheses. 1st ed. Wiley, New York.
* Lehmann, E. L. (1986). Testing Statistical Hypotheses. 2nd ed. Wiley, New York.
* Munroe, M. E. (1953). Introduction to Measure and Integration. Addison-Wesley, Massachusetts, 291-292.
* Sugiura, N. (1965). An example of the two-sided Wilcoxon test which is not unbiased. Ann. Inst. Statist. Math. 17, 261-263.
* Yu, Q. Q. (2000). Inadmissibility and admissibility of randomized Wilcoxon tests in discrete two-sample problems. Statistics & Decisions 18, 35-48.
Appendix to the paper "A SUFFICIENT CONDITION FOR ADMISSIBILITY OF THE WILCOXON TEST IN THE CLASSICAL TWO-SAMPLE PROBLEM"

For the convenience of readers, in this appendix we give the proof of Lemma 3.1, which is a minor modification of Theorem 4.1 in Yu (2000). Given an arbitrary test φ, we say that a u-section of φ is admissible for testing w = 0 against H1 if there is no test ψ such that E_{Fp,Gq}([φ − ψ] | u) ≤ 0 and E_{Fp,Fp}([φ − ψ] | u) ≥ 0 for all F_p, G_q ∈ Fk(ξ) with F_p ≤ G_q, and with at least one strict inequality. For testing against H1, Lemma 3.1 follows from Lemmas A.1 and A.2 below. For testing against H2, H3 or H4, the proof is almost identical to the proof given below.

Lemma A.1. For testing H0 against H1 when F, G ∈ Fk(ξ), a test φ of form (2.1) is admissible if every u-section of φ is admissible for testing w = 0 against H1.

Lemma A.2. Consider the problem of testing H0 against H1 when F, G ∈ Fk(ξ). Let φ be a
one-sided test satisfying conditions (1) and (2) of Lemma 3.1. Then every u-section of φ is admissible for testing w = 0 against H1.

Proof of Lemma A.1. Assume every u-section of φ is admissible for testing w = 0. Since q_i ≤ p_i, i < N, implies F_q ≤ G_p, it suffices to consider testing against H1*: q_i ≤ p_i for i = 1, ..., N − 1, but p ≠ q; or equivalently, w ≥ 0 but w ≠ 0. Let us denote the set of all possible values of u by U, and let u0 be an extreme point of the convex hull of U (the smallest convex set that contains U). Note that U consists of finitely many elements. If φ2 is as good as φ, then

    Σ_{u∈U} e^{θ·u} Σ_t h(u, t) e^{w·t} [φ2(u, t) − φ(u, t)] ≥ 0   for all θ and all w ≥ 0 with w ≠ 0,
    Σ_{u∈U} e^{θ·u} Σ_t h(u, t) e^{w·t} [φ2(u, t) − φ(u, t)] ≤ 0   for all θ and w = 0.      (7.1)

Due to continuity in w, (7.1) yields

    Σ_{u∈U} e^{θ·u} Σ_t h(u, t) e^{w·t} [φ2(u, t) − φ(u, t)] = 0   for all θ and w = 0.
Consider a hyperplane b·(u − u0) = 0 which supports the convex hull of U at u0 and such that b·(u − u0) < 0 for all u ∈ U with u ≠ u0. It is important to notice that θ in (7.1) is arbitrary, though w is restricted. In (7.1), let θ = vb and multiply (7.1) by e^{−vb·u0}. Letting v → ∞ yields

    Σ_t h(u0, t) e^{w·t} [φ2(u0, t) − φ(u0, t)] ≥ 0   for all w ≥ 0,      (7.2)

with equality when w = 0. Since each u-section of φ is admissible, (7.2) implies

    Σ_t h(u0, t) e^{w·t} [φ2(u0, t) − φ(u0, t)] = 0   for all w ≥ 0.

Since {w : w ≥ 0} contains a nonempty (N − 1)-dimensional open set, t is complete and sufficient for the conditional distribution µ_{Fq,Gp}(u0, ·). Thus φ(u0, t) = φ2(u0, t) for all possible t. In the latter case, we can replace U by U \ {u0} in (7.1) and repeat the argument for an extreme point of U \ {u0}. After finitely many steps we must either arrive at a contradiction or conclude that φ2 = φ for all possible (u, t).
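The limiting step above, setting θ = vb and letting v → ∞ so that only the term at the extreme point u0 survives, can be visualized with a toy numerical example. The points below are made up purely for illustration and are not tied to any particular Fk(ξ).

```python
import numpy as np

# Made-up U: u0 = (2, 0) is an extreme point of its convex hull, and
# b = (1, -1) satisfies b.(u - u0) < 0 for every other u in U.
U = np.array([[2, 0], [1, 1], [0, 2], [1, 0]])
u0 = np.array([2, 0])
b = np.array([1.0, -1.0])

for v in [1, 10, 100]:
    weights = np.exp(v * (U - u0) @ b)   # e^{v b.(u - u0)}; equals 1 at u0
    print(v, np.round(weights, 6))       # all entries except the u0-entry tend to 0
```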
Proof of Lemma A.2. We shall show that if a u-section of φ is not admissible, then we can reach a contradiction. Suppose a u-section of φ is dominated by a different test ψ. Then there must exist some point (u, t0) such that φ(u, t0) > ψ(u, t0); otherwise, it could not be true that E_{Fp,Fp}([φ − ψ] | u) ≥ 0 (and E_{Fp,Gq}([φ − ψ] | u) ≤ 0). Because φ = 0 for all non-extreme points t of C_{u,φ} by condition (2), we see that t0 must belong to the complement of C_{u,φ} or be an extreme point of C_{u,φ}. Consequently, w = vb ≥ 0 for all v > 0, where b is the vector in condition (2); thus w = vb is a proper parameter under H1. Letting w = vb, the fact that the u-section of ψ dominates that of φ gives

    0 ≤ e^{−w·t0} Σ_t [ψ(u, t) − φ(u, t)] h(u, t) e^{w·t}      (7.3)
      = Σ_{t ∈ C^c_{u,φ}} [ψ(u, t) − φ(u, t)] h(u, t) e^{vb·(t−t0)}
        + Σ_{t ∈ C_{u,φ}\{t0}} [ψ(u, t) − φ(u, t)] h(u, t) e^{vb·(t−t0)} + [ψ(u, t0) − φ(u, t0)] h(u, t0)
      ≤ Σ_{t ∈ C_{u,φ}\{t0}} [ψ(u, t) − φ(u, t)] h(u, t) e^{vb·(t−t0)} + [ψ(u, t0) − φ(u, t0)] h(u, t0)
(as φ = 1 on C^c_{u,φ}), where C^c_{u,φ} is the complement of C_{u,φ}. Taking limits on both sides of (7.3) as v → ∞ yields 0 ≤ ψ(u, t0) − φ(u, t0) by condition (2), which contradicts the assumption that ψ < φ at (u, t0). This completes the proof of the lemma.