Sampling with Removal in LP-type Problems

Random sampling is an important tool in optimization subject to finitely or infinitely many constraints. Here we are interested in obtaining solutions of low cost that violate only a few constraints. Under convexity or similar favorable conditions, and assuming fixed dimension, one can indeed derive combinatorial bounds on the expected number (or probability mass) of constraints violated by the optimal solution subject to a (small) random sample of constraints. The cost of the sample solution, however, cannot be bounded combinatorially. Suppose that for a fixed integer k, we locally minimize the cost by removing k constraints from the sample. Can we still guarantee (good) bounds on the number or mass of violated constraints? This question was asked and partially answered in the framework of chance-constrained optimization [1]. Here we study the question in the completely combinatorial setting of LP-type problems that allows a unified treatment of many different optimization problems. Given a nondegenerate LP-type problem of combinatorial dimension δ, we prove that sampling with removal of the best k constraints induces an LP-type problem of combinatorial dimension O(δ^{k+1}), and this bound is tight in the worst case. It follows that through this kind of removal, the expected number of violated constraints blows up by a factor of at most O(δ^k); for relevant sample sizes, however, a matching lower bound for the blowup factor is not implied. At the same time, our results show that further improvements require new tools that go beyond combinatorial dimension.


INTRODUCTION
The combinatorial framework of LP-type problems is a well-studied abstraction of convex programming and encompasses many concrete (geometric) optimization problems. The framework has been introduced in a seminal paper by Matoušek, Sharir and Welzl in 1992 that also lists several concrete LP-type problems [11, 9]. Their randomized algorithm for solving LP-type problems, combined with algorithms by Clarkson [4], yields the currently best algorithm for linear programming in the real RAM model of computation [6].
Random sampling, the main tool in Clarkson's algorithms, is well understood in this combinatorial framework: if we have an LP-type problem of (combinatorial) dimension δ over n constraints, and we sample r of them at random, then the expected number of constraints still violated by the sample solution is at most δ(n − r)/(r + 1) [7].
In this paper, we obtain the first results for random sampling with removal in LP-type problems: if we sample r constraints at random and then remove from the sample the k constraints that lead to the highest improvement in objective function value, the expected number of constraints violated by this "k-optimized" sample solution is O(δ^{k+1} (n − r)/(r + 1)) (assuming that k is fixed). We also exhibit a concrete LP-type problem for which this bound is tight if r = n − 1. In relevant cases, however (most notably in the aforementioned algorithms), we have r ≪ n, and here we do not get a matching lower bound. We will discuss this issue (and the open problem arising from it) in detail in the last section.
The main technical step in the upper bound proof is the construction of a k-optimized LP-type problem of combinatorial dimension O(δ^{k+1}), with the property that random sampling with removal in the original problem corresponds to "normal" random sampling in the k-optimized problem.
In practice, removing the best k constraints is feasible only for very small k, and one would rather remove constraints according to some cheaper rule. It would be interesting to understand whether our combinatorial results extend to removal rules other than "the best k". Currently, we do not know the answer.
Still, our tight bound for "best-k-removal" has a clear theoretical value. On the one hand, for fixed δ and k, the expected number of violated constraints increases only by a constant factor, even under the best possible local improvement of the objective value. On the other hand, our lower bound shows that there is a price to pay that may be exponential in the number of removed constraints, unless new problem parameters that are less coarse than the combinatorial dimension can be found and employed.

Chance-constrained optimization
The motivation for this work comes from chance-constrained optimization, where we have a probability distribution over a set of constraints. For a given ε, the goal is to find the best solution with violation probability at most ε, meaning that it violates a randomly chosen constraint with at most this probability. Let us call such a solution sufficiently feasible.
A natural approach here is to sample a finite number of constraints at random from the distribution, and to solve the problem optimally subject to the sampled constraints. Assuming a linear objective function and convex constraints, Campi and Garatti give tight bounds for the probability that the sample solution is sufficiently feasible, depending on the sample size [1]. In particular (and this was known before), the probability of not being sufficiently feasible tends to zero as the sample size goes to infinity.
In a later paper, Campi and Garatti study a variant of this approach where the sample solution is improved through removal of some of the sampled constraints, followed by reoptimization under the remaining ones [2]. In particular, if the quality of the sample solution is suffering from outliers, this is desirable. The main question is how such a removal affects sufficient feasibility.
The main result of Campi and Garatti is that the probability of not being sufficiently feasible goes up, but only by a factor depending on the problem dimension d and the number k of removed constraints [2, Theorem 2.1]. As a consequence, the solution after removal is still sufficiently feasible with probability tending to one as the sample size goes to infinity.
It is not clear whether the bound of [2, Theorem 2.1] is best possible, and what happens if we are not interested in asymptotic results but results for a fixed budget of samples. Fagliano and Schildknecht derive a bound for the expected violation probability of the sample solution but believe that this bound is poor [5].
Although the aforementioned work [1, 2, 5] deals with convex programming, many of the arguments are (implicitly) of a purely combinatorial nature and use little problem structure beyond what we also have in LP-type problems. This paper is an attempt to make the combinatorial properties of random sampling with removal explicit by working on the abstract level only. The author believes that the results obtained in this way will also contribute to the understanding of sampling with removal in chance-constrained optimization.

Paper outline
In Section 2, we review the definition of LP-type problems and the known bounds for sampling without removal. Section 3 derives the k-optimized LP-type problem. In Section 4, we show how the known concept of k-levels in LP-type problems serves as a tool for the analysis of sampling with removal. Section 5 employs this tool to bound the combinatorial dimension of the k-optimized LP-type problem, from which we derive our main upper bound result in Section 6. Section 7 provides a matching lower bound on the combinatorial dimension. In the final Section 8, we discuss open questions, and in particular the limitations of our results.

SAMPLING IN LP-TYPE PROBLEMS
On an abstract level, many problems of minimizing an objective function subject to finitely many constraints have the following structure.
Definition 2.1. Let H be a finite set (the constraints), Ω a totally ordered set (the values) with a smallest element −∞, and w : 2^H → Ω a function that assigns an objective function value to each subset of constraints, such that w(∅) = −∞. Suppose that for all F ⊆ G ⊆ H, the following two axioms hold.
(i) w(F) ≤ w(G);
(ii) if −∞ < w(F) = w(G), then for all h ∈ H, w(G ∪ {h}) > w(G) implies w(F ∪ {h}) > w(F).
Then the triple P = (H, Ω, w) is called an LP-type problem [11, 9]. We refer to axiom (i) as monotonicity and axiom (ii) as locality.
In an LP-type problem, w(G) represents the minimum objective function value subject to only the constraints in G.
The original minimization problem asks for the value w(H), the minimum value subject to all constraints. The locality axiom can be seen as an abstract analog of convexity of constraints, since it allows for a local test of optimality: a simple induction shows that if w(G ∪ {h}) = w(G) for all h ∈ H \ G, then w(G) = w(H).

Smallest enclosing balls
As an example, we use the classic problem of computing the smallest enclosing ball of a finite set of points P in Euclidean space R^d. To write this as an LP-type problem, we let H = P and Ω = R ∪ {−∞}. For G ⊆ H, we define w(G) to be the radius of the smallest enclosing ball of G, with the convention that the empty set yields radius −∞. The smallest enclosing ball of a nonempty set of points exists and is unique [12], hence w is well-defined. Monotonicity is clear, and to see locality, we observe that F ⊆ G, w(F) = w(G) means that both F and G have the same smallest enclosing ball. If w(G ∪ {h}) > w(G), then h is outside of that ball, so we also have w(F ∪ {h}) > w(F). Let us refer to this LP-type problem as SEB.
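For intuition, the value function w of SEB can be computed by brute force in the plane, where the smallest enclosing circle is determined by at most three of the points. The following sketch is not part of the paper's machinery, just an illustration of Definition 2.1; it tries all circles spanned by two or three points:

```python
from itertools import combinations
import math

def circle_two(p, q):
    # circle with the segment pq as diameter
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)

def circle_three(p, q, r):
    # circumcircle of p, q, r; None if the points are (near-)collinear
    ax, ay = p; bx, by = q; cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax*ax + ay*ay) * (by - cy) + (bx*bx + by*by) * (cy - ay)
          + (cx*cx + cy*cy) * (ay - by)) / d
    uy = ((ax*ax + ay*ay) * (cx - bx) + (bx*bx + by*by) * (ax - cx)
          + (cx*cx + cy*cy) * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))

def w(G):
    # radius of the smallest enclosing circle of G; -inf for the empty set
    G = list(G)
    if not G:
        return float('-inf')
    if len(G) == 1:
        return 0.0
    best = math.inf
    for S in list(combinations(G, 2)) + list(combinations(G, 3)):
        c = circle_two(*S) if len(S) == 2 else circle_three(*S)
        if c is not None and all(math.dist((c[0], c[1]), p) <= c[2] + 1e-9 for p in G):
            best = min(best, c[2])
    return best

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
```

For the four corners of the unit square, w evaluates to √2/2, and monotonicity w(F) ≤ w(G) can be spot-checked on subsets.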

The sampling lemma
Given a minimization problem subject to a finite (but possibly very large) number of constraints, one approach for obtaining an approximate solution is the following: sample a (much smaller) subset of constraints at random, and optimally solve the problem subject to only the sampled constraints. In the framework of LP-type problems, we can define a "combinatorial quality" of the sample solution, by counting the number of constraints still violated by it.
Definition 2.2. Let P = (H, Ω, w) be an LP-type problem, and let G ⊆ H.
(i) A constraint h ∈ H violates G if w(G ∪ {h}) > w(G). We use V_P(G) to denote the set of constraints violated by G.
(ii) A constraint h ∈ G is extreme in G if w(G \ {h}) < w(G). We use X_P(G) to denote the set of extreme constraints in G.
Let us illustrate these notions for SEB. As already seen, h violates G if h is outside of the smallest enclosing ball of G; h is extreme in G if its removal allows for a smaller enclosing ball. Necessarily, such an h is on the boundary of the smallest enclosing ball, but this is not sufficient: if G consists of the four corners of a square, then G has no extreme points.

Lemma 2.3 (Sampling Lemma [7]). Let P = (H, Ω, w) be an LP-type problem, and r ∈ N, 0 ≤ r < n := |H|. Let R be a random sample of H of size r, obtained by choosing uniformly at random from all r-element subsets of H. Then

v_r = x_{r+1} (n − r)/(r + 1),

where v_r = E[|V_P(R)|] is the expected number of constraints violated by R, and x_{r+1} is the expected number of extreme constraints in a random sample of size r + 1.

The Sampling Lemma can be used to argue that v_r is small if x_{r+1} is small. For smallest enclosing balls in R^d, this is the case: using Helly's Theorem, one can show that every set has at most d + 1 extreme points, and this yields v_r ≤ (d + 1)(n − r)/(r + 1). For example, if d = 2, the smallest enclosing ball of a random sample of size √n has in expectation fewer than 3√n points outside.
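The Sampling Lemma is easy to test empirically on an even simpler LP-type problem: the smallest enclosing interval of points on a line, where w(G) = max(G) − min(G), only the minimum and the maximum can be extreme (so x_{r+1} ≤ 2), and a constraint is violated exactly if it lies outside the sample's range. A minimal Monte Carlo sketch (the parameter choices are illustrative):

```python
import random

def violators(H, R):
    # constraints of H outside the smallest enclosing interval of the sample R
    lo, hi = min(R), max(R)
    return [h for h in H if h < lo or h > hi]

random.seed(1)
n, r, trials = 1000, 50, 2000
H = [random.random() for _ in range(n)]
avg = sum(len(violators(H, random.sample(H, r))) for _ in range(trials)) / trials
bound = 2 * (n - r) / (r + 1)   # delta = 2: at most two extreme constraints
```

Here the bound is attained with equality in expectation: by exchangeability, each non-sampled point falls below the sample minimum (or above the maximum) with probability 1/(r + 1), so avg concentrates around 2(n − r)/(r + 1) ≈ 37.3.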

THE K-OPTIMIZED PROBLEM
Suppose we are given an LP-type problem P = (H, Ω, w) and a random sample of constraints R ⊆ H. We have seen that under favorable conditions, Lemma 2.3 provides a good bound on the expected number of constraints still violated by R. We now want to understand how this number changes when we remove some constraints from the sample in order to locally minimize the objective function value.
Definition 3.1. Let P = (H, Ω, w) be an LP-type problem, and let k ∈ N. A k-candidate of G ⊆ H is a subset F ⊆ G with |F| ≥ |G| − k, and a k-candidate of smallest value is a k-witness of G. We define the function w_k : 2^H → Ω by

w_k(G) = min { w(F) : F a k-candidate of G }.

This means, we are interested in the smallest value that can be obtained by removing at most k constraints from G. Note that w_0 = w. It turns out that w_k inherits monotonicity from w.
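Spelled out as brute force (over the toy interval problem from before; any value function w could be plugged in), the definition reads as follows. By monotonicity of w, it suffices to try the k-candidates of smallest size:

```python
from itertools import combinations

def w(G):
    # toy LP-type problem: length of the smallest enclosing interval
    return max(G) - min(G) if G else float('-inf')

def w_k(G, k):
    # smallest value obtainable by removing at most k constraints from G
    G = list(G)
    return min(w(C) for C in combinations(G, max(len(G) - k, 0)))
```

For G = {0, 1, 2, 50}, removing the single outlier gives w_1(G) = 2, while w_0(G) = w(G) = 50; if |G| ≤ k, the empty set is a k-candidate and w_k(G) = −∞.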
Lemma 3.2. For all F ⊆ G ⊆ H and all k ∈ N, we have w_k(F) ≤ w_k(G).

Proof. Let C be a k-witness of G. Then C ∩ F is a k-candidate of F, since C misses at most k elements of G and hence at most k elements of F. Consequently,

w_k(F) ≤ w(C ∩ F) ≤ w(C) = w_k(G),

where we have used monotonicity of w in the second inequality.
Definition 3.3. If P = (H, Ω, w) is an LP-type problem and k ∈ N, then P k = (H, Ω, w k ) is the k-optimized version of P.
We remark that we have not yet proved that P_k is an LP-type problem, and in general, this is false.

Weak nondegeneracy
We next want to argue that every k-witness C of G violates all removed constraints h ∈ G \ C. But this is not true in general (consider an LP-type problem that assigns the same value to every nonempty subset of constraints). We therefore need to make an additional assumption.
Definition 3.4. An LP-type problem P = (H, Ω, w) is weakly nondegenerate if every set G ⊆ H with w(G) > −∞ has at least one extreme constraint.

For example, in SEB, the set of 4 corners of a square would be a set without any extreme points. If no four points are cocircular, however, we always obtain a weakly nondegenerate instance.
Lemma 3.5. Let P = (H, Ω, w) be a weakly nondegenerate LP-type problem, k ∈ N. Let G ⊆ H such that w_k(G) > −∞, and let C be a k-witness of G. Then the following two statements hold.
(i) |C| = |G| − k, i.e. exactly k constraints are removed.
(ii) Every h ∈ G \ C violates C; in particular, V_P(C) ∩ G = G \ C.
The following is the main technical lemma of this section. Recall that we want to count the constraints violated by a random sample R in P, after removal of the best k constraints from R. Assume for a moment that k-witnesses in P are unique (we will take care of this afterwards). Then the lemma says that on top of the k removed ones (previous lemma), the constraints in question are precisely the ones violated by R in the k-optimized version P_k. Thus, we can reduce random sampling with removal in P to the well-understood normal random sampling in P_k.
Lemma 3.6. Let P = (H, Ω, w) be a weakly nondegenerate LP-type problem, k ∈ N, and G ⊆ H with w_k(G) > −∞. Then the following two statements are equivalent for every h ∈ H \ G.
(i) w_k(G ∪ {h}) = w_k(G), i.e. h does not violate G in P_k;
(ii) there is a k-witness C of G with w(C ∪ {h}) = w(C), i.e. h does not violate C.
For the direction (i)⇒(ii), we have w k (G ∪ {h}) = w k (G), and we let D be a k-witness of G ∪ {h}. We will show that (a) h ∈ D, so we have equality throughout, and D is a set of value w k (G) > −∞ without extreme constraints. This is a contradiction to weak nondegeneracy of P.
Then we argue as before that so we have equality throughout and C is a k-witness of G as required.

Strong nondegeneracy
Under weak nondegeneracy, a set may still have several k-witnesses, but we must avoid this in order to complete the argument outlined before Lemma 3.6. We therefore stipulate a stronger notion of nondegeneracy.

Definition 3.7. Let P = (H, Ω, w) be an LP-type problem. A set B ⊆ H is a basis if w(B') < w(B) for every proper subset B' ⊊ B. A basis of G ⊆ H is a basis B ⊆ G with w(B) = w(G). P is strongly nondegenerate if any two distinct bases have distinct values.

We remark that any inclusion-minimal subset of G with value w(G) is a basis of G, and vice versa. Under strong nondegeneracy, every set G has exactly one basis.
Lemma 3.8. A strongly nondegenerate LP-type problem is weakly nondegenerate.
Proof. Suppose P = (H, Ω, w) is strongly nondegenerate and let G ⊆ H be a set such that w(G) > −∞. We need to show that G has an extreme constraint. Suppose not, and let B ⊆ G be a basis of G. Because of w(G) > −∞ = w(∅), B contains at least one constraint h. Since h is not extreme in G, we have w(G) = w(G \ {h}), and every basis B' of G \ {h} is also a basis of G. But since h ∈ B and h ∉ B', we have two distinct bases B, B' with the same value w(G), a contradiction to strong nondegeneracy.
As an illustration, consider SEB again. A basis of G is a minimal subset of points with the same smallest enclosing ball as G. In particular, all points of the basis are on the ball's boundary. The set of four corners of a square has two bases: the two pairs of diagonally opposite points.
Maybe surprisingly, SEB is not strongly nondegenerate, even if we assume sufficiently general position of the points. Indeed, every basis of size 1 has the same value 0 (the radius of the smallest enclosing ball of a single point). However, we can define new values w'(G) = (w(G), V_SEB(G)). We compare these pairs lexicographically, where we use a fixed but arbitrary order among sets in the second component. With respect to w', we have strong nondegeneracy if no d + 2 points are cospherical. We later come back to this refinement [8] in a more general setting.
The following corollary of the above definition shows that under strong nondegeneracy, sets with the same value violate the same constraints, and this will be the key to proving uniqueness of k-witnesses below.

Corollary 3.9. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem, and let F, G ⊆ H with −∞ < w(F) = w(G). Then V_P(F) = V_P(G).

Under strong nondegeneracy, we can now strengthen our previous "workhorse" Lemma 3.6.
Lemma 3.10. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem, k ∈ N, and G ⊆ H with w_k(G) > −∞. Then the following two statements hold.
(i) G has a unique k-witness that we denote by C_k(G).
(ii) For every h ∈ H \ G, h violates G in P_k if and only if h violates C_k(G) in P; equivalently, V_{P_k}(G) = V_P(C_k(G)) \ G.
Proof. For (i), suppose that there are two k-witnesses C, C'. By Corollary 3.9, V_P(C) = V_P(C'), and in particular V_P(C) ∩ G = V_P(C') ∩ G. On the other hand, Lemma 3.5 yields V_P(C) ∩ G = G \ C and V_P(C') ∩ G = G \ C', so C = C' follows. Given that G has a unique k-witness, part (ii) is an easy consequence of Lemma 3.6.
We conclude this section by showing that the k-optimized problem is in fact an LP-type problem if we assume strong nondegeneracy.
Theorem 3.11. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem, k ∈ N. Then the k-optimized version P k = (H, Ω, w k ) (Definition 3.3) is an LP-type problem.

Proof. Monotonicity is Lemma 3.2. For locality, fix F ⊆ G ⊆ H with −∞ < w_k(F) = w_k(G), and let h ∈ H with w_k(G ∪ {h}) > w_k(G). By Lemma 3.10(ii), applied to G, we get w(C_k(G) ∪ {h}) > w(C_k(G)). We have w(C_k(F)) = w_k(F) = w_k(G) = w(C_k(G)) > −∞, and from Corollary 3.9, we get V_P(C_k(F)) = V_P(C_k(G)). So we also have w(C_k(F) ∪ {h}) > w(C_k(F)). Applying Lemma 3.10(ii) again, this time to F, we get w_k(F ∪ {h}) > w_k(F) and hence locality.

THE K-LEVEL
We have now shown that the k-optimized version P k of a strongly nondegenerate LP-type problem P is an LP-type problem itself, and the next goal is to apply the Sampling Lemma 2.3 to get bounds on the expected number of constraints violated by the solution of a random sample in P k . But for this, we need to bound the (expected) number of extreme constraints in a random sample of size r + 1. This is done using k-levels.
Definition 4.1. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem, G ⊆ H, k ∈ N such that w_k(G) > −∞. The k-level of G is the set

L_P^k(G) = { C ⊆ G : |C| = |G| − k, V_P(C) ∩ G = G \ C },

the family of sets obtained from G by removing k constraints, all of which are violated afterwards. According to Lemma 3.5, the k-level of G contains the k-witness C_k(G) of G and possibly many more sets. The crucial fact we prove next is that the size of the (k + 1)-level of G provides a bound for the number of extreme constraints of G w.r.t. the k-optimized problem P_k.
Theorem 4.2. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem, G ⊆ H, k ∈ N such that w_{k+1}(G) > −∞. Then we have

|X_{P_k}(G)| ≤ (k + 1) |L_P^{k+1}(G)|.

Proof. Let x ∈ X_{P_k}(G) (meaning that w_k(G \ {x}) < w_k(G)), and let C_x be the unique k-witness of G \ {x}. Since w_k(G \ {x}) ≥ w_{k+1}(G) > −∞, Lemma 3.5 yields |C_x| = |G| − (k + 1) and V_P(C_x) ∩ (G \ {x}) = (G \ {x}) \ C_x. But we also have x ∈ V_P(C_x) as a consequence of Lemma 3.10(ii). This implies that C_x ∈ L_P^{k+1}(G), and we will charge x to C_x. Taken over all x ∈ X_{P_k}(G), every set C in the (k + 1)-level is charged at most k + 1 times (namely, at most once for every x ∈ G \ C). The claimed bound follows.

COMBINATORIAL DIMENSION
In a general LP-type problem, we cannot make any nontrivial statements about the size of a level according to Definition 4.1. But in many applications, the LP-type problem has fixed dimension.

Definition 5.1. Let P = (H, Ω, w) be an LP-type problem. The combinatorial dimension δ(P) of P is the size of a largest basis.

There is a strong relation between combinatorial dimension and extreme constraints.

Lemma 5.2. Let P = (H, Ω, w) be an LP-type problem of combinatorial dimension δ. Then every set G ⊆ H has at most δ extreme constraints, and there is a set with exactly δ extreme constraints.

Proof. The latter statement immediately follows from Definition 3.7 of a (largest) basis. For the former, we use the simple observation that an extreme constraint of G is contained in every basis of G.
For the smallest enclosing ball problem SEB in R d , the combinatorial dimension is thus at most d+1, a consequence of the same previous bound for the maximum number of extreme points.
For strongly nondegenerate LP-type problems with fixed combinatorial dimension, we have complete control over the level sizes, and this is a classic result first proved by Clarkson in the context of linear programming [3], and later by Matoušek for LP-type problems [8]; below we apply yet another version [7, Theorem 4.1]. In fact, these results only require "normal" nondegeneracy.

Definition 5.3. Let P = (H, Ω, w) be an LP-type problem. P is called nondegenerate if every set G ⊆ H has a unique basis.
Lemma 5.4. Every strongly nondegenerate LP-type problem is nondegenerate, and every nondegenerate LP-type problem is weakly nondegenerate.
Proof. Nondegeneracy from strong nondegeneracy: if a set has two bases B, B', these have the same value, hence B = B' by strong nondegeneracy. Weak nondegeneracy from nondegeneracy: let G be a set with w(G) > −∞, and let B ⊆ G be the unique basis of G. Because of w(G) = w(B) > −∞, we have B ≠ ∅, and every constraint x ∈ B is extreme in G, since otherwise, we would find another basis in G \ {x}.
It is important to mention that every nondegenerate LP-type problem can be refined to a strongly nondegenerate one of the same combinatorial dimension. We have explained how this works for SEB after Definition 3.7, and the general construction is due to Matoušek [8]. A refinement P' of P has the property that for all G ⊆ H,

V_P(G) ⊆ V_{P'}(G), (2)

so the refinement can only add violated constraints. As a consequence of this discussion, a nondegenerate problem can be assumed to be strongly nondegenerate without loss of generality. Here is the bound on the k-level.
Theorem 5.5. Let P = (H, Ω, w) be a nondegenerate LP-type problem of combinatorial dimension δ, and let G ⊆ H, k ∈ N such that w_k(G) > −∞ and |G| ≥ k + δ. Then

|L_P^k(G)| = O(δ^k) for fixed k.

Proof. We apply a further refinement [8, Lemma 2.4] of P to a regular nondegenerate LP-type problem P', meaning that every set G of size at least δ has a unique basis of size exactly δ. Then we have |L_{P'}^k(G)| = O(δ^k) by the aforementioned classic results [7, Theorem 4.1]. Property (2) yields L_P^k(G) ⊆ L_{P'}^k(G), since there can never be more than k violated constraints. The statement of the theorem follows.
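For the toy interval problem (combinatorial dimension δ = 2), the k-level can be enumerated directly: a set C ∈ L_P^k(G) must arise by dropping the i smallest and the k − i largest points for some i, so there are exactly k + 1 level sets, comfortably within an O(δ^k)-type bound for fixed k. A brute-force check of Definition 4.1:

```python
from itertools import combinations

def violators(C, pts):
    # constraints of pts outside the smallest enclosing interval of C
    lo, hi = min(C), max(C)
    return {p for p in pts if p < lo or p > hi}

def k_level(G, k):
    # sets C obtained from G by removing k constraints, all violated by C
    return [C for C in combinations(sorted(G), len(G) - k)
            if violators(C, G) == set(G) - set(C)]

G = list(range(10))
for k in range(4):
    assert len(k_level(G, k)) == k + 1   # drop i smallest and k - i largest
```

The check exploits that constraints of C never violate C, so the defining condition reduces to: every removed point lies outside the remaining interval.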

SAMPLING WITH REMOVAL
We are approaching the main result of this paper that allows us to bound the expected number of constraints violated by a random sample R in an LP-type problem, after removal of the best k constraints from R. We start with the combinatorial dimension of the k-optimized version.
Lemma 6.1. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem of combinatorial dimension δ, let k ∈ N, and let P_k = (H, Ω, w_k) be its k-optimized version (which is an LP-type problem by Theorem 3.11). Then P_k has combinatorial dimension at most O(δ^{k+1}) for fixed k.
Proof. We already know that every set G has at most (k + 1) |L_P^{k+1}(G)| = O(δ^{k+1}) extreme constraints in P_k, by the bounds established in Theorems 4.2 and 5.5. According to Lemma 5.2, this also bounds the combinatorial dimension.
Here is our main result.
Theorem 6.2. Let P = (H, Ω, w) be a strongly nondegenerate LP-type problem of combinatorial dimension δ, and let r, k ∈ N, 0 ≤ k + δ ≤ r < n. Let R ⊆ H be a set of r constraints chosen uniformly at random, and let C k (R) be the unique k-witness of R, i.e. an (r − k)-subset of R with minimum possible w-value.
Let v_{k,r} = E[|V_P(C_k(R))|] be the expected number of constraints violated by C_k(R). Then v_{k,r} ≤ k + O(δ^{k+1}) (n − r)/(r + 1) for fixed k.
Proof. Lemmas 3.5 and 3.10 together show that v_{k,r} = k + E[|V_{P_k}(R)|], and the Sampling Lemma 2.3 (applied to the k-optimized version P_k) yields v_{k,r} ≤ k + δ(P_k) (n − r)/(r + 1), since δ(P_k) is by Lemma 5.2 an upper bound for the (expected) number of extreme constraints of any set. Plugging in the bound for δ(P_k) from Lemma 6.1, the result follows.
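Theorem 6.2 can again be sanity-checked on the interval problem (δ = 2). There, the best k removals drop the i smallest and k − i largest sample points for the i minimizing the remaining range, and a set has at most 2(k + 1) extreme constraints in P_k, so v_{k,r} ≤ k + 2(k + 1)(n − r)/(r + 1); this sharper constant is a property of the toy problem, not the general O(δ^{k+1}) bound. A Monte Carlo sketch with illustrative parameters:

```python
import random

def violated_after_removal(H, R, k):
    # remove the best k constraints from the sample R (i smallest and k - i
    # largest, for the i minimizing the remaining range), then count the
    # constraints of H violated by the re-optimized solution
    s = sorted(R)
    best = min(range(k + 1), key=lambda i: s[len(s) - 1 - (k - i)] - s[i])
    lo, hi = s[best], s[len(s) - 1 - (k - best)]
    return sum(1 for h in H if h < lo or h > hi)

random.seed(2)
n, r, k, trials = 1000, 100, 2, 1000
H = [random.random() for _ in range(n)]
avg = sum(violated_after_removal(H, random.sample(H, r), k)
          for _ in range(trials)) / trials
bound = k + 2 * (k + 1) * (n - r) / (r + 1)
```

The average always exceeds k (the removed constraints themselves are violated, as Lemma 3.5 predicts) and stays below the bound.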

LOWER BOUND
The quality of the bound in Theorem 6.2 is determined by the combinatorial dimension of the k-optimized LP-type problem, and it is natural to ask whether the bound that we have given in Lemma 6.1 is asymptotically best possible for the relevant case of fixed k. We currently have no "geometric" LP-type problem (such as SEB) for which we can prove tightness of the bound, but there is an abstract LP-type problem that does it.

The top nodes LP-type problem
Let H be the set of nodes of a rooted tree, and let C(u) be the set of children of node u. We assign positive weights to the nodes such that

w(u) > Σ_{v ∈ C(u)} w(v) (3)

for every node u. For G ⊆ H, we let w(G) = Σ_{u ∈ G} w(u) be the total weight of G. We choose w generic, meaning that different sets have different total weights. For a set of nodes G, its top nodes are the nodes that have no ancestor in G, and we use T(G) ⊆ G to denote the set of top nodes in G. For example, any set G containing the root node ρ satisfies T(G) = {ρ}. We now define

w^(δ)(G) = max { w(S) : S ⊆ T(G), |S| ≤ δ }, (4)

the largest total weight of at most δ top nodes of G. Let B(G) be the unique set of at most δ top nodes that achieves the maximum in (4); uniqueness follows from w being generic.
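In code, T(G) and w^(δ)(G) are straightforward; since the weights are positive, the maximum in (4) is attained by the δ heaviest top nodes. A minimal sketch (the toy tree and its weights are our own choice, satisfying (3)):

```python
def top_nodes(G, parent):
    # nodes of G that have no proper ancestor in G
    Gs = set(G)
    def has_ancestor(u):
        v = parent[u]
        while v is not None:
            if v in Gs:
                return True
            v = parent[v]
        return False
    return [u for u in G if not has_ancestor(u)]

def w_delta(G, parent, weight, delta):
    # w^(delta)(G): largest total weight of at most delta top nodes of G
    tops = sorted((weight[u] for u in top_nodes(G, parent)), reverse=True)
    return sum(tops[:delta])

# toy tree: root 0 with children 1 and 2; node 1 has children 3 and 4
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
weight = {0: 10.0, 1: 4.1, 2: 3.2, 3: 2.0, 4: 1.9}   # satisfies (3)
```

For G = {1, 2, 3, 4}, the top nodes are 1 and 2 and w^(2)(G) = 4.1 + 3.2 = 7.3, while any set containing the root has value 10.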
We have B(∅) = ∅ and hence w^(δ)(∅) = 0. All other sets have at least one top node and therefore positive value.
Theorem 7.2. Problem T = (H, R, w^(δ)) is a strongly nondegenerate LP-type problem of combinatorial dimension at most δ.
To prove this, we separately show monotonicity and locality (Lemmas 7.4 and 7.6 below). The bound on the combinatorial dimension (Definition 5.1) is a consequence of (4), in the form of the simple observation that w^(δ)(G) = w^(δ)(B(G)). Strong nondegeneracy (Definition 3.7) immediately follows from w being generic. The reader mainly interested in the lower bound construction can skip ahead to the next paragraph.
Here is a preparatory lemma.

Lemma 7.3. Let h ∈ H, and let S be a set of pairwise incomparable nodes in the subtree rooted at h. Then w(S) ≤ w(h).

Proof. We can obtain S from the set {h} by repeatedly replacing a node with some of its children. In each such replacement, the weight of the current set decreases according to (3). Now we can show monotonicity of w^(δ).
Lemma 7.4. For all F ⊆ G ⊆ H, we have w^(δ)(F) ≤ w^(δ)(G).

Proof. Every node of F ⊆ G lies in the subtree Q_t of exactly one top node t ∈ T(G). Let B' ⊆ T(G) be the set of top nodes t of G whose subtree Q_t contains a node of B(F). We get |B'| ≤ |B(F)| ≤ δ and w(B(F) ∩ Q_t) ≤ w(t) for every t ∈ B', where the latter inequality uses Lemma 7.3 applied with the set S = B(F) ∩ Q_t of top nodes. Hence

w^(δ)(F) = w(B(F)) ≤ w(B') ≤ w^(δ)(G),

and monotonicity is established.
Here is the major step towards locality: whether a constraint h is violated by G only depends on B(G).

Lemma 7.5. Let F, G ⊆ H with B(F) = B(G), and let h ∈ H. Then h violates F if and only if h violates G.

Locality is now an easy consequence: if F ⊆ G and 0 < w^(δ)(F) = w^(δ)(G), then w(B(F)) = w(B(G)), hence B(F) = B(G) since w is generic, and Lemma 7.5 yields that every h violating G also violates F.

Lower bound construction
We now show that for a suitable tree and suitable node weights, the k-optimized LP-type problem T_k = (H, R, w_k^(δ)) derived from T has combinatorial dimension Ω(δ^{k+1}).
Let us assume that k ≥ 1 is a constant, and that k + 2 divides δ ≥ 3. Then our tree will be the complete (δ/(k + 2))-ary tree with k + 3 levels. The total number of nodes is therefore

n = Σ_{i=0}^{k+2} (δ/(k + 2))^i = Θ(δ^{k+2}) for fixed k.

Here is the idea of the lower bound construction: To show that T_k has combinatorial dimension Ω(δ^{k+1}), we exhibit a set G ⊆ H with that many extreme constraints. We choose G = H \ {ρ}, i.e. the set of all nodes except the root, and we show that all interior nodes in G are extreme. Since the number of such interior nodes is

Σ_{i=1}^{k+1} (δ/(k + 2))^i = Θ(δ^{k+1}) for fixed k,

this yields the desired bound. To make all the interior nodes extreme, we choose the weight function w in such a way that w_k^(δ)(G \ {h}) < w_k^(δ)(G) for every interior node h. What makes this possible is the height of the tree: starting from G, the removal of k nodes cannot turn any leaf into a top node, but starting from G \ {h}, this is always possible. To exploit this, we need a weight function that assigns exceptionally small values to the leaves; then the most profitable way to remove k constraints from G \ {h} is to expose some leaf as a top node.
We choose w(u) ∈ (0, 1) if u is a leaf, and w(u) > δ if u is a parent of a leaf. For all other nodes u, we inductively select

Σ_{v ∈ C(u)} w(v) < w(u) < Σ_{v ∈ C(u)} w(v) + 1/k (5)

in a bottom-up fashion. It is clear that this can be done in such a way that the resulting w is generic; by construction, w also satisfies the condition (3).
Lemma 7.7. With w as previously defined, we have

w_k^(δ)(G \ {h}) < w(T(G)) − 1 < w_k^(δ)(G)

for every interior node h ∈ G; here T(G) is the set of top nodes in G, the δ/(k + 2) children of the root.
In particular, w_k^(δ)(G \ {h}) < w_k^(δ)(G) for all interior nodes h ∈ G, so h is an extreme constraint in G w.r.t. the k-optimized LP-type problem T_k. With Lemma 5.2 we obtain the following

Corollary 7.8. With the tree structure and weights as above, the combinatorial dimension of the k-optimized LP-type problem T_k is Θ(δ^{k+1}).
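For the smallest admissible parameters with arity at least 2 (k = 1, δ = 6, arity 2, four levels), the whole construction is small enough to verify by brute force. The concrete weights below are our own choice in the spirit of the construction (tiny on leaves, above δ on their parents, barely above the children's total elsewhere; jitter keeps w generic); the final assertion checks that every interior node of G = H \ {ρ} is extreme in T_1:

```python
from itertools import combinations

k, delta, arity, levels = 1, 6, 2, 4      # complete binary tree, k + 3 levels

parent, children, level = {0: None}, {0: []}, {0: 1}
nodes, frontier = [0], [0]
for lv in range(2, levels + 1):
    nxt = []
    for u in frontier:
        for _ in range(arity):
            v = len(nodes)
            nodes.append(v)
            parent[v], children[v], level[v] = u, [], lv
            children[u].append(v)
            nxt.append(v)
    frontier = nxt

weight = {}
for u in sorted(nodes, key=lambda x: -level[x]):       # bottom-up
    if not children[u]:
        weight[u] = 0.5 + u * 1e-4                     # leaf, in (0, 1)
    elif all(not children[v] for v in children[u]):
        weight[u] = delta + 1 + u * 1e-4               # parent of a leaf
    else:
        weight[u] = sum(weight[v] for v in children[u]) + 0.01 + u * 1e-4

def tops(G):
    Gs = set(G)
    def covered(u):
        v = parent[u]
        while v is not None:
            if v in Gs:
                return True
            v = parent[v]
        return False
    return [u for u in G if not covered(u)]

def w_d(G):
    return sum(sorted((weight[u] for u in tops(G)), reverse=True)[:delta])

def w_k(G):
    G = list(G)
    return min(w_d(C) for C in combinations(G, len(G) - k))

G = [u for u in nodes if u != 0]             # everything except the root
interior = [u for u in G if children[u]]     # levels 2 and 3: six nodes
assert len(interior) == 6
assert all(w_k([u for u in G if u != h]) < w_k(G) for h in interior)
```

The key effect is visible in the numbers: removing one interior node h first, a single further removal exposes leaves below h, losing a parent-of-leaf weight above δ, while from G itself one removal only loses a jitter amount.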
Proof. (Lemma 7.7.) For the lower bound on w_k^(δ)(G), we successively remove the best k constraints from G (in increasing order of distance to the root) and look at the values w^(δ)(G_i), where G_i is the set after removing i constraints. The value w^(δ)(G_k) determines w_k^(δ)(G). We first observe that B(G_i) = T(G_i) for all i, because the arity of the tree allows us to increase the number of top nodes by at most δ/(k + 2) − 1 per removal step, so we will never have more than δ top nodes in the process. Now, if the i-th removed constraint is a top node in G_{i−1}, none of its children have been removed yet, so all of them become top nodes in G_i, and by the previous observation along with (5) we obtain

w^(δ)(G_i) > w^(δ)(G_{i−1}) − 1/k.

If the i-th removed constraint is not a top node in G_{i−1}, we get T(G_i) = T(G_{i−1}) and hence

w^(δ)(G_i) = w^(δ)(G_{i−1}).

After k removal steps, we therefore have

w_k^(δ)(G) = w^(δ)(G_k) > w^(δ)(G) − 1 = w(T(G)) − 1.

For the upper bound, we start from G_0 = G \ {h} for some interior node h and choose any path from a leaf to the root that contains node h. This path has exactly k interior nodes in G \ {h}. Removing all of them in turn, and in increasing order of distance to the root, monotonicity of w^(δ) yields

w^(δ)(G_i) ≤ w^(δ)(G_{i−1}) for all i,

where G_{k−1} is the set obtained before the last removal step, the one that removes a parent p of a leaf and yields G_k. In that step, all children of p become top nodes. By our choice of w, and again observing that throughout the process, B(G_i) = T(G_i), we have

w^(δ)(G_k) = w^(δ)(G_{k−1}) − w(p) + Σ_{v ∈ C(p)} w(v) < w^(δ)(G_{k−1}) − δ + δ/(k + 2) ≤ w^(δ)(G_{k−1}) − 1.

Together with the previous inequality, this yields w_k^(δ)(G \ {h}) ≤ w^(δ)(G_k) < w(T(G)) − 1, since G_k is a k-candidate of G \ {h}.

DISCUSSION AND OPEN PROBLEMS
If an LP-type problem (H, Ω, w) with |H| = n constraints has combinatorial dimension δ, the Sampling Lemma 2.3 together with Lemma 5.2 (every set has at most δ extreme constraints) implies that

v_r ≤ δ (n − r)/(r + 1). (6)
Here, v_r is the expected number of constraints violated by a random sample R ⊆ H of size |R| = r. This, however, is only a worst case bound. The true value of v_r as given by the Sampling Lemma is

v_r = x_{r+1} (n − r)/(r + 1),

where x_{r+1} is the average number of extreme constraints in a random sample of size r + 1. We know that x_{r+1} = δ if r + 1 = n and the basis of H happens to be among the ones of maximal size δ. In other cases, x_{r+1} might be much smaller than the combinatorial dimension δ. In particular, in Clarkson's algorithms [4, 6] and also in chance-constrained optimization [1, 2], the Sampling Lemma is used with sublinear values of r, and this is the whole point of the sampling approach.
Our lower bound result from the previous section therefore has no practical value in the applications: We cannot use it to construct an LP-type problem where sampling r constraints at random and then removing the best k yields

v_r = Ω(δ^{k+1} (n − r)/(r + 1))

for relevant values of r as a function of n.
On the positive side, we show that if better bounds on vr exist for relevant cases, they cannot be obtained with the help of just the combinatorial dimension. This is an important insight, since in all theoretical work that we are aware of, it is precisely the inequality (6) that is used to bound the expected number of violated constraints.
Our work therefore leads to the obvious open problem of finding bounds on xr+1 that improve over δ in relevant cases.
In practice, it might be prohibitively expensive to remove the k best constraints, so it is natural to ask whether other removal strategies lead to similar theoretical results. In the abstract setting of LP-type problems, our techniques do not seem to leave much room for this; after all, we crucially need the property that all removed constraints are violated by the k-optimized solution, and to prove this in Lemma 3.5, we needed a k-witness to have the smallest possible w-value among all k-candidates.
Another question is whether the requirement of strong nondegeneracy underlying our results can be relaxed. As pointed out in Section 5, it suffices to assume "normal" nondegeneracy, and in many geometric situations, this can be achieved through a (symbolic) perturbation of the input. In the abstract setting of LP-type problems, however, nondegeneracy is a more restrictive assumption: As shown by Matoušek and Škovroň [10], degeneracies may be removable only at the price of a substantial increase in the problem dimension.