maketitle begin abstract Given an point metric and an integer we consider the problem of covering by balls so as to minimize the sum of the radii of the We present randomized algorithm that runs in log cdot log Delta time and returns with high probability the optimal Here Delta is the ratio between the maximum and minimum interpoint distances in the metric We also show that the problem is even in metrics induced by weighted planar graphs and in metrics of constant doubling end abstract begin keywordname clustering cover clustering metric clustering planar metric doubling end keywordname section Introduction Given metric defined on set of points we define the ball ball centered at in and having radius geq to be the set in leq In this work we consider the problem of computing minimum cost cover for the given point set where is some given integer which is also part of the For kappa kappa cover for subset subseteq is set of at most kappa balls each centered at point in whose union covers contains The cost of set of balls denoted cost is the sum of the radii of those This problem and its variants have been well examined motivated by applications in clustering and coverage cite Doddi et cite consider the metric cover problem and the closely related problem of partitioning into set of clusters so as to minimize the sum of the cluster Following their terminology we will call the latter problem em clustering to minimize the sum of diameters They present bicriteria algorithm that returns clusters whose cost is within multiplicative factor log For clustering to minimize the sum of diameters they also show that the existence of polynomial time algorithm that returns clusters whose cost is strictly within of the optimal would imply that NP Notice that this hardness result does not imply the of the cover Charikar and Panigrahy cite give algorithm based on the method that gives constant factor approximation around for the cover problem and thus also constant factor approximation for clustering to minimize the sum of The well known center problem is variant of the cover problem where the cost of set of balls is defined to be the maximum radius of any ball in the The problem is and admits polynomial time algorithm that yields approximation cite Several other formulations of clustering such as median and clustering are as well cite Recently Gibson et cite gkk consider the geometric version of the cover problem where subset Re for some constant When the or infty norm is used to define the metric they obtain polynomial time algorithm for the cover With the norm they give an algorithm that runs in time polynomial in the number of points and in log returns cover whose cost is within eps of the optimal for any eps subsection Our Results Our first result generalizes the algorithmic approach of Gibson et cite gkk to the metric For the cover problem in the general metric setting we obtain an exact algorithm whose running time is log cdot log Delta where Delta is the em aspect ratio of the metric space the ratio between the maximum interpoint distance and the minimum interpoint The algorithm is randomized and succeeds with high Thus when Delta is bounded by polynomial in the running time of the algorithm is This result for the cover problem should be contrasted with the results for problems such as center median and clustering which hold when the aspect ratio is bounded by polynomial in The main idea that underlies this result is that if we probabilistically partition the metric into sets with at most half the original diameter cite then with high probability only log balls in the optimal cover of are cut by the recursive approach is then used to compute the optimal This algorithmic result raises the question of whether an algorithm whose running time is in is possible even when the aspect ratio is not polynomially Our second result shows that this is unlikely by establishing the of the cover The aspect ratio in the construction is about The metrics obtained are induced by weighted planar graphs thus establishing the of the cover problem for this special Our final result is that the cover problem is in metrics of constant doubling dimension for large enough This result is somewhat surprising given the positive results of cite gkk for fixed dimensional geometric The rest of this article is organized as In Section ref sec metric we present our algorithm for the cover In Section ref sec qptas we point out that our algorithmic result for the metric cover problem readily yields randomized approximation algorithm that runs in time log log frac eps and returns with high probability cover whose cost is at most eps times the cost of the optimal Notice that the running time does not depend on the aspect ratio of the input metric This approximation algorithm is obtained by applying simple transformation involving discretization that reduces the approximate problem to several instances of the exact metric kappa cover problem with aspect ratio bounded by mbox poly In Section ref sec hardness we establish the of the cover problem for metrics induced by weighted planar In Section ref sec ddhardness we establish for metrics of constant doubling section Algorithm for General Metrics label sec metric We consider the cover problem whose input is metric where is set of points and is function giving the interpoint distances and an integer We assume without loss of generality that the minimum interpoint distance is Let Delta denote diam the maximum interpoint We present randomized algorithm that runs in log log Delta time and with high probability returns the best cover for We will assume below that leq The main idea for handling the metric case is that probabilistic partitions cite can play role analogous to the line separators were used in the geometric case cite gkk To formalize this let denote some subset of such that diam geq and consider the following randomized algorithm taken from cite that partitions into sets of diameter at most diam noindent partition begin enumerate item Let pi denote random permutation of the points in item Let beta denote random number in the range diam item Let leftarrow item For leftarrow mbox to do begin enumerate item Let leftarrow in pi leq beta item Let leftarrow setminus end enumerate end enumerate begin algorithm ht caption partition begin algorithmic STATE Let pi denote random permutation of the points in STATE Let beta denote random number in the range diam STATE Let leftarrow FORALL leftarrow mbox to STATE Let leftarrow in pi leq beta STATE Let leftarrow setminus ENDFOR end algorithmic end algorithm Since each is contained in ball of radius at most diam we have that diam leq diam also partition Let us say that ball subseteq is em cut by this partition of if there are two distinct indices and such that cap cap neq emptyset and cap cap neq emptyset The main property that the probabilistic partition enjoys is encapsulated by the following lemma whose proof follows via the methods of Fakcharoenphol et cite begin lemma Let subseteq be some ball of radius The probability that is cut by the partition of output by partition is at most frac diam log label lem metric partition end lemma begin proof Let ldots denote the ordering of the points in according to increasing order of distance from broken We may assume that the lemma trivially For each let denote the distance to the closest furthest point in the triangle inequality it follows that leq We say that pi em settles if is the first index for which some point in Note that exactly one point in settles We say that pi em cuts if pi settles and at least one point in The probability that is cut by the partition equals sum Pr pi mbox cuts sum Pr mbox cuts The event that cuts requires the occurrence of two events the event that beta lands in the interval and the event that appears before ldots in the ordering pi Using independence begin eqnarray Pr mbox cuts leq Pr Pr Pr Pr leq frac diam cdot frac end eqnarray So the probability that is cut by the partition is bounded above by frac diam sum frac frac diam log end proof Let denote the optimal kappa cover for some kappa The following states the main structural property that begin lemma The expected number of balls in that are cut by partition is log Consequently the probability is at least the number of balls in that are cut by partition is at most log where is some label lem metric structure end lemma begin proof The expected number of balls in cut is equal to sum in Pr mbox is cut log sum in frac radius diam log frac cost diam The Lemma follows by observing that cost leq diam since can be covered by single ball of radius diam end proof subsection The Randomized Algorithm We describe recursive algorithm bccompute that takes as arguments set subseteq and an integer leq kappa leq and returns with high probability an optimal kappa cover for We begin by noting that we may restrict our attention to balls whose radius equals for some in Henceforth in this section we only refer to this set of For easing the description of the algorithm it is convenient to add to this set of balls an element cal whose cost is infty Any subset of this enlarged set of balls that includes cal will also have cost of infty noindent bccompute kappa begin description item If return the empty item Otherwise if kappa return cal not possible to cover item Otherwise if diam leq directly compute the optimal solution in polynomial In this case the optimal solution has cost at most so it consists of set of at most balls of radius plus zero or more singleton The number of such solutions is polynomial and our algorithm checks them item If diam then the algorithm repeats the following steps log times item hspace in Call tt Partition to obtain partition of into two or more Let ldots tau denote the nonempty sets in this item hspace in For each set of at most log balls where is the constant in Lemma ref lem metric structure item hspace in Let each leq leq tau and leq kappa leq kappa recursively call bccompute set returned in the local variable best item hspace in For leq leq tau let cup tau tau and Set local variables mybest tau kappa to equal mybest item hspace in For leftarrow tau mbox down to and leq kappa leq kappa set local variable mybest kappa to be the lowest cost solution among best cup best kappa kappa leq kappa item hspace in Let denote the lowest cost solution mybest kappa cup over all choices of tried above with leq kappa item Return the lowest cost solution obtained over the Theta log end description begin algorithm ht caption bccompute kappa begin algorithmic STATE If return the empty STATE Otherwise if kappa return cal not possible to cover STATE Otherwise if diam leq directly compute the optimal solution in polynomial In this case the optimal solution has cost at most so it consists of set of at most balls of radius plus zero or more singleton The number of such solutions is polynomial and our algorithm checks them FORALL log iterations STATE Call tt Partition to obtain partition of into two or more Let ldots tau denote the nonempty sets in this FORALL sets of at most log balls where is the constant in Lemma ref lem metric structure STATE Let each leq leq tau and leq kappa leq kappa recursively call bccompute set returned in the local variable best STATE For leq leq tau let cup tau tau and Set local variables mybest tau kappa to equal mybest FORALL leftarrow tau mbox down to and leq kappa leq kappa STATE set local variable mybest kappa to be the lowest cost solution among best cup best kappa kappa leq kappa STATE Let denote the lowest cost solution mybest kappa cup over all choices of tried above with leq kappa ENDFOR ENDFOR ENDFOR STATE Return the lowest cost solution obtained over the Theta log end algorithmic end algorithm paragraph Running To solve an instance kappa of the problem with diam geq the algorithm makes log recursive calls to instances with diameter at most diam log It follows that the running time of the algorithm invoked on the original instance is log cdot log Delta paragraph We will show that bccompute computes an optimal cover for with high We begin by noting that the base case instances kappa are solved correctly with probability of We will show by induction on that any instance kappa with geq is optimally solved with probability of at least frac If the kappa instance happens to fit in one of the base cases we are Otherwise consider an optimal kappa cover opt for It is enough to show that bccompute kappa returns kappa cover of cost at most mycost opt with probability of at least frac By Lemma ref lem metric structure the probability is at least frac that one of the log calls to tt Partition returns partition ldots tau of into tau geq sets such that no more than log balls in opt are cut by the Assuming this good event happens fix such partition ldots tau of and consider the choice of that exactly equals the balls in opt that are cut by the The balls in opt setminus are not cut by the partition and can be partitioned into subsets opt ldots opt tau some of these can be empty such that for any ball in opt we have cap subseteq It is easy to see that opt must be an optimal opt cover for bccompute if The probability that bccompute opt cover for prod geq prod frac geq frac Assuming this second good event also happens it follows from an easy backwards induction on that mybest sum opt is sum opt cover for with cost at most sum mycost opt Thus mybest kappa is an kappa cover for sum tau with cost at most sum tau mycost opt Thus mybest kappa cup is kappa cover of with cost at most mycost opt The probability of this happening is at least the product of the probabilities of the two good events we assumed which is at least frac This completes the inductive step because bccompute kappa returns the lowest cost kappa cover among the log kappa covers that it begin theorem label thm metric There is randomized algorithm that given set of points in metric space and an integer runs in log cdot log Delta time and returns with probability at least cover for Here Delta is an upper bound on the ratio between the maximum and minimum interpoint distances within end theorem section Approximation in Time label sec qptas In this section we describe how the result of the previous section can be used to obtain time approximation scheme for the cover That is we describe randomized algorithm that takes as input set of points in metric space an integer and parameter eps and returns cover whose cost is with probability at least factor of eps of the optimal cover the running time of the algorithm is log log frac eps Since our algorithm is rather standard way of reducing the problem to instances whose aspect ratio is bounded by polynomial in frac eps we describe it here only for completeness and only sketch the proof of We assume without loss of generality that leq leq Let lambda denote the cost of the optimal cover of We first obtain crude approximation to lambda by computing in polynomial time approximation lambda to the optimal center cost for cite Observe that frac lambda leq lambda leq lambda We then compute minimum spanning tree of under the input metric Let ldots tau denote the connected components obtained by removing from the minimum spanning tree all edges of length strictly greater than lambda Notice that the distance between any two points that are in different components is strictly larger than lambda This implies that any ball in an optimal cover of is contained within one of the The maximum distance within each is at most lambda For each and for each leq leq we compute an approximation best to the optimal cover of as described Compute minimal subset subseteq with the property that for each in there is in such that frac eps lambda The aspect ratio of is bounded by polynomial in frac eps Run the algorithm of the previous section log times with and and take the minimum cost This is with probability at least frac the optimal cover of Expand the radii of each of these balls by frac eps lambda to obtain cover best of It is not hard to see that with probability at least frac the cost of best exceeds the cost of the optimal cover of by at most frac eps lambda Let and let cup for leq leq tau note that tau Assign best to best for each leq leq For each cdots tau and for each leq leq set best to be the solution among best cup best leq leq We return best tau as our cover of It can be verified that with probability at least frac eps lambda leq eps lambda of lambda the cost of an optimal begin theorem label thm qptas There is randomized algorithm that takes as input set of points in metric space an integer and parameter eps and returns in log log frac eps time cover for whose cost is with probability at least multiplicative factor of eps of the optimal end theorem section of Cover label sec hardness natural question is whether there is quasipolynomial time algorithm in that returns the exact optimum for the case where the input metric has unbounded aspect This is unlikely to be the case because as we show in this section the general problem is even in case of planar We give reduction from version of the planar SAT problem the textit SAT This problem was shown to be in cite Planar SAT is defined as follows Let Phi be an instance of SAT with variable set ldots and clauses ldots such that each clause consists of exactly Define textit formula graph Phi with vertex set bigcup and edges bigcup where le le footnote Here we assume that the arithmetic wraps around and contains or overline SAT formula Phi is called textit planar if the corresponding formula graph Phi is The edge set defines cycle on the vertices and thus divides the plane into exactly Each node in lies in exactly one of those two In the textit SAT problem we have the additional restriction that there exists planar drawing of Phi such that if and We describe simple transformation easily seen to be effected by polynomial time algorithm from such textit SAT instance to an instance of finding an optimal cover in metric induced by weighted planar graph The transformation has the property that there is cover in the metric of cost at most if and only if the original textit SAT instance is We set The vertex set of the graph is union of sets set overline ldots overline that can be identified with the set of variables of the textit SAT instance with each variable occurring twice once as positive literal and once as negative literal set ldots that can be identified with the set of clauses of the textit SAT instance and sets ldots where each consists of To obtain the edge set we add an edge between each vertex and overline in with weight for leq leq For each vertex in we add an edge between and every vertex in of weight for le le Analogously we add an edge between each vertex overline and every vertex in again of weight In addition we add edges between every vertex in and every variable vertex or its negation overline whichever appears in it of weight Note that this graph is planar this follows from the of the SAT and it maintains the property of Phi by which all clauses in which the variable appears lie in one face of the plane and those in which overline appears lie in the other This is because the only change we make in the planar Phi is that we delete the edges between vertices and In addition we expand every vertex into gadget that consists of and overline with an edge between them and other vertices of the set with edges incident on both and It is easy to see the resulting graph is planar and satisfies the above property of clause separation according to the sign of variable that graph representing textit SAT instance would See Figure ref figure NPC for an begin figure htpb centering begin tabular hspace linewidth includegraphics height linewidth includegraphics width linewidth end tabular caption The gadget for variable in Phi planar embedding for Phi and construction of the corresponding instance of clustering All for the variable The optimal cover is highlighted with grey blobs Phi neg vee vee wedge vee neg vee neg wedge vee neg vee neg wedge vee neg vee Satisfying assignment Weight of the covering is exactly label figure NPC end figure begin claim Any cover of whose cost is at most includes for each leq leq ball centered at either or overline with radius at least end claim begin proof Consider any cover of and let be the largest index such that there is no ball in the cover centered at either or overline and having radius at least So for each leq leq there is ball in the cover centered at either or overline and having radius at least Since has points in it there is point in that is not the center of any ball in the Let be some ball in the cover that covers If for some leq leq then has radius at least cdot In this case the cover has cost at least cdots cdot If neq for any leq leq then the radius of is at least since the distance of from any point other than and overline is at least cdot Thus in this case too the cover has cost at least cdots cdot end proof Now suppose the original textit SAT instance is yes So there is an assignment of truth values to cdots such that all clauses in are Consider the set of balls ldots where is centered at or overline whichever is satisfied by the assignment and has radius It is easily checked that these balls form cover of of cost cdots Now suppose the original textit SAT instance is no We claim that any cover of has cost strictly greater than in this Suppose this is not the case and consider cover of cost at most As consequence of the claim such cover must consist of balls ldots where is centered at either or overline and has radius precisely Since these balls must cover each vertex in it follows that the assignment of truth values to variables in which comprises of being true if the ball is centered at and false if it is centered at overline satisfies all clauses in This contradicts the supposition that the original textit SAT instance is no begin theorem label thm hardness The decision version of the problem of computing an optimal cover for an point planar metric is end theorem section The Doubling Metric Case label sec ddhardness We now consider the cover problem when the input metric has doubling dimension bounded by some constant rho geq The doubling dimension of the metric is said to be bounded by rho if any ball in can be covered by rho balls of radius enough constant rho the cover problem for metrics of doubling dimension at most rho is The proof is by reduction from restricted version of SAT where each variable appears in at most clauses cite Let Phi be such CNF formula with variables ldots and clauses ldots We describe simple transformation easily seen to be effected by polynomial time algorithm from such SAT instance Phi to an instance of finding an optimal cover in metric induced by weighted graph The metric will have doubling dimension bounded by some The transformation has the property that there is cover in the metric of cost at most if and only if the original SAT instance is The transformation is similar to the one in the previous section with some modifications to ensure the doubling dimension We set The vertex set of the graph is union of sets set overline ldots overline that can be identified with the set of literals in Phi set ldots that can be identified with the set of clauses of Phi and sets ldots where each consists of vertices ldots To obtain the edge set we add an edge between and overline with weight for leq leq We add an edge between and every vertex in of weight for le le Analogously we add an edge between overline and every vertex in again of weight In addition we add edges between every vertex in and every literal that appears in the clause If the literal is either or overline the weight of the corresponding edge is Finally for each leq leq and each leq leq we add an edge of weight Figure ref figure doubling for an illustration of the begin figure htpb centering begin tabular hspace linewidth Configuration includegraphics height linewidth includegraphics width linewidth Configuration includegraphics height linewidth includegraphics width linewidth Configuration includegraphics height linewidth includegraphics width linewidth Configuration includegraphics height linewidth includegraphics width linewidth end tabular caption The gadget for the variable in Phi Each edge between and has weight exactly the number of representation of an instance of clustering on doubling metric constructed from an instance of Phi All for variable The optimal cover is highlighted with grey blobs Phi neg vee vee wedge vee neg vee neg wedge vee neg vee neg wedge vee neg vee Satisfying assignment Weight of the covering is exactly label figure doubling end figure begin lemma There is constant rho geq so that the doubling dimension of the metric induced by the graph is bounded by rho end lemma begin proof Let be some ball in the If then either the ball consists of singleton vertex or subseteq for some and the subgraph of induced by is In either case it is easily verified that balls centered within and having radius We therefore consider the case geq Let be the largest integer that is at most such that leq For each in we place balls of radius at overline cap ii clause vertices incident to or overline that are in and iii points of cap so that these balls cover cap this is possible because cap induces path of length at most In addition if in for some we place balls of radius these balls cover cap Finally we place ball of radius that these cover Let denote the set of centers at which we have placed Let in be point that is not in or in for in or in if in Fix shortest path from to and let on this path that is in We first observe that none of the internal vertices on the path from to is in for any Furthermore if in for some then by assumption not in Thus all edges of the subpath from edge can have weight or greater because if leq No such edge can have weight for in because otherwise the endpoint of the edge closer to would be in Thus every edge on the subpath from It is easy to see that the subpath contains at most edges of weight for any leq Thus the weight of the subpath from cdots cdot So is in the ball of radius end proof begin claim Any cover of whose cost is at most includes for each leq leq ball centered at either or overline with radius at least label ddhardnessclaim end claim begin proof Consider any cover of and let be the largest index such that there is no ball in the cover centered at either or overline and having radius at least So for each leq leq there is ball in the cover centered at either or overline and having radius at least If some point in is covered by some for leq leq then has radius at least cdot In this case the cover has cost at least cdots cdot If some point in is covered by ball different from the not centered at any of the points in then the radius of is at least Note that by assumpion can or overline Thus in this case too the cover has cost at least cdots cdot The only remaining case is when each point in is covered by some ball centered at point in Since there can be at most balls in the cover centered within the sum of the radii of these balls is at least frac left frac frac right cdot The cover has cost at least cdots cdot end proof We now argue that the transformation has the property that there is cover in the metric of cost at most if and only if the original SAT instance Phi is Suppose that Phi is Then we can choose for each leq leq exactly one of or overline such that within each clause of Phi there is chosen Consider the set of balls ldots where has radius and is centered at or overline whichever was These balls form cover of with cost For the reverse direction consider cover of the target metric space of cost at most It follows from Claim ref ddhardnessclaim that the cover must consist of balls ldots where is centered at either or overline and has radius precisely Let us choose the literals corresponding to the centers of these For each we clearly choose exactly one of of overline Consider any clause vertex It must be covered by at least one of the balls Given the radii of the balls the only balls that can cover are the ones centered at literals contained in the It follows that our set of chosen literals contains for each clause in Phi at least one of the literals contained in the Thus Phi is begin theorem For large enough constant rho geq the decision version of the cover problem for metrics of doubling dimension at most rho is label thm ddhardness end theorem section Acknowledgements We thank Chandra Chekuri for his suggestion to study the bibliographystyle plain bibliography clusterbib end document