Discovery potential of top-partners in a realistic composite Higgs model with early LHC data

Composite Higgs models provide a natural, non-supersymmetric solution to the hierarchy problem. In these models, one or more sets of heavy top-partners are typically introduced. Some of these new quarks can be relatively light, with a mass of a few hundred GeV, and could be observed with the early LHC collision data expected to be collected during 2010. We analyse in detail the collider signatures that these new quarks can produce. We show that final states with two (same-sign) or three leptons are the most promising discovery channels. They can yield a 5 sigma excess over the Standard Model expectation already with the 2010 LHC collision data. Exotic quarks of charge 5/3 are a distinctive feature of this model. We present a new method to reconstruct their masses from their leptonic decay without relying on jets in the final state.


Introduction
The Standard Model (SM) of particle physics has so far been experimentally confirmed in many of its aspects. Yet, a fundamental piece is still missing; namely, the understanding of the mechanism responsible for the breaking of the SU (2) L × U (1) Y electroweak symmetry. The 'minimal' description provided in the SM consists in the introduction of a complex scalar doublet ϕ. Electroweak symmetry breaking (EWSB) is achieved assuming that this field acquires a non-zero vacuum expectation value. After EWSB, only one physical degree of freedom survives: the Higgs boson. Experimental results point towards a relatively light particle. If the Higgs boson exists, it should be within reach of the LHC.
A light fundamental scalar is not natural, though. Radiative corrections are expected to drive its mass close to the Planck scale (or to the scale of onset of some new physics). An elegant way to prevent this is through symmetries. The most famous example is supersymmetry, that exploits the cancellation between the contributions given by fermions and by bosons to the Higgs self-energy. This is not the only solution. Composite Higgs models [1,2] provide an alternative mechanism to explain the lightness of the Higgs boson. In these models the Higgs boson arises as a composite state of some new, strongly interacting sector. The new sector possesses a global symmetry that is spontaneously broken at some scale f . This symmetry breaking gives (at least) four Goldstone bosons that can be arranged into a complex SU (2) L doublet, which we identify with the Higgs doublet. Upon gauging the electroweak symmetry group, the Higgs boson acquires a potential, and hence a mass. Since we are interested in the low-energy regime of this strongly coupled theory, we can adopt an effective Lagrangian approach [3]. We will consider the minimal symmetry breaking pattern SO(5)/SO(4) [4], that also preserves custodial symmetry.
The Higgs boson is not necessarily the only composite state of the new sector to be relatively light. In particular, the mixing of the top with composite quarks can explain the large top mass. These composite quarks can give significant contributions to the electroweak precision observables, thus modifying the region of parameter space that is allowed for these models [5,6,7,8,9,10]. For this study, we focus on a non-minimal realization of this model, where two multiplets of top-partners in the fundamental representation of SO(5) are introduced [10].
The LHC is expected to run throughout this year at a center of mass energy √ s = 7 TeV, opening an unprecedented window for searches of new phenomena in particle physics. The first glimpse of new physics could well be due to new heavy quarks, which are a rather common feature of Beyond the Standard Model (BSM) scenarios. The discovery potential of such heavy quarks has been studied in the context of little and littlest Higgs models [11,12,13], warped extra dimensions [14,15], fourth generation quarks [16,17,18,19] and generic vector-like quarks in isospin singlets or doublets and with different hypercharge [20]. If new quarks are observed, we will need a way to understand which model they point at. For this reason we focus on collider signatures that can be considered distinctive of the composite Higgs model under study. In particular, we look for configurations in which either two charge 5/3 quarks or a full, almost degenerate 4 of SO(4) lie within the reach of the 2010 runs at the LHC. For these distinctive signatures, we discuss the phenomenology and study the discovery potential on the basis of 200 pb −1 of collision data at √ s = 7 TeV. We study the event yield with respect to the SM expectation in various multi-lepton channels for different points in the parameter space. We outline a new method to reconstruct the mass of a charge 5/3 top-partner exploiting its leptonic decay channel. We show that with only about 50 signal events in the same-sign di-lepton final state, this method can be used to judge if the signal is mainly due to one charge 5/3 quark or rather produced by the contributions from multiple top-partners.
The paper is organized as follows. In section 2 we review the composite Higgs model of ref. [10]. In section 3 we discuss the general features of the phenomenology of the two distinctive signatures of the model. In section 4 we describe how the model was implemented in an event-generator to allow for consistent event generation within a specific point in parameter space. The generation of signal and background samples and the fast detector simulation are discussed in section 5. Section 6 is dedicated to the description of the discovery potential in multi-lepton final states. We focus on two particularly interesting points and discuss their phenomenology and the discovery potential by means of a robust cut-based analysis. In section 7, we present a new method to reconstruct the mass of a charge 5/3 top-partner via its leptonic decay.

The Higgs sector
We consider a strongly interacting sector that can be described at low energy by a nonlinear sigma model. The cutoff of this model is $\Lambda_{UV} = 4\pi f/\sqrt{N_G}$, where $N_G$ is the number of Goldstone bosons and f is the scale at which the SO(5) → SO(4) breaking occurs. This scale is assumed to be larger than the EWSB scale v = 174 GeV. Too large a value of f would introduce substantial fine-tuning in the model [6]; on the other hand, if the scale of new physics is too low, large contributions to electroweak parameters and flavour physics are introduced. For these reasons we set f = 500 GeV, which corresponds to a ∼10% fine-tuning [6].
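As a quick numerical illustration of the cutoff formula, the following sketch evaluates it for f = 500 GeV. It assumes N_G = 4, i.e. the four Goldstone bosons of the SO(5)/SO(4) breaking pattern mentioned in the introduction; the function name is ours and purely illustrative.

```python
import math

def cutoff_scale(f_gev, n_goldstone):
    """Return Lambda_UV = 4*pi*f / sqrt(N_G) in GeV."""
    return 4.0 * math.pi * f_gev / math.sqrt(n_goldstone)

# Assumption: N_G = 4 Goldstone bosons for SO(5) -> SO(4).
lambda_uv = cutoff_scale(500.0, 4)
print(f"Lambda_UV = {lambda_uv:.0f} GeV")
```

Under this assumption the cutoff comes out at roughly 3 TeV, well above the sub-TeV quark masses considered in the rest of the study.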
The SO(5) → SO(4) breaking is realized through a scalar φ subject to the constraint In the non-linear representation where φ 0 = (0, 0, 0, 0, f ) is the vacuum state that preserves SO(4), Tâ are the four broken generators and hâ the corresponding Goldstone bosons. Expanding the exponential, we get We denote by ϕ andφ the SM Higgs doublets with hypercharge +1/2 and -1/2. Finally, we gauge SU (2) L and the T 3 R generator of SU (2) R . This explicitly breaks the SO(5) symmetry and induces a potential for the Higgs boson, that becomes a pseudo-Goldstone boson. Since the potential is generated at loop level, the mass of the Higgs boson is expected to be light. Throughout this study we set m h = 120 GeV.
The usual relation for the mass of the W boson, holds provided that we set For s α = 0, φ = φ 0 , electroweak symmetry remains unbroken and the gauge bosons are massless, while s α = 1 corresponds to maximal EWSB. Higgs compositeness, together with the requirement for canonical normalization of the kinetic term, leads to a rescaling of the physical Higgs field by a factor c α = 1 − 2v 2 /f 2 . This implies an analogous reduction of the couplings between the Higgs and the gauge bosons and gives in turn some dependence of the electroweak precision test (EWPT) observables on the UV cutoff of the model. In fact, in the SM the Higgs boson regulates the logarithmic divergencies of the gauge bosons self-energies. In the heavy Higgs approximation, the Peskin-Takeuchi parameters S and T [21] read where m h is the mass of the Higgs boson and a S,T , b S,T are constants. The reduction of the Higgs boson couplings to the gauge boson spoils the cancellation of the logarithmic dependence on the UV cutoff, so that now This can be taken into account when one computes EWPT observables by replacing the Higgs mass with an effective mass [6] As a consequence, we obtain an extra positive contribution to S and a negative contribution to T , where c W is the cosine of the Weinberg angle and m h,ref is the Higgs mass used in the electroweak fit. On top of this, one can expect the strongly coupled dynamics itself to affect EWPT observables through some higher-dimensional operator. This model includes custodial symmetry to protect the T parameter. A reasonable estimate of the contribution to S is [6] Combining the effect from Higgs compositeness and higher-order operators, one typically obtains too large contributions to the S and T parameters and the model is not compatible with current EWPT constraints [6,9,10]. Yet, one can expect other composite states to be as well below the cutoff of the effective theory. 
Here we will consider the case of fermionic resonances and analyze how they can improve the agreement of the model with observations.

The fermionic sector
We consider vector-like resonances of composite fermions transforming in the fundamental representation of SO (5). We denote them by Ψ i , with the index i running over the multiplets included below the cutoff. The corresponding mass Lagrangian is [9,10] where y ij is a Hermitian matrix. Under the electroweak gauge group, Ψ decomposes as Ψ = (Q, X, T ), where Q and X are SU (2) L doublets with hypercharge +1/6 and +7/6 respectively, and T is a SU (2) L singlet with hypercharge 2/3. The X doublet introduces another quark of electromagnetic charge 2/3, which can mix with the top, and a quark with charge 5/3. Such quarks are one of the distinguishing features of the model. The SM quarks q L and t R have the same quantum numbers as Q and T , respectively. The most generic interaction between the top sector and the new quarks is therefore of the form Combining eqs. (2.10) and (2.11) we obtain the mass matrices for the quarks of charge 2/3 and for the quarks of charge -1/3. The indices u and d denote respectively the charge 2/3 and -1/3 components of the doublet indicated. In the case of more fermionic resonances, the mass matrices are to be understood as in block form. Note that in eq. (2.13) we introduced an explicit SO(5) breaking term to give a mass to the bottom quark. We could also generate a mass for the bottom quark in an SO(5) preserving fashion. For example, we could couple the bottom quark to some new multiplets of SO(5), as we did for the top quark. This would come at the expense of introducing extra particles. Since the mass of the bottom quark is small, we do not expect large effects from bottom compositeness. We opt therefore for a minimal description, in which the bottom mass is generated with the current particle content of the model. The couplings of the fermions to the Higgs boson are obtained expanding the second term in eq. (2.10) around the vev of φ. 
For example, the couplings of the charge 2/3 quarks to the Higgs boson are given by Here we already included the suppression factor c α in the Higgs couplings.
As it was emphasized in refs. [6,8,9,10,22], composite Higgs models with only one set of fermionic resonances below the cutoff of the effective theory are very constrained from EWPT. The charge 5/3 quark is the lightest new particle predicted, with a mass m 5/3 500 GeV. Above it and rather close in mass (∆m 100 GeV) is a charge 2/3 quark. The other quarks are typically much heavier. The most relevant collider signatures therefore come from the production and decay of the charge 5/3 quark. These signatures have been studied in detail in [23,24].
The scenario dramatically changes if we include a second set of composite fermions below the cutoff [10,25]. Constraints from EWPT become less stringent, and many different mass patterns are allowed in the region accessible with first LHC data. In the next sections we discuss the collider signatures and discovery potential of this model.

Parameter-space scan
We scan over the parameter space of the model in order to find regions compatible with EWPT. From eqs. (2.10)-(2.14) we see that there are 6 variables parametrizing the fermionic sector with one multiplet below the cutoff, and 11 for the case of two multiplets. In both cases, s_α is fixed through eq. (2.4), as we have v = 174 GeV and f = 500 GeV. We fix two further parameters so as to obtain the measured top and bottom masses [26,27]. This is most easily done if we factor one of the parameters, say M_1 (λ_b), out of the mass Lagrangian (2.12) (or eq. (2.13) for the bottom quark). Then we diagonalize the remaining part of the mass matrix and fix M_1 (λ_b) so that the mass of the lightest quark is m_t (m_b). We are left with eight free parameters in the case of the two-multiplet model. We require that the resulting quarks contribute to EWPT observables in such a way as to make the model compatible with observations. We use the same fit as in [10] to assess the agreement between a point in parameter space and experimental constraints. Furthermore, we exploit the value of χ² that parametrizes this comparison in order to drive our Vegas-based analysis [28]. The procedure is the following. We use Vegas to randomly sample the eight-dimensional parameter space. For each point sampled, the value returned to Vegas as an 'integrand' is 1/χ². By construction, Vegas will focus its sampling on the points that lead to a higher value of the integrand 1/χ², i.e., to a better agreement with EWPT. We retain points that are compatible with EWPT at 99% C.L. We further refine our search by asking for signals which are characteristic of the two-multiplet model. As we said, in the case of only one multiplet below the cutoff, the mass spectrum of the new resonances is typically rather spread out. The charge 5/3 quark has a mass of a few hundred GeV, while the charge -1/3 quark is very close to the cutoff.
A signature of a two-multiplet model would then be a charge 5/3 quark in a 4 of SO(4), i.e. very close in mass to two charge 2/3 quarks and one charge -1/3 quark. We require the mass difference among these particles to be at most 60 GeV, so that decays through off-shell gauge bosons are strongly suppressed. Another typical signature of the model is the presence of both charge 5/3 quarks. We take these two signatures as clear indications of this particular model and focus on their discovery potential with early LHC data. With 200 pb⁻¹ of collision data at 7 TeV, a significant number of quarks with masses below ∼ 500 GeV should be produced¹. We will set this value as an upper bound in our search for the two distinctive patterns that we just discussed. Direct searches have set lower bounds for the mass of new quarks. We use the most recent results from the Tevatron on the exclusion of a charge 5/3 top-partner [29]. For this quark, the only decay channel is tW⁺, as in the reference. We do not, however, use the most stringent exclusion bounds on the charge -1/3 and 2/3 quarks, [29,30] and [31], as they assume the new quarks to decay entirely through either W or Z. This is not the case in our model. Therefore, as lower mass bounds we set

Phenomenology of the two-multiplet composite Higgs model
In this section, we outline some of the basic features of the phenomenology that we expect from the two-multiplet model. This phenomenology is largely determined by the mass hierarchy of the 10 new quarks. The mass eigenstates (ordered according to increasing mass) of the new top-like quarks will be named t 1 , t 2 , t 3 , t 4 , t 5 and t 6 , whereas the charge 5/3 quarks and the bottom-like quarks will be denoted as x 1 , x 2 and b 1 , b 2 , respectively.
Dominant decay modes. The process $gg \to q\bar{q}$ plays the dominant role in the production of heavy top-partners at the LHC. Therefore, we analyse the decay chains that start from pair-produced quarks. Figures 1 and 2 show two Feynman diagrams for a possible decay chain of a $t_1\bar{t}_1$ and an $x_1\bar{x}_1$ pair. In all points of the parameter space that satisfy the selection criteria of section 2.3, the two lightest new quarks are x_1 and t_1. The signatures from this model that could be observed early at the LHC will therefore be dominated by the decay modes of these two quarks. We also find that their mass difference is always too small for the heavier of the two to decay into the lighter. Consequently, only the following channels are accessible for the decay of the two new quarks:

$x_1 \to t\,W^+, \qquad t_1 \to b\,W^+,\; t\,Z,\; t\,h.$ (3.1)

The lightest bottom-like quark b_1, which we always find to be heavier than t_1 and x_1, decays predominantly via

$b_1 \to t\,W^-.$ (3.2)

The other possible decays, $b_1 \to bZ$ and $b_1 \to bh$, are strongly suppressed because of the small off-diagonal couplings. Such small mixing is a consequence of the fact that the bottom quark is mainly fundamental. We find that the decay $b_1 \to t_1 W^-$ is not kinematically accessible.

Figure 1: Example Feynman diagram for t_1 pair production with a possible decay chain.

¹ For reasons that we will explain later, we focus on decay channels which produce at least two charged leptons in the final state. We estimate a leading order cross section of 207.8 ± 0.5 fb for pair production of a quark with a mass of 500 GeV at √s = 7 TeV (the stated uncertainty is due to statistics only). Taking into account the branching ratio for the W and Z bosons to decay leptonically, we cannot expect to observe more than a handful of events in the considered channels for an integrated luminosity of 200 pb⁻¹.
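The footnote's estimate of "a handful of events" for a 500 GeV quark can be checked with a short calculation. The sketch below multiplies the quoted cross section by the integrated luminosity; the leptonic branching ratio of the W boson (about 0.21 for e and µ combined) is an approximate value that we assume here for illustration, and the combinatorics of final states with more than two W bosons is deliberately ignored.

```python
def expected_events(cross_section_fb, luminosity_inv_pb):
    """Number of produced events: sigma * L, with 1 pb = 1000 fb."""
    return (cross_section_fb / 1000.0) * luminosity_inv_pb

# Values quoted in the text: 207.8 fb at 500 GeV, 200 pb^-1 of data.
n_pairs = expected_events(207.8, 200.0)

# Assumption: BR(W -> e nu or mu nu) is roughly 0.21 in total.
BR_W_TO_E_OR_MU = 0.21

# Rough yield when two given W bosons in the event both decay to e or mu.
n_two_leptonic_w = n_pairs * BR_W_TO_E_OR_MU ** 2

print(f"produced pairs: {n_pairs:.1f}")
print(f"with two leptonic W decays: {n_two_leptonic_w:.1f}")
```

Only about forty pairs are produced before any branching ratio or selection efficiency is applied, which is why 500 GeV is used as the upper mass bound in the scan.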
Phenomenology of the 4 of SO(4). We consider x 1 , t 1 , t 2 and b 1 to form a 4 of SO(4) when the maximal mass difference among the quarks is 60 GeV. In this way none of the new quarks can decay into another one, since decays through the W , Z and h bosons are not kinematically allowed. Consequently, all these four new quarks can only decay to the SM top and bottom quarks.
Phenomenology of the XX. The phenomenology can be much richer if both charge 5/3 quarks are below 500 GeV and no restriction on the maximal mass difference among the new quarks is imposed. However, the exclusion limits from the CDF experiment in combination with the upper bound of 500 GeV for early detection imposes strong restrictions on the cascade decays that are kinematically allowed. Often, the mass differences of these quarks are such that they only decay via the channels given in (3.1) and (3.2). The two lightest quarks are x 1 and t 1 , where either of the two can be the lighter one. Going up in mass, we find b 1 and t 2 , or t 2 and b 1 . The next heavier quark is either x 2 followed by t 3 , or vice versa. The most common hierarchy is A rarer mass pattern is The quarks that do not appear in these relations have masses above 500 GeV. An example for a cascade decay accessible for various points is Both this cascade decay and the dominant decay modes from eqs. (3.1) and (3.2) suggest that the model can easily yield multi-lepton final states plus many jets. The SM is expected to produce only few events with such a signature. For this reason, we focus our study on final states with at least two leptons and multiple jets.

Implementation
We implement the model in MadGraph/MadEvent 4 (MG/ME) [32]. MG/ME is a matrix-element based tree-level event generator that is capable of generating amplitudes and events for any given model describing high energy physics interactions. For such an event generator to be able to cope with a new physics model, the couplings and interactions of the new particles as defined in section 2.2, in addition to the (modified) Standard Model interactions, have to be translated into a specific form. In MG/ME these couplings are defined according to the convention from HELAS [33]. The implementation of these couplings is done by means of the usermod v1 framework. The decay widths and branching ratios of all unstable particles are calculated with BRIDGE [34]. We implement the model taking into account not only the couplings of the newly introduced particles, but also the changes in the Standard Model couplings arising from Higgs compositeness and from the mixing of the SM quarks with the new states.

Benchmark points in the composite Higgs model parameter space
Since the scan over the parameter space was optimized to search for points that satisfy the selection criteria of section 2.3, the points returned are not necessarily very different from each other. For this reason, we arrange the points in groups that are expected to exhibit a similar phenomenology and focus on the representatives of these groups for a detailed study. We assign two points to the same group if all branching ratios of the new quarks with a mass below 500 GeV are of similar magnitude. When a group contains more than one point, we use the mass of the lightest new quark m q,low to select two representatives: the point with the lowest value of m q,low and the one with largest value of m q,low . In the following, these two representatives will be referred to as low benchmark point (lBP) and high benchmark point (hBP) of a group. For the discussion of the discovery potential, we will restrict ourselves to the 30 benchmark points obtained in this way.
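The grouping of scan points into benchmark representatives can be sketched as follows. This is a minimal illustration, not the procedure actually used: the criterion for branching ratios being "of similar magnitude" is assumed here to be an absolute tolerance of 0.1 with respect to the first member of a group, and the point names, masses and branching ratios are invented.

```python
def similar(br_a, br_b, tol=0.1):
    """True if both points list the same decays with BRs within tol."""
    if br_a.keys() != br_b.keys():
        return False
    return all(abs(br_a[k] - br_b[k]) <= tol for k in br_a)

def group_points(points, tol=0.1):
    """Greedy grouping: compare each point with the first member of a group."""
    groups = []
    for point in points:
        for group in groups:
            if similar(group[0]["br"], point["br"], tol):
                group.append(point)
                break
        else:
            groups.append([point])
    return groups

def representatives(group):
    """Return [lBP, hBP] for a group, or the single point if alone."""
    if len(group) == 1:
        return [group[0]]
    low = min(group, key=lambda p: p["m_q_low"])
    high = max(group, key=lambda p: p["m_q_low"])
    return [low, high]

# Hypothetical scan points (masses in GeV, branching ratios of t1).
points = [
    {"name": "P1", "m_q_low": 320.0, "br": {"bW": 0.50, "tZ": 0.20, "th": 0.30}},
    {"name": "P2", "m_q_low": 410.0, "br": {"bW": 0.55, "tZ": 0.18, "th": 0.27}},
    {"name": "P3", "m_q_low": 365.0, "br": {"bW": 0.52, "tZ": 0.20, "th": 0.28}},
    {"name": "P4", "m_q_low": 390.0, "br": {"bW": 0.10, "tZ": 0.45, "th": 0.45}},
]

groups = group_points(points)
benchmarks = [[p["name"] for p in representatives(g)] for g in groups]
print(benchmarks)
```

In this toy input the first three points fall into one group, represented by its lightest and heaviest members, while the fourth point forms a group of its own.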

Event generation
For each benchmark point, we produce 10⁵ signal events with MG/ME. In particular, we generate events for pair production of all new quarks that have a mass below 500 GeV. The outcome of the MG/ME event generation is a Les Houches event file [35], which we process with Pythia 6 [36] for the showering and hadronization of the partonic events and for the simulation of the underlying event. Table 1 lists the mass of the lightest particle m_q,low and the total leading order cross section for pair production of all considered quarks for each benchmark point.
As already mentioned in section 3, we focus on final state signatures with at least two charged leptons and multiple jets. Consequently, every SM process that can lead to such final states represents a possible background. Table 2 lists the leading order cross section and the number of generated events for all relevant background processes. Note that single top production was neglected for this study. Its contribution is expected to be within the uncertainty of the pair production cross section. In order to estimate correctly the momentum spectrum of the jets in the transverse plane of the detector, we generate all partonic multiplicities needed for the SM backgrounds in MG/ME and use Pythia for the parton shower. The overlap between the phase-space description of the matrix-element calculator and the parton shower is removed using the MLM parton-jet matching prescription [37]. For the signal samples, the description of the jets produced by the parton shower in the decay of very heavy particles is known to be satisfactory [38]. The underlying event is simulated with Pythia. For all samples, we use the parton distribution function set CTEQ6L1.
We would like to point out that the samples for the background processes were generated within the SM. We did not take into account the changes of the SM couplings introduced in the composite Higgs model. These modifications differ for each point in the parameter space of the model, but we expect the resulting effects on the SM backgrounds to be small. Also note that we only consider pair production of the new quarks for the signal samples. We neglect the contributions of other processes (such as single quark production) to the signal yield in multi-lepton final states. These additional contributions to the signal would enhance the excess over the SM expectation.

Detector simulation
We use DELPHES [40] for the simulation of the response of a typical LHC detector. DELPHES is a recently developed simulation framework for a generic collider experiment.
As CMS is one of the two general purpose detectors at the LHC, we use the CMS detector card for the DELPHES detector simulation. We reconstruct the jets with the anti-k_t jet-clustering algorithm [41] and use a cone radius $\Delta R = \sqrt{\Delta\phi^2 + \Delta\eta^2} = 0.5$. Here φ denotes the azimuthal angle and the pseudorapidity η is defined as $\eta = -\ln\tan(\theta/2)$, where θ is the angle between the beam pipe and the trajectory of the particle. To adapt the performance of DELPHES to our needs, we make the following modifications.
• In DELPHES, the possibility of a jet being reconstructed as an electron is not taken into account. This, however, is expected to be a relevant source of fake electrons. In ref. [42], the probability for a jet to be reconstructed as an isolated, identified electron is estimated to be at the level of 6 × 10⁻⁶. We use this result and add jets to the isolated electron collection with the stated global probability.
• We set the global tracking efficiency to 100% for tracks with a transverse momentum of at least 0.9 GeV, but remove electrons from the electron collection with a probability of 10%.
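The two modifications listed above amount to a simple random reshuffling of the electron collection, which can be sketched as below. This is an illustration under our own assumptions, not DELPHES code: the function name and the dictionary-based objects are hypothetical, and we assume that a jet promoted to a fake electron is simply appended to the electron collection (the text does not say whether it is also removed from the jets).

```python
import random

def modify_electron_collection(electrons, jets, rng,
                               fake_prob=6e-6, loss_prob=0.10):
    """Apply the two ad hoc corrections to the isolated electron collection.

    electrons, jets: lists of reconstructed objects (any type).
    fake_prob: probability for a jet to be counted as an isolated electron.
    loss_prob: probability to drop a genuine reconstructed electron.
    """
    # Remove each electron with probability loss_prob.
    kept = [e for e in electrons if rng.random() >= loss_prob]
    # Promote each jet to a fake electron with probability fake_prob.
    fakes = [j for j in jets if rng.random() < fake_prob]
    return kept + fakes

rng = random.Random(1)  # fixed seed so the sketch is reproducible
electrons = [{"pt": 35.0}, {"pt": 22.0}]
jets = [{"pt": 80.0}, {"pt": 55.0}, {"pt": 51.0}]
print(len(modify_electron_collection(electrons, jets, rng)))
```

Keeping both probabilities as parameters makes it easy to switch either correction off when studying its impact on the fake-lepton background.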

Lepton and jet identification
We outline a robust and simple event selection that is suitable for early data from the LHC.
Charged lepton selection. For the electrons and the muons, we demand a transverse momentum p T > 20 GeV and a pseudorapidity |η| < 2.4. The first cut ensures a robust identification of electrons and muons, both offline and on trigger level, whereas the second cut is made to restrict the leptons to the volume of the tracker. For this study, we are interested in prompt leptons coming from vector boson decays. To discriminate against leptons coming from semileptonic hadron decays, we apply a relative isolation. In particular, we sum the p T of the tracks in a cone with ∆R < 0.3 around the electron or muon under analysis and require this value to be smaller than 5% of the lepton momentum.
Jet selection. To obtain a robust jet selection, we demand the p T of a jet to be larger than 50 GeV and require |η| < 3. The conservative choice of p T > 50 GeV is made to minimize the contribution of fake jets. The second cut marks the end of the electromagnetic and hadronic calorimeters. As electrons may be reconstructed as possible jet candidates, we reject those jets that are matched within ∆R < 0.2 to an isolated electron. The jet collection can be further cleaned from such electrons by requiring that the jets should have an electromagnetic fraction (electromagnetic over hadronic energy deposits) of less than 0.98.
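The lepton and jet selections described above can be summarized in a short sketch. The object layout (dictionaries with `pt`, `eta`, `phi`, `em_fraction`) is a hypothetical choice for illustration. We further assume that the lepton's own track is not part of the track list used for the isolation sum, and that "5% of the lepton momentum" refers to the transverse momentum.

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta / 2), theta measured from the beam pipe."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi

def delta_r(a, b):
    """Delta R = sqrt(dphi^2 + deta^2) between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))

def is_selected_lepton(lepton, tracks):
    """pT > 20 GeV, |eta| < 2.4 and relative track isolation below 5%."""
    if lepton["pt"] <= 20.0 or abs(lepton["eta"]) >= 2.4:
        return False
    # Assumption: 'tracks' does not contain the lepton's own track.
    cone_sum = sum(t["pt"] for t in tracks if delta_r(t, lepton) < 0.3)
    return cone_sum < 0.05 * lepton["pt"]

def select_jets(jets, isolated_electrons):
    """pT > 50 GeV, |eta| < 3, no electron overlap, EM fraction < 0.98."""
    selected = []
    for jet in jets:
        if jet["pt"] <= 50.0 or abs(jet["eta"]) >= 3.0:
            continue
        if any(delta_r(jet, e) < 0.2 for e in isolated_electrons):
            continue
        if jet["em_fraction"] >= 0.98:
            continue
        selected.append(jet)
    return selected

# Hypothetical objects for illustration.
electron = {"pt": 30.0, "eta": 1.0, "phi": 0.0}
tracks = [{"pt": 1.0, "eta": 1.1, "phi": 0.1}]
jets = [
    {"pt": 60.0, "eta": 0.5, "phi": 1.0, "em_fraction": 0.40},
    {"pt": 70.0, "eta": 1.0, "phi": 0.05, "em_fraction": 0.99},
    {"pt": 40.0, "eta": -1.0, "phi": 2.0, "em_fraction": 0.30},
]
print(is_selected_lepton(electron, tracks), len(select_jets(jets, [electron])))
```

The second jet is rejected because it overlaps with the isolated electron, and the third because it fails the transverse momentum threshold.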
Purity and efficiency of the lepton selection. Imposing a harder cut on the lepton isolation enhances the purity of the selection but causes the efficiency to decrease. The goal is to achieve a pure selection of prompt leptons without losing too much efficiency. We define the purity as the number of isolated leptons matched to prompt MC leptons divided by the number of isolated leptons. The efficiency is defined as the number of isolated leptons divided by the number of MC prompt leptons. The number of matched isolated leptons is obtained by counting the ones that satisfy both criteria:
• they have a prompt MC lepton within a cone of ∆R < 0.2
• the equation
For the $t\bar{t}$ sample from table 2, we find an efficiency of 83% and a purity of 97% for the electrons. For the muons, we obtain an efficiency of 91% and a purity of 99%.
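The purity and efficiency definitions given above translate directly into two ratios. The sketch below uses invented counts purely to show the arithmetic; it follows the definitions exactly as stated in the text, where the efficiency numerator is the number of all isolated leptons.

```python
def purity_and_efficiency(n_isolated, n_matched, n_prompt_mc):
    """Return (purity, efficiency) following the definitions in the text.

    purity     = matched isolated leptons / isolated leptons
    efficiency = isolated leptons / prompt MC leptons
    """
    if n_isolated == 0 or n_prompt_mc == 0:
        raise ValueError("counts must be non-zero")
    return n_matched / n_isolated, n_isolated / n_prompt_mc

# Hypothetical counts, not taken from the study.
purity, efficiency = purity_and_efficiency(
    n_isolated=100, n_matched=97, n_prompt_mc=120)
print(f"purity = {purity:.2f}, efficiency = {efficiency:.2f}")
```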
Discovery potential at the LHC

Identification of promising channels
After applying the lepton and jet selection defined in section 5.3, we investigate the number of events for a given jet multiplicity and lepton configuration for each background and signal process. The lepton configurations range from di-lepton events, either same-sign (SS) or opposite-sign (OS), to events with up to five charged leptons in the final state. Each configuration, which is characterized by a certain lepton combination and jet multiplicity, is interpreted as a specific signal region with an associated cut efficiency. Since these cut efficiencies are [...] In figures 3 and 4 we plot the jet multiplicity versus the lepton configuration for the total SM background and for the signal for BP 10, respectively. We denote by SS the configurations in which all the leptons have the same charge. Configurations in which at least one lepton has a different charge are denoted by OS. In the OS di-lepton case, we also distinguish between the opposite-flavor (OF) and same-flavor (SF) configurations. For the bins in figure 3 for which zero MC background events were found, we calculate an upper limit of 2.13 background events at 95% confidence level. This number is dominated by the contributions from W + jets and Z + jets due to their large cross sections and the limited MC statistics. In figure 5, we plot the number of signal events for BP 10 divided by the total number of background events. In terms of the number of expected events over the SM background, we can see that the final states with SS di-leptons and OS tri-leptons are the most promising channels for a possible discovery with 200 pb⁻¹ of collision data. This observation holds for all the 30 benchmark points. The decrease in the plotted S/B ratio for large jet multiplicities is not expected in collision data. This effect is due to the combination of a finite number of MC events with the calculation of an upper limit on the number of background events.
In the light of the above discussion, we will focus on the four channels: SS di-lepton with 3 or 4 jets and OS tri-lepton with 2 or 3 jets.

Inclusive discovery potential
In order to quantify the discovery reach in the four channels above, we calculate the probability for the expected signal + background observation to be caused by a fluctuation in the background distribution. We use 2 log X as a test statistic, where X is the ratio of the likelihood function for the signal + background hypothesis H 1 to the likelihood function for the background hypothesis H 0 [43,44].
The likelihood ratio X_i for the channel i can be defined as

$X_i = \dfrac{e^{-(s_i+b_i)}\,(s_i+b_i)^{d_i}/d_i!}{e^{-b_i}\,b_i^{d_i}/d_i!}.$

Here, s_i and b_i denote the number of signal and background events, respectively, and d_i is the number of observed candidates. Since the statistic 2 log X for the outcome of multiple channels is the sum of the test statistics of the separate channels, we use $2\sum_{i=1}^{4}\log X_i$ for the combined four channels defined in section 6.1. We define the confidence level as

$CL_b = P_b(X \le X_{\rm obs}) = \sum_{X(\{d'_i\}) \le X(\{d_i\})}\;\prod_{i}\frac{e^{-b_i}\,b_i^{d'_i}}{d'_i!},$

where the probability sum assumes the presence of the background only. Note that the background confidence 1 − CL_b expresses the compatibility of the observation with the background hypothesis, since CL_b is the probability that the background processes would give fewer than or equal to the number of events observed. For this reason, we use CL_b to quantify the discovery potential. The background confidence 1 − CL_b can be compared with the widely used notion of standard deviations (σ) by using the convention from ref. [27]². There, a 3σ and a 5σ excess beyond the background expectation correspond to a one-sided background confidence level of 1 − CL_b = 1.35 × 10⁻³ and 1 − CL_b = 2.87 × 10⁻⁷, respectively. The distributions of the test statistic for H_0 and H_1, often referred to as the test statistic probability density functions (tPDFs), are obtained by drawing Poisson-distributed numbers with means b_i and s_i + b_i, respectively, as a replacement for d_i. The confidence level CL_b and its uncertainty are calculated as follows. In the presence of data, CL_b is given by the integral of the tPDF of the background hypothesis from −∞ to the measured value of 2 log X. For this study, we replace this value by the mean of the tPDF for the signal + background hypothesis to substitute collision data. The uncertainty on CL_b is then obtained by changing the integration limit to the mean plus/minus one standard deviation of the signal + background tPDF.
To claim a $5\sigma$ excess over the background expectation, we have to be sensitive to $1 - \mathrm{CL}_b$ at the order of $10^{-7}$. For this reason, we generate $10^{9}$ pseudo-experiments for each of the tPDFs for $H_0$ and $H_1$.
In table 3 we list the expected number of events for all background processes for the four channels considered. The corresponding values for the 30 signal benchmark points, including the results for the confidence level $\mathrm{CL}_b$, are given in table 7 of appendix A. Even in the worst case scenario, we expect a signal evidence of at least $3\sigma$ for 23 benchmark points. For 10 points among these 23, the central $\mathrm{CL}_b$ value corresponds to an excess over the SM expectation of at least $5\sigma$.

Discovery potential of two benchmark points
We now focus on the discovery potential of two promising benchmark points. One is BP 10, which has both $x_1$ and $x_2$ below 500 GeV; the other is lBP 18. Both benchmark points exhibit a 4 of SO(4) and have large cross sections (10.78 pb and 5.52 pb, respectively), yielding a relevant excess over the SM background. We use a simple cut-based analysis and outline some features of their specific phenomenology.
Phenomenology of the two benchmark points. As we can see from table 1, the lightest new quark for BP 10 is the top-like $t_1$ with a mass of 316.6 GeV. The full mass hierarchy for the new quarks with a mass below 500 GeV reads [mass hierarchy equation missing in source], where the masses are given in GeV. For this point, the mass difference between the $t_4$ quark and the other quarks is large enough to allow the $t_4$ to decay into most of them. The full list of branching ratios for all the above listed quarks can be seen in table 4. For lBP 18, the mass hierarchy is [mass hierarchy equation missing in source]. Given that the maximal mass difference among these quarks is about 40 GeV, their decay modes are described by eqs. (3.1) and (3.2). In table 5 we list the branching ratios corresponding to these decay modes.
Cut-based analysis. We outline a simple, cut-based analysis for the two benchmark points to illustrate a complementary way to investigate the discovery potential of the model. For this analysis we use the lepton and jet selections defined in section 5.3. Given the results from tables 3 and 7, we require as preselection at least two same-sign (isolated) leptons ($e$, $\mu$) with $p_T > 20$ GeV and $|\eta| < 2.4$. Figure 6 shows the expected number of jets per event for BP 10, lBP 18 and the SM background after preselection for an integrated luminosity of 200 pb$^{-1}$. Based on these distributions we require at least 2 jets, where the jets are requested to have $p_T > 50$ GeV. As a next step, we make use of the variable $h_T$, which is defined as the scalar sum of the transverse momentum of the selected jets and leptons per event. In figure 7 we show the $h_T$ distributions for the two signal samples and the SM background after imposing the preselection cut. Clearly, this variable can be used as a powerful cut to suppress the background contribution. For this reason, we require $h_T > 300$ GeV for the events to pass this cut. From figure 8 we see that the signal distributions of the $p_T$ of the hardest jet peak at larger values than the corresponding SM background distribution. Consequently, we impose a cut at 90 GeV on this variable. Summarizing, we impose the following cuts:

1. at least 2 jets with $p_T > 50$ GeV,
2. $h_T > 300$ GeV and
3. $p_T$ of the leading jet $> 90$ GeV.
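The preselection and the three cuts listed above can be written as a short event filter. This is only a sketch under stated assumptions: the event layout (a dictionary with `leptons` and `jet_pts` entries) is hypothetical, lepton isolation is assumed to have been applied upstream, and the jets entering $h_T$ are taken to be those with $p_T > 50$ GeV, since the jet definition of section 5.3 is not reproduced here.

```python
def select_leptons(leptons, pt_min=20.0, eta_max=2.4):
    """Keep leptons (e, mu) with pT > 20 GeV and |eta| < 2.4."""
    return [l for l in leptons if l["pt"] > pt_min and abs(l["eta"]) < eta_max]


def has_same_sign_pair(leptons):
    """True if at least two selected leptons carry the same charge."""
    charges = [l["charge"] for l in leptons]
    return charges.count(+1) >= 2 or charges.count(-1) >= 2


def passes_selection(event):
    """Preselection plus the three cuts; all momenta are in GeV."""
    leptons = select_leptons(event["leptons"])
    if not has_same_sign_pair(leptons):
        return False  # preselection: two same-sign leptons

    # Assumption: "selected jets" are the jets with pT > 50 GeV.
    jets = sorted((pt for pt in event["jet_pts"] if pt > 50.0), reverse=True)
    if len(jets) < 2:
        return False  # cut 1: at least 2 jets with pT > 50 GeV

    h_t = sum(jets) + sum(l["pt"] for l in leptons)
    if h_t <= 300.0:
        return False  # cut 2: h_T > 300 GeV

    return jets[0] > 90.0  # cut 3: leading jet pT > 90 GeV


# Hypothetical events, used only to exercise the filter.
signal_like = {
    "leptons": [
        {"pt": 60.0, "eta": 0.5, "charge": +1},
        {"pt": 35.0, "eta": -1.2, "charge": +1},
    ],
    "jet_pts": [150.0, 80.0, 40.0],
}
opposite_sign = {
    "leptons": [
        {"pt": 60.0, "eta": 0.5, "charge": +1},
        {"pt": 35.0, "eta": -1.2, "charge": -1},
    ],
    "jet_pts": [150.0, 80.0, 40.0],
}
```

Applying the cuts in sequence, rather than as one combined condition, makes it straightforward to record the efficiency of each step separately.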
In table 6 we list the efficiencies of the preselection and of the superposition of all cuts for the two signal samples and for each of the background processes. The efficiency of the cuts has been studied individually. Moreover, we list the expected number of events after having superimposed all cuts. We find that we can expect 62 and 41 events for an integrated luminosity of 200 pb$^{-1}$ for BP 10 and lBP 18, respectively, but only a total of 6.7 events arising from the SM backgrounds. To estimate how much integrated luminosity we need to obtain a $5\sigma$ excess over the SM expectation, we again make use of the log likelihood ratios. In particular, we calculate the background confidence level $1 - \mathrm{CL}_b$ and require it to be smaller than the $5\sigma$ probability of $2.87 \cdot 10^{-7}$. For lBP 18 we find a $5\sigma$ significance for an integrated luminosity of 46

Table 6: The efficiency of the preselection cut, the total cut efficiency and number of expected events for each background and the two signal samples for an integrated luminosity of 200 pb$^{-1}$. The stated uncertainty on the number of expected events corresponds to the 68.3% confidence interval of this number. The total background sums up to 6.7 events.

Systematic errors are not taken into account. These results show that a discovery of this model may already be feasible at the LHC with only a few dozen inverse picobarns of understood collision data.
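To illustrate how the required luminosity follows from the expected yields, the sketch below scales the signal and background expectations linearly with the integrated luminosity and looks for the smallest value at which the background-only tail probability drops below $2.87 \cdot 10^{-7}$. Note that this swaps in a simpler technique than the one used in the text: it treats the selection as a single counting experiment with an exact Poisson tail, instead of the log likelihood ratio evaluated with pseudo-experiments, so its result need not coincide with the numbers quoted above. All function names are hypothetical.

```python
import math

P_5SIGMA = 2.87e-7  # one-sided 5 sigma probability quoted in the text


def poisson_tail(n_min, mu, n_terms=200):
    """P(N >= n_min) for a Poisson mean mu > 0, summed directly over the tail."""
    return sum(
        math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
        for k in range(n_min, n_min + n_terms)
    )


def lumi_for_5sigma(s_ref, b_ref, lumi_ref=200.0, step=1.0, lumi_max=1000.0):
    """Smallest luminosity (in pb^-1, on a grid of `step`) giving a 5 sigma excess.

    s_ref and b_ref are the expected signal and background yields at lumi_ref.
    The "observed" count is taken as the expected s + b, rounded down.
    """
    lumi = step
    while lumi <= lumi_max:
        scale = lumi / lumi_ref
        s, b = s_ref * scale, b_ref * scale
        n_expected = math.floor(s + b)
        if n_expected > 0 and poisson_tail(n_expected, b) < P_5SIGMA:
            return lumi
        lumi += step
    return None  # not reached within lumi_max


# Yields after all cuts at 200 pb^-1, as quoted in the text.
lumi_lbp18 = lumi_for_5sigma(41.0, 6.7)
lumi_bp10 = lumi_for_5sigma(62.0, 6.7)
```

Because the expected count is rounded down to an integer, the tail probability is not strictly monotonic in the luminosity; the function simply returns the first grid point at which the threshold is crossed.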

5/3 top-partner
Among the top-partners, the charge 5/3 $x_1$ gives the largest contribution to the excess over the SM expectation in the SS di-lepton channel. This is due to its low mass and the fact that it always decays to $tW^+$, which leads to

$$x_1 \to t\,W^+ \to b\,W^+ W^+ \to b\,\ell^+ \nu\,\ell^+ \nu \qquad (7.1)$$

in the leptonic decay mode. For $t_1$, which is the only new quark that could be lighter than $x_1$, only a few of its decay modes (eq. (3.1)) produce SS di-leptons in the final state. The accurate mass reconstruction of a charge 5/3 quark would be a big step towards the interpretation of the discovery. In the literature, different methods have been proposed for the reconstruction of its mass. These methods usually focus on pair production, so that they can exploit same-sign di-leptons from the decay of one of the charge 5/3 quarks to select and identify the event. The mass is reconstructed using the fully hadronic decay mode of the two $W$ bosons coming from the other charge 5/3 quark [20,23,24]. In ref. [45] an alternative method is presented. The mass of a charge 5/3 top-partner is reconstructed in SS di-lepton events via its transverse mass. This transverse mass is computed from the momenta of the two SS leptons, the missing transverse energy (from the two neutrinos) and the $b$ jet belonging to the semileptonically (and not to the second, hadronically) decaying top quark.
In the following, we outline a new method to reconstruct the mass of a charge 5/3 quark $x_1$. We exploit the same channel as [45], but we only rely on the two SS leptons and use the shape of their invariant mass distribution to reconstruct the $x_1$ mass. This avoids $b$ tagging inefficiencies and the problem of assigning the correct $b$ jet to the corresponding $x_1$ decay. We also consider the situation in which an excess of about 50 SS di-lepton events (as expected for 200 pb$^{-1}$ of collision data) is caused by the presence of multiple top-partners. In this case, we show how the method can be used to discriminate the signal against a hypothesized presence of $x_1$ only.

Mass determination with 200 pb$^{-1}$ of collision data
The method. In the decay of a pair-produced $x_1 \bar{x}_1$, the SS di-leptons come from the same decay leg and the positively (negatively) charged leptons can be assigned to the decay of $x_1$ ($\bar{x}_1$). The invariant mass distribution of the SS di-leptons contains information about the $x_1$ mass. In fact, the endpoint of this invariant mass distribution, $m_{ll}^{\max}$, is sufficient to determine $m_{x_1}$, since $m_{ll}^{\max}$ can be expressed in terms of the masses of the particles involved in the decay (7.1). The mass of $x_1$ is the only unknown parameter in this relation. An accurate measurement of this endpoint, however, is not possible with only 200 pb$^{-1}$ of collision data. We can use, instead, the shape of the invariant mass distribution to determine $m_{x_1}$.
In ref. [46], an analytic expression for the shape of the invariant mass distribution $M_{lc}$ for the supersymmetric decay $\tilde{g} \to t\,\tilde{t}_1$, $\tilde{t}_1 \to c\,\tilde{\chi}^0_1$ is presented. As the kinematic configuration of this decay is identical to eq. (7.1), we can use their results to model the shape of the invariant mass distribution of the SS di-leptons from leptonic $x_1$ ($\bar{x}_1$) decays. This shape function, however, does not take into account the possibility of a leptonically decaying tau-lepton originating from a $W$ decay. Also, an inclusive electron and muon spectrum without any selection cuts was assumed. These two assumptions are not satisfied in our realistic analysis. A fit of the full invariant mass distribution therefore does not lead to an accurate estimate of $m_{x_1}$. However, we find the shape of the tail of the distribution to be almost invariant under the effect of the selection cuts and the tau contribution. A fit of the tail of the distribution is thus a powerful means to extract the mass of $x_1$.
In figure 9 we show the invariant mass distribution of the SS di-leptons from a pair-produced $x_1$ with a mass of 365 GeV. This is the $x_1$ mass for BP 10 and lBP 18. We apply the same selection as in section 6.3. Both the signal and the SM background (see table 6) are normalized to an integrated luminosity of 200 pb$^{-1}$. With a leading order cross section of 1.64 pb for the signal, we estimate $15.3 \pm 0.2$ SS di-lepton events due to $x_1 \bar{x}_1$. When fitting the tail of the total distribution from the signal plus the SM with the shape function (starting from the peak of the distribution), we obtain a fitted mass $m_{\mathrm{fit}}$ of $370.0 \pm 32.3$ GeV. By rescaling the generated distribution with the signal cross section, we underestimate the statistical fluctuations in the number of events per bin. The statistical uncertainty of about 32 GeV on the fitted mass, however, correctly represents the precision expected with about 15 signal events. We conclude that fitting the tail of the invariant mass distribution of the signal plus the SM background leads to a fairly accurate estimate of the $x_1$ mass.
The above method assumes the total production cross section of the charge 5/3 top-partners to be dominated by pair production. Neglecting the contribution of single quark production allows us to estimate the production cross section as a function of the quark mass. This neglected contribution affects neither the shape nor the endpoint, but changes the absolute normalization of the invariant mass distribution of the SS di-leptons. The cross section for single quark production is typically small for relatively light top-partners, but influenced by model-dependent electroweak couplings. For BP 10 and lBP 18 we find that the ratio of the leading order cross section for single $x_1$ production over $x_1 \bar{x}_1$ pair production is about 5.8% and 2.3%, respectively. For these points, the errors introduced are smaller than the uncertainty of the next-to-leading order pair production cross section, which is approximately 20% for top-partners with a mass of about 500 GeV [47].
Applying the method to two benchmark points. For the 4 of SO(4) and the XX signatures, the various top-partners in addition to $x_1$ contribute to the excess of SS di-lepton events and alter the invariant mass distribution. In the special case of BP 10 and lBP 18, there is a bottom-like $b_1$ with a mass of about 10 GeV above the $x_1$ mass. Since it predominantly decays to $W^- t$, it plays an important role for the additional production of SS di-leptons. In order to obtain SS (rather than OS) di-leptons from $b_1 \bar{b}_1$ decays, one lepton has to come from $b_1$ and the other from $\bar{b}_1$. Therefore, the invariant mass distribution of the SS di-leptons from $b_1 \bar{b}_1$ decays does not show an endpoint, but rather a tail that extends far into the high invariant mass region. This is in contrast to the SS di-leptons from $x_1 \bar{x}_1$ decays. In the case of BP 10, two charge 5/3 quarks below 500 GeV contribute to the excess of SS di-lepton events. Since $x_2$ is more massive than $x_1$, the invariant mass distribution due to its leptonic decay is broader and has a larger endpoint than the $x_1$ contribution. The main effects of these additional top-partners (including the charge 2/3 quarks) on the invariant mass distribution of the SS di-leptons are an increased number of signal events and a large tail that hides the endpoint due to the light $x_1$. These effects can be used to determine whether or not the expected SS di-lepton invariant mass distributions for BP 10 and lBP 18 can be explained by the hypothesized presence of a charge 5/3 top-partner only.
In figures 10 and 11 we show the invariant mass distribution of the SS di-leptons for BP 10 and lBP 18 respectively, as expected to be observed with 200 pb$^{-1}$ of collision data. The SM background for the same integrated luminosity is added to the signal distribution. As explained above, a fit of the tail of the distribution leads to a fairly accurate estimate of the $x_1$ mass, if the observation is caused by only one charge 5/3 quark (plus the SM contribution). For BP 10, we obtain $m_{\mathrm{fit}} = 395.5 \pm 24.6$ GeV and for lBP 18, we find $m_{\mathrm{fit}} = 388.6 \pm 29.7$ GeV. This shows that a fit of the total distribution, including the contributions from the various top-partners and the SM backgrounds, leads to a systematic overestimate of the mass, which nevertheless remains within about $1\sigma$ of the true $x_1$ mass.

As a next step, we calculate the cross section and simulate the expected signal for a pair-produced $x_1$ with the fitted masses. This signal plus the SM expectation gives the expected invariant mass distribution of the SS di-leptons for a given mass hypothesis. For BP 10 and lBP 18 (figures 10 and 11), we see that neither of the two signal distributions can be explained assuming the presence of only one charge 5/3 quark. In particular, we expect 62.5 and 6.7 SS di-leptons from BP 10 and the SM background, respectively. Fitting the tail of the SS di-lepton invariant mass distribution, however, leads to an estimate of

The possibility that the signal is mainly caused by a very light $t_1$ can be excluded in the following way. The dominant channel for $t_1$ to produce SS di-lepton events includes the leptonic decay of a $Z$, $t_1 \to tZ$. In this case, three leptons are produced and two OS leptons come from the $Z$. A veto on a mass window around $m_Z$ for OS di-leptons thus helps to suppress the $t_1$ contribution to the signal. For BP 10 and lBP 18, leptonic $t_1$ decays account for 5% and 19% of the signal, respectively. Cutting on a window of $m_Z \pm 10$ GeV results in a loss of about 20% of the total signal, but reduces the $t_1$ contribution by about 70%. Alternatively, one could directly veto tri-lepton final states to curb the contribution from $t_1$.
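The $Z$ veto on OS di-leptons can be sketched as follows. This is a minimal illustration, assuming that each lepton is given as a hypothetical dictionary with a charge and a four-momentum $(E, p_x, p_y, p_z)$ in GeV; the value used for $m_Z$ and the choice to test every OS pair regardless of flavour are assumptions of the sketch, not statements about the analysis.

```python
import math

M_Z = 91.19  # GeV, approximate Z boson mass (assumed value for this sketch)


def invariant_mass(p1, p2):
    """Invariant mass of two four-momenta given as (E, px, py, pz) in GeV."""
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against rounding below zero


def is_vetoed_by_z_window(leptons, window=10.0):
    """True if any opposite-sign lepton pair has |m_ll - m_Z| < window."""
    for i in range(len(leptons)):
        for j in range(i + 1, len(leptons)):
            a, b = leptons[i], leptons[j]
            if a["charge"] * b["charge"] >= 0:
                continue  # only opposite-sign pairs are tested
            if abs(invariant_mass(a["p4"], b["p4"]) - M_Z) < window:
                return True
    return False


# Hypothetical back-to-back massless leptons, used only to exercise the veto.
z_like = [
    {"charge": +1, "p4": (45.6, 45.6, 0.0, 0.0)},
    {"charge": -1, "p4": (45.6, -45.6, 0.0, 0.0)},
]
high_mass = [
    {"charge": +1, "p4": (100.0, 100.0, 0.0, 0.0)},
    {"charge": -1, "p4": (100.0, -100.0, 0.0, 0.0)},
]
```

The same `invariant_mass` helper can be reused to fill the SS di-lepton invariant mass distribution on which the tail fit is performed.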
We conclude that for both benchmark points, 200 pb$^{-1}$ of collision data would be sufficient to obtain an evident discrepancy between the total invariant mass distribution and the distribution based on the hypothesized presence of only one charge 5/3 top-partner. Such an observation could be seen as evidence in favour of a model with multiple top-partners. If instead the signal distribution were consistent with the expected distribution from $m_{\mathrm{fit}}$, much more than 200 pb$^{-1}$ of collision data would be needed for the distribution to reveal the presence of additional, heavier top-partners.
Beyond the 200 pb$^{-1}$ scenario. When more integrated luminosity has been collected at the LHC, advanced techniques can be used to resolve more details about the masses of the top-partners. The identification of either a full 4 of SO(4) or two charge 5/3 quarks would be evidence in favour of our model. The signal in the SS di-lepton channel can be produced by various top-partners and it may be difficult to disentangle the different contributions. Discriminating the SS di-lepton events due to $x_1$ from the contribution due to $b_1 \bar{b}_1$ would be an important step. In the SS di-lepton channel, the two leptons from $x_1 \bar{x}_1$ decays come from the same particle, whereas in $b_1 \bar{b}_1$ decays one lepton comes from each quark. In the OS di-lepton channel, the roles of $x_1$ and $b_1$ are exchanged. The shapes of the SS and OS di-lepton invariant mass distributions may help to gain insight into the underlying physics.

Conclusions
4 of SO(4) or two charge 5/3 top-partners lie within the reach of the LHC. We scanned the parameter space of the model focussing on points that are consistent with EWPT observables and give these signatures. For these signatures we described the possible mass hierarchies and outlined the basic features of their phenomenology. We find that the trilepton and same-sign di-lepton final states are the most promising ones for a discovery of the model.
We studied in detail the phenomenology of two benchmark points with a large production cross section. Both exhibit a 4 signature and one has two charge 5/3 quarks with a mass below 500 GeV. We presented a robust cut-based search strategy for an excess in final states with at least two same-sign leptons. After making a basic kinematic selection, only little background from the SM was found in this channel. We find that for both benchmark points a few tens of inverse picobarns of understood collision data would suffice to observe a 5σ significance.
Since the SM contamination in the same-sign di-lepton final state is small, this channel is not only well suited for observing an excess over the SM expectation, but also for reconstructing the masses of the new particles. Among the top-partners, the light charge 5/3 quark contributes the most to the excess of SS di-lepton events. We described a new method to reconstruct the mass of such a quark via its leptonic decay. This method only relies on the reconstruction of the two same-sign leptons and exploits the shape of their invariant mass distribution. For both distinctive signatures of the model, the light top-partners besides the charge 5/3 quark also contribute to the excess of same-sign di-lepton events. In this case, we showed how the mass reconstruction method could be used to judge if the excess of same-sign di-lepton events is compatible with the presence of a charge 5/3 quark only, or if it hints at the existence of additional top-partners. For this, we used the fact that the cross section for pair production of top-partners can be predicted as a function of mass. Already with an integrated luminosity of 200 pb$^{-1}$ and a corresponding sample of about 50 signal events, we found an evident disagreement between the single $x_1$ hypothesis and the expected observation. Such a disagreement can be seen as an indication for the presence of top-partners in addition to a charge 5/3 quark.

ratios (as defined in section 6.2) as a test statistic. We indicate if the central $1 - \mathrm{CL}_b$ value corresponds to an excess of at least $3\sigma$ or $5\sigma$.

Table 7: Number of expected events in each of the four channels for 200 pb$^{-1}$ of integrated luminosity. The background confidence level $1 - \mathrm{CL}_b$ with its uncertainty is also given. The $1 - \mathrm{CL}_b$ values marked with (*) correspond to benchmark points for which more than $10^{9}$ pseudo-experiments would be needed for the tail of the tPDF of the background hypothesis to leak out of the integrated region.