This lesson is still being designed and assembled (Pre-Alpha version)

Developing a pre-selection

Overview

Teaching: 10 min
Exercises: 50 min
Questions
  • What is the purpose of a preselection?

  • What loose cuts could you apply that follow the signal topology?

  • What are background processes that could enter this selection?

  • How do I normalise background processes?

Objectives
  • Define a ‘good’ region of the detector and apply MET filters.

  • Develop a simple signal selection cutflow.

  • Make stack plots of background processes.

Recordings of this session are available on CERNBox.

Introduction

A preselection serves two purposes. First, it ensures that passing events only use a "good" region of the detector, with appropriate noise filters applied. Second, it begins applying simple selections motivated by the physics of the signal topology.

Defining a “Good” Region of the Detector

A “good” region of the detector depends heavily on the signal topology. The muon system and tracker extend to about |η| = 2.4, while the calorimeter extends to |η| = 3.0, with the forward calorimeter extending further. Thus, if the signal topology relies heavily on tracking or muons, a useful preselection would limit the region to |η| < 2.4. Other topologies, such as vector boson fusion (VBF), produce two forward (high |η|) jets, so requiring two forward jets is a useful preselection in that case.
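As a rough illustration of these acceptance requirements, here is a plain-Python sketch that keeps central jets and counts forward ones. The jet η values are made up, and using |η| = 2.4 as the boundary for "forward" jets is an assumption for illustration only; a real analysis would apply the equivalent cut inside its analysis framework.

```python
# Minimal sketch of |eta| acceptance requirements on a list of jet eta values.
# The eta values below are made up; 2.4 is the approximate tracker/muon
# coverage quoted in the text.
ETA_MAX_TRACKER = 2.4


def in_tracker_acceptance(eta, eta_max=ETA_MAX_TRACKER):
    """Return True if the jet lies inside the tracker/muon coverage."""
    return abs(eta) < eta_max


def count_forward_jets(jet_etas, eta_min=ETA_MAX_TRACKER):
    """Count jets outside the tracker coverage (assumed definition of 'forward')."""
    return sum(1 for eta in jet_etas if abs(eta) >= eta_min)


event_jet_etas = [0.3, -1.9, 3.1, -3.5]
central_jets = [eta for eta in event_jet_etas if in_tracker_acceptance(eta)]
print(central_jets)                        # [0.3, -1.9]
print(count_forward_jets(event_jet_etas))  # 2
```

A tracking- or muon-based analysis would keep only the central jets, while a VBF-style preselection might instead require at least two forward jets.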

[Figure: cross section of the CMS detector]

Discuss (5 min)

Now, let’s take a more detailed look at our signal topology and see how it fits in with the detector. The b-star is produced from the interaction of a bottom quark and a gluon. Will this production mode yield any characteristic forward jets? In this topology, the b-star decays to a W boson and a top quark, each of which produces a jet. What is characteristic of a top jet? What about a W jet? How does this affect the region of the detector needed? What ranges of |η| and φ do we need? Think about this while looking at the Feynman diagram and the signal topology.

[Figures: Feynman diagram of b* production; b* → tW signal topology]

Solution

The production mode does not produce any characteristic forward jets, but the final state has two jets. The top quark decays to a b quark and a W boson, giving a b jet and a W jet; the b jet is typically identified using its characteristic secondary vertex, which is reconstructed in the tracker. Both the W jet and the top jet have distinctive substructure that can be used to distinguish them from QCD jets. It is therefore crucial to use a region of the detector with good tracking and granular calorimetry, so we should restrict to |η| < 2.4. There are no detector differences in φ that should affect this search, so no restriction in φ is needed.

Finding Appropriate MET Filters

Missing transverse momentum (MET) is sensitive to detector noise and other instrumental effects, which can create large, spurious MET. MET filters are used to remove events affected by these known sources of noise. The MET group publishes recommendations on the filters that should be used for different eras of data.

Exercise (5 min)

The recommended MET filters for Run II are listed on this twiki. Use this twiki to create a list of MET filters to use in the preselection.

Simple Selections

The preselection should also include a set of simple selections based on our physics knowledge of the signal topology. These “simple” selections typically consist only of loose lower bounds, which reduce the number of events passed to the rest of the analysis while still preserving the signal region.

Consider a heavy resonance decaying to two Z bosons, each decaying hadronically to produce a jet, giving a dijet final state. In this case, the resonance is produced with little transverse momentum, so momentum conservation in the transverse plane tells us that the jets should be well separated in φ: ideally, they should have a separation of π. Therefore a selection of Δφ > π/2 should retain nearly all of the signal while reducing the number of events passed on to the next stage.
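Because φ is periodic, the separation has to be wrapped into [0, π] before it is compared with π/2. A minimal plain-Python sketch of the Δφ computation and the loose cut (the function names are illustrative, not taken from the exercise scripts):

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation |phi1 - phi2| wrapped into the range [0, pi]."""
    dphi = abs(phi1 - phi2) % (2 * math.pi)
    return 2 * math.pi - dphi if dphi > math.pi else dphi


def passes_dphi_cut(phi1, phi2, dphi_min=math.pi / 2):
    """Loose back-to-back requirement on the two leading jets."""
    return delta_phi(phi1, phi2) > dphi_min


print(passes_dphi_cut(0.1, 3.0))   # True: separation is 2.9
print(passes_dphi_cut(3.0, -3.0))  # False: separation is about 0.28, not 6.0
```

Without the wrapping step, jets at φ = 3.0 and φ = -3.0 would appear to be 6.0 apart and would wrongly pass the cut, even though they are nearly aligned in φ.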

Reflect

Why not use a selection close to Δφ = π?

Solution

The jets can recoil against other objects in the event (for example, additional radiation), so the dijet pair can have a separation of less than Δφ = π.

This is also a good stage to place a lower limit on the jet pT. In a hadronic analysis, it is common to place a high lower limit on the pT. For this example, a requirement of pT > 400 GeV should work well.

Reflect

Why place such a high lower limit on jet pT?

Solution

This is a tricky question: it relates to the trigger. Hadronic triggers have a turn-on at high HT or high pT, so the lower limit ensures that the analysis only investigates the region where the trigger is fully efficient.
[Figure: trigger turn-on curve]
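To make the idea of a turn-on concrete, the sketch below computes a binned trigger efficiency (events that fire the trigger divided by all events in the bin) and reports where the plateau starts. It assumes an orthogonal reference sample is available; the pT values and trigger decisions are made-up toy numbers, not real trigger data, and the 99% plateau threshold is an illustrative choice.

```python
def efficiency_per_bin(pts, fired, edges):
    """Trigger efficiency in each [lo, hi) pT bin, or None for an empty bin.

    pts:   leading-jet pT values [GeV] from a reference sample (assumed orthogonal)
    fired: booleans, True if the event fired the hadronic trigger under study
    """
    effs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [f for pt, f in zip(pts, fired) if lo <= pt < hi]
        effs.append(sum(in_bin) / float(len(in_bin)) if in_bin else None)
    return effs


def plateau_start(edges, effs, threshold=0.99):
    """Lower edge of the first bin from which all remaining bins reach threshold.

    Empty bins (None) are treated as not efficient.
    """
    start = None
    for lo, eff in zip(edges[:-1], effs):
        if eff is not None and eff >= threshold:
            if start is None:
                start = lo
        else:
            start = None  # efficiency dropped again, so the plateau has not started
    return start


# Toy numbers for illustration only.
edges = [200, 300, 400, 500, 600]
pts = [250, 260, 350, 360, 370, 450, 460, 550, 560]
fired = [False, True, True, True, False, True, True, True, True]

effs = efficiency_per_bin(pts, fired, edges)
print(effs)                        # [0.5, 0.6666666666666666, 1.0, 1.0]
print(plateau_start(edges, effs))  # 400
```

An analysis would then place its offline pT (or HT) requirement at or above the plateau start, which is the reasoning behind the high lower limit on jet pT.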

A jet originating from a Z boson should also have two “prongs” (regions of energy in the calorimeter); these prongs are part of the jet substructure discussed in the earlier lessons. For a two-pronged jet like a Z jet, it is good to place an upper limit on the ratio τ21 = τ2/τ1, since two-pronged jets tend to have small τ21 values. Another useful substructure variable for the preselection is the softdrop mass. The softdrop algorithm grooms away soft, wide-angle radiation, reducing the contribution of pileup to the measured jet mass. The preselection is a good place to define a wide softdrop mass window. For this example, a wide region around the Z boson mass would be ideal, such as 65 < mSD < 115 GeV.
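Putting the Z-jet requirements from this section together, a per-jet check could look like the plain-Python sketch below. The dictionary keys mirror the NanoAOD FatJet_pt, FatJet_tau1, FatJet_tau2, and FatJet_msoftdrop branches, but the jets are made up, and the τ21 upper limit of 0.6 is a hypothetical placeholder, not a recommended value; choose it from your own plots.

```python
def tau_ratio(tau_num, tau_den):
    """N-subjettiness ratio, e.g. tau21 = tau2 / tau1; inf if the denominator is 0."""
    return tau_num / tau_den if tau_den > 0 else float("inf")


def passes_zjet_preselection(jet, pt_min=400.0, tau21_max=0.6,
                             msd_window=(65.0, 115.0)):
    """Loose Z-jet preselection: high pT, two-pronged (small tau21), wide mSD window.

    tau21_max = 0.6 is a hypothetical placeholder value.
    """
    tau21 = tau_ratio(jet["tau2"], jet["tau1"])
    msd_lo, msd_hi = msd_window
    return (jet["pt"] > pt_min
            and tau21 < tau21_max
            and msd_lo < jet["msoftdrop"] < msd_hi)


# Made-up jets for illustration only.
z_like = {"pt": 520.0, "tau1": 0.30, "tau2": 0.12, "msoftdrop": 88.0}
qcd_like = {"pt": 520.0, "tau1": 0.30, "tau2": 0.25, "msoftdrop": 40.0}
print(passes_zjet_preselection(z_like))    # True
print(passes_zjet_preselection(qcd_like))  # False
```

The τ21 requirement is an upper limit because a genuinely two-pronged jet is well described by two subjets, which makes τ2 small compared with τ1.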

It is important to emphasize that the preselection should be relatively loose, and to check that it is not eliminating large amounts of signal. A good way to monitor this is with stacked histograms. More about these plots is described below.
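Before stacking, each background sample is usually normalised to the integrated luminosity of the data with a per-event weight w = σ × L / N_gen, where σ is the sample's cross section, L the integrated luminosity, and N_gen the number of generated events (or the sum of generator weights for samples with negative weights). A sketch of that weight and of the cumulative sums a stack plot draws; the cross section, luminosity, event count, and yields below are made-up numbers:

```python
def event_weight(xsec_pb, lumi_pb_inv, n_generated):
    """Per-event weight that normalises a sample to the data luminosity."""
    return xsec_pb * lumi_pb_inv / float(n_generated)


def stack_layers(yields, order):
    """Cumulative bin contents for each layer of a stacked histogram.

    yields: dict mapping process name -> list of weighted bin yields
    order:  processes from the bottom of the stack to the top
    """
    running = [0.0] * len(yields[order[0]])
    layers = []
    for name in order:
        running = [r + y for r, y in zip(running, yields[name])]
        layers.append((name, running))
    return layers


# Made-up numbers for illustration only.
w = event_weight(xsec_pb=2.0, lumi_pb_inv=1000.0, n_generated=500)
print(w)  # 4.0

yields = {"ttbar": [1.0, 2.0], "QCD": [10.0, 20.0]}
for name, top in stack_layers(yields, ["ttbar", "QCD"]):
    print(name, top)
```

The top edge of the last layer is the total predicted background, which is what gets compared with data and with the signal shape when judging whether a cut removes too much signal.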

Discuss (5 min)

Again, use the images above to think about the signal topology. What “simple” selections can be used in the preselection? Any Δφ or pT criteria? What about substructure?

Solution

In this signal topology, the t and W should be well separated, so a loose Δφ cut should be applied. Think about a reasonable selection and investigate the result in the plotting exercise. Do the same for the jet pT. Both the top jet and the W jet should have substructure: the top jet should have three prongs and the W jet should have two prongs. Think about the softdrop mass regions and n-subjettiness (τ) ratios that should be used, and investigate them in the plotting exercise.

Applying our selection and monitoring the MC response

When applying the preselection, the selections are placed serially in the code, creating a “cutflow”. The filters are applied first to ensure that the data were taken in “good” detector conditions. Then the kinematic and substructure cuts are applied. It is important to monitor the signal and background between these physics-inspired cuts.
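The cutflow idea can be sketched in plain Python: apply the selections one after another and record how many events survive each step. This is only an illustration with made-up events and hypothetical field names (filters, dphi, pt0, pt1); in the exercises the cuts themselves are applied with the analyzer's Cut method.

```python
import math
from collections import OrderedDict


def cutflow(events, cuts):
    """Apply cuts in order and return the number of events left after each one."""
    counts = OrderedDict([("all", len(events))])
    remaining = events
    for name, passes in cuts:
        remaining = [ev for ev in remaining if passes(ev)]
        counts[name] = len(remaining)
    return counts


# Made-up events; the field names are hypothetical.
events = [
    {"filters": True, "dphi": 3.0, "pt0": 650.0, "pt1": 480.0},
    {"filters": True, "dphi": 2.8, "pt0": 450.0, "pt1": 300.0},
    {"filters": False, "dphi": 3.1, "pt0": 700.0, "pt1": 600.0},
    {"filters": True, "dphi": 0.9, "pt0": 520.0, "pt1": 410.0},
]

# Filters first, then the physics-inspired cuts (thresholds from the Z-jet example).
cuts = [
    ("filters", lambda ev: ev["filters"]),
    ("dphi", lambda ev: ev["dphi"] > math.pi / 2),
    ("jet pT", lambda ev: ev["pt0"] > 400.0 and ev["pt1"] > 400.0),
]

for name, n in cutflow(events, cuts).items():
    print(name, n)
```

Running the same cutflow on the signal and on each background sample shows which step costs the most signal, which is what you want to monitor between cuts.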

Exercise (20 min) Plots to Monitor Signal and Background

Find where the filters are applied in the exercises/ex4.py script from the BstarToTW_CMSDAS2020 repository and check that all the filters are there. Then use the same script to create histograms of τ21 and τ32 for the leading and subleading jets.

cd CMSSW_11_0_1/src # where you saved the TIMBER and BstarToTW_CMSDAS2020 repositories during the setup
cmsenv
python -m virtualenv timber-env
source timber-env/bin/activate
cd BstarToTW_CMSDAS2020
python exercises/ex4.py -y 16 --select

What criteria would you use for τ21? What about τ32?

Solution

The filters are listed in the flags list:

flags = ["Flag_goodVertices",
         "Flag_globalTightHalo2016Filter",
         "Flag_eeBadScFilter",
         "Flag_HBHENoiseFilter",
         "Flag_HBHENoiseIsoFilter",
         "Flag_ecalBadCalibFilter",
         "Flag_EcalDeadCellTriggerPrimitiveFilter"]

They are then applied using the Cut method:

# Initial cuts
a.Cut('filters',a.GetFlagString(flags))
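The filter list is combined into a single selection that an event passes only if every flag is set. GetFlagString builds that selection string inside TIMBER; its exact output format is not shown here, so the helper below is a hypothetical stand-in that illustrates the logic, alongside an equivalent plain-Python check on a made-up event.

```python
def build_flag_string(flags):
    """Hypothetical stand-in: AND all flags into one C++-style boolean expression."""
    return " && ".join("({} == 1)".format(flag) for flag in flags)


def passes_filters(event, flags):
    """Plain-Python equivalent: the event passes only if every flag is True."""
    return all(event[flag] for flag in flags)


example_flags = ["Flag_goodVertices", "Flag_eeBadScFilter"]
print(build_flag_string(example_flags))
# (Flag_goodVertices == 1) && (Flag_eeBadScFilter == 1)

# Made-up event: one filter has flagged it as noisy.
event = {"Flag_goodVertices": True, "Flag_eeBadScFilter": False}
print(passes_filters(event, example_flags))  # False
```

A single failing flag is enough to reject the event, which is why the filters are applied as one combined cut before any physics selection.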

Now we want to make histograms of mSD, Δφ, and the leading/subleading jet pT.

Let’s modify the ex4.py script to include the leading jet pT. To do this, first add a new entry to varnames. Then define the new quantity. Finally, add an if statement to adjust the histogram bounds.

# To the varnames
'lead_jetPt':'Leading Jet p_{T}',

# To the definition
a.Define('lead_jetPt','FatJet_pt[jetIdx[0]]')

# add an if statement by the hist_tuple
if "tau" in varname:
    hist_tuple = (histname, histname, 20, 0, 1)
if "Pt" in varname:
    hist_tuple = (histname, histname, 30, 400, 1000)

The histograms for mSD, Δφ, and subleading jet pT are left as homework.

Homework

Make histograms of mSD, Δφ, and the subleading jet pT. Then use these plots to agree on a preselection in the Discord chat (or choose your own if you are working through this after the DAS). Finally, add the preselection your group decided on to the bs_select.py script in BstarToTW_CMSDAS2020.

Key Points

  • Preselection reduces data size, but further signal optimization is done later

  • Preselected events should be in good regions of the detector with appropriate filters

  • Stacked histograms are an important tool for creating cuts