NanoAOD analysis: Produce histograms

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • How to produce many histograms efficiently?

  • How does an analysis with RDataFrame look like in Python?

Objectives
  • Produce all histograms required for the final plots

  • Understand why we need so many histograms for a single plot

In the previous section, we produced skimmed datasets from the original files but still preserved information of selected quantities for each event. In this step, we compute histograms of these quantities for all skimmed datasets. Because of the data-driven QCD estimation, similar histograms have to be produced with the selection containing same-charged tau lepton pairs. This sums up to multiple hundreds of histograms which have to be combined into the final plots such as the ones shown in the next section.

For convenience, this step is implemented in Python in the file histograms.py, which you can download here.

Investigate and run the Python script!

Have a look at the code and run it! Note that the program picks up the files from the same directory in which you run it.

Investigate the output!

The script produces the file histograms.root, which contains the histograms. You can have a look at the plain histograms using for example the ROOT browser!

Key Points

  • We produce histograms of all physics processes and all observables.

  • All histograms are produced in a signal region with opposite-signed muon-tau pairs and in a control region with same-signed pairs for the data-driven QCD estimate

  • This step shows the usage of RDataFrame in Python producing a large number of histograms in a single event loop and in parallel!