Recommendations (Dos and Don'ts)¶
While the RECAST framework is designed to capture a wide range of analyses, some ways of setting up your analysis will make your life easier. Here we collect a few recommendations that you might want to follow.
Do: Transform Mindset¶
Think of the individual steps of the workflow as transformations between different representations of the data.
E.g. (your analysis may differ):
- EventSelection: DxAOD -> Trees
- Weighting & Merging: Trees -> Weighted Trees
- Statistical Analysis: Weighted Trees -> Statistical Result
The way you set up your analysis scripts should reflect this transforming nature. E.g. for the above steps, the transforms could look like:
./myEventSelection --in {input file} --out {output file} --something {par}
./merge_and_weight.py --inputs {input files} --lumi {my lumi}
./run_fit.py --signal {signal} --background {background} --resultfile {result}
If you have a clear command line API for your analysis, it will help you immensely to implement the individual steps as described here.
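A minimal sketch of what such a step script could look like in Python (the structure and flag names are illustrative, mirroring the event selection example above):

```python
#!/usr/bin/env python
# Illustrative skeleton of a transform-style step script:
# it reads one representation of the data and writes another.
import argparse


def main():
    parser = argparse.ArgumentParser(description="EventSelection: DxAOD -> Trees")
    # "in" is a Python keyword, so we map the flag to a different attribute name
    parser.add_argument("--in", dest="infile", required=True, help="input DxAOD file")
    parser.add_argument("--out", dest="outfile", required=True, help="output tree file")
    parser.add_argument("--something", dest="par", default=None, help="additional parameter")
    args = parser.parse_args()

    # ... run the actual event selection here, reading args.infile
    # and writing the selected events to args.outfile ...
    print(f"transforming {args.infile} -> {args.outfile} (par={args.par})")


if __name__ == "__main__":
    main()
```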
Don't: Nest scripts deeply¶
When writing step scripts, try to not nest them too deeply, so that they stay easy to debug. E.g.
./script1.sh
  \ calls ./script2.sh
    \ calls ./script4.sh
      \ calls ./script3.sh
This will make it cumbersome to debug the nested scripts (e.g. script4.sh), since they depend on the shell environment of their parents. Wrapping it may make the top-level script "cleaner" but ultimately may make your life harder.
Rather, try to have a flat list of actions in a script:
./script1.sh
./script2.sh
./script3.sh
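As a sketch, such a flat driver could also be written in Python (the script names follow the example above), which additionally gives you fail-fast behavior:

```python
# Sketch of a flat driver: each step is a self-contained command
# that can also be run (and debugged) on its own.
import subprocess

steps = [
    ["./script1.sh"],
    ["./script2.sh"],
    ["./script3.sh"],
]

for step in steps:
    # check=True aborts the chain as soon as one step fails,
    # so you immediately see which step broke.
    subprocess.run(step, check=True)
```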
Do: Make your scripts testable (ideally with user-definable, fast-running test inputs)¶
In order to have a quick development cycle, you want to avoid running the full workflow over and over again. E.g. if you make a change in the statistical analysis step, ideally you'd like to quickly find out whether the change works. For this, the recast command line tool provides utilities to test individual steps. Even then it's useful to be able to limit the runtime of an individual test. There are various ways to achieve this:
- use smaller test files
- have hooks in your scripts to limit e.g. the number of events to run over
- have hooks in your scripts to run a simplified version of the step (e.g. running without systematics, or with just a limited set of them)
This will improve your overall turnaround time when developing the workflow.
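A minimal sketch of such a hook in Python (the --max-events flag is a hypothetical name):

```python
import argparse
import itertools


def read_events(path):
    # stand-in for your actual input reader
    yield from range(1_000_000)


parser = argparse.ArgumentParser()
parser.add_argument("--in", dest="infile", required=True, help="input file")
parser.add_argument("--max-events", type=int, default=None,
                    help="process at most this many events (useful for fast tests)")
args = parser.parse_args()

events = read_events(args.infile)
if args.max_events is not None:
    # islice works on any iterable, so the hook costs nothing when unset
    events = itertools.islice(events, args.max_events)

for event in events:
    pass  # your per-event processing goes here
```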
Do: Only compute the necessary data for reinterpretation¶
A lot of analysis code is set up to produce many results at the same time. Consider designing your scripts in a way that easily lets you slim down the number of computations. Ideally a RECAST workflow should be simple, e.g.
- process new signal w/ event selection
- scale according to cross-section
- fit new signal
Often only a small subset of the histograms etc. is needed compared to the full analysis.
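A sketch of how such slimming could look (the region names and the --regions flag are hypothetical):

```python
import argparse

# The full analysis may know many regions, but a RECAST run usually
# only needs the ones that actually enter the fit.
ALL_REGIONS = ["SR", "CR_ttbar", "CR_wjets", "VR1", "VR2"]

parser = argparse.ArgumentParser()
parser.add_argument("--regions", nargs="+", default=ALL_REGIONS,
                    help="only compute histograms for these regions")
args = parser.parse_args()

for region in args.regions:
    print(f"filling histograms for {region}")  # stand-in for the real computation
```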
Do: Design your statistical analysis such that swapping out signals is easy¶
Most statistical analysis code grows organically from a simple set of scripts into something that can generate the wide array of results needed for approval. As you develop it, keep in mind that you might want to swap out the signal later on. A lot of statistical analysis code hardcodes e.g. a specific signal grid, or can only process grids of points rather than a single new signal point.
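A sketch of a fit driver that takes the signal as a plain input (mirroring the run_fit.py example above; flag names are illustrative):

```python
import argparse

# The signal is an ordinary input rather than an entry in a hardcoded grid,
# so a single new signal point can be fit without touching the code.
parser = argparse.ArgumentParser()
parser.add_argument("--signal", required=True, help="path to the new signal histograms/workspace")
parser.add_argument("--background", required=True, help="path to the preserved background model")
parser.add_argument("--resultfile", required=True, help="where to write the fit result")
args = parser.parse_args()

# ... build the likelihood from args.signal and args.background,
# run the fit, and serialize the outcome to args.resultfile ...
print(f"fitting {args.signal} against {args.background} -> {args.resultfile}")
```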
Don't: Hardcode paths. Use command line flags or environment variables¶
By far the biggest obstacle analyses face in preserving their analysis is that some things that change in the course of a reinterpretation are hardcoded and need to be parametrized after the fact, which often results in quite hacky code.
Make sure you do not hardcode any paths to /cvmfs or /afs in your analysis pointing to input files or the like. Rather, set up your transform scripts (as shown above) in a way that they can receive such locations as command line arguments.
If you are retro-fitting an existing analysis and it's hard to pass down command line parameters effectively (recall from above: when you avoid nested scripts, it's easier to pass command line flags), a good strategy that usually works is using environment variables. E.g. in the workflow definition:
script: |
  export MYANALYSIS_INPUT={input_parameter}
In your script you can then use e.g. os.environ['MYANALYSIS_INPUT'] to pick up that value.
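A minimal sketch of picking up the value (the fail-loudly behavior here is a suggestion, not part of the framework):

```python
import os

# Pick up the input location from the environment, as set by the workflow step.
# Failing loudly is preferable to silently falling back to a hardcoded path.
input_path = os.environ.get("MYANALYSIS_INPUT")
if input_path is None:
    raise RuntimeError("MYANALYSIS_INPUT is not set")

print(f"running over {input_path}")
```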