Running Workflows

With defined workflows and steps, we can now run them!

This assumes you have set up the repository as described in the Introduction.

Defining Example Cases

In the catalogue entry (recast.yml) it is possible to define example parameter sets for your workflow, for testing and validation purposes, using the example_inputs field:

name: examples/testdemo

metadata:
  author: Lukas Heinrich
  input requirements: ''
  short_description: Example from ATLAS Exotics Rome Workshop 2018

spec:
  workflow: workflow.yml

example_inputs:
  default:
    initdata:
      did: 404958
      dxaod_file: https://recastwww.web.cern.ch/recastwww/data/reana-recast-demo/mc15_13TeV.123456.cap_recast_demo_signal_one.root
      xsec_in_pb: 0.00122

... some additional data ...

When listing the catalogue via recast catalogue ls, you can then see the examples:

recast catalogue ls
NAME                               DESCRIPTION                                                 EXAMPLES
atlas/atlas-conf-2018-041          ATLAS MBJ                                                   default
examples/checkmate1                CheckMate Tutorial Example (Herwig + CM1)                   default
examples/checkmate2                CheckMate Tutorial Example (Herwig + CM2)                   default
examples/rome                      Example from ATLAS Exotics Rome Workshop 2018               default,newsignal

Using Local Data

Sometimes, particularly during development, it's helpful to keep some input files on the local machine for faster turnaround, rather than downloading them every time you want to run the workflow. You provide local input to the workflow using the initdir: field under dataopts:

name: examples/testdemo

metadata:
  author: Lukas Heinrich
  input requirements: ''
  short_description: Example from ATLAS Exotics Rome Workshop 2018

spec:
  workflow: workflow.yml

example_inputs:
  default:
    dataopts:
      initdir: inputdata
    initdata:
      did: 404958
      dxaod_file: '{readdir0}/mc15_13TeV.123456.cap_recast_demo_signal_one.root'
      xsec_in_pb: 0.00122
... some additional data ...

Here, {readdir0} is an internal variable that evaluates to inputdata (or whatever directory is specified in the initdir field).
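The substitution behaves like ordinary Python string formatting. A minimal sketch, mimicking the resolution of the template from the example above (the real substitution is done internally by recast; this is illustration only):

```python
# Illustration only: how a '{readdir0}' template resolves against the
# directory given in the `initdir` field. This mimics the substitution
# with plain Python string formatting.
template = "{readdir0}/mc15_13TeV.123456.cap_recast_demo_signal_one.root"

# `initdir: inputdata` from the catalogue entry above
resolved = template.format(readdir0="inputdata")
print(resolved)
```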

Testing Examples

Locally

$> recast run examples/testdemo --example <example name> --tag <your_tag> --backend docker

For example,

recast run examples/testdemo --example default --tag firsttest --backend docker

This will create an output directory named recast-<your_tag> (e.g. recast-firsttest) which contains all the output produced by the run. Each stage produces a sub-directory with the same name as the stage, and helpful debugging info for each stage can be found in the log files located under recast-<your_tag>/<stage_name>/_packtivity.
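To collect those debugging logs after a run, you can glob over the per-stage _packtivity directories. A minimal sketch, assuming the tag firsttest and the directory layout described above:

```python
# Sketch: collect packtivity log files from a finished run's output
# directory, assuming the layout described above:
#   recast-<your_tag>/<stage_name>/_packtivity/<log files>
import glob
import os

def packtivity_logs(tag):
    """Return all packtivity log files under recast-<tag>/, sorted."""
    pattern = os.path.join(f"recast-{tag}", "*", "_packtivity", "*")
    return sorted(glob.glob(pattern))

for path in packtivity_logs("firsttest"):
    print(path)
```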

Within CI

The recommended recast/recastatlas image already has all the necessary software installed to work with the local backend. Adding something like the following to your .gitlab-ci.yml should run the test.

Note about CI variables: If your workflow specs are stored in the recast-atlas group area on gitlab, the RECAST_USER, RECAST_PASS, and RECAST_TOKEN variables will already be defined as global CI variables, where RECAST_USER is the common recast-atlas service account (recastat). This account has access to all CERN-internal (but not private) gitlab registry images, and to all inputs stored in /eos/project/r/recast/atlas. If your project is not in the recast-atlas group area, or you want to override the global default values, you will need to define these as CI variables in your gitlab project (see instructions in step 2 of the step-by-step guide).

testing:
  tags:
  - docker-privileged
  services:
  - docker:stable-dind
  stage: build
  image: "recast/recastatlas:v0.3.0"
  script:
  - eval "$(recast auth setup -a $RECAST_USER -a $RECAST_PASS -a $RECAST_TOKEN -a default)"
  - eval "$(recast auth write --basedir authdir)"

  # add my workflow
  - $(recast catalogue add $PWD)
  - recast run examples/testdemo --example default --tag firsttest

Defining RECAST Results

A RECAST workflow is a workflow specifically implementing a reinterpretation with respect to a new model. When executing the workflow, typically files are created that hold the result of the statistical analysis.

The catalogue entry file recast.yml allows you to specify a set of result files for the workflow, together with a short description.

Example

results:
- name: CLs 95% based upper limit on poi
  relpath: statanalysis/fitresults/limit_data.json
- name: CLs 95% at nominal poi
  relpath: statanalysis/fitresults/limit_data_nomsignal.json

Here, the statistical analysis stage produces output files that hold the CLs information of the fit.

$> cat statanalysis/fitresults/limit_data_nomsignal.json | jq
{
  "exp_p2": 0.8276574946063575,
  "exp_p1": 0.5345040672498215,
  "exp_m1": 0.10547655600578199,
  "exp_m2": 0.03889040527686523,
  "exp": 0.25999040745937085,
  "obs": 0.574709475331039
}
$> cat statanalysis/fitresults/limit_data.json | jq
{
  "exp_p2": 1.7545752712294322,
  "exp_p1": 1.2720762961732819,
  "exp_m1": 0.6377501820447065,
  "exp_m2": 0.4731739008380644,
  "exp": 0.8924846399964371,
  "obs": 1.3352971254860764
}

Now, when running the workflow, you can see the results directly:

recast run examples/testdemo --backend docker
... <some more output > ...
2019-06-03 12:03:42,989 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result examples/testdemo recast-c43a2b56:
--------------
- name: CLs 95% based upper limit on poi
  value: '{"exp_p2": 1.7545752712294322, "exp_p1": 1.2720762961732819, "exp_m1": 0.6377501820447065,
    "exp_m2": 0.4731739008380644, "exp": 0.8924846399964371, "obs": 1.3352971254860764}'
- name: CLs 95% at nominal poi
  value: '{"exp_p2": 0.8276574946063575, "exp_p1": 0.5345040672498215, "exp_m1": 0.10547655600578199,
    "exp_m2": 0.03889040527686523, "exp": 0.25999040745937085, "obs": 0.574709475331039}'

For further processing it is useful if the files produced by the workflow have a definite structure. We recommend that the statistical analysis code produce structured files, e.g. JSON or YAML. This way, no cumbersome text-wrangling needs to happen on the downstream analysis side; the data can be read directly into a convenient structure.
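For example, a downstream script can load such a result file straight into a dictionary. A minimal sketch using the limit_data_nomsignal.json contents shown above (inlined here so the snippet is self-contained; in practice you would open the file at the relpath from the result spec):

```python
# Sketch: read a structured fit result directly into a Python dict.
# The JSON below is the limit_data_nomsignal.json content shown earlier.
import json

raw = """
{
  "exp_p2": 0.8276574946063575,
  "exp_p1": 0.5345040672498215,
  "exp_m1": 0.10547655600578199,
  "exp_m2": 0.03889040527686523,
  "exp": 0.25999040745937085,
  "obs": 0.574709475331039
}
"""

limits = json.loads(raw)
print("observed CLs:", limits["obs"])
print("expected CLs:", limits["exp"])
print("expected band (-1 sigma, +1 sigma):", (limits["exp_m1"], limits["exp_p1"]))
```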

If you have your statistical analysis code set up in this way, you can add load_yaml: true to the result spec, and the data will be nicely formatted for you.

results:
- name: CLs 95% based upper limit on poi
  relpath: statanalysis/fitresults/limit_data.json
  load_yaml: true
- name: CLs 95% at nominal poi
  relpath: statanalysis/fitresults/limit_data_nomsignal.json
  load_yaml: true

recast run examples/testdemo --backend docker
... <some more output > ...
2019-06-03 12:09:12,795 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result examples/testdemo recast-8c2188da:
--------------
- name: CLs 95% based upper limit on poi
  value:
    exp: 0.8924846399964371
    exp_m1: 0.6377501820447065
    exp_m2: 0.4731739008380644
    exp_p1: 1.2720762961732819
    exp_p2: 1.7545752712294322
    obs: 1.3352971254860764
- name: CLs 95% at nominal poi
  value:
    exp: 0.25999040745937085
    exp_m1: 0.10547655600578199
    exp_m2: 0.03889040527686523
    exp_p1: 0.5345040672498215
    exp_p2: 0.8276574946063575
    obs: 0.574709475331039

Last update: July 5, 2023