How to select a long-lived stable version of hep-benchmark-suite

rptaylor · May 8, 2024, 7:24pm

Hello,

I was running HS basically like this:

git clone https://gitlab.cern.ch/hep-benchmarks/hep-benchmark-suite.git
./hep-benchmark-suite/examples/hepscore/run_HEPscore.sh -s "Site" -d $WORK_DIR

However, instead of tracking the master branch, I need the execution to be as deterministic and reproducible as possible over time, with only minor bug fixes and no major changes in behaviour.
Which branch or tag should I use? v2.2?
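
For illustration, here is a minimal sketch of pinning the clone to a named ref and recording the exact commit it resolved to, so that later runs can be checked against the same code. The helper names clone_cmd, head_commit_cmd and clone_pinned are hypothetical, and v2.2 is only the candidate ref from the question, not a confirmed recommendation. git clone --branch accepts either a branch or a tag name. This pins only the checkout itself, not anything the run script installs afterwards.

```python
import subprocess

REPO_URL = "https://gitlab.cern.ch/hep-benchmarks/hep-benchmark-suite.git"


def clone_cmd(ref, dest="hep-benchmark-suite", repo_url=REPO_URL):
    """Build a git command that clones only the given branch or tag."""
    return ["git", "clone", "--branch", ref, "--depth", "1", repo_url, dest]


def head_commit_cmd(dest="hep-benchmark-suite"):
    """Build a git command that prints the commit hash checked out in dest."""
    return ["git", "-C", dest, "rev-parse", "HEAD"]


def clone_pinned(ref, dest="hep-benchmark-suite"):
    """Clone the suite at ref and return the commit hash it resolved to."""
    subprocess.run(clone_cmd(ref, dest), check=True)
    done = subprocess.run(head_commit_cmd(dest), check=True,
                          stdout=subprocess.PIPE, text=True)
    return done.stdout.strip()
```

Calling clone_pinned("v2.2") would return a hash that can be stored next to each benchmark result. A branch can move over time while a commit hash cannot, so the recorded hash is what shows whether two runs used the same code.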

Thanks.

rptaylor · May 9, 2024, 2:35am

I tried the v2.2 branch but I noticed this:

./hep-benchmark-suite/examples/hepscore/run_HEPscore.sh -s "bench" -d $WORK_DIR/hepscore
Setting site to bench
Setting the working directory to /mnt/bench/work/hepscore
Running script: ./hep-benchmark-suite/examples/hepscore/run_HEPscore.sh - version: 1.2.1
Creating the WORKDIR /mnt/bench/work/hepscore
Latest suite release selected: v2.2.
Requirement already satisfied: pip in ./env_bmk/lib/python3.9/site-packages (21.2.3)
Collecting pip
  Using cached pip-24.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.3
    Uninstalling pip-21.2.3:
      Successfully uninstalled pip-21.2.3
Successfully installed pip-24.0
Collecting git+https://gitlab.cern.ch/hep-benchmarks/hep-score.git@v1.5
  Cloning https://gitlab.cern.ch/hep-benchmarks/hep-score.git (to revision v1.5) to /mnt/bench/work/tmpdir/pip-req-build-rdokkvu1
  Running command git clone --filter=blob:none --quiet https://gitlab.cern.ch/hep-benchmarks/hep-score.git /mnt/bench/work/tmpdir/pip-req-build-rdokkvu1
  Running command git checkout -q f6169255d1ee03a8637366b6a7f1a680205c5da2
  Resolved https://gitlab.cern.ch/hep-benchmarks/hep-score.git to commit f6169255d1ee03a8637366b6a7f1a680205c5da2

The run works and the benchmark completes, but the process then hangs indefinitely at the end until I press Ctrl-C. This would significantly interfere with our automated benchmark system.


2024-05-08 23:31:47, hepscore.hepscore:_run_benchmark [INFO] Executing 3 runs of alice-digi-reco-core-run3-ma-bmk [v2.1_x86_64]
2024-05-08 23:31:47, hepscore.hepscore:_run_benchmark [INFO] Starting run0
2024-05-08 23:53:29, hepscore.hepscore:_run_benchmark [INFO] Starting run1
2024-05-09 00:11:21, hepscore.hepscore:_run_benchmark [INFO] Starting run2
2024-05-09 00:28:17, hepscore.hepscore:_run_benchmark [INFO] 
2024-05-09 00:28:17, hepscore.hepscore:gen_score [INFO] Final result: 239.0693
2024-05-09 00:28:17, hepbenchmarksuite.hepbenchmarksuite:run [INFO] Completed hepscore with return code 0
2024-05-09 00:28:17, hepbenchmarksuite.plugins.extractor:__init__ [INFO] you should run this program as super-user for a complete output.
2024-05-09 00:28:17, hepbenchmarksuite.plugins.extractor:collect_sw [INFO] Collecting SW information.
2024-05-09 00:28:17, hepbenchmarksuite.plugins.extractor:collect_hw [INFO] Collecting HW information.
2024-05-09 00:28:17, hepbenchmarksuite.plugins.extractor:collect_cpu [INFO] Collecting CPU information.




^CTraceback (most recent call last):
  File "/mnt/bench/work/hepscore/env_bmk/bin/bmkrun", line 285, in <module>
    main()
  File "/mnt/bench/work/hepscore/env_bmk/bin/bmkrun", line 253, in main
    suite.start()
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/hepbenchmarksuite.py", line 57, in start
    self.run()
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/hepbenchmarksuite.py", line 99, in run
    self.check_lock()
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/hepbenchmarksuite.py", line 107, in check_lock
    self.run()
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/hepbenchmarksuite.py", line 68, in run
    self.cleanup()
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/hepbenchmarksuite.py", line 113, in cleanup
    self._result = utils.prepare_metadata(self._config_full, self._extra)
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/utils.py", line 288, in prepare_metadata
    'HW': hw_data.collect_hw(),
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/plugins/extractor.py", line 347, in collect_hw
    "CPU"    : self.collect_cpu(),
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/plugins/extractor.py", line 111, in collect_cpu
    'Power_Policy': self.exec_cmd(f"cat {scaling_governors} | sort | uniq"),
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/plugins/extractor.py", line 70, in exec_cmd
    reply, _ = utils.exec_cmd(cmd_str)
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/utils.py", line 141, in exec_cmd
    return_code, reply, error = run_piped_commands(cmd_str, env)
  File "/mnt/bench/work/hepscore/env_bmk/lib64/python3.9/site-packages/hepbenchmarksuite/utils.py", line 175, in run_piped_commands
    output = subprocess.run(cmd_split, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True, env=env)
  File "/usr/lib64/python3.9/subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1995, in _communicate
    ready = selector.select(timeout)
  File "/usr/lib64/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
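
The traceback ends inside a subprocess.run call that is given no timeout, so a child process that never exits blocks the caller indefinitely. As a hedged sketch, and not the suite's actual fix, a wrapper like the hypothetical run_bounded below bounds the wait and closes stdin, so that a command waiting for input sees end-of-file instead of blocking. The 30-second default is an arbitrary assumption.

```python
import subprocess
import sys


def run_bounded(cmd, timeout_s=30.0):
    """Run cmd (a list of arguments); return its stdout, or None on timeout."""
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # child reads EOF rather than the terminal
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,          # child is killed once this many seconds pass
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout.decode().strip()


# A quick command returns its output; a slow one is cut off after the timeout.
fast = run_bounded([sys.executable, "-c", "print('ok')"])
slow = run_bounded([sys.executable, "-c", "import time; time.sleep(10)"], timeout_s=0.5)
```

Returning None on timeout lets the caller record the field as unavailable and continue, which matters more for an unattended run than collecting every piece of metadata.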

giordano · May 10, 2024, 6:56am

Dear Ryan,

We will fix this and release a new tag.
Meanwhile, may I ask you to open a GGUS ticket, following the instructions at "How to Run HEPScore23 Benchmark"?
Thanks.