Benchmark failures with hep-score

drw · January 17, 2024, 1:40pm

Hi, I’m experimenting with hep-score, and while it seemed to work in the past, I’m now seeing a few errors.

Firstly here is the recipe we are using to run the benchmarks:

screen
ulimit -n 1000000
su - job0000
# This must be after su, other ulimit must be before...
ulimit -u 1000000
cd /srv/localstage/scratch/
mkdir -p HEPscore/tmp
cd HEPscore
export TMPDIR=$(pwd)/tmp
export SINGULARITY_CACHEDIR=$TMPDIR
export PATH=$PATH:/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/x86_64/bin
python3 -m venv ./venv
source venv/bin/activate
git clone https://gitlab.cern.ch/hep-benchmarks/hep-score.git
pip install -e hep-score
hep-score -n hepscore23 results

# Don't forget to remove the dir at the end
rm -Rf /srv/localstage/scratch/HEPscore

We have an aarch64 machine so initially I was following the above but using:

export PATH=$PATH:/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/aarch64/bin
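Since the two recipes differ only in the architecture directory, the path could be chosen at run time instead of edited by hand. This is a minimal sketch, assuming the only directory names are the two quoted in this thread (x86_64 and aarch64) and that they match the output of `uname -m` on the respective machines; `apptainer_bin_for_arch` is a hypothetical helper name, not part of hep-score or apptainer.

```shell
# Print the apptainer bin directory for an architecture.
# With no argument, the architecture is taken from `uname -m`.
# Assumption: only x86_64 and aarch64 (the two paths in this thread)
# are handled; anything else is rejected with a non-zero status.
apptainer_bin_for_arch() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64|aarch64)
            echo "/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/${arch}/bin"
            ;;
        *)
            echo "unsupported architecture: $arch" >&2
            return 1
            ;;
    esac
}

# Possible use in the recipe, in place of the hard-coded export line:
#   bin=$(apptainer_bin_for_arch) && export PATH="$PATH:$bin"
```

Only appending to PATH when the helper succeeds avoids leaving a stray trailing colon in PATH on an unexpected architecture.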

I tried checking out v1.5 and the x86_64 machine gets as far as run1 of cms-reco-run3-ma-bmk.
The aarch64 machine gets as far as run1 of atlas-reco_mt-ma-bmk.

Looking in the results directory for the x86_64 job I see:

ERROR! 2 processes failed (out of 6)

However, unless I’m missing something, I can’t really spot any obvious problems.

On the aarch64 machine it’s a similar story:

ERROR! 32 processes failed (out of 32)

I can’t spot any obvious reason for the failures.
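One way to narrow this down might be to list every file under the results directory that contains one of these failure lines, so the surrounding log can be read in context. This is a rough sketch, assuming the message has exactly the form quoted above and that the output directory is the `results` argument from the recipe; `find_failed_runs` is a hypothetical helper name, and nothing here relies on knowing the layout hep-score uses inside that directory.

```shell
# Recursively search a results directory for the failure summary line
# quoted in this thread, printing file name, line number and the line.
# Assumption: the message format is "ERROR! N processes failed (out of M)".
# Returns grep's status: 0 if any match was found, 1 if none.
find_failed_runs() {
    results_dir="${1:-results}"
    grep -rnE 'ERROR! [0-9]+ processes failed \(out of [0-9]+\)' "$results_dir"
}

# Possible use from the HEPscore directory of the recipe:
#   find_failed_runs results
```

Each hit is printed as `file:line:message`, which gives a starting point for reading the lines just before the failure summary.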

I guess my first question is: “Is our approach (i.e. the “recipe”) along the right lines, or am I doing something completely wrong?”

Thanks,

Dan

giordano · January 17, 2024, 3:23pm

Dear Dan,

thank you for reporting this.
May I ask you, please, to open a GGUS ticket to get support?
Details in Benchmarking Working Group

Best regards
Domenico

drw · January 17, 2024, 5:19pm

Thanks, have done so.