What I Read: aggregation.
https://mlbenchmarks.org/12-problem-aggregation.html
The problem of aggregation
Moritz Hardt
"Therefore, multi-task benchmarks are analogous to voting systems where tasks are voters and models are candidates."