Heavy performance loss when alternating the number of OpenMP parallel threads


Question

The following code alternates the number of threads used by consecutive parallel for loops.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <limits>
#include <vector>
#include <omp.h>

std::vector<float> v;

// Runs two parallel fors back to back, the first with `threadsFirst` threads and
// the second with `threadsSecond` threads, summing `v` after each one.
float foo(const int tasks, const int perTaskComputation, int threadsFirst, int threadsSecond)
{
    float total = 0;
    std::vector<int> nthreads{threadsFirst, threadsSecond};
    for (int nthread : nthreads) {
        omp_set_num_threads(nthread);  // applies to the next parallel region
#pragma omp parallel for
        for (int i = 0; i < tasks; ++i) {
            for (int n = 0; n < perTaskComputation; ++n) {
                if (v[i] > 5) {
                    // The posted code had `v[i] * 0.002;`, a statement with no effect;
                    // a compound assignment was presumably intended.
                    v[i] *= 0.002F;
                }
                v[i] *= 1.1F * (i + 1);
            }
        }
        for (auto a : v) {
            total += a;
        }
    }
    return total;
}

int main()
{
    const int tasks = 1000;
    const int load = 1000;
    v.resize(tasks, 1);
    // threadAdd = 0: both loops use j threads; threadAdd = 1: they alternate j and j + 1.
    for (int threadAdd = 0; threadAdd <= 1; ++threadAdd) {
        std::cout << "Run batch\n";
        for (int j = 1; j <= 16; ++j) {
            float minT = std::numeric_limits<float>::max();  // 1e100 does not fit in a float
            float maxT = 0;
            float totalT = 0;
            int samples = 0;
            const int iters = 100;
            for (int i = 0; i <= iters; ++i) {
                auto start = std::chrono::steady_clock::now();
                foo(tasks, load, j, j + threadAdd);
                auto end = std::chrono::steady_clock::now();
                float ms = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() * 0.001F;
                if (i > 20) {  // the first iterations are treated as warm-up and ignored
                    minT = std::min(minT, ms);
                    maxT = std::max(maxT, ms);
                    totalT += ms;
                    samples++;
                }
            }
            std::cout << "Run parallel fors with " << j << " and " << j + threadAdd << " threads -- Min: "
                << minT << "ms   Max: " << maxT << "ms   Avg: " << totalT / samples << "ms" << std::endl;
        }
    }
}

When compiled and run with Visual Studio 2019 in Release mode, the output is:

Run batch
Run parallel fors with 1 and 1 threads -- Min: 2.065ms   Max: 2.47ms   Avg: 2.11139ms
Run parallel fors with 2 and 2 threads -- Min: 1.033ms   Max: 1.234ms   Avg: 1.04876ms
Run parallel fors with 3 and 3 threads -- Min: 0.689ms   Max: 0.759ms   Avg: 0.69705ms
Run parallel fors with 4 and 4 threads -- Min: 0.516ms   Max: 0.578ms   Avg: 0.52125ms
Run parallel fors with 5 and 5 threads -- Min: 0.413ms   Max: 0.676ms   Avg: 0.4519ms
Run parallel fors with 6 and 6 threads -- Min: 0.347ms   Max: 0.999ms   Avg: 0.404413ms
Run parallel fors with 7 and 7 threads -- Min: 0.299ms   Max: 0.786ms   Avg: 0.346387ms
Run parallel fors with 8 and 8 threads -- Min: 0.263ms   Max: 0.948ms   Avg: 0.334ms
Run parallel fors with 9 and 9 threads -- Min: 0.235ms   Max: 0.504ms   Avg: 0.273937ms
Run parallel fors with 10 and 10 threads -- Min: 0.212ms   Max: 0.702ms   Avg: 0.287325ms
Run parallel fors with 11 and 11 threads -- Min: 0.195ms   Max: 1.104ms   Avg: 0.414437ms
Run parallel fors with 12 and 12 threads -- Min: 0.354ms   Max: 1.01ms   Avg: 0.441238ms
Run parallel fors with 13 and 13 threads -- Min: 0.327ms   Max: 3.577ms   Avg: 0.462125ms
Run parallel fors with 14 and 14 threads -- Min: 0.33ms   Max: 0.792ms   Avg: 0.463063ms
Run parallel fors with 15 and 15 threads -- Min: 0.296ms   Max: 0.723ms   Avg: 0.342562ms
Run parallel fors with 16 and 16 threads -- Min: 0.287ms   Max: 0.858ms   Avg: 0.372075ms
Run batch
Run parallel fors with 1 and 2 threads -- Min: 2.228ms   Max: 3.501ms   Avg: 2.63219ms
Run parallel fors with 2 and 3 threads -- Min: 2.64ms   Max: 4.809ms   Avg: 3.07206ms
Run parallel fors with 3 and 4 threads -- Min: 5.184ms   Max: 14.394ms   Avg: 8.30909ms
Run parallel fors with 4 and 5 threads -- Min: 5.489ms   Max: 8.572ms   Avg: 6.45368ms
Run parallel fors with 5 and 6 threads -- Min: 6.084ms   Max: 15.739ms   Avg: 7.71035ms
Run parallel fors with 6 and 7 threads -- Min: 7.162ms   Max: 16.787ms   Avg: 7.8438ms
Run parallel fors with 7 and 8 threads -- Min: 8.32ms   Max: 39.971ms   Avg: 10.0409ms
Run parallel fors with 8 and 9 threads -- Min: 9.575ms   Max: 45.473ms   Avg: 11.1826ms
Run parallel fors with 9 and 10 threads -- Min: 10.918ms   Max: 31.844ms   Avg: 14.336ms
Run parallel fors with 10 and 11 threads -- Min: 12.134ms   Max: 21.199ms   Avg: 14.3733ms
Run parallel fors with 11 and 12 threads -- Min: 13.972ms   Max: 21.608ms   Avg: 16.3532ms
Run parallel fors with 12 and 13 threads -- Min: 14.605ms   Max: 18.779ms   Avg: 15.9164ms
Run parallel fors with 13 and 14 threads -- Min: 16.199ms   Max: 26.991ms   Avg: 19.3464ms
Run parallel fors with 14 and 15 threads -- Min: 17.432ms   Max: 27.701ms   Avg: 19.4463ms
Run parallel fors with 15 and 16 threads -- Min: 18.142ms   Max: 26.351ms   Avg: 20.6856ms
Run parallel fors with 16 and 17 threads -- Min: 20.179ms   Max: 40.517ms   Avg: 22.0216ms

In the first batch, several runs are made with an increasing number of threads, alternating parallel fors that use the same number of threads. This batch shows the expected behavior: performance improves as the number of threads is increased.

Then the second batch is run: the same code, but alternating parallel fors where one of them uses one more thread than the other. This second batch suffers a severe performance loss, with computation time increasing by a factor of up to 50~100x.

Compiling and running with gcc on Ubuntu gives the expected behavior, with both batches performing similarly.

So the question is: what causes this huge performance loss when using Visual Studio?

c++ openmp performance visual-studio
2021-11-23 23:30:00

Best answer


Based on the experiments described in the comments on the question, and in the absence of a better explanation, this appears to be a bug in the VS runtime.

2021-11-30 22:51:16
