coreneuron_modtests::test_pointer_py_cpu incompatible with vectorisation (Intel compiler with -O2) #2191
Comments
They are breaking it. (But the documentation should be made clearer about that.)
I regret not adding a comment to this effect for those pointer update statements that increment from multiple other value locations. I remember looking for pragmas that would hint that the pointer update needed to be atomic. I also remember thinking that the original pthread strategy of writing
was overkill as, in principle, there should be a separate lock for each distinct pim, since the problem was multiple ia contributing to a single pim. It would be great if a hint like
would be sufficient to have the translator output what is needed to get the compiler to do the right thing. Although, I suppose the hint is already there in the form of the pointer increment statement.
there are few possible scenarios here:
Until now, Let's say we don't want to introduce a new keyword like
LOCK
pim = pim - ia : child contributions
UNLOCK
In this case:
...
#pragma omp simd
for(....) {
.....
#pragma omp critical
{
pim = pim - ia : child contributions
}
}
Essentially, this will avoid the SIMD parallelisation of the loop due to the critical section. This won't be performant but will at least produce correct code. This can be also
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
So we need to impose restrictions like "there can be only one statement in a
Does this look "reasonable"?
A bit late, but I just realised there is a PROTECT keyword:
NEURON {
GLOBAL var
}
BREAKPOINT {
PROTECT var = var + 1
}
I wonder if
I forgot about that. At present it just surrounds the statement using
and has that annoying property of one (pthread mutex?) lock for many non-interacting, independent statements.
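The "separate lock for each distinct pim" idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: `subtract_with_per_target_locks`, `im`, `parent`, `ia` and `locks` are hypothetical names standing in for the generated loop and its data, not what the translator actually emits, and `parallel for` stands in for whatever threading scheme is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: one mutex per target location instead of a single
// global mutex, so updates to different targets never wait on each other.
// parent[i] is the index of the im entry that instance i's pim points to.
void subtract_with_per_target_locks(std::vector<double>& im,
                                    std::vector<std::mutex>& locks,
                                    const std::vector<std::size_t>& parent,
                                    const std::vector<double>& ia) {
    assert(parent.size() == ia.size());
    assert(locks.size() == im.size());
    const std::size_t n = parent.size();
    // Ignored when compiled without OpenMP; the loop then runs serially.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t target = parent[i];
        // Only iterations that share the same target contend for this lock.
        std::lock_guard<std::mutex> guard(locks[target]);
        im[target] -= ia[i];  // pim = pim - ia
    }
}
```

The cost is one mutex per target, which is presumably why a lighter-weight mechanism is attractive when each target is only touched by a handful of instances.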
I am thinking of the following:
#pragma omp atomic update
statement1 = ...
#pragma omp atomic update
statement2 = ...
These atomic statements are compatible with CPU as well as GPU and will be executed "as fast as possible".
That seems very good to me. I read that "The atomic construct ensures that a specific storage location is accessed atomically, ..." and mentally underline "specific storage location" to mean it's ideal for our typical case of thousands of storage locations, each accessed by only a couple of instances of the statement. I.e. for our neuron trees, parents are accessed most often by only 1 or 2 children and extremely rarely by 3-5 children.
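As a rough illustration of the per-statement atomic idea, here is a minimal sketch of a loop in which several iterations may subtract from the same target. The function name `subtract_child_currents` and the arrays `im`, `parent` and `ia` are hypothetical stand-ins for the generated code, and the sketch uses `parallel for` rather than `simd` purely to keep it simple; it is not a claim about what NMODL or mod2c emit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the translated BEFORE STEP loop.
// parent[i] is the index of the im entry that instance i's pim points to;
// several instances may share the same parent.
void subtract_child_currents(std::vector<double>& im,
                             const std::vector<std::size_t>& parent,
                             const std::vector<double>& ia) {
    assert(parent.size() == ia.size());
    double* im_data = im.data();
    const std::size_t n = parent.size();
    // Pragmas are ignored when compiled without OpenMP; the loop is then serial.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        // Atomicity applies to the one storage location being updated, so
        // updates to different parents do not serialise against each other.
        #pragma omp atomic update
        im_data[parent[i]] -= ia[i];  // pim = pim - ia
    }
}
```

With two children pointing at the first entry, both contributions are applied regardless of the order in which the iterations run.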
* axial.inc used in test_pointer.py is not "really thread-safe"
* update it to use PROTECT based on discussion in #2191
* fixes one of the tests mentioned in #1792
- [ ] merge nmodl PR BlueBrain/nmodl/pull/994 and then update submodule
- [ ] update mod2c support
fixes #2191
* Update pointer test with PROTECT for SIMD/SIMT execution
  - axial.inc used in test_pointer.py is not "really thread-safe"
  - update it to use PROTECT based on discussion in #2191
  - fixes one of the tests mentioned in #1792
* Update mod2c and NMODL with fixes for PROTECT and MUTEX constructs
* Update docs
fixes #2191
Context
In #2186 (in conjunction with BlueBrain/spack#1814 for the definition of build_type=FastDebug) I tried to enable some more compiler optimisation flags. This led to test failures (#2186 (comment), specifically in the test:neuron:nmodl:intel:legacy and test:neuron:nmodl:intel:shared tests) in the coreneuron_modtests::test_pointer_py_cpu test.
Overview of the issue
The error message is:
which comes from the test using coreneuron with cell_permute = 0:
nrn/test/coreneuron/test_pointer.py
Lines 210 to 214 in c62dc10
(This is a bit fragile. For example, reducing the number of cells from 5 to 1 in the first argument on
nrn/test/coreneuron/test_pointer.py
Line 155 in c62dc10
causes cell_permute = 0 to pass, but cell_permute = 1 still fails.)
The issue comes from the BEFORE STEP block:
nrn/test/coreneuron/mod files/axial.inc
Lines 44 to 48 in c62dc10
which NMODL translates into a loop with
annotations.
If I understand correctly, pim is a POINTER variable that refers to the RANGE variable im on a different instance of a mechanism derived from axial.inc, and multiple instances of the mechanism may have pim values referring to a common, other instance of the mechanism. This means that multiple iterations of the loop are updating the same value, which gives correct results if the loop is executed serially.
Removing #pragma omp simd from this loop is sufficient to get the correct result.
@nrnhines: what is your perspective here? Should this test pass as-is?
Expected result/behavior
Tests should pass when reasonable compiler optimisations are enabled, so this needs to be fixed somehow.
One opinion is that the mechanisms/test are buggy, as they declare that they are thread-safe:
nrn/test/coreneuron/mod files/axial.inc
Line 5 in c62dc10
and then assume non-existent atomic magic when updating pim.
The documentation (also here) is not super clear on whether these mechanisms are breaking the THREADSAFE contract.
The targets of POINTER variables are generally only set at runtime, so in the general case we must assume that all instances of a mechanism may have POINTER variables pointing to the same places, which would imply extra care is needed around atomicity etc.
NEURON setup
Minimal working example - MWE
were the options I used, with:
followed by