The line in Gemma that leads to this problem is (after removing typedefs)
Kokkos::View< Kokkos::complex<double> **, Kokkos::HostSpace > d_systemMatrix ("systemMatrix" , 56925, 56925);
This constructor requests a View with 3240455625 entries. This number fits in int64_t, but not int.
At some point, the view constructor calls get_work_partition, a function in Kokkos_HostThreadTeam.hpp:
std::pair<int64_t,int64_t> get_work_partition() noexcept
{
return std::pair<int64_t,int64_t>
( m_work_range.first * m_work_chunk
, m_work_range.second * m_work_chunk < m_work_end
? m_work_range.second * m_work_chunk : m_work_end );
}
In get_work_partition, m_work_range.second is type int64_t but m_work_chunk is type int. Multiplying them together casts the result to type int. In our specific case, m_work_range.second * m_work_chunk is -1040187392, which is always less than m_work_end. Therefore, the first return part of the ternary expression is always used and get_work_partition returns a pair whose second value is negative.
Later, in Kokkos_OpenMP_Parallel.hpp, the negative value is cast to an unsigned long (a number larger than the number of entries in the View) and the constructor for Kokkos::complex
Changing get_work_partition to cast m_work_range.second and m_work_chunk to type int64_t in the following way and recompiling appears to fix this problem for Gemma.
std::pair<int64_t,int64_t> get_work_partition() noexcept
{
return std::pair<int64_t,int64_t>
( m_work_range.first * m_work_chunk
, ( int64_t ) m_work_range.second * ( int64_t ) m_work_chunk < m_work_end
? m_work_range.second * m_work_chunk : m_work_end );
}
However, there are other lines of code in Kokkos_HostThreadTeam.hpp that have the same ternary if statement. I expect that the similar casting to int64_t should be done to other parts of get_work_partition.
Can you confirm that I understand what is happening in the View construction correctly? If so, for the sake of a short-term solution, it would be nice to know which other statements should be modified in our local copy of Kokkos so that we can run problems of this size.
As always, thank you for your time.
@bfzinser I remember fixing one integer shortening issue in that function, but I guess I didn't fix it fully. We'll get this fixed and also add a unit test to confirm its really fixed.
For what it's worth, it appears that my naive fix works for at least --kokkos-threads=2, but it does not work for --kokkos-threads=16.
Tests are running, gonna have a pull request in a bit.
Should be fixed. Unit Tests was added and runs nightly. The unit test is not on by default unless run on Sandia machines where I know I got the 50GB required to run the test.