I'd be in favor of building out a generic x86-64 channel/label that uses none of the nocona optimizations.
I'm okay with changing -march=nocona to -march=k8, and rebuilding a few packages, but rebuilding all is a big ask.
This seems fine. We can start with the set that were selected for aarch64/ppc64le and then add others as people need them.
Do you have statistics and benchmarks you can point to for a costs/benefits analysis here?
I am not sure what exactly you are looking for @mingwandroid, but I don't have anything offhand.
I am an academic and keep running into very old machines that lack more recent instructions like SSE3. It is hard to say how many people have had this problem, but I have seen it twice.
Per the discussion above, the idea would be to have a very small channel with generic x86-64 builds for these cases.
I don't feel very strongly here, so happy to punt or be convinced otherwise! :)
It might be nice to process our binaries to figure out when they need rebuilding for this channel. Many won't touch SSE3 for example and rebuilding those would be a bit of a waste.
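As a rough sketch of that binary-processing step, assuming GNU binutils `objdump` is available and its default AT&T-syntax output: the snippet below scans a disassembly for the SSE3 instructions. The names `sse3_mnemonics_in` and `needs_generic_rebuild` are hypothetical, not part of any conda-forge tooling. It is only a heuristic: a static scan will also flag code that sits behind runtime CPU dispatch (and so is actually safe on older CPUs), and it can misread data embedded in executable sections.

```python
import re
import subprocess

# Instructions introduced by SSE3 (a.k.a. PNI, Prescott New Instructions).
# fisttp is handled separately below because AT&T syntax adds a size suffix.
SSE3_MNEMONICS = {
    "addsubpd", "addsubps", "haddpd", "haddps", "hsubpd", "hsubps",
    "lddqu", "movddup", "movshdup", "movsldup", "monitor", "mwait",
}

# objdump -d instruction lines look like:
#   "  401000:\tf2 0f 7c c1          \thaddps %xmm1,%xmm0"
# i.e. address, raw bytes, then mnemonic and operands, separated by tabs.
# Continuation lines (extra raw bytes only) have no third field and are skipped.
_INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t[^\t]*\t(\S+)")


def sse3_mnemonics_in(disassembly: str) -> set[str]:
    """Return the SSE3 mnemonics found in GNU `objdump -d` output (AT&T syntax)."""
    found = set()
    for line in disassembly.splitlines():
        m = _INSN_LINE.match(line)
        if not m:
            continue
        mnemonic = m.group(1)
        if mnemonic in SSE3_MNEMONICS:
            found.add(mnemonic)
        elif mnemonic.startswith("fisttp"):
            # fisttps / fisttpl / fisttpll are all the SSE3 FISTTP instruction.
            found.add("fisttp")
    return found


def needs_generic_rebuild(path: str) -> bool:
    """Heuristic: True if the binary at `path` contains any SSE3 instruction."""
    out = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    return bool(sse3_mnemonics_in(out))
```

Usage would be something like `needs_generic_rebuild("lib/libfoo.so")` (hypothetical path); only packages whose shared libraries or executables return True would need a rebuild for the generic channel.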
For sure! The other way to do this would be to use virtual packages and have conda build embed the required instructions directly in the package requirements. I am guessing you have thoughts on that.
I wish I had time for thoughts! Sounds nice though. I've not looked at virtual packages yet.
This would be nice to have, to switch on as needed.
Ok. I'm game to work on this after the openmp stuff goes through. So the simplest path is a migrator with some custom compilers. I'm more inclined to go after the virtual package since it solves bigger problems. TBH we probably need both, since if some package won't install due to requiring, say, SSE3, we will want to build a version without it.
I don't think we need a custom compiler or a custom channel. Packages that really care about sse3 have runtime detection of the cpu like openblas, gmp, etc.
Ok. I'm not going to push it. I've seen illegal instruction errors on two different machines now. It could be people using -march=native too.
Closing for now.
What I meant was that I'm fine with changing nocona to k8.
Ahhhh. Yeah we could go that route.
Bumping @jjhelmus @mingwandroid since I think they were involved when defaults set its march flags. What do y'all think if conda-forge goes with k8 instead of nocona?
k8 would be fine. From what I recall nocona was selected as k8 was not documented in the GCC manual at the time. It looks to be documented now.
The only CPUs I am aware of that nocona will cause issues with, but k8 should be alright with, are the first-generation 130 nm SledgeHammer Opteron processors, which lack SSE3. I'm surprised at how long these seem to remain in service.
I have a suspicion that some of the illegal instruction errors are a result of virtualization rather than limitation of the actual hardware.
Good call on virtualization!
It appears that the node I had trouble with should support this instruction set, but it is not listed in /proc/cpuinfo:
processor : 15
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6128
stepping : 1
microcode : 0x10000d9
cpu MHz : 2000.000
cache size : 512 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 8
apicid : 39
initial apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate retpoline_amd ibp_disable vmmcall npt lbrv svm_lock nrip_save pausefilter
bogomips : 4000.04
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
It looks like the kernel lists this as pni (Prescott New Instructions, the original name for SSE3) rather than sse3, so they are the same instruction set. I vote to shelve this convo for a while and see if any more issues bubble up.
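A minimal sketch of that check, assuming Linux on x86-64, where `/proc/cpuinfo` has one `flags` line per logical processor: it treats either `pni` or `sse3` as evidence of SSE3 and requires it on every processor. `cpu_has_sse3` is a hypothetical helper name. Under virtualization this reports what the hypervisor exposes to the guest, which may be less than the physical CPU supports.

```python
def cpu_has_sse3(cpuinfo_text: str) -> bool:
    """Return True if every processor listed in /proc/cpuinfo reports SSE3.

    The Linux kernel names the SSE3 flag "pni", so a missing "sse3"
    entry does not mean SSE3 is absent.
    """
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines matter; e.g. "power management:" is ignored.
        if sep and key.strip() == "flags":
            flag_sets.append(set(value.split()))
    if not flag_sets:
        raise ValueError("no 'flags' lines found; is this x86 /proc/cpuinfo output?")
    return all("pni" in flags or "sse3" in flags for flags in flag_sets)
```

On the affected node, `cpu_has_sse3(open("/proc/cpuinfo").read())` should return True for the Opteron 6128 output above, since its flags include `pni`.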
Nothing bubbled up for over a year here. I am going to close this for now as it seems this is not needed.