a7 is completely unreliable, and most likely two parts are involved (the CPU causing segfaults and something unknown causing daily crashes, see #160). Our options are:
What are your opinions? (Especially @MasterToney and @ingwinlu)
The problems are 100% load related (as crashes were much less frequent before switching over to the new build system).
Let's exchange the CPU first before taking any further steps.
> The problems are 100% load related (as crashes were much less frequent before switching over to the new build system).
It is a bit more complicated. Several months ago (when the hardware was bought), intensive load tests were done without any errors. GCC compilation was also run for many hours. It might be that these tests were too synthetic, or that the problems were not triggered for some other reason.
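To make such a load test less synthetic, one option is to repeat a real build many times and count failures. A minimal sketch, assuming the command is deterministic and always succeeds on healthy hardware, so any non-zero exit is suspicious; `count_failures` and the example command are hypothetical and not part of our build system:

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return the non-zero exit codes seen.

    Assumption: `cmd` always succeeds on stable hardware, so every
    entry in the returned list hints at a crash or miscompilation.
    """
    failures = []
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(result.returncode)
    return failures
```

For example, `count_failures(["make", "-j16"], 50)` on a clean checkout would approximate the load of the new build system, provided the build directory is reset between runs.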
> Let's exchange the CPU first before taking any further steps.
Let us see if AMD ships something; I have not heard from them yet.
> synthetic tests
Pretty sure the segfault problems do not even occur if you use clang, so that is entirely possible.
However, I am surprised that no problems with the PSU were detected.
Has v2 crashed since replacing the PSU + CPU?
v2 has uptime `20:41:59 up 2 days, 6:01, 1 user, load average: 0.09, 0.05, 0.01`, so afaik there was no crash after replacing the power supply unit.
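If we want to check this automatically instead of reading the output by hand, the uptime line can be parsed. A minimal sketch, assuming the usual procps `uptime` layout (`up N days, H:MM, N user(s)`, or `up N min`); `parse_uptime` is a hypothetical helper, and other `uptime` variants may need more patterns:

```python
import re
from datetime import timedelta


def parse_uptime(line):
    """Return the 'up ...' duration of one `uptime` output line."""
    match = re.search(r"up\s+(.*?),\s+\d+\s+users?", line)
    if match is None:
        raise ValueError(f"unrecognized uptime output: {line!r}")
    days = hours = minutes = 0
    for part in match.group(1).split(","):
        part = part.strip()
        if m := re.fullmatch(r"(\d+)\s+days?", part):
            days = int(m.group(1))
        elif m := re.fullmatch(r"(\d+):(\d+)", part):
            hours, minutes = int(m.group(1)), int(m.group(2))
        elif m := re.fullmatch(r"(\d+)\s+min", part):
            minutes = int(m.group(1))
        else:
            raise ValueError(f"unrecognized uptime field: {part!r}")
    return timedelta(days=days, hours=hours, minutes=minutes)
```

Comparing the result against the time since the PSU replacement tells us whether the machine rebooted in between.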
A relatively minor change would be to swap a7 and v2. Then crashes of a7 would not influence v2. (The assumption is that a7 crashes more often than v2.)
I have an i7 system that we could also use in the same server room.
Any suggestions which SSD we should buy for a7?
@markus2330 Regarding the SSDs, it depends on what we can afford. I have had good experiences with Samsung SSDs. Here are some examples of current SSDs. There are older models, but they are not cheaper 🙃.
For SATA:
For M.2:
@mpranj Thank you for the tip! I purchased a Samsung EVO. It should arrive within the next week.
Seems like we have to exchange the mainboards, too. (v2 had a freeze without any output, so replacing the power supply unit is unlikely to help for a7 either.)
Does anyone have experience with mainboards for Ryzen?
I have no experience with them, but it seems that ASUS boards are quite popular judging from reviews.
One thing that comes to mind talking about this: I've read that Ryzen can be sensitive to RAM choice. They have a list of validated RAM. Sometimes the mainboard vendors also maintain such a list. It is of course possible that other modules are supported.
Do you have a particular ASUS board in mind?
A wrong RAM choice should have been caught by RAM tests, shouldn't it?
> Do you have a particular ASUS board in mind?
Unfortunately, I don't really know the exact hardware of a7. Are we talking about these here? There would be newer boards with newer chipsets we could try. There are only older ASUS µ-ATX boards. Maybe Anton should take a look before we buy anything incompatible.
> A wrong RAM choice should have been caught by RAM tests, shouldn't it?
Probably. I'm just throwing in suggestions. There are different reports of instability with Ryzen. Someone who has a similar board (but a different chipset) reports (sorry about the production quality) that he fixed it in the BIOS by changing some Pstate settings. It might be less work to just pick a new and stable mainboard in the first place, though.
> Are we talking about these here?
Yes, I did not know about that page. Good to have documentation :smile:
> he fixed it in the BIOS by changing some Pstate settings.
We disabled the C6 state, but in the latest version of the BIOS these settings do not exist anymore.
> It might be less work to just pick a new and stable mainboard in the first place, though.
Yes, but which one?
It seems we don't have much choice if we can only fit µ-ATX boards.
All of the above support overclocking and other features we supposedly don't need. Judging by the tech specs, the board which is already in a7 was a good choice. I can't think of what is wrong there.
As I said, before you order anything, maybe let Anton or someone else take a look at it.
If we can fit larger boards, there might be better options.
Some recent changes:
a7 is now reliable. We could add an SSD if someone has time to set it up. @MasterToney?