I did a quick run with profiling enabled to see where all the CPU cycles go and discovered this interesting bit:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
29.43 3350.44 3350.44 4043913295 0.00 0.00 StructCPID::StructCPID(StructCPID const&)
28.19 6559.47 3209.03 4045384901 0.00 0.00 StructCPID::~StructCPID()
A majority of the time spent goes to constructing and destructing StructCPID objects. I took a peek at how these are used and saw a lot of these usages:
std::string CPIDByAddress(std::string address)
{
//CryptoLottery
for(auto map<string,StructCPID>::iterator ii=mvMagnitudes.begin(); ii!=mvMagnitudes.end(); ++ii)
{
StructCPID structMag = GetStructCPID();
structMag = mvMagnitudes[(*ii).first];
if (structMag.initialized && structMag.cpid.length() > 2 && structMag.cpid != "INVESTOR" && structMag.GRCAddress.length() > 5)
{
if (structMag.GRCAddress==address)
{
return structMag.cpid;
}
}
}
return "";
}
Disregarding the extra copies of std::string objects for now, this will:
I think all this can be replaced with:
const StructCPID& structMag = mvMagnitudes[(*ii).first];
No extra copies, no destructors.
I'll give it a go in my wallet and see what happens.
I think I have tracked it down to this: https://github.com/gridcoin/Gridcoin-Research/blob/master/src/main.cpp#L6304
Each and every time the function is called, the incoming map is copied. That explains the absurd amount of copy constructor calls.
Also, what is that function intended to do with the incoming map? It adds a new object to the map but since it's only a copy that insert isn't propagated through the rest of the program. Is the intention that a new object should be inserted into the caller's map, not the copy?
I will submit a PR for this as a fix for #140, not for this one as the fix that covers this issue is much, much larger in terms of SLOC and files.
Hmmm, this looks pretty major and ingenious, thanks.
So before I try to answer, what kind of performance gain did you experience by changing the struct initializer to directly pull the data from the map without copying it?
Rob
Well, in comparison by copying the entire map it's minor. The big benefit is the change in #158 whereas the rest is just a bonus. By passing all StructCPID objects by reference instead of value you gain a lot of small increases but not really more than "it's common practice, there will be less copying" (at least the way they're called now). By passing the maps by reference instead of value as in #158 you drop the CPU usage by a massive amount.
So the reason I added the function initially was about a year ago, we were experiencing some illegal memory overwrite exceptions inside TallyNetworkAverages, and it was right around the line after we initialize the structcpid, then we start setting values, and someone running valgrind on linux picked up situations where the code wrote into protected memory and core dumped. So I made that function that sets the basic remnant of the object before using it - but to this day never realized it was copying every time it accessed it because it was not being passed by reference, so this is indeed a great find, great work, thanks!
I want to pull 158 in and test it... Im trying to get to it, the issue is I have an issue Im working on right now, but shouldnt be much longer.
I hope this fixes our performance issue people are complaining about. I was always confused by it because on windows, my box runs at 1% utilization, but hey it could be a thread-lock or contention issue on other platforms or something.
Oh btw Den, can you do me a favor? Can you leave your code running for a couple days and verify the wallet does not crash on linux with the new code? Ill test Windows over the next few days so we can have a safe main branch commit.
I'll let it run also (with the new merge) on a linux64, no crash since 1h
(waiting also for a block to be created which would make me calm)
I have it running on two boxes since yesterday.
Which compiler is used on Windows? It could be that MSVC is smarter and can see that the map is a local copy and simply optimize it out.
so, after ca. 7h:
The RAM performance on Windows Server 2008 R2 for your info is steady between 425Mb - 450Mb of RAM.
Perhaps a testnet test would be an idea although my present testnet wallet running on Win10 at the moment is showing difficulties with PoR Difficulty stuck at 0.00 and has been that way for some days.
The rest of the optimisations, including the one in the first post, are negligible so I'm closing this one.
Most helpful comment
I think I have tracked it down to this: https://github.com/gridcoin/Gridcoin-Research/blob/master/src/main.cpp#L6304
Each and every time the function is called, the incoming map is copied. That explains the absurd amount of copy constructor calls.
Also, what is that function intended to do with the incoming map? It adds a new object to the map but since it's only a copy that insert isn't propagated through the rest of the program. Is the intention that a new object should be inserted into the caller's map, not the copy?
I will submit a PR for this as a fix for #140, not for this one as the fix that covers this issue is much, much larger in terms of SLOC and files.