Based on https://github.com/UniversalMediaServer/UniversalMediaServer/issues/1652#issuecomment-439658453
@theafien 240k instances of GlobalIdRepo is indeed too many so I am interested in seeing how that happened. I haven't had much to do with that area of the code before, the global ID feature was implemented by @SharkHunter and was a bugfix for how some renderers expected things to be, I think in particular Xboxes.
I am trying to learn more about that area now and this will issue will help me learn more about it :)
@SharkHunter any idea why 600 videos would result in 240k instances of this class?
@skeptical also did a lot of work on it, would be cool to see him around the place again :)
First up the globalIdRepo is to be able to use the remote control feature of DLNA. SInce each object will have a unique ID one render can tell another render to play video X. Now if you look more closely there is not 240K instances of globalIdRepo (there is always one) there is 240K of globalIdRepo$ID and that is @skepticals fault :). The reason for 240K is obviously that there are 240K DLNAResources created. THe question is why there are 240k DLNAResources created. I would say all other servers give each media a unique id but then agian they don't do discover on the fly.
@SharkHunter this resources include videos, srt, audio? my library have 1000 files including image, video, srt, txt, url shortchuts no more than that
@theafien it includes around 6 entries per video file, which are the virtual files within the #-TRANSCODE-# folders. There is certainly no reason for there to be 240k entries in there.
One of our users has a TV that keeps spamming browse requests when it is sitting there and might be creating a new set of global IDs sometimes - so that can eventually lead to 240k+, and it might be a common thing for devices to do. That is my suspicion about a lot of these memory problems anyway. I am approaching solutions from multiple angles:
1) Reduce the memory used by each resource, mostly by optimizing the compression of thumbnails since they are the thing that takes memory
2) Deduplicate the thumbnails in memory - we already dedupe them in the database but that needs to be abstracted to happen in memory too, and it has already been implemented in a fork so we will grab that soon
3) If the memory does eventually get full we will remove/reset the oldest entries so they can be rediscovered but aren't sitting in memory anymore, I have a PR for this already but it is incomplete
When those 3 things are done I think we will be much closer to stability over time
@SubJunk it can be much more than 6 entries per video when video has multiple audio and multiple subtitles and all transcoding engines are enabled.
@valib you're right, still though I doubt there are 240k valid entries
@SubJunk I made the PR #1683 for testing because there is a bug when all renderers are adding their own GlobalIdResource IDs for the same media.
@theafien what gave you this view? https://user-images.githubusercontent.com/18597242/48667190-1c087b00-eab8-11e8-964d-966cba631d4e.png
@SubJunk I used the YourKit Java Profiler, but others java profilers give the same result
Cool I'll try it, it looks useful
profilers are a hand on the wheel, even more to see unused memory references, leak memory, etc.. I do not understand about the UMS engine, but it looks like HTTP V2 Request Worker creates a lot of junk
Yes we do have a lot of junk being created and previously we got away with a lazy approach to it, but in v5 we added the global ID repository which prevents the GC, so now we need to actually fix the program so that our memory use isn't so crazy.
There is a lot of work to be done and we have known about it for a long time but haven't managed to tackle this yet - there is just so much to do on this program at all times - but it is my top priority at the moment so I hope to solve or at least improve it soon.
On a related topic, have you considered contributing code to UMS? ;)
I'm going to read a little about dlna, so maybe I can help :)
Cool! We could really use some extra Java-specific experience around here. I am trying hard lately to upskill myself in Java and having someone with the experience would be helpful for me and the product.
there is 240K of globalIdRepo$ID and that is @skepticals fault :)
Hello comrade @SharkHunter! whew! Glad you cleared that up, for a second I thought you might have something to do with it too :)!
Hello guys, I'm pretty rusty with the code (and java) at this point, but @SharkHunter's correct that if there are 240k globalIdRepo$ID there must also be just as many DLNAResources. Does your profiler confirm that @theafien? If the problem doesn't occur in ums 6.8 then this current issue is specific to ums7 I suppose.
Of course the underlying problem (regardless of this issue) is that there are many associated objects that hold references to each DLNAResource (including but not limited to GlobalIdRepo) and could be preventing it from going out of scope, so this needs to be audited (preferably by someone who likes headaches :).
@SubJunk I looked into a version of GlobalIdRepo based on weak references back in 2015 (in a long ago deleted branch, but luckily I still happened to have one diff file showing a commit # and lo and behold some git voodoo succeeded in resurrecting the associated commits from my repo tree, which I've updated and condensed in #1695). This may be a better approach than #1486 which tries to sidestep rather than solve anything (imho :). Please note that although I've updated the code in the PR and it compiles ok, I don't really maintain a media collection anymore and can't test it easily myself (I'd have to dig out an old hard drive. Side note: I still use ums 6.2.1, the last jumpy-compatible version, but mostly for web streaming), but it's at least a starting point.
Also note that this at best would just remove GlobalIdRepo from the list of suspects, I doubt that the root cause would be solved. For that I think all other objects need to hold weak references in a similar way to the PR (or just hold the id# and fetch the resource as needed for local operations). HTH :).
@skeptical wow, great to see you back here :) I hope you've been well, what are you working on these days?
A little history about v7 as I recall it (I'm sure others recall it differently as always :)), it was a fairly light release in terms of code changes, so I would be surprised if that caused this particular issue. The main thing that has caused our users lots of problems after v7 was released was just the fact that it runs a media scan on startup, and that revealed a lot of existing problems that happen when a scan is run. Since most users never ran a scan before, they hadn't run into the problems associated with it before v7. There were some other unrelated bugs introduced in v7 but I think we have caught most of them now.
Soon after the 7.0.0 release, in 7.0.1, I made the change you mentioned at https://github.com/UniversalMediaServer/UniversalMediaServer/pull/1486 and I totally agree it sidesteps the problem. It was more of a hotfix to make the program usable for the affected users and wasn't meant to be a permanent or proper fix. We had quite a few unhappy users and I didn't have the knowledge about how the global ID repository works in order to do a real fix and I knew it would take time to figure that out, so that made those users happy in the meantime.
Hopefully we are starting to figure out a better approach now and maybe we can even remove my hack after your approach is implemented :) I'm interested in the weak reference approach and I remember talking about it all those years ago, and infrequently along the way since then. I will test it when I can and report back. Thanks!
While weak references would prevent OOM, it wouldn't address the core problem - the fact that (DLNAResource) instances aren't properly managed and is cloned and whatnot. The problem that I see using weak references is that it is completely random (or, at the discretion of the GC) what instances will be free'd when. It's easy to imagine that it would be some kind of FIFO mechanism so that old resources would be released first, but that's not the case. The GC might just as well choose to release newly created resources that is needed. Some kind of cache solution that keeps track of when instances were last "used"/written/read might be more appropriate for that reason.
I very much doubt there is one bug behind this. I've used to called it a "flawed design" and the reason is that it seems to me that managing instances has never really been much of a consideration when the code has been made. If it was, I believe a lot of things would have been handled differently. The "global IDs" certainly made the problem worse, but as I see it's not the root cause. The root cause, as I see it, is the lack of a strategy/rules about how long and when instances should live and how this should be managed. Mindless cloning certainly doesn't make the problem any less.
The way the root folder works per renderer creating a "tree" of DLNAResources for each renderer combined with "global IDs" that make sure nothing is ever released isn't enough to explain the insane number of instances that is reported. But, if you factor in those things, cloning, no "accounting" of instances, careless instantiation whenever that's considered the laziest way to implement something and then combine this with other bugs that make sure the already out-of-control instance use is multiplied, you might actually get there.
I don't think there is a quick way to solve this, I think many things need to be cleaned up before things are under control. In the meantime, it's all about "damage control". As long as no effort is made to address the underlying lack of structure/control, I think all the workarounds in the world will always fall short.
@skeptical - I'm completly innocent :).
The discussion about "flawed design" is interesting. The design is not flawed. It might be buggy but it for sure aint flawed. It is designed to behave just as it does. PMS/UMS uses a dynamic tree building approach. This is the killer USP (for me). And it is harder to do. One could do like other DLNA servers do and that is simply to force database usage, In this easy case you simply just scan the files and add info to a sql db. Then a bunch of virtual folders are added which uses the db and it's indexing to access the file. This is good for file based media. It is worthless for dynamic (or internet) content.
Trees per render is not a bug it's a feature which actually gives cool possibilities to allow for "access" control etc. Now I personnaly use 6.2 as well and I have no issues.
The globalID are only useful for DLNA remote controls and can most likely be easily skipped by simply not storing the data.
@SharkHunter You sound like a broken record, even when these problems have become so obviously problematic. It is not "hard" to do the "dynamic tree building". On the contrary it's very easy and slow, you simply act as if every renderer exists in a vacuum and keep reading the same information over and over again. I just can't comprehend what you mean is "difficult" or complex about this approach. It also breaks DLNA by the way, because DLNA dictates that renderers can expect to get the same information when using the same requests. Renderers are thus allowed to cache information (and many do, leading errors and other "strange behavior"). The ID's are required to be persistent.
There's nothing "killer" about this except for "killing" memory and performance. As I've said many times before, there's not problem to use a database or other persistent storage and also allow "dynamic updates". It's so obvious it shouldn't be necessary to say, you can scan/resolve/parse things as often as you wish even if the data isn't just kept in memory. You simply needs to put a little bit more effort into updating your data than to delete everything and build them from scratch.
I've never said that a per tree "resource tree" in itself is a design flaw, but I certainly thing the implementation used here is. To me it's obvious that a such design should keep the media data (which does NOT need to exist on a per renderer basis) separate from the renderer presentation data. Resources should also be free'd when no longer needed. This is the flawed design.
I know you will probably never understand what I'm trying to say, so I'm not trying to convince you. As always I think it's important to correct erroneous information so that others, that don't have the knowledge to see the logical flaws, don't confuse it for facts.
I must say that I don't understand why you're also so defensive about this. Judging by the way you defend the design flaws that have been made, one would think that you're responsible for it all, and that you think that the needs of online content is much more important than the needs of local media. As far as I know, you're not responsible for most of this flawed design, even though you have contributed to make it worse. I find this puzzling. The idea that one kind of use is "more important" than another I reject by principle. Everybody should respect that people, and thus users, are different and have different needs.
Just a warning that I am monitoring the tone of this conversation closely and will take action if ad hominem points get out of hand. I will not tolerate a repeat of what happened years ago where we couldn't even have a discussion without it becoming upsetting, leading to people being driven away from the project.
We are all capable of communicating and collaborating nicely so let's focus on that and improve the program together :)
Another related thing that might help with the problem of too much memory use is mentioned in a todo at https://github.com/UniversalMediaServer/UniversalMediaServer/blob/master/src/main/java/net/pms/configuration/RendererConfiguration.java#L1611
I made a new issue for it https://github.com/UniversalMediaServer/UniversalMediaServer/issues/1696
Most helpful comment
Hello comrade @SharkHunter! whew! Glad you cleared that up, for a second I thought you might have something to do with it too :)!
Hello guys, I'm pretty rusty with the code (and java) at this point, but @SharkHunter's correct that if there are 240k globalIdRepo$ID there must also be just as many DLNAResources. Does your profiler confirm that @theafien? If the problem doesn't occur in ums 6.8 then this current issue is specific to ums7 I suppose.
Of course the underlying problem (regardless of this issue) is that there are many associated objects that hold references to each DLNAResource (including but not limited to GlobalIdRepo) and could be preventing it from going out of scope, so this needs to be audited (preferably by someone who likes headaches :).
@SubJunk I looked into a version of GlobalIdRepo based on weak references back in 2015 (in a long ago deleted branch, but luckily I still happened to have one diff file showing a commit # and lo and behold some git voodoo succeeded in resurrecting the associated commits from my repo tree, which I've updated and condensed in #1695). This may be a better approach than #1486 which tries to sidestep rather than solve anything (imho :). Please note that although I've updated the code in the PR and it compiles ok, I don't really maintain a media collection anymore and can't test it easily myself (I'd have to dig out an old hard drive. Side note: I still use ums 6.2.1, the last jumpy-compatible version, but mostly for web streaming), but it's at least a starting point.
Also note that this at best would just remove GlobalIdRepo from the list of suspects, I doubt that the root cause would be solved. For that I think all other objects need to hold weak references in a similar way to the PR (or just hold the id# and fetch the resource as needed for local operations). HTH :).