Hi,
I am working on an Nvidia Shield with an external USB 3 storage device. The storage is set as "This device's storage", and the app's data is moved to that storage.
I am downloading DASH streams, using DownloadService and DownloadManager.
There is a performance issue because of the number of files generated by the ExoPlayer downloaders.
After having downloaded a few GB of data, I now have thousands of .exo files in my download folder. Since then, every time I boot my device with the external storage mounted, I can see the process /system/bin/sdcard taking up to 40% of the CPU for a while. It's probably indexing all these files.
The same thing happens when I start my app, the first time I play a media item: the same Android indexing process starts and affects my device's performance, and my media only starts playing after a few seconds.
Once it is indexed, every other media item plays instantly.
Moreover, ExoPlayer's actionFile already has its own index, so this heavy system indexing does not really help ExoPlayer.
Handling thousands of files on the storage is really tough for the system.
Is there any way to optimize that?
Once a media item is downloaded, is there a way to merge or archive all its .exo files into just one file?
Or maybe there is another solution, please advise.
Thanks
Which directory do you use for downloading? If you use the dir returned by Context.getExternalFilesDir(), I think it shouldn't be indexed by the system.
Yes, I am using Context.getExternalFilesDir(null), just like in the demo.
Now I am thinking about creating a new folder for each media item; this should help the file system. But I would lose the ability to run multiple downloads, because it forces me to instantiate a new DownloadManager every time.
It seems the DownloadManager is designed to work with only one folder for all simultaneous downloads.
But do you think there is a way to make the manager work with simultaneous downloads in separate folders?
If the issue is that the system is scanning too many files, then even if you put each download into a different folder, the total number of files won't change. If the issue is that a single folder has too many files, then it might help.
You're right, DownloadManager/Cache works with a single folder. If you want to use multiple folders then you need to create multiple DownloadManagers and Caches. Also, for playback you need to use the right Cache instance.
Could you try putting an empty file named ".nomedia" into the parent folder of the cache folder? Not into the cache folder itself, as SimpleCache would delete it when the app starts.
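For reference, a minimal sketch of that suggestion (Kotlin; the helper and folder names are only examples, assuming the cache folder lives under Context.getExternalFilesDir() as in the demo app):

```kotlin
import android.content.Context
import java.io.File

// Hypothetical helper: keep the SimpleCache folder one level below the app's
// external files dir and drop a ".nomedia" marker next to it (not inside it,
// since SimpleCache deletes unknown files when it starts).
fun createDownloadDirectory(context: Context): File {
    val parent = File(context.getExternalFilesDir(null), "exo")   // example name
    val cacheDir = File(parent, "downloads")                      // example name
    cacheDir.mkdirs()
    File(parent, ".nomedia").createNewFile()                      // no-op if it already exists
    return cacheDir
}
```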
One more thing: please provide as much of the information requested in the issue template as possible.
@erdemguven ,
Even if we cannot reduce the number of files and creating multiple folders won't help the system, I believe it could at least help SimpleCache and make the player launch faster.
Yesterday I tried the .nomedia file, but in the cache folder. To avoid having it deleted before the Android indexing runs, I recreated it each time I start my service and each time I initialize my cache, but I was still not sure the file would survive until the indexing process ran. I did it that way because I was not sure that creating it in the parent folder would affect the subfolder.
Anyway, it did not help the sdcard process; maybe it only affects the android.media process.
I'm going to try it in the parent folder, to make sure it is never deleted, and I will let you know what happens.
Otherwise:
About using multiple folders: instead of creating multiple DownloadManagers, I was wondering, if I edit DownloadService to launch each DownloadAction with a new DownloaderConstructorHelper (new cache instance and folder), do you think it would work this way, or would something else break?
It means I would still have only one DownloadManager with one actionFile, and subfolders inside my cache folder.
Thanks for your help.
Issue Template:
Issue description: Android file system performance issue because of too many .exo files in the DownloadManager's cache folder. The process /system/bin/sdcard consumes a lot of resources at each Android boot and at each cache initialization (after force-killing and restarting the app, for example).
Reproduction steps: The content is downloaded to local storage; it's DASH streams with 4-second segments and DRM encryption.
ExoPlayer version: 2.8.0
Device and Android version: Nvidia Shield, Android 7.0
Perhaps you can easily test whether multiple folders improve the sdcard issue by manually moving files to different folders.
And about SimpleCache: is there a way to generate an index for each media item, so that the SimpleCache instance won't have to search through the whole cache?
Are the methods getKeys() and getCachedSpans(key) what I need?
For my use case, is this SimpleCache instantiation appropriate?
SimpleCache(downloadDirectory, NoOpCacheEvictor())
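For context, here is a minimal sketch of how those methods could be used to inspect a single key instead of walking the whole cache (illustrative only; downloadDirectory and key are placeholders):

```kotlin
import com.google.android.exoplayer2.upstream.cache.NoOpCacheEvictor
import com.google.android.exoplayer2.upstream.cache.SimpleCache
import java.io.File

// Illustrative only: list the cached spans for one cache key.
fun printCachedSpans(downloadDirectory: File, key: String) {
    val cache = SimpleCache(downloadDirectory, NoOpCacheEvictor())
    println("all keys: ${cache.keys}")
    cache.getCachedSpans(key)?.forEach { span ->
        println("position=${span.position} length=${span.length} cached=${span.isCached}")
    }
}
```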
About the .nomedia file: I retested with it in the parent folder; it does not help the sdcard process.
Multiple folders is the solution to this bug.
The Android /system/bin/sdcard process stopped taking all my device's resources, and performance has improved even with a lot of data.
Now that I store one media per folder, when I want to play a video from one of my folders, the Cache instance and the player initialization are instantaneous, even just after boot and force kill/relaunch.
@erdemguven I didn't need to create multiple DownloadManagers; I wanted the DownloadService to keep working as before.
The best solution I found is to generate a new DownloaderConstructorHelper for each download. It does not change the original code much, and it would probably be great to have this in future versions.
Having multiple folders implies also having multiple Cache instances.
Details:
I have modified DownloadManager and DownloadManager.Task: now, instead of passing a DownloaderConstructorHelper into the DownloadManager, I pass my parent folder as a File object; in the Task I generate a new child folder, obtain a cache instance by calling a static synchronized getCache() method, and then build a specific DownloaderConstructorHelper.
My getCache() method is in my Service's companion object, and I also use it for playback: it checks and stores cache instances in a HashMap. This way, from the ID of a media item, I can check whether a cache is already initialized in its folder (for example, if I download and watch at the same time).
The DownloadService still works the same way, I still have one actionFile.
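For anyone curious, a rough sketch of the kind of registry described above (this is not the actual code from this thread; the object, folder layout, and media id parameter are illustrative):

```kotlin
import com.google.android.exoplayer2.upstream.cache.Cache
import com.google.android.exoplayer2.upstream.cache.NoOpCacheEvictor
import com.google.android.exoplayer2.upstream.cache.SimpleCache
import java.io.File

// Illustrative registry: one SimpleCache per media id, each in its own
// sub-folder of the parent download directory, shared between the download
// task and playback so the same folder is never opened twice.
object MediaCacheRegistry {
    private val caches = HashMap<String, Cache>()

    @Synchronized
    fun getCache(parentDirectory: File, mediaId: String): Cache =
        caches.getOrPut(mediaId) {
            SimpleCache(File(parentDirectory, mediaId), NoOpCacheEvictor())
        }
}
```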
Thanks for investigating this. I'll look into it and decide whether to use a solution similar to what you have done or to modify SimpleCache to place each media item under a separate folder.
Sorry, I just noticed I forgot to recommend another way to reduce the number of .exo files. You can increase maxCacheFileSize when you create CacheDataSource. By default it's 2MB, so downloaded media is divided into .exo files of at most 2MB. Passing a bigger number will make it create fewer .exo files. The problems with a bigger .exo file size are an increased risk of losing more data if the app crashes, and that you won't be able to read the currently downloading .exo file until it reaches the max size or the end of the stream.
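A minimal sketch of what that looks like when building the CacheDataSource write path (assuming the 2.8-era constructors; the 20MB value, user agent, and zero flags are just examples):

```kotlin
import com.google.android.exoplayer2.upstream.DefaultHttpDataSourceFactory
import com.google.android.exoplayer2.upstream.FileDataSource
import com.google.android.exoplayer2.upstream.cache.Cache
import com.google.android.exoplayer2.upstream.cache.CacheDataSink
import com.google.android.exoplayer2.upstream.cache.CacheDataSource

// Sketch: a CacheDataSource whose write path produces .exo files of up to
// 20MB instead of the default 2MB.
fun buildCacheDataSource(cache: Cache): CacheDataSource {
    val upstream = DefaultHttpDataSourceFactory("user-agent").createDataSource()
    return CacheDataSource(
        cache,
        upstream,
        FileDataSource(),                        // cache read path
        CacheDataSink(cache, 20L * 1024 * 1024), // cache write path, 20MB max file size
        0,                                       // no flags
        null                                     // no event listener
    )
}
```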
@erdemguven I followed your recommendation and changed the maxCacheFileSize to 20MB for testing, but it seems it cannot help because of my source's segment size:
- My source has short segments of 2.5MB. When I set maxCacheFileSize to 20MB, the downloader does not build files bigger than 2.5MB.
- It only works the other way: if I set the size to 1MB, all my downloaded files are 1MB instead of 2.5MB.
- I also tested with lower-bitrate media; my downloaded files are never bigger than 500KB.
So the current code can divide chunks bigger than maxCacheFileSize, but it cannot combine chunks smaller than it.
Is there anything else I can do to force building bigger files?
I also noticed something else: in my download folder I always have all my chunks of 2MB or so, and I also have as many files of 100KB~200KB. What are they? Is it audio? (They are also .exo files.)
Unfortunately, there is no way to combine segments into a single file.
The small .exo files are probably the audio track. Also, if your video segments are 2.5MB, there must be two .exo files per segment when maxCacheFileSize is 2MB: one 2MB file and one 0.5MB file. So increasing maxCacheFileSize to more than 3MB still helps slightly.
Alright, thanks for your help @erdemguven.
In a future release, it would be great to make the maxCacheFileSize used by DownloaderConstructorHelper's buildCacheDataSource configurable.
And also to use one folder per media item.
@kvillnv Thank you for your tips. By the way, it is possible to apply maxCacheFileSize to DownloaderConstructorHelper by injecting a custom CacheDataSinkFactory.
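For anyone else reading this, a rough sketch of that injection (assuming the DownloaderConstructorHelper constructor that accepts the read/write factories; the 20MB value and user agent are examples):

```kotlin
import com.google.android.exoplayer2.offline.DownloaderConstructorHelper
import com.google.android.exoplayer2.upstream.DefaultHttpDataSourceFactory
import com.google.android.exoplayer2.upstream.FileDataSourceFactory
import com.google.android.exoplayer2.upstream.cache.Cache
import com.google.android.exoplayer2.upstream.cache.CacheDataSinkFactory

// Sketch: make the downloaders write .exo files of up to 20MB by injecting a
// custom CacheDataSinkFactory as the cache write sink factory.
fun buildDownloaderConstructorHelper(cache: Cache): DownloaderConstructorHelper =
    DownloaderConstructorHelper(
        cache,
        DefaultHttpDataSourceFactory("user-agent"),     // upstream reads
        FileDataSourceFactory(),                        // cache reads
        CacheDataSinkFactory(cache, 20L * 1024 * 1024), // cache writes, 20MB files
        null                                            // no PriorityTaskManager
    )
```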
@erdemguven I found a critical bug related to this issue. If a user downloads more than 65535 files to external storage, which is the maximum number of files in a FAT32 directory, newly downloaded chunks overwrite existing chunks. In my case, the average chunk size is 55KB, and the size of the cache folder is around 3.74GB (65535 * 55KB).
@KiminRyu, thanks for letting us know. It looks like we should definitely support multiple sub cache folders.
@erdemguven I confirm that multiple folders really help a lot; so far, performance and management are much better.
@KiminRyu maybe you can look into the EXT4 format, which supports up to 4 billion files.
@kvillnv Do you have any plans to send a PR about sub cache folders? Or could you share some details about the changes in your solution? This bug causes a lot of problems for my customers. It would be really helpful. Thank you!
@erdemguven Any update for sub cache folders? PR?
My understanding from this thread is that splitting the cached files into multiple folders improves performance, even though the total number of files present is the same (actually slightly greater, if you count the additional directories as files).
If that's the case then we can simply shard the cached files between a number of directories, or in an approximately balanced tree of directories. There's no particular need for each directory to represent anything logical, like a piece of content, and it's much simpler to implement without trying to do this.
To make the change, I think we need some idea of how many files you can put in a directory before performance starts to degrade. Does anyone have good data for this? Obviously we need to stay under 65535 for the FAT32 limit, but it sounds like performance starts dropping off well before that. We could pick something arbitrary like 1000, but it would be preferable to make the decision based on actual data.
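To illustrate the kind of sharding being suggested (purely a sketch, not ExoPlayer's actual implementation; the shard count and naming scheme are arbitrary):

```kotlin
import java.io.File

// Illustrative sharding: place each cache file in one of SHARD_COUNT
// sub-directories chosen by hashing its name, so no single directory
// grows too large. 100 is an arbitrary placeholder value.
private const val SHARD_COUNT = 100

fun shardedFile(cacheDir: File, fileName: String): File {
    val shard = ((fileName.hashCode() % SHARD_COUNT) + SHARD_COUNT) % SHARD_COUNT
    val shardDir = File(cacheDir, shard.toString())
    shardDir.mkdirs()
    return File(shardDir, fileName)
}
```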
@ojw28 I had up to 250 folders for a total of 500GB. The largest folders contained up to 5GB split across 2000 files. Performance is optimal.
This is using a USB HDD at 5400rpm on an average-performance Android TV device.
Also, having one piece of content per folder has another advantage: the folder can be deleted before launching a removeAction in the DownloadService so that the deletion is done instantly.
Thanks for the information!
> Also, having one piece of content per folder has another advantage: the folder can be deleted before launching a removeAction in the DownloadService so that the deletion is done instantly.
We're aware of the benefit, but the approach doesn't fit nicely with (non-download) caching use cases. In particular:
- It requires the component writing into the cache (e.g. CacheDataSource) to be able to map each request to a unique content identifier. Plumbing for that doesn't exist currently, and it's unclear whether we really want to be adding it for this.

So we're trying to avoid having to go down that route if possible. We believe most of the latency associated with content deletion is actually due to repeatedly re-writing the cache index every time a segment is removed. We've already addressed this in the dev-v2 branch, so deletion should be much faster there.
As an aside: one piece of content per folder probably works really nicely for apps that download HLS streams, but it's not going to work nicely for apps that download 10,000 small MP3 files :). In that case you'd end up with 10,000 directories containing one file each, which I suspect suffers from the same performance issues described in this thread. Approximately balanced (but otherwise arbitrary) sharding probably helps in both use cases.
It's also likely we can just make fewer cache files in the first place for some use cases.
We have a pretty good understanding of the problem now. We think the issue is caused by an O(N) cost of querying file metadata (in particular the length of the file) on certain file systems (FAT32), where N is the total number of files in the parent directory.
Cache initialization requires querying file metadata for every file, which results in a complexity of:
= (cost-of-listing-N-files) + N*O(N)
= O(N + N^2) = O(N^2)
When the N files are instead split equally across M sub-directories, the cost becomes:
= (cost-of-listing-M-directories) + M * ((cost-of-listing-(N/M)-files) + (N/M)*O(N/M))
= O(M + M*((N/M) + (N/M)^2))
If you take slices through (M,N) space where N or M are fixed, this is:
= O(M + 1/M) for a slice of (M,N) space where N is fixed
= O(N^2) for a slice of (M,N) space where M is fixed
The cost still grows quadratically with N, but increasing M (i.e. splitting between multiple directories) can significantly decrease the constant factor. There's a sweet spot of M after which further increasing it will increase the cost. The sweet spot is different for different values of N, and I think the best that can be done to find it is by measuring on some real world devices.
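To give a feel for the numbers, the expressions above can be evaluated for example values of N and M (constant factors are ignored, so only the relative magnitudes mean anything, and real devices would still need to be measured):

```kotlin
// Evaluate the cost model from the analysis above for illustrative values.
fun flatCost(n: Long): Long = n + n * n
fun shardedCost(n: Long, m: Long): Long = m + m * ((n / m) + (n / m) * (n / m))

fun main() {
    val n = 20_000L                 // example total number of cache files
    println("flat: ${flatCost(n)}") // ~4.0e8
    for (m in listOf(10L, 100L, 1_000L)) {
        println("M=$m sharded: ${shardedCost(n, m)}") // ~4.0e7, ~4.0e6, ~4.2e5
    }
}
```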
Our plan to fix this is:
1. Shard the cache files across M sub-directories. We need to do this anyway to avoid the possibility of hitting the 65535 file limit. The value of M and exactly how this works is to be determined, although the value may become relatively unimportant if we also do the point below.
2. Remove the per-file metadata query that is O(N) and make the whole thing linear. If this can be done then the value of M probably becomes relatively unimportant (picking something like 100 would probably be fine).

This should be much improved now. Please give the dev-v2 or dev-v2-r2.10.0 branches a try. The 2.10.0 release will be available in the next week or so.