Mtasa-blue: Show initial download size in server browser

Created on 26 Apr 2019 · 16 Comments · Source: multitheftauto/mtasa-blue

Is your feature request related to a problem? Please describe.
Some servers use custom models excessively and not uncommonly end up with several gigabytes of initial download. For people with slower internet connections or just people who don't want to download that much, this is a problem.

Describe the solution you'd like
I suggest adding an accurate total download size field to the server browser (as already suggested in PR #154).
On the one hand, this leads to more transparency and on the other hand, it might put some pressure on server owners not to exaggerate using custom models too much if it's not necessary.

Additional context

PR #154 was closed due to not being accurate enough, but could be used as a base.

enhancement

Jusonex

👍6

All 16 comments

Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?

Dezash on 26 Apr 2019

> Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?

The way you described it, yes, it would slow down.
But if the initial download size were sent to the server list server with the query request, there wouldn't be any performance issue.

Deihim007 on 26 Apr 2019

👍3

> > Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?
>
> The way you described it, yes, it would slow down.
> But if the initial download size were sent to the server list server with the query request, there wouldn't be any performance issue.

You can't account for cached files by just having the download size.

Dezash on 26 Apr 2019

> > > Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?
> >
> > The way you described it, yes, it would slow down.
> > But if the initial download size were sent to the server list server with the query request, there wouldn't be any performance issue.
>
> You can't account for cached files by just having the download size.

Then what about this: use the method you described, but not for every server. When you click on a server name, a panel opens with the info 🤔

Deihim007 on 26 Apr 2019

> > > > Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?
> > >
> > > The way you described it, yes, it would slow down.
> > > But if the initial download size were sent to the server list server with the query request, there wouldn't be any performance issue.
> >
> > You can't account for cached files by just having the download size.
>
> Then what about this: use the method you described, but not for every server. When you click on a server name, a panel opens with the info 🤔

Nobody uses the info panel and I did not suggest doing it for every server. I suggested doing it for the servers visible on the screen.

Dezash on 26 Apr 2019

> > > > > Wouldn't the server browser slow down if MTA would be downloading the file list and checking what is cached for each server in the browser? Could it be possible to perform an asynchronous check only for the servers that are visible on the screen?
> > > >
> > > > The way you described it, yes, it would slow down.
> > > > But if the initial download size were sent to the server list server with the query request, there wouldn't be any performance issue.
> > >
> > > You can't account for cached files by just having the download size.
> >
> > Then what about this: use the method you described, but not for every server. When you click on a server name, a panel opens with the info 🤔
>
> Nobody uses the info panel and I did not suggest doing it for every server. I suggested doing it for the servers visible on the screen.

How about sending the download file list to the server list?

Deihim007 on 26 Apr 2019

> How about sending the download file list to the server list?

That's basically what I said in my first post.

Dezash on 26 Apr 2019

Yeah, but what I said is different: it doesn't need an asynchronous check.

Deihim007 on 26 Apr 2019

> Yeah, but what I said is different: it doesn't need an asynchronous check.

By asynchronous, I meant checking the download size while scrolling down the list instead of checking all the servers at once.

Dezash on 26 Apr 2019

It's not like a server's file size is going to change every second, so an initial size report on server start, sent with the query request, should be sufficient. A static value, like the server name.

Deihim007 on 26 Apr 2019

> It's not like a server's file size is going to change every second, so an initial size report on server start, sent with the query request, should be sufficient. A static value, like the server name. ~ @Deihim007

I believe @Dezash is talking about client-side latency in looking up + hashing all the client resources. As long as we store the size of each resource, there is no problem in updating the file size when a resource is updated.

A potential solution to the client slowdown:

- When resources are downloaded:
  - filenames+hashes are stored in a cache somewhere (probably an SQLite database?)
  - each resource has its download time recorded as well
- When the browser loads / MTA loads, we check that the cache is still valid:
  - if the client resource folder was updated _after_ the resource download time (stored in the cache), we invalidate that resource and re-hash each file (and store it in the cache)
- When we query for a server's size:
  - the filenames+hashes+filesizes of all resources are received
  - if the server filename/hash DOES NOT MATCH the cache filename/hash, we know that the file will be redownloaded when you join the server, so we increase the estimated download size by the server entry's filesize
  - if the server filename/hash DOES MATCH the same details in the cache, the resource will not download
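
The comparison in the last step could be sketched roughly as follows. This is only an illustration in Python, not MTA code: `FileEntry`, `estimate_download_size`, the dict-based cache, and the sample paths and hashes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One file as described in the proposal: path, content hash, size."""
    path: str
    file_hash: str  # hex digest; the hash algorithm is an assumption
    size: int       # bytes


def estimate_download_size(server_files, local_cache):
    """Sum the sizes of server files whose path/hash pair is not cached.

    server_files: iterable of FileEntry received from the server.
    local_cache:  dict mapping path -> hash of the locally stored copy.
    """
    total = 0
    for entry in server_files:
        if local_cache.get(entry.path) != entry.file_hash:
            # Missing or changed locally, so it would be downloaded on join.
            total += entry.size
    return total


server_files = [
    FileEntry("race/client.lua", "aa11", 2_048),
    FileEntry("race/car.dff", "bb22", 500_000),
    FileEntry("race/car.txd", "cc33", 1_200_000),
]
local_cache = {
    "race/client.lua": "aa11",  # unchanged, will not download
    "race/car.dff": "0000",     # hash differs, will download
}
print(estimate_download_size(server_files, local_cache))  # 1700000
```

Only the mismatched `car.dff` and the missing `car.txd` count towards the estimate.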

Problems with the above solution:

- I do not think that the OS updates "last update" if subfolders are updated. This is a problem for resources with subfolders. A potential solution is to store the last update of all subfolders too.
- Folder size updates if a resource uses file functions.
  - We could instead use last_update = max(map(get_last_update, filepaths)), but then we still need to do disk (metadata) reads for every single file in every single resource.
  - However, since this is done only once (on MTA load), it will probably not be a bottleneck.
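
A rough Python sketch of the per-file variant, max(map(get_last_update, filepaths)), assuming "last update" means the file modification time reported by the OS. The function names and the idea of passing the cached download time as a Unix timestamp are hypothetical.

```python
import os


def latest_mtime(resource_dir):
    """Return the newest modification time of any file under resource_dir.

    Walks subfolders too, so a change deep in the tree is noticed.
    Returns 0.0 for an empty or missing directory.
    """
    newest = 0.0
    for root, _dirs, files in os.walk(resource_dir):
        for name in files:
            newest = max(newest, os.stat(os.path.join(root, name)).st_mtime)
    return newest


def is_cache_entry_stale(resource_dir, cached_download_time):
    """True if any file changed after the recorded download time."""
    return latest_mtime(resource_dir) > cached_download_time
```

This costs one metadata read per file, which matches the concern above, but it avoids relying on folder timestamps.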

Notes:

- Cache validation only needs to be performed on MTA load because MTA will _not contain outdated_ resource information after joining the server. This is because the server resource download procedure will include updating the cache.
- This cache must only be used for server browser size calculation, not for use elsewhere (i.e. not for use in determining which files need to be redownloaded during the resource download stage).
  - This is because the cache can be manipulated (and folder size obviously cannot be trusted).
- Calculation of cache size:
  - Size of each file entry in the cache: 280 bytes
    - 16 bytes (128 bits) per MD5 hash (may be CRC instead, which is 32 or 64 bits)
    - 32,767 is the real max filepath length, but it's most often 260 characters, so 260 bytes per filepath
    - 4 bytes for a filesize (unsigned int)
  - My resources folder has 3,910 files. So 3,910 files * 280 bytes = 1,094,800 bytes, which is ~1 MiB in my case.
  - @lpsd has 47,893 files in their resources folder. That makes the cache ~12.8 MiB.
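
The arithmetic above can be checked with a few lines of Python. The constant and function names are made up for illustration; the byte counts are the ones listed in the estimate.

```python
HASH_BYTES = 16   # MD5 digest
PATH_BYTES = 260  # typical maximum path length
SIZE_BYTES = 4    # unsigned 32-bit file size
ENTRY_BYTES = HASH_BYTES + PATH_BYTES + SIZE_BYTES  # 280


def cache_size_bytes(file_count):
    """Worst-case cache size for a given number of cached files."""
    return file_count * ENTRY_BYTES


for files in (3_910, 47_893):
    size = cache_size_bytes(files)
    print(f"{files} files -> {size} bytes ({size / 2**20:.2f} MiB)")
# 3910 files -> 1094800 bytes (1.04 MiB)
# 47893 files -> 13410040 bytes (12.79 MiB)
```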

qaisjp on 26 Apr 2019

> if client resource folder was updated _after_ the resource download time (stored in cache), we invalidate that resource and re-hash each file (and store it in the cache)

What if the server updated the resource? I think there should be an expiration time of those hashes.

> when we query for a server's size:
>
> - the filenames+hashes+filesizes of all resources are received
> - if server filename/hash DOES NOT MATCH the cache filename/hash, we know that the file will be redownloaded when you join the server. increase the filesize by the server entry filesize
> - if the server filename/hash DOES MATCH the same details in the cache, the resource will not download

When/how do you suggest querying for server size though? If the client were to query around 5000 servers on every startup, downloading their filenames, hashes and filesizes and then checking each hash, it could take quite a bit of time.

Dezash on 26 Apr 2019

> What if the server updated the resource? I think there should be an expiration time of those hashes.

The server always sends over the hashes for updated resources. If the user's client resources are out of date, their hashes will not match the server's hashes. This means that the mismatched file needs to be redownloaded and therefore the size added to the client's perceived server download size.

An expiration time for hashes would not be necessary as they would always be up to date (because of the download time heuristic).

Generally I feel that expiring resources might be a good idea, though. i.e. resources that servers have not used in a long time should be deleted. This specifically can be discussed in a separate issue.

Also, servers sending the resource structure (filenames) across makes it possible for us to provide a "delete resources" button after disconnecting from a server. Again, this can be discussed in a separate issue (there are a few usability issues I can think of).

> When/how do you suggest querying for server size though?
> If the client were to query around 5000 servers on every startup, downloading their filenames, hashes and filesizes and then checking each hash, it could take quite a bit of time.

The "filename data" (filenames, hashes, sizes) is sent alongside the other metadata. Indeed this might end up being a lot of information: we'd have to calculate the average number of files in started resources per server and determine whether or not it would be a great increase.

In the (very likely) case that it is too expensive to send this information over at once, we can still:

- calculate and display it all asynchronously, with information loaded separately to the main data
- only do it for ~100 or so on-screen resources
- have two columns: estimated download size, total server size. The sorting feature would be disabled for the first column as it would require us to calculate the estimated size for all 5000 servers

I'm not sure that hash comparison would take long, but I may be underestimating it. We also don't have to be perfectly accurate, so we can apply some other heuristics or cheat a little.

One way of cheating at hash comparison would be to ignore filenames and just use Bloom filters (on the resource level). See "yourbasic.org Bloom filters explained" (or this other unverified non-Go related tutorial).

Actually, for Bloom filters, we don't even need to ignore filenames; we can test for membership of concatenated strings like so: hash + "-" + filename.

Bloom filters on Wikipedia:

> The Google Chrome web browser used to use a Bloom filter to identify malicious URLs. Any URL was first checked against a local Bloom filter, and only if the Bloom filter returned a positive result was a full check of the URL performed (and the user warned, if that too returned a positive result).

We only want an estimate, so we would not need to perform a full check. We would have a bloom filter for each resource the client has already downloaded, and (I assume) files are usually spread thin across many resources, so there's probably a low probability of failure as well (someone should verify this).
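
A minimal sketch of that membership test in Python, using the hash + "-" + filename key from above. The `BloomFilter` class, its default sizes, and the use of salted SHA-256 to derive bit positions are all assumptions made for illustration, not an existing MTA component.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def member_key(file_hash, filename):
    """Build the concatenated key suggested above."""
    return file_hash + "-" + filename


# One filter per downloaded resource, filled from the files already on disk.
cached = BloomFilter()
cached.add(member_key("aa11", "client.lua"))

print(member_key("aa11", "client.lua") in cached)  # True
```

A key that was added is always reported as present. A file that was never added is usually reported as absent, but a false positive is possible, which would make the estimated download size slightly too small.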

Also, further reading about why Chrome no longer uses Bloom filters:

> PrefixSet as an alternate to BloomFilter for safe-browsing.
>
> The safe-browsing prefix data is uniformly distributed across the 32-bit integer space. When sorted, the average delta between items is about 8,000, which can be encoded in a 16-bit integer. PrefixSet takes advantage of this to compress the prefixes into a structure which is relatively efficient to query.

qaisjp on 26 Apr 2019

The sorting issue could probably be solved without the need of two columns. The values could be initialized using the estimated download size and asynchronously updated with the accurate download size (prioritizing the servers that are visible on the screen).
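
As a rough illustration of that prioritisation in Python (the function name and the sample addresses are hypothetical, and the actual background fetching is left out), the refresh order could be computed like this:

```python
def refresh_order(servers, visible):
    """Return server addresses with the on-screen ones first.

    sorted() is stable, so servers keep their list order within each group.
    """
    return sorted(servers, key=lambda address: address not in visible)


servers = ["a:22003", "b:22003", "c:22003", "d:22003"]
visible = {"c:22003", "d:22003"}
print(refresh_order(servers, visible))
# ['c:22003', 'd:22003', 'a:22003', 'b:22003']
```

Each server would start with its estimated size and be overwritten with the accurate value as the queue is worked through.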

Dezash on 26 Apr 2019

🚀1

> The sorting issue could probably be solved without the need of two columns. The values could be initialized using the estimated download size and asynchronously updated with the accurate download size (prioritizing the servers that are visible on the screen).

That's an even better idea 💖

qaisjp on 26 Apr 2019

We might go through more than two prototypes (the first one being #154) which we don't have enough room for on this issue, so I haven't marked this as "Likely Accept", and have marked it as "Accepted" instead.

qaisjp on 15 Mar 2020

👍1
