As I looked at addressing #3252, I noticed that the User-Agent string currently used in requests does not follow the order recommended by RFC 7231:
> By convention, the product identifiers are listed in decreasing order of their significance for identifying the user agent software.
As described by #800 and shown in tests, Collect's User-Agent string looks like:
Dalvik/2.1.0 (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X) org.odk.collect.android/v1.5.1-10-ge20fa334-dirty
The RFC only makes a recommendation, so this is not a big deal, but I wanted to at least make the decision explicit.
I'm guessing that it was done this way because the application ID looks odd at the head.
~Using the application ID is also unorthodox, I think. It has the nice property of ensuring that forks by default provide an identifying User-Agent string. But we could also easily put a string in a config file that forks don't get and have forks be identified as something like "UnidentifiedODKCollectFork".~ @zestyping discovered in https://github.com/opendatakit/collect/issues/3253#issuecomment-515638028 that Mapbox uses the application ID as well. Thinking about it more, I think it's the right way to go.
Is it worth changing either the order ~or the product name or both~? I think this is unlikely to be disruptive so I would tend to change it but I may not be thinking of ways this could break downstream tools. Naturally, it would reset quotas based on User-Agent strings.
CC @zestyping @yanokwa
Putting it first (before Dalvik) seems reasonable. I don't think we need to replace the app ID; I agree it's unusual but it also works well. Just my two cents. :smile:
Side note: I confirmed that the tile requests from OSMDroid do not use the same HTTP infrastructure. Those requests look like this, and would need to have their headers set separately:
GET /2/3/2.png HTTP/1.1
User-Agent: osmdroid
Host: 192.168.1.67:9999
Connection: Keep-Alive
Accept-Encoding: gzip
@zestyping I'm hoping you'll enjoy https://github.com/opendatakit/collect/pull/3254 (writing the description up right now).
I just tried the same experiment with Mapbox, and discovered that it makes its own User-Agent string as well:
GET /17/21017/50624.png HTTP/1.1
User-Agent: org.odk.collect.android/v1.23.0-beta.4-15-g4a4742c-dirty (3397) Mapbox/7.3.0 (4260644) Android/28 (armeabi-v7a)
Host: 192.168.1.67:9898
Connection: Keep-Alive
Accept-Encoding: gzip
We can't change the Mapbox one, though. It's statically defined:
Great to see that Mapbox uses the application ID as well. Upon further reflection, I think we should keep that part.
Seems totally fine for Mapbox to set its own User-Agent string since it's just communicating with its own servers, anyway.
If/when we give it tile URL templates (e.g. if we use it to replace OSMDroid), it'll fetch those tiles from whatever other servers the templates point to, using that User-Agent. I like their User-Agent string though; it looks totally reasonable to me.
Oh, right, I hadn't thought about that! But yes, I agree that it's reasonable even in that context. If servers are doing throttling based on User-Agent string, they probably have some heuristics for identifying the application part which hopefully wouldn't get fooled by that switch. Though I think we're fine on terms of service for servers other than openstreetmap.org.
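To make the "identifying the application part" point concrete, here is a minimal Java sketch of a naive heuristic that treats the first product token as the application. The class and method names are hypothetical, and I have no idea what real tile servers actually do; this only shows how the two strings quoted in this thread would be keyed under a first-token rule.

```java
// Hypothetical illustration only; not taken from any real server or from Collect.
final class FirstProductSketch {
    private FirstProductSketch() {}

    /**
     * Returns the name of the first product token in a User-Agent string,
     * i.e. everything before the first "/" of the first whitespace-delimited item.
     */
    static String firstProductName(String userAgent) {
        if (userAgent == null) {
            return "";
        }
        String trimmed = userAgent.trim();
        // The first item ends at the first whitespace character.
        int end = 0;
        while (end < trimmed.length() && !Character.isWhitespace(trimmed.charAt(end))) {
            end++;
        }
        String product = trimmed.substring(0, end);
        // Drop the optional "/version" suffix.
        int slash = product.indexOf('/');
        return slash < 0 ? product : product.substring(0, slash);
    }

    public static void main(String[] args) {
        String collectCurrent = "Dalvik/2.1.0 (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X) "
                + "org.odk.collect.android/v1.5.1-10-ge20fa334-dirty";
        String mapbox = "org.odk.collect.android/v1.23.0-beta.4-15-g4a4742c-dirty (3397) "
                + "Mapbox/7.3.0 (4260644) Android/28 (armeabi-v7a)";
        System.out.println(firstProductName(collectCurrent)); // prints: Dalvik
        System.out.println(firstProductName(mapbox));         // prints: org.odk.collect.android
    }
}
```

Under that rule, Collect's current string would be bucketed as "Dalvik", while the Mapbox one would be bucketed under the application ID.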
The goal is to get to something that looks like `Dalvik/2.1.0 org.odk.collect.android/v1.5.1-10-ge20fa334-dirty (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X)`
@opendatakit-bot claim
I wrote above that "The goal is to get to something that looks like `Dalvik/2.1.0 org.odk.collect.android/v1.5.1-10-ge20fa334-dirty (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X)`" but I now don't understand why. It seems that if we're going by decreasing order of usefulness for identifying the agent, it should be `org.odk.collect.android/v1.5.1-10-ge20fa334-dirty Dalvik/2.1.0 (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X)`.
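For reference, a minimal Java sketch of assembling the string in that order. The class name, method name, and parameters are hypothetical and this is not Collect's actual code. It assumes the platform part (the Dalvik product plus its parenthesized comment) is already available as one string; on Android I believe that is what `System.getProperty("http.agent")` returns, but treat that as an assumption.

```java
// Hypothetical helper for illustration; not Collect's actual implementation.
final class UserAgentSketch {
    private UserAgentSketch() {}

    /**
     * Puts the application product first and the platform agent second,
     * following the "decreasing order of significance" convention.
     */
    static String build(String applicationId, String versionName, String platformAgent) {
        String appProduct = applicationId + "/" + versionName;
        if (platformAgent == null || platformAgent.trim().isEmpty()) {
            // No platform information available: send only the application product.
            return appProduct;
        }
        return appProduct + " " + platformAgent.trim();
    }

    public static void main(String[] args) {
        // Hardcoded so the sketch runs off-device; on Android this value is assumed
        // to come from System.getProperty("http.agent").
        String platformAgent = "Dalvik/2.1.0 (Linux; U; Android 5.1.1; hi6210sft Build/LMY47X)";
        System.out.println(build("org.odk.collect.android", "v1.5.1-10-ge20fa334-dirty", platformAgent));
    }
}
```

Keeping the Dalvik product and its comment together as one unit avoids the split that appeared in my earlier example.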
@SaumiaSinghal, @seadowg, can either of you think of a reason why I may have split the host identifier? I wonder whether I was just going too quickly and made a mistake.
No actually! Your new example makes more sense to me.
Yes, also the tests don't fail like that. :)