Api: Increase rate limit?

Created on 29 Jul 2018 · 36 Comments · Source: Bungie-net/api

I've been working on clan automation. I've got a database populated with all sorts of activity data and stat data for my clan's bungie activity history and discord activity history. The bungie loader takes about 30 minutes to update 450 people's data in the database. I've identified a few things I can do to speed things up on my end, but the majority of the time is spent waiting for the API to get back to me.

I'd like to split out my character loader, the pvp stats loader, the raw activity loader, and the pve stats loader into separate tasks to speed things up. I have a strong feeling that if I have these 4 jobs and the discord role assigning code (currently only updates non-clan members twice a day because it has to hit the API a lot) running at the same time, I'm going to get throttled. There is also the website running off the same node (barely used atm).

So for those doing a lot of backend work instead of in a client side browser, is there a way to submit a request to allow more API hits?


All 36 comments

You're seeing it take 30 minutes to update the data for 450 people? What calls are you making per person, just so I can get an idea of what you're hitting? Which requests are taking longest to get back to you, and what kind of time per request are you seeing? That'll give me a better idea of the kind of situations you're hitting here.

I'll have to get back to you here. I have a metrics tracker built into all the calls, but it only puts it into the logs. I'll dump it to a table in the db and generate some metrics.

I recorded all the calls and dumped it into an excel file. It was ~4800 API calls to do everything.

Everything =

  1. Update the clan's roster in the db (254 account level, platform level, and character level),
  2. Get new pvp stats (kills/deaths/KD/etc.)
  3. Get number of raid completions for each raid
  4. Update the raw activity table with the activities that took place after the last run. Currently not getting PGCRs, but I kinda want to do this too.

DatasForTheBungies.xlsx

One thing you'll probably ask about is why I'm using Stats?groups=85 for one of the calls. I found that, with an invalid group code, it returns the characters (including deleted characters) and it doesn't return all the data I don't care about. I'm pretty sure I have it structured so that if it detects a character was deleted since the last run, it grabs their raw activity data and then stops checking in the future. It might still be hitting the aggregate stats for deleted characters, though.

Edit: If I could thread this, it'd go a lot faster, but I'd be throttled. My main concern, though, is when I have the website done. It will let my leadership accept people into the clan regardless of whether they are in their division or someone else's, and also let them kick people from their clan. It'd mark them as kicked and send them a message over Discord with their stats so they know why. It will have a warn option too, so they have a chance to improve before getting removed from the clan. It won't have a ton of traffic, but I want this to be stable without having to get another node. I don't think I'll be able to fit another node into my Azure budget.

(I’m not with Bungie.)

Re 1), if you could make a single GetClanProfile call instead of hundreds of GetProfile calls, which properties would you need to be included in that response?


The clan roster update is updating their 254 account info, their platform level info (game version, clan rank, etc), character level data (character ids, if they are deleted or not, last played times), and also grabbing their battletags.

I'm asking for the rate limit to be increased if it's possible because I'd like to have the PVP data update, the raid data update, and the raw activity update running at the same time. I'd probably finish it all in 10 minutes that way and then I can get updates every 15 minutes to 30 minutes instead of once an hour.

Ah, so I can help you with some of that - unfortunately I don't think we're going to provide any rate limit increases. But fortunately you can remove a lot of redundancy in your results if you skip the GetCharacter calls. GetProfile when asking for character-level components will get you everything you would have gotten in a GetCharacter call. Since that's at the "leaf" of your call tree, that should dramatically decrease your call count!

Oh, I just opened the excel spreadsheet. You're not calling GetCharacter, that's good.

Hmm, that is a crapload of calls. I think a better solution for us isn't going to be increasing the throttle - this is actually one of those situations where it's probably pretty prudent to throttle you, that's a lot of requests - but to try and find ways that we can return you more of the data you need in fewer calls.

Something that I do see in there off the bat that may help is that you can go ahead and call the /Profile call with both components 100 and 200. Looks like you make two separate calls there for any given membership ID, and you can ask for them both at once and save yourself a round trip there. Not a huge savings, but there may be more like that in there.
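To illustrate the combined call, here's a minimal Python sketch (the loader in this thread is C#, but the idea is the same): the component list is just a comma-separated query parameter, so components 100 and 200 ride along on one request. The route shape follows the public Bungie.net Platform docs; the membership values below are placeholders, not real accounts.

```python
BASE = "https://www.bungie.net/Platform"

def profile_url(membership_type, membership_id, components):
    """Build one GetProfile URL that asks for several components at once."""
    joined = ",".join(str(c) for c in components)
    return (f"{BASE}/Destiny2/{membership_type}/Profile/{membership_id}/"
            f"?components={joined}")

# Components 100 (Profiles) and 200 (Characters) in a single round trip.
url = profile_url(1, "4611686018429389999", [100, 200])
```

A single GET of that URL (with your X-API-Key header) should return both component payloads in one response, halving the per-member call count for this pair.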

I can add this to my to do list. It'd knock 3 minutes off the run time.

To be clear though, if I were to get another node with another IP, I'd be able to double the number of calls I can make per second?

I'm planning on expanding my clan at some point and the number of calls will likely double in the future.

As long as you're adhering to our overall guideline of no more than 25 requests per second across whatever nodes you're using (https://github.com/Bungie-net/api/issues/244) feel free to go ahead and do that. That would bypass another layer of automated throttling that we do - where we're throttling for requests from a specific IP, and that's okay as long as in general you're being a good citizen of the service by staying under that global limit.

Just note that, if your service begins to make calls - even if it bypasses the throttle - where it begins to impede our service, we'll try to contact you to get it resolved. If we can't get a hold of you or we can't resolve the issue quickly, we may temporarily block your app's access: so try to keep that 25 requests/second limit sacrosanct.
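For anyone wiring this up, a client-side pacer is an easy way to keep under that ceiling. Here is a minimal single-process sketch in Python (the thread's code is C#, but the idea ports directly); the 25/second figure comes from the guideline above, and the class name is just for illustration.

```python
import time

class Pacer:
    """Tiny client-side pacer: space request starts at least 1/rate
    seconds apart, so one process stays under `rate` requests/second."""

    def __init__(self, rate=25.0):
        self.min_interval = 1.0 / rate
        self.next_ok = 0.0  # earliest time the next request may start

    def wait(self, now=None):
        """Return how many seconds to sleep before the next request
        may start; `now` is injectable for testing."""
        now = time.monotonic() if now is None else now
        delay = max(0.0, self.next_ok - now)
        self.next_ok = max(now, self.next_ok) + self.min_interval
        return delay
```

Before each API call, `time.sleep(pacer.wait())`; routing all of your loaders' calls through one shared pacer keeps the aggregate under the limit without per-task bookkeeping.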

A pro tip which I actually just realized and shared with someone else who was having request throttling issues earlier is that in cases like yours where server affinity doesn't really matter, you can save both you and us a lot of headaches by not passing the affinitization cookie. We usually recommend that people do so that their server state is consistent for a given character, but when you're looking at things like stats that matters a lot less than, say, what items you have equipped. If you don't pass the affinitization cookie that we set (sto-id-sg_live.bungie.net), then your requests will be spread over whatever servers have load to bear it, which will be great if you're planning on doing a large number of requests in bulk as you are planning.
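In other words, make sure the sto-id cookie never gets replayed. In C#'s HttpClient, constructing the handler with UseCookies = false should do it; as a hypothetical Python helper, the same idea looks like this (the cookie name is quoted from the tip above):

```python
AFFINITY_COOKIE = "sto-id-sg_live.bungie.net"  # name quoted from the tip above

def strip_affinity(cookies):
    """Return a copy of a cookie jar (as a dict) without the server-affinity
    cookie, so bulk requests can land on whichever server has spare load."""
    return {k: v for k, v in cookies.items() if k != AFFINITY_COOKIE}
```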

Give the combination of those a try!

This is good that we've had this conversation - it's finally causing me to write an article on the subject. Head here for more info, for anyone reading this who's curious about whether they should or shouldn't be affinitizing and how to do (or not do) so:

https://github.com/Bungie-net/api/wiki/When-should-I-affinitize-to-a-BNet-server,-and-how-do-I-affinitize-not-affinitize%3F

Would I be able to do those 25 calls per second from a single node?

The reason I'm making so many requests is so I can keep my database as up to date as possible. This allows me to hit your API less and get my data a LOT faster when I need it. A cached copy of the data is enough for me to do what I need to do in most cases. If I ran all 4 tasks in parallel I think I'd be under the 25 calls per second, but I'd need to check to make sure... If I go over 25 per second, would I be temporarily throttled?

I haven't dealt with affinitizing because my website is super simple right now. It's all C# programs grabbing data and storing it.

Aye, the throttle ought to kick in if you go over 25 requests per second, and you could be temporarily blocked depending on how many server resources you end up using. There are other throttles depending on the endpoints you are calling, and those are per IP - though I don't see them being applied to the endpoints you're calling at the moment. (However, note that this doesn't guarantee that we won't in the future... but if we do, I'll try to keep people posted here. If we do, it would be only out of necessity to keep the service running.)

What's your target goal for data freshness? Theoretically your ideal goal might not even be possible with our API.

For instance, on our side our GetProfile/GetCharacter data is cached for up to a minute, and our stats data can potentially take several minutes after a game is complete before it's recorded and able to be accessed externally. It could be, with those kind of lag times, that you won't see the type of immediate feedback you're hoping for even if you increase your requests per second.

The roles in my discord are based on things like, what division you're in, how many raid completions you have, what your overall KD is... things like that. Right now it can take over an hour to get new people their clan roles because my cache is out of date. Clan roles give them more access. Then the skill based roles give them access to other areas too. It'd be nice for things to only be 15 minutes behind.

There are 60+ role formulas in my database that manage everything.

Yeah, 15 minutes should be reasonable - in that period of time, 4800 requests should be reasonable and won't get you blocked - just make sure you're not affinitized to a single server. Spreading those out over all the servers should make this work just fine for your needs!

The way I understand it, I have to do something to be affinitized to a server. Since I'm not setting any cookies, I don't need to worry about that, right?

Honestly I don't even know how to do that. lol. I'll have to read your new doc.

Aye - it might depend on what library/framework you're using for making your HTTP connections. I'm not certain if some frameworks/libraries would automatically honor set-cookie headers for subsequent requests. What are you using?

C#'s HttpClient. I'm making a new HttpClient for every call, so I don't see how it'd know about previous requests.

Sweet - yeah, in that case you're set!

Ok. I'll report back if I end up getting throttled with the jobs all running concurrently. I have code to deal with the ThrottleSeconds variable, but I don't think I've actually tested it... :P

Sweet! Aye, I wish you luck! And thanks for bearing with us on all of this - we try to keep the throttles tight in the good/lower traffic times so things don't explode in the hard/higher traffic times, which sometimes means that we have to ask people to jump through hoops that I otherwise wouldn't have wanted them to have to jump through. :(

I've been getting error code 51 since I enabled threading last night. I actually got rid of all the pvp stat calls and raid stat calls and used the raw activity data I'm storing in my db to figure it out on my own.

The error code is
---> (Inner Exception #0) BungieAPI threw an error.
{"ErrorCode":51,"ThrottleSeconds":0,"ErrorStatus":"PerEndpointRequestThrottleExceeded","Message":"Too many platform requests per second.","MessageData":{}}<---

The character loader only has 5 threads going (lowered to 3 now) and it got error code 51.
throttled.txt

The activity loader had 7 worker threads and it's having problems too.
activitylogs.txt.txt

How many calls per second can I make to a specific endpoint? Is this a rate limit that can be changed?

It's 25 requests per second across the board; having more devices making requests doesn't change this, unfortunately. It's hard-coded to prevent bungie.net from suffering a performance hit from a rogue or misconfigured application.

Note that even if you adhere closely to the rate limit, you’ll still need to ensure you handle error 51 appropriately — which you probably are — a single Internet hiccup could bunch up your requests for a moment and briefly exceed the limit, even if you time it precisely on your side!
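One way to honor that, sketched in Python with hypothetical helper names: parse the error body, retry only on codes you've decided are retryable, and take the larger of ThrottleSeconds and an exponential fallback (ThrottleSeconds can legitimately be 0, as in the error-51 payload quoted above).

```python
import json

RETRYABLE = {51}  # 51 = PerEndpointRequestThrottleExceeded; extend as needed

def retry_delay(body, attempt, base=1.0):
    """Given a Bungie.net error body (JSON text), return seconds to wait
    before retrying, or None if the code isn't in our retryable set.
    ThrottleSeconds can be 0, so take the larger of it and an
    exponential fallback keyed on the attempt number."""
    err = json.loads(body)
    if err.get("ErrorCode") not in RETRYABLE:
        return None
    return max(err.get("ThrottleSeconds", 0), base * (2 ** attempt))
```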


I don't believe I'm hitting the 25 requests per second limit.

My thought is that since I'm hosting in Azure, others with the same external IP may be hitting the API as well. When I run my code from my house, I don't get the errors, and the thread count is higher here.

I'm also getting a stupid amount of Cloudflare errors lately too. I've had to add randomized cooldowns between calls when I get a Cloudflare error and increase the number of retries before bombing out my loader.

I added error code 51 to the "retry if you see this" list, so it may handle it better now. I just don't think I should be getting these errors based on the number of calls I'm making to the API.

Also to ArkahnX - everything for me is server side (on 1 node), so I shouldn't have any problems with other people using my API key to make requests.

If that shared IP concern ends up being troublesome (for you or any future people reading this issue for guidance), apparently Azure lets you get a dedicated outbound IP by using “IP-based SSL”, as once you get a dedicated IP for inbound traffic, you use it for outbound too.

https://docs.microsoft.com/en-us/azure/app-service/app-service-web-tutorial-custom-ssl


That article seems to be for web apps. I've got a VM running my scripts. I'll check if I can get my own IP for the VM

The way that sort of thing usually works is that you bill through the inbound ip allocation and it _coincidentally_ also provides what you need for outbound ip uniqueness. Rather a hack IMO but if it helps you, cheers!


I changed my public IP so we'll see if I get less errors now. If it's a basic azure VM you can do it in the "IP addresses" section on the VM object. (then you reboot)

If it's using the resource manager it's a bit more complex. https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-public-ip-address

Sweet! Thanks for sharing how you did it.


With a dedicated public IP and some tweaks to the retry logic it's a lot more stable now. I'm able to have 5 threads for the character loader and 10 for the activity loader without much of a problem. It still bombs out every once in a while, but nowhere near as much.

I just don't understand why I have so many "504: Gateway time-out" errors. I bumped the retry limit up to 10 and it still happened, so I had to add a random 5 to 30 second cooldown between the retries to overcome the majority of these errors.

To put this into perspective, it will be fine for a day, then the 504 errors will cause the loader to retry 10 times with 5 to 30 second cooldowns between the calls and run out of retries... it just doesn't seem like this is how it's supposed to be.

The 504 errors indicate that Bungie's servers are temporarily at capacity, if I remember correctly from other threads, so it's probably just your bulk requests coinciding with peak load under normal capacity planning.

(I’m not Bungie.)


Unfortunately, we experience intermittent connectivity errors for a variety of reasons, both upstream and downstream, particularly at peak hours. Our ability to control this is limited with the server resources we've been allocated, our dependencies on external systems, and most importantly our mandate to prevent harm/excessive load on game servers - even if it leads to the detriment or even temporary/long term outages in the API in order to do so.

Basically, we can't make any sort of uptime guarantee: and as a client you'll unfortunately have to build your application to work around that fact. It sucks, but it's the unfortunate conclusion of the aforementioned limitations.

We'd ask that particularly bulk-calling clients, when they encounter this, perform an "exponential backoff" (https://en.wikipedia.org/wiki/Exponential_backoff) of requests when this happens, which will help reduce the overall workload during times of high stress and ideally improve not only your connectivity but others as well!
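A capped, jittered version of that backoff can be sketched in a few lines of Python (the parameter values here are arbitrary illustrations, not Bungie guidance):

```python
import random

def backoff_delays(base=1.0, cap=300.0, factor=2.0):
    """Yield retry delays: uniformly jittered over an exponentially growing
    window (base, base*factor, ...) that never exceeds `cap`. The jitter
    keeps many clients from retrying in lockstep after a shared outage."""
    window = base
    while True:
        yield random.uniform(0.0, window)
        window = min(cap, window * factor)
```

Sleep for `next(delays)` after each failed attempt and create a fresh generator on success so the window resets.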

Note that there will definitely be multi-hour, and very occasionally multi-day, outages, so it is essential that you not assume that X retries will be sufficient for any value of X.


Aye - multi-hour particularly on release days when we frequently have to disable services due to our game dependencies, and multi-day has happened in times of great server stress (for instance, the great Vendor disaster of September 2016).

We try to disable the specific services that are experiencing issues when possible (or limit the amount of time we're down when situations like incompatible game deployments force us to stay offline until they are done migrating... which hopefully should be a thing of the past as of Forsaken from what I've been told!), but there have definitely been some extreme times in the past where we needed to turn off the services entirely to make sure that the game itself continues to operate properly.

In pseudocode, here's a simple backoff algorithm that reacts to any error by backing off, and to the absence of errors by ramping up. It's critical to vary the percentages randomly, and it's critical that it ramps up more slowly than it backs off. If you're running X threads for your API key instead of just one thread, using (min_s, max_s = 10X, 60X) in each thread ensures that your threads never exceed the rate limit without requiring you to synchronize between them at all.

    min_r, max_r = 1, 25
    min_s, max_s = 10, 60
    cur_r, cur_s = max_r, min_s

    try:
        api_call()
        cur_r = rand(120%-140%) * cur_r
        cur_s = rand(60%-80%) * cur_s
        if cur_r > max_r:
            cur_r = max_r
        if cur_s < min_s:
            cur_s = min_s
    catch:
        cur_r = rand(40%-60%) * cur_r
        cur_s = rand(140%-160%) * cur_s
        if cur_r < min_r:
            cur_r = min_r
        if cur_s > max_s:
            cur_s = max_s


In response to "what could we add to a specific endpoint to make it better," I would like to suggest you add an option to GetActivityHistory to include the PGCR as an object in the activity entry of the DestinyHistoricalStatsPeriodGroup.
This would cut down the number of calls immensely.
I too am aggregating the activities of clan members, and the PGCR for each activity, at an interval of 10 minutes. I grab the last 10 activities per character and filter out duplicates and activities that occurred before the previous batch. Nevertheless, I then have to call the PGCR for each unique activity, and after Forsaken, that's become a lot!
Any chance you could add a query string parameter like "details=yes" to GetActivityHistory that adds the PGCR?
