Plots2: Stats downloading returns "Page does not exist" for dates prior to early 2013

Created on 16 Apr 2019 · 56 Comments · Source: publiclab/plots2

I was trying to download stats from the beginning of the site (which was sometime in 2010) until July of 2016 and I received a "Page does not exist" error. I did some experimentation and while I'm not sure of the exact date before which this occurs, I can say it is sometime between 01-01-2013 and 01-07-2013 (in stats page date format DD-MM-YYYY). For what it's worth to the diagnosis of the problem, I was "only" downloading up until July of 2016.

Let me know if there's anything else I can tell you (FF 66.0.2, macOS 10.12.6).

Oh, a possible reference where this could be merged: Raw data from stats page, #4654

NOTE for high impact URLs which could slow the main website down, if you want to test them out, please use stable.publiclab.org instead of publiclab.org and you'll only slow or break the stable test server, which should have very similar code and data. (note by @jywarren)

bug help wanted

All 56 comments

:thinking: Will take a look at why this is happening thanks @skilfullycurled

@skilfullycurled when did you receive the error: when submitting the date ranges, or was the data returned fine and the error occurred when you were trying to "download as"?

No, @cesswairimu, it was the search itself that returned the error.

Oh, I forgot about that date. @cesswairimu, we're never supposed to mention that date or even speak about what happened on it. It's sort of the Voldemort of dates.

But seriously folks.

Very interesting.

I'll download around it.

Aha gotcha sorry will delete the comment

@skilfullycurled maybe we can close this now since it's not a code issue?

lol are you folks having too much fun in here? Voldemort dates? 🔮 🙊 I hope Cess has seen Harry Potter and knows you're joking? Cess, @skilfullycurled has a strange sense of humor please don't hold it against him.

No but for real, thanks for looking into this. I'm sure it was a tough one to track down esp. given the mysterious date.

:laughing: :laughing:

Oh my gosh. @cesswairimu, I'm so sorry. So sorry. I feel terrible. I was hoping to convey I was kidding when I said "but seriously folks". PS: @jywarren, I'll have you know that all the people who consistently laugh at my jokes think I have a great sense of humor.

Actually, on a legitimately serious note, I'll try to be a little more clear with my humor. I've seen enough furrowed brows to readily admit that my sense of humor can be confusing even in person because I would have said that with a completely straight face.

@skilfullycurled no, it's fine, blame it on having English as my second language :laughing:

I appreciate that. In the meantime, I should probably learn to chill out just a bit when I first meet people.

It looks like you actually deleted the comment with the date, so for the record and people seeing this trying to track down a bug, it was April 25, 2013 I think...? Also, I'd still like to know if there's a reason! What I really thought was funny was that there would be one specific day prior to, and after, everything would work fine, but just not that day. So please if anyone knows, do end the mystery!

OK, we are synced up now I feel. Thank you Cess for being understanding and
thank you Benjamin for the same!

I do feel that date must be haunted! And, no worries Benjamin, i love your
sense of humor!

And I am a great admirer of yours, too!

Well, not great news.

I got the "That page does not exist" for a new date set: 01-01-2012 - 01-01-2013.

Oh, also, the users.csv download for 01-01-2013 until 25-04-2013 is returning both incorrect data and incomplete data. It returns 154 users and those users are random from UID 1 to 58354.

Hey all, let me know if there is anything that I can do to try and diagnose the situation beyond simply telling you what dates aren't working. I just figured since I'm using the interface, I might as well be of some use!

Hi, returning to this conversation since I'm trying to plan a bit for the summer. Can I be of help on this? And if so, what would be the most systematic way to figure out which dates are causing the problem? And also, how can I avoid taking down the whole site? Do people use unstable?

No, unstable and stable are both fine to hammer on as much as you'd like!

As to how to debug, i'm not sure... we could try to pull logs for when it happens. Wait - let me link some Sentry issues and see if they shine any light?

Sentry issue: PLOTS2-6H

@jywarren, not sure if I did this right, but I just signed up for Sentry and requested access. I wasn't sure what permissions to request so I just requested the default ones that sentry provided for me. Let me know if I need to change anything.

(also, thanks!)

Hmm, weird that you can't see the error log, it's supposed to be public! But, ok here it is:

ActionController::UnknownFormat: ActionController::UnknownFormat
  from action_controller/metal/mime_responds.rb:205:in `respond_to'
  from app/controllers/stats_controller.rb:141:in `format'
  from app/controllers/stats_controller.rb:110:in `tags'
  from action_controller/metal/basic_implicit_render.rb:6:in `send_action'

But i'm not convinced this is for the same page: It says this is for:

https://publiclab.org/stats/tags

Just accepted the invite, this is awesome, thank you!

Well, I've spent a fair amount of time with Sentry and Skylight and I can't seem to find an error that is registered when I recreate the error on the site with the dates in question. I even turned on a VPN so that I could be confident that I would see the correct IP address but nothing appears in the events log in Sentry. Something might be happening in Skylight but I can't figure out how to see the timestamp of the errors. Thoughts?

I have also encountered this error when attempting to guesstimate the start date for our website so that I could view data "For All Time".

I'm wondering if it's really a bug or perhaps maybe it could be worked around by people not having to guess when we started logging data for our website?

If it's the latter, here's an idea:
On the "Choose a Start Date" interface, could we add the exact start date of our website as a "quick pick" OR perhaps make an option to choose "for all time" as the period of inquiry?

Got some more info from @ebarry, but @ebarry, it would be great to have the dates that you tried. My interpretation from our conversation is that you want to see the stats (at least in the two sentences) for all time. There are two known issues (this one, and #5524), which might be the problem but depending on the specifics of your issue, it could be a new third one.

I've requested dates outside Public Lab's existence and I don't think that's the issue. I believe that if you choose a date for which there is no data, it should simply return a set of data that won't include those dates because there's no data to find. It shouldn't be any different than asking for a specific week in 2017 during which there happens to be no data.

Two things which might be causing it:

1) This issue. There are a few dates the inclusion of which will return a 404 (the issue on this page). One we're aware of is 4/25/13.

2) #5524: There is a period of time which is so large, that requesting it will literally break the site. So, you're actually lucky that you're receiving a "page does not exist".

 The [working theory](https://github.com/publiclab/plots2/issues/5524#issuecomment-484611290) is that this period contains an inordinate number of spam users.  There's an idea for a temporary fix in #5450 in which we email users that have a high probability of being spam and ask that they confirm they are a human (click a link or something).

 Actually, it would be great if you (or anyone) could chime in on that conversation and in particular, the risk of negatively impacting an actual user receiving such an email (loosely, would they be offended such that they might stop contributing).

Oh, I should add, even if it did work, the user spam issue would make any user figures wildly incorrect by at least ~350,000.

Sentry issue: PLOTS2-B2

Adding data here; i opened (which did show a 404 error): https://publiclab.org/stats?start=01-01-2012&end=25-04-2013

I pulled the logs:

[6abc8157-f63e-41a9-a791-b6ec924ab0eb] Completed 404 Not Found in 4579ms (ActiveRecord: 2178.1ms)
[6abc8157-f63e-41a9-a791-b6ec924ab0eb]   
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=11220):
[6abc8157-f63e-41a9-a791-b6ec924ab0eb]   
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `block in nodes_frequency'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `map'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `nodes_frequency'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:36:in `block in range'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:19:in `range'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:41:in `index'

For https://publiclab.org/stats?start=01-01-2012&end=01-01-2013&commit=Go, which also showed a 404, I got:

[6abc8157-f63e-41a9-a791-b6ec924ab0eb] Completed 404 Not Found in 4579ms (ActiveRecord: 2178.1ms)
[6abc8157-f63e-41a9-a791-b6ec924ab0eb]   
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=11220):
[6abc8157-f63e-41a9-a791-b6ec924ab0eb]   
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `block in nodes_frequency'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `map'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/models/tag.rb:65:in `nodes_frequency'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:36:in `block in range'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:19:in `range'
[6abc8157-f63e-41a9-a791-b6ec924ab0eb] app/controllers/stats_controller.rb:41:in `index'

I also managed to get the Sentry issue above, for https://publiclab.org/stats?start=01-01-2012&end=04-25-2013

It looks like we have a bad tag record; Couldn't find Tag with 'tid'=11220

irb(main):003:0> Tag.find 11219
=> #<Tag tid: 11219, vid: 3, name: "wc", description: "", weight: 0, count: 0, parent: nil>
irb(main):004:0> Tag.find 11221
=> #<Tag tid: 11221, vid: 3, name: "lon:false", description: "", weight: 0, count: 2, parent: nil>

And yet we have a NodeTag pointing at it here:

irb(main):005:0> NodeTag.find_by(tid: 11220)
=> #<NodeTag tid: 11220, nid: 2120, uid: 12, date: 1402381879, created_at: "2014-06-10 06:31:19", updated_at: "2019-06-15 18:00:53">

From this node: http://publiclab.org/n/2120

I'm going to try deleting the NodeTag.

Now, we are DD-MM-YYYY, so this should work: https://publiclab.org/stats?start=01-01-2012&end=25-04-2013

But it doesn't. Something similar perhaps:

[0b0b480c-084e-45a5-9805-ef990116fd68] Completed 404 Not Found in 4838ms (ActiveRecord: 1788.7ms)
[0b0b480c-084e-45a5-9805-ef990116fd68]   
[0b0b480c-084e-45a5-9805-ef990116fd68] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3015):
[0b0b480c-084e-45a5-9805-ef990116fd68]   
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `block in nodes_frequency'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `map'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/models/tag.rb:65:in `nodes_frequency'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:36:in `block in range'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:19:in `range'
[0b0b480c-084e-45a5-9805-ef990116fd68] app/controllers/stats_controller.rb:41:in `index'

I think these range stats queries are just good at finding lonesome db records, because they try to gather ALL such records across huge swaths.
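
To illustrate the point above, here is a minimal sketch in plain Ruby, using in-memory stand-ins rather than the real ActiveRecord models, of why a single orphaned NodeTag can fail an entire range: a strict lookup raises on the first missing tid, while a tolerant lookup skips it. The `TagRow` and `NodeTagRow` structs, the tag names, and both helper methods are hypothetical, and this is not the actual `nodes_frequency` implementation.

```ruby
# Hypothetical stand-ins for rows of the tag and node-tag tables.
TagRow = Struct.new(:tid, :name)
NodeTagRow = Struct.new(:tid, :nid)

# Tags keyed by tid; note there is no tag with tid 3015.
TAGS = {
  3014 => TagRow.new(3014, 'balloon'),
  3017 => TagRow.new(3017, 'kite')
}.freeze

# Node-tag rows in the requested range; the second one is orphaned.
NODE_TAGS = [
  NodeTagRow.new(3014, 1),
  NodeTagRow.new(3015, 2),
  NodeTagRow.new(3017, 3),
  NodeTagRow.new(3014, 4)
].freeze

# Strict: Hash#fetch raises KeyError on a missing tid, loosely analogous to
# a record lookup that raises when the row does not exist.
def strict_frequency(node_tags, tags)
  node_tags.each_with_object(Hash.new(0)) do |node_tag, counts|
    counts[tags.fetch(node_tag.tid).name] += 1
  end
end

# Tolerant: skip node-tag rows whose tag is missing and count the rest.
def tolerant_frequency(node_tags, tags)
  node_tags.each_with_object(Hash.new(0)) do |node_tag, counts|
    tag = tags[node_tag.tid]
    counts[tag.name] += 1 if tag
  end
end
```

The wider the date range, the more rows the strict version touches, so the more likely it is to hit one orphan. Whether skipping orphans silently is acceptable for stats is a separate design question; deleting the bad rows, as done in this thread, keeps the counts honest.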

I'll find and kill that NodeTag record too...

OK, deleted lonesome NodeTag records pointing at non-existent tids of 3015 and 3016, and https://publiclab.org/stats?start=01-01-2012&end=25-04-2013 now works. Is that it, then? 🕵️‍♀️ 🕵

Should be for this issue...PS: where in Sentry can you see the 404's? I thought they did not show them?

oh, and thanks @jywarren! This has been like a splinter in my data finger. Thanks for tweez-ing it out!

Oh, i think the Sentry was for a different error - i guess for the "date
out of range" error on Date.parse given that i'd reversed 25-04-2013 to
04-25-2013 -- so trying for the 25th month.
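
As a small sketch of the format mix-up described above, the following uses Ruby's standard `Date.strptime` with an explicit DD-MM-YYYY pattern, so a reversed string is rejected rather than guessed at. The helper name `parse_stats_date` is hypothetical and is not taken from the plots2 code.

```ruby
require 'date'

# Hypothetical helper: parse a stats-page date in DD-MM-YYYY format.
# Returns a Date, or nil when the string is not a valid date in that format.
def parse_stats_date(text)
  Date.strptime(text, '%d-%m-%Y')
rescue ArgumentError, TypeError
  # Date::Error (Ruby 2.7 and later) is a subclass of ArgumentError.
  nil
end

parse_stats_date('25-04-2013') # a Date for 25 April 2013
parse_stats_date('04-25-2013') # nil, because there is no 25th month
```

Returning nil lets a caller show a friendly "invalid date" message instead of letting the exception surface as a server error.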

Ah, I see, the 404's are still logged somewhere though. Good to know!

Thanks to you and Cess for chipping away at this so assiduously! It
wouldn't have been nearly as solvable without the thorough documentation
and investigation.

https://stable.publiclab.org/stats?start=20-07-2010&end=20-10-2020 is still returning a 404 Page does not exist error, so i will leave this open for reference! Thanks!

And noting @skilfullycurled's note on a "workaround date range" in https://github.com/publiclab/plots2/issues/5904#issuecomment-502881778 -

this in #6050 means that the "all time" option will be only going back to Jan 1, 2014.

Next steps summarized by @skilfullycurled here:

https://github.com/publiclab/plots2/pull/6050#issuecomment-519218641

OK, just an update in looking for NodeTag records with no associated Tag record as @skilfullycurled mentioned in #6050, I found these tids to look at:

irb(main):010:0> NodeTag.where('date > 1366851600 AND date < 1366990345').size
=> 70
irb(main):011:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid)
=> [1, 14, 125, 446, 578, 578, 579, 579, 1316, 2421, 3049, 3049, 3049, 3049, 3049, 3049, 3050, 3050, 3050, 3050, 3050, 3050, 3051, 3051, 3051, 3051, 3051, 3051, 3051, 3051, 3052, 3053, 3054, 3055, 3057, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3097, 3098, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114]
irb(main):012:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).size
=> 70
irb(main):013:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).uniq.size
=> 48
irb(main):014:0> NodeTag.where('date > 1366851600 AND date < 1366990345').collect(&:tid).uniq
=> [1, 14, 125, 446, 578, 579, 1316, 2421, 3049, 3050, 3051, 3052, 3053, 3054, 3055, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114]

indeed:

ActiveRecord::RecordNotFound (Couldn't find all Tags with 'tid': (1, 14, 125, 446, 578, 579, 1316, 2421, 3049, 3050, 3051, 3052, 3053, 3054, 3055, 3057, 3058, 3059, 3060, 3061, 3062, 3063, 3064, 3070, 3071, 3072, 3082, 3088, 3089, 3091, 3092, 3093, 3094, 3095, 3097, 3098, 3099, 3100, 3101, 3102, 3103, 3104, 3105, 3106, 3109, 3111, 3112, 3114) (found 46 results, but was looking for 48).)

I'll try figuring out which is missing. OK - only these two:

ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3088)
ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3089)

Deleting them.
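
For reference, the `date` bounds in the console queries above are Unix timestamps. A short sketch of converting them to readable UTC times, and of going the other way from a calendar day, follows; the constant and method names are hypothetical.

```ruby
# Bounds copied from the console queries above.
RANGE_START = 1_366_851_600
RANGE_END   = 1_366_990_345

# Format a Unix timestamp as a UTC date-time string.
def epoch_to_utc(seconds)
  Time.at(seconds).utc.strftime('%Y-%m-%d %H:%M:%S UTC')
end

# Convert a UTC calendar day to the Unix timestamp of its midnight.
def utc_midnight_epoch(year, month, day)
  Time.utc(year, month, day).to_i
end

puts epoch_to_utc(RANGE_START) # 2013-04-25 01:00:00 UTC
puts epoch_to_utc(RANGE_END)   # 2013-04-26 15:32:25 UTC
```

So the window queried here brackets the April 25, 2013 date mentioned earlier in the thread.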

https://stable.publiclab.org/stats?start=20-07-2010&end=20-10-2020 still shows a 404, so i'll keep looking for extra NodeTag records with no associated Tag record.

Hmm, i tried a bunch of ways like https://stackoverflow.com/questions/5319400/want-to-find-records-with-no-associated-records-in-rails, but didn't have much success --

NodeTag.includes(:tag).where(term_data: {name: nil})
=> #<ActiveRecord::Relation []>

I also just tried collecting ALL NodeTag tids and subtracting all valid Tag tids, which took a while:

nodetag_tids = NodeTag.select(&:tid).collect(&:tid).uniq
tag_tids = Tag.select(&:tid).collect(&:tid).uniq
tids_missing = nodetag_tids - tag_tids
=> []

So, doesn't that mean all NodeTags have valid tids?
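
The subtraction idea above can be sketched in plain Ruby as a set difference. This is only an in-memory illustration with sample ids; the helper name `orphaned_tids` is hypothetical. In a Rails console the two input lists could presumably come from something like `NodeTag.distinct.pluck(:tid)` and `Tag.pluck(:tid)`, which avoids loading full records, but that is an assumption about how one might query these models, not what was run here.

```ruby
require 'set'

# Hypothetical helper: given the tids referenced by node-tag rows and the
# tids that exist in the tag table, return the referenced tids with no tag.
def orphaned_tids(referenced_tids, existing_tids)
  existing = existing_tids.to_set
  referenced_tids.uniq.reject { |tid| existing.include?(tid) }.sort
end

referenced = [3082, 3088, 3088, 3089, 3091] # illustrative sample only
existing   = [3082, 3091, 3092]             # illustrative sample only
orphaned_tids(referenced, existing) # => [3088, 3089]
```

Note that a check like this only covers the database it runs against, so an empty result says nothing about a different server's data.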

Maybe we should look in Sentry for another kind of error now?

https://stable.publiclab.org/stats?start=20-01-2013&end=20-01-2014 shows 404
https://stable.publiclab.org/stats?start=20-01-2014&end=20-01-2015 loads fine

I searched the logs... didn't find much. Maybe this?

[ba658853-5c5b-4514-b69b-bbde1cf920c5] Processing by StatsController#index as */*
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   Parameters: {"utf8"=>"✓", "options"=>"Week"}
[ba658853-5c5b-4514-b69b-bbde1cf920c5] Completed 500 Internal Server Error in 5ms (ActiveRecord: 0.0ms)
[ba658853-5c5b-4514-b69b-bbde1cf920c5] Sending event 79acc5f697a249f7aaddacfa69966bcf to Sentry
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   
[ba658853-5c5b-4514-b69b-bbde1cf920c5] NoMethodError (undefined method `downcase' for nil:NilClass):
[ba658853-5c5b-4514-b69b-bbde1cf920c5]   
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:150:in `to_keyword'
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:20:in `range'
[ba658853-5c5b-4514-b69b-bbde1cf920c5] app/controllers/stats_controller.rb:48:in `index'

But that was on Oct 14th, six days ago. I can't seem to find the errors for today... strange.

Ah! was looking in the wrong directory! Got it!

[2c6a71d3-e104-44b7-98ab-44263e879b22] Processing by StatsController#index as HTML
[2c6a71d3-e104-44b7-98ab-44263e879b22]   Parameters: {"start"=>"20-01-2013", "end"=>"20-01-2014"}
[2c6a71d3-e104-44b7-98ab-44263e879b22] Completed 404 Not Found in 11475ms (ActiveRecord: 9149.4ms)
[2c6a71d3-e104-44b7-98ab-44263e879b22]   
[2c6a71d3-e104-44b7-98ab-44263e879b22] ActiveRecord::RecordNotFound (Couldn't find Tag with 'tid'=3088):
[2c6a71d3-e104-44b7-98ab-44263e879b22]   
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `block in nodes_frequency'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `map'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/models/tag.rb:55:in `nodes_frequency'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:40:in `block in range'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:27:in `range'
[2c6a71d3-e104-44b7-98ab-44263e879b22] app/controllers/stats_controller.rb:55:in `index'

Yes, https://stable.publiclab.org/stats?start=20-01-2010&end=20-01-2020 now works too. Thanks, all - and should we make "all time" now really go back to 2010, @cesswairimu ?

https://github.com/publiclab/plots2/pull/6050 - maybe we could modify this as an FTO and close this issue now?

This is great :tada: yeah that will be awesome. Creating an fto. Thanks Jeff.

I've created an fto https://github.com/publiclab/plots2/issues/8652.

I also think it would be much faster to do .size on all data when getting 'all time' stats instead of using the where range clause _(where timestamp between 2010..NOW())_. I will do a follow-up for that after the fto is done. Thanks everyone, closing this.

Awesome, thanks Cess!
