Mastodon: Backfill statuses from remote accounts when first subscribed

Created on 10 Sep 2016 · 37 comments · Source: tootsuite/mastodon

suggestion

All 37 comments

I'd like to take a crack at this, but am still trying to wrap my head around the world of services/workers and which things are triggered when.

My guess here is that somewhere in the collection of remote follow services and workers we want to add a background worker task which pulls the feed of the remote account which was just followed, and import the last 5 or 10 or so items.

The other thing I'm struggling to locate is any code which is effectively "go grab this account's atom feed and import some entries" ... it strikes me as possible that this doesn't exist, because all statuses are pushed onto the server and never (yet) pulled?

Any pointers here are appreciated!
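
For the "go grab this account's Atom feed and import some entries" step, here is a minimal sketch that assumes the feed XML has already been downloaded. It keeps the newest `limit` entries. The function name `latest_entries` and the returned dict shape are hypothetical, and turning the entries into stored statuses is left out.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def _parse_time(value: str) -> datetime:
    # Atom timestamps are RFC 3339; older Pythons' fromisoformat rejects "Z".
    # Assumes every entry carries an explicit offset so they compare cleanly.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_entries(feed_xml: str, limit: int = 10) -> list[dict]:
    """Return up to `limit` entries from an Atom feed, newest first."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        published = entry.findtext("atom:published", namespaces=ATOM)
        if not published:
            continue  # skip entries we cannot place in time
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "published": _parse_time(published),
            "content": entry.findtext("atom:content", default="", namespaces=ATOM),
        })
    # Newest first, then keep only the most recent `limit` entries.
    entries.sort(key=lambda e: e["published"], reverse=True)
    return entries[:limit]
```

A background worker triggered after a remote follow could call something like this with a small `limit` (5 or 10, as suggested above).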

@mjankowski The main roadblock to this is actually #1059: if you went back and imported 5 statuses from an account that last posted 6 months ago, you would get five 6-month-old posts at the top of the public timeline (and potentially of home timelines as well).

So this should probably stay a wontfix until something is done about IDs and their sorting.
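
The ordering problem can be illustrated with IDs. With auto-increment IDs, a post backfilled today gets a larger ID than everything already stored, so sorting by ID puts it on top. One approach (an assumption here, not something this thread settles on) is a Snowflake-style ID derived from the status creation time, so that sorting by ID matches sorting by time. The function `time_based_id` and its bit layout are hypothetical.

```python
import itertools
from datetime import datetime, timezone

_counter = itertools.count()


def time_based_id(created_at: datetime) -> int:
    """Hypothetical Snowflake-style ID: milliseconds since the Unix epoch in
    the high bits, a 16-bit counter in the low bits to break ties."""
    millis = int(created_at.timestamp() * 1000)
    return (millis << 16) | (next(_counter) & 0xFFFF)


# A recent post is stored first...
recent = time_based_id(datetime(2018, 8, 1, tzinfo=timezone.utc))
# ...then a six-month-old post is backfilled afterwards.
backfilled = time_based_id(datetime(2018, 2, 1, tzinfo=timezone.utc))

# With auto-increment IDs the backfilled post would get the larger ID and
# sort first; with time-based IDs it sorts by creation time instead.
timeline = sorted([recent, backfilled], reverse=True)
```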

Thanks, I read that thread and looked at more code, and I agree that backfilling is blocked on improving the ordering approach to either get away from ID-sorting, or make ID-sorting reliable for backfilled records. I'll pause on this, and leave a comment over there.

#3307 is about accurate follower counts, but in the comments I discuss some implementation ideas that could, in theory, cover this.

Any updates on this?

It's now possible to implement this, but it's not clear how many items to fetch (and if you say "all", think about accounts with 100,000 statuses...)

Mmh, maybe it could be done by displaying a button where the missing toots are, to fetch the next 5 or 10 toots? Users would then fill in the history manually, a little at a time.

What about just fetching blocks of 20 toots, triggered by scrollbar position?

@deutrino Generally this seems good, but some toots might already have been fetched manually by users, so there would be holes to fill here and there.
For example, if a profile shows no toots from one instance, and you have the URLs of some of that user's toots and paste them into the search bar, those toots will be fetched and added to the user's profile (from your instance's point of view).

@deutrino OK, thinking back on what you suggested, it seems like the right thing to do, considering that Mastodon already serves 20 toots at a time when viewing a profile. So everything can probably be done smoothly on the server side, with a field saying "I know I'm up to date down to this toot/date" and contacting the other instance when we try to get toots in the "probably not up to date" range.
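
The watermark idea can be sketched as a pure function over creation times. Assumptions: `synced_back_to` is a hypothetical per-account field meaning "no holes at or after this date", and `profile_page` is a hypothetical helper, not existing Mastodon code.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

PAGE_SIZE = 20  # profiles are served 20 toots at a time


def profile_page(known: List[datetime], before: datetime,
                 synced_back_to: Optional[datetime],
                 page_size: int = PAGE_SIZE) -> Tuple[List[datetime], bool]:
    """Return the next page of locally known toots older than `before`,
    plus whether the remote instance should be asked to fill it.

    `synced_back_to` is the hypothetical "I know I'm up to date down to this
    date" watermark: at or after it, our local copy has no holes.
    """
    page = sorted((t for t in known if t < before), reverse=True)[:page_size]
    if synced_back_to is None:
        return page, True  # nothing is trusted yet
    trusted = [t for t in page if t >= synced_back_to]
    # A full page inside the trusted range can be served as-is; anything else
    # reaches into the "probably not up to date" range.
    return page, len(trusted) < page_size
```

Toots fetched manually through the search bar that are older than the watermark do not count as trusted, so the holes mentioned above would still trigger a remote fetch. An account whose entire history is already synced would need an extra "fully synced" flag, which this sketch leaves out.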

Maybe fixing this would also partly fix https://github.com/tootsuite/mastodon/issues/6137, as looking up a deleted account's toots would hint to the server that it should remove the account from its local database?

Now that #7459 has been merged, I guess it seems simpler to finally implement this. @ThibG ?

Mmh, thinking again, maybe not. Sorry. They seemed closely related in my head, but it seems I'm too tired today :/

Any chance of this issue getting fixed, or partially fixed, by the next release (2.5)?

@kit-ty-kate with some logic similar to #7459 we could probably fetch private toots from a remote user, yeah.
Fetching “up to the last N toots” on first follow is easy enough, but:

  • it would not be applied to users already followed before the update
  • it would not be applied to users not followed
  • “up to the last N toots” is pretty arbitrary

The suggestion of having a “gap” that users could click sounds very nice, but it is a lot harder to implement. Indeed, the protocol mandates that toots be strictly ordered, but I don't think there is a mechanism to make sure you're not missing toots, nor to request a certain range of items, so efficiently filling a gap seems pretty hard. Add to that the fact that some items may only be displayed to some authenticated users, and things can get quite complicated…
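
The "fetch up to the last N toots on first follow" option is simple to express, which also makes its first two limitations visible: an account that already had local followers before such code existed, or that nobody follows, never reaches the enqueue step. `FollowTracker`, `backfill_jobs` and the value of `BACKFILL_LIMIT` are hypothetical placeholders for the real follow bookkeeping and job queue.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

BACKFILL_LIMIT = 20  # the arbitrary "N"; the value is a placeholder


class FollowTracker:
    """Hypothetical in-memory stand-in for the follow bookkeeping."""

    def __init__(self) -> None:
        self.local_followers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.backfill_jobs: List[Tuple[str, int]] = []

    def follow(self, local_user: str, remote_account: str) -> None:
        followers = self.local_followers[remote_account]
        is_first_local_follower = not followers
        followers.add(local_user)
        if is_first_local_follower:
            # Only the first local follow triggers a fetch of up to N toots.
            self.backfill_jobs.append((remote_account, BACKFILL_LIMIT))
```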

I think a good partial solution to the problem of empty or stale profiles of remote users would be to prefetch n last toots whenever there is _any_ interaction between a local user and a toot from remote user, and when particular criteria are met.

For example: if our local user favs, boosts, or even _reads_ a toot from a remote user, we check if there are any toots from that remote user in the db which are less than (for example) 1 week old; if not, we dispatch a job to fetch the last n (and maybe touch a timestamp somewhere that then prevents this type of fetch for a few hours, to prevent thundering herd problems if a toot from an inactive account suddenly becomes popular).

This is a very simple "heat" heuristic. If a local user interacts with a remote user in any way - faving, reading a boost, etc - they are more likely to open that user's profile than that of a random user. With good criteria for when to prefetch, and with the implementation of database compaction in #1554, I think this would be a considerable usability improvement, particularly for small instances / those not using relays.
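
The "heat" heuristic above can be sketched as a small policy object. The one-week freshness window comes from the example in the comment; the six-hour cooldown is an assumed stand-in for "a few hours". `PrefetchPolicy` and the `enqueue` callback (which would push a "fetch last n toots" job) are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable

RECENT_WINDOW = timedelta(weeks=1)  # "less than (for example) 1 week old"
COOLDOWN = timedelta(hours=6)       # assumed value for "a few hours"


class PrefetchPolicy:
    def __init__(self, enqueue: Callable[[str], None]) -> None:
        self.enqueue = enqueue  # e.g. pushes a "fetch last n toots" job
        self.last_prefetch: Dict[str, datetime] = {}

    def on_interaction(self, account: str, known_times: Iterable[datetime],
                       now: datetime) -> bool:
        """Call when a local user favs, boosts or reads a remote toot.
        Returns True if a prefetch job was enqueued."""
        if any(now - t < RECENT_WINDOW for t in known_times):
            return False  # we already have fresh toots from this account
        last = self.last_prefetch.get(account)
        if last is not None and now - last < COOLDOWN:
            return False  # throttle: avoid a thundering herd on popular toots
        self.last_prefetch[account] = now
        self.enqueue(account)
        return True
```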

I'm not sure if this deserves a separate issue, but if you follow a locked account and you're the first person to do so on your server, then you can't fetch old followers-only posts. I followed a locked account that had a pinned public post with followers-only replies, where the public post says only "please read:" and the followers-only replies... were not loaded, and thus unable to be read.

I assume this is because the older statuses weren't backfilled, but I'm not sure if there's a different or better way for your server to discover followers-only replies besides backfilling.

FTR, I think you can fetch followers-only posts using the search bar now.

@nightpool you'd have to know the URL of every single post though, wouldn't you?

Yes, you would need to know the URLs, which you typically don't know when those are replies to a toot you can see.
It's a bit of an issue, as those replies are likely to be old posts, and thus not fetched by the “fetch last n toots” proposal.

Ah, my mistake, I thought you meant that the pinned post had a list of links to private posts, not just replies.

Yeah, I think that's grounds to maybe reconsider a pure numerical approach. This might also overlap with other issues about loading missing toots in a chain due to privacy settings.

Anyone else interested in working on this? Or any advice on where to get started? Would really like to get this feature in (as mentioned in #9525).

If this issue is too complex to solve or we cannot agree on how to make this work, what about a button or text message telling users to look at the full profile?

It's just my opinion, but discovery is always an issue in social networks, so I find this issue paramount to growing the decentralized, federated world. New people I invite to my Mastodon instance have no idea that my server has never connected to another, and that this is why the "toots" section of the profile is empty. A simple message in the toots section of a profile, such as "There are no toots here; try going to the user's page by clicking on their icon above", might help. Something more elegant would be better; I'm just jotting this idea down before I lose it.

You mean like this? blob:https://imgur.com/0499c120-7418-4ef1-80e8-93fb4d25232e

This is probably my biggest gripe with the platform, and I think it would be a roadblock in trying to get my friends to switch over. It may be hard to implement but it would be VERY useful.

I think the suggestion above (prefetching the last n toots whenever a local user interacts with a remote user's toot and particular criteria are met) is right!
Downloading all toots (as with a relay) or no toots (without a relay) are not very good solutions, as mentioned above.
So why not implement a "cache on request, delete after some time" feature? I mean that users can specify how many toots, or for how long, they want to cache other users' toots. When a user wants to look at older toots, the system downloads and caches them for that user. After some time, if those toots are not requested again, the server can delete them using a scheduled task.

A fat +1 for the need for this. I tried to find out what was wrong with my instance just because of this... Although I understand Mastodon more and more on a technical level, normal users won't, and they won't understand why they see no information when they search for and look at another user's profile.

The server should grab the needed info on demand from the remote instance hosting that user.

A tootctl job could delete them if they are not referenced or needed after a certain amount of time.
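
The "cache on request, delete after some time" idea, combined with keeping toots that are still referenced, can be sketched as follows. `RemoteStatusCache` is hypothetical; treating a local boost or reply as a "reference" is an assumption; and the periodic scheduling (a scheduled task or a tootctl job, per the comments above) is not shown and does not correspond to any existing tootctl command.

```python
from datetime import datetime, timedelta
from typing import AbstractSet, Dict, List


class RemoteStatusCache:
    """Hypothetical bookkeeping for remote toots fetched on demand."""

    def __init__(self, ttl: timedelta) -> None:
        self.ttl = ttl
        self.last_requested: Dict[str, datetime] = {}  # status URI -> time

    def touch(self, uri: str, now: datetime) -> None:
        # Record that a local user just requested (and we cached) this toot.
        self.last_requested[uri] = now

    def prune(self, now: datetime,
              referenced: AbstractSet[str] = frozenset()) -> List[str]:
        """Drop toots not requested within `ttl` and not referenced locally
        (assumed: e.g. by a local boost or reply); return dropped URIs."""
        expired = [uri for uri, t in self.last_requested.items()
                   if now - t > self.ttl and uri not in referenced]
        for uri in expired:
            del self.last_requested[uri]
        return expired
```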

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This is an important issue in my opinion and it shouldn't be closed

(@kit-ty-kate stalebot isn't going to close any issues, the comments / labels were a misconfiguration)

It seems @gammaPi has referenced and described a good high-level outline above of what would be wanted for this feature.

One question:
If I am understanding this issue and the last word by @dansup on https://github.com/pixelfed/pixelfed/issues/1745 properly, is backfilling toots from remote accounts still desired?

We do not plan to add backfilling on follow because we are disabling outboxes for privacy reasons.

dansup's statement on outboxes is his own and doesn't affect Mastodon.

As far as I know, we plan on continuing supporting outboxes (even though I made a proposal to never include private toots in them, see #13584, which would prevent backfilling private toots when first following a remote user).
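
To make the consequence of that proposal concrete, here is a sketch of the rule as described. It is not the code from #13584. It uses Mastodon's visibility levels (`public`, `unlisted`, `private` for followers-only, `direct`); the dict shape and the function name `outbox_items` are hypothetical. A server that only backfills from the outbox would then never see followers-only toots.

```python
from typing import Dict, List

# Under the proposal as described, only these visibilities appear in the
# outbox; "private" (followers-only) and "direct" toots are left out.
OUTBOX_VISIBILITIES = {"public", "unlisted"}


def outbox_items(statuses: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Statuses a remote server could see (and so backfill) via the outbox."""
    return [s for s in statuses if s["visibility"] in OUTBOX_VISIBILITIES]
```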

As for backfilling statuses, I do think that would be useful (though maybe surprising or unwanted for private toots? This would be consistent with what we do for already-known toots, though), but when to do it isn't obvious, and we will never be able to backfill all toots, so I'm afraid we'd only create more confusion if we had “batches” of fresh toots with missing ones between them… That being said, it would probably not be much worse than the current behavior…

Gotcha. Thanks for the info @ThibG! It appears I was conflating statuses and toots.

Please correct me if I'm mistaken, but based on your reply it isn't clear to me that this is a wanted ticket ready to be worked.

Uh, “statuses” are just what toots are called internally, I used both terms interchangeably, sorry for the confusion.

Gotcha, my mistake.

Related issues

ghost · 3 comments

sorin-davidoi · 3 comments

cumbiame · 3 comments

phryk · 3 comments

golbette · 3 comments