Joplin: orphaned resources don't get deleted

Created on 30 Oct 2018 · 50 Comments · Source: laurent22/joplin

Operating system

  • macOS
  • Android
  • iOS

Application

  • Desktop
  • Mobile
  • Terminal

For some reason orphaned resources are not deleted. I'm currently using the latest versions on Android and iOS. On macOS I'm using 1.0.111, although I also tried 1.0.114 (until I saw that internal anchors don't work anymore - I'll open another issue for that one).

Here's an export of the note_resources table. As you can see quite a few of them don't have an associated page, yet they still exist in my .resources directory.

note_resources.csv.txt

Please let me know, if you need anything else. (There's nothing suspicious in the debug.log)

Labels: bug, high

All 50 comments

I can confirm the same behaviour on macOS. The sync report shows the same number of stored resources as 2 weeks ago, although the related notes are long gone. Thanks for having a second look at it.

Any update on this?

@laurent22 can you confirm that this is a bug?

Don't know yet, I flagged it so that I remember to check.

The last_seen_time = 0 indicates that they've never been associated with a note (or perhaps were only shortly associated with one). If you can look them up in the resources folder, do you remember if there's something special about these resources?

Nope, nothing special. Some of them were downloaded by the web clipper service, like avatars for comments. But when I deleted the comments, so were the references to those avatar images.
Some others are test files, but were apparently never deleted after 24h.

Wouldn't it be easy to delete all unassociated resources when the app starts? Or to add this as an extra step if the cleanup task runs in a timed loop?

@laurent22 I know how to delete the data from the database and also from the local resources directory. I can also remove the orphaned files (metadata + actual resources) on the sync target.
This process is manual and rather tedious. (I can send the SQL as well.)
Is there a way to mark these files to be deleted (in the code), so that they are removed from the sync target and locally (database and folder)?

If you add the ID to deleted_items it should remove them from the sync target too.

Thanks for the info, but what are the correct steps then?

Only copy the ids to the deleted_items table, or do I have to delete them from the tables resources and note_resources manually as well? What about the files in the resources folder? What is the order of the steps?

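cd ~/.config/joplin-desktop
# Only resources that no note uses any more (a resource can be linked from several notes)
ORPHANS="select distinct resource_id from note_resources where is_associated = 0 and resource_id not in (select resource_id from note_resources where is_associated = 1)"
for i in $(sqlite3 database.sqlite "$ORPHANS")
do
    rm -f "resources/$i"*   # quoted id; the glob matches any file extension
done
sqlite3 database.sqlite "delete from resources where id in ($ORPHANS)"
sqlite3 database.sqlite "delete from note_resources where resource_id in ($ORPHANS)"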

I don't really know off the top of my head, and I don't really want to support such a method. Editing the db directly could cause all kinds of unexpected problems. Maybe for you it's fine, but other users might see this, try to do the same, make a mistake and a week later realise some data is missing, etc. and wonder why. So if it's not supported by the app, the solution is to fix the app rather than editing the db.

No, I think you misunderstood. As soon as I know the steps, I'll add that process to Joplin so that orphaned resources get deleted properly.

Ah I see, but there's already a process to delete orphaned resources, so I guess it just needs to be fixed. Simply deleting resources that are not associated with any note doesn't work because those resources might have come via sync, and their associated note has not been downloaded yet. There's a mechanism to handle all this, and I guess it needs to be fixed once we understand the cause for your bug.
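Here is a minimal sketch of the kind of safeguard described here. It assumes a hypothetical `unreferenced_since` timestamp per resource, recorded when a client first notices a resource that no local note references (for example one that arrived via sync before its note). This is not Joplin's actual cleanup code. It only deletes a resource once it has stayed unreferenced for longer than a grace period (the 10-day value is arbitrary), and it never deletes a resource whose timestamp is unknown.

```python
from typing import Dict, Set

DAY_MS = 24 * 60 * 60 * 1000
GRACE_MS = 10 * DAY_MS  # hypothetical grace period, not a Joplin setting


def resources_to_delete(
    local_resource_ids: Set[str],
    referenced_ids: Set[str],
    unreferenced_since: Dict[str, int],
    now_ms: int,
    grace_ms: int = GRACE_MS,
) -> Set[str]:
    """Return IDs of resources that no local note references and that have
    stayed unreferenced for longer than grace_ms.

    unreferenced_since[rid] is a hypothetical timestamp (ms) recorded when the
    client first saw the resource without any referencing note.
    """
    result = set()
    for rid in local_resource_ids - referenced_ids:
        since = unreferenced_since.get(rid)
        if since is None:
            continue  # no timestamp yet: keep it, its note may still be syncing
        if now_ms - since > grace_ms:
            result.add(rid)
    return result


now = 100 * DAY_MS
print(resources_to_delete(
    {"a", "b", "c", "d"},
    referenced_ids={"a"},
    unreferenced_since={"b": now - 1 * DAY_MS, "c": now - 30 * DAY_MS},
    now_ms=now,
))  # {'c'}
```

A grace period only narrows the window. As pointed out later in this thread, a client that stays offline for months can still reference a resource that every other client has dropped.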

Unfortunately it doesn't only happen to me. I'd be OK with doing it manually, but it has come up several times on GitHub and the forum, so quite a few others have experienced the same problem.

Simply deleting resources that are not associated with any note doesn't work because those resources might have come via sync, and their associated note has not been downloaded yet

Ah, tricky. Ok, now I understand why I can't just delete them. :-)

I wrote a script that retrieves a list of orphaned resources from the db, but uses the API to delete the resources, so the db is not changed directly.
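For reference, here is a sketch of deleting resources through the Data API (the Web Clipper service) instead of writing to database.sqlite. It assumes the service runs on its default port 41184 and that `token` is the authorization token shown in the Web Clipper options. The orphan IDs themselves would still have to come from somewhere else, such as a read-only database query. `delete_resources` defaults to a dry run that only prints the requests.

```python
import urllib.parse
import urllib.request

# Assumption: Web Clipper service enabled on its default port.
JOPLIN_API = "http://localhost:41184"


def delete_resource_request(resource_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /resources/:id request."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{JOPLIN_API}/resources/{resource_id}?{query}"
    return urllib.request.Request(url, method="DELETE")


def delete_resources(resource_ids, token: str, dry_run: bool = True) -> None:
    """Delete each resource via the API; with dry_run, only print the requests."""
    for rid in resource_ids:
        req = delete_resource_request(rid, token)
        if dry_run:
            print(req.get_method(), req.full_url)
            continue
        with urllib.request.urlopen(req) as resp:
            resp.read()  # the app, not this script, updates database.sqlite


delete_resources(["ecfef68586324b6a88149f651af58068"], token="YOUR_TOKEN")
```

At the time of this thread, deleting a resource this way removed only its metadata file from the sync target and left the blob in place (see #1694).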

it needs to be fixed once we understand the cause for your bug.

I'm not the only one who's experiencing this bug. (There were a few GitHub issues and it was mentioned on the forum a few times as well.) It's quite easy to reproduce:

  1. clip a page that includes 1 or 2 images
  2. wait 1-5 minutes
  3. delete the note

After 24 hours you will see that the 1 or 2 images are still in your resource folder and that they have is_associated = 0 in the database.

I've just noticed that the API command to delete resources does not delete the resources from the sync target. Only the metadata files are deleted from the sync target.

e.g.: ecfef68586324b6a88149f651af58068.md is removed from the sync target, but .resources/ecfef68586324b6a88149f651af58068 still remains.

Update: Opened a separate issue for this one. #1694

@laurent22 did you see this comment https://github.com/laurent22/joplin/issues/932#issuecomment-506136563

Yes I did, I'm planning to try to replicate this.

If there's anything I can do to help, please let me know.

I've checked and see why it happens, but there's no easy fix for it as of now. The fact that a resource can be used by more than one note makes things complicated, yet it's a feature that's probably almost never used. So I'm thinking eventually to enforce a one-to-one relation between notes and resources, which will make resource management a lot easier. When that's done, this issue will be resolved.
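This sketch shows why a one-to-one relation simplifies cleanup. It uses a hypothetical schema (not Joplin's) in which every resource row carries the ID of its single owning note, with `ON DELETE CASCADE`. Deleting a note then removes its resources in the same statement, with no separate orphan detection. It only covers the local database; propagating deletions to the sync target would still be a separate step.

```python
import sqlite3


def one_to_one_demo() -> list:
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, body TEXT)")
    # Hypothetical schema: each resource has exactly one owning note.
    db.execute(
        "CREATE TABLE resources ("
        " id TEXT PRIMARY KEY,"
        " note_id TEXT NOT NULL REFERENCES notes(id) ON DELETE CASCADE)"
    )
    db.execute("INSERT INTO notes VALUES ('n1', '![](:/r1)'), ('n2', '![](:/r2)')")
    db.execute("INSERT INTO resources VALUES ('r1', 'n1'), ('r2', 'n2')")
    # Deleting the note deletes its resource row in the same statement.
    db.execute("DELETE FROM notes WHERE id = 'n1'")
    remaining = [row[0] for row in db.execute("SELECT id FROM resources ORDER BY id")]
    db.close()
    return remaining


print(one_to_one_demo())  # ['r2']
```

The cost is the one raised in the next comment: content shared by two notes would have to be stored as two separate resources.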

What does this mean? So if I tried to inline an image (a locally attached resource), I could only do it from one note?
Btw, how is the web clipper handling resources right now? E.g. you clip 2 articles from the same website, and some resources are the same, such as a logo or background image. Are these common elements always re-downloaded with a different hash?

The fact that a resource can be used by more than one note make things complicated

I'm not sure why that is, because is_associated is only set to 0 when no notes are using the resource.
I always thought that the problem lies with the cleanup process that actually removes the resources which have is_associated = 0.

When I was changing my script, this is the SQL that makes it aware of the note history and thus safe to use, provided that no decryption/encryption/sync process is unfinished:

# $DB is the path to database.sqlite
KEEP=$(sqlite3 "$DB" "select value from settings where key = 'revisionService.ttlDays'")

sqlite3 "$DB" "
select resource_id from note_resources
where is_associated = 0 and
  resource_id not in (select resource_id from note_resources where is_associated = 1)
group by resource_id
having max(last_seen_time) < strftime('%s','now','-${KEEP} days')*1000"
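The query above can be tried against a throwaway in-memory database. This sketch uses a cut-down `note_resources` table with only the four columns the query reads. It binds the retention window as a parameter instead of interpolating `${KEEP}` into the SQL, and `keep_days` stands in for `revisionService.ttlDays`. The sample rows are made up to show the cases the query distinguishes.

```python
import sqlite3
import time

ORPHAN_SQL = """
select resource_id from note_resources
where is_associated = 0
  and resource_id not in (select resource_id from note_resources where is_associated = 1)
group by resource_id
having max(last_seen_time) < strftime('%s', 'now', ?) * 1000
order by resource_id
"""


def orphaned_resources(db: sqlite3.Connection, keep_days: int) -> list:
    """Resource IDs unused by every note for longer than keep_days."""
    rows = db.execute(ORPHAN_SQL, (f"-{keep_days} days",))
    return [r[0] for r in rows]


DAY_MS = 24 * 60 * 60 * 1000
now_ms = int(time.time() * 1000)
db = sqlite3.connect(":memory:")
db.execute("create table note_resources (note_id text, resource_id text,"
           " is_associated integer, last_seen_time integer)")
db.executemany("insert into note_resources values (?, ?, ?, ?)", [
    ("n1", "r1", 1, now_ms),                # still used
    ("n1", "r2", 0, now_ms - 30 * DAY_MS),  # dropped a month ago
    ("n2", "r3", 0, now_ms - 1 * DAY_MS),   # dropped yesterday: inside the history window
    ("n1", "r4", 0, now_ms - 30 * DAY_MS),  # dropped from n1 ...
    ("n2", "r4", 1, now_ms),                # ... but still used by n2
    ("n3", "r5", 0, 0),                     # never seen in a note
])
print(orphaned_resources(db, keep_days=7))  # ['r2', 'r5']
```

r4 shows why the `not in` clause matters: a resource dropped from one note but still used by another must survive. r5 (`last_seen_time = 0`) matches the never-associated resources reported at the top of this issue.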

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please feel free to create a new issue with up-to-date information.

I think this is still relevant. Not sure why the essential-reviewed label was removed.

Of course this is still relevant. I tried Joplin for the first time yesterday and today I ran into this problem.
You could also look at how my previous PIM, MyTerta (GitHub), handles this. It is open source, but written in C++. Sorry, I'm still learning to code and can't help with it yet.

+1 on that, and it's very easy to reproduce.

I started one week ago, added some notes, played with Markdown and everything was fine.
Then I deleted the "Welcome" notebook on Windows, and its three images remain in the resources folder (plus a fourth one which I had added).

I just had this idea that the client should keep in cache a list of resources it has created and it has synced. Then if a resource is not associated with any note, and it is in this list, and has never been synced, we know it can be deleted. That would take care of this specific edge case, at least for newly created resources.

Edit: doesn't really work because once an orphaned resource has been synced we're back to square one.
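Here is a sketch of that ledger idea, with hypothetical names. The ledger records which resources this client created and which it has uploaded. Following the reasoning above, an unassociated resource counts as safe to delete only if it was created here and never uploaded. The second call shows the problem from the edit: once `r2` has been uploaded, the ledger can no longer vouch for it.

```python
from typing import Set


class LocalResourceLedger:
    """Hypothetical per-client record of locally created and uploaded resources."""

    def __init__(self) -> None:
        self.created_here: Set[str] = set()
        self.uploaded: Set[str] = set()

    def on_create(self, resource_id: str) -> None:
        self.created_here.add(resource_id)

    def on_upload(self, resource_id: str) -> None:
        self.uploaded.add(resource_id)

    def safe_to_delete(self, resource_id: str, associated: bool) -> bool:
        # Created here and never uploaded: no other client can reference it.
        return (
            not associated
            and resource_id in self.created_here
            and resource_id not in self.uploaded
        )


ledger = LocalResourceLedger()
ledger.on_create("r1")
ledger.on_create("r2")
ledger.on_upload("r2")
print(ledger.safe_to_delete("r1", associated=False))  # True
print(ledger.safe_to_delete("r2", associated=False))  # False: already synced
```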

I had the same idea/concept in mind: some kind of "clean-up" menu entry which lists all unused resources, so the user can manually decide whether to delete or keep the files. The sync would then clean up the directory in the cloud.

What do you have Note History (Preferences -> Note History) set to?

@virtadpt: It is set to 7 days ...

I just imported my Evernote notes into Joplin on Windows. There was a lot of useless notes, so I deleted most of them. I made sure the sync was complete.

Then when I synced from the Joplin client on my Ubuntu installation, it was slow. When I checked the resources folder, I saw that all the images from the deleted notes were also getting downloaded.

This was one of the first things I checked after downloading the application: I removed the demo notes with note history turned off, and the resources stayed on the sync target.

I'm disappointed that after 1.5 years this issue has not been solved. This is a showstopper.

I'm encountering the same issue! But I think the show will resume well again, lol. And looking forward to watching!

Hi @laurent22 & Team,

We appreciate the time and effort you all place into Joplin!

Just touching base regarding this issue - hopefully we see a fix soon!

Can confirm this still is present in Joplin for Desktop v1.0.218

Few recent confirmations via:
https://discourse.joplinapp.org/t/notes-containing-attachments-notes-are-deleted-how-to-delete-all-attachments-cached-images-of-deleted-notes-easily/9380/2
https://discourse.joplinapp.org/t/best-way-to-delete-orphaned-resources/9374/3

Thanks!!

This is also a very big frustration for me as I have lots of resources in my notes. I really like the idea of a manual clean unused resources option.

Can confirm this still is present in Joplin for Desktop v1.0.233

@littleperks

You can use my ruby script to delete for now:

gem install joplin # may require sudo
joplin clean -n # -n is dry run so remove if you want to actually delete

You need to have the webclipper API turned on and run it on the machine running joplin. And if it can't find your joplin db you'll need to pass the --token option

Thanks for your tool for Joplin! Would it update database.sqlite?

The API is the external interface so I use that only. Touching the DB directly is fragile and not good practice IMO. The API modifies the db accordingly. The only possible caveat (that tessus pointed out) is that this doesn't handle document versions, but it hasn't worried me.

Simply deleting resources that are not associated with any note doesn't work because those resources might have come via sync, and their associated note has not been downloaded yet.

Is it possible to get the info whether a full sync happened?
If so: wouldn't it be possible to run a cleaning task _after_ it was ensured that a full sync happened?

No, because in a distributed setup you never know if you have the same data as all the other clients. In this particular case, even if your current client has made a full sync, there could still be another client which hasn't synced yet and which is using the resources. People sometimes leave Joplin clients on some forgotten laptop and sync six months later - in that case we shouldn't delete all their data.

Or add a "Delete Orphaned Resources" entry under the File menu or somewhere else? If the user presses it, that means they agree to delete those orphaned resources on all devices, so the deletion task would never run automatically.

So moving to a 1:1 relationship would solve everything, right?

Since having m:n relations between resources and notes is obviously "a feature that's probably almost never used" anyway.

Also it's causing issues like this one. And a feature that nobody uses which is causing bugs, is probably not a very good feature. :wink:

I don't know if this has been mentioned before, but here's my solution:

A new column could be added to the 'Note attachments' tool showing how many notes on that device use each resource. Let's call the column Occurrence or Used ... Times.
When the user sorts by that property (as with 'Size'), the resources with a value of '0', i.e. the orphaned resources, appear at the top of the list. It's then up to the user to view and delete them.

And one step further: show the list of linked notes when the occurrence number is clicked. That would make the Note attachments tool the most useful for me.
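The proposed "Used ... Times" column could be computed with a single aggregate query. This sketch runs over hypothetical, simplified `resources` and `note_resources` tables. A `LEFT JOIN` keeps resources that have no associated note, `count(nr.note_id)` then yields 0 for them, and sorting ascending puts those candidates first. The count only reflects notes present on this device, which is the caveat about other clients discussed above.

```python
import sqlite3

# Hypothetical, simplified tables; only the columns the query needs.
OCCURRENCE_SQL = """
select r.id, r.title, count(nr.note_id) as used_times
from resources r
left join note_resources nr
  on nr.resource_id = r.id and nr.is_associated = 1
group by r.id, r.title
order by used_times asc, r.title
"""


def attachment_usage(db: sqlite3.Connection) -> list:
    """(id, title, number of notes using it), unused resources first."""
    return db.execute(OCCURRENCE_SQL).fetchall()


db = sqlite3.connect(":memory:")
db.execute("create table resources (id text primary key, title text)")
db.execute("create table note_resources (note_id text, resource_id text, is_associated integer)")
db.executemany("insert into resources values (?, ?)",
               [("r1", "a.png"), ("r2", "b.png"), ("r3", "c.png")])
db.executemany("insert into note_resources values (?, ?, ?)",
               [("n1", "r1", 1), ("n2", "r1", 1), ("n1", "r2", 0)])
for row in attachment_usage(db):
    print(row)
# ('r2', 'b.png', 0)
# ('r3', 'c.png', 0)
# ('r1', 'a.png', 2)
```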

Since my use case for having resources is usually that I copied some things from a website.
The extension I'm using (Copy Selection as Markdown), just added support for embedding images as base64 at the bottom of the note.

Thus no new resource is added, and when I delete the note, the images are deleted automatically as well. :tada:

Maybe this helps someone else as well.
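Here is a sketch of what such base64 embedding amounts to. The image bytes are inlined in the note body as a `data:` URI, so no separate resource is created. `image_as_data_uri_markdown` is a hypothetical helper, not part of Joplin or the extension, and it is worth checking that your Joplin version renders such images. The trade-off is size: base64 makes the image about a third larger, and it lives inside the note text.

```python
import base64
import mimetypes


def image_as_data_uri_markdown(alt: str, filename: str, data: bytes) -> str:
    """Hypothetical helper: inline image bytes as a Markdown data: URI."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"![{alt}](data:{mime};base64,{encoded})"


# The first four bytes of a PNG header, just to keep the output short.
print(image_as_data_uri_markdown("logo", "logo.png", b"\x89PNG"))
# ![logo](data:image/png;base64,iVBORw==)
```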

I guess it would also help if an "Attach file as base64" command were added to the command palette, too.

PS: isn't this a duplicate of issue 154?

What is the best way to deal with this issue? How can the orphaned resources be removed without messing up the sync? Is there a guide or best practice somewhere?

I found a workaround: orphaned resources are left out by the "Export all" process.
Here is what I do. After exporting, I remove all notes from Joplin and clear the "C:\Users\%UserName%\.config\joplin-desktop\resources" folder. I also remove all attachments in Tools - Note attachments... Unfortunately there is no "Delete all attachments" or "Clear DB" button (I use LMB+Enter for each one).
All that's left to do is import the previously exported notes, now without orphaned resources.
It's a long way round, but it works and is rarely required. I hope we will see a more convenient way to do this in the future, as well as native collapsible text. :)

Yes, you are describing a similar process to what I and others have described: basically deleting all your data and importing everything again from scratch every time you need to delete a resource. From the tests I have done, this is the only way I have verified that works for me as well. I have spent many days exporting all my data, deleting the database, importing it again, uploading all data and syncing all clients again.

Is this a viable solution for a note taking app with synchronizing capabilities? I assume there are complicated corner cases that prevent the proper synchronization of the data and make this a non-trivial task seeing as it is a major core issue with lots of users reporting being affected by it but still open after 2 years.

I am available to test different use cases and scenarios if any developers are interested in working with me in doing a deep dive to finally fix this problem.

@tessus, I'm wondering whether we could add something like a "Vacuum" button in Joplin that would clear out resources according to certain rules, probably rules similar to the ones in your script, but we should document the caveats. Do you know what the current limitations of your script are? At least I assume it requires that everything has been synced before running it?

Yes, I think that was one of the things you mentioned: all data has to be synced, but I don't check for that in my script. I'm not even sure if there's a way to get that info from the db or API.
What about encryption? What happens if some items are not decrypted yet?

I took care of the note history in my script, so it should be fairly safe. But I also put in a big fat warning and told people to make backups.
For example, I'm not 100% sure about the Attachment download behavior setting.

Such a vacuum button would have to take care of a few things (and if not possible, people would have to check those manually):

  • sync status
  • encryption status
  • Attachment download behavior
  • note history
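Here is a sketch of how such a vacuum action might gate itself on the checklist above. `ClientState` and `plan_vacuum` are hypothetical. Unfinished sync, undecrypted items and attachments that were never downloaded block the run outright, which is a conservative reading of the attachment point. The note history retention becomes the keep window passed to an orphan query.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClientState:
    """Hypothetical snapshot of the checks listed above for one client."""
    pending_sync_items: int
    undecrypted_items: int
    all_attachments_downloaded: bool
    history_ttl_days: int  # note history retention; 0 if history is off


def plan_vacuum(state: ClientState) -> Tuple[List[str], int]:
    """Return (reasons not to run, keep window in days for the orphan query)."""
    blockers = []
    if state.pending_sync_items > 0:
        blockers.append("sync not finished")
    if state.undecrypted_items > 0:
        blockers.append("items not decrypted yet")
    if not state.all_attachments_downloaded:
        blockers.append("attachments not all downloaded")
    # Resources referenced by note revisions must outlive the history window.
    return blockers, max(state.history_ttl_days, 0)


print(plan_vacuum(ClientState(0, 0, True, 90)))  # ([], 90)
print(plan_vacuum(ClientState(3, 0, True, 90)))  # (['sync not finished'], 90)
```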

Maybe we can use the current Note attachments page as a preview of what would be deleted. One thing I highly miss in that screen though is a search field. I would love to be able to search for attachment name or id.

No, because in a distributed setup you never know if you have the same data as all the other clients. In this particular case, even if your current client has made a full sync, there could still be another client which hasn't synced yet and which is using the resources. People sometimes leave Joplin clients on some forgotten laptop and sync six months later - in that case we shouldn't delete all their data.

Why not store in the resource database, with each attachment, the note ID(s) of each note the resource is used by, _per syncing device_? Then, if a client deletes a note, and finds that is the last note the resource is used by, delete the resource as well.

Of course, we need each client to have a good copy of the resource database. I don't know how sync works, but we need to ensure the resource database is always synced before resources are downloaded.
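Here is a sketch of that per-device reference idea, with hypothetical names. Each resource keeps a set of `(device_id, note_id)` references. A device removes only its own reference, and the resource becomes deletable once the last one is gone. This keeps the forgotten-laptop case safe, at the cost of never deleting resources that a device which never syncs again still references.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Ref = Tuple[str, str]  # (device_id, note_id)


class ResourceRefs:
    """Hypothetical synced table of which device/note pairs use each resource."""

    def __init__(self) -> None:
        self._refs: DefaultDict[str, Set[Ref]] = defaultdict(set)

    def add(self, resource_id: str, device_id: str, note_id: str) -> None:
        self._refs[resource_id].add((device_id, note_id))

    def remove(self, resource_id: str, device_id: str, note_id: str) -> bool:
        """Drop one device's reference; True if the resource is now unused everywhere."""
        refs = self._refs.get(resource_id)
        if refs is None:
            return False  # unknown resource: never report it as deletable
        refs.discard((device_id, note_id))
        if refs:
            return False
        del self._refs[resource_id]
        return True


refs = ResourceRefs()
refs.add("r1", "laptop", "n1")
refs.add("r1", "phone", "n1")
print(refs.remove("r1", "phone", "n1"))   # False: the laptop still references it
print(refs.remove("r1", "laptop", "n1"))  # True: last reference gone
```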
