Azure-storage-azcopy: Feature: Support filter on Attribute

Created on 14 Jan 2019 · 21 Comments · Source: Azure/azure-storage-azcopy

I'd like to request that azcopy v10 gain support for inclusion or exclusion on file Attributes, as the general release of azcopy supports with the /A, /IA, and /XA flags.

feature request

All 21 comments

Hi @jeffwmiles, thanks for reaching out!

I've logged your feature request for consideration.

@jeffwmiles Hi I am interested in implementing this feature if no one has implemented it yet. Could you elaborate a little bit more on how this feature should work? By "inclusion or exclusion on file Attributes", do you mean you want to see certain Blob/File properties from the azcopy list output?

Hi @panfeng0007, FYI the docs for legacy AzCopy have moved here.

@panfeng0007 Thanks heaps for your offer to implement this feature. I'm not sure whether you're thinking of implementing it in the cmd/* files (e.g. copyUploadEnumerator) or in the ste/* files (e.g. xfer-anyToRemote). I think it probably belongs best in the cmd ones... but we are about to do some major refactoring of those. Specifically, we are going to refactor the copy* implementations to re-use the types and concepts from the sync* implementations.

So, anything that is written in today's implementation of the copy* types will need to be totally re-done after the refactoring. Maybe, if you don't want to wait for the refactoring, you could implement the feature for sync (since if it works there, it should be easily portable to copy after the refactoring).

Hi @JohnRusk, Thanks for letting me know about the refactoring. Yes, my approach would be to implement the feature in the cmd/* files, and since the sync command works similarly to the copy command, I could start implementing it in sync first. I'll also keep an eye on upstream and keep my branch up to date with the dev or master branch.

@zezha-msft Thanks for pointing out the documentation. The attributes indicated in the doc are for NTFS files only. So I assume for Linux, I should include the file attributes returned by lsattr and also on macOS by xattr?

Hi @panfeng0007, you are very welcome to contribute a PR for this feature! Please let us know if you have any question about the code base.

I'd suggest first achieving parity with AzCopy V8. If there are additional needs, we can add support for the other attributes later.

@zezha-msft Will do!

@panfeng0007 My use case might be a little isolated, but it is what instigated this request.
I have a folder containing sets of files from database backups. I'm using azcopy with the new Sync feature to move these to Azure blob storage. Within the parent, there are some sets of files (within folders) that I do not wish to sync.

Right now I accomplish this by placing a specifically named text file in the directory I want to exclude, and then use a PowerShell script to find all child-items of the top-level directory where that text file does NOT exist. Then I run a foreach on each discovered child-item and azcopy them individually.

This is not very efficient but saves multiple TB of transfer. If I could exclude on attribute, I could simply azcopy the top-level parent, and ensure that the files I do not wish to transfer have Temporary or Hidden or some other attribute set, and not worry about additional logic (exclusion).

Or alternatively, only sync/copy the files with the archive attribute set (inclusion).

Thanks for taking a look at this!

@panfeng0007 You're welcome to add a few comments here about your thoughts of where the new code will fit into the existing sync code. Ze can probably reply with a quick note to confirm whether your suggested location is consistent with his design vision for that part of the codebase.

Hi @JohnRusk and @zezha-msft ,

I basically added two new types includeAttrFilter and excludeAttrFilter which both implement the doesPass function in the objectFilter interface. They are similar to the includeFilter and excludeFilter types. To include files with certain file attributes, I OR'd the attribute constants first from the filter then AND'd it with the attributes I got from each file. If the result is non-zero, then the file would be included. The exclusion logic works the same, except when the result is non-zero, the file would be excluded. Then I appended those two filters, in the right order, to the filter list consumed by the comparator, indexer and enumerator in cmd/syncEnumerator.go. If you want, you can check my branch fp_filter_file_attributes from my own forked repo, although I'm still testing it. Please let me know whether I'm on the right track. Thanks!
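The OR-then-AND logic described above can be sketched roughly as follows. This is not the code from the branch: storedObject is a simplified stand-in, the doesPass signature is assumed, and attrLookup, combine and passesAll are hypothetical helpers added so the example runs on any platform. The constant values match the Windows FILE_ATTRIBUTE_* bits. Skipping a file when the lookup fails is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Bit values match the Windows FILE_ATTRIBUTE_* constants.
const (
	attrReadOnly  uint32 = 0x1
	attrHidden    uint32 = 0x2
	attrArchive   uint32 = 0x20
	attrTemporary uint32 = 0x100
)

// storedObject is a simplified stand-in for azcopy's type of the same name.
type storedObject struct{ relativePath string }

// objectFilter mirrors the interface named in the thread (signature assumed).
type objectFilter interface {
	doesPass(obj storedObject) bool
}

// attrLookup is a hypothetical hook that returns a file's attribute bits.
type attrLookup func(path string) (uint32, error)

type includeAttrFilter struct {
	mask   uint32
	lookup attrLookup
}

// doesPass includes the file when it has at least one requested attribute.
func (f includeAttrFilter) doesPass(obj storedObject) bool {
	attrs, err := f.lookup(obj.relativePath)
	return err == nil && attrs&f.mask != 0
}

type excludeAttrFilter struct {
	mask   uint32
	lookup attrLookup
}

// doesPass rejects the file when it has at least one listed attribute.
func (f excludeAttrFilter) doesPass(obj storedObject) bool {
	attrs, err := f.lookup(obj.relativePath)
	return err == nil && attrs&f.mask == 0
}

// combine ORs the requested attribute constants into a single mask.
func combine(attrs ...uint32) uint32 {
	var mask uint32
	for _, a := range attrs {
		mask |= a
	}
	return mask
}

// passesAll reports whether obj passes every filter in the list.
func passesAll(obj storedObject, filters []objectFilter) bool {
	for _, f := range filters {
		if !f.doesPass(obj) {
			return false
		}
	}
	return true
}

func main() {
	// A fake attribute table stands in for a real OS call.
	table := map[string]uint32{
		"db1/full.bak":    attrArchive,
		"db2/old.bak":     attrArchive | attrHidden,
		"db3/scratch.tmp": attrTemporary,
	}
	lookup := func(path string) (uint32, error) {
		attrs, ok := table[path]
		if !ok {
			return 0, errors.New("no attributes recorded for " + path)
		}
		return attrs, nil
	}
	filters := []objectFilter{
		includeAttrFilter{mask: combine(attrArchive), lookup: lookup},
		excludeAttrFilter{mask: combine(attrHidden, attrTemporary), lookup: lookup},
	}
	for _, p := range []string{"db1/full.bak", "db2/old.bak", "db3/scratch.tmp"} {
		fmt.Printf("%s transfer=%v\n", p, passesAll(storedObject{relativePath: p}, filters))
	}
}
```

With these filters, only files that carry the archive bit and neither the hidden nor the temporary bit would be transferred.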

That sounds perfect to me. It's totally in agreement with my understanding of Ze's intent for those classes. Did you have to make an additional call (e.g. os.Stat) to get the attributes, or did we already have those?

@JohnRusk Yes I had to call syscall.GetFileAttributes to get the attributes on Windows. I haven't implemented this feature on Linux and macOS yet so doesPass just returns true on those platforms.

Sounds fine, because the perf hit of the extra call only happens if the user actually uses the feature (and the perf hit is negligible in all cases except perhaps those with very large numbers of very small files).

BTW, might need some sort of error message if someone tries to invoke the functionality on an unsupported platform, i.e. rather than just returning true from doesPass. If that's all the code does, then users on the unsupported platforms may _think_ their filtering is working, when really it's not. Better that we tell them it's not. (Or implement support for those platforms ;-)

If you wanna tell them that it's not working, I'm not 100% sure of the best place for you to do that. Maybe put a method on your filter type(s) called isSupported, or something, and then call that from the "cook" method in sync.go. That way the implementation lives with your new code, but the call site for the validation is the same as all the other existing validation. I'm not 100% sure that would be the best approach... maybe there's a better way.
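A minimal sketch of that validation idea, assuming Windows is the only supported platform. Only isSupported is a name from the thread. attrFilterSupported and validateAttrFilters are hypothetical, and the real "cook" method is not shown here. The point is that the check fails loudly instead of silently letting every file pass.

```go
package main

import (
	"fmt"
	"runtime"
)

// attrFilterSupported reports whether attribute filtering is available on goos.
// Assumption: only Windows is supported.
func attrFilterSupported(goos string) bool {
	return goos == "windows"
}

// includeAttrFilter is a stand-in holding the attribute names the user asked for.
type includeAttrFilter struct{ attributes []string }

// isSupported lives next to the filter code, as suggested in the thread.
func (f includeAttrFilter) isSupported() bool {
	return attrFilterSupported(runtime.GOOS)
}

// validateAttrFilters is a hypothetical check that a "cook"-style validation
// step could call. It returns an error only when the user actually requested
// attribute filtering on a platform that cannot honor it.
func validateAttrFilters(goos string, include, exclude []string) error {
	if len(include) == 0 && len(exclude) == 0 {
		return nil // feature not used, nothing to validate
	}
	if !attrFilterSupported(goos) {
		return fmt.Errorf("attribute filters are not supported on %s", goos)
	}
	return nil
}

func main() {
	f := includeAttrFilter{attributes: []string{"H"}}
	err := validateAttrFilters(runtime.GOOS, f.attributes, nil)
	fmt.Printf("platform=%s supported=%v error=%v\n", runtime.GOOS, f.isSupported(), err)
}
```

Passing the platform name in as a parameter keeps the check testable on any machine.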

I'll take a look at how difficult it would be to actually implement support for Linux and macOS. It's best to release the feature on all platforms rather than just doing it partially. 😄

@panfeng0007 sounds great! We look forward to your PR. 😄

@JohnRusk @zezha-msft I just wanted to update my progress here:

  1. I looked around os and syscall packages for Linux and Mac and I didn't find anything that's equivalent to syscall.GetFileAttributes on Windows. It might be tricky to implement this feature on Linux and macOS.
  2. I was testing the code on Windows and I found out that syscall.GetFileAttributes only accepts full file paths but storedObject only contains relativePath. Is it possible to get the full path from relativePath? I am trying not to add new properties in storedObject as it will break current code base at multiple locations.

Any advice? Thanks!

Looks like Linux doesn't have an archive bit. So this probably has to be a Windows-only feature, with an informative error message on other platforms.

@zezha-msft, any suggestions re 2? I'm guessing maybe pass the root path into the filter object when it is constructed...?

@panfeng0007 perhaps you could pass in the root path information when constructing the filter like @JohnRusk suggested.
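One way to picture the root-path suggestion, as a sketch only: the filter stores the root at construction time and joins it with each object's relativePath before looking up attributes, so storedObject itself stays unchanged. The constructor newExcludeAttrFilter, the fullPath helper and the attrLookup hook are hypothetical. On Windows the hook would presumably wrap syscall.GetFileAttributes, which is mentioned earlier in the thread.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// storedObject is a simplified stand-in; it only knows its relative path.
type storedObject struct{ relativePath string }

// attrLookup is a hypothetical hook that takes a full path and returns attribute bits.
type attrLookup func(fullPath string) (uint32, error)

type excludeAttrFilter struct {
	rootPath string
	mask     uint32
	lookup   attrLookup
}

// newExcludeAttrFilter captures the root path once, when the filter is built.
func newExcludeAttrFilter(rootPath string, mask uint32, lookup attrLookup) excludeAttrFilter {
	return excludeAttrFilter{rootPath: rootPath, mask: mask, lookup: lookup}
}

// fullPath rebuilds the full path from the stored root and the relative path.
func (f excludeAttrFilter) fullPath(obj storedObject) string {
	return filepath.Join(f.rootPath, filepath.FromSlash(obj.relativePath))
}

// doesPass rejects objects that carry any attribute in the mask.
// Assumption: a failed lookup also rejects the object.
func (f excludeAttrFilter) doesPass(obj storedObject) bool {
	attrs, err := f.lookup(f.fullPath(obj))
	return err == nil && attrs&f.mask == 0
}

func main() {
	const hiddenBit uint32 = 0x2 // same value as FILE_ATTRIBUTE_HIDDEN
	root := filepath.Join("backups", "sql")
	hiddenFile := filepath.Join(root, "db2", "old.bak")

	// Fake lookup: only one file is marked hidden.
	lookup := func(fullPath string) (uint32, error) {
		if fullPath == hiddenFile {
			return hiddenBit, nil
		}
		return 0, nil
	}

	f := newExcludeAttrFilter(root, hiddenBit, lookup)
	for _, rel := range []string{"db1/full.bak", "db2/old.bak"} {
		obj := storedObject{relativePath: rel}
		fmt.Printf("%s pass=%v\n", f.fullPath(obj), f.doesPass(obj))
	}
}
```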

@panfeng0007's nice implementation of this shipped a few releases ago, but we forgot to close this issue. Am closing it now. Thanks @panfeng0007!
