Argo: Allow .tar.gz to be disabled for artifact upload

Created on 8 Mar 2018 · 16 comments · Source: argoproj/argo

From a Slack discussion with @jessesuen: Currently the artifact upload feature will .tar.gz any files. It would be good if this were configurable, so that we can just upload the raw file. This would be most useful when we want to upload a .txt file, some JSON, or similar.

enhancement


All 16 comments

Any suggestions on syntax? I've thought about this before and have previously come up with archive: false flag.

archive: false works. If it's set to true, the file will be compressed; if not, it will be saved with whatever extension it originally had?

archive: false seems reasonable and straightforward enough to me. Probably have to keep default to true to not break backwards compatibility?

Do you foresee supporting different archive formats and/or compression types? If so maybe archivePolicy: TarGz and archivePolicy: None? If not, boolean seems fine.

Does archive: false clearly communicate no archiving AND no compression? Would one ever be performed without the other?

I vote for fine-tuning as @wookasz suggested, because there are so many compression options.
It is reasonable to make this API extensible, but for the very beginning {TarGz|None} is enough for most users. Later, one can add support for 'TarBz2', 'TarZstd', etc.

I agree, a simple boolean value is probably short-sighted. How about making the possible values as just the file extension? In other words, the value "tar.gz" seems preferable over "TarGz".

Initially, the behavior would be:

  • (archive field omitted) - defaults to tar.gz
  • "tar.gz" | ".tgz" - tar.gz archiving
  • "none" - no archiving
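Under this proposal, an output artifact might be declared like the following (a sketch of the suggested string-valued field, not a shipped API):

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive: "none"    # "tar.gz" / ".tgz" for archiving; omit for the tar.gz default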

@jessesuen Agree, the suffix tag is a nice and clean approach.

We have an internal need for disabling archiving. @wanghong230 will implement a fix.

Some special considerations to make: what if the output artifact path is a directory? I think it should be a runtime error if someone attempts to save a directory as an artifact while simultaneously disabling archiving. While technically S3 supports directory-like semantics, another artifact repository may not (e.g. not sure if this is possible with Artifactory). To keep things simple, we can raise an error if we see the combination of a directory + archive: none.
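That validation rule could look roughly like this (a minimal sketch; the helper name and error message are invented, not Argo's actual code):

```python
import os

def validate_unarchived_artifact(path, archive):
    """Raise if a directory is saved as an artifact while archiving is disabled.

    Hypothetical helper illustrating the directory + archive: none rule
    discussed above; real controller-side validation may differ.
    """
    if archive == "none" and os.path.isdir(path):
        raise ValueError(
            "cannot save directory %r with archive: none; "
            "directories require archiving" % path
        )
```

A plain file with archive: none passes; a directory with archiving disabled raises.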

FEATURE REQUEST:
It would be useful to output a complete directory and its contents as an artifact without tarballing it, so the next step(s) could work on only a subset of the files.

Before I saw @DSchmidtDev's comment, I had considered this to be an error scenario for the reasons described above: https://github.com/argoproj/argo/issues/784#issuecomment-382539780.

As I mentioned, it's technically possible in S3 to upload an entire directory as N separate s3 keys (where N is the number of files). We just may not be able to support other repositories easily. It could be that we support unarchived directories in S3 and not Artifactory.

I like the ability to support it if the backend supports it, and fallback to an 'unsupported error' case if not.

> While technically S3 supports directory-like semantics, another artifact repository may not (e.g. not sure if this is possible with artifactory).
>
> As I mentioned, it's technically possible in S3 to upload an entire directory as N separate s3 keys (where N is the number of files). We just may not be able to support other repositories easily. It could be that we support unarchived directories in S3 and not Artifactory.

I definitely have the use case to upload N files in a directory as single artifacts. It would be great to support it as long as the backend supports it.
Maybe there could be a differentiation between a directory with N files and a directory with files and subdirectories? Because only subdirectories would cause issues in other backends, I think.

Working on this now.

One open issue is that S3 doesn't handle empty directories, because S3 has no concept of a directory, just objects with key prefixes that simulate one.

It's been hacked around by supporting the notion of a special zero-content key named /, which indicates an empty directory. More discussion here:
https://github.com/ncw/rclone/issues/753
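The directory-to-keys mapping being discussed can be sketched as pure logic (an illustration of that workaround, not Argo's implementation; the zero-content marker key represents an empty directory, and POSIX path separators are assumed):

```python
import os

def directory_to_s3_keys(root):
    """Map a local directory tree to S3 key names.

    Regular files map to keys backed by their local path; empty
    directories, which S3 cannot represent natively, map to
    zero-content marker keys ending in '/'.
    Sketch only; a real uploader would stream the file bodies.
    """
    keys = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            key = name if rel == "." else "%s/%s" % (rel, name)
            keys[key] = os.path.join(dirpath, name)
        if not dirnames and not filenames and rel != ".":
            keys[rel + "/"] = None  # empty-directory marker key
    return keys
```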

Turns out a simple string to disable archiving is a bit short-sighted. It'll be better to have something like this instead:

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none: {}

The reason for having a structure is that when disabling archiving, I discovered we may eventually need to give some options for how to handle certain situations. For example, artifact repos like S3 cannot store symlinks, because S3 has no concept of one. So when saving a directory containing symlinks to S3, we have to either ignore a symlink or follow it (potentially resulting in redundant copies in S3). In the future, this could be controlled like:

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none:
            followSymlinks: true

> The reason for having a structure is that when disabling archiving, I discovered we may eventually need to give some options for how to handle certain situations.

@jessesuen I think using a structure is a good approach and allows to enhance the feature if necessary without breaking changes.
How are symlinks handled in the current approach?

To pick up the question from @wookasz:

> Do you foresee supporting different archive formats and/or compression types? If so maybe archivePolicy: TarGz and archivePolicy: None? If not, boolean seems fine.

Would it be an option to structure it like this?

- name: hello-txt
  path: /tmp/hello_world.txt
  archive:
    type: {"TarGz" | "none" | ... }
    options:
      followSymlinks: true

> How are symlinks handled in the current approach?

Currently, symlinks are not followed, since the implementation uses docker cp, which does not follow symlinks. Therefore, the symlinks remain as files with the symlink file mode on, and remain as symlinks when extracted.

In this initial release of the disable-archiving feature, the default behavior will again be not to follow symlinks. So in the case of S3, symlinks will simply not be uploaded.

Note that docker cp does have the option --follow-link (-L), so in the future we could use this flag to follow links so that they get uploaded to S3. But I would prefer that the enhancement to follow symlinks be a separate effort.
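Python's tarfile module shows the same preserve-vs-follow trade-off in miniature (a standalone sketch, unrelated to the docker cp implementation; tarfile's dereference flag plays the role of a followSymlinks option):

```python
import os
import tarfile

def archive_dir(src, out_path, follow_symlinks=False):
    """Create a tar.gz of src, optionally dereferencing symlinks.

    follow_symlinks=False stores links as symlink entries (analogous to
    the current behavior); True copies in the content of their targets.
    """
    with tarfile.open(out_path, "w:gz", dereference=follow_symlinks) as tf:
        tf.add(src, arcname=os.path.basename(src))
```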

> Would it be an option to structure it like this?
>
>     - name: hello-txt
>       path: /tmp/hello_world.txt
>       archive:
>         type: {"TarGz" | "none" | ... }
>         options:
>           followSymlinks: true

I think this would not be as flexible, since some types will need their own specific options. Also, Kubernetes data structures tend to follow the "one of" convention of type definitions, where defining one of the sub-fields implies the type of the object. My previous example was actually a bad one, because followSymlinks really applies to all modes (it is not part of none). Here are better examples which illustrate my point:

  1. No archiving, follow symlinks
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none: {}
          followSymlinks: true
  2. tar archiving, do not follow symlinks, compress using gz (this is today's behavior).
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          tar:
            compression: gz
          followSymlinks: false
  3. tar archiving, follow symlinks, compress using bz2.
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          tar:
            compression: bz2
          followSymlinks: true
  4. zip archiving, follow symlinks
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          zip: {}
          followSymlinks: true
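The "one of" convention in these examples can be made concrete with a small resolver (illustrative only; the mode keys mirror the YAML above, but the function itself is invented):

```python
def resolve_archive_strategy(archive):
    """Resolve a 'one of' style archive mapping into (mode, compression, follow).

    At most one of the mode keys ('none', 'tar', 'zip') may be set; an
    empty mapping falls back to the tar + gz default. followSymlinks is
    a sibling option that applies to every mode.
    """
    modes = [k for k in ("none", "tar", "zip") if k in archive]
    if len(modes) > 1:
        raise ValueError("only one archive mode may be set, got %s" % modes)
    mode = modes[0] if modes else "tar"
    follow = archive.get("followSymlinks", False)
    if mode == "tar":
        compression = (archive.get("tar") or {}).get("compression", "gz")
        return mode, compression, follow
    return mode, None, follow
```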
