Argo: Allow .tar.gz to be disabled for artifact upload

Created on 8 Mar 2018 · 16 comments · Source: argoproj/argo

From a Slack discussion with @jessesuen: Currently the artifact upload feature will .tar.gz any files. It would be good if this were configurable, so that we can just upload the raw file. This would be most useful when we want to upload a .txt file, some JSON, or similar.

enhancement


All 16 comments

Any suggestions on syntax? I've thought about this before and have previously come up with archive: false flag.

archive: false works. If it's set to true, the file will be compressed; if not, it will be saved with whatever extension it originally had?

archive: false seems reasonable and straightforward enough to me. Probably have to keep default to true to not break backwards compatibility?

Do you foresee supporting different archive formats and/or compression types? If so maybe archivePolicy: TarGz and archivePolicy: None? If not, boolean seems fine.

Does archive: false clearly communicate no archiving AND no compression? Would one ever be performed without the other?

I vote for fine-tuning as @wookasz suggested, because there are so many compression options.
It is reasonable to make this API extensible, but for the very beginning {TarGz|None} is enough for most users. Later, one can add support for 'TarBz2', 'TarZstd', etc.

I agree, a simple boolean value is probably short-sighted. How about making the possible values as just the file extension? In other words, the value "tar.gz" seems preferable over "TarGz".

Initially, the behavior would be:

  • (archive field omitted) - defaults to tar.gz
  • "tar.gz" | ".tgz" - tar.gz archiving
  • "none" - no archiving
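Under this proposal, an output artifact might be declared like the following (a sketch of the suggested string-valued field, not a shipped API):

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive: "none"    # "tar.gz" / ".tgz" for archiving; omit for the tar.gz default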

@jessesuen Agree, the suffix tag is a nice and clean approach.

We have an internal need for disabling archiving. @wanghong230 will implement a fix.

Some special considerations to make: what if the output artifact path is a directory? I think it should be a runtime error if someone attempts to save a directory as an artifact while simultaneously disabling archiving. While technically S3 supports directory-like semantics, another artifact repository may not (e.g. not sure if this is possible with Artifactory). To keep things simple, we can raise an error if we see the combination of a directory + archive: none.
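That validation rule could look roughly like this (a minimal sketch; the helper name and error message are invented, not Argo's actual code):

```python
import os

def validate_unarchived_artifact(path, archive):
    """Raise if a directory is saved as an artifact while archiving is disabled.

    Hypothetical helper illustrating the directory + archive: none rule
    discussed above; real controller-side validation may differ.
    """
    if archive == "none" and os.path.isdir(path):
        raise ValueError(
            "cannot save directory %r with archive: none; "
            "directories require archiving" % path
        )
```

A plain file with archive: none passes; a directory with archiving disabled raises.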

FEATURE REQUEST:
It would be useful to output a complete directory and its contents as an artifact without tarballing it, so the next step(s) could work on only a subset of the files.

Before I saw @DSchmidtDev's comment, I had considered this to be an error scenario for the reasons described above: https://github.com/argoproj/argo/issues/784#issuecomment-382539780.

As I mentioned, it's technically possible in S3 to upload an entire directory as N separate s3 keys (where N is the number of files). We just may not be able to support other repositories easily. It could be that we support unarchived directories in S3 and not Artifactory.

I like the ability to support it if the backend supports it, and fallback to an 'unsupported error' case if not.

> While technically S3 supports directory-like semantics, another artifact repository may not (e.g. not sure if this is possible with artifactory).
>
> As I mentioned, it's technically possible in S3 to upload an entire directory as N separate s3 keys (where N is the number of files). We just may not be able to support other repositories easily. It could be that we support unarchived directories in S3 and not Artifactory.

I definitely have the use case to upload N files in a directory as single artifacts. It would be great to support it as long as the backend supports it.
Maybe there could be a differentiation between a directory with N files and a directory with files and subdirectories? Because only subdirectories would cause issues in other backends, I think.

Working on this now.

One open issue is that S3 doesn't handle empty directories, because S3 has no concept of a directory, just objects with key prefixes that simulate one.

It's been hacked around by supporting the notion of a special zero-content key named /, which indicates an empty directory. More discussion here:
https://github.com/ncw/rclone/issues/753
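The directory-to-keys mapping being discussed can be sketched as pure logic (an illustration of that workaround, not Argo's implementation; the zero-content marker key represents an empty directory, and POSIX path separators are assumed):

```python
import os

def directory_to_s3_keys(root):
    """Map a local directory tree to S3 key names.

    Regular files map to keys backed by their local path; empty
    directories, which S3 cannot represent natively, map to
    zero-content marker keys ending in '/'.
    Sketch only; a real uploader would stream the file bodies.
    """
    keys = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            key = name if rel == "." else "%s/%s" % (rel, name)
            keys[key] = os.path.join(dirpath, name)
        if not dirnames and not filenames and rel != ".":
            keys[rel + "/"] = None  # empty-directory marker key
    return keys
```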

Turns out a simple string to disable archiving is a bit short-sighted. It'll be better to have something like this instead:

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none: {}

The reason for having a structure is that when disabling archiving, I discovered we may eventually need to give some options for how to handle certain situations. For example, artifact repos like S3 cannot store symlinks, because S3 has no concept of one. So when saving a directory containing symlinks to S3, we have to either ignore a symlink or follow it (potentially resulting in redundant copies in S3). In the future, this could be controlled like:

      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none:
            followSymlinks: true

> The reason for having a structure is that when disabling archiving, I discovered we may eventually need to give some options for how to handle certain situations.

@jessesuen I think using a structure is a good approach and allows to enhance the feature if necessary without breaking changes.
How are symlinks handled in the current approach?

To pick up the question from @wookasz:

> Do you foresee supporting different archive formats and/or compression types? If so maybe archivePolicy: TarGz and archivePolicy: None? If not, boolean seems fine.

Would it be an option to structure it like this?

- name: hello-txt
  path: /tmp/hello_world.txt
  archive:
    type: {"TarGz" | "none" | ... }
    options:
      followSymlinks: true

> How are symlinks handled in the current approach?

Currently, symlinks are not followed, since the implementation uses docker cp, which does not follow symlinks. Therefore, the symlinks remain as files with the symlink file mode on, and remain as symlinks when extracted.

In this initial release of the disable-archiving feature, the default behavior will again be not to follow symlinks. So in the case of S3, symlinks will simply not be uploaded.

Note that docker cp does have the option --follow-link (-L), so in the future we could use this flag to follow links so that they get uploaded to S3. But I would prefer that the enhancement to follow symlinks be a separate effort.
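Python's tarfile module shows the same preserve-vs-follow trade-off in miniature (a standalone sketch, unrelated to the docker cp implementation; tarfile's dereference flag plays the role of a followSymlinks option):

```python
import os
import tarfile

def archive_dir(src, out_path, follow_symlinks=False):
    """Create a tar.gz of src, optionally dereferencing symlinks.

    follow_symlinks=False stores links as symlink entries (analogous to
    the current behavior); True copies in the content of their targets.
    """
    with tarfile.open(out_path, "w:gz", dereference=follow_symlinks) as tf:
        tf.add(src, arcname=os.path.basename(src))
```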

> Would it be an option to structure it like this?
>
>     - name: hello-txt
>       path: /tmp/hello_world.txt
>       archive:
>         type: {"TarGz" | "none" | ... }
>         options:
>           followSymlinks: true

I think this would not be as flexible, since some types will need their own specific options. Also, Kubernetes data structures tend to follow the "one of" convention of type definitions, where defining one of the sub-fields implies the type of the object. My previous example was actually a bad one, because followSymlinks really applies to all modes (it is not part of none). Here are better examples which illustrate my point:

  1. No archiving, follow symlinks
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          none: {}
          followSymlinks: true
  2. tar archiving, do not follow symlinks, compress using gz (this is today's behavior).
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          tar:
            compression: gz
          followSymlinks: false
  3. tar archiving, follow symlinks, compress using bz2.
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          tar:
            compression: bz2
          followSymlinks: true
  4. zip archiving, follow symlinks
      - name: hello-txt
        path: /tmp/hello_world.txt
        archive:
          zip: {}
          followSymlinks: true
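The "one of" convention in these examples can be made concrete with a small resolver (illustrative only; the mode keys mirror the YAML above, but the function itself is invented):

```python
def resolve_archive_strategy(archive):
    """Resolve a 'one of' style archive mapping into (mode, compression, follow).

    At most one of the mode keys ('none', 'tar', 'zip') may be set; an
    empty mapping falls back to the tar + gz default. followSymlinks is
    a sibling option that applies to every mode.
    """
    modes = [k for k in ("none", "tar", "zip") if k in archive]
    if len(modes) > 1:
        raise ValueError("only one archive mode may be set, got %s" % modes)
    mode = modes[0] if modes else "tar"
    follow = archive.get("followSymlinks", False)
    if mode == "tar":
        compression = (archive.get("tar") or {}).get("compression", "gz")
        return mode, compression, follow
    return mode, None, follow
```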
