Having tens of PyPI packages that are kind of united but not really makes it very difficult to package them in distros. Please fix it (also because, due to how Python imports work, there is no real advantage in having a huge number of small packages).
Hi @Fale
Thanks for the feedback, but we do not plan to merge all packages into one.
Each endpoint is managed by a different service team, and the teams do not share the same deadlines. For instance, just because the Batch team made a breaking change doesn't mean it's worth updating a big package that also includes Compute and Resources. Actually, most of our users use 2 or 3 packages (like Compute+Network) and don't want to install every service on their machine (size on disk + time to install).
In addition, this is a consistent experience across languages.
We are already working with Debian to provide one "azure" package with a frozen state of several packages, and we're happy so far with that solution. We will update the "azure" meta-package in the near future, so you'll be able to install one package with one command if you want, while still giving other people the opportunity to choose precisely what they want.
@lmazuel thanks for the answer. Debian is not the only distro out there. I'm trying to understand whether the Debian approach is feasible for us as well, but from what I see, they have packaged only a minimal part of the Azure Python code. Also, I noticed that OpenSUSE has the same problem (BUG #525)
Also, the argument that this approach is consistent with the other languages is pointless, since it is not consistent with Python (and guess what, the users of the python-sdk do not care about the other SDKs, while they do care about consistency with the rest of Python)
+1 for Fale's issue
The code base is more like an SDK of individual SDK(s) rather than a single SDK with many classes? There is a lot of boilerplate code caused by the mass of pypi modules. For example, each pypi module has a setup.py (similar to project files in Visual Studio).
Thanks @CalvinHartwell for your comment :)
@Fale I agree that the cross language point is not interesting, let's forget I said that :)
I cited Debian as an example because it's recent work for us, but I know there are more distros: RHEL, CentOS, Suse, ArchLinux etc. (Mandriva when I was young...). Please be sure I'm keeping an open mind here and I'm very interested in the conversation (I really am).
What I don't get is why you think a meta-package "azure" that installs some other packages is a problem. Meta-packages are common in the Linux world; they're the whole point of the dependency system. In this situation, you have (for instance, with random version numbers) a package "azure" 2.4.2 that will install "azure-mgmt-resource" 0.45.0 and "azure-mgmt-compute" 0.65.0 at the same time. So you apt-get/yum install azure and get several Python packages at once. What's the issue?
Currently, the "azure" meta-package is not accurate because some core libraries are still in preview. But the final plan is to make sure that "azure" will install every stable package at the same time, with fixed versions. "Preview" services will be available for testing as separate packages if you want.
Please note that we also have people that are happy to install exactly what they want (only the Azure services they need).
Thoughts?
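To make the meta-package idea above concrete, here is a minimal sketch of how one "azure" version could pin exact versions of the service packages. The version numbers are the made-up ones from the comment above, and the helper name `install_requires` is only illustrative; this is not the actual setup.py of the azure package.

```python
# Hypothetical pins, reusing the random version numbers quoted in the thread.
META_VERSION = "2.4.2"
PINNED = {
    "azure-mgmt-resource": "0.45.0",
    "azure-mgmt-compute": "0.65.0",
}


def install_requires(pins):
    """Turn {name: version} into exact-pin requirement strings."""
    return ["{}=={}".format(name, version) for name, version in sorted(pins.items())]


if __name__ == "__main__":
    # In a real setup.py this list would be passed to
    # setuptools.setup(name="azure", version=META_VERSION, install_requires=...).
    for requirement in install_requires(PINNED):
        print(requirement)
```

Installing the meta-package then pulls in exactly that set, while anyone who wants a single service can still install just that one package.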
The current implementation has 3 problems from my point of view:
Also, the "many libraries" approach is a problem for the packaging effort because in many distros (including Fedora and *EL, for which I'm looking at this SDK) every single PyPI package has to be managed as a separate (rpm) package, which means that to maintain the Azure SDK and Azure CLI I will have to maintain ~50 packages. On the other hand, I'm the maintainer of the AWS Python SDK and AWS CLI tool, and those are only 3 packages (2 for the SDK, 1 for the CLI).
@Fale For the sake of discussion, let's assume I merge everything into one package. Botocore interprets meta-descriptions of the REST API "on the fly"; we use our meta-descriptions to _generate_ Python code, which takes more space on disk. Our biggest package (Web) is currently 1.8 MB, and packages average about 500 KB. With 40 Azure services, we can estimate that the azure package would reach about 20 MB. We plan to support several API versions for compatibility in the near future, which can easily lead to a Python package of 200 MB. Of course it's an estimate, but it's a likely scenario. What do you think?
Edit: Change numbers to more accurate ones
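The estimate above is simple arithmetic, sketched here for reference. The average size and service count are the figures quoted in the comment; the factor of 10 API versions is an assumption chosen only because it reproduces the 200 MB figure, since the thread does not say how many versions are planned.

```python
AVERAGE_PACKAGE_KB = 500  # average package size quoted in the thread
SERVICE_COUNT = 40        # number of Azure services quoted in the thread
API_VERSIONS = 10         # assumption: not stated in the thread


def bundle_size_mb(avg_kb, services, api_versions=1):
    """Rough size of a merged package, using 1 MB = 1000 KB."""
    return avg_kb * services * api_versions / 1000.0


single_version = bundle_size_mb(AVERAGE_PACKAGE_KB, SERVICE_COUNT)
multi_version = bundle_size_mb(AVERAGE_PACKAGE_KB, SERVICE_COUNT, API_VERSIONS)
print(single_version, multi_version)
```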
@Fale Also, why can't you put inside one rpm file several python packages at the same time? Is there some technical limitation somewhere, or is it just non conventional?
Not to be read in a bad way, but your point is that your code is bloated and for this reason it is better to split it? This argument is not very strong, I think...
Putting several Python packages in the same rpm is against the policy in many cases. In this specific case it is also technically impossible, because an RPM package has a single version while all the azure packages have different versions, and using the wrong version is definitely against the policies.
If you have a strong equivalence azure 2.1.2 == azure-mgmt-resource 0.40.0 + azure-mgmt-compute 0.35.0 and so on, why don't you create a python3-azure package 2.1.2? There is no ambiguity, no cheating, and you have a single version to use. It's the approach used by Debian, and I don't see where it's against any policy? Really, I'd like to understand your point, but I'm not seeing the technical problem yet :(
@rjschwei @bear454 @schaefi, would you like to share your point of view for Suse?
@irl, would you like to share your point of view for Debian?
Fedora guidelines force us to package things as much as possible "as the upstream" does.
So, currently I should do the following packages:
Can I create the package python?-azure 2.1.2 that ships all files of the other modules? No
Why?
I can do what you describe only if the release cycle and the version number will be the same for all modules.
So originally we also struggled with the issue of many Python packages vs. one rpm package which is why we opened the other issue and it took us a while to wrap our head around how to approach this. We finally decided to basically follow the Python package strategy.
So what we have today are 2 packages https://build.opensuse.org/project/show/Cloud:Tools?search=azure, python-azure-sdk and python-azure-sdk-storage. As the whole thing gets broken into smaller pieces with different upstream teams managing different code streams I see the argument about the coordinated release problem.
Also note that we decided to pull our sources from GitHub rather than pypi. We did struggle with the way things are pushed to pypi and found pulling from GitHub to be a better approach for us for package creation.
To a certain degree we/I followed a similar approach with the ec2utils we provide in our Enceladus project, https://github.com/SUSE/Enceladus/tree/master/ec2utils, meaning different release cycles for each utility.
Having different Python packages, which for us will eventually translate into different rpms implies that client code, for us azurectl, https://github.com/SUSE/azurectl, can be more precise about dependencies, which is an advantage. We have not yet packaged the new az tools, thus I cannot speak to the effect on that packaging effort and dependency management from that point of view, but I am certain we'll solve that in a reasonable way.
While I share the concern of @Fale regarding package proliferation as well as the multi Python package approach not being "Pythonic", there are equally valid arguments on the other side, meaning to have a sdk-storage package, maybe one for networking etc. I think the argument about being "Pythonic" mostly comes into play after install. Meaning as long as I can
from azure.storage import ....
from azure.networking import ...
as a Python developer I really do not care whether site-packages/azure/storage and site-packages/azure/networking were installed by 2 distro packages or 1. Having multiple packages may be a bit more cumbersome for the developer to set up the system, meaning the developer has to potentially install many packages but that can be easily done with a one liner:
pckgmgr --search python-azure | grep sdk | xargs pckgmgr install
or a meta-package, i.e. we can easily create a python-azure-sdk-all package or create a pattern that pulls all the other packages.
I think it is equally valid to look at each service Azure provides as a separate target and have a separate SDK for that target as it is to look at the API as a whole.
That it was decided at the origination of boto, and now carried into botocore, that all of the AWS API should be in one SDK, one Python package, is just as valid a decision as the decision made here that each service should have its own SDK managed by separate teams.
So long story short, from my perspective either way is fine. If from a development perspective things are managed more easily at Msft to have multiple Python packages we are game to follow that route.
Referencing related issue created for Python CLI which also has separate packages - https://github.com/Azure/azure-cli/issues/1055
From the Debian perspective, I am ignoring PyPI and packaging from the Git repo. The idea that in the future there will be changes to the individual packages which will then be released via PyPI and not git tags breaks this. If the releases of all the core modules (i.e. the ones in this repo) are synchronised it makes everything a lot easier for me, and I guess for other distros too.
Thanks @rjschwei and @irl !
So, in summary, what I understand is that as long as I create checkpoints as tags in the repo, you're good to package the current GitHub state as the "tag" version number. I can publish new packages for a specific service on PyPI when available, but they will be synced as a Linux package only when I release a new version of the "azure" meta-package (with a new associated tag on GitHub).
That seems fair to me. Anyway, I plan to release the "azure" meta-package more often once the core ARM modules (Storage/Compute/Network/Resource) are officially stable.
@Fale are we fine with that plan?
It will be long and painful, but I can make it work with the policies. Thanks
So, in summary, what I understand is as long I create some checkpoint
as tags in the repo, you're good to package the current Github state as
"tag" version number.
yes this makes packaging work much easier. The same code base referencing
that release tag should also exist on pypi. In my projects that happens
automatically see:
https://docs.travis-ci.com/user/deployment/pypi
I can publish on PyPI new packages for a specific
service if available, but it will be sync as a Linux package only when
I release a new version of the "azure" meta-package (with a new
associated tag on Github).
yes and it should be possible to make this an automatic step
Regards,
Marcus
@Fale fwiw - "do what upstream does" doesn't have to mean PyPI as PyPI is a downstream distribution of the upstream, there's no reason not to package up the sources from Git (possibly my Debian-oriented frame of reference). I have one source package, but plan to build one binary package for each of the logical PyPI packages within that.
@lmazuel That's perfect for me (:
@derekbekoe This would work great for the azure-cli package also.
@irl: I think I'll go the github way too. We do this for many situations :). I'll go for one src.rpm and multiple rpms, but even having a single src.rpm package, it will be a fairly complex spec to be able to generate all the various sub packages properly considering files and versions etc
@Fale I will just be shipping everything with the version number of the metapackage. Subcomponents may have differing versions, but they're all part of the larger "unified" release.
@irl How do you manage packages dependencies that depends on the subcomponents? ie: https://github.com/Azure/azure-cli/blob/master/src/azure-cli-core/setup.py#L49
@Fale If there's a tagged unified release in git and the dependencies don't line up, then Microsoft has done a terrible job at release management. I don't anticipate this happening often.
@irl Microsoft point is that they want to have different version numbers to be able to have different development cycles for the various parts of the codebase, so I anticipate this happening often going forward
@Fale yes, between releases it may be broken and not all lined up, but the metapackage needs to be released with everything lined up otherwise it will never be installable, so the idea would be to package in distributions when, and only when, the metapackage sees a release and the git repo is tagged.
Also:
@Fale not through the package management system they can't, and it's not a bug in your system if they've done something to break it. I fully anticipate other packages depending on the sdk, I have vagrant-azure in Debian depending on the Ruby SDK, and I'm quite happy to continue supporting this. I've had to patch the crap out of it to get it to work with the latest SDK, but as a distribution packager I expect to have to do some work occasionally.
From what I can see, Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities. This situation is no different from any other situation where you have a library and it has dependencies, some external. As a distribution packager, you should be performing QA to catch these problems and working with upstream to find resolutions, or patching locally within your distribution to ensure all your packages line up.
A couple of points and then I'll stop with this since we are going OT:
Microsoft are new to this and I'd rather compromise and accept a little extra work than make demands that they change their entire project management workflow and put them off engaging in open source communities.
- Since Microsoft is new, it is even more important to discuss with them "how open source works", so that they can understand it before making mistakes
- I can compromise and accept extra work, I will not compromise and accept to break Fedora dependencies
@Fale I think we're aggressively agreeing with each other perhaps. The important thing is that there are releases of the metapackage that have all the dependencies working together nicely.
To summarise my view:
There is a great article on the topic of packages vs. pip here: https://notes.pault.ag/debian-python/
Thank you for your contributions, it really matters :). I agree with latest @irl comment.
In full transparency, the next possible breaking change might be for Azure Stack support in the SDK in a few months. I will bump the major version of every package at the same time, and keep it in a parallel preview branch as long as I'm not sure that Ansible and the other folks I'm in contact with are ready (FYI I wrote the Ansible plugin with a workmate at MS, I really care about making it work). We don't plan a major change, but the way we create the client and authenticate against Azure might change enough to justify a version bump. For now it's subtle enough that we can provide one line at the start (like Python 2/3 compat) to support both versions at the same time. We'll try to keep it that way.
Do not hesitate to contact me by Github or direct email (
I was trying to package this repo as suggested by @irl and @rjschwei and I noticed a problem with a circular dependency with this approach:
azure (aka this repo) depends on azure-storage which depends on azure-common (aka this repo).
How have Debian and Suse managed to make this work?
In Debian, we have an unstable distribution where we can have things that have broken dependencies. In practice, building python-azure (as we call this repo) does not depend on python-azure-storage, only running it. We can upload python-azure and python-azure-storage and our tools will automatically move these to the testing distribution (currently stretch) when all the dependencies line up.
You can see our spec file here:
https://build.opensuse.org/package/view_file/Cloud:Tools/python-azure-sdk/python-azure-sdk.spec?expand=1
and as mentioned we do have a separate package for storage, the specfile is here:
https://build.opensuse.org/package/view_file/Cloud:Tools/python-azure-sdk-storage/python-azure-sdk-storage.spec?expand=1
@irl thanks
@rjschwei do you have some kind of "auto-generating" dependencies? Because I don't see any dependency declaration to the other package on those two spec files
@rjschwei do you have some kind of "auto-generating" dependencies?
Because I don't see any dependency declaration to the other package on
those two spec files
the azure storage api does not depend on the azure service management api.
Thus python-azure-sdk and python-azure-sdk-storage do not have a dependency
on each other. Our python-azurectl utility however depends on a specific
min. version of the storage and servicemanagement api in order to work
correctly with the sdk features azurectl uses
Back in the day, we (the Python team) made the first azure-storage package, so it was in this repo. When it became obvious that we had some performance issues and that storage drove a lot of specific questions, the storage team took ownership directly to improve it drastically and be more responsive. This implies that the "azure-storage" package is close to our package structure and uses our meta-package, but is in another repo (unlike DocDB for instance).
I didn't realize that this could be a problem here :(. It seems to me that azure-storage should be in the Linux package "as if it were in the azure-sdk-for-python repo". This solves the dependency problem, but I agree it makes the situation a little more complicated :(.
It's the only exception, for historical reasons. We do not plan to include in the azure meta-package any more data libraries that live outside this repo, like DocDB. We are working on azure-keyvault, but this will be in this repo as well.
@schaefi it seems like azure-sdk-storage depends on azure-common (https://github.com/Azure/azure-storage-python/blob/master/setup.py#L68) which is in this repo.
@lmazuel the majority of distributions do not accept multiple git repos as the source of a single source package, so azure-storage cannot be provided by the same source package as the python-sdk. As @irl suggested, it would be possible to build both packages and then push them together to stable, but this would mean that no tests can be performed during the build, since at build time you cannot install the other package (it would depend on the package you are building at the moment), and for Fedora tests should be performed at build time (it's not mandatory but it's highly suggested). Also, the "push together" approach could cause other problems during the life of the package.
Ok, so I guess the best approach is to leave azure-storage in its own package. azure-storage needs azure, but we can cut the opposite link (azure does not really need azure-storage, no code is using it in any package). This avoids a circular dependency for you.
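A minimal sketch of why cutting that link helps, assuming the dependency picture described in this thread: "azure" and "azure-common" are built from this repo, "azure-storage" from a separate one. The repo labels and helper names are illustrative, not part of any packaging tool.

```python
# Which source repository ships each PyPI package (as described in the thread).
SOURCE_OF = {
    "azure": "azure-sdk-for-python",
    "azure-common": "azure-sdk-for-python",
    "azure-storage": "azure-storage-python",
}


def source_edges(package_deps):
    """Collapse package-level dependencies into edges between source repos."""
    edges = set()
    for package, deps in package_deps.items():
        for dep in deps:
            src, dst = SOURCE_OF[package], SOURCE_OF[dep]
            if src != dst:
                edges.add((src, dst))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edge pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


current = {"azure": ["azure-storage"], "azure-storage": ["azure-common"], "azure-common": []}
proposed = {"azure": [], "azure-storage": ["azure-common"], "azure-common": []}
print(has_cycle(source_edges(current)), has_cycle(source_edges(proposed)))
```

With the "azure" to "azure-storage" edge removed, the storage source package depends on this repo's source package and nothing points back, so the two can be built in order.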
I think in the long-term the dependency of azure-storage on azure-common will disappear. azure-common is used only for legacy code now in this repository. This will simplify the situation.
On Mon, 2016-10-17 at 09:45 -0700, Laurent Mazuel wrote:
I think in the long-term the dependency of azure-storage on azure-
common will disappear. azure-common is used only for legacy code now
in this repository.
Is "legacy code" here used in the generic way, or are you specifically
referring to ASM ?
@lmazuel the dependency dropping would solve this circularity problem :)
@bear454 I mean "no ARM", so ASM + azure-servicebus.
All common libraries for ARM are in msrest/msrestazure packages.
Is there a chance we could at least see another bundled release? I'm trying to maintain packages for azure-cli on Arch but I've run into a problem where the current 2.0.0rc6 release is too old and results in module errors and the git builds are too new resulting in a different set of errors.
I've tried building each module in this repo separately but there's a ridiculous amount of them and several of them seem to be unable to install independently as needed for Arch packaging.
I can see the argument for this because of independent release cycles but as a user I don't care. I need something that just works. The easiest solution, I think, would be to package major releases every so often that contain stable versions of each module.
Hi @optlink
Yes, the rc7 is planned. The problem is still that for a meta-package to be tagged as "stable", I need all sub-dependencies as stable. And it's not the case currently.
However, if you package the CLI, note that the CLI follows the same model: it is split into services and sub-packages. For instance, this is the Network implementation of the CLI, and this package directly follows a specific version of Network (this 2.0.1 is linked to 0.30.0). So even if I release an rc7, I can't guarantee that all packages will match all the sub-dependencies of the CLI. And even if I guarantee it today, this can change tomorrow with an update of Network or something else.
I'm not sure I understand clearly your constraints (I'm a Ubuntu user, I just know Arch by name sorry :-( ), but send me an email at MS (\ FYI @derekbekoe @johanste
On 03/30/2017 11:03 AM, Kelsey Maes wrote:
Is there a chance we could at least see another bundled release? I'm
trying to maintain packages for azure-cli on Arch but I've run into a
problem where the current 2.0.0rc6 release is too old and results in
module errors and the git builds are too new resulting in a different
set of errors.
I've tried building each module in this repo separately but there's a
ridiculous amount of them and several of them seem to be unable to
install independently as needed for Arch packaging.
I can see the argument for this because of independent release cycles
but as a user I don't care. I need something that just works. The
easiest solution, I think, would be to package major releases every so
often that contain stable versions of each module.
For what it's worth. So far we have also created a bundled package for
openSUSE and SUSE Linux Enterprise. However, we are starting with
packaging the az tools (azure-cli) and it is also split into many
pieces. Those in turn depend on the pieces of the SDK rather than the
SDK as a whole. Thus as a package you end up having to either maintain a
large number of packages or a large number of directives for provides in
order to sort out the proper version dependencies. We are going down the
road of many packages.
Thank you @rjschwei for your message. We will still continue to release one package per service, but do you have a suggestion of zip/tar.gz/package/tools or something that we can do to simplify your process?
On 04/03/2017 12:19 PM, Laurent Mazuel wrote:
Thank you @rjschwei https://github.com/rjschwei for your message. We
will still continue to release one package per service, but do you have
a suggestion of zip/tar.gz/package/tools or something that we can do to
simplify your process?
I don't think there is anything else to do. The only simplification
would be to release everything as one, but we've already had that
discussion, and then have the cli package depend on that SDK version.
Anyway, since the cli gets released as Python packages based on service
components that in turn depend on Python packages that are released per
service component, that is really the model we have to follow on the
packaging level.
Hi!
I have recently picked up the task to package the Azure SDK in openSUSE. For openSUSE, the current plan is to use the packages from the PyPi repository. However, while working through the various azure-mgmt-* packages I noticed that many packages are either outdated on PyPi ( those are commerce, compute, network, powerbiembedded, resource, servicebus and storage from the mgmt packages and the meta packages azure-mgmt and azure-nspkg) or are missing the __init__.py files so that setuptools fails to install them properly (those are eventhub, media, network, resource, search, servermanager, servicebus and storage).
On the other hand, I could also use the tarballs generated by the git tags in the github repository. However, it's not clear to me which of these tags should be used when packaging the whole SDK while ensuring all modules are compatible with each other. If I read the discussion correctly, some of the modules can be too new so that they won't work with certain other modules anymore and can only be used individually. And if one wants to deploy the whole SDK, all modules must have the version belonging to a particular version of the whole SDK.
So, my question now is: How do I get release tarballs with the proper versions for each module so that I get a complete and working SDK in the end? PyPi is currently apparently not the best source for the aforementioned reasons, and neither are the releases created through the git tags on github.
Thanks!
I just figured out that the SDK releases are available as single tarballs generated from the git tags, they follow this pattern:
https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip
e.g.:
https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz
So, I suggest just pulling the tarball from there and using this as a base for the packaging.
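As a rough sketch of the suggestion above, the archive URL can be derived from a tag name. The URL layout is taken from the example link in this comment; the `BUNDLE_TAG` pattern and both function names are assumptions for illustration, not something GitHub or the SDK defines.

```python
import re

ARCHIVE_BASE = "https://github.com/Azure/azure-sdk-for-python/archive"

# Assumed shape of a bundle tag such as "v2.0.0rc6".
BUNDLE_TAG = re.compile(r"^v(\d[0-9A-Za-z.]*)$")


def archive_url(tag, ext="tar.gz"):
    """Build the source archive URL for a git tag."""
    return "{}/{}.{}".format(ARCHIVE_BASE, tag, ext)


def bundle_version(tag):
    """Return the SDK version for a bundle tag, or None for any other tag."""
    match = BUNDLE_TAG.match(tag)
    return match.group(1) if match else None


print(archive_url("v2.0.0rc6"))
print(bundle_version("v2.0.0rc6"))
```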
Hi,
On 05/10/2017 05:02 AM, John Paul Adrian Glaubitz wrote:
I just figured out that the SDK releases are available as single
tarballs generated from the git tags, they follow this pattern:
https://github.com/Azure/azure-sdk-for-python/archive/v?(\d.*)[A-Z,a-z,0-9]*\.zip
e.g.:
https://github.com/Azure/azure-sdk-for-python/archive/v2.0.0rc6.tar.gz
So, I suggest just pulling the tarball from there and using this as a
base for the packaging.
That doesn't work because the azure-cli releases depend on the
individual components of the SDK, not on the SDK as a whole.
So as packager there are two choices:
a.) Create 1 package for SDK, as we pretty much do in openSUSE right now
and then have a very long list of Provides: statements where each
Provides lists a component. This list is going to be a PITA to maintain
and will inevitably be wrong and cause headaches
b.) Package each individual component of the SDK, the approach we are
now taking.
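For option a.), the list of Provides: statements would at least not need to be written by hand. The sketch below generates it from a mapping of component versions; the versions shown and the "python-" name prefix are hypothetical, and the real naming would follow each distribution's guidelines.

```python
# Hypothetical component versions bundled in one SDK release.
COMPONENTS = {
    "azure-mgmt-resource": "0.45.0",
    "azure-mgmt-compute": "0.65.0",
}


def provides_lines(components, prefix="python-"):
    """Render one RPM spec 'Provides:' line per bundled component."""
    return [
        "Provides: {}{} = {}".format(prefix, name, version)
        for name, version in sorted(components.items())
    ]


for line in provides_lines(COMPONENTS):
    print(line)
```

Even when generated, the mapping still has to be kept in sync with every upstream component release, which is the maintenance burden described above.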
@derekbekoe, do you have any suggestions?
Hi @glaubitz, sorry I didn't answer earlier, it was busy with //build/ this week and PyCon next, and I wanted to take the time to answer you correctly. I just want you to be sure I don't ignore you, I'll be back with my full brain soon :)
That doesn't work because the azure-cli releases depend on the individual components of the SDK, not on the SDK as a whole.
Sorry, I wasn't clear enough then. I was not talking about creating a single RPM package, but to use the github tarball as a single source. Not because I particularly prefer github over PyPi but rather because the packages on PyPi are either outdated or broken.
a.) Create 1 package for SDK, as we pretty much do in openSUSE right now and then have a very long list of Provides: statements where each Provides lists a component. This list is going to be a PITA to maintain and will inevitably be wrong and cause headaches
I agree and that's definitely not what I want. However, having to pull every
b.) Package each individual component of the SDK, the approach we are now taking.
That's definitely what I want to do. However, my problem currently is that I don't know for sure which set of packages I should use.
Should I:
a) Use the v2.0.0rc6.tar.gz as the source for all base packages and create RPMs from that? I have written a small script which creates the individual .zip files for all individual packages. Then complement these RPMs packages with the remaining packages from our list, just using the latest available release version for each package.
or
b) Just use the latest tarball available in the github "Releases" tab, unpack that archive and generate the individual .zip files from there? For example, downloading https://github.com/Azure/azure-sdk-for-python/archive/azure-keyvault_0.3.3.tar.gz, unpacking it and creating the individual .zip files using that archive.
The reason I ask is that each of these tarballs always contains the complete SDK and not just azure-keyvault, for example. Thus, when I download and use the tarball azure-keyvault_0.3.3.tar.gz, can I still assemble a working SDK from that, or does that only work with the v2.0.0rc6.tar.gz tarball, as it has been tagged as a release of the whole SDK?
It's just confusing that the individual packages and the complete SDK show up on the same "Releases" tab. A release normally indicates something that is stable - or at least beta - that users can download and use. That's why tagging releases for the individual packages while still containing the complete SDK is confusing as hell.
Adrian
To elaborate a little more: I just ran my small script over the unpacked azure-keyvault_0.3.3.tar.gz and it created azure-2.0.0rc7.zip among others, so the resulting SDK I got is something between rc6 and rc7 (since rc7 has not been officially tagged yet).
@glaubitz I'm not really sure that managing tens of highly coupled packages is anywhere easier.... and that's why I opened this ticket (aka: I think it is not possible to manage this project in a sensible way, and this is why - given the IMHO unsatisfactory answers - I'm not packaging this for Fedora/EL)
@Fale But you can just download the tarballs from the github releases page and you get all modules in a single tarball. In fact, that's what @irl is doing for Debian and since the Debian version is currently at v2.0.0rc6, it has less modules than are currently visible in the github repository.
I generally don't have a problem juggling with a large number of sources - it's just a matter of good packaging tools after all - I'm just confused as to which versions to use for a stable distribution.
Hi @glaubitz
I understand it's complicated, really, :/. Github is not really built to host several packages in one repo. Here are some answers:
Let's be pragmatic on what you want (before talking about how to do it): do you want to release one package like Debian like python-azure 2.0.0rc6? Or separate packages for each components? As @rjschwei was saying, the CLI is using each component package independently, so we might have an issue with that. Let's say we can sync azure-cli and azure-sdk bundle package, do you want to:
Once I get what you want the user experience to be, we will figure out the "how".
Hi @lmazuel
I understand it's complicated, really, :/. Github is not really built to host several packages in one repo.
You could put each package into a separate git repository and then use git submodules to references these modules in the git repository for the whole package. Lots of projects actually do this when they use third-party libraries like _ffmpeg_.
Tag are on purpose "<package_name>_<version>" and are made just for the specific package mentioned in the tag. I don't recommend for instance to use tag "azure-keyvault_0.3.3" to install "azure-mgmt-compute"
Ok, so this means that despite azure-keyvault_0.3.3 containing the whole SDK, I should just always assume the remaining packages are effectively git snapshots and should not be used for anything but development. Thus, when I download azure-keyvault_0.3.3, the azure-mgmt-compute package inside this tarball is probably version 1.0.0rc1 plus some extra commits and shouldn't be used for production.
Thus, anyone wanting to use the releases from github really needs to download the tarball separately for each tagged package version.
Tag like "v2.0.0rc6" are also intended to be accurate for "azure 2.0.0rc6" only, even if I'm pretty sure the state of the repo at this state was correct, according to the content of v2.0.0rc6
Isn't azure supposed to be the primary meta package which allows installing the whole SDK in one step? I'm not sure what the point of tagging a version number for the whole SDK would be if the generated tarball doesn't produce something that works.
Packages on PyPI are not outdated, I'm surprised you got issues? About the issues you found, could you send me a more detailed email at
@microsoft.com?
Sure. Will do that once I have finished writing this message ;).
Let's be pragmatic on what you want (before talking about how to do it): do you want to release one package as Debian does, like python-azure 2.0.0rc6? Or separate packages for each component?
I want to release separate components. But I also want that these components work with each other, at least that's what users are going to expect. If they use the package manager to install azure, they expect to get the SDK installed ready to be used without having to replace individual components.
For me as the packager, it doesn't really matter whether the whole SDK is released in one tarball or as individual packages. I am writing some simple scripts that will help me deal with the upstream format to generate the RPM packages. What matters is that I know which versions I have to use to be able to assemble something that is going to work in the end on the users side.
For example, if you have released any of the packages in a version which breaks compatibility with most of the other packages, I will naturally not use the latest version of that particular package. I will use the version which is still compatible with the rest and only update once all the other packages have made the transition upstream.
As @rjschwei was saying, the CLI is using each component package independently, so we might have an issue with that. Let's say we can sync azure-cli and azure-sdk bundle package, do you want to:
Release python-azure x.y.x with a lot of packages
Yes, that's what I want. But again, creating a single package out of individual packages or vice versa is not the actual problem. The problem I have is that I don't know which versions are compatible with each other to form a complete, working SDK.
Once I get what you want the user experience to be, we will figure out the "how".
So, here's what I suggest:
If I understand correctly, all the various packages are developed separately. So, these packages should naturally end up in separate git repositories. Then use git submodules to link the packages in the main git repository of the Azure SDK. git submodules allow linking specific git commit versions of another repository. Thus, you are able to assemble the SDK from specific versions that are known to work together and you always have something releasable.
If users want to use individual packages, they'll download the tagged tarball from the corresponding package's repository. If they want the whole SDK, they just download the latest tagged version as a tarball.
@lmazuel
Ideally we'd have 1 upstream tarball for each, the SDK and the CLI such that we can create
python-azure-sdk-x.y.z and azure-cli.a.b.c packages with azure-cli.a.b.c depending on python-azure-sdk-x.y.z
That's how the other guys do it ;) aws-cli has only a few dependencies with python-botocore being the equivalent to azure-sdk as the primary dependency.
Anyway, I understand, as does probably everyone else interested in this topic, that there are tradeoffs either way and going with a development model of individual components is just as valid a choice as going with a development model that keeps everything together. However, with the chosen model of many components, people downstream (packagers or direct users) still need to have some moment in time every now and then where all the pieces fit together. Based on the findings of @glaubitz, this point in time is incredibly difficult to determine.
So somehow a mechanism should exist that allows us to pull what would be considered a consistent SDK. If the answer to that is "whatever is on pypi" then that's OK, and maybe we just have to clean up a few things that @glaubitz ran across on pypi and then we are good to go.
@glaubitz @rjschwei
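About SDK consistency:
- Packages that depend on msrestazure must have msrestazure ">= 0.4". This is the only condition, meaning you can install azure-mgmt-resource 0.30.0rc6 and azure-mgmt-compute 1.0.0rc2 together with no issue. It's consistent in terms of installation, it's just weird in terms of features.
- Packages that do not depend on msrestazure (I think there are only three: azure-servicebus, azure-servicemanagement-legacy and azure-storage) are independent and consistent from version 0.20.0.

Also, the source code truth is the sdist on PyPI. It's easy to get with XMLRPC, example for azure-keyvault 0.3.3: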
import xmlrpc.client
client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
[pkg['url'] for pkg in client.release_urls('azure-keyvault', '0.3.3') if pkg['python_version']=='source'][0]
gives
https://pypi.python.org/packages/82/8b/9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/azure-keyvault-0.3.3.zip
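For packaging scripts, the filtering step of the snippet above can be pulled into a small helper that is testable without network access. This is only a sketch: it assumes the entries look like the dicts returned by `client.release_urls(...)` above, the helper name `sdist_url` is made up, and the wheel entry in the sample data is hypothetical.

```python
def sdist_url(release_files):
    """Return the URL of the source distribution among release file entries.

    `release_files` is assumed to be a list of dicts shaped like the result
    of `client.release_urls(name, version)`, each with at least the keys
    'python_version' and 'url'. Returns None when no sdist is listed.
    """
    for entry in release_files:
        if entry.get("python_version") == "source":
            return entry["url"]
    return None


# Sample data: the wheel entry is hypothetical, the sdist URL is the one
# quoted in this thread for azure-keyvault 0.3.3.
sample = [
    {
        "python_version": "py2.py3",
        "url": "https://example.invalid/azure_keyvault-0.3.3-py2.py3-none-any.whl",
    },
    {
        "python_version": "source",
        "url": "https://pypi.python.org/packages/82/8b/"
               "9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/"
               "azure-keyvault-0.3.3.zip",
    },
]

print(sdist_url(sample))
```

Returning `None` instead of indexing with `[0]` lets a packaging script report a missing sdist cleanly rather than fail with an `IndexError`.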
I'll discuss it with the CLI team today, I'll see if we can sync our releases (for instance every 6 months). I want to release a 2.0.0, and I will try to use the exact same packages as CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azure-python-cli 2.0.6 as a whole as well, depending on azure-python-sdk 2.0.0
Thoughts?
FYI @johanste
On Mon, May 15, 2017 at 10:44:03AM -0700, Laurent Mazuel wrote:
About SDK consistency:
- Packages that depend on msrestazure must have msrestazure ">= 0.4". This is the only condition, meaning you can install azure-mgmt-resource 0.30.0rc6 and azure-mgmt-compute 1.0.0rc2 together with no issue. It's consistent in terms of installation, it's just weird in terms of features.
- Packages that do not depend on msrestazure (I think there are only three: azure-servicebus, azure-servicemanagement-legacy and azure-storage) are independent and consistent from version 0.20.0.
Thanks. This answers my question.
Also, the source code truth is the sdist on PyPI. It's easy to get with XMLRPC, example for azure-keyvault 0.3.3:
import xmlrpc.client
client = xmlrpc.client.ServerProxy("https://pypi.python.org/pypi")
[pkg['url'] for pkg in client.release_urls('azure-keyvault', '0.3.3') if pkg['python_version']=='source'][0]
gives
https://pypi.python.org/packages/82/8b/9761cf4a00d9a9bdaf58507f21fce6ea5ea13236165afc0a0c19a74ac497/azure-keyvault-0.3.3.zip
Aha, I wasn't aware of that. Thanks for the heads-up!
I'll discuss it with the CLI team today, I'll see if we can sync our releases (for instance every 6 months). I want to release a 2.0.0, and I will try to use the exact same packages as CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azure-python-cli 2.0.6 as a whole as well, depending on azure-python-sdk 2.0.0
Thoughts?
We wanted to have separate packages in SUSE anyway, so that isn't
important. I really just wanted to know whether the version
dependencies are critical.
Thanks,
Adrian
On 05/15/2017 01:44 PM, Laurent Mazuel wrote:
@glaubitz @rjschwei
I'll discuss it with the CLI team today, I'll see if we can sync our releases (for instance every 6 months). I want to release a 2.0.0, and I will try to use the exact same packages as CLI 2.0.6. This way you can package azure-python-sdk 2.0.0 as a whole, and package azure-python-cli 2.0.6 as a whole as well, depending on azure-python-sdk 2.0.0
Thoughts?
That would be great but would require significant changes in the setup
of the CLI, i.e. within the components of the CLI the dependencies in
setup.py could no longer refer to the individual components of the SDK.
That's a bunch of work that will probably not fit the development model.
@rjschwei I'm not sure I get your issue? When you install a distro package like python-azure-sdk, I think you install the necessary "dist-info" folders, so pip is not able to tell the difference between a yum installation and a pip installation, correct? So here, if your python-azure-cli depends on python-azure-sdk, and I took care to keep them in sync, this should on the contrary make your life easier?
@irl what do you think about that? Because if trying to sync SDK and CLI bundles makes no sense, I have no reason to do it.
@lmazuel , sorry for falling off the face of the planet for a bit and creating a large time gap in the discussion.
You are correct that the installed rpm package will also leave behind the Python information to satisfy installing the CLI bits. Thus if SDK and CLI releases can be synced such that cli-a.b.c depends on sdk-x.y.z, then we could go to a one-package model and we'd basically have 1 dependency in the CLI package.
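As a rough illustration of that "Python information": the standard library can look up an installed distribution's version from its metadata directory, regardless of which tool put the files there. This is a minimal sketch assuming Python 3.8 or later; the helper name `installed_version` and the distribution name in the usage line are only examples.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    The lookup reads the distribution's metadata (for example a
    "*.dist-info" directory), which is what a distro package needs to ship
    for Python tooling to see the dependency as satisfied.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Example lookup; prints None when the distribution is not installed.
print(installed_version("azure-keyvault"))
```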
However, my concern with this approach would be that, to the best of my knowledge, no tools exist today to ensure this consistency. Of course such tools can be created, but in a sense these tools would counteract the development separation that at this time has been instituted in the SDK and CLI projects.
So if you'd go through the effort to sync everything, which would really be nice for packagers, I think the development model would have to change to a certain degree.
Getting everything in sync would basically mean collecting all the components and verifying that their dependencies are consistent within each of the SDK and the CLI, and consistent across the boundary. Creating a tool that ensures such consistency should be reasonably straightforward, but it still has to be created and maintained.
However, during the "development phase" this consistency is not necessarily given, meaning CLI component A may depend on version X of SDK component H and CLI component B may depend on version Y of SDK component H. This is fine as long as, at the end of the development cycle, both CLI components A and B depend on the same version of SDK component H. This drift makes testing very difficult. Also, when there is a security issue, it will not be a good idea to release the security fix from the development branch, because continuous testing is difficult. The security fix will have to be inserted in two places: the current consistent (synced) code and the development code, with a point release off the previous consistent set. This of course can all be managed, but the point is that developers on two teams will have to work more closely together than it appears was intended when the current development model was chosen.
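To make the drift concrete, here is a minimal sketch of the kind of consistency check meant above. It is not an existing tool: the function name, the input format (a mapping from each component to its pinned dependencies) and the component names are all hypothetical, and a real checker would have to parse the requirements out of each setup.py.

```python
from collections import defaultdict


def find_version_drift(pins):
    """Report dependencies that are pinned to more than one version.

    `pins` maps a component name to a {dependency: pinned_version} dict.
    The result maps each inconsistent dependency to
    {version: [components pinning that version]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for component, deps in pins.items():
        for dep, version in deps.items():
            seen[dep][version].append(component)
    return {
        dep: {version: sorted(users) for version, users in versions.items()}
        for dep, versions in seen.items()
        if len(versions) > 1
    }


# Hypothetical pins: components A and B disagree on SDK component H.
pins = {
    "cli-component-a": {"sdk-component-h": "1.0.0"},
    "cli-component-b": {"sdk-component-h": "1.1.0", "sdk-component-k": "0.3.0"},
    "cli-component-c": {"sdk-component-k": "0.3.0"},
}

drift = find_version_drift(pins)
for dep, versions in drift.items():
    print(dep, versions)
```

An empty result would mark a point in time where the set is consistent and could be released as a whole.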
If we look at the same problem using the many-packages approach we can still get into a similar situation, if SDK component H gets a security fix and the version gets advanced. Now the CLI may be broken. However, because the dependencies are not conglomerated we know exactly which CLI packages need potential updates to accommodate the version bump of SDK component H.
To make a long story short, a sync will make initial packaging easier, but individual packages will make dealing with version bumps due to security issues easier.
One thing that would help tremendously would be if you and the CLI team can commit to semantic versioning http://semver.org/ at all levels and change the dependencies in all setup.py files accordingly, i.e. every dependency should be >= one major version and < the next major version. There should not be any exact version matches enforced. If we can get to that point, managing the plethora of packages will be reasonably straightforward.
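A small sketch of what such a range could look like, reading the suggestion as "at least the tested version, and below the next major version". The helper names are made up, and the parser is deliberately simplified: it only handles plain MAJOR.MINOR.PATCH strings, so pre-release tags such as rc6 and the special rules semver applies to 0.x versions are out of scope.

```python
def parse_version(text):
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH, got %r" % text)
    return tuple(int(part) for part in parts)


def major_range(minimum):
    """Build a specifier allowing `minimum` up to, but not including,
    the next major version, e.g. for an install_requires entry."""
    major, minor, patch = parse_version(minimum)
    return ">={}.{}.{},<{}.0.0".format(major, minor, patch, major + 1)


def in_major_range(candidate, minimum):
    """Check whether `candidate` falls inside the range of major_range()."""
    low = parse_version(minimum)
    high = (low[0] + 1, 0, 0)
    return low <= parse_version(candidate) < high


# Hypothetical requirement line for a setup.py:
print("some-sdk-component" + major_range("1.2.3"))
```

With ranges like this, a security fix released as 1.2.4 is picked up without touching the dependents, while a breaking 2.0.0 is not.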
Closing, in favor of https://github.com/Azure/azure-sdk-for-python/issues/1295