Kubernetes is becoming popular for managing workloads that consume accelerators, for example TensorFlow jobs. The agility that Kubernetes offers makes it easy to consume accelerators across a fleet of machines.
Kubernetes can provide an end-to-end workflow by separating the provisioning and configuration of accelerators from their consumption.
FEATURE_STATUS is used for feature tracking and is to be updated by @kubernetes/feature-reviewers.
FEATURE_STATUS: IN_DEVELOPMENT
cc @kubernetes/sig-node-feature-requests @kubernetes/sig-scheduling-feature-requests
cc @aronchick for priority
s/accelerators/device assignment please? /cc @derekwaynecarr
Regarding "accelerators", does it mean some kind of device, e.g. a GPU (but not limited to GPUs)?
/subscribe
@k82cn yes. Actually, per the SIG meeting yesterday, any PCI device (most tend to be accelerators, but I'd personally prefer more generic wording). Note that Intel has "accelerators" inside its CPUs (called CPU extensions). All of these should become candidates for scheduler matchmaking.
@jeremyeder
My understanding is that 1 does not depend on 2, and 2 can be solved independently of 1. 1 … 2 if it is made available in parallel.
Is the scope limited to accelerators, or does it also cover co-processors like TPMs?
My understanding is that:
- There needs to be a way to discover, represent and consume accelerators as a resource in Kubernetes.
If hardware discovery is a functionality that we are targeting, shouldn't the scope be broadened to all types of devices (including accelerators)?
This issue is not meant to support arbitrary third-party devices, which I believe warrant an issue of their own. Node Feature Discovery attempts to solve the device discovery problem to an extent.
(Reply to ravig's comment of Wed, Mar 1, 2017 at 2:26 PM.)
Can we use the term "hardware accelerators"? I was really confused by this issue at first.
Good proposal! I think topology support for devices is a must. For example, NVIDIA GPUs on different PCI bridges cannot talk peer-to-peer (P2P).
ping @calebamiles to review
One of the critical pieces of this problem, hardware device plugins, landed in v1.8 (https://github.com/kubernetes/features/issues/368).
This feature is broad and requires more work on identifying and defining the compatibility matrix of devices, device plugins, and workloads. This aspect is expected to be handled outside of core Kubernetes, but the specifics are not yet defined. For that reason, I'm leaving this issue open and moving it to v1.9.
@vishh is it still alpha for 1.9?
Also, can you update the feature template to follow the new format? https://github.com/kubernetes/features/blob/master/ISSUE_TEMPLATE.md
It is still alpha for 1.9.
@vishh :wave: Please indicate in the 1.9 feature tracking board whether this feature needs documentation. If yes, please open a PR and add a link to the tracking spreadsheet. Thanks in advance!
@vishh Bump for docs ☝️
/cc @idvoretskyi
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
@vishh
Any plans for this in 1.11?
If so, can you please ensure the feature is up-to-date with the appropriate:
- stage/{alpha,beta,stable}
- sig/*
- kind/feature
cc @idvoretskyi
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close