Envoy: NT support

Created on 7 Oct 2016 · 62 Comments · Source: envoyproxy/envoy

Get Envoy working on NT. This will require an OS shim layer for most/all POSIX operations and all Linux-specific operations, and especially a new hot restart implementation.

enhancement no stalebot

All 62 comments

Are there any plans for adding Windows support? If yes, is there a timeline?

@kavyako no one is working on this. If we can find resources would love to do it. Happy to chat offline if MSFT is interested.

@mattklein123 yeah we'd be interested to see what's possible. Send me your contact info and I'll set up some time to chat. My email is [email protected]. Cheers!

@vturecek sweet. Will send email. FYI all Lyft people working on Envoy are based in Seattle so we can meet in person if needed.

Hey @vturecek, @mattklein123. Has anyone started work on Windows support? We're looking to use Envoy in Cloud Foundry to resolve the issue of Route Integrity, and my team in particular is concerned with how we'll do this on Windows. The Linux proposal involves Envoy, and we'd love to use the same solution on Windows, if possible.

We'd be interested in collaborating in whatever way makes sense. If email communication is preferred, I can be reached at [email protected].

@mhoran I will start a mail thread with MSFT folks to determine status.

Thanks @mattklein123. Just in case there's more interest from others, yes Windows work is underway on our end. @mhoran - I'll follow up with more details over email.

@vturecek Any status from an MS point of view? We are running a mixed cluster (Linux pool with Linux nodes and Windows pool with Windows nodes) and would like holistic support for this environment.

@mhoran Just discovered this fork and think it might work
https://github.com/microsoft/envoy/tree/sridhar/win32_port

but it appears there is still work necessary to get it working with the required build system...
https://github.com/Microsoft/envoy/tree/sridhar/win32_port/win32_build

@sridmad : for more details about the windows port.

@campbelldgunn sorry for the late response - Windows support work is underway in our fork here: https://github.com/microsoft/envoy/

@sridmad has a branch with what I believe is a functional build at this point but I'll defer to him for the status on that: https://github.com/microsoft/envoy/tree/sridhar/win32_port

Great to see work being done towards adding Windows support to Envoy. I am curious how traffic redirection is handled on the Windows platform, i.e. what the Windows equivalent of Linux iptables is.

@vinayatgit we've come up with a proof of concept that leverages Windows proxy settings to force application egress through an Envoy proxy. Our investigation was successful, though we encountered issues with the documented APIs for configuring proxy settings in Windows Server 1803. More details here: https://www.pivotaltracker.com/story/show/158328675.

Of course, this only works for HTTP (and potentially HTTPS) traffic. There's still a question around how to handle TCP traffic, but that's something we can certainly add later.

We have been able to successfully compile envoy on Windows using Bazel. A branch with code changes + build system changes can be found here: https://github.com/greenhouse-org/envoy/tree/windows-linux-build

The necessary C++ changes were mostly pulled over from the work @sridmad has done in the Microsoft fork.

We discussed with @mattklein123 last week what the strategy should be to start getting these changes back upstream. The first step will be to PR just the build changes (i.e. *.bzl, BUILD files etc.) and go from there to get the correct C++ changes in.

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@mattklein123 should we keep this issue open to track porting efforts? If so, is there an appropriate tag?

Maybe simply commenting is sufficient...

@mhoran yeah if you could update w/ current status that would be cool.

Preliminary Bazel build support for Windows has been merged into Envoy master. We're currently working to get lyft/protoc-gen-validate working on Windows, which is a blocker to Envoy support for Windows.

We have been maintaining a branch based off the initial work from the team at Microsoft which builds a working executable via Bazel. However, the produced binary isn't too useful because we must disable all Envoy extensions at the moment due to a Bazel bug.

We'll likely need to get this bug resolved before PR-ing our branch. Also, there's likely a good deal of cleanup that will need to be done to the branch before it's ready to be merged.

We are now hitting the command line length limit when compiling even with all extensions turned off. So it's now become a hard blocker for us to keep our fork up to date (let alone actually making the PR)

We've started porting the unit tests to get them running on Windows on our branch. It's set up in our CI, which should be visible here: https://garden-windows.ci.cf-app.com/teams/main/pipelines/envoy/jobs/envoy-windows/builds/45

While porting the tests, we've noticed 2 real issues, both with libevent. First, libevent does not support edge triggering (EV_ET) on Windows -- it just falls back to level triggering. This results in the onWriteReady() callback constantly firing. This is probably not great from a performance standpoint, but does not appear to do anything actively wrong.

The second is that libevent does not support EV_CLOSED on Windows, and so a connection will never get an early close notification.

Our current blockers are: https://github.com/lyft/protoc-gen-validate/issues/89 (blocking us from getting PGV Windows support into Envoy) and https://github.com/bazelbuild/bazel/issues/5163 (blocking us from building Envoy on Windows)

To work around the second issue, we are currently using a fork of Bazel with a hacky change that shortens the include directories passed to cl.exe

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@mattklein123 At this point, it looks like all the PGV related blockers have been resolved. There is one outstanding PR (https://github.com/envoyproxy/envoy/pull/4556) to bump rules_go and google/protobuf to versions that work on Windows. Once that is resolved, the only differences between our fork and the master branch will be source code changes.

Currently, the list of tests that we have passing can be seen here: https://github.com/greenhouse-org/envoy/blob/6680fd6dfb5fa9dbf0fdc6b2bd14f4a9e018de65/test/BUILD#L42-L74 One question we have is: of the tests that we have not yet gotten to pass, which are the most important? This will help us prioritize which test suites to work on, as well as drive out any crucial behavior that may not work on Windows.

Even though the bazel issue with long Windows command lines is still outstanding, we have a self-built bazel.exe that can work around this issue. As such, we would like to begin the process of getting code changes upstream. Since the diff is pretty massive (https://github.com/greenhouse-org/envoy/commit/6680fd6dfb5fa9dbf0fdc6b2bd14f4a9e018de65), we'd probably want to align on how to break this up / the strategy for isolating the differences between platforms. What would be the best way to get this process started?

One question we have is: of the tests that we have not yet gotten to pass, which are the most important?

I would try to get as many of the core integration tests passing as possible. Those are the most important to verify the Windows build likely works.

What would be the best way to get this process started?

Without digging into the diff it's hard to say. The main thing I would like to avoid is a bunch of #ifdef statements around the code. What type of changes are we mainly talking about? Do we need more abstractions in place or can it generally be solved with bazel-side conditional compilation? In general if we can break up the changes into PRs of < 500 lines each that focus on different components that would be optimal.

We agree that centralizing / removing #ifdef as much as possible should be a goal. I'll try to group them as much as possible below:

The first and most widespread change is that the data type for a socket on Windows is an unsigned integer as opposed to an integer on Linux. Our solution to this has been to add a header with

#if !defined(WIN32)
typedef int SOCKET_FD;
#else
typedef SOCKET SOCKET_FD;
#endif

Then, all the code that handles socket file descriptors is updated to use this type. Furthermore, we added a macro to check if the socket is valid, since comparison to -1 isn't correct on Windows.
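A minimal sketch of what such a header and validity check could look like. The macro and helper names (SOCKET_FD_INVALID, SOCKET_FD_VALID, isUsable, checkedSocket) are hypothetical and are not taken from the branch, and the sketch tests the compiler-defined _WIN32 rather than WIN32:

```cpp
#include <cassert>

#if defined(_WIN32)
#include <winsock2.h>
typedef SOCKET SOCKET_FD;  // unsigned handle type on Windows
// Winsock reports failure with INVALID_SOCKET, not -1.
#define SOCKET_FD_INVALID INVALID_SOCKET
#define SOCKET_FD_VALID(fd) ((fd) != INVALID_SOCKET)
#else
typedef int SOCKET_FD;  // plain file descriptor on POSIX
// POSIX socket() returns -1 on failure; valid descriptors are >= 0.
#define SOCKET_FD_INVALID (-1)
#define SOCKET_FD_VALID(fd) ((fd) >= 0)
#endif

// Hypothetical helper: callers test validity through the macro instead of
// comparing against -1 directly.
inline bool isUsable(SOCKET_FD fd) { return SOCKET_FD_VALID(fd); }

// Hypothetical helper: asserts validity in debug builds and passes the
// handle through unchanged.
inline SOCKET_FD checkedSocket(SOCKET_FD fd) {
  assert(SOCKET_FD_VALID(fd));
  return fd;
}
```

Keeping the comparison inside one macro means call sites never encode the platform's sentinel value themselves, which is the point of the check described above.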

The second chunk of changes involves routing all syscalls through the os_sys_calls. This lets us handle variations in how they are called and return the correct error value (WSAGetLastError() vs errno). One thing we have not yet figured out is how callers should check this value (e.g. WSAEWOULDBLOCK vs. EAGAIN). Right now, it's just handled with #ifdef.
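One possible way to keep that #ifdef out of callers is to translate the native code into a small platform-neutral category once, inside the syscall layer. The sketch below is only an illustration of that idea, not what the branch does; IoError, classifySocketError, shouldRetryLater and the kNative* constants are hypothetical names:

```cpp
#include <cassert>
#include <cerrno>

#if defined(_WIN32)
#include <winsock2.h>
#endif

// Hypothetical platform-neutral categories that callers check instead of raw codes.
enum class IoError { None, Again, Interrupted, Other };

// Reads the last socket error using the platform's native mechanism.
inline int lastSocketError() {
#if defined(_WIN32)
  return ::WSAGetLastError();
#else
  return errno;
#endif
}

// Native codes for "would block" and "interrupted" on the current platform.
#if defined(_WIN32)
constexpr int kNativeWouldBlock = WSAEWOULDBLOCK;
constexpr int kNativeInterrupted = WSAEINTR;
#else
constexpr int kNativeWouldBlock = EAGAIN;
constexpr int kNativeInterrupted = EINTR;
#endif

// Maps a raw native error code onto the neutral categories.
inline IoError classifySocketError(int raw) {
  assert(raw >= 0);  // errno values and WSA error codes are non-negative
  if (raw == 0) return IoError::None;
  if (raw == kNativeWouldBlock) return IoError::Again;
#if !defined(_WIN32)
  // EWOULDBLOCK may differ from EAGAIN on some POSIX systems; treat both alike.
  if (raw == EWOULDBLOCK) return IoError::Again;
#endif
  if (raw == kNativeInterrupted) return IoError::Interrupted;
  return IoError::Other;
}

// Example caller: decides whether to retry without naming any native constant.
inline bool shouldRetryLater(long rc, int raw) {
  return rc < 0 && classifySocketError(raw) == IoError::Again;
}
```

With this shape, the only code that mentions WSAEWOULDBLOCK or EAGAIN lives next to the syscall wrappers, and callers compare against IoError::Again on every platform.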

The third set involves changing code that does not compile under MSVC:

  1. Removing uses of the GNU a ?: b shorthand (the conditional operator with its middle operand omitted)
  2. Changing stack allocated variable length arrays to use _alloca (done via macro)
  3. Not initializing structs with designated initializers (e.g. a = {.b = 1, .c = 2})
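The three items above can be illustrated with small before/after rewrites. This is a sketch under assumptions: the struct and function names are hypothetical, and the array case uses std::vector as a plainly different alternative to the _alloca macro the branch uses, because it behaves the same on every compiler:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical struct used only for the initializer example.
struct Limits {
  int max_connections;
  int max_requests;
};

// 1. GNU shorthand "value ?: fallback" spelled out with the standard
//    conditional operator.
inline int valueOrDefault(int value, int fallback) {
  // GNU form: return value ?: fallback;
  return value != 0 ? value : fallback;
}

// 2. Variable length array replaced by a heap-backed std::vector.
inline int sumFirstN(std::size_t n) {
  // A stack array would have needed a small bound; keep that precondition.
  assert(n <= 1024);
  // GNU form: int scratch[n];
  std::vector<int> scratch(n);
  std::iota(scratch.begin(), scratch.end(), 1);  // fills 1, 2, ..., n
  return std::accumulate(scratch.begin(), scratch.end(), 0);
}

// 3. Designated initializers replaced by value-initialization followed by
//    explicit member assignment.
inline Limits makeLimits() {
  // Designated form: Limits l = {.max_connections = 1, .max_requests = 2};
  Limits l{};
  l.max_connections = 1;
  l.max_requests = 2;
  return l;
}
```

The std::vector version trades a heap allocation for portability; where the allocation cost matters, a macro that selects _alloca on MSVC, as described in the list, keeps the buffer on the stack.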

Fourth, anything involving unix domain sockets is compiled out using #ifdef, though we recognize this might not be the best way to do this.

There are also a bunch of changes to the test code. The majority are caused by:

  1. bazel not symlinking in runfiles on Windows
  2. differences in helper classes due to how the C stdlib works on the two platforms

re: FD handle types see related work in https://github.com/envoyproxy/envoy/pull/4579 which we will need to reconcile. It might be worth it to wait until that lands and/or please chime in as you see that work evolving so that hopefully we can handle the NT case also. cc @sbelair2 @jmarantz

In general, all of the above changes seem reasonable to me. Can we start carving them out into small PRs? The MSVC compile errors seem the easiest to fix. Obviously, until we get NT CI set up things will keep creeping in but we might as well get started.

Sure, sounds good to us. We'll definitely wait on #4579 before PR'ing any of the fd / SOCKET changes.

is this effort still kicking? would love to see something available on windows

It is. There's still a good deal of work to be done, but we've been submitting PRs to get Windows support merged upstream. I don't have a timeline for you, and even when everything is merged, it may be some time before we have acceptable performance on Windows. However, rest assured this work is ongoing!

great to know ! definitely a non trivial task

Hi guys, do we have any updates on envoy on Windows? There seems to be a lot of pull requests but no clear status nor any documentation related to Windows.

I am looking for info on running Kubernetes in mixed-OS mode with Istio and Envoy as the service mesh...

thanks in advance for any update!

Hey @pluqueTheLuxe! Progress is moving forward slowly but surely. #6072 is the latest PR.

We do have a fully working Envoy binary for Windows, but it is far from production ready. At a high level, the remaining work is documented here. The related work will show in the rightmost pane.

We don't have a timeline for finishing the work. We submit PRs as the work is ready to be merged upstream, and generally submit PRs on a serial basis to ease in merging.

There is also an external dependency on Bazel that needs to be resolved before the Envoy CI can run the build. See bazelbuild/bazel#5163.

It will likely be some time before Envoy can be used alongside Istio on Windows. At this time Istio does not have Windows support either. However, this is also a use case that we are interested in, and will contribute engineering effort as Envoy reaches production readiness.

Status Update: We are still working on this slowly but surely.

Work on our end slowed down for a few weeks while Envoy kept moving ahead, but now we're back at it. Once we get our changes up-to-date with master the PRs should start coming in again.

The next PR will probably be to provide a Windows implementation of the IoHandle class.

We're also looking into testing our work with the latest version of bazel which, according to the thread on the issue should now work on Windows.

Thanks for the update feedback. It's very appreciated

Very nice. Getting something up in CI would be great, even if it's not ready for production, because I keep reviewing code that I feel needs to at least be tested on Windows, but it's hard to push back on the PR without being able to point to a CI failure.

[...] Getting something up in CI would be great, even if it's not ready for production, because I keep reviewing code that I feel needs to at least be tested on Windows, but it's hard to push back on the PR without being able to point to a CI failure.

Agreed. We do have CI here, however it's currently paused since the branch we were building was so out of date. Once the branch is back up to date we'll re-enable the CI and can watch that for failures.

It should be easier for others to run the build on Windows and integrate into CI if the Bazel issue is indeed fixed. @mattklein123 and I discussed the best course of action for CI visibility some time ago. Happy to reopen that discussion once builds are back up and running. We could at least integrate the Windows build status into the PR checks, even if merging is not blocked on failure.

Now that we are using Azure pipelines getting an NT CI job running should be trivial. Please sync up with @lizan

is it still work in progress? Would like to see this working with windows containers. If there is a preview build available please let me know

Hey @sumantfordev, yes, the work is still ongoing. @wrowe and @achasveachas have been working diligently on the port. At this point we don't have a binary that is ready for use. There are some major performance issues that will need to be addressed before it would be ready even for testing -- and much of that work will require help from someone familiar with Windows-specific socket programming. That's something that we're working with Microsoft to get some help with.

I hope that we'll have the next batch of PRs ready soon so that we can continue to move forward with merging Windows support. @mattklein123 is also working to get engagement from Microsoft to help with the performance issues. We'll keep you posted on our progress!

FYI I have reached out to MSFT also to see if we can get some help with this. cc @brendandburns

Thanks @mattklein123. I spoke with the Microsoft team back at MS Build in May and at the time there were no Microsoft developers allocated to the effort. We had discussed next steps but that is stalled until we have someone to work with on that side. Meanwhile, we're still moving forward with the port. The goal was to get Microsoft folks to help with the heavy lifting in the sockets implementation. cc @vturecek @sridmad

Do you have any specs, even in draft form, for how Envoy could be built for Windows?

Trying at the moment with "choco install bazel" +
"C:\tools\msys64\msys2_shell.cmd" > "pacman -S base-devel --needed"

Last time it failed without base-devel, but I sense there will be more problems after this.

Is it also possible to generate Visual Studio projects / solutions for Envoy?

I would be interested to check whether it's possible to strip out the HTTP (as a server) to gRPC transcoder code without the rest of the libraries.

command like this:

bazel build --cxxopt=/wd4819 --cxxopt=/W2 --cxxopt=/wd4624 --cxxopt=/wd4244 --cxxopt=/wd4200 --cxxopt=/wd4267 --cxxopt=/wd4309 --cxxopt=/wd4838 //source/exe:envoy-static

fails with error:

ERROR: Skipping '//source/exe:envoy-static': no such package '@com_google_protobuf//': Traceback (most recent call last):
        File "C:/users/user/_bazel_user/3jojcmei/external/bazel_tools/tools/build_defs/repo/http.bzl", line 56
                patch(ctx)
        File "C:/users/user/_bazel_user/3jojcmei/external/bazel_tools/tools/build_defs/repo/utils.bzl", line 91, in patch
                fail(("Error applying patch %s:\n%s%s...)))
Error applying patch @envoy//bazel:protobuf.patch:
/bin/bash: C:/users/user/_bazel_user/3jojcmei/external/envoy/bazel/protobuf.patch: No such file or directory
WARNING: Target pattern parsing failed.
ERROR: no such package '@com_google_protobuf//': Traceback (most recent call last):
        File "C:/users/user/_bazel_user/3jojcmei/external/bazel_tools/tools/build_defs/repo/http.bzl", line 56
                patch(ctx)
        File "C:/users/user/_bazel_user/3jojcmei/external/bazel_tools/tools/build_defs/repo/utils.bzl", line 91, in patch
                fail(("Error applying patch %s:\n%s%s...)))
Error applying patch @envoy//bazel:protobuf.patch:
/bin/bash: C:/users/user/_bazel_user/3jojcmei/external/envoy/bazel/protobuf.patch: No such file or directory
INFO: Elapsed time: 767.533s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: source/exe
    Fetching @com_envoyproxy_protoc_gen_validate; fetching 434s

@tapika our latest work is in this branch, but it's pretty hacky and we are nowhere near a working build.

@tapika your error is consistent with using a Windows CR/LF checkout. We check out Envoy in Unix file mode ('\n' line endings) and have no issue applying patches. The line endings of the files checked out from git need to be consistent with the flavor of the patch utility in use.

I just wanted to document my recent, unsuccessful effort to produce a Windows build. I tried two approaches:

1) Building with MSVC (compiler) and MSys (autoconf) fails. The code contains references to Unix specific socket headers, which are not available on Windows.
2) Building with Cygwin fails, as Abseil prevents builds on Cygwin deliberately.

During my trial and error phase, I identified two shortcomings within Bazel concerning Cygwin. I addressed both with pull requests.

1) https://github.com/core-process/rules_foreign_cc/commit/1f4fadbd866106a4b6ed9403205f7ea345f813e9 (already merged by maintainer, see https://github.com/bazelbuild/rules_foreign_cc/pull/323)
2) https://github.com/core-process/rules_cc/commit/5a1f5f7d5e051082fa1f5f7d8dbb2af6cff80dd1

Alternatively, the second issue can be addressed via a toolchain configuration in the Envoy project itself. Here is the source code:

package(default_visibility = ['//visibility:public'])

load(":windows_cc_toolchain_config.bzl", "cc_toolchain_config")

filegroup(
    name = "empty",
    srcs = [],
)

filegroup(
    name = "cygwin_compiler_files",
    srcs = [":builtin_include_directory_paths_mingw"]
)

cc_toolchain_suite(
    name = "cygwin",
    toolchains = {
        "x64_windows|cygwin-gcc": ":cc-compiler-x64_windows_cygwin",
        "x64_windows": ":cc-compiler-x64_windows_cygwin",
    },
)

cc_toolchain(
    name = "cc-compiler-x64_windows_cygwin",
    toolchain_identifier = "cygwin_x64",
    toolchain_config = ":cygwin_x64",
    all_files = ":empty",
    ar_files = ":empty",
    as_files = ":cygwin_compiler_files",
    compiler_files = ":cygwin_compiler_files",
    dwp_files = ":empty",
    linker_files = ":empty",
    objcopy_files = ":empty",
    strip_files = ":empty",
    supports_param_files = 1,
)

cc_toolchain_config(
    name = "cygwin_x64",
    cpu = "x64_windows",
    compiler = "cygwin-gcc",
    host_system_name = "local",
    target_system_name = "local",
    target_libc = "cygwin",
    abi_version = "local",
    abi_libc_version = "local",
    cxx_builtin_include_directories = [
        "C:/tools/cygwin/",
    ],
    tool_paths = {
        "ar": "C:/tools/cygwin/bin/ar.exe",
        "compat-ld": "C:/tools/cygwin/bin/compat-ld.exe",
        "cpp": "C:/tools/cygwin/bin/cpp.exe",
        "dwp": "C:/tools/cygwin/bin/dwp.exe",
        "gcc": "C:/tools/cygwin/bin/gcc.exe",
        "gcov": "C:/tools/cygwin/bin/gcov.exe",
        "ld": "C:/tools/cygwin/bin/ld.exe",
        "nm": "C:/tools/cygwin/bin/nm.exe",
        "objcopy": "C:/tools/cygwin/bin/objcopy.exe",
        "objdump": "C:/tools/cygwin/bin/objdump.exe",
        "strip": "C:/tools/cygwin/bin/strip.exe"
    },
    tool_bin_path = "C:/tools/cygwin/bin",
    dbg_mode_debug_flag = "/DEBUG:FULL",
    fastbuild_mode_debug_flag = "/DEBUG:FASTLINK",
)

toolchain(
    name = "cc-toolchain-x64_windows_cygwin",
    exec_compatible_with = [
        "@platforms//cpu:x86_64",
        "@platforms//os:windows",
        "@bazel_tools//tools/cpp:cygwin",
    ],
    target_compatible_with = [
        "@platforms//cpu:x86_64",
        "@platforms//os:windows",
    ],
    toolchain = ":cc-compiler-x64_windows_cygwin",
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

The referenced file windows_cc_toolchain_config.bzl can be copied from bazelbuild/rules_cc.

Here is my build pipeline regarding Cygwin: https://github.com/core-process/envoy-build/blob/master/.github/workflows/main.yaml#L65-L109

You will find an environment setup for MSVC and MSys commented out in the same file.

For everyone watching this issue, FYI that Microsoft has committed resources to seeing this port complete. They are going to reach out to the Pivotal folks to help out. cc @mhoran @achasveachas @wrowe. I'm also going to meet with Microsoft in a couple of weeks in-person about this as well.

any new info/status from recent microsoft/pivotal meetings?

There will be more news to share in coming weeks and months from different teams of engineers, nothing I can helpfully share. I can update you that my team at Pivotal has resolved PR 8280, proposed PR 8572, and we are preparing the necessary patches in the near-term for filesystem/socket objects on Windows so that the code will compile for anyone interested in participating in development.

@aktxyz We did have two productive meetings with the folks from Microsoft and they committed to helping us push the Windows port through.

We are still hammering out the details, we will probably have more solid information to share at EnvoyCon/KubeCon in 2 weeks.

As part of Windows port are there plans to support the FIPS 140-2 mode? https://www.envoyproxy.io/docs/envoy/v1.10.0/intro/arch_overview/ssl#fips-140-2

Is this still the best place to check for status ?

Yes! There was also an update given at KubeCon. There's also now an event on the CNCF calendar for our bi-weekly sync but it's not showing up publicly. I'll look into what's happening there.

12/20/2019 Update:

Recent PRs merged:

PRs in flight:

We have generally finished up the simple compilation issues in the current codebase. We will still have to fix any new breakages committed to the master branch and are considering additions to CI to prevent this occurrence. Our next steps are around upstreaming our changes to the OS sys calls package, file watcher, and networking package that we have on our compiling fork. We also will continue to attempt to enable more integration tests that rely on extensions.

2/28/2020 Update:

Recent PRs merged:

PRs in flight:

We have upstreamed our cross-platform changes to the OS sys calls package, file watcher, and many changes to the networking package (excluding things that touch Unix sockets/pipes). Most of our remaining changes are Windows specific as well as changes to build scripts/rules to enable Windows to compile. We are planning on submitting a Draft PR with our current work in progress so the community can start to work on top of that to help out with Windows support. This Draft PR branch will be periodically revised/rebased on top of master as changes are accepted and revisions are made.

The next tracks of work we will tackle are enabling a static exe to be built in CI (though no tests yet, this involves build script/rule changes and Windows specific code changes) as well as improving error handling in our Windows implementation to better handle the differences and quirks between WSAE* socket errors and Posix socket errors.

If you have forked from github.com/greenhouse-org/envoy as mentioned by Sam above, please note that we have archived the historical efforts to the greenhouse-org/envoy-archive static repo, and started a fresh fork from envoyproxy/envoy at that original repo name, with the vmware-windows-build and vmware-fix-upstream branches in place of pivotal-windows-build and pivotal-fix-upstream. This allows everyone to observe the source deltas to envoyproxy master, rather than the very stale microsoft/envoy fork.

Last week, we also went to the effort to include every extension which compiles on Windows. We next need to modify the test targets to look at the correct extension list, but this resulted in some dozen excluded extensions which do not compile, and some 60 failing tests. A handful of bug fixes are expected to pare that list of failing tests down to a dozen or so. With much of the code already on master, and several more PRs to follow in the coming weeks, we will be seeking out individuals interested in helping with the heavy lift of hacking and debugging the problematic extensions for this port.

Developers can track activity and raise discussion on the envoyproxy.slack.com #envoy-windows-dev channel. When a generally available preview is ready for non-developer participation, we will then create a corresponding -users channel.

We have an open PR here: https://github.com/envoyproxy/envoy/pull/10293 that, once merged, will enable CI to build a static version of Envoy on each PR.

4/24/2020 Update:

Merged PRs:

Open Issues:

Current Work:

  • Fixing failing tests, tagging failing tests with fails_on_windows tag so we can get community help
  • A few parallel ideas/tracks to get Windows tests compiling/running in CI:
    • Enabling remote execution with bazel + Google RBE service
    • Provisioning AZP CI workers with more resources, the default AZP provided 2 CPU 7 GB RAM is not enough
    • Dockerizing our CI scripts to ensure we have a consistent/controlled CI environment to execute in

PR that tags failing tests so we can get more community visibility: https://github.com/envoyproxy/envoy/pull/10940

As @wrowe already mentioned above, I'd like to reiterate that the best way to track the effort of Envoy support on Windows OS and participate is to join the #envoy-windows-dev Slack channel in https://envoyslack.cncf.io/

There are weekly meetings with the working group giving updates on the incremental progress and development activities. Meeting notes + details are captured here.

Also feel free to follow on any specific issues with the area/windows tag and PRs with Windows: in the title

Given that alpha was declared today, I'm going to call this issue closed. Let's track follow ups with area/windows as the team has indicated! Go team!

For those following this activity moving forwards, watch

https://github.com/envoyproxy/envoy/milestone/16 Windows Beta milestone.
