We'd like to officially support Amazon's ECS platform.
We should start with a specification to determine the best path forward.
It's possible this might just wrap existing sources.
It's possible this might just be an installation page in our documentation.
AWS ECS support, at least the part we can do on our side, must be split into two use cases:
The EC2 use case can be supported using the current docker source, with Vector running as a daemon service with mounted access to the Docker Unix socket.
This can be an installation page containing:
- A vector.toml configuration

This is equivalent to the kubernetes source / Vector distributed topology.
As an extension, a centralized topology could be supported by having additional template AWS service definitions and Vector configurations, which would probably use the custom DNS resolution feature currently in the works.
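As a sketch, the daemon-service vector.toml for this setup could be as small as the following (the console sink is a placeholder; it assumes /var/run/docker.sock is mounted into the Vector container):

```toml
# Minimal sketch for the EC2 daemon-service setup.
# Assumes the task mounts /var/run/docker.sock into the Vector container.
[sources.containers]
  type = "docker"

[sinks.out]
  type = "console"          # placeholder; users would pick a real sink
  inputs = ["containers"]
  encoding = "json"
```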
There is no way for us to support this use case by ourselves beyond what is already possible.
Generally there are four ways of collecting logs from a container:
There are various ways to do this, from diverting stdout and stderr to Vector's stdin source, to having the application log directly to one of Vector's sources. But all of them require the user's intervention.
For users, this is the only option if they want to use Vector on Fargate.
The best we can do is mention that this is the way to do it on Fargate, and provide an example or two.
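One such example could be piping the application's output straight into Vector's stdin source (a sketch; the sink is purely illustrative):

```toml
# Sketch: collect a single application's output via the stdin source.
[sources.app]
  type = "stdin"

[sinks.out]
  type = "console"     # illustrative; any sink works here
  inputs = ["app"]
  encoding = "text"
```

The container's entrypoint would then pipe through Vector, along the lines of `./my-app 2>&1 | vector --config vector.toml`.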
Fargate has a strict security and isolation policy, exhibited in the inability to mount the host filesystem, which would be necessary to gain access to the log files maintained by Docker. On top of that, awslogs, splunk, and awsfirelens are the only supported log drivers, and they don't create log files.
There is a small chance that the Docker dual logging feature is left active for some reason, but even then it is likely using the local Docker log driver, which has an unspecified log format.
Fargate's strict security and isolation policy also prevents mounting the host filesystem, which would be necessary to gain access to Docker as exposed via its Unix socket. And even if Docker were available on some port, because of the policy it won't, or at least shouldn't, be accessible.
This is by definition out of our hands and in the hands of AWS. The AWS team would need to support Vector in the form of a Docker log driver; FireLens, from prior art, is a good example of that.
In all of this, the best-case scenario would be the AWS ECS team officially supporting Vector as a log driver, which @binarylogic has already started working on.
@ktff for fargate, I believe we have access to the docker logging plugins. Can we either just use the fluentd or syslog plugin to send logs over the network to a central vector instance?
@LucioFranco Fargate only supports awslogs, splunk, and, as of recently, awsfirelens. awsfirelens is backed by fluentd/fluentbit, so, yes, what you are suggesting is possible.
That would be a centralized topology.
It's doable over fluentbit:tcp -> vector:tcp + json_parser. This would require one ECS task definition and a vector.toml hosted in an Amazon S3 bucket.
@ktff to me this seems like the quickest and easiest way to get ECS support. So I say we go with that method before anything more complicated.
@LucioFranco I agree, that is definitely a better approach.
I'll add v2 specification.
Thanks to @LucioFranco, a way to have a unified approach for the whole of ECS has been found.
Vector will collect logs from ECS containers through fluentbit: Vector acting as a central service and fluentbit as an agent.
To set this up we would provide:
- A vector.toml, to which users can add what they please.
- A way to supply vector.toml to Vector; for example, using a custom Vector image with vector.toml added seems like the best option.
- An awsfirelens log driver configuration with supplied options which configure fluentbit to forward logs to Vector.

This could all be a documentation page.
The whole implementation is in the configurations.
Containers are configured to have their logs picked up by a fluentbit instance running in every task.
Fluentbit is configured to send logs over TCP in JSON format to Vector.
Networking between Vector and fluentbit instances will be supported by ECS service discovery.
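As a sketch of that per-container configuration, the awsfirelens log driver entry in the task definition might look something like this (the Name/Host/Port/Format options are assumptions about how awsfirelens passes options through to fluentbit's tcp output; vector.internal and port 9000 are placeholders):

```json
"logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
        "Name": "tcp",
        "Host": "vector.internal",
        "Port": "9000",
        "Format": "json_lines"
    }
}
```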
Vector will be configured with a tcp source connected to a json_parser transform to unpack the logs. Additional transforms may be required.
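On the Vector side, the tcp source plus json_parser pipeline described above could look roughly like this (port 9000 and the console sink are assumptions for illustration):

```toml
# Sketch: receive JSON lines from fluentbit over TCP and unpack them.
[sources.fluentbit]
  type = "tcp"
  address = "0.0.0.0:9000"

[transforms.parsed]
  type = "json_parser"
  inputs = ["fluentbit"]

[sinks.out]
  type = "console"          # placeholder for the user's real sink
  inputs = ["parsed"]
  encoding = "json"
```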
I would propose removing from the specification how/where Vector is run. By doing that, a broader set of use cases can be supported, and a lot of configuration/confusion avoided for most of them.
In the specification, use cases are limited to Vector running on ECS. This limitation is not necessary to support collecting logs from ECS; Vector can run anywhere, so long as fluentbit can get its IP address and access it.
With that limitation removed, many of the configurations and steps the user needs to perform can be dropped. Documentation can later be extended for specific locations where Vector runs, be it ECS, EKS, a private server, etc.
The only real downside of this is in delegating networking configuration to users, for now.
@LucioFranco what do you think?
@ktff do you think it would be possible to implement the fluentbit forward API and then use this [logging driver], which should be available in Fargate? This would avoid the need to even run fluentd or fluentbit, replacing it with Vector.
@LucioFranco I am not sure about implementing the forward API, but I do know that Fargate doesn't support the fluentd logging driver. You can see that in their documentation in the logDriver section, and I have also verified it on Amazon's AWS ECS service itself.
Either way, I think the current specification, with the latest comment, is
Third time's a charm.
The whole of ECS could be supported with a splunk source and a documentation page.
Vector would collect logs from containers that share a task with it and that have the splunk log driver configured to send logs to it.
To facilitate this, a documentation page with two template files should be added:
- A template ECS task definition
- A template vector.toml, with the addition of mentioning how to supply vector.toml to the program.
Additionally, the documentation could describe how to use the template ECS task definition in the AWS GUI environment.
To support this, a new splunk source should be added (#1088).
So, a few questions I have: do we want to name this an ECS source but have it use the splunk driver, or do we want to call this a splunk source that can accept any splunk HTTP request?
@LucioFranco A splunk source independent of ECS, so yes, any splunk http request.
I will open an issue for this new splunk source; the implementation of this issue will then use the splunk source once it's added.
@lukesteensen do you mind weighing in on the idea of an http source vs. extracting that out later? Thanks.
I think it makes sense to build this out specifically for the splunk hec api first. Once we have that working, we can better evaluate if/when to make it a layer on top of a more generic http source.
My feeling is that it won't be that much extra work, but I still think we should do the specific implementation first.
Now that #1088 is done, a guide would complete this.
@binarylogic what do you think?
Here is important information on how to use the splunk_hec source to collect logs from any ECS setup:
- The splunk_hec source currently doesn't support HTTPS, so use HTTP.
- The token field in the splunk_hec source configuration must be set, and the target container's splunk log driver configured to match, for example:
"logConfiguration": {
    "logDriver": "splunk",
    "options": {
        "splunk-url": "http://0.0.0.0:8088",
        "splunk-token": "..."
    }
}
Use 0.0.0.0 and not localhost, as the containers won't otherwise be able to connect. The token field in the Vector configuration must correspond to the value of "splunk-token". Here is an example of a skeleton task definition for a sidecar configuration, containing only the options mentioned above:
{
    "containerDefinitions": [
        // Vector
        {
            "portMappings": [
                {
                    "hostPort": 8088,
                    "protocol": "tcp",
                    "containerPort": 8088
                }
            ],
            "name": "vector"
        },
        // Target
        {
            "logConfiguration": {
                "logDriver": "splunk",
                "options": {
                    "splunk-url": "http://0.0.0.0:8088",
                    "splunk-token": "test_token"
                }
            },
            "dependsOn": [
                {
                    "containerName": "vector",
                    "condition": "START"
                }
            ]
        }
    ],
    "networkMode": "awsvpc"
}
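For completeness, here is a hedged sketch of the vector.toml the vector container above could run. The console sink is a stand-in, and the address and token field names are assumptions based on the splunk_hec source as added in #1088:

```toml
# Sketch only: field names assume the splunk_hec source from #1088.
[sources.ecs_logs]
  type = "splunk_hec"
  address = "0.0.0.0:8088"   # must match the port in "splunk-url" above
  token = "test_token"       # must match "splunk-token" above

[sinks.out]
  type = "console"           # stand-in sink for illustration
  inputs = ["ecs_logs"]
  encoding = "json"
```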
Additional notes:
Continues in #1449