We'd like to officially support Amazon's ECS platform.
We should start with a specification to determine the best path forward.
It's possible this might just wrap existing sources.
It's possible this might just be an installation page in our documentation.
AWS ECS support, at least the part we can do on our side, must be split into two use cases:
The EC2 use case can be supported using the current docker source, with Vector running as a daemon service with mounted access to the Docker Unix socket.
This can be an installation page containing:
- A vector.toml configuration

This is equivalent to the kubernetes source / Vector distributed topology.
As an extension, a centralized topology could be supported by having additional template AWS service definitions and Vector configurations, which would probably use the custom DNS resolution feature currently in the works.
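As a sketch, the daemon-service vector.toml for this setup could be as small as the following (the console sink is a placeholder; it assumes /var/run/docker.sock is mounted into the Vector container):

```toml
# Minimal sketch for the EC2 daemon-service setup.
# Assumes the task mounts /var/run/docker.sock into the Vector container.
[sources.containers]
  type = "docker"

[sinks.out]
  type = "console"          # placeholder; users would pick a real sink
  inputs = ["containers"]
  encoding = "json"
```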
There is no way for us to support this use case by ourselves beyond what is already possible.
Generally there are four ways of collecting logs from a container:
There are various ways to do this, from diverting stdout and stderr to Vector's stdin source, to having the application log directly to one of Vector's sources. But all of them require the user's intervention.
For users, this is the only option if they want to use Vector on Fargate.
The best we can do is mention that this is the way to do it on Fargate, and provide an example or two.
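One such example could be piping the application's output straight into Vector's stdin source (a sketch; the sink is purely illustrative):

```toml
# Sketch: collect a single application's output via the stdin source.
[sources.app]
  type = "stdin"

[sinks.out]
  type = "console"     # illustrative; any sink works here
  inputs = ["app"]
  encoding = "text"
```

The container's entrypoint would then pipe through Vector, along the lines of `./my-app 2>&1 | vector --config vector.toml`.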
Fargate has a strict security and isolation policy, exhibited in the inability to mount the host filesystem, which would be necessary to gain access to the log files maintained by Docker. On top of that, awslogs, splunk, and awsfirelens are the only supported log drivers, and they don't create log files.
There is a small chance that the Docker dual logging feature is left active for some reason, but even then it is likely using the local Docker log driver, which has an unspecified log format.
Fargate's strict security and isolation policy also prevents mounting the host filesystem, which would be necessary to gain access to Docker as exposed via its Unix socket. And even if Docker were available on some port, because of the policy it won't, or at least shouldn't, be accessible.
This is by definition out of our hands and in the hands of AWS. The AWS team would need to support Vector in the form of a Docker log driver; FireLens, from prior art, is a good example of that.
In all of this, the best-case scenario would be the AWS ECS team officially supporting Vector as a log driver, which @binarylogic has already started working on.
@ktff for fargate, I believe we have access to the docker logging plugins. Can we either just use the fluentd or syslog plugin to send logs over the network to a central vector instance?
@LucioFranco Fargate only supports awslogs, splunk, and, as of recently, awsfirelens. awsfirelens is backed by fluentd/fluentbit, so, yes, what you are suggesting is possible.
That would be a centralized topology.
It's doable over fluentbit:tcp -> vector:tcp + json_parser. This would require one ECS task definition and a vector.toml hosted in an Amazon S3 bucket.
@ktff to me this seems like the quickest and easiest way to get ECS support. So I say we go with that method before anything more complicated.
@LucioFranco I agree, that is definitely a better approach.
I'll add v2 specification.
Thanks to @LucioFranco, a way to have a unified approach for the whole of ECS has been found.
Vector will collect logs from ECS containers through fluentbit: Vector acting as a central service and fluentbit as an agent.
To set this up we would provide:
- A vector.toml, to which users can add what they please.
- A way to supply vector.toml to Vector; for example, using a custom Vector image with vector.toml added seems like the best option.
- An awsfirelens log driver configuration with supplied options which configure fluentbit to forward logs to Vector.

This could all be a documentation page.
The whole implementation is in the configurations.
Containers are configured to have their logs picked up by a fluentbit instance running in every task.
Fluentbit is configured to send logs over TCP in JSON format to Vector.
Networking between Vector and fluentbit instances will be supported by ECS service discovery.
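As a sketch of that per-container configuration, the awsfirelens log driver entry in the task definition might look something like this (the Name/Host/Port/Format options are assumptions about how awsfirelens passes options through to fluentbit's tcp output; vector.internal and port 9000 are placeholders):

```json
"logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
        "Name": "tcp",
        "Host": "vector.internal",
        "Port": "9000",
        "Format": "json_lines"
    }
}
```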
Vector will be configured with a tcp source connected to a json_parser transform to unpack the logs. Additional transforms may be required.
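On the Vector side, the tcp source plus json_parser pipeline described above could look roughly like this (port 9000 and the console sink are assumptions for illustration):

```toml
# Sketch: receive JSON lines from fluentbit over TCP and unpack them.
[sources.fluentbit]
  type = "tcp"
  address = "0.0.0.0:9000"

[transforms.parsed]
  type = "json_parser"
  inputs = ["fluentbit"]

[sinks.out]
  type = "console"          # placeholder for the user's real sink
  inputs = ["parsed"]
  encoding = "json"
```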
I would propose removing from the specification how/where Vector is run. By doing that, a broader set of use cases can be supported, and a lot of configuration/confusion avoided for most of them.
In the specification, use cases are limited to Vector running on ECS. This limitation is not necessary to support collecting logs from ECS; Vector can run anywhere, so long as fluentbit can get its IP address and access it.
With that limitation removed, many of the configurations and steps the user needs to perform can be dropped. Documentation can later be extended for specific locations where Vector runs, be it ECS, EKS, a private server, etc.
The only real downside of this is in delegating networking configuration to users, for now.
@LucioFranco what do you think?
@ktff do you think it would be possible to implement the fluentbit forward API and then use this [logging driver], which should be available in Fargate? This would avoid the need to even run fluentd or fluentbit, replacing it with Vector.
@LucioFranco I am not sure about implementing the forward API, but I do know that Fargate doesn't support the fluentd logging driver. You can see that in their documentation in the logDriver section, and I have also verified it on Amazon's AWS ECS service itself.
Either way, I think the current specification, with the latest comment, is
Third time's a charm.
The whole of ECS could be supported with a splunk source and a documentation page.
Vector would collect logs from containers that share a task with it and that have the splunk log driver configured to send logs to it.
To facilitate this, a documentation page with two template files should be added:
- A template ECS task definition
- A template vector.toml, with the addition of mentioning how to supply vector.toml to the program.
Additionally, the documentation could describe how to use the template ECS task definition in the AWS GUI environment.
To support this, a new splunk source should be added (#1088).
So, a few questions I have: do we want to name this an ECS source but have it use the splunk driver, or do we want to call this a splunk source that can accept any splunk HTTP request?
@LucioFranco A splunk source independent of ECS, so yes, any splunk http request.
I will open an issue for this new splunk source; the implementation of this issue will then use the splunk source once it's added.
@lukesteensen do you mind weighing in on the idea of an http source vs. extracting that out later? Thanks.
I think it makes sense to build this out specifically for the splunk hec api first. Once we have that working, we can better evaluate if/when to make it a layer on top of a more generic http source.
My feeling is that it won't be that much extra work, but I still think we should do the specific implementation first.
Now that #1088 is done, a guide would complete this.
@binarylogic what do you think?
Here is important information on how to use the splunk_hec source to collect logs from any ECS setup:
- The splunk_hec source currently doesn't support HTTPS, so use HTTP.
- The token field in the splunk_hec source configuration must be set, and the target container's splunk log driver configured to match, for example:
"logConfiguration": {
    "logDriver": "splunk",
    "options": {
        "splunk-url": "http://0.0.0.0:8088",
        "splunk-token": "..."
    }
}
Use 0.0.0.0 and not localhost, as the containers won't otherwise be able to connect. The token field in the Vector configuration must correspond to the value of "splunk-token". Here is an example of a skeleton task definition for a sidecar configuration, containing only the options mentioned above:
{
    "containerDefinitions": [
        // Vector
        {
            "portMappings": [
                {
                    "hostPort": 8088,
                    "protocol": "tcp",
                    "containerPort": 8088
                }
            ],
            "name": "vector"
        },
        // Target
        {
            "logConfiguration": {
                "logDriver": "splunk",
                "options": {
                    "splunk-url": "http://0.0.0.0:8088",
                    "splunk-token": "test_token"
                }
            },
            "dependsOn": [
                {
                    "containerName": "vector",
                    "condition": "START"
                }
            ]
        }
    ],
    "networkMode": "awsvpc"
}
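For completeness, here is a hedged sketch of the vector.toml the vector container above could run. The console sink is a stand-in, and the address and token field names are assumptions based on the splunk_hec source as added in #1088:

```toml
# Sketch only: field names assume the splunk_hec source from #1088.
[sources.ecs_logs]
  type = "splunk_hec"
  address = "0.0.0.0:8088"   # must match the port in "splunk-url" above
  token = "test_token"       # must match "splunk-token" above

[sinks.out]
  type = "console"           # stand-in sink for illustration
  inputs = ["ecs_logs"]
  encoding = "json"
```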
Additional notes:
Continues in #1449