In this blog, I am going to show you the various ways of getting your logs into AWS managed Elasticsearch from CloudWatch Logs. Once our logs are in CloudWatch, AWS allows us to stream the log data to the following three destinations:

  1. AWS Lambda
  2. AWS Kinesis Streams
  3. AWS Kinesis Firehose

CloudWatch requires us to apply a subscription filter to the logs, where we can specify a pattern so that only logs matching that pattern are streamed from CloudWatch to a given destination. This is important because it lets us avoid streaming logs we do not need, which also saves us costs.

AWS Lambda

When streaming logs to AWS Lambda, the Lambda function receives a CloudWatch record containing a batch of log events. AWS provides a blueprint for the Lambda function that processes these log events and inserts them into Elasticsearch. The CloudWatch console also offers an option to stream directly to Amazon Elasticsearch Service; it does the same thing, but automates a few steps, such as creating the subscription filter, creating the IAM role for inserting into Elasticsearch, and creating a Lambda function containing standard code written by AWS that processes the raw CloudWatch record and inserts the data into Elasticsearch. The subscription filter allows us to filter the logs that we want to process using a specified pattern. This code is good enough for almost all use cases, and you can customize the function if you want to tweak the data being inserted into Elasticsearch.
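
To make this concrete, here is a minimal sketch of how a subscribed Lambda decodes the incoming CloudWatch record before any indexing logic runs (the handler shape is the standard one for Node.js Lambdas; the indexing step itself is left out):

const zlib = require('zlib');

exports.handler = async (event) => {
    // CloudWatch Logs delivers the payload base64-encoded and gzip-compressed
    const payload = Buffer.from(event.awslogs.data, 'base64');
    const parsed = JSON.parse(zlib.gunzipSync(payload).toString('utf8'));

    // parsed.logEvents is an array of { id, timestamp, message }
    for (const logEvent of parsed.logEvents) {
        // transform each logEvent here and index it (ideally as part of a
        // bulk request) into Elasticsearch
        console.log(logEvent.message);
    }
};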

The problem with using AWS Lambda is that we may face throttling beyond a certain limit if we have too many logs. CloudWatch triggers a Lambda invocation repeatedly, each time for a small batch of logs. AWS provides a maximum concurrency of 1000 for Lambda functions per region, which is a soft limit. If too many logs are coming into CloudWatch, the number of concurrent Lambda invocations may keep climbing until we reach our concurrency limit. This limit can be raised up to a maximum of 2500 by asking AWS for a service limit increase. Hence we may reach a point where the logging pipeline starts facing scaling issues.

One more challenge is that if your Elasticsearch cluster is inside a VPC, your Lambda functions have to run inside a VPC subnet, where each function creates network interfaces. These network interfaces are limited by the number of IP addresses available in that subnet, so this can also cap the number of Lambda functions that can run concurrently.

Another challenge is that the CloudWatch logs and the Lambda function have to be in the same region. If the regions differ, you have to create Lambda functions in the other region as well, and if the Elasticsearch cluster is inside a VPC, you have to configure VPC peering between the VPCs in the two regions. This adds a lot of overhead for a simple task.

AWS Lambda is a reasonable choice for inserting logs into Elasticsearch if you are sure that your log volume will not trigger enough Lambdas to cause throttling and your logs are in the same region as the Elasticsearch cluster.

Our other options are Kinesis Firehose and Kinesis Streams.

AWS Kinesis Firehose

Kinesis Firehose is AWS's fully managed data ingestion service that can push data to S3, Redshift, the Elasticsearch service and Splunk. Firehose also supports data transformation using a custom Lambda function: the function transforms the data and returns it to Firehose, which then delivers it to the destination.

Configuring Firehose as the destination for a CloudWatch subscription filter requires the CLI or a CloudFormation template. You need to create a subscription filter that filters the logs and sends them to Firehose, and you need to attach a role to CloudWatch Logs that allows it to insert data into Firehose. A complete tutorial can be found at this link.
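
The CLI command behind this is put-subscription-filter, and the same call can be made from the AWS SDK. The sketch below uses the Node.js SDK; the log group name, stream ARN and role ARN are placeholders, not values from a real setup:

const AWS = require('aws-sdk');
const cloudwatchlogs = new AWS.CloudWatchLogs({ region: 'us-east-1' });

const params = {
    logGroupName: '/aws/my-application',   // log group to stream from
    filterName: 'stream-to-firehose',
    filterPattern: '',                     // empty pattern matches every log event
    destinationArn: 'arn:aws:firehose:us-east-1:123456789012:deliverystream/my-delivery-stream',
    roleArn: 'arn:aws:iam::123456789012:role/CWLtoFirehoseRole' // lets CloudWatch Logs put records into Firehose
};

cloudwatchlogs.putSubscriptionFilter(params).promise()
    .then(() => console.log('Subscription filter created'))
    .catch(err => console.error(err));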

Compared to Kinesis Streams, Kinesis Firehose is a better solution because it automatically handles retries to the transformation function and the destination. In case of failure, Firehose stores the failed records in an S3 bucket; you can investigate these records later to find the reason for the failure and then manually re-ingest them into Firehose. Firehose also scales automatically with the data being ingested, whereas Kinesis Streams starts throttling once the maximum throughput of a shard is reached. For those who do not know, a shard is the unit of data ingestion in Kinesis Streams; it can ingest at a maximum throughput of 1 MB/s and can be read at a maximum throughput of 2 MB/s. With Kinesis Streams you need to manage the scaling of the stream yourself so that it keeps up with the rate at which your producer writes data, and there is also a chance of records being delivered multiple times.

The advantage of using Firehose in our scenario is that Firehose buffers data up to a certain size or for a certain time limit. For example, Firehose can buffer data for up to 15 minutes or up to 3 MB, after which it passes the data to the transformation Lambda function. It then buffers the transformed data again according to the limits specified and pushes it to the destination. This reduces the number of Lambda invocations, since each invocation processes a larger amount of data; the relevant settings are sketched below.
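
For reference, both buffers live in the Firehose Elasticsearch destination configuration. The fragment below shows only the relevant fields, and the values and Lambda ARN are illustrative assumptions rather than defaults:

// Buffering before delivery to the destination (BufferingHints) and
// before invoking the transformation Lambda (processor parameters).
const elasticsearchDestinationFragment = {
    BufferingHints: {
        IntervalInSeconds: 300,   // deliver transformed data at least every 5 minutes...
        SizeInMBs: 5              // ...or once 5 MB have accumulated
    },
    ProcessingConfiguration: {
        Enabled: true,
        Processors: [{
            Type: 'Lambda',
            Parameters: [
                { ParameterName: 'LambdaArn',
                  ParameterValue: 'arn:aws:lambda:us-east-1:123456789012:function:my-transformer' },
                { ParameterName: 'BufferIntervalInSeconds', ParameterValue: '900' }, // hand data to the Lambda every 15 minutes...
                { ParameterName: 'BufferSizeInMBs', ParameterValue: '3' }            // ...or once 3 MB have accumulated
            ]
        }]
    }
};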

However, the limitation of Firehose when using it with Elasticsearch is that the transformation function is allowed to return only one record back to Firehose for each input record. If we try to return multiple records, Firehose is not able to insert the data into Elasticsearch and gives us a 400 error. When I asked AWS Support about it, they told me that Firehose is not designed to insert data into Elasticsearch (even though they advertise it).

Another disadvantage of Firehose is that it cannot insert data into an Elasticsearch cluster inside a VPC; the Elasticsearch cluster has to be outside the VPC.

Workaround

So a workaround when using Kinesis Firehose is to insert the data into Elasticsearch from the transformation Lambda function itself.

Using the transformation Lambda Function

[Architecture diagram for centralized logging on AWS]

We can configure the Kinesis Firehose stream to deliver data to an S3 bucket and associate a data transformation Lambda function with the stream. This Lambda function contains the code to transform the data and then insert it into Elasticsearch using an HTTP call.

Additionally, we can place this transformation Lambda inside or outside a VPC, which lets it communicate with an Elasticsearch cluster inside or outside the VPC.

AWS provides a standard blueprint for transforming a CloudWatch record. Combining it with the Lambda blueprint AWS provides for inserting data into Elasticsearch, we can create a function that transforms the CloudWatch logs and then inserts them into Elasticsearch.

The Firehose transformation blueprint exposes a function, left as a simple pass-through by AWS, which is where the actual transformation takes place.

// Blueprint default: pass each log event's message through unchanged,
// one newline-terminated record per event.
function transformLogEvent(logEvent) {
    return Promise.resolve(`${logEvent.message}\n`);
}

Insert an HTTP call there to index each record into Elasticsearch. That would make one request to Elasticsearch per log record; to optimize it further, we can collect the records and perform a single bulk request at the end of the Lambda function, inserting all the logs into Elasticsearch at once.
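
A minimal sketch of such a bulk call, assuming a domain that is reachable without request signing (a domain with IAM-based access control would additionally need SigV4-signed requests), could look like this; the endpoint and index name are placeholders:

const https = require('https');

function bulkInsert(documents) {
    // Build the newline-delimited _bulk body: one action line per document,
    // followed by the document itself, with a trailing newline.
    const body = documents
        .map(doc => JSON.stringify({ index: { _index: 'cwl-logs' } }) + '\n' + JSON.stringify(doc))
        .join('\n') + '\n';

    return new Promise((resolve, reject) => {
        const req = https.request({
            hostname: 'my-es-domain.us-east-1.es.amazonaws.com',
            path: '/_bulk',
            method: 'POST',
            headers: { 'Content-Type': 'application/x-ndjson' }
        }, res => {
            let data = '';
            res.on('data', chunk => (data += chunk));
            res.on('end', () => resolve(data));
        });
        req.on('error', reject);
        req.write(body);
        req.end();
    });
}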

I hope this blog provides you with clarity on how you can achieve your goals for centralized logging.

