What Would Cause a Kinesis Consumer to Be Retried Again?
About six months ago, our team started the journey to replicate some of our data stored in on-premise MySQL machines to AWS. This included over a billion records stored in multiple tables. The new system had to be responsive enough to transfer any new incoming data from the MySQL database to AWS with minimal latency.
Everything screamed out for a streaming architecture to be put in place. The solution was designed on the backbone of Kinesis streams, Lambda functions and lots of lessons learned. We use Apache Kafka to capture the changelog from the MySQL tables and sink these records to AWS Kinesis. The Kinesis streams then trigger AWS Lambdas, which transform the data.
These are our learnings from building a fully reactive serverless pipeline on AWS.
Fine-tuning your Lambdas
We all love to read records as soon as they turn up in our streams. For this, you have to be absolutely sure your Lambdas are performing top notch. If you are dealing with high-volume data, increasing the Lambda parameters can give you surprising results. The parameters you can fine-tune are:
- Memory
- Batch-size
- Timeout
For us, increasing the memory of a Lambda from 128 megabytes to 2.5 gigabytes gave us a huge boost.
The number of Lambda invocations shot up almost 40x. But this also depends on your data volumes. If you are fine with minimal invocations per second, you can always stick with the default 128 megabytes.
You can also increase the batch size of each event. This means that more records will be processed per invocation. But make sure you do not hit the max timeout of the Lambda function (5 minutes). A good number is 500–1000 records per event.
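To make this concrete, here is a minimal boto3 sketch of how these parameters could be adjusted. The function name, event source mapping UUID and the exact values are illustrative placeholders, not our production configuration.

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical names/IDs, used only for illustration.
FUNCTION_NAME = "kinesis-record-transformer"
EVENT_SOURCE_MAPPING_UUID = "00000000-0000-0000-0000-000000000000"

# Raise memory and timeout so each invocation can handle a bigger batch.
lambda_client.update_function_configuration(
    FunctionName=FUNCTION_NAME,
    MemorySize=2560,  # roughly 2.5 GB instead of the default 128 MB
    Timeout=300,      # 5 minutes, the maximum mentioned above
)

# Deliver more records per invocation from each shard.
lambda_client.update_event_source_mapping(
    UUID=EVENT_SOURCE_MAPPING_UUID,
    BatchSize=500,    # 500-1000 records per event is a good starting point
)
```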
Beware of Kinesis read throughput
While reading incoming records from Kinesis, always remember that the Kinesis stream will be your biggest bottleneck.
Kinesis streams have a read throughput of 2 megabytes per second per shard. This means that the bottleneck really lies in the number of shards you have in the stream.
We started off with single-shard streams and realised the Lambdas did not process the records fast enough. Increasing the shards from 1 to 8 easily gave us an 8-fold increase in throughput. At this point, we almost reached 3k Lambda invocations per second in one of our busier streams. Remember that a single Lambda invocation processes records from a single shard.
Having said that, if your stream is not expected to be very busy, you don't gain much by splitting the records into shards.
The bottleneck lies in the number of shards per stream. Create busier streams with more shards so that the records are picked up quickly.
See the effect of increasing the shards from 1 to 8.
The number of invocations went up almost 3x.
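If you scale shards by hand, the resharding itself can be scripted. Below is a minimal boto3 sketch using UpdateShardCount; the stream name is a placeholder.

```python
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name, used only for illustration.
STREAM_NAME = "mysql-changelog-stream"

# Going from 1 to 8 shards multiplies the read throughput by 8
# (2 MB per second per shard).
kinesis.update_shard_count(
    StreamName=STREAM_NAME,
    TargetShardCount=8,
    ScalingType="UNIFORM_SCALING",
)
```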
Remember the Retention period
AWS would make all our lives easier if the records in Kinesis were persisted forever. But Kinesis streams have a maximum retention period of 168 hours, or seven days (for now!). This means that when a new record lands in your Kinesis stream, you have 168 hours to process it before you lose it forever.
This also means that once you realise your Lambda failed to process a record, you have 168 hours to fix your Lambda or you lose the record.
Things are a bit easier if the incoming records are stateless, i.e. the order of the records does not matter. In this case, you could configure a Dead Letter Queue or push the record back to Kinesis. But if you are dealing with database updates or records where the order is important, losing a single record results in losing the consistency of your data.
Make sure you have enough safety nets and infrastructure to react to failed records.
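One cheap safety net is to extend the retention period from the default 24 hours to the maximum, which buys extra time to fix a broken consumer. A minimal boto3 sketch, assuming the hypothetical stream name below:

```python
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name, used only for illustration.
STREAM_NAME = "mysql-changelog-stream"

# The default retention is 24 hours; extend it to the 168-hour maximum
# so a failing Lambda leaves you a full week before records expire.
kinesis.increase_stream_retention_period(
    StreamName=STREAM_NAME,
    RetentionPeriodHours=168,
)
```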
Monitor your IteratorAge
Make sure you monitor this very important metric, the IteratorAge (GetRecords.IteratorAgeMilliseconds). This metric shows how long the last record processed by your Lambda stayed in the Kinesis stream. The higher the value, the less responsive your system is. This metric is important for two reasons:
- To make sure you are processing records fast enough
- To make sure you are not losing records (a record stays in the stream beyond your retention period)
A highly responsive system will have this metric always < 5000 ms, but it's highly linked to the problem you are trying to solve.
It's best to set up alarms on the IteratorAge to make sure the records are picked up on time. This can help you avoid situations where you start losing records.
In the case above, the Lambdas were really slow to pick up records. This means that the records stayed in Kinesis without being processed, and eventually got deleted. At this point, all the records fetched by the Lambda had stayed there for almost 168 hours (the maximum possible). In this case, there is a very high chance we lost records. Be safe, set a CloudWatch alarm.
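A minimal sketch of such an alarm on the stream-level GetRecords.IteratorAgeMilliseconds metric; the stream name, SNS topic and threshold are placeholders you would tune to your own retention period.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical stream name and SNS topic, used only for illustration.
STREAM_NAME = "mysql-changelog-stream"
ALARM_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:kinesis-alarms"

# Alert when the oldest unread record is more than 12 hours old,
# long before the 168-hour retention limit is reached.
cloudwatch.put_metric_alarm(
    AlarmName=f"{STREAM_NAME}-iterator-age",
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": STREAM_NAME}],
    Statistic="Maximum",
    Period=300,                     # 5-minute evaluation windows
    EvaluationPeriods=3,
    Threshold=12 * 60 * 60 * 1000,  # 12 hours, in milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[ALARM_TOPIC_ARN],
)
```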
Lambda helps out in Error Handling
Lambdas have a special behaviour when it comes to processing Kinesis event records. When the Lambda throws an error while processing a batch of records, it automatically retries the same batch of records. No further records from that specific shard are processed.
This is very helpful for maintaining the consistency of data. If the records from the other shards do not throw errors, they continue as normal.
But consider the case of a specific corrupt record which should be rejected by the Lambda. If this is not handled, the Lambda will retry forever, and if not monitored, it will keep retrying until the record eventually expires from the Kinesis stream. By that point, all the unprocessed records in the shard are also near the expiry period, and if the Lambda is not fast enough, many records will be lost.
Trust the Lambda to retry errors on valid records, but make sure you handle cases where the record is corrupt or should be skipped.
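A minimal handler sketch along these lines, assuming JSON payloads: corrupt records are logged and skipped so they never block the shard, while unexpected errors propagate so Lambda retries the batch.

```python
import base64
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            document = json.loads(payload)
        except ValueError:
            # Corrupt record: log and skip it instead of failing the
            # whole batch and blocking the shard until expiry.
            logger.warning("Skipping corrupt record: %r", payload[:200])
            continue

        # Any exception raised here makes Lambda retry the same batch,
        # which is what we want for transient failures (e.g. a flaky sink).
        transform_and_sink(document)


def transform_and_sink(document):
    # Placeholder for the actual transformation and sink logic.
    logger.info("Processed record with keys: %s", list(document))
```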
Conclusion
At the moment, we have almost 20 Kinesis streams and a similar number of Lambdas processing them. All of them are fully monitored and have never given us sleepless nights. We are building and innovating every single day.
Source: https://tech.trivago.com/post/streaming-with-kinesis/