NEW: DynamoDB Streams Filtering in Serverless Framework
Pawel Zubkiewicz for AWS Community Builders

Posted on Dec 6, 2021

#aws #cloudnative #dynamodb #serverless

In this article, you will learn how to use the recently released Streams Filtering functionality for DynamoDB and Lambda.

We will go deeper than a basic sample of DynamoDB event filtering and look at how to combine it with your business logic. I will be using a DynamoDB single-table design setup for that.

What's new?

If you haven't heard, just before #reInvent2021 AWS dropped this huge update.

What's changed?

Before the update

Every action performed on a DynamoDB table (INSERT, MODIFY, REMOVE) triggered an event that was sent over DynamoDB Streams to a Lambda function. Regardless of the action type, the Lambda function was always invoked. That had two repercussions:

- You paid for Lambda invocations even when the event was irrelevant to your business logic.
- Your function code had to filter out irrelevant events itself, which made it harder to write and debug.

That situation was amplified in single-table design, where you store multiple entity types in a single table, so in reality you have many INSERTs of different subtypes (e.g. new user, new address, new order, etc.).

After the update

Now, you can filter out events that are not relevant to your business logic. By defining filter criteria, you control which events can invoke a Lambda function. Filtering evaluates events based on values that are in the message.
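For reference, here is roughly what a single stream record looks like, trimmed to the fields used in this article (the values are illustrative, real records carry more fields such as eventID and Keys). The filter pattern is matched against each record individually:

{
  "eventName": "MODIFY",
  "dynamodb": {
    "NewImage": { "Type": { "S": "Invoice" } },
    "OldImage": { "Type": { "S": "Invoice" } }
  }
}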

This solves the above-mentioned problems:

- Lambda functions are invoked, and billed, only for events that match your criteria.
- Your function code no longer needs manual event-filtering logic.

All of that thanks to a small JSON snippet defining the filter criteria.

Refactoring to the Streams Filtering

Since you're reading this article, it's safe to assume that you, like me, are already using DynamoDB Streams to invoke your Lambda functions.

Therefore, let me take you through the refactoring process. It's a simplified version of code that I run in production.

In my DynamoDB table, I store two types of entities: Order and Invoice. My business logic requires me to do something only when an Invoice is modified.

Business logic conditions:

Action | Order | Invoice
INSERT | ignore | ignore
MODIFY | ignore | perform business logic
REMOVE | ignore | ignore

As you can see, it's just a single case out of six. Imagine what happens when you have more types in your table, and your business logic requires you to perform other actions as well.

Old event filtering

Let's start with the ugly if statements I had before the update, when I had to filter events manually.

My Lambda's handler started by executing the parseEvent function:

const parseEvent = (event) => {
  const e = event.Records[0] // batch size = 1, so there is only one record
  const isInsert = e.eventName === 'INSERT'
  const isModify = e.eventName === 'MODIFY'

  // the entity type is stored as a Type attribute on every item
  const isOrder = e.dynamodb.NewImage?.Type?.S === 'Order'
  const isInvoice = e.dynamodb.NewImage?.Type?.S === 'Invoice'

  const newItemData = e.dynamodb.NewImage
  const oldItemData = e.dynamodb.OldImage

  return {
    isInsert, isModify, isOrder, isInvoice, newItemData, oldItemData
  }
}

Next, I had to evaluate the condition in my handler:

const {
  isInsert, isModify, isOrder, isInvoice, newItemData, oldItemData
} = parseEvent(event)

if (isModify && isInvoice) {
  // perform business logic
  // uses newItemData & oldItemData values
}

New event filtering

The new functionality allows us to significantly simplify that code by pushing condition evaluation onto AWS.

Just to recap, my business logic requires letting in only MODIFY events that were performed on Invoice entities. Fortunately, I keep a Type attribute on my entities in the DynamoDB table (thanks Alex 🤝).

The DynamoDB event structure is well-defined, so basically all I need to do is make sure that:

- eventName equals MODIFY, and
- dynamodb.NewImage.Type.S equals Invoice.

All of that is defined in the filterPatterns section of the Lambda configuration. Below is a snippet from a Serverless Framework serverless.yml config file. Support for filterPatterns was introduced in Serverless Framework 2.68.0, so make sure you are using that version or newer.

functionName:
  handler: src/functionName/function.handler
  # other properties
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        maximumRetryAttempts: 1
        batchSize: 1
        filterPatterns:
          - eventName: [MODIFY]
            dynamodb:
              NewImage:
                Type:
                  S: [Invoice]

And that's all you need to do to filter your DynamoDB Stream.

Amazing, isn't it?
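With the filter in place, the handler no longer needs parseEvent or the if statement. A minimal sketch (the handler shape below is my assumption; the business logic placeholder is whatever yours was):

module.exports.handler = async (event) => {
  // thanks to filterPatterns, every record delivered here is already
  // a MODIFY event on an Invoice entity
  const record = event.Records[0] // batchSize: 1
  const newItemData = record.dynamodb.NewImage
  const oldItemData = record.dynamodb.OldImage

  // perform business logic using newItemData & oldItemData
}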

Gotchas

Bear in mind that there can be several filters on a single event source. In such a case, each filter works independently of the others. Simply put, there is OR, not AND, logic between them.

I learned that the hard way by mistakenly creating two filters:

filterPatterns:
  - eventName: [MODIFY]
  - dynamodb:
      NewImage:
        Type:
          S: [Invoice]

The stray - in front of dynamodb: turned one pattern into two, which resulted in the wrong filter:

{
  "filters": [
    {
      "pattern": "{\"eventName\":[\"MODIFY\"]}"
    },
    {
      "pattern": "{\"dynamodb\":{\"NewImage\":{\"Type\":{\"S\":[\"Invoice\"]}}}}"
    }
  ]
}

That one catches all MODIFY actions OR anything that has Invoice as its Type in the NewImage object, so it matches INSERT actions on invoices as well!

Correct filter:

{
  "filters": [
    {
      "pattern": "{\"eventName\":[\"MODIFY\"],\"dynamodb\":{\"NewImage\":{\"Type\":{\"S\":[\"Invoice\"]}}}}"
    }
  ]
}
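If you actually want OR semantics on a single field, list several values inside one pattern instead of creating a second pattern. A sketch, assuming you wanted to react to both INSERT and MODIFY of invoices:

filterPatterns:
  - eventName: [INSERT, MODIFY] # OR between values of the same field
    dynamodb:
      NewImage:
        Type:
          S: [Invoice]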

You can view the filter in the Lambda console, under the Configuration -> Triggers section.

Global Tables

As kolektiv mentioned in the comments, this functionality does not work with Global Tables:

"One more catch: you can't use filtering with global tables. Your filter will not be evaluated and your function will not be called. Confirmed with AWS support."

Thanks for pointing that out.

How much does it cost?

Nothing.

There is no information about any additional pricing. Also, Jeremy Daly confirmed that during re:Invent 2021.

In reality, this functionality saves you money: on maintenance, because it's easier to write and debug Lambda code, and on operations, because functions are executed only in response to business-relevant events.

Low coupling

Before the update, people implemented event-filtering logic inside a single Lambda function and thus suffered from high coupling (unless they utilized some kind of dispatcher pattern).

Now, we can have several independent Lambda functions, each with its own filter criteria, attached to the same DynamoDB Stream, as the sketch below shows. That results in lower coupling between the code that handles different event types, which will be very much appreciated by all single-table design practitioners.
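As a sketch of that setup (the function names and handler paths are hypothetical), two functions attach to the same stream, each seeing only its own events:

onInvoiceModified:
  handler: src/invoices/onModified.handler
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        batchSize: 1
        filterPatterns:
          - eventName: [MODIFY]
            dynamodb:
              NewImage:
                Type:
                  S: [Invoice]

onOrderCreated:
  handler: src/orders/onCreated.handler
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        batchSize: 1
        filterPatterns:
          - eventName: [INSERT]
            dynamodb:
              NewImage:
                Type:
                  S: [Order]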

Update

I forgot to mention that you can do more than just evaluate string-equality conditions in the filter. There are more possibilities, delivered by several comparison operators.

Here is a table borrowed from AWS Docs (if it's not OK to include it here, please let me know):

Comparison operator | Example | Rule syntax
Null | UserID is null | "UserID": [ null ]
Empty | LastName is empty | "LastName": [""]
Equals | Name is "Alice" | "Name": [ "Alice" ]
And | Location is "New York" and Day is "Monday" | "Location": [ "New York" ], "Day": [ "Monday" ]
Or | PaymentType is "Credit" or "Debit" | "PaymentType": [ "Credit", "Debit" ]
Not | Weather is anything but "Raining" | "Weather": [ { "anything-but": [ "Raining" ] } ]
Numeric (equals) | Price is 100 | "Price": [ { "numeric": [ "=", 100 ] } ]
Numeric (range) | Price is more than 10, and less than or equal to 20 | "Price": [ { "numeric": [ ">", 10, "<=", 20 ] } ]
Exists | ProductName exists | "ProductName": [ { "exists": true } ]
Does not exist | ProductName does not exist | "ProductName": [ { "exists": false } ]
Begins with | Region is in the US | "Region": [ { "prefix": "us-" } ]
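These operators work in filterPatterns as well. For instance, a sketch of my own (not from the production code above) that lets through MODIFY events on every entity type except Order:

filterPatterns:
  - eventName: [MODIFY]
    dynamodb:
      NewImage:
        Type:
          S:
            - anything-but: [Order]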

Summary

I hope this short article convinced you to refactor your Lambda functions that are invoked by DynamoDB Streams. It's really simple and makes a huge difference in terms of code clarity and costs.