How to Design AWS DynamoDB Data Modeling
The nonrelational database addresses several challenges of the relational database, such as the rigidly defined data structure, the predefined schema, and vertical scaling. A NoSQL database offers the flexibility to extend the data structure, even for complex hierarchical data. So when we design a data model for AWS DynamoDB, instead of starting from a well-defined data structure, we focus on analyzing the business requirements to identify the access patterns. We then support those access patterns with DynamoDB’s data store models, primary keys, and secondary indexes, so that the application can serve each access pattern with a single query against the table or an index.
AWS DynamoDB Data Modeling
Before we talk about how to model the one-to-one, one-to-many, and many-to-many relationships familiar from relational databases, let’s quickly go through AWS DynamoDB’s data store models, primary keys, and secondary indexes.
Data Store Models
AWS DynamoDB is a fully managed database that supports both the document and the key-value data models. A document database, also known as a document-oriented database, is a type of NoSQL database designed to store semi-structured data as documents, typically in JSON format. A key-value database, also known as a key-value store, is a type of NoSQL database that stores data as simple key and value pairs and treats the data as a single opaque collection of those pairs. Because it supports both models, DynamoDB offers flexibility during data modeling. We will see this flexibility in the hierarchical data scenario.
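To make the two data models concrete, here is a small Python sketch that converts a nested document into the typed attribute-value format used by DynamoDB’s low-level API (`S`, `N`, `BOOL`, `NULL`, `L`, `M`). The nested `Details` map shows how a JSON-like document lives inside a single item, while the item itself is still addressed by its key, like a key-value pair. The function names and the sample book are hypothetical, and the sketch skips sets, binary data, and floats; in practice boto3’s `TypeSerializer` or its `Table` resource performs this conversion for you.

```python
from decimal import Decimal
from typing import Any


def to_attribute_value(value: Any) -> dict:
    """Convert a plain Python value into DynamoDB's typed attribute-value format."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        # Numbers travel as strings so no precision is lost.
        return {"N": str(value)}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        # A nested dict becomes a map (M): the "document" side of DynamoDB.
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    # Floats, sets, bytes, etc. are deliberately not handled in this sketch.
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(document: dict) -> dict:
    """Turn a top-level document into an item: attribute name -> typed value."""
    return {name: to_attribute_value(v) for name, v in document.items()}


# Hypothetical book document with a nested map and a list.
book = {
    "Id": "b-001",
    "Title": "DynamoDB Basics",
    "Price": Decimal("19.99"),
    "InStock": True,
    "Details": {"Pages": 320, "Format": "Paperback"},
    "Tags": ["database", "aws"],
}
item = to_item(book)
```

Numbers are sent as strings (`{"N": "19.99"}`), which is why the sketch accepts `int` and `Decimal` but rejects `float`, similar to how boto3’s serializer refuses floats.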
Primary Keys
AWS DynamoDB supports two types of primary keys: the simple primary key and the composite primary key. A simple primary key consists of a partition key only. A composite primary key consists of a partition key and a sort key. The partition key of an item is also known as its hash attribute: DynamoDB passes the partition key value through an internal hash function, and the output determines the partition (the physical storage internal to DynamoDB) in which the item is stored. The sort key of an item is also known as its range attribute. DynamoDB stores all items with the same partition key physically close together, in sorted order by sort key value.
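The following toy model illustrates how the two parts of a composite key are used: the partition key picks a partition through a hash, and items that share a partition key are kept in sort-key order. It is a sketch under stated assumptions, not DynamoDB’s implementation: the MD5-based `partition_for` function and the fixed `NUM_PARTITIONS` are hypothetical stand-ins, since DynamoDB’s internal hash function and partition management are not exposed.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages partitions internally


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number.

    Illustrative stand-in only: DynamoDB's real hash function is not public.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class ToyTable:
    """Stores items grouped by partition key and ordered by sort key."""

    def __init__(self):
        # partition number -> partition key -> sorted list of (sort_key, item)
        self.partitions = defaultdict(lambda: defaultdict(list))

    def put(self, pk: str, sk: str, item: dict) -> None:
        rows = self.partitions[partition_for(pk)][pk]
        keys = [row[0] for row in rows]
        index = bisect.bisect_left(keys, sk)
        if index < len(rows) and rows[index][0] == sk:
            rows[index] = (sk, item)  # same full primary key: overwrite, like PutItem
        else:
            rows.insert(index, (sk, item))  # keep the item collection sorted

    def query(self, pk: str) -> list:
        """Return every item with this partition key, in sort-key order."""
        return [item for _, item in self.partitions[partition_for(pk)].get(pk, [])]


table = ToyTable()
table.put("pub-1", "book-3", {"Title": "C"})
table.put("pub-1", "book-1", {"Title": "A"})
table.put("pub-1", "book-2", {"Title": "B"})
table.put("pub-2", "book-9", {"Title": "Z"})
```

Querying `pub-1` returns its items ordered by sort key regardless of insertion order. Here sort keys are compared as strings; DynamoDB orders number sort keys numerically, which this sketch does not model.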
Secondary Indexes
Secondary indexes provide query flexibility. DynamoDB supports two kinds of secondary indexes: the global secondary index (GSI) and the local secondary index (LSI). A GSI is an index with a partition key and an optional sort key that can be different from those on the table. An LSI has the same partition key as the table but a different sort key. A GSI can be created at any time, but an LSI must be created when the table is created. Each table can have up to five LSIs and, by default, up to 20 GSIs. In the following scenarios, we will see how to combine primary keys and secondary indexes to support the access patterns with as few tables as possible. For more details on DynamoDB core components and DynamoDB cost, please review my previous posts.
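As an illustration of both index types, the sketch below builds the parameters of a CreateTable request in the shape accepted by boto3’s low-level `create_table` call, with one LSI (same partition key, different sort key) and one GSI (different keys). The table, index, and attribute names are hypothetical, and `check_lsi_rules` is a hypothetical helper that encodes the two LSI rules described above.

```python
def books_table_definition() -> dict:
    """Build CreateTable parameters for a hypothetical table with one LSI and one GSI."""
    return {
        "TableName": "Books",  # hypothetical table name
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            # Only attributes used in a table or index key are declared here.
            {"AttributeName": "PublisherId", "AttributeType": "S"},
            {"AttributeName": "BookId", "AttributeType": "S"},
            {"AttributeName": "PublishedDate", "AttributeType": "S"},
            {"AttributeName": "Genre", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PublisherId", "KeyType": "HASH"},
            {"AttributeName": "BookId", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key; creation time only.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "ByPublishedDate",
                "KeySchema": [
                    {"AttributeName": "PublisherId", "KeyType": "HASH"},
                    {"AttributeName": "PublishedDate", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # GSI: entirely different partition (and sort) key; can also be added later.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "ByGenre",
                "KeySchema": [
                    {"AttributeName": "Genre", "KeyType": "HASH"},
                    {"AttributeName": "PublishedDate", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


def check_lsi_rules(definition: dict) -> None:
    """Raise ValueError if the LSIs break the 'same partition key' or 'five LSIs' rules."""
    table_pk = next(k["AttributeName"] for k in definition["KeySchema"] if k["KeyType"] == "HASH")
    lsis = definition.get("LocalSecondaryIndexes", [])
    if len(lsis) > 5:
        raise ValueError("a table can have at most five local secondary indexes")
    for index in lsis:
        lsi_pk = next(k["AttributeName"] for k in index["KeySchema"] if k["KeyType"] == "HASH")
        if lsi_pk != table_pk:
            raise ValueError(f"{index['IndexName']} must use partition key {table_pk}")
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("dynamodb").create_table(**books_table_definition())`. A GSI can also be added to an existing table later through `update_table` with `GlobalSecondaryIndexUpdates`; there is no equivalent for an LSI.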
Now, let’s take a look at how to handle the common relationships in DynamoDB.
One-to-One Relationship
The one-to-one relationship is the most straightforward one. Each item has a unique attribute that identifies it for the access pattern, so we can use the key-value store model. For example, each book has an Id and an ISBN that uniquely identify it in the table, and you want to access a book’s information through either the Id or the ISBN. We define Id as the table’s partition key and store the remaining information as regular attributes. Then we create a GSI with ISBN as its partition key. This design covers both access patterns, retrieving the data by either unique attribute.
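A minimal sketch of this one-to-one design, assuming the request shapes of the boto3 low-level client: the table uses `Id` as a simple primary key, a GSI uses `ISBN` as its partition key, and two helper functions build the request for each access pattern. The table name `BooksById` and index name `ByISBN` are hypothetical.

```python
TABLE = "BooksById"    # hypothetical table name
ISBN_INDEX = "ByISBN"  # hypothetical index name


def books_by_id_table() -> dict:
    """CreateTable parameters: Id is the table's partition key, ISBN keys a GSI."""
    return {
        "TableName": TABLE,
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "Id", "AttributeType": "S"},
            {"AttributeName": "ISBN", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "Id", "KeyType": "HASH"}],  # simple primary key
        "GlobalSecondaryIndexes": [
            {
                "IndexName": ISBN_INDEX,
                "KeySchema": [{"AttributeName": "ISBN", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


def get_by_id_request(book_id: str) -> dict:
    """GetItem parameters: direct lookup on the table's primary key."""
    return {"TableName": TABLE, "Key": {"Id": {"S": book_id}}}


def query_by_isbn_request(isbn: str) -> dict:
    """Query parameters against the GSI; GetItem cannot target an index."""
    return {
        "TableName": TABLE,
        "IndexName": ISBN_INDEX,
        # A placeholder name avoids any clash with DynamoDB reserved words.
        "KeyConditionExpression": "#isbn = :isbn",
        "ExpressionAttributeNames": {"#isbn": "ISBN"},
        "ExpressionAttributeValues": {":isbn": {"S": isbn}},
    }
```

Lookup by Id can use GetItem, but lookup by ISBN must use Query, because GetItem only reads from the base table. A GSI also does not enforce uniqueness, so the application has to guarantee that each ISBN appears on only one item.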
One-to-Many Relationship
The one-to-many relationship is a parent-child relationship: one unique attribute is associated with multiple items. We can use a table or a GSI with a hash key and a range key to serve the access patterns. For example, one publisher can publish many books, and you want to access both the publishers’ and the books’ information. We define the publisher Id as the partition key and the book Id as the sort key. Then we create a GSI with the book Id as its partition key. The application can then, in separate queries, query the table by publisher Id to list that publisher’s books and query the GSI by book Id to fetch a single book together with its publisher.
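To show what each query returns, the following local simulation (plain Python, no DynamoDB calls) stores hypothetical publisher and book items and mimics a Query on the table by publisher Id and a Query on the book Id GSI.

```python
from operator import itemgetter

# Hypothetical items: PublisherId is the partition key, BookId the sort key.
ITEMS = [
    {"PublisherId": "pub-1", "BookId": "book-2", "Title": "Second Book"},
    {"PublisherId": "pub-1", "BookId": "book-1", "Title": "First Book"},
    {"PublisherId": "pub-2", "BookId": "book-3", "Title": "Third Book"},
]


def query_table(items: list, publisher_id: str) -> list:
    """Simulates Query on the table: all books of one publisher, ordered by BookId."""
    matches = [i for i in items if i["PublisherId"] == publisher_id]
    return sorted(matches, key=itemgetter("BookId"))


def query_book_gsi(items: list, book_id: str) -> list:
    """Simulates Query on a GSI keyed by BookId: the book together with its publisher."""
    return [i for i in items if i["BookId"] == book_id]
```

Against a real table, the first function corresponds to a Query with the key condition `PublisherId = :p`, and the second to a Query that sets `IndexName` to the GSI with the key condition `BookId = :b`.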
Many-to-Many Relationship
The many-to-many relationship is more complex. It describes the cardinality between two entities A and B in which an item in A can be related to many items in B, and vice versa. For example, a customer can buy multiple books, and the same book can appear in many customers’ purchase reports. The bookstore wants to produce both the customers’ purchase reports and the books’ sales reports. The adjacency list is the design pattern AWS suggests for modeling many-to-many relationships. We use a table and a GSI with the partition key and sort key switched to handle these access patterns. The table has the customer Id as the partition key and the book Id as the sort key, while the GSI has the book Id as the partition key and the customer Id as the sort key. In this way, the application can retrieve a customer’s purchase report with the customer Id and a book’s sales report with the book Id.
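The sketch below simulates the adjacency list with switched keys in plain Python: the same purchase items are grouped once by `CustomerId` (the table) and once by `BookId` (the GSI, often called an inverted index). DynamoDB maintains the GSI automatically; here `build_index` is a hypothetical helper that only imitates the resulting grouping and ordering, and the sample purchases are made up.

```python
from collections import defaultdict

# Hypothetical purchase items: the table key is (CustomerId, BookId).
PURCHASES = [
    {"CustomerId": "cust-1", "BookId": "book-1", "Qty": 1},
    {"CustomerId": "cust-1", "BookId": "book-2", "Qty": 2},
    {"CustomerId": "cust-2", "BookId": "book-1", "Qty": 1},
]


def build_index(items: list, pk_name: str, sk_name: str) -> dict:
    """Group items by pk_name, each group sorted by sk_name."""
    index = defaultdict(list)
    for item in items:
        index[item[pk_name]].append(item)
    for group in index.values():
        group.sort(key=lambda i: i[sk_name])
    return index


# Base table: partition key CustomerId, sort key BookId.
table = build_index(PURCHASES, "CustomerId", "BookId")
# GSI with the keys switched: partition key BookId, sort key CustomerId.
gsi = build_index(PURCHASES, "BookId", "CustomerId")


def purchase_report(customer_id: str) -> list:
    """Books bought by one customer (a single partition on the table)."""
    return [i["BookId"] for i in table.get(customer_id, [])]


def sales_report(book_id: str) -> list:
    """Customers and quantities for one book (a single partition on the GSI)."""
    return [(i["CustomerId"], i["Qty"]) for i in gsi.get(book_id, [])]
```

Each report reads a single partition key, which is the point of the pattern: no joins are needed at read time.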
Hierarchical Data
Hierarchical data is a common data model in which the data represents a tree-like structure. To design this tree-like structure in DynamoDB, we first identify the access patterns. For example, the bookstore locations have four layers of hierarchy: country, state, city, and office. We want to retrieve an individual bookstore, and we also need to list the stores in a particular area (e.g. country, state, city, or zip code). This example is very similar to the office example in Rick Houlihan’s talk Advanced Design Patterns for Amazon DynamoDB at re:Invent 2017. We design the bookstores table with the bookstore Id as the partition key to satisfy the first access pattern. For the second access pattern, we create a GSI with the country as the partition key and a composite store index (e.g. state#city#zipcode) as the sort key. Because the sort key concatenates the levels from broadest to narrowest, a query on the GSI for a given country with a begins_with condition on the sort key (for example, begins_with "WA#Seattle#") returns all stores in that state or city.
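Here is a sketch of the composite sort key and the prefix query, assuming a GSI (hypothetically named `ByLocation` on a `Bookstores` table) with `Country` as its partition key and `StoreIndex` as its sort key. `stores_query_request` builds Query parameters in the boto3 low-level shape, and `stores_in_area` simulates the same query locally so the example runs without AWS. The store data is hypothetical.

```python
DEPTH = 3  # levels in StoreIndex: state, city, zipcode


def make_store(store_id: str, country: str, state: str, city: str, zipcode: str) -> dict:
    """Build a hypothetical store item; StoreIndex is the GSI sort key."""
    return {"StoreId": store_id, "Country": country, "StoreIndex": f"{state}#{city}#{zipcode}"}


def area_prefix(*levels: str) -> str:
    """Sort-key prefix for an area, e.g. ('WA', 'Seattle') -> 'WA#Seattle#'."""
    if not levels:
        return ""
    key = "#".join(levels)
    # Close partial prefixes with '#' so 'WA#Seattle' cannot match a city like 'SeattleNorth'.
    return key + "#" if len(levels) < DEPTH else key


def stores_query_request(country: str, *levels: str) -> dict:
    """Query parameters for a GSI keyed by Country (partition) and StoreIndex (sort)."""
    request = {
        "TableName": "Bookstores",  # hypothetical table name
        "IndexName": "ByLocation",  # hypothetical index name
        "KeyConditionExpression": "#c = :c",
        "ExpressionAttributeNames": {"#c": "Country"},
        "ExpressionAttributeValues": {":c": {"S": country}},
    }
    if levels:  # a country-only query needs no sort-key condition
        request["KeyConditionExpression"] += " AND begins_with(#idx, :prefix)"
        request["ExpressionAttributeNames"]["#idx"] = "StoreIndex"
        request["ExpressionAttributeValues"][":prefix"] = {"S": area_prefix(*levels)}
    return request


def stores_in_area(stores: list, country: str, *levels: str) -> list:
    """Local simulation of that query: same partition, sort key starts with the prefix."""
    prefix = area_prefix(*levels)
    hits = [s for s in stores if s["Country"] == country and s["StoreIndex"].startswith(prefix)]
    # A GSI query returns items in sort-key order.
    return [s["StoreId"] for s in sorted(hits, key=lambda s: s["StoreIndex"])]


STORES = [
    make_store("s1", "USA", "WA", "Seattle", "98101"),
    make_store("s2", "USA", "WA", "Seattle", "98109"),
    make_store("s3", "USA", "WA", "Spokane", "99201"),
    make_store("s4", "USA", "CA", "San Francisco", "94105"),
]
```

Because the levels are ordered from broadest to narrowest, one index serves every level of the hierarchy: a shorter prefix simply widens the area.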
Rick Houlihan’s talk also covers another example: the bookstore sells multiple kinds of products, such as books, movies, and music albums. We can create a table with the product Id as the partition key and the category as the sort key, then put the remaining attributes of each product into a JSON document stored as a single attribute. You may also want to check this example of how to use the adjacency list design pattern to model complex hierarchical HR data in DynamoDB.
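A minimal sketch of this product layout, assuming the details are stored as a JSON string attribute named `Details` (a hypothetical name, as are the sample products). `make_product_item` packs the category-specific attributes with `json.dumps`, and the application decodes them again with `json.loads`.

```python
import json


def make_product_item(product_id: str, category: str, **details) -> dict:
    """Build a hypothetical product item: ProductId (partition key), Category (sort key)."""
    return {
        "ProductId": product_id,
        "Category": category,
        # All remaining attributes are packed into one JSON string.
        # sort_keys keeps the stored text deterministic; DynamoDB cannot index fields inside it.
        "Details": json.dumps(details, sort_keys=True),
    }


def read_details(item: dict) -> dict:
    """Decode the JSON attribute back into a dict on the application side."""
    return json.loads(item["Details"])


book = make_product_item("p-100", "Book", title="DynamoDB Basics", author="A. Writer", pages=320)
movie = make_product_item("p-200", "Movie", title="Data Story", runtime_minutes=120)
```

Storing the details as a native DynamoDB map (`M`) instead of a string is an alternative; in either case the fields inside the document cannot serve as index keys.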
In conclusion, DynamoDB provides flexibility for data modeling: we don’t need to define a fixed schema up front. However, we do need to think through the access patterns and, in general, serve them with as few tables as possible. Keep in mind the following limitations of DynamoDB. A table can have up to five LSIs and, by default, up to 20 GSIs. DynamoDB can’t index attributes nested inside a JSON document; index key attributes must be top-level attributes of type string, number, or binary. The maximum item size in DynamoDB is 400 KB.
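The 400 KB limit counts attribute names as well as values. As a rough check, the sketch below estimates the size of an item whose attributes are all strings; for string attributes DynamoDB counts the UTF-8 length of each attribute name plus the UTF-8 length of its value. It assumes 400 KB means 400 * 1024 bytes and does not model numbers, binary, maps, or lists, which have their own sizing rules.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: 400 KB interpreted as 400 * 1024 bytes


def string_item_size(item: dict) -> int:
    """Estimate the size of an item whose attributes are all strings.

    Counts the UTF-8 length of each attribute name plus the UTF-8 length of its value.
    Other attribute types use different rules that this sketch does not model.
    """
    total = 0
    for name, value in item.items():
        if not isinstance(value, str):
            raise TypeError(f"{name}: only string attributes are modeled")
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total


def fits_in_one_item(item: dict) -> bool:
    """True if the estimated size is within the item size limit."""
    return string_item_size(item) <= MAX_ITEM_BYTES
```

Items that approach the limit are usually split across several items sharing a partition key, or the large payload is stored in Amazon S3 with a reference kept in DynamoDB.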