How to incrementally migrate DynamoDB data to Table Store
Published: 2019-06-19


Amazon DynamoDB is a fully-managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB can dynamically scale tables as needed, without interrupting external services or compromising performance. These capabilities made it very popular among AWS users when the service was released.

Alibaba Cloud Table Store is also a distributed NoSQL database service, built on Alibaba Cloud's Apsara distributed file system. As a cloud-based NoSQL database service that automatically scales tables, Table Store is very similar to DynamoDB. Table Store enables seamless expansion of data size and access concurrency through an automatic load-balancing system, providing storage and real-time access for massive amounts of structured data.

Table Store lets you offload the administrative burdens of operating and scaling a distributed database, so that you don't have to worry about hardware malfunctions, setup and configuration, replication, software patching, or upgrades.

In this article, we will show you how to incrementally migrate DynamoDB data to Table Store.

Data conversion rules

Table Store supports the following data formats:

  • String: can be empty, and can be used as a primary key column. For a primary key column, the maximum size is 1 KB; for an attribute column, 2 MB.
  • Integer: a 64-bit integer, 8 bytes; can be used as a primary key column.
  • Binary: can be empty, and can be used as a primary key column. For a primary key column, the maximum size is 1 KB; for an attribute column, 2 MB.
  • Double: a 64-bit double, 8 bytes.
  • Boolean: true or false, with a maximum size of 1 byte.

Currently, DynamoDB supports the following data formats:

  • Scalar type - A scalar type represents exactly one value. The scalar types are number, string, binary, boolean, and null.
  • Document type - A document type represents a complex structure with nested attributes, such as the structure you find in a JSON file. The document types are list and map.
  • Set type - A set type represents multiple scalar values. The set types are string set, number set, and binary set.

When migrating such data, you must serialize document-type and set-type values into strings or binary for storage in Table Store. When reading the data back, you must deserialize it into JSON format.

When preparing for migration from DynamoDB to Table Store, perform the following data conversions:

Note: The format conversion given below is only for reference. Decide how to convert formats based on your own business needs.

| DynamoDB type | Data example | Corresponding Table Store type |
|---|---|---|
| number (N) | '123' | Integer |
| number (N) | '2.3' | Double (cannot be a primary key) |
| null (NULL) | true | String (empty string) |
| binary (B) | 0x12315 | Binary |
| binary_set (BS) | { 0x123, 0x111 } | Binary |
| bool (BOOL) | true | Boolean |
| list (L) | [ { "S" : "a" }, { "N" : "1" } ] | String |
| map (M) | { "key1" : { "S" : "value1" } } | String |
| str (S) | This is test! | String |
| num_set (NS) | { 1, 2 } | String |
| str_set (SS) | { "a", "b" } | String |
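
As a concrete illustration of the mapping above, the following minimal sketch converts a single DynamoDB typed attribute into a Table Store value. convert_attribute is a hypothetical helper for illustration, not a function from the migration package:

```python
import json

def convert_attribute(attr):
    """Convert a DynamoDB typed attribute such as {"N": "2.3"}
    into a Table Store value, per the mapping table above."""
    (dtype, value), = attr.items()
    if dtype == 'N':
        # Integer if whole, otherwise Double (Double cannot be a primary key)
        return int(value) if '.' not in value else float(value)
    if dtype == 'NULL':
        return ''                    # null becomes an empty string
    if dtype == 'B':
        return bytearray(value)      # binary stays binary
    if dtype == 'BOOL':
        return value                 # boolean stays boolean
    if dtype == 'S':
        return value                 # string stays string
    # document types (L, M) and number/string sets (NS, SS) are
    # serialized to JSON strings; binary sets need their own handling
    return json.dumps(value)
```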

Incremental data migration system

When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table. AWS Lambda lets you run the synchronization code without provisioning an environment. The process is shown in the following figure:

[Figure: Data flow from the DynamoDB Stream through Lambda to Table Store]

Use the eventName field in the stream record to detect INSERT, MODIFY, and REMOVE operations, as sketched in the code below:

  • INSERT inserts data, similar to PutRow.
  • MODIFY modifies data:
    • If the OldImage and NewImage have identical keys, a data update operation is performed, similar to Update.
    • If the OldImage has more keys than the NewImage, the difference-set keys are deleted, similar to Delete.
  • REMOVE deletes the row, similar to DeleteRow.
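
The dispatch logic can be outlined as follows. This is a minimal sketch; put_row, update_row, and delete_row are illustrative placeholders for the package's actual write helpers:

```python
def handle_record(record):
    """Dispatch a single DynamoDB Streams record by eventName."""
    name = record['eventName']
    data = record['dynamodb']
    if name == 'INSERT':
        # insert the whole new image, like PutRow
        put_row(data['Keys'], data['NewImage'])
    elif name == 'MODIFY':
        old, new = data['OldImage'], data['NewImage']
        updates = dict((k, new[k]) for k in new)    # updated/added columns
        deletes = [k for k in old if k not in new]  # removed columns
        update_row(data['Keys'], updates, deletes)  # like Update
    elif name == 'REMOVE':
        delete_row(data['Keys'])                    # like DeleteRow
```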

SPECIAL NOTE:

  • The conversion behaviors derived from the stream (insert, modify, and remove) conform to the expectations described above.
  • Table Store currently does not support secondary indexes, so only data from primary tables can be synced.
  • For the primary keys of the DynamoDB table and the Table Store table to stay consistent, number-type primary keys must be integers.
  • DynamoDB restricts the maximum size of an individual item to 400 KB, whereas Table Store has no size restriction on individual rows. Note, however, that no more than 4 MB of data can be submitted in a single batch request.
  • If you perform a full data migration first, you must enable the stream in advance. Because a DynamoDB stream only retains data from the past 24 hours, you must complete the full migration within 24 hours. After completing the full migration, you can enable the Lambda migration task.
  • The synchronization guarantees eventual consistency of the data. During incremental synchronization, some of the full data may be rewritten. For example, if you enable the stream and start a full migration at T0, which completes at T1, the DynamoDB operations performed between T0 and T1 are written to Table Store again by the incremental sync.

Procedures

  1. Create a data table in DynamoDB
    Here, we use the table Source as an example, with the primary key user_id (string type) and the sort key action_time (number type). Pay attention to DynamoDB's reserved capacity settings, because they affect the read/write concurrency.

[Figure: Creating the Source table in the DynamoDB console]
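
If you prefer scripting over the console, a table like Source can also be created with boto3. This is a sketch; the capacity values are illustrative:

```python
import boto3

dynamodb = boto3.client('dynamodb')
dynamodb.create_table(
    TableName='Source',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},      # primary key
        {'AttributeName': 'action_time', 'KeyType': 'RANGE'}  # sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'action_time', 'AttributeType': 'N'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)
```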

  2. Enable Stream for the Source table
    For the stream view type, select New and old images (both the new and old images of the item).

[Figure: Enabling a stream on the Source table]
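
The same setting can be applied to an existing table through the API (a sketch using boto3):

```python
import boto3

dynamodb = boto3.client('dynamodb')
dynamodb.update_table(
    TableName='Source',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'  # both new and old images
    }
)
```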

  3. Go to the IAM console and create a role

    To create an IAM role (execution role) for this exercise, do the following:

    1. Log on to the IAM console.
    2. Choose Roles, and then choose Create role.
    3. In Select type of trusted entity, choose AWS service, and then choose Lambda.
    4. Choose Next: Permissions.
    5. In Filter: Policy type, enter AWSLambdaDynamoDBExecutionRole and choose Next: Review.
    6. In Role name*, enter a role name that is unique within your AWS account (for example, lambda-dynamodb-execution-role) and then choose Create role.

  4. Go to the Lambda console and create the data sync function
    Enter the function name data-to-tablestore, select Python 2.7 as the runtime, and use the role lambda-dynamodb-execution-role.

[Figure: Creating the data-to-tablestore function in the Lambda console]

  5. Associate the function with the Lambda event source
    Click the DynamoDB button to configure the event source. Set the Source table's batch size to 10 to test in small batches. Because Table Store's batch operations are limited to 200 rows of data, the value cannot exceed 200; in practice, we suggest setting it to 100. The same association can be made through the API, as sketched after the figure below.

[Figure: Configuring the DynamoDB event source with a batch size of 10]
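
A sketch of the equivalent API calls with boto3; this looks up the Source table's stream ARN and wires it to the function, which is what the console does behind the scenes:

```python
import boto3

dynamodb = boto3.client('dynamodb')
lambda_client = boto3.client('lambda')

# the stream ARN of the Source table created earlier
stream_arn = dynamodb.describe_table(
    TableName='Source')['Table']['LatestStreamArn']

lambda_client.create_event_source_mapping(
    EventSourceArn=stream_arn,
    FunctionName='data-to-tablestore',
    BatchSize=100,                   # stay under Table Store's 200-row batch limit
    StartingPosition='TRIM_HORIZON'
)
```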

  6. Configure the Lambda function
    Click the Lambda function icon to configure the function.

    Table Store relies on SDKs, protobuf, and other dependency packages, so you must install and package the SDK dependencies into the function's deployment package.
    Upload the function zip package directly from your local device, or upload it to S3 first.
    The default handler entry point is lambda_function.lambda_handler.
    In Basic Settings, set the timeout to at least 1 minute, in consideration of the batch submission delay and network transmission time.
[Figure: Lambda function configuration]

  7. Configure Lambda operation variables

    To import data, you must provide the Table Store instance name, AccessKey, and other information. You can use either of the following methods:

  • Method 1 (recommended): Configure the relevant environment variables directly in Lambda, as shown in the following figure.

    Using Lambda environment variables lets a single function code zip package support different data tables, so you do not need to modify the configuration file in the code package for each data source.

[Figure: Configuring Lambda environment variables]

  • Method 2: Open lambda_function.zip, modify example_config.py, and repackage it for upload. Alternatively, modify it in the console after uploading.

[Figure: Modifying example_config.py]

    Configuration description:

    | Environment variable | Required | Description |
    |---|---|---|
    | OTS_ID | Yes | The AccessKeyId used to access Table Store. |
    | OTS_SECRET | Yes | The AccessKeySecret used to access Table Store. |
    | OTS_INSTANCE | Yes | The name of the Table Store instance to import into. |
    | OTS_ENDPOINT | No | The Table Store endpoint to import into. If not set, the instance's default Internet endpoint is used. |
    | TABLE_NAME | Yes | The name of the Table Store table to import into. |
    | PRIMARY_KEY | Yes | The primary key information of the target table. Ensure that the primary key order and names are consistent with the source table. |

    SPECIAL NOTE:

    • If a variable is defined in both places, the value in Lambda's environment configuration takes precedence; if it is not set there, the value is read from example_config.py.
    • The AccessKey grants access permissions to your resources. We strongly recommend using the AccessKey of a Table Store subaccount that has only write permission to the specified resource, because this reduces the risk of AccessKey leakage.
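
The lookup order described above can be sketched as follows. get_conf is an illustrative helper, not necessarily the package's actual function name:

```python
import os
import example_config  # the config file shipped inside lambda_function.zip

def get_conf(name):
    """Read a setting from Lambda environment variables first,
    then fall back to example_config.py."""
    value = os.environ.get(name)
    if value is None:
        value = getattr(example_config, name, None)
    return value

OTS_INSTANCE = get_conf('OTS_INSTANCE')
TABLE_NAME = get_conf('TABLE_NAME')
```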
  8. Create a data table in Table Store

    In the Table Store console, create a data table named Target, with the primary keys user_id (string) and action_time (integer).

[Figure: The Target table in the Table Store console]
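
Equivalently, the Target table can be created through the Table Store Python SDK. This is a sketch assuming a recent version of the tablestore package; the endpoint and credential values are placeholders:

```python
from tablestore import (OTSClient, TableMeta, TableOptions,
                        ReservedThroughput, CapacityUnit)

# placeholders: fill in your instance's endpoint, keys, and name
client = OTSClient('<OTS_ENDPOINT>', '<AccessKeyId>',
                   '<AccessKeySecret>', '<instance-name>')

# primary key schema matching the DynamoDB Source table
schema = [('user_id', 'STRING'), ('action_time', 'INTEGER')]
client.create_table(TableMeta('Target', schema),
                    TableOptions(),
                    ReservedThroughput(CapacityUnit(0, 0)))
```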

  9. Perform testing and debugging

    Edit the event source on the Lambda console for debugging.

    Click Configure Test Event in the upper-right corner and enter the JSON content for a sample event.

    In this article, we use three Stream sample events:

    • test_data_put.json simulates the insertion of a row of data in DynamoDB.
    • test_data_update.json simulates the update of a row of data in DynamoDB.
    • test_data_delete.json simulates the deletion of a row of data in DynamoDB.

Save the contents of the above three events as putdata, updatedata, and deletedata.
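
For reference, a putdata-style event follows the standard DynamoDB Streams record shape. The field values below are illustrative, not the article's exact sample:

```json
{
  "Records": [
    {
      "eventID": "1",
      "eventName": "INSERT",
      "eventSource": "aws:dynamodb",
      "awsRegion": "us-east-1",
      "dynamodb": {
        "Keys": {
          "user_id": {"S": "u0001"},
          "action_time": {"N": "1498036235"}
        },
        "NewImage": {
          "user_id": {"S": "u0001"},
          "action_time": {"N": "1498036235"},
          "message": {"S": "hello tablestore"}
        },
        "StreamViewType": "NEW_AND_OLD_IMAGES"
      }
    }
  ]
}
```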

[Figures: Creating and saving the test events in the Lambda console]

After saving, select the event you want to use and click Test:

If the execution result shows the test was successful, you can read the following test data from the Target table in Table Store.

Select putdata, updatedata, and deletedata in sequence. You will see the data in Table Store being inserted, updated, and deleted accordingly.

[Figures: Test data written, updated, and deleted in the Target table]

  10. Test in practice
    If the tests are successful, write a new row of data in DynamoDB. You can then read this row immediately in Table Store, as shown in the following figures.

[Figures: A new DynamoDB row appearing in Table Store]

  11. Troubleshooting
    All Lambda operation logs are written to CloudWatch. In CloudWatch, select the appropriate function name to query the Lambda operation status in real time.

[Figures: Lambda logs in CloudWatch]

Code analysis

In the Lambda function, the main code logic is in lambda_function.py; the other files are SDK source code used by the function. lambda_function.py includes the following functions, the first and fifth of which are sketched below:

  • def batch_write_row(client, put_row_items) batch-writes grouped data items (covering insert, modify, and remove) to Table Store.
  • def get_primary_key(keys) gets the source and target tables' primary key information based on the PRIMARY_KEY variable.
  • def generate_update_attribute(new_image, old_image, key_list) analyzes MODIFY operations in the stream to determine whether attribute columns were updated or deleted.
  • def generate_attribute(new_image, key_list) gets the attribute column information inserted by a single record.
  • def get_tablestore_client() initializes the Table Store client based on the instance name, AccessKey, and other variables.
  • def lambda_handler(event, context) is the Lambda entry function.
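
A simplified sketch of the client initialization and batch write steps, assuming the official tablestore Python SDK; the real lambda_function.py adds record grouping, config lookup, and fuller error handling:

```python
from tablestore import (OTSClient, BatchWriteRowRequest,
                        TableInBatchWriteRowItem)

def get_tablestore_client():
    # endpoint, keys, and instance name come from the variables in step 7
    return OTSClient(OTS_ENDPOINT, OTS_ID, OTS_SECRET, OTS_INSTANCE)

def batch_write_row(client, put_row_items):
    # put_row_items is a list of PutRowItem/UpdateRowItem/DeleteRowItem
    # objects built from the Stream records
    request = BatchWriteRowRequest()
    request.add(TableInBatchWriteRowItem(TABLE_NAME, put_row_items))
    result = client.batch_write_row(request)
    success, failed = result.get_put()   # per-row results for the puts
    for row in failed:
        print('write failed: %s %s' % (row.error_code, row.error_message))
```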

For more complex synchronization logic, you can modify the sample code accordingly.

The status logs printed in lambda_function.py do not distinguish between INFO and ERROR. To ensure data consistency during synchronization, you must process the logs and monitor operation status, or use Lambda's error handling mechanism to ensure fault-tolerant handling of exceptions.
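
One way to separate the levels (a sketch) is to route statuses through Python's logging module, so that CloudWatch metric filters can alert on ERROR lines instead of plain prints:

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_batch_result(success_rows, failed_rows):
    # successes at INFO, failures at ERROR for easy filtering
    logger.info('batch ok: %d rows', len(success_rows))
    for row in failed_rows:
        logger.error('batch failed: %s %s', row.error_code, row.error_message)
```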
