How to Fast-Track Massive Data Migration to Amazon S3

    Posted Nov 25, 2017 04:30 AM

    In many enterprises, applications use enormous amounts of data every day. Whatever the type of enterprise data, data loss or reduced data availability can cause significant financial losses. To meet these challenges, organizations increasingly rely on cloud storage and cloud computing to store their data securely.


    Amazon S3 is a secure cloud storage service that supports encryption, cross-region replication, and data access control. Amazon S3 is also designed for 99.99% availability and 99.999999999% durability, so it's clear why enterprise organizations use it.


    But how can you migrate large amounts of data to S3 without disrupting your application? In a previous post, we discussed several ways to upload data to Amazon S3. In this post, we show you how to migrate large amounts of data to S3. The larger the dataset, the harder it is to maintain business continuity during the migration and to transition to Amazon S3 seamlessly.

    Plan Your Data Migration

    No matter what your reason is for migrating your data, the goal remains the same: Perform the process safely and quickly to maintain business continuity. To prevent any complications, you need to plan each step of your migration process carefully.


    A serious challenge that you might face during migration is time. So, how much time will you need to transfer your data to Amazon S3? You can estimate the amount of time that you will need with the following formula:


    Number of days = (Total bytes) / (Megabits per second * 125 * 1,000 * Network utilization * 60 seconds * 60 minutes * 24 hours)


    Let's say that you want to migrate 10TB of data over your company's 10Mbps internet connection, and you assume that 80% of that bandwidth is available for the transfer (a network utilization of 0.8). In this case, the formula gives approximately 116 days, counting 10TB as 10 × 10^12 bytes.
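    The formula above can be checked with a few lines of Python. This is a minimal sketch: the function name `transfer_days` and its parameter names are illustrative, and it assumes decimal units (10TB = 10 × 10^12 bytes). With those units the formula as written yields roughly 116 days; the result shifts by several days if binary units are used instead.

```python
def transfer_days(total_bytes: float,
                  megabits_per_second: float,
                  network_utilization: float) -> float:
    """Estimate the transfer time in days, following the formula in the text."""
    if megabits_per_second <= 0:
        raise ValueError("megabits_per_second must be positive")
    if not 0 < network_utilization <= 1:
        raise ValueError("network_utilization must be in the range (0, 1]")

    # 1 megabit per second = 125 * 1,000 bytes per second.
    bytes_per_second = megabits_per_second * 125 * 1_000 * network_utilization
    seconds_per_day = 60 * 60 * 24
    return total_bytes / (bytes_per_second * seconds_per_day)


# Assumption: decimal terabytes, that is, 1TB = 10**12 bytes.
TEN_TB = 10 * 10**12

if __name__ == "__main__":
    # 10TB over a 10Mbps link at 80% utilization.
    print(f"{transfer_days(TEN_TB, 10, 0.8):.1f} days")
```

    Rerunning the estimate with a higher bandwidth value shows how much a faster or dedicated link shortens the migration, which is the motivation for the tools discussed next.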


    As you can see, this migration method would demand a lot of time. However, there are several ways to shorten the process of migrating large amounts of data to Amazon S3 or to another Amazon Web Services (AWS) storage location. For example, you can use AWS-managed data migration tools or a third-party tool.

    Choose the Right Data Migration Tool

    AWS Direct Connect enables you to establish a dedicated network connection between your data center and one of the AWS Direct Connect locations. This connection allows you to create a virtual interface directly to your AWS environment and will enable a private, non-internet-routed connection. By using AWS Direct Connect, you increase your network throughput and decrease the time that’s required for data migration. It also helps lower your network expenses and facilitates a much more stable connection than the one that you have from your data center over the internet.


    However, there is often so much data that even AWS Direct Connect can't reduce the migration time to a reasonable amount. In such cases, you have AWS Snowball at your disposal. AWS Snowball is a physical data-transfer device that accelerates moving large amounts of data into and out of the AWS cloud, and it helps you avoid some of the biggest challenges of transferring data over the network. For example, Snowball applies multiple layers of security, including encryption of the data stored on the device, to protect your data while it is in transit.


    If Snowball’s 80TB capacity isn’t enough for you, you can use AWS Snowball Edge. This 100TB data-transfer device offers storage and compute resources, making it a true mini AWS data center.
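    As a rough sizing aid, the number of devices needed is the dataset size divided by the device capacity, rounded up. The sketch below uses the 80TB and 100TB figures quoted above; the function name `devices_needed` and the 250TB example dataset are hypothetical, and the usable capacity of a real device may be lower than its nominal capacity.

```python
import math


def devices_needed(total_tb: float, device_capacity_tb: float) -> int:
    """Return how many devices of the given capacity hold total_tb of data."""
    if total_tb < 0:
        raise ValueError("total_tb must not be negative")
    if device_capacity_tb <= 0:
        raise ValueError("device_capacity_tb must be positive")
    return math.ceil(total_tb / device_capacity_tb)


if __name__ == "__main__":
    dataset_tb = 250  # hypothetical dataset size
    print("80TB devices:", devices_needed(dataset_tb, 80))
    print("100TB devices:", devices_needed(dataset_tb, 100))
```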


    You also have yet another option: AWS Storage Gateway. This storage service enables simple data backup to the cloud. It connects your on-premises data center to the AWS cloud and ensures integration between your environment and the AWS storage infrastructure.


    But what if all these tools still aren’t enough? Read on.

    Standard Data Migration Options

    Now we look at the architecture of the data migration options that the previously mentioned tools use. The first use case that we evaluate is a massive data migration from your data center to the AWS cloud. To migrate a very large dataset as quickly as possible, the best solution is AWS Snowball (Figure 1). AWS Snowball is suitable for migrating 50TB to 80TB of data in a single import job. You start the migration process by creating a new data transfer job in the AWS Snowball console. AWS then delivers a Snowball device to your data center.


    After AWS Snowball has arrived, you connect to your Snowball interface and transfer data from your own storage devices to Snowball. When the data transfer is complete, you disconnect the Snowball device from your network and prepare it for delivery to AWS. Your Snowball has a tracking number that helps you track its progress toward the designated AWS data center. When Snowball arrives at the designated AWS data center, the process of importing data to Amazon S3 storage begins. You can monitor the entire process that your data goes through by using the AWS Snowball console.

    Figure 1. Migration of large amounts of data by using AWS Snowball.

    If you have a hybrid cloud environment, you want stable network performance between your on-premises systems and your AWS systems during ongoing data migration. In that case, you will most likely use AWS Direct Connect (Figure 2). As we already mentioned, AWS Direct Connect provides a dedicated connection to your AWS cloud, thus bypassing your ISP. In this setup, a gateway-cached volume (an AWS Storage Gateway volume type) in your data center caches data during migration to Amazon S3.


    To use AWS Direct Connect, you must create an AWS Direct Connect connection between your on-premises infrastructure and the AWS cloud. After establishing that connection, you set up an iSCSI connection through the IP address of your storage gateway. When the setup is complete, the data that your users create through your application is stored in your on-premises storage. The gateway-cached volume then acts as cache storage while data waits to be migrated to Amazon S3.

    Figure 2. Data migration by using AWS Direct Connect.

    A Better Data Migration Option

    The preceding options are good, proven solutions to help with your data migrations to Amazon S3. But what if you’re looking for something that’s a little simpler to use and that lets you get started immediately? Or what if you have other data migration needs, like going directly from one NFS server to another? Or what if you want to facilitate your CIFS data migrations?


    For these scenarios, you can use a wide variety of DIY tools. But if you're looking for something that's really simple, efficient, and cost-effective, you're better off with a data migration service like NetApp Cloud Sync (Figure 3).


    NetApp Cloud Sync is an intuitive data migration service. You can transfer and synchronize your data from any NFS (v3 or v4) or CIFS file system to or from Amazon S3 or to or from another NFS or CIFS server. Cloud Sync takes care of all the complexities that are involved in data movement, synchronization, and integrity checks. With the easy-to-understand interface and dashboard, you can easily establish new data replication relationships and can quickly see the state of your existing relationships. Thanks to the ability of Cloud Sync to parallelize data transfers, you can measure the duration of data transfer in minutes, not hours. And after the initial synchronization is complete, only the changes in data are synchronized on the next synchronization schedule.
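    The idea of synchronizing only the changes after an initial copy can be illustrated with a small sketch. This is not Cloud Sync's implementation, which the text does not describe; it is a generic illustration that assumes a file counts as changed when its size or modification time differs between source and target. The names `FileState` and `paths_to_sync` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FileState(NamedTuple):
    """Metadata used to decide whether a file has changed (an assumption)."""
    size: int
    mtime: float


def paths_to_sync(source: Dict[str, FileState],
                  target: Dict[str, FileState]) -> List[str]:
    """Return source paths that are new or differ from the target copy."""
    return sorted(
        path for path, state in source.items()
        if target.get(path) != state
    )


if __name__ == "__main__":
    source = {
        "a.txt": FileState(10, 1.0),  # unchanged
        "b.txt": FileState(20, 2.0),  # modified since the last sync
        "c.txt": FileState(5, 3.0),   # new file
    }
    target = {
        "a.txt": FileState(10, 1.0),
        "b.txt": FileState(20, 1.5),
    }
    print(paths_to_sync(source, target))
```

    Only the new and modified files are selected for the next run, so a scheduled synchronization moves far less data than the initial copy.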


    When you are ready to try Cloud Sync with the 14-day free trial, make sure that your system is properly configured. First, you need an AWS account. Because Cloud Sync is a software-as-a-service (SaaS) offering from the AWS Marketplace, you subscribe to Cloud Sync through AWS after the 14-day free trial ends.


    After your AWS account is set up, make sure that you have network connectivity between your NFS or CIFS servers and your chosen destination, whether it's Amazon S3 or another NFS or CIFS server. Your NFS or CIFS servers can be storage appliances that run in your AWS account or on premises. If they are on premises, you need a VPN connection or a Direct Connect connection to your AWS account.


    When you have your networking in order, you then need to configure a data broker. A data broker is effectively the “engine” that helps perform the migration of data from the source to the destination system. Cloud Sync makes it very easy to launch the data broker in your AWS account. It also gives you the option to launch the data broker on a virtual machine in your own data center if you prefer.

    Figure 3. Synchronization relationship from an NFS server to an S3 bucket.

    The data broker synchronizes data according to the schedule that you define. That way, you don’t have to spend time creating scripts and constantly monitoring the migration process. Cloud Sync is a “set it and forget it” service with an intuitive and clear management web dashboard and with great alerting.


    Migrating your data into and out of a cloud environment is never a simple task. But with the increased number of AWS-managed tools and alternative data migration services, the migration process is becoming easier. In most cases, AWS options help you with one-way migration, but migrating data from Amazon S3 storage back to your on-premises data center or to an alternate destination can be a very demanding task. AWS Snowball can help you export your data from S3 storage back to your own data center, but it can’t help you with data synchronization. For secure, cost-effective, and fast data migration and synchronization, NetApp Cloud Sync is an excellent choice.