Thursday, September 29, 2016

AWS Lambda and S3 - How to do a Cross-Account Bucket Copy

Sometimes it is necessary to do an AWS S3 cross-account bucket replication. For example, you have two AWS accounts: one is your "production" account and the other is your "audit" account. In the production account you have an S3 bucket called "access-logs" which stores all your important access logs, and you want to copy these log files over to the "audit-access-logs" bucket in the audit account, and also set up a trigger so that whenever there are changes in access-logs, the same changes are mirrored in audit-access-logs. That's where an AWS Lambda function comes in.

Steps:
1. Create the source bucket - "access-logs" - in the "production" account
2. Create the destination bucket - "audit-access-logs" - in the "audit" account
3. Create two IAM users, "source_client" (in the production account) and "dest_client" (in the audit account). "source_client" will be used to read the access logs in the "access-logs" bucket, and "dest_client" will be used to upload logs to "audit-access-logs". Make sure "source_client" has read-only access to "access-logs" and "dest_client" has write-only access to "audit-access-logs". You can control the access permissions by creating a bucket policy or an IAM policy (a small policy sketch follows the function code below).
4. Create an "access-id" bucket in "production". In this bucket, create four files - "source_client_id", "source_client_key", "dest_client_id", "dest_client_key" - each containing the corresponding AWS access key ID or secret access key (an upload sketch also follows the function code below). Make sure this bucket is readable only by the Lambda execution role, for example lambda_s3_exec_role! This is very important!
5. Go to "AWS" -> "Lambda"
6. Create a function called "s3BucketCopy"; it will look like the following:
from __future__ import print_function

import boto3
import os

# For security, we store the access IDs and secret keys in an S3 bucket
# instead of hard-coding them in the function.
print("Retrieve access ID and access secret keys for source and dest clients")
s3 = boto3.resource('s3')

# Source client (read the credential strings; strip any trailing newline)
obj = s3.Object('access-id', 'source_client_id')
source_client_id = obj.get()["Body"].read().decode('utf-8').strip()
obj = s3.Object('access-id', 'source_client_key')
source_client_key = obj.get()["Body"].read().decode('utf-8').strip()

# Destination client
obj = s3.Object('access-id', 'dest_client_id')
dest_client_id = obj.get()["Body"].read().decode('utf-8').strip()
obj = s3.Object('access-id', 'dest_client_key')
dest_client_key = obj.get()["Body"].read().decode('utf-8').strip()

# source_client user
source_client = boto3.client(
    's3',
    aws_access_key_id=source_client_id,
    aws_secret_access_key=source_client_key)

# dest_client user
dest_client = boto3.client(
    's3',
    aws_access_key_id=dest_client_id,
    aws_secret_access_key=dest_client_key)

# The destination bucket; the source bucket is defined in the Lambda trigger
dest_bucket = 'audit-access-logs'

def lambda_handler(event, context):
    print('Execute lambda handler')
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Note: list_objects returns at most 1000 keys per call.
        for key in source_client.list_objects(Bucket=bucket)['Contents']:
            temp_key = key['Key']
            if not temp_key.endswith("/"):
                # Prepare the local path (used for both download and upload)
                local_path = '/tmp/{}'.format(temp_key)
                # Make sure the parent directory exists under /tmp
                local_dir = os.path.dirname(local_path)
                if local_dir and not os.path.exists(local_dir):
                    os.makedirs(local_dir)
                print("downloading {} to {}".format(temp_key, local_path))
                # Download from the source bucket, then upload to the destination
                source_client.download_file(bucket, temp_key, local_path)
                dest_client.upload_file(local_path, dest_bucket, temp_key)
            else:
                # The key is a "directory" placeholder; mirror it under /tmp
                if not os.path.exists('/tmp/{}'.format(temp_key)):
                    os.makedirs('/tmp/{}'.format(temp_key))
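For step 3, here is a rough sketch of what the read-only permission for "source_client" could look like, attached as an inline IAM policy with boto3. The policy name is just an example, and you could achieve the same thing with a bucket policy; "dest_client" gets an analogous write-only policy in the audit account.

import json
import boto3

iam = boto3.client('iam')  # run this in the production account

# Read-only access for source_client: list the bucket and get its objects
iam.put_user_policy(
    UserName='source_client',
    PolicyName='read-access-logs',  # example policy name
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:ListBucket"],
             "Resource": "arn:aws:s3:::access-logs"},
            {"Effect": "Allow",
             "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::access-logs/*"}
        ]
    }))

# In the audit account, give dest_client a similar policy that allows only
# s3:PutObject on arn:aws:s3:::audit-access-logs/*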

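For step 4, here is a minimal sketch of uploading the four credential files to the "access-id" bucket. The local file names are placeholders; each file should contain just the ID or secret key string.

import boto3

s3 = boto3.client('s3')  # run with credentials that can write to access-id

# Each local file holds a single line with the access key ID or secret key
for name in ('source_client_id', 'source_client_key',
             'dest_client_id', 'dest_client_key'):
    s3.upload_file(name, 'access-id', name)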
The trigger configuration looks like:

Source bucket: arn:aws:s3:::access-logs
Event type: ObjectCreated (Put)
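If you would rather wire the trigger up with boto3 instead of the console, a rough sketch could look like the following. The function ARN, region, account ID, and statement ID are assumptions; adjust them to your setup.

import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

# Example ARN; replace the region and account ID with your own
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:s3BucketCopy'

# Allow S3 to invoke the function
lambda_client.add_permission(
    FunctionName='s3BucketCopy',
    StatementId='allow-s3-invoke',  # example statement ID
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::access-logs')

# Send "object created by PUT" events from the source bucket to the function
s3_client.put_bucket_notification_configuration(
    Bucket='access-logs',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:Put']
        }]
    })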

You are good to go! Create a test event and try it out (a sample event is sketched below)!
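The handler only reads the bucket name from the event, so a minimal hand-written test event (a sketch that assumes the lambda_handler defined above) could be as simple as:

test_event = {
    "Records": [
        {"s3": {"bucket": {"name": "access-logs"}}}
    ]
}

# Paste the JSON version into the Lambda console test dialog,
# or call the handler directly:
lambda_handler(test_event, None)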
