aws s3 ls - find files by modified date?
Problem
Hi, We'd like to be able to search a bucket with many thousands (likely growing to hundreds of thousands) of objects and folders/prefixes to find objects that were recently added or updated. Executing aws s3 ls on the entire bucket several times a day and then sorting through the list seems inefficient. Is there a way to simply request a list of objects with a modified time <, >, = a certain timestamp? Also, are we charged once for the aws s3 ls request, or once for each of the objects returned by the request? New to github, wish I knew enough to contribute actual code...appreciate the help.
1 Fix
Efficiently List S3 Objects by Modified Date
Neither the AWS CLI nor the underlying S3 list API can filter objects by last-modified date on the server side. 'aws s3 ls --recursive' lists every object under the bucket or prefix, which is slow for buckets with a large number of objects. 'aws s3api list-objects-v2' still has to list every key, but it returns the LastModified timestamp for each object and accepts a '--query' (JMESPath) expression, so the CLI can filter the listing client-side and print only the objects modified within a specific date range. Narrowing the listing with '--prefix' reduces how many keys have to be listed in the first place.
1. Set Up AWS CLI
Ensure that the AWS CLI is installed and configured with the necessary permissions to access the S3 bucket.
    aws configure
2. Use S3API to List Objects
Use the 'aws s3api list-objects-v2' command to retrieve objects from the S3 bucket. The response includes metadata for each object, including the LastModified timestamp, and the '--query' option filters that response on the client. This example keeps only objects modified on or after October 1, 2023 (UTC).
    aws s3api list-objects-v2 --bucket your-bucket-name --query 'Contents[?LastModified>=`2023-10-01T00:00:00`]'
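As a rough sketch, the command in this step can be wrapped in a small shell function so the cutoff becomes a parameter. The function name `list_modified_since` and its arguments are hypothetical. The sketch assumes the CLI compares the ISO 8601 LastModified strings lexicographically, and the filter still runs on the client after every key has been listed.

```shell
# Hypothetical helper: print key and LastModified for objects changed on or
# after a cutoff. The filter is applied by the CLI after listing, not by S3.
list_modified_since() {
  local bucket="$1"   # bucket name, e.g. your-bucket-name
  local since="$2"    # ISO 8601 UTC cutoff, e.g. 2023-10-01T00:00:00
  aws s3api list-objects-v2 \
    --bucket "$bucket" \
    --query "Contents[?LastModified>='${since}'].[Key, LastModified]" \
    --output text
}
```

Calling `list_modified_since your-bucket-name 2023-10-01T00:00:00` should print one tab-separated line (key, timestamp) per matching object.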
3. Filter Results by Date
Modify the query to filter objects based on the desired date range. Adjust the dates in the query to match your requirements. This example keeps objects modified from October 1 through October 31, 2023. The upper bound is written as "before November 1" because the timestamps are compared as strings, and a value such as 2023-10-31T23:59:59+00:00 sorts after 2023-10-31T23:59:59.
    aws s3api list-objects-v2 --bucket your-bucket-name --query 'Contents[?LastModified>=`2023-10-01T00:00:00` && LastModified<`2023-11-01T00:00:00`]'
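To avoid editing the query by hand for every date range, the filter string can be generated. The helper below is a minimal sketch with a hypothetical name (`build_range_query`). It assumes a half-open range (inclusive start, exclusive end), so objects whose timestamp string carries a suffix such as "+00:00" are not dropped at the upper boundary.

```shell
# Hypothetical helper: build a JMESPath filter for the half-open time range
# [start, end) and project each match to [Key, LastModified].
build_range_query() {
  local start="$1"   # inclusive lower bound, ISO 8601 UTC
  local end="$2"     # exclusive upper bound, ISO 8601 UTC
  printf "Contents[?LastModified>='%s' && LastModified<'%s'].[Key, LastModified]" \
    "$start" "$end"
}

# Usage (all of October 2023):
#   aws s3api list-objects-v2 --bucket your-bucket-name \
#     --query "$(build_range_query 2023-10-01T00:00:00 2023-11-01T00:00:00)" \
#     --output text
```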
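4. Handle Pagination
The AWS CLI follows continuation tokens automatically, so a single 'list-objects-v2' command returns the full listing by default. To process a large bucket in chunks instead, cap each call with '--max-items' and pass the 'NextToken' value from the previous output to '--starting-token', repeating until no 'NextToken' is returned.
    aws s3api list-objects-v2 --bucket your-bucket-name --max-items 1000 --starting-token your-next-token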
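A chunked listing could be scripted roughly as follows. This is a sketch, not a definitive implementation: the function name `list_in_pages` is hypothetical, and it assumes the CLI's JSON output carries a top-level "NextToken" field whenever '--max-items' truncates the listing. The token is extracted with sed to avoid extra dependencies; a JSON tool such as jq would be more robust.

```shell
# Hypothetical helper: walk a bucket listing chunk by chunk. Prints each
# chunk's JSON to stdout and follows NextToken until none is returned.
list_in_pages() {
  local bucket="$1"
  local page_size="${2:-1000}"   # items per CLI invocation (--max-items)
  local token=""
  local page
  while :; do
    if [ -n "$token" ]; then
      page=$(aws s3api list-objects-v2 --bucket "$bucket" \
        --max-items "$page_size" --starting-token "$token" --output json) || return 1
    else
      page=$(aws s3api list-objects-v2 --bucket "$bucket" \
        --max-items "$page_size" --output json) || return 1
    fi
    printf '%s\n' "$page"
    # Extract the NextToken value, if the output contains one.
    token=$(printf '%s\n' "$page" | sed -n 's/.*"NextToken": *"\([^"]*\)".*/\1/p')
    [ -n "$token" ] || break
  done
}
```

For example, `list_in_pages your-bucket-name 500` would emit the listing in chunks of up to 500 objects each.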
5. Understand Cost Implications
AWS charges for S3 LIST requests per request, not per object returned. Each underlying 'list-objects-v2' request returns at most 1,000 keys, so one full listing of a bucket with N objects is billed as roughly N/1,000 requests (rounded up), whether it is run through 'aws s3 ls' or 'aws s3api'.
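To make the request count concrete, here is a small arithmetic sketch. It assumes the S3 limit of at most 1,000 keys per list response; the function name `list_requests_needed` and the figure of 300,000 objects are illustrative only.

```shell
# Estimate how many LIST requests one full listing needs, given that each
# list response returns at most 1,000 keys (ceiling division).
list_requests_needed() {
  local object_count="$1"
  local keys_per_request=1000
  echo $(( (object_count + keys_per_request - 1) / keys_per_request ))
}

list_requests_needed 300000   # prints 300
```

Multiplying that figure by the number of listings per day gives the daily LIST request count to compare against current S3 request pricing.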
Validation
To confirm the fix worked, run the modified 'aws s3api list-objects-v2' command and check if the output includes only the objects modified within the specified date range. Verify the LastModified timestamps in the output.
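The timestamp check can also be automated. The sketch below uses a hypothetical function name (`check_in_range`) and assumes tab-separated "key, LastModified" lines on stdin, as produced by '--output text' with a '[Key, LastModified]' projection. Like the query, it compares the timestamps as plain strings.

```shell
# Hypothetical checker: reads "key<TAB>LastModified" lines on stdin and exits
# non-zero if any timestamp falls outside the half-open range [lo, hi).
check_in_range() {
  local lo="$1"   # inclusive lower bound
  local hi="$2"   # exclusive upper bound
  awk -F '\t' -v lo="$lo" -v hi="$hi" '
    $NF < lo || $NF >= hi { print "out of range: " $0; bad = 1 }
    END { exit bad + 0 }
  '
}

# Usage:
#   aws s3api list-objects-v2 --bucket your-bucket-name \
#     --query "Contents[?LastModified>='2023-10-01T00:00:00' && LastModified<'2023-11-01T00:00:00'].[Key, LastModified]" \
#     --output text | check_in_range 2023-10-01T00:00:00 2023-11-01T00:00:00
```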
Submitted by
Alex Chen
2450 rep