AWS S3 Sync Issues

Mar 14, 2026
Confidence Score: 80%

Problem

There have been a few issues with the `sync` command, particularly when syncing down from S3 (`s3 -> local`). I'd like to summarize the known issues, outline a few possible options, and give people the opportunity to share any feedback they might have.

Sync Behavior Overview

The sync behavior is intended to be an efficient `cp`: only copy the files from the source to the destination that are different. To do that, we need to be able to determine whether a file in S3 and its local counterpart are different. We use two values:

- File size (from `stat`'ing the file locally and from the `Size` key in a `ListObjects` response)
- Last modified time (mtime of the local file and the `LastModified` key in a `ListObjects` response)

As an aside, we use the `ListObjects` operation because we get up to 1000 objects returned in a single call. This means that we're limited to the information that comes back in a `ListObjects` response, which is `LastModified, ETag, StorageClass, Key, Owner, Size`.

Now, given the remote and local files' sizes and last modified times, we try to determine if the file is different. The file size is easy: if the sizes are different, then we know the files are different and we need to sync the file. However, last modified time is more interesting. While the mtime of the local file is a true mtime, the `LastModified` time from `ListObjects` is really the time the object was uploaded. So imagine t
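The mismatch between a local mtime and the S3 upload time can be illustrated with a small sketch. The function name `looks_different` and the strict timestamp check are assumptions made for illustration only; this is not the rule the CLI itself applies, just a naive comparison that exposes why `LastModified` is hard to use.

```python
from datetime import datetime, timedelta, timezone


def looks_different(local_size, local_mtime, remote_size, remote_last_modified):
    """Naive check: any difference in size or timestamp counts as a change."""
    if local_size != remote_size:
        return True  # different sizes always mean different content
    # Assumption: strict timestamp equality, used here only to expose the problem.
    return local_mtime != remote_last_modified


# A file edited locally, then uploaded unchanged five minutes later.
edited_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
uploaded_at = edited_at + timedelta(minutes=5)

# S3 reports the upload time as LastModified, so byte-identical content
# still shows up with mismatched timestamps.
print(looks_different(1024, edited_at, 1024, uploaded_at))
```

Under this naive rule the identical file is reported as different, which is why the timestamp comparison needs more care than the size comparison.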

1 Fix

Enhance AWS S3 Sync Logic for Accurate File Comparison

Medium Risk

The AWS S3 sync command relies on file size and last modified time to determine if files are different. However, the last modified time from S3 represents the upload time, which may not accurately reflect changes made to the file locally. This can lead to incorrect sync behavior, where files are not updated as expected.

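  1. Implement ETag Comparison

    Modify the sync logic to compare ETags in addition to file size and last modified time. A local file has no ETag of its own, so one has to be computed, for example as the file's MD5 digest; this matches the S3 ETag only for objects that were not uploaded as multipart. If the computed value differs from the S3 object's ETag, the file should be considered different and synced.

    pseudo
    if (md5(local_file) != s3_object.etag) { sync_file(); }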
  2. Use a Custom Metadata Tag for Versioning

    Add a custom metadata tag to S3 objects that indicates the version of the file. This tag can be updated whenever the file is modified locally. During sync, check this tag to determine if the file needs to be updated.

    bash
    aws s3 cp local_file s3://bucket/path --metadata version=1.0
  3. Add a Force Sync Option

    Introduce a command-line option that forces files to be synced regardless of size or last modified time. This can be useful when the user knows that a file has changed but the sync logic does not detect it. The `--force` flag shown here is the proposed option and does not exist in the CLI today.

    bash
    # Proposed usage; --force is not an existing `aws s3 sync` flag
    aws s3 sync s3://bucket/path local_path --force
  4. Log Sync Decisions

    Implement detailed logging of the sync decisions made by the command. This will help in diagnosing why certain files were not synced, providing insights into the file size, last modified time, and ETag comparisons.

    javascript
    console.log(`Syncing ${file}: size=${local_size} vs s3_size=${s3_size}, mtime=${local_mtime} vs s3_mtime=${s3_mtime}, etag=${local_etag} vs s3_etag=${s3_etag}`);
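
Steps 1 and 4 can be sketched together in Python. This is a minimal illustration under stated assumptions, not the CLI's implementation: the helper names `local_md5` and `etag_differs` and the logger name are hypothetical. It assumes the ETag is a plain MD5 of the object body, which does not hold for multipart uploads (their ETags carry a `-N` suffix) or for some server-side-encrypted objects, so the sketch reports those cases as not comparable instead of guessing.

```python
import hashlib
import logging

logger = logging.getLogger("sync_sketch")  # hypothetical logger name


def local_md5(path, chunk_size=8 * 1024 * 1024):
    """Hex MD5 of a local file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_differs(path, s3_etag):
    """Return True or False when the ETag is comparable, None when it is not."""
    etag = s3_etag.strip('"')  # ListObjects returns the ETag wrapped in quotes
    if "-" in etag:
        # Multipart-upload ETags are not a plain MD5 of the object body.
        logger.info("%s: etag=%s is multipart, cannot compare", path, etag)
        return None
    local = local_md5(path)
    logger.info("%s: local_md5=%s vs s3_etag=%s", path, local, etag)
    return local != etag
```

A caller would fall back to the size and timestamp checks whenever `etag_differs` returns `None`. Hashing every local file costs a full read, so this trades speed for accuracy.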

Validation

To confirm the fix worked, run the sync command on a set of files with known differences in size, last modified time, and ETag. Verify that all expected files are synced correctly and that the logging output reflects the correct sync decisions.

Submitted by

Alex Chen

2450 rep

Tags

awscliclouds3syncfeature-requests3s3md5