FG
💻 Software🤖 AI & LLMs

Whether we should lock element when UpdateNeighborsInMemory ?

Fresh5 days ago
Mar 14, 20260 views
Confidence Score50%
50%

Problem

The following crash was triggered when we created the hnsw index using pgvector in GPDB: [code block] The crash is caused by the `neighborElement` is NULL in f4 UpdateNeighborsInMemory. But when we add some log when we find `neighborElement` is NULL and regain the `neighborElement`, it's not NULL, and it may change. We tried to understand the code of UpdateNeighborsInMemory, we think that we should require e->lock when we do `UpdateNeighborsInMemory`: [code block] because the `HnswElement e` has been added to Memory and may be neighbors of other elements, and thay may UpdateConnection of e during we `UpdateNeighborsInMemory` e. But we are not know much about the code, so initiate this discussion. --- Reproduction method: 1. create a table and insert some data 2. set max_parallel_maintenance_workers to 20; 3. CREATE INDEX hnsw_cosine_qa_data_bge_new ON qa_data_bge USING hnsw (embedding vector_cosine_ops) WITH (m = 100, ef_construction = 1000); 4. Ctrl-C stop `CREATE INDEX` after 5s if not crash, goto 3

Unverified for your environment

Select your OS to check compatibility.

1 Fix

Canonical Fix
Unverified Fix
New Fix – Awaiting Verification

Implement Locking Mechanism in UpdateNeighborsInMemory

Medium Risk

The crash occurs because the `neighborElement` is NULL during the execution of `UpdateNeighborsInMemory`. This happens when the `HnswElement e` is being modified by other threads while it is being processed, leading to inconsistent state. Locking `e` during the update ensures that no other thread can modify it until the update is complete, preventing the NULL dereference issue.

Awaiting Verification

Be the first to verify this fix

  1. 1

    Identify Locking Requirement

    Review the `UpdateNeighborsInMemory` function to confirm that access to `HnswElement e` must be synchronized to prevent concurrent modifications.

  2. 2

    Add Locking Mechanism

    Implement a locking mechanism around the critical section of the `UpdateNeighborsInMemory` function where `HnswElement e` is accessed. Use a mutex or similar synchronization primitive to ensure thread safety.

    c
    pthread_mutex_lock(&e->lock);
    // UpdateNeighborsInMemory logic here
    pthread_mutex_unlock(&e->lock);
  3. 3

    Test for Race Conditions

    Run tests to simulate concurrent updates to `HnswElement e` while executing `UpdateNeighborsInMemory`. Ensure that the locking mechanism prevents crashes and maintains data integrity.

  4. 4

    Log Neighbor Element Status

    Add logging to track the status of `neighborElement` before and after the locking mechanism is applied. This will help in diagnosing any future issues related to NULL values.

    c
    log_info("Neighbor Element before update: %p", neighborElement);
    // Lock and update
    log_info("Neighbor Element after update: %p", neighborElement);
  5. 5

    Review and Optimize Locking

    After implementing the locking, review the performance impact. Optimize the locking strategy if necessary to reduce contention while maintaining safety.

Validation

Confirm the fix by running the reproduction steps multiple times without encountering the crash. Monitor logs to ensure that `neighborElement` is consistently non-NULL during updates.

Sign in to verify this fix

Environment

Submitted by

AC

Alex Chen

2450 rep

Tags

pgvectorembeddingsvector-search