Limit HNSW build's shared memory size for small tables
Problem
It occurs to me that it's silly to allocate an area based on `maintenance_work_mem` even when the table is small. For example, if `maintenance_work_mem` is set to 10 GB and the table has 1000 rows, we clearly don't need to allocate 10 GB. There is some precedent for this in parallel VACUUM: it sizes the array that holds the TIDs of dead tuples from `maintenance_work_mem`, but applies an upper limit derived from the worst-case assumption that every page is full of tiny tuples, all of them deleted (see `dead_items_max_items`). It would be nice to do something similar in the HNSW build.
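For reference, the `dead_items_max_items` idea can be sketched roughly like this (a minimal illustration in the same style as the snippets below; the constants and the function name are assumptions for illustration, not PostgreSQL's actual C identifiers):

```javascript
// Sketch of the parallel-VACUUM-style cap: size the dead-TID array from
// maintenance_work_mem, but never beyond the worst case in which every
// page of the relation is full of tiny tuples, all of them deleted.
// Illustrative figures: roughly 291 heap tuples fit on an 8 KB page,
// and a TID (tuple identifier) is 6 bytes.
const MAX_TUPLES_PER_PAGE = 291;
const TID_BYTES = 6;

function deadItemsMaxItems(maintenanceWorkMemBytes, relationPages) {
  const fromMemSetting = Math.floor(maintenanceWorkMemBytes / TID_BYTES);
  const worstCase = relationPages * MAX_TUPLES_PER_PAGE;
  return Math.min(fromMemSetting, worstCase);
}
```

With this cap, a 10-page table never gets more than 2910 TID slots, no matter how large `maintenance_work_mem` is set.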
Fix
Implement Upper Limit on HNSW Build Memory Allocation
The current implementation of the HNSW build allocates memory based on the `maintenance_work_mem` setting, which can lead to excessive memory usage for small tables. This is inefficient because it does not account for the actual size of the data being processed, wasting resources.
1. Define Maximum Memory Allocation
Introduce a new configuration parameter that sets an upper limit on memory allocation for HNSW builds based on the number of rows in the table. This ensures that even if `maintenance_work_mem` is set high, the memory actually used is capped appropriately for small tables.
```javascript
// Maximum memory allocation per row, in bytes (tunable)
const MAX_MEMORY_PER_ROW = 1024;

const maxMemory = Math.min(maintenance_work_mem, numberOfRows * MAX_MEMORY_PER_ROW);
```

2. Modify HNSW Build Function
Update the HNSW build function to use the newly defined maximum memory allocation instead of `maintenance_work_mem` directly. This involves changing the memory allocation logic to reference the calculated `maxMemory`.
```javascript
function buildHNSW(data) {
  const numberOfRows = data.length;
  const memoryToAllocate = Math.min(maintenance_work_mem, numberOfRows * MAX_MEMORY_PER_ROW);
  allocateMemory(memoryToAllocate);
  // Proceed with the HNSW build using the allocated memory
  return memoryToAllocate;
}
```

3. Test Memory Allocation Logic
Create unit tests to validate that the memory allocation logic correctly caps the memory usage for various table sizes. Ensure that for small tables, the memory allocated does not exceed the defined limits.
```javascript
describe('HNSW Memory Allocation', () => {
  it('should allocate capped memory for small tables', () => {
    const smallTable = new Array(1000).fill({});
    const allocatedMemory = buildHNSW(smallTable);
    expect(allocatedMemory).toBeLessThanOrEqual(smallTable.length * MAX_MEMORY_PER_ROW);
  });
});
```

4. Update Documentation
Revise the documentation to include the new memory allocation strategy for HNSW builds. Clearly outline the new parameter and how it interacts with `maintenance_work_mem`. This will help users understand the changes and optimize their configurations.
Validation
To confirm the fix worked, run HNSW builds on tables of varying sizes and monitor memory usage. Ensure that memory allocation does not exceed the defined limits for small tables, and that performance remains optimal for larger tables.
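That check can be sketched as a small loop over representative table sizes, using the same cap formula as above (the 1 KB per-row figure is an assumption carried over from the earlier snippets, not a measured value):

```javascript
// Verify the cap for several table sizes: the allocation must never
// exceed maintenance_work_mem, and for small tables it must stay at
// the per-row bound instead.
const MAX_MEMORY_PER_ROW = 1024;              // illustrative: 1 KB per row
const maintenance_work_mem = 10 * 1024 ** 3;  // 10 GB, as in the example above

function cappedAllocation(numberOfRows) {
  return Math.min(maintenance_work_mem, numberOfRows * MAX_MEMORY_PER_ROW);
}

for (const rows of [1000, 100000, 100000000]) {
  console.log(`${rows} rows -> ${cappedAllocation(rows)} bytes`);
}
```

For 1000 rows this yields about 1 MB rather than 10 GB, while for 100 million rows the allocation is still bounded by `maintenance_work_mem`.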
Submitted by
Alex Chen