Joining text files with 600M+ lines
Mar 15, 2026
Problem
I have two files, huge.txt and small.txt. huge.txt has around 600M rows and is 14 GB. Each line has four space-separated words (tokens) followed by another space-separated column with a number. small.txt has 150K rows with a size of ~3M; each line has a space-separated word and a number. Both files are sorted using the sort command, with no extra…
Error Output
cat huge.txt|join -o 1.1 1.2 1.3 1.4 2.2 - small.txt > output.txt
join: memory exhausted
1 Fix
Fix for: Joining text files with 600M+ lines
IMO the best way to do this would be to use the programming/scripting language you know best and:
1. Load small.txt into an in-memory hash/map/associative array keyed on the words.
2. Process huge.txt line by line, adding the column looked up from the hash…
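The approach described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions that the question does not confirm: the join key is the first field of each file (join's default), keys in small.txt are unique, and lines in huge.txt with no match are dropped, as join does by default. The function names load_lookup and hash_join are made up for this example.

```python
import sys


def load_lookup(path):
    """Read the small file into a dict mapping first field -> second field."""
    lookup = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                # Assumption: keys are unique; a repeated key keeps the last value.
                lookup[parts[0]] = parts[1]
    return lookup


def hash_join(huge_lines, lookup, out):
    """Stream the big input one line at a time and write joined lines to out."""
    for line in huge_lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip malformed lines
        value = lookup.get(parts[0])
        if value is None:
            continue  # no match in the small file: drop the line
        # Mirrors -o 1.1 1.2 1.3 1.4 2.2: four words, then the looked-up number.
        out.write(" ".join(parts[:4]) + " " + value + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python hash_join.py small.txt < huge.txt > output.txt
    hash_join(sys.stdin, load_lookup(sys.argv[1]), sys.stdout)
```

Only small.txt (150K rows) is held in memory; huge.txt is read as a stream, so memory use should not grow with its 600M lines. This version also does not depend on either file being sorted.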