
Joining text files with 600M+ lines

Mar 15, 2026
Problem

I have two files, huge.txt and small.txt. huge.txt has around 600M rows and is 14 GB; each line has four space-separated words (tokens) and, finally, another space-separated column with a number. small.txt has 150K rows and a size of ~3 MB; each line has a space-separated word and a number. Both files are sorted using the sort command, with no extra…

Error Output

cat huge.txt|join -o 1.1 1.2 1.3 1.4 2.2 - small.txt > output.txt

join: memory exhausted

1 Fix

Fix for: Joining text files with 600M+ lines

IMO the best way to do this would be to use the programming/scripting language you know best and:

- load small.txt into an in-memory hash/map/associative array keyed on the words
- process huge.txt line by line, adding the column looked up from the hash…
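The lookup-table approach described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the filenames and column layout are assumed from the question, and the output format mimics `join -o 1.1 1.2 1.3 1.4 2.2` (the four tokens from huge.txt plus the number from small.txt, with non-matching lines skipped, as join does by default):

```python
def hash_join(small_path, huge_path, out_path):
    """Stream the huge file, appending the number looked up from the small file.

    Mirrors `join -o 1.1 1.2 1.3 1.4 2.2 huge.txt small.txt`: only lines
    whose first word appears in the small file are emitted.
    """
    # Build the lookup table; ~150K entries fits comfortably in memory.
    lookup = {}
    with open(small_path) as f:
        for line in f:
            word, number = line.split()  # small.txt: "<word> <number>"
            lookup[word] = number

    # Stream the 600M-row file one line at a time; memory use stays flat.
    with open(huge_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()        # huge.txt: four words + a number
            number = lookup.get(fields[0])
            if number is not None:       # skip unmatched keys, like join(1)
                fout.write(" ".join(fields[:4]) + " " + number + "\n")
```

The same idea works as an awk one-liner, again assuming the first column is the join key: `awk 'NR==FNR { n[$1]=$2; next } $1 in n { print $1, $2, $3, $4, n[$1] }' small.txt huge.txt > output.txt` — only small.txt is held in memory.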


Environment