How to use BERT for finding similar sentences or similar news?
Problem
I have used BERT's NextSentencePredictor to find similar sentences or similar news, but it is very slow: even on a Tesla V100, one of the fastest GPUs currently available, a single query title takes around 10 seconds against roughly 3,000 articles. Is there a better way to use BERT for finding similar sentences or similar news given a corpus of news articles?
Optimize BERT for Faster Similar Sentence Retrieval
The slow performance of BERT's NextSentencePredictor comes from its cross-encoder design: every query must be paired with every candidate article and run through a full forward pass, so one query against 3,000 articles costs 3,000 BERT passes. On top of that, there is no indexing or retrieval structure to avoid scanning the entire corpus, which further exacerbates latency.
1. Use Sentence Embeddings
Instead of using NextSentencePredictor, utilize BERT to generate embeddings for each sentence in your corpus. This allows you to represent sentences as fixed-length vectors, which can be compared more efficiently using cosine similarity.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('bert-base-nli-mean-tokens')
corpus_embeddings = model.encode(corpus)  # corpus: list of sentences/titles

# Fixed-length vectors can be compared cheaply with cosine similarity
scores = util.cos_sim(model.encode(query), corpus_embeddings)
```

2. Implement Efficient Similarity Search
Use a library like FAISS (Facebook AI Similarity Search) to index the sentence embeddings. FAISS is optimized for fast nearest neighbor search, which will significantly reduce the time taken to find similar sentences.
```python
import faiss

# FAISS expects 2-D float32 NumPy arrays: (n_vectors, dim)
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings)

# D: distances, I: indices of the k nearest corpus sentences
D, I = index.search(query_embedding, k)
```

3. Batch Processing
Process queries in batches instead of one at a time. This can leverage the GPU more effectively and reduce the overhead of multiple calls to the model.
```python
# Encode all queries in one GPU batch, then search them together
query_embeddings = model.encode(queries)
D, I = index.search(query_embeddings, k)
```

4. Use Mixed Precision Inference
If you're using PyTorch, enable mixed precision inference with PyTorch's native AMP (torch.cuda.amp; NVIDIA's Apex is the older alternative). This can speed up encoding and reduce GPU memory usage.
```python
from torch.cuda.amp import autocast

# Run encoding under autocast so matmuls use float16 on the GPU
with autocast():
    query_embedding = model.encode(query)
```

5. Consider Distillation or Quantization
If latency is still an issue, consider using a distilled version of BERT (like DistilBERT) or quantizing the model to reduce its size and improve inference speed.
```python
from sentence_transformers import SentenceTransformer

# Distilled checkpoint: roughly 2x faster than BERT with similar quality,
# and a drop-in replacement for the encode/search pipeline above
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
```
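The snippet above covers the distillation route. For quantization, here is a minimal sketch using PyTorch's dynamic quantization, which converts Linear-layer weights to int8 for faster CPU inference; the `nn.Sequential` module below is a stand-in for your actual encoder, used only to keep the example self-contained:

```python
import torch
import torch.nn as nn

# Stand-in encoder: in practice this would be your BERT/DistilBERT model
encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 256))

# Dynamic quantization converts Linear weights to int8 at load time;
# activations are quantized on the fly during inference (CPU only)
quantized = torch.quantization.quantize_dynamic(
    encoder, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    emb = quantized(torch.randn(1, 768))
print(emb.shape)  # torch.Size([1, 256])
```

Note that dynamic quantization targets CPU inference; on a V100 you would typically combine FAISS retrieval with float16 inference instead.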
Validation
To confirm the fix worked, measure the time taken for retrieval before and after implementing these changes. The retrieval time should significantly decrease, ideally to under 1 second for the same number of articles.
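A minimal timing helper for that before/after comparison, using only the standard library (the `nsp_rank` and `index.search` names in the usage comment are placeholders for your own baseline and FAISS pipeline):

```python
import time

def avg_latency(fn, runs=10):
    """Average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Usage (names are illustrative):
#   before = avg_latency(lambda: nsp_rank(query, corpus))        # BERT NSP baseline
#   after  = avg_latency(lambda: index.search(query_emb, k=10))  # FAISS retrieval
```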
Submitted by Alex Chen