Semantic Search vs Keyword Search
Embeddings, Relevance, and Hybrid Retrieval
Search systems help people find useful information in huge collections of text, images, and data. Two major approaches are keyword search and semantic search. Keyword search looks for exact words or close matches, while semantic search tries to understand the meaning behind a query. This difference matters because users often describe the same idea with different words.
Keyword search is fast and effective when the query contains the exact terms stored in documents. Semantic search uses language models, embeddings, or knowledge graphs to compare ideas based on context and similarity rather than only word overlap. In practice, many modern systems combine both methods. Keyword matching catches exact names and codes, while semantic matching catches synonyms and paraphrases. Understanding the strengths and limits of each approach helps students see how search engines, chatbots, and recommendation tools work.
Key Facts
- Keyword search ranks results mainly by term matching, frequency, and metadata such as title or tags.
- A common keyword scoring idea is TF-IDF, where score increases with term frequency and decreases with how common the term is across documents.
- Semantic search often represents text as vectors and compares them with cosine similarity: cos(theta) = (A·B) / (|A||B|), where A·B is the dot product and |A| and |B| are the vector lengths.
- In vector search, documents with embeddings closer to the query embedding are treated as more semantically related.
- Keyword search works well for exact names, codes, and quoted phrases such as product IDs or error messages.
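- Hybrid search combines lexical and semantic signals, often as a weighted sum: final_score = a × keyword_score + b × semantic_score, where a and b are tunable weights and both scores are usually normalized to a common scale first.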
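The TF-IDF and hybrid-scoring facts above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: lowercase whitespace tokenization, raw term counts as term frequency, idf = log(N / df), and min-max normalization before the weighted sum. The function names, the sample documents, and the default weights a = b = 0.5 are illustrative choices, not a standard API. Production engines usually use refined formulas such as BM25.

```python
import math
from collections import Counter


def tf_idf_scores(query: str, docs: list[str]) -> list[float]:
    """Sum the TF-IDF weights of the query terms for each document.

    Assumes lowercase whitespace tokenization, raw counts as term frequency,
    and idf = log(N / df), where N is the number of documents and df is the
    number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # the term appears in no document, so it adds nothing
            score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so different signals become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # no spread means no ranking information
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(keyword: list[float], semantic: list[float],
                  a: float = 0.5, b: float = 0.5) -> list[float]:
    """final_score = a * keyword_score + b * semantic_score, after normalization."""
    return [a * k + b * s for k, s in zip(min_max(keyword), min_max(semantic))]


# Hypothetical sample documents for illustration.
docs = [
    "error code E42 printer jam",
    "printer jam troubleshooting guide",
    "how to fix a paper jam",
]
print(tf_idf_scores("E42 jam", docs))
print(hybrid_scores([2.0, 1.0, 0.0], [0.2, 0.9, 0.5]))
```

In the example, "jam" appears in every document, so its idf is log(3/3) = 0 and it adds nothing. The rare code "E42" decides the ranking on its own. Normalizing before combining matters because raw TF-IDF scores are unbounded while cosine similarity stays between -1 and 1, so an unnormalized sum could let one signal dominate.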
Vocabulary
- Keyword search
- A search method that finds results by matching the exact words or phrases typed by the user.
- Semantic search
- A search method that tries to match the meaning and context of a query rather than only the exact words.
- Embedding
- A numerical vector representation of text, image, or other data that captures patterns of meaning.
- Cosine similarity
- A measure of how similar two vectors are based on the angle between them, often used in semantic search.
- Relevance ranking
- The process of ordering search results so the most useful or related items appear first.
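Embeddings, cosine similarity, and relevance ranking fit together in a short pipeline. You compute the cosine similarity between the query vector and each document vector, then sort from most to least similar. This is a minimal sketch: the three-dimensional vectors below are made up by hand for illustration. A real system would get embeddings from a trained model, typically with hundreds of dimensions, and would often use an approximate nearest-neighbor index rather than scoring every document. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float],
                       doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Relevance ranking: order documents from most to least similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-made embeddings; a real model would produce these vectors.
query_vec = [0.9, 0.1, 0.0]  # stands in for the query "how to cook pasta"
doc_vecs = {
    "boiling spaghetti instructions": [0.8, 0.2, 0.1],
    "car engine repair": [0.0, 0.1, 0.9],
    "a short history of pasta": [0.5, 0.5, 0.0],
}
for name, score in rank_by_similarity(query_vec, doc_vecs):
    print(f"{score:.3f}  {name}")
```

Ranking here depends only on the direction of each vector. A document that shares no words with the query can still rank first if its embedding points the same way as the query embedding.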
Common Mistakes to Avoid
- Assuming keyword search understands intent, which is wrong because it mainly reacts to literal word matches and may miss synonyms or paraphrases.
- Thinking semantic search ignores exact terms, which is wrong because many systems still benefit from exact matches for names, formulas, codes, and rare terms.
- Believing a higher similarity score always means a correct answer, which is wrong because semantically close text can still be off topic or factually incorrect.
- Using only one search method for every task, which is wrong because exact lookup tasks and meaning-based discovery often need different ranking signals or a hybrid system.
Practice Questions
1. A keyword system returns a score based on the number of exact query word matches. Query: "renewable energy storage". Document A contains all 3 words, Document B contains 2 of the 3 words, and Document C contains 1 of the 3 words. Rank the documents from highest to lowest keyword score.
2. A semantic system compares query and document embeddings using cosine similarity. If the query vector is Q = (1, 2), Document A is A = (2, 4), and Document B is B = (2, 0), compute cos(theta) for Q with A and for Q with B, then decide which document is more semantically similar.
3. A user searches for "car repair tips", but the best article uses the phrase "automobile maintenance advice" and never uses the word "car". Explain which search approach is more likely to find the article and why.
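After working the first two questions by hand, you can use a short Python script to check the arithmetic. It assumes the match-count scoring described in the first question, where the score is the number of distinct query words found in the document. Because that question gives only how many query words each document contains, the texts for A, B, and C are made up to fit those counts. The vectors for the second question are taken directly from the question. The function names are hypothetical.

```python
import math


def exact_match_count(query: str, doc: str) -> int:
    """Count how many distinct query words appear in the document (case-insensitive)."""
    doc_words = set(doc.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in doc_words)


def cosine(u, v) -> float:
    """cos(theta) = (u . v) / (|u| |v|); math.hypot gives the vector length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Question 1: hypothetical texts containing 3, 2, and 1 of the query words.
query = "renewable energy storage"
docs = {
    "A": "renewable energy storage with batteries",
    "B": "renewable energy policy",
    "C": "cold storage warehouses",
}
ranking = sorted(docs, key=lambda name: exact_match_count(query, docs[name]), reverse=True)
print("Q1 ranking:", ranking)

# Question 2: vectors from the question.
Q, A, B = (1, 2), (2, 4), (2, 0)
print(f"Q2 cos(Q, A) = {cosine(Q, A):.4f}, cos(Q, B) = {cosine(Q, B):.4f}")
```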