7  Synonyms with POS

7.0.1 Synonym Detection Using Word Embeddings

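We can detect candidate synonyms by comparing word vectors in the embedding space: words whose vectors point in similar directions (high cosine similarity) tend to be related in meaning. Note that nearest neighbours are related words rather than guaranteed synonyms. Here’s an example using the gensim library with a pre-trained Word2Vec model.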

7.0.1.1 Steps:

  1. Install necessary libraries:

    pip install gensim spacy
    python -m spacy download en_core_web_sm  # Download the English model for spaCy

  2. Load pre-trained word embeddings and find similar words:

    import gensim.downloader as api
    
    # Load a pre-trained Word2Vec model from Gensim
    # (large download, roughly 1.6 GB; it is cached locally after the first call)
    model = api.load("word2vec-google-news-300")  # A popular pre-trained word2vec model
    
    # Example word for synonym detection
    word = "happy"
    
    # Get top 5 most similar words to the target word
    similar_words = model.most_similar(word, topn=5)
    
    print(f"Top 5 synonyms for '{word}':")
    for similar_word, similarity_score in similar_words:
        print(f"{similar_word} ({similarity_score:.3f})")  # Round the score to 3 decimals

This will output the top 5 words that are most similar to “happy” based on their proximity in the embedding space.

Sample Output (illustrative; the exact words and scores depend on the model):

Top 5 synonyms for 'happy':
joyful (0.714)
cheerful (0.701)
content (0.689)
delighted (0.678)
elated (0.665)
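
The scores that most_similar reports are cosine similarities between word vectors. As a minimal sketch of that measure, the snippet below computes it in plain Python for three made-up 3-dimensional vectors and ranks the results. The vectors and the cosine_similarity helper are illustrative assumptions, not values or functions taken from the Word2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical values, chosen only for illustration)
toy_vectors = {
    "happy": [0.9, 0.8, 0.1],
    "joyful": [0.8, 0.9, 0.2],
    "table": [0.1, 0.0, 0.9],
}

target = "happy"
scores = {
    word: cosine_similarity(toy_vectors[target], vec)
    for word, vec in toy_vectors.items()
    if word != target
}

# Rank the other words by similarity to the target, highest first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for word, score in ranked:
    print(f"{word} ({score:.3f})")
```

In this toy setup “joyful” ranks above “table” because its vector points in nearly the same direction as the vector for “happy”. The real model applies the same idea to 300-dimensional vectors over its whole vocabulary.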

To keep only words that share the same part of speech (POS) as the target word, we combine the word-embedding approach with POS tagging. This ensures that the similar words returned are not only semantically related but also belong to the same grammatical category (e.g., noun, verb, adjective).

We can achieve this with a POS tagger from a library like spaCy: tag the target word and each candidate, then keep only the candidates whose POS matches the target’s.

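spaCy exposes two levels of tags: fine-grained tags (token.tag_) such as JJ, and coarse-grained tags (token.pos_) such as ADJ. The code below lists the fine-grained tags that the tagger can predict.

Python Code to Show All POS Tags in spaCy:

import spacy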

# Load the spaCy English model
nlp = spacy.load("en_core_web_sm")

# List the fine-grained tags the tagger can predict, with their explanations
pos_tags = nlp.get_pipe("tagger").labels

print("All available POS tags in spaCy:")
for pos in pos_tags:
    print(f"{pos}: {spacy.explain(pos)}")

Output:

All available POS tags in spaCy:
$: symbol, currency
'': closing quotation mark
,: punctuation mark, comma
-LRB-: left round bracket
-RRB-: right round bracket
.: punctuation mark, sentence closer
:: punctuation mark, colon or ellipsis
ADD: email
AFX: affix
CC: conjunction, coordinating
CD: cardinal number
DT: determiner
EX: existential there
FW: foreign word
HYPH: punctuation mark, hyphen
IN: conjunction, subordinating or preposition
JJ: adjective (English), other noun-modifier (Chinese)
JJR: adjective, comparative
JJS: adjective, superlative
LS: list item marker
MD: verb, modal auxiliary
NFP: superfluous punctuation
NN: noun, singular or mass
NNP: noun, proper singular
NNPS: noun, proper plural
NNS: noun, plural
PDT: predeterminer
POS: possessive ending
PRP: pronoun, personal
PRP$: pronoun, possessive
RB: adverb
RBR: adverb, comparative
RBS: adverb, superlative
RP: adverb, particle
SYM: symbol
TO: infinitival "to"
UH: interjection
VB: verb, base form
VBD: verb, past tense
VBG: verb, gerund or present participle
VBN: verb, past participle
VBP: verb, non-3rd person singular present
VBZ: verb, 3rd person singular present
WDT: wh-determiner
WP: wh-pronoun, personal
WP$: wh-pronoun, possessive
WRB: wh-adverb
XX: unknown
_SP: whitespace
``: opening quotation mark

Here’s the revised version of the code, which keeps only the similar words that have the same POS as the target word. The “word” and “pos” assignments are handled in a separate step.

7.0.1.2 Revised Python Code:

import gensim.downloader as api
import spacy

# Load pre-trained Word2Vec model from Gensim
model = api.load("word2vec-google-news-300")

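# Load spaCy POS tagger
nlp = spacy.load("en_core_web_sm")

# Define a function to get the POS tag of a word
def get_pos(word):
    # The word is tagged in isolation (no sentence context),
    # so ambiguous words may receive an unexpected tag
    doc = nlp(word)
    return doc[0].pos_  # Coarse-grained POS tag of the first token, e.g. "ADJ"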

# Function to find synonyms with the same POS
def find_synonyms_with_same_pos(word, topn=10):
    try:
        # Get the POS of the target word
        word_pos = get_pos(word)

        # Get the most similar words from the model
        similar_words = model.most_similar(word, topn=topn)

        # Filter similar words by POS tag
        filtered_words = [
            (w, sim) for w, sim in similar_words if get_pos(w) == word_pos
        ]

        return filtered_words
    except KeyError:
        print(f"Word '{word}' not found in the model vocabulary.")
        return []

Set the target word and look up its POS tag in a separate step:

word = "happy"  # Define the target word
pos = get_pos(word)  # Get the POS tag for the target word

# Find synonyms with the same POS
synonyms_with_same_pos = find_synonyms_with_same_pos(word, topn=10)

# Output the result
print(f"Synonyms for '{word}' with the same POS ({pos}):")
for synonym, similarity in synonyms_with_same_pos:
    print(f"{synonym} ({similarity:.3f})")  # Round the score to 3 decimals

Example Output:

For word = "happy", the output will be something like:

Synonyms for 'happy' with the same POS (ADJ):
joyful (0.714)
cheerful (0.701)
content (0.689)
delighted (0.678)
ecstatic (0.662)
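
Because the POS filter discards candidates, the function can return fewer than topn results. The Google News vocabulary also contains multi-word phrases joined with underscores (such as New_York) and case variants of the same word. One way to handle this, sketched below under stated assumptions, is to request more candidates than needed, clean them, and stop once enough have been kept. The candidates list, the toy_pos lookup, and the filter_candidates helper are hypothetical stand-ins for model.most_similar and get_pos, so the snippet runs without downloading the model.

```python
def filter_candidates(word, word_pos, candidates, pos_lookup, limit=5):
    """Keep up to `limit` single-word candidates whose POS matches `word_pos`."""
    seen = {word.lower()}
    kept = []
    for cand, score in candidates:
        key = cand.lower()
        if "_" in cand:  # skip multi-word phrase entries
            continue
        if key in seen:  # skip case variants and repeats
            continue
        if pos_lookup(cand) != word_pos:  # skip words with a different POS
            continue
        seen.add(key)
        kept.append((cand, score))
        if len(kept) == limit:
            break
    return kept


# Hypothetical candidates, already sorted by descending similarity
candidates = [
    ("joyful", 0.71), ("Happy", 0.70), ("happiness", 0.69),
    ("very_happy", 0.68), ("cheerful", 0.67), ("rejoice", 0.66),
    ("delighted", 0.65),
]

# Hypothetical POS table standing in for get_pos
toy_pos_table = {
    "joyful": "ADJ", "Happy": "ADJ", "happiness": "NOUN",
    "cheerful": "ADJ", "rejoice": "VERB", "delighted": "ADJ",
}


def toy_pos(w):
    return toy_pos_table.get(w, "X")


result = filter_candidates("happy", "ADJ", candidates, toy_pos, limit=3)
for w, s in result:
    print(f"{w} ({s:.3f})")
```

With the real model, candidates would come from model.most_similar(word, topn=...) using a topn larger than the number of results wanted, and pos_lookup would be get_pos.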