WordNet and Wikipedia are often used in text mining algorithms to enrich short text representations [1] or to extract additional knowledge about words [2]. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing [3]. In this post we will look at how to pull information from WordNet using Python, and how to build a graph of relations between words using Python and NetworkX.
WordNet groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. [4]
Here is how to get all synsets for the word ‘good’ using the NLTK package:
# Requires the WordNet corpus; download it once with: import nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn
print(wn.synsets('good'))
# Output of the line above:
#[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]
All synsets are connected to other synsets by means of semantic relations. These relations, which are not all shared by all lexical categories, include:
hypernym: Y is a hypernym of X if every X is a (kind of) Y (canine is a hypernym of dog)
hyponym: Y is a hyponym of X if every Y is a (kind of) X (dog is a hyponym of canine)
meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)
holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window) [4]
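The four relations above come in inverse pairs: hyponym is the inverse of hypernym, and meronym is the inverse of holonym. The following is a minimal sketch of that idea in plain Python. It does not use WordNet; the dictionaries `IS_A` and `PART_OF` and the helper `invert` are hypothetical names, and the tiny data set is hand-written around the dog/canine and window/building examples.

```python
from collections import defaultdict

# Hand-written toy data (an assumption for illustration, not taken from WordNet).
IS_A = {"dog": ["canine"], "wolf": ["canine"]}             # X -> hypernyms of X
PART_OF = {"window": ["building"], "door": ["building"]}   # X -> holonyms of X


def invert(relation):
    """Turn a mapping X -> [Y, ...] into the inverse mapping Y -> [X, ...]."""
    inverse = defaultdict(list)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].append(source)
    return dict(inverse)


HYPONYMS = invert(IS_A)      # Y -> hyponyms of Y
MERONYMS = invert(PART_OF)   # Y -> meronyms of Y

print(HYPONYMS["canine"])    # ['dog', 'wolf']
print(MERONYMS["building"])  # ['window', 'door']
```

Storing only one direction of each pair and deriving the other keeps the two views from drifting apart.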
Here is how we can get hypernyms and hyponyms from WordNet:
car = wn.synset('car.n.01')
print("HYPERNYMS")
print(car.hypernyms())
print("HYPONYMS")
print(car.hyponyms())
Here is the output of the code above:
HYPERNYMS
[Synset('motor_vehicle.n.01')]
HYPONYMS
[Synset('ambulance.n.01'), Synset('beach_wagon.n.01'), Synset('bus.n.04'), Synset('cab.n.03'), Synset('compact.n.03'), Synset('convertible.n.01'), Synset('coupe.n.01'), Synset('cruiser.n.01'), Synset('electric.n.01'), Synset('gas_guzzler.n.01'), Synset('hardtop.n.01'), Synset('hatchback.n.01'), Synset('horseless_carriage.n.01'), Synset('hot_rod.n.01'), Synset('jeep.n.01'), Synset('limousine.n.01'), Synset('loaner.n.02'), Synset('minicar.n.01'), Synset('minivan.n.01'), Synset('model_t.n.01'), Synset('pace_car.n.01'), Synset('racer.n.02'), Synset('roadster.n.01'), Synset('sedan.n.01'), Synset('sport_utility.n.01'), Synset('sports_car.n.01'), Synset('stanley_steamer.n.01'), Synset('stock_car.n.01'), Synset('subcompact.n.01'), Synset('touring_car.n.01'), Synset('used-car.n.01')]
Here is how to get synonyms, antonyms, lemmas and similarity [5]:
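synonyms = []
antonyms = []
# Collect the lemma names of every synset of "good", plus any antonyms
for syn in wn.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())
print(set(synonyms))
print(set(antonyms))
# Lemmas of the last synset visited by the loop
print(syn.lemmas())
# Wu-Palmer similarity between two synsets
w1 = wn.synset('ship.n.01')
w2 = wn.synset('cat.n.01')
print(w1.wup_similarity(w2))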
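The `wup_similarity` call above computes the Wu-Palmer similarity, which is commonly defined as 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest common ancestor of the two concepts in the hypernym hierarchy. The sketch below applies that formula to a small hand-written tree so the arithmetic is easy to follow. The taxonomy `PARENT` and all function names are hypothetical, depth is counted in nodes (the root has depth 1), and the values will not match what NLTK returns on the real WordNet hierarchy.

```python
# Toy taxonomy (hypothetical, hand-written): child -> parent, root has no parent.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "animal": "entity",
    "vehicle": "artifact",
    "ship": "vehicle",
    "car": "vehicle",
    "cat": "animal",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    return None


def wup(a, b):
    """Wu-Palmer similarity on the toy tree."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup("ship", "car"))  # LCS is "vehicle" (depth 3): 2*3 / (4+4) = 0.75
print(wup("ship", "cat"))  # LCS is "entity" (depth 1): 2*1 / (4+3), about 0.286
```

Concepts that share a deep ancestor score close to 1, while concepts that only meet at the root score close to 0.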
Here is how we can use the TextBlob package [6] and represent some word relations as a graph. The graph is drawn with NetworkX and saved to path.png.
from textblob import Word
import networkx as nx
import matplotlib.pyplot as plt

word = Word("plant")
print(word.synsets[:5])
print(word.definitions[:5])

# Add the lemma names of "computer" to the synonyms list built earlier
word = Word("computer")
for syn in word.synsets:
    for l in syn.lemma_names():
        synonyms.append(l)

# Build a graph around the second synset of "computer"
G = nx.Graph()
w = word.synsets[1]
G.add_node(w.name())
for h in w.hypernyms():
    print(h)
    G.add_node(h.name())
    G.add_edge(w.name(), h.name())
for h in w.hyponyms():
    print(h)
    G.add_node(h.name())
    G.add_edge(w.name(), h.name())
print(G.nodes(data=True))

# Draw first, then save and display the figure
nx.draw(G, width=2, with_labels=True)
plt.savefig("path.png")
plt.show()
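An undirected `nx.Graph` does not record which end of an edge is the more general concept. One possible variation, sketched below, is to use `nx.DiGraph` and point every edge from the specific synset to the general one. To keep the sketch runnable without the WordNet corpus, the synset names are hard-coded from the `car.n.01` output shown earlier (only three of the hyponyms are used); the `relation` edge attribute is an arbitrary label chosen for this example.

```python
import networkx as nx

# Names copied from the car.n.01 output earlier in the post (subset of hyponyms).
center = "car.n.01"
hypernyms = ["motor_vehicle.n.01"]
hyponyms = ["ambulance.n.01", "cab.n.03", "convertible.n.01"]

G = nx.DiGraph()
for name in hypernyms:
    # Edge direction: specific -> general (car IS-A motor_vehicle)
    G.add_edge(center, name, relation="hypernym")
for name in hyponyms:
    # Each hyponym IS-A car, so the edge points at the center node
    G.add_edge(name, center, relation="hypernym")

print(sorted(G.predecessors(center)))  # more specific synsets
print(list(G.successors(center)))      # more general synsets
print(G.number_of_nodes(), G.number_of_edges())
```

With real data, the same loops can iterate over `w.hypernyms()` and `w.hyponyms()` and pass `h.name()` instead of the hard-coded strings.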
Here is the full source code:
from nltk.corpus import wordnet as wn
from textblob import Word
import networkx as nx
import matplotlib.pyplot as plt

print(wn.synsets('good'))

car = wn.synset('car.n.01')
print("HYPERNYMS")
print(car.hypernyms())
print("HYPONYMS")
print(car.hyponyms())

synonyms = []
antonyms = []
# Collect the lemma names of every synset of "good", plus any antonyms
for syn in wn.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())
print(set(synonyms))
print(set(antonyms))
# Lemmas of the last synset visited by the loop
print(syn.lemmas())

# Wu-Palmer similarity between two synsets
w1 = wn.synset('ship.n.01')
w2 = wn.synset('cat.n.01')
print(w1.wup_similarity(w2))

word = Word("plant")
print(word.synsets[:5])
print(word.definitions[:5])

# Add the lemma names of "computer" to the synonyms list
word = Word("computer")
for syn in word.synsets:
    for l in syn.lemma_names():
        synonyms.append(l)

# Build a graph around the second synset of "computer"
G = nx.Graph()
w = word.synsets[1]
G.add_node(w.name())
for h in w.hypernyms():
    print(h)
    G.add_node(h.name())
    G.add_edge(w.name(), h.name())
for h in w.hyponyms():
    print(h)
    G.add_node(h.name())
    G.add_edge(w.name(), h.name())
print(G.nodes(data=True))

# Draw first, then save and display the figure
nx.draw(G, width=2, with_labels=True)
plt.savefig("path.png")
plt.show()
References
1. Enriching short text representation in microblog for clustering
2. Automatic Topic Hierarchy Generation Using WordNet
3. WordNet
4. WordNet
5. WordNet NLTK Tutorial
6. Tutorial: What is WordNet? A Conceptual Introduction Using Python