This procedure loads word and phrase embeddings from the SentiWordNet lexical resource into MLDB.
Using these embeddings, each English word or phrase can be converted to a 3-dimensional vector of sentiment scores: positivity, negativity and objectivity.
This is a simple implementation that does not perform word sense disambiguation. SentiWordNet provides sentiment scores for each of WordNet's synsets. For a given word, this implementation computes a weighted average of the sentiment scores of each of the word's synsets. As a result, more weight is given to the word sense that is most likely in general, not to the sense that fits the current context.
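The averaging step can be illustrated with a minimal sketch. The weighting used here (1/rank, where rank 1 is the most common sense) is an assumption chosen for illustration, since the exact weights are not stated above, and the scores are made-up values rather than SentiWordNet data. The function name `average_sentiment` is hypothetical.

```python
def average_sentiment(senses):
    """Combine per-synset scores into one (pos, neg, obj) triple.

    `senses` is a list of (rank, pos, neg) tuples, with rank 1 being the
    most common sense. The 1/rank weighting is an assumption.
    """
    total_weight = sum(1.0 / rank for rank, _, _ in senses)
    pos = sum(p / rank for rank, p, _ in senses) / total_weight
    neg = sum(n / rank for rank, _, n in senses) / total_weight
    # SentiWordNet defines objectivity as what remains after pos and neg.
    obj = 1.0 - pos - neg
    return pos, neg, obj


# Hypothetical word with a common positive sense and a rarer negative sense.
example = average_sentiment([(1, 0.75, 0.0), (2, 0.0, 0.5)])
print(example)
```

Because the common sense dominates the average, the word comes out as mostly positive even in a sentence where the rarer negative sense is the intended one.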
A new procedure of type `import.sentiwordnet` named `<id>` can be created as follows:

    mldb.put("/v1/procedures/"+<id>, {
        "type": "import.sentiwordnet",
        "params": {
            "dataFileUrl": <Url>,
            "outputDataset": <OutputDatasetSpec>,
            "runOnCreation": <bool>
        }
    })

with the following key-value definitions for `params`:
Field, Type, Default | Description |
---|---|
dataFileUrl | Path to SentiWordNet 3.0 data file |
outputDataset | Output dataset for result |
runOnCreation | If true, the procedure will be run immediately. The response will contain an extra field called |
The `dataFileUrl` parameter should point to a SentiWordNet 3.0 data file.
The file can be obtained from the SentiWordNet website.
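As a sketch of a filled-in configuration, the snippet below builds the request body as a plain Python dictionary. The procedure id, the file path and the dataset name are hypothetical placeholders, and passing the output dataset as a bare name is an assumption about what `<OutputDatasetSpec>` accepts.

```python
import json

# Hypothetical values: adjust the id, path and dataset name for your setup.
procedure_id = "import_sentiwordnet"
config = {
    "type": "import.sentiwordnet",
    "params": {
        "dataFileUrl": "file:///path/to/SentiWordNet_3.0.0.txt",
        "outputDataset": "sentiWordNet",
        "runOnCreation": True,
    },
}

# With an MLDB connection object `mldb`, the call would then be:
#   mldb.put("/v1/procedures/" + procedure_id, config)
print(json.dumps(config, indent=2))
```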
Each row name is a word followed by `#` and a one-character code indicating the synset type.
The following table shows the synset codes (source):
Code | Name |
---|---|
n | NOUN |
v | VERB |
a | ADJECTIVE |
s | ADJECTIVE SATELLITE |
r | ADVERB |
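The row name convention can be handled with a small helper, sketched here in plain Python. The function names `split_row_name` and `make_row_name` are hypothetical and are not part of MLDB.

```python
# Synset codes from the table above.
SYNSET_CODES = {
    "n": "NOUN",
    "v": "VERB",
    "a": "ADJECTIVE",
    "s": "ADJECTIVE SATELLITE",
    "r": "ADVERB",
}


def split_row_name(row_name):
    """Split a row name such as 'love#v' into ('love', 'VERB')."""
    # Split on the last '#' so the code is always the final component.
    word, _, code = row_name.rpartition("#")
    if not word or code not in SYNSET_CODES:
        raise ValueError("not a valid row name: %r" % row_name)
    return word, SYNSET_CODES[code]


def make_row_name(word, code):
    """Build a row name such as 'dog#n' from a word and a synset code."""
    if code not in SYNSET_CODES:
        raise ValueError("unknown synset code: %r" % code)
    return word + "#" + code


print(split_row_name("love#v"))
print(make_row_name("dog", "n"))
```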
Assuming the SentiWordNet data has been imported into the `sentiWordNet` dataset, the following query gets the embeddings for the word love as a verb and the word dog as a noun.
SELECT * FROM sentiWordNet WHERE rowName() IN ('love#v', 'dog#n')
SentiPos | SentiNeg | SentiObj | POS | baseWord |
---|---|---|---|---|
0 | 0.1928374618291855 | 0.8071626424789429 | "n" | "dog" |
0.6249999403953552 | 0.01499999966472387 | 0.3600000143051147 | "v" | "love" |
The `pooling` function type is used to embed a bag of words in a vector space like SentiWordNet.
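To give a rough idea of what embedding a bag of words means, the sketch below averages the two example vectors from the query result above. This is plain Python and not MLDB's own pooling function; averaging is one assumed way of aggregating, and `average_pool` is a hypothetical name.

```python
def average_pool(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty bag of words")
    count = len(vectors)
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / count for column in zip(*vectors)]


# (SentiPos, SentiNeg, SentiObj) values copied from the query result above.
embeddings = {
    "dog#n": [0.0, 0.1928374618291855, 0.8071626424789429],
    "love#v": [0.6249999403953552, 0.01499999966472387, 0.3600000143051147],
}

pooled = average_pool([embeddings[name] for name in ("love#v", "dog#n")])
print(pooled)
```

The result is a single 3-dimensional vector for the bag of words, whatever the number of words it contains.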