<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>https://wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Lecture_6._Synonyms_and_near-synonyms_detection</id>
	<title>Lecture 6. Synonyms and near-synonyms detection - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Lecture_6._Synonyms_and_near-synonyms_detection"/>
	<link rel="alternate" type="text/html" href="https://wikicshse.ru/index.php?title=Lecture_6._Synonyms_and_near-synonyms_detection&amp;action=history"/>
	<updated>2026-06-06T11:08:50Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wikicshse.ru/index.php?title=Lecture_6._Synonyms_and_near-synonyms_detection&amp;diff=433&amp;oldid=prev</id>
		<title>imported&gt;Katya: Migrated current public revision from wiki.cs.hse.ru</title>
		<link rel="alternate" type="text/html" href="https://wikicshse.ru/index.php?title=Lecture_6._Synonyms_and_near-synonyms_detection&amp;diff=433&amp;oldid=prev"/>
		<updated>2016-11-05T19:49:40Z</updated>

		<summary type="html">&lt;p&gt;Migrated current public revision from wiki.cs.hse.ru&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Ekaterina Chernyak, Dmitry Ilvovsky&lt;br /&gt;
&lt;br /&gt;
== Examples ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Synonyms&amp;#039;&amp;#039;&amp;#039;: Netherlands and Holland; buy and purchase&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Near-synonyms&amp;#039;&amp;#039;&amp;#039;: pants, trousers and slacks; mistake and error&lt;br /&gt;
&lt;br /&gt;
== Approaches to synonyms and near-synonyms detection ==&lt;br /&gt;
&lt;br /&gt;
* Thesaurus-based approach&lt;br /&gt;
* Distributional semantics&lt;br /&gt;
* Context-based approach&lt;br /&gt;
* word2vec&lt;br /&gt;
* Web search-based approach&lt;br /&gt;
&lt;br /&gt;
=== Synonyms in WordNet ===&lt;br /&gt;
&lt;br /&gt;
Given a word, look up every synset (sense) it belongs to; the other lemmas in those synsets are its synonyms.&lt;br /&gt;
&lt;br /&gt;
==== WordNet NLTK interface ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from nltk.corpus import wordnet as wn  # needs the WordNet data: nltk.download(&amp;#039;wordnet&amp;#039;)&lt;br /&gt;
&lt;br /&gt;
# Print every sense (synset) of the word with its definition and synonyms&lt;br /&gt;
for i, synset in enumerate(wn.synsets(&amp;#039;error&amp;#039;)):&lt;br /&gt;
    print(&amp;quot;Meaning&amp;quot;, i, &amp;quot;NLTK ID:&amp;quot;, synset.name())&lt;br /&gt;
    print(&amp;quot;Definition:&amp;quot;, synset.definition())&lt;br /&gt;
    print(&amp;quot;Synonyms:&amp;quot;, &amp;quot;, &amp;quot;.join(synset.lemma_names()))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
WordNet Web interface: [http://wordnetweb.princeton.edu/perl/webwn http://wordnetweb.princeton.edu/perl/webwn]&lt;br /&gt;
&lt;br /&gt;
=== Distributional semantics ===&lt;br /&gt;
&lt;br /&gt;
[[File:L6p1.jpg|500px|left]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear: both&amp;quot; /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Exercise 6.1 ==== &lt;br /&gt;
&lt;br /&gt;
Calculate PPMI for Table 1.&lt;br /&gt;
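&lt;br /&gt;
As a reminder of the computation, here is a minimal Python sketch of PPMI, taken as PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))). The base-2 logarithm, the function name ppmi and the counts are assumptions of this sketch; the counts are made up and are not Table 1.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def ppmi(counts):
    """Return a PPMI matrix for a term-context count matrix (list of rows)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, n in enumerate(row):
            if n == 0:
                out_row.append(0.0)  # log(0) is undefined; PPMI clips it to 0
                continue
            # P(w, c) / (P(w) * P(c)) = n * total / (row_sum * col_sum)
            pmi = math.log2(n * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(0.0, pmi))
        result.append(out_row)
    return result

# Hypothetical counts: rows are terms, columns are contexts.
toy_counts = [[2, 0],
              [1, 1]]
toy_ppmi = ppmi(toy_counts)
```
&lt;br /&gt;
Negative PMI values are replaced by 0, which is what distinguishes PPMI from plain PMI.&lt;br /&gt;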
&lt;br /&gt;
==== Exercise 6.2 ==== &lt;br /&gt;
&lt;br /&gt;
Input: def.txt or your own text&lt;br /&gt;
&lt;br /&gt;
Output 1: term-context matrix&lt;br /&gt;
&lt;br /&gt;
Output 2: term-term similarity matrix (use cosine similarity)&lt;br /&gt;
&lt;br /&gt;
Output 3: 2D visualization by means of LSA&lt;br /&gt;
&lt;br /&gt;
Hint: use &amp;lt;code&amp;gt;cfd = nltk.ConditionalFreqDist((term, context) for ...)&amp;lt;/code&amp;gt; to compute the conditional frequency distribution&lt;br /&gt;
&lt;br /&gt;
Hint: use R for SVD and visualization&lt;br /&gt;
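&lt;br /&gt;
One possible starting point for Outputs 1 and 2, sketched in Python with the standard library only rather than nltk.ConditionalFreqDist. The toy sentences, the context window of 2 words and all function names are assumptions of this sketch; SVD and visualization are left out.&lt;br /&gt;
&lt;br /&gt;
```python
import math
from collections import Counter, defaultdict

def term_context_counts(sentences, window=2):
    """Count, for every term, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo = max(0, i - window)
            context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts[term].update(context)
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counter objects)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus, already tokenized and lowercased (illustrative only).
sentences = [
    "i made a mistake in the report".split(),
    "i made an error in the report".split(),
]
matrix = term_context_counts(sentences)
sim = cosine(matrix["mistake"], matrix["error"])
```
&lt;br /&gt;
Each row of the term-context matrix is stored as a Counter, so the full term-term similarity matrix is obtained by calling cosine on every pair of rows.&lt;br /&gt;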
&lt;br /&gt;
=== word2vec [Mikolov, Chen, Corrado, Dean, 2013] ===&lt;br /&gt;
&lt;br /&gt;
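A shallow neural network, often loosely grouped with deep learning, that learns dense word vectors by predicting words from their contexts in a corpus; its training data are the same word-context co-occurrences that a term-context matrix records.&lt;br /&gt;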
&lt;br /&gt;
There are two model architectures:&lt;br /&gt;
* CBOW predicts the current word based on the context&lt;br /&gt;
* Skip-gram predicts surrounding words given the current word&lt;br /&gt;
&lt;br /&gt;
word2vec project page: [https://code.google.com/p/word2vec/ https://code.google.com/p/word2vec/]&lt;br /&gt;
&lt;br /&gt;
demo: [http://rare-technologies.com/word2vec-tutorial/ http://rare-technologies.com/word2vec-tutorial/]&lt;br /&gt;
&lt;br /&gt;
Example: vec(Madrid) - vec(Spain) + vec(France) &amp;asymp; vec(Paris), i.e. the word vector nearest to the result is vec(Paris)&lt;br /&gt;
&lt;br /&gt;
=== Context-based approach [Lin, 1998] ===&lt;br /&gt;
&lt;br /&gt;
==== Dependency triple [Lin, 1998] ====&lt;br /&gt;
&lt;br /&gt;
A dependency triple (w, r, w&amp;#039;) consists of two words and the grammatical relationship between them in the input sentence. &lt;br /&gt;
&lt;br /&gt;
Example: the sentence &amp;#039;&amp;#039;I have a brown dog&amp;#039;&amp;#039; yields the triples (have subj I), (I subj-of have), (dog obj-of have), (dog adj-mod brown), (brown adj-mod-of dog), (dog det a), (a det-of dog)&lt;br /&gt;
&lt;br /&gt;
||w, r, w&amp;#039;|| — frequency of (w, r, w&amp;#039;)&lt;br /&gt;
&lt;br /&gt;
||w, r, * || — total occurrences of w-r relationships&lt;br /&gt;
&lt;br /&gt;
|| *, *, *|| — total number of dependency triples&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Mutual information between w, w&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
[[File:L6p2.jpg|300px|left]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:L6p3.jpg|450px|left]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear: both&amp;quot; /&amp;gt;&lt;br /&gt;
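&lt;br /&gt;
The counts defined above are enough for a small Python sketch. It assumes the formulas from [Lin, 1998], which the figures here presumably show: I(w, r, w&amp;#039;) = log(||w, r, w&amp;#039;|| &amp;times; ||*, r, *|| / (||w, r, *|| &amp;times; ||*, r, w&amp;#039;||)); T(w) is the set of pairs (r, w&amp;#039;) with positive I(w, r, w&amp;#039;); and sim(w1, w2) is the sum of I(w1, r, w) + I(w2, r, w) over the pairs shared by T(w1) and T(w2), divided by the sum of I over T(w1) plus the sum of I over T(w2). The triples are toy data and the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import math
from collections import Counter

def lin_similarity(triples, w1, w2):
    """Similarity of w1 and w2 from a list of (w, r, w') dependency triples."""
    n_wrw = Counter(triples)                          # ||w, r, w'||
    n_wr = Counter((w, r) for w, r, _ in triples)     # ||w, r, *||
    n_rw = Counter((r, v) for _, r, v in triples)     # ||*, r, w'||
    n_r = Counter(r for _, r, _ in triples)           # ||*, r, *||

    def info(w, r, v):
        # Mutual information of the triple (w, r, v)
        return math.log(n_wrw[(w, r, v)] * n_r[r] / (n_wr[(w, r)] * n_rw[(r, v)]))

    def features(w):
        # T(w): pairs (r, v) whose mutual information with w is positive
        return {(r, v): info(x, r, v) for (x, r, v) in n_wrw
                if x == w and info(x, r, v) > 0}

    t1, t2 = features(w1), features(w2)
    shared = sum(t1[f] + t2[f] for f in t1 if f in t2)
    total = sum(t1.values()) + sum(t2.values())
    return shared / total if total else 0.0

# Toy triples (illustrative only, not from a real corpus).
triples = [
    ("mistake", "obj-of", "make"), ("error", "obj-of", "make"),
    ("mistake", "adj-mod", "big"), ("error", "adj-mod", "fatal"),
    ("dog", "obj-of", "have"), ("dog", "adj-mod", "brown"),
]
score = lin_similarity(triples, "mistake", "error")
```
&lt;br /&gt;
In this toy data mistake and error share only the feature (obj-of, make), so the score lies strictly between 0 and 1, while a word compared with itself scores 1.&lt;br /&gt;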
&lt;br /&gt;
Results: the words most similar to &amp;#039;&amp;#039;&amp;#039;brief (noun)&amp;#039;&amp;#039;&amp;#039; are affidavit 0.13, petition 0.05, memorandum 0.05, motion 0.05, lawsuit 0.05, deposition 0.05, slight 0.05, prospectus 0.04, document 0.04, paper 0.04&lt;br /&gt;
&lt;br /&gt;
Drawbacks:&lt;br /&gt;
* some form of dependency parsing is required&lt;br /&gt;
* synonyms and antonyms are not distinguished (win / lose the game)&lt;br /&gt;
&lt;br /&gt;
=== Web or corpus search approach ===&lt;br /&gt;
&lt;br /&gt;
==== Hearst patterns [Hearst, 1998] ====&lt;br /&gt;
&lt;br /&gt;
Lexico-syntactic patterns to recognize hyponymy:&lt;br /&gt;
* such NP as NP, NP and / or NP;&lt;br /&gt;
* NP such as NP, NP and / or NP;&lt;br /&gt;
* NP, NP or other NP;&lt;br /&gt;
* NP, NP and other NP;&lt;br /&gt;
* NP, including NP, NP and / or NP;&lt;br /&gt;
* NP, especially NP, NP and / or NP;&lt;br /&gt;
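&lt;br /&gt;
A rough Python sketch of how one such pattern could be matched. It handles only the pattern &amp;quot;NP such as NP, NP and / or NP&amp;quot; and approximates every NP by a single word, since real noun-phrase chunking is out of scope here; the regular expression and the function name are assumptions of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# A hypernym, then "such as", then a list of single-word hyponyms
SUCH_AS = re.compile(r"(\w+),? such as ((?:\w+, )*\w+(?:,? (?:and|or) \w+)?)")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        # Split the hyponym list on commas and on the final and / or
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

pairs = hearst_such_as("He wears garments such as pants, trousers and slacks.")
```
&lt;br /&gt;
The extracted pairs play the role of the concordances used further below; the remaining patterns could be added as further regular expressions in the same way.&lt;br /&gt;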
&lt;br /&gt;
&lt;br /&gt;
Text &amp;amp;rArr; pattern &amp;amp;rArr; concordance&lt;br /&gt;
&lt;br /&gt;
To rank concordances: PatternSim [Panchenko, Morozova, Naets]&lt;br /&gt;
&lt;br /&gt;
Concordance ranking:&lt;br /&gt;
&lt;br /&gt;
Input: terms C, corpus D&lt;br /&gt;
&lt;br /&gt;
Output: similarity matrix S [C &amp;times; C]&lt;br /&gt;
&lt;br /&gt;
K &amp;#8592; extract_concord(D);&lt;br /&gt;
&lt;br /&gt;
K&amp;lt;sub&amp;gt;lem&amp;lt;/sub&amp;gt; &amp;#8592; lemmatize_concord(K);&lt;br /&gt;
&lt;br /&gt;
K&amp;lt;sub&amp;gt;C&amp;lt;/sub&amp;gt; &amp;#8592; filter_concord(K&amp;lt;sub&amp;gt;lem&amp;lt;/sub&amp;gt;, C);&lt;br /&gt;
&lt;br /&gt;
S &amp;#8592; get_extraction_freq(C, K);&lt;br /&gt;
&lt;br /&gt;
S &amp;#8592; rerank(S, C, D);&lt;br /&gt;
&lt;br /&gt;
S &amp;#8592; normalize(S);&lt;br /&gt;
&lt;br /&gt;
return S&lt;br /&gt;
&lt;br /&gt;
Example ranking &amp;#039;&amp;#039;&amp;#039;Efreq-Cfreq&amp;#039;&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
[[File:L6p4.jpg|250px|left]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br style=&amp;quot;clear: both&amp;quot; /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
P(w&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;): frequency of w&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;&lt;br /&gt;
&lt;br /&gt;
P(w&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;, w&amp;lt;sub&amp;gt;j&amp;lt;/sub&amp;gt;): extraction probability of the pair (w&amp;lt;sub&amp;gt;i&amp;lt;/sub&amp;gt;, w&amp;lt;sub&amp;gt;j&amp;lt;/sub&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
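&lt;br /&gt;
A minimal Python sketch of this reranking. Since the formula is only given as an image, the sketch assumes that Efreq-Cfreq scores a pair as P(wi, wj) / (P(wi) &amp;times; P(wj)), with P(wi, wj) the share of all extractions that involve the pair and P(wi) the relative corpus frequency of wi. The counts and the function name are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def efreq_cfreq(extractions, term_freq):
    """Score each extracted pair by P(wi, wj) / (P(wi) * P(wj)).

    extractions: dict mapping (wi, wj) to the number of pattern extractions
    term_freq:   dict mapping each term to its corpus frequency
    """
    total_pairs = sum(extractions.values())
    total_terms = sum(term_freq.values())
    scores = {}
    for (wi, wj), e_ij in extractions.items():
        p_pair = e_ij / total_pairs
        p_i = term_freq[wi] / total_terms
        p_j = term_freq[wj] / total_terms
        scores[(wi, wj)] = p_pair / (p_i * p_j)
    return scores

# Made-up counts for illustration only.
extractions = {("pants", "trousers"): 3, ("pants", "item"): 1}
term_freq = {"pants": 10, "trousers": 10, "item": 80}
scores = efreq_cfreq(extractions, term_freq)
```
&lt;br /&gt;
Dividing by the term frequencies lowers the score of pairs that involve very frequent, general words such as &amp;quot;item&amp;quot; in the toy counts.&lt;br /&gt;
&lt;br /&gt;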
demo: [http://serelex.cental.be/ http://serelex.cental.be/]&lt;/div&gt;</summary>
		<author><name>imported&gt;Katya</name></author>
	</entry>
</feed>