[Senseclusters-users] using sense clusters to find semantic classes
From: Marco B. <ba...@ei...> - 2004-08-25 14:58:51
Dear All,

We recently started a project in which we look for clusters of semantically related words in a literary corpus, using unsupervised clustering techniques based on the co-occurrence of tokens within windows of a certain size.

To avoid reinventing the wheel, we thought we could use the SenseClusters package. In particular, we were inspired by the message Amruta sent a while ago about using SenseClusters for similar purposes.

However, after creating our co-occurrence vectors we got stuck: we do not know what to do next, and we do not know where to find the relevant documentation.

Following Amruta's mail:

1. The N-gram Statistics Package (http://www.d.umn.edu/~tpederse/nsp.html) creates the list of word pairs that co-occur within some window of each other, along with their association scores. Run the programs count.pl, combig.pl and statistics.pl, in that order! The output of statistics.pl will be the list of word pairs that co-occur in some window and their association scores as computed by tests like log-likelihood, mutual information, chi-squared, etc.

=> We did this.

2. Give the output of step 1 to wordvec.pl in the SenseClusters package (http://senseclusters.sourceforge.net/). This program will create a word-by-word association matrix that shows the co-occurrence vector of each word.

=> We did this, but this is where we started to get confused.
=> We ran wordvec.pl like this:
=>   $ wordvec.pl --wordorder nocare --feats feats.txt --dims dims.txt wordpairs.txt > firsttry.vec
=> The feature and dimension files are created by wordvec.pl, and they are identical, which makes sense, since for now we are simply looking at the co-occurrence of all words with all words.

3. Cluster these word vectors with (give the output of step 2 to) the vcluster program in Cluto (http://www-users.cs.umn.edu/~karypis/cluto/) to get clusters of words!

=> Here is where we get stuck.
=> Given the output of the previous step, where can we find documentation on a simple way to obtain clusters from Cluto (or from any other package)?

Any advice greatly appreciated!

Best regards,

Marco Baroni & Sara Piccioni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni
http://sslmit.unibo.it/~spiccioni
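
P.S. In case it helps to make clear what kind of vectors we mean, here is a minimal Python sketch of window-based co-occurrence counting. It is only an illustration, not NSP or SenseClusters code: the function names cooccurrence_vectors and to_matrix, the toy sentence and the window size are all invented for the example, and it stores raw counts where the real pipeline works from the association scores produced in step 1.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Count, for every token, the tokens seen within `window` positions of it.

    Word order is ignored (which is what we intended with
    `--wordorder nocare`), so the resulting matrix is symmetric.
    """
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        # Look only to the right and update both directions, so each
        # co-occurring pair of positions is counted exactly once.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            vectors[word][other] += 1
            vectors[other][word] += 1
    return vectors


def to_matrix(vectors):
    """Turn the nested counts into (sorted vocabulary, dense rows)."""
    vocab = sorted(vectors)
    # Rows and columns use the same vocabulary: all words by all words.
    rows = [[vectors[w][v] for v in vocab] for w in vocab]
    return vocab, rows


if __name__ == "__main__":
    # Hypothetical toy input standing in for our corpus.
    tokens = "the cat sat on the mat".split()
    vocab, rows = to_matrix(cooccurrence_vectors(tokens, window=2))
    for word, row in zip(vocab, rows):
        print(word, row)
```

Each printed row is the co-occurrence vector of one word over the whole vocabulary, which is why our feature and dimension files come out identical. What we are missing is the step that takes rows like these and groups them into clusters.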