senseclusters-news Mailing List for SenseClusters
Status: Beta
Brought to you by:
tpederse
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | (3) | | | (2) | | (1) | (1) | (1) | | | | (2) |
| 2005 | (2) | (1) | (1) | | | (2) | | | (1) | | | |
| 2006 | (1) | (2) | (1) | | (2) | (1) | (3) | (1) | | | | |
| 2008 | | | (1) | (1) | | | | | | | | |
| 2013 | | | | | | (2) | | | | | | |
| 2015 | | | | | (1) | | | | | (1) | | |
|
From: Ted P. <dul...@gm...> - 2015-10-03 22:35:02
|
We are pleased to announce a new release of SenseClusters. This is a very minor bug fix release, but might be something you want to adopt since it will eliminate some annoying warning messages that appear as of Perl v 5.15. In that version of Perl the use of defined (@array) has been deprecated, and so we have a few spots where we were using this and where it now causes warnings. That has been resolved, so as of version 1.05 of SenseClusters you should not see these warnings. More about defined (@array) can be found here, if you are interested : http://www.perlmonks.org/index.pl?node_id=1077762 You can download the most current version of SenseClusters from CPAN or Sourceforge by following the links here : http://senseclusters.sourceforge.net Please let us know if any questions arise. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: Ted P. <dul...@gm...> - 2015-05-25 14:55:38
|
Greetings all,

I just wanted to mention that SenseClusters participated in Task 15 of SemEval 2015, and I'll be presenting a poster about that on Thur June 4 from 2:45 - 4:00 pm (as a part of the SemEval workshop).

Here's a bit more about the task: http://alt.qcri.org/semeval2015/task15/

And here's the schedule for SemEval : http://alt.qcri.org/semeval2015/cdrom/program.html

Finally, here's the paper that describes the SenseClusters system : http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval076.pdf

So, if you will be in Denver it would be nice to see you at the poster session or some other time!

Cordially,
Ted |
|
From: Ted P. <tpederse@d.umn.edu> - 2013-06-30 03:11:34
|
We are pleased to announce the release of version 1.03 of SenseClusters. This is the first new release in 5 years, and should be the first of several upcoming releases.

There has been a little bit of clean up in the test scripts and other places, but the main new functionality is some additional ways of labeling the discovered clusters. Before this version clusters were labeled with significant bigrams - as of version 1.03 it is now possible to label clusters with trigrams or 4-grams. Additional functionality related to cluster labeling is expected to be released in the coming months, so please give this a try and let us know of any suggestions or observations you might have.

The changes in this version are enumerated below. You can download from CPAN or sourceforge via the links provided here : http://senseclusters.sourceforge.net

1.03 Released June 29, 2013 (changes by TDP and AMJ)

Modified install.sh to default to Linux-x86_64 for Cluto installation (TDP)

Removed various instances of if (defined %hash) in preprocess/sval2 in favor of if (%hash) - defined %hash is now deprecated - however, left that in keyconvert.pl as removing it caused a syntax issue that should be checked out further (TDP)

Fixed Testing/ALL-TESTS.sh to run all test cases by enumerating in a for loop - the previous method of using a wild card did not seem to be running all cases (TDP)

Fixed some test cases for clusterstopping in Testing - note that we still have Sun test cases included although no Sun platform to test on. Should keep those though as cluto still comes with a Sun version (TDP)

Added the --ngram option to clusterlabeling.pl. It allows the user to provide the value of n for the ngrams; feature selection while creating the cluster labels is based on this parameter. (AMJ)

Added --label_ngram option to discriminate.pl to support the new --ngram option in clusterlabeling.pl (AMJ)

Added test cases testA6 and testA7 to test changes in clusterlabeling. (AMJ)

Updated INSTALL to mention dependencies on csh and using bash as the system shell (TDP)

Please let us know of any questions, problems, or suggestions!

Enjoy,
Ted and Anand

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
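
For readers unfamiliar with the idea of labeling clusters with bigrams, trigrams, or 4-grams, here is a rough Python sketch. It is not the code in clusterlabeling.pl (which is Perl and selects significant ngrams); the function names `ngrams` and `label_cluster` are made up for illustration, and raw frequency is used as a simple stand-in for whatever significance test the real program applies.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def label_cluster(contexts, n=2, top=3):
    """Pick the `top` most frequent n-grams over all contexts in a cluster.

    Assumption: plain frequency replaces the significance test that a
    real labeling program would use.
    """
    counts = Counter()
    for text in contexts:
        counts.update(ngrams(text.lower().split(), n))
    return [" ".join(gram) for gram, _ in counts.most_common(top)]


# Hypothetical contexts that were all assigned to one cluster.
cluster = [
    "the bank raised interest rates again",
    "interest rates at the bank fell",
    "the central bank raised interest rates",
]

print(label_cluster(cluster, n=2, top=1))  # bigram label
print(label_cluster(cluster, n=3, top=2))  # trigram labels
```

Changing `n` here plays the role that the ngram value plays in the release notes above: the same contexts yield bigram, trigram, or 4-gram labels depending on that one parameter.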
|
From: Ted P. <tpederse@d.umn.edu> - 2013-06-06 13:16:13
|
SenseClusters participated (yet again) in a SemEval task this year. The paper describing the system and a little bit about the task is available here : http://www.d.umn.edu/~tpederse/Pubs/pedersen-semeval-2013.pdf And I will present a poster next Friday (June 14) in Atlanta at the SemEval workshop. I participated in task 11, which has its poster session at 3:30 I believe... http://www.cs.york.ac.uk/semeval-2013/index.php?id=schedule I do hope to see you there! Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: Ted P. <dul...@gm...> - 2008-04-06 22:02:16
|
We are pleased to announce the release of SenseClusters version 1.01. This is the first new release since August 2006 and there are some significant changes to the package, so please do upgrade as the opportunity arises.

The most visible change in this release is the move to CPAN. SenseClusters is now available via the CPAN archives at Text-SenseClusters, which opens up the possibility of doing more automated installs via CPAN, which we have taken advantage of by creating a Bundled install for SenseClusters. We hope that installation is now much easier if you choose to use CPAN. Please check out the INSTALL instructions for more details...

In addition, CPAN offers a very nice visual presentation of our documentation and code, so we hope that encourages you to read and comment on what we have out there. So, please do check out the new look of SenseClusters at CPAN : http://search.cpan.org/dist/Text-SenseClusters (Note that we will continue to use sourceforge as a distribution site as well, and will continue to use sourceforge CVS).

We have tried to clean up some of the documentation and also make the testing process a little more reliable, both in terms of running the test scripts and especially with respect to svdpackout.pl. While we are currently evaluating the possibility of moving SVD processing to SVDLIBC, we are still using SVDPACKC as of 1.01. We have made some very significant changes to how SVD output is processed (in svdpackout.pl) so if you use SVD please do check those out. There are options with svdpackout.pl that will make it backwards compatible, but the defaults have changed now so SVD processing will be somewhat different (and we hope more standard and reliable as a result).

Please do review the CHANGE log for more details about this new release. The web interface has been updated to 1.01 so you can try it out there if you prefer.

To find more info and links to both the CPAN and sourceforge download sites, please visit : http://senseclusters.sourceforge.net

Finally, this release marks the start of a new push with respect to SenseClusters development, so it's a very good time to make comments and requests.

Enjoy,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: Ted P. <tpederse@d.umn.edu> - 2008-03-30 15:07:05
|
Greetings all...

I've released version 1.00 of SenseClusters, now available on CPAN at http://search.cpan.org/dist/Text-SenseClusters

This release includes revised INSTALL instructions (see below :) and also some fairly significant clean up to Toolkit program documentation. I am in the process of trying to provide more examples in the SYNOPSIS sections, and then just making misc changes to enforce consistency among the programs.

This is still a development release however (as demonstrated by the even number of the release). So, please do use with caution. 0.95 remains the most current "stable" release.

I anticipate at least one more development release before having a stable release. I plan to experiment with using SVDLIBC rather than SVDPACKC, which really does seem (from my perspective at least) to be incompatible with gcc version 4.0.0 or better, which will become an increasingly difficult problem to deal with. I'm also continuing to work on improving the installation procedures, making them as automatic as possible. I'm also going to try and provide at least a few test cases that use "make test" so that we can expand our testing efforts in that direction, which is generally more compatible with CPAN releases (and easier for the user to do too).

My hope is that as of version 1.00, SenseClusters is now backwards compatible to Perl 5.6.2, and that SenseClusters and all the dependent CPAN modules can be installed via the Bundle that has also been provided on CPAN. (http://search.cpan.org/dist/Bundle-Text-SenseClusters)

So, if you have the opportunity please do check out the new release, and any and all comments are most welcome, and are especially timely now.

Cordially,
Ted

On Sat, Mar 29, 2008 at 9:45 AM, Ted Pedersen <tpederse@d.umn.edu> wrote:
> Hi Teshome,
>
> The commands in that version of the INSTALL documentation are out of
> order, unfortunately. Sorry about that, I am fixing that today. You
> should first run the perl -MCPAN command to get SenseClusters and the
> CPAN components, and then you can do the External install.
>
> Here's a preview of the new instructions - note that you'll need to
> locate your .cpan directory to find the sources. That's usually in
> your home directory as shown below (or the root home).
>
> NAME
> INSTALL Installation instructions for SenseClusters
>
> SYNOPSIS
> If you have su or sudo access, you should be able to install and test
> the installation of SenseClusters via automatic download from CPAN as
> follows:
>
> # install SenseClusters and all dependent CPAN modules
> perl -MCPAN -e 'install Bundle::Text::SenseClusters';
>
> # install cluto and SVDPACKC (included in SenseClusters)
> cd ~/.cpan/build/Text-SenseClusters-[insert_version]
> cd External
> csh ./install.sh /usr/local/bin
> cd ~
>
> # run SC test cases (note that location of cpan build
> # directory might vary on your system).
>
> cd ~/.cpan/build/Text-SenseClusters-[insert_version]
> cd Testing
> csh ./ALL-TESTS.sh
> cd ~
>
> This assumes that /usr/local/bin is in your PATH and is your preferred
> location for user installed executable scripts. If it is not, substitute
> your preferred directory here.
>
> Hope this helps,
> Ted
>
>
>
> On Sat, Mar 29, 2008 at 2:23 AM, Teshome Kassie <tk...@ya...> wrote:
> > Hello Ted;
> > I couldn't use your SenseClusters in doing my thesis. The reason is that the
> > problem of installing it to my PC. I have seen from the internet Bundle Text
> > SenseClusters from the web. As I understood the whole requirement is bundled
> > with the above to install it. From installation instruction, It says before
> > installing Bundle Text SenseClusters it is necessary to install external
> > packages CLUTO & SVDPACKC which could be installed by the following script
> > which is provided:
> > cd External
> > csh ./ALL-TESTS.sh INSTALLDIR
> > cd ..
> > But I couldn't get the script to use for installing the external packages.
> > So could you help me to get it. > > In Addition, Please instruct me in detail how to install all the required > > components of SenseClusters in order to use it for the language to apply for > > sense discrimination in a corpus of specific domain. > > Teshome. > > > > > > > > Ted Pedersen <tpederse@d.umn.edu> wrote: > > Hi Teshome, > > > > I'm afraid I'm not sure what the problem is here. PDL is supported by > > another group, so perhaps you could contact them and ask about the > > error. You can find their mailing list at : > > > > http://pdl.perl.org/maillists/ > > > > I am currently using PDL 2.4.1 which is a few versions behind 2.4.3, > > but I wouldn't think there would be that much difference between them. > > This is also the version being used for the SenseClusters web > > interface. > > > > Good luck! > > Ted > > > > > > On Dec 5, 2007 10:17 AM, Teshome Kassie wrote: > > > Dear Sir; > > > > > > I tried to install PDL on my machine so many times according to your > > > instruction to use SenseClusters. The commands I used are as follows with > > > csh prompt: > > > > > > perl -MCPAN -e shell > > > cpan> install PDL > > > then I followed with answering to install for dependencies accordingly. > > > finally I end up with the following error. > > > XXXXXXXXXX Processing gl.h > > > Running cpp on /usr/include/GL/gl.h > > > *** CPP command: gcc -E -P -DGL_MESA_program_debug=0 -D_REENTRANT > > > -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -I/usr/include/gdbm -D_REENTRANT -D_GNU_SOURCE > > > -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 > > > -I/usr/include/gdbm -D_LANGUAGE_C -DAPIENTRY='' tmp_gl.h | > > > open_fencestr = '11HdyTbIVg6s'; close_fencestr = '23Cnba1nbf31' > > > rawfile has 1807 lines... > > > SUB CPP: Returning 1503 lines... 
> > > XXXXXXXXXX Processing glx.h > > > Running cpp on /usr/include/GL/glx.h > > > *** CPP command: gcc -E -P -DGL_MESA_program_debug=0 -D_REENTRANT > > > -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -I/usr/include/gdbm -D_REENTRANT -D_GNU_SOURCE > > > -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 > > > -I/usr/include/gdbm -D_LANGUAGE_C -DAPIENTRY='' tmp_glx.h | > > > open_fencestr = '11HdyTbIVg6s'; close_fencestr = '23Cnba1nbf31' > > > rawfile has 5970 lines... > > > SUB CPP: Returning 225 lines... > > > XXXXXXXXXX Processing glu.h > > > Running cpp on /usr/include/GL/glu.h > > > *** CPP command: gcc -E -P -DGL_MESA_program_debug=0 -D_REENTRANT > > > -D_GNU_SOURCE -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -I/usr/include/gdbm -D_REENTRANT -D_GNU_SOURCE > > > -fno-strict-aliasing -pipe -Wdeclaration-after-statement > > > -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 > > > -I/usr/include/gdbm -D_LANGUAGE_C -DAPIENTRY='' tmp_glu.h | > > > open_fencestr = '11HdyTbIVg6s'; close_fencestr = '23Cnba1nbf31' > > > rawfile has 1874 lines... > > > SUB CPP: Returning 137 lines... 
> > > cp OpenGL.pm ../../../blib/lib/PDL/Graphics/OpenGL.pm > > > /usr/bin/perl /usr/lib/perl5/5.8.8/ExtUtils/xsubpp -typemap > > > /usr/lib/perl5/5.8.8/ExtUtils/typemap -typemap > > > /root/.cpan/build/PDL-2.4.3/Basic/Core/typemap.pdl OpenGL.xs > OpenGL.xsc > > > && mv OpenGL.xsc OpenGL.c > > > Error: No OUTPUT definition for type 'GLvoid', typekind 'T_VOID' found in > > > OpenGL.xs, line 7547 > > > make[3]: *** [OpenGL.c] Error 1 > > > make[3]: Leaving directory > > > `/root/.cpan/build/PDL-2.4.3/Graphics/TriD/OpenGL' > > > make[2]: *** [subdirs] Error 2 > > > make[2]: Leaving directory `/root/.cpan/build/PDL-2.4.3/Graphics/TriD' > > > make[1]: *** [subdirs] Error 2 > > > make[1]: Leaving directory `/root/.cpan/build/PDL-2.4.3/Graphics' > > > make: *** [subdirs] Error 2 > > > /usr/bin/make -- NOT OK > > > Running make test > > > Can't test without successful make > > > Running make install > > > make had returned bad status, install seems impossible > > > > > > cpan> > > > > > > Any help to find out what the error is? > > > > > > Can I use to install locally by downloading PDL, which version ? > > > > > > With Regards; > > > > > > Teshome > > > > > > > > > > > > ________________________________ > > > Looking for last minute shopping deals? Find them fast with Yahoo! Search. > > > > > > > > -- > > Ted Pedersen > > http://www.d.umn.edu/~tpederse > > > > > > > > ________________________________ > > You rock. That's why Blockbuster's offering you one month of Blockbuster > > Total Access, No Cost. > > > > -- > Ted Pedersen > http://www.d.umn.edu/~tpederse > -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-08-26 17:46:49
|
We are pleased to announce the release of SenseClusters version 0.95.

SenseClusters is a freely available package that allows you to cluster similar contexts, or to cluster words that occur in similar contexts. It is fully unsupervised, and can automatically discover the optimal number of clusters in your text.

As of version 0.95, we now fully support Latent Semantic Analysis for context and word clustering, and we continue to improve the native SenseClusters methods, which include the ability to cluster first and second order representations of context.

SenseClusters can be downloaded from : http://senseclusters.sourceforge.net/

You can also try out SenseClusters via our web interface: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi

In both native and LSA modes, SenseClusters relies on lexical features (such as unigrams, bigrams, and co-occurrences) that can be identified in raw text. The tokenization is very flexible - a user can define this via Perl regular expressions - so it is possible to work with many other languages besides English, and you can easily work with tokenization schemes other than white-space separated words, such as character based tokens, like 2 letter sequences, etc.

The native SenseClusters methods support traditional first order context clustering, where you identify a feature set, and then determine which of those features occur in the contexts you are clustering.

The native methods also support second order context clustering, where each word is represented by a vector of the words with which it co-occurs. All the words in a context to be clustered are replaced by their associated vectors, and these vectors are averaged together to represent that context. Note that you can also cluster the word vectors to identify sets of related words.

Latent Semantic Analysis differs from the native SenseClusters methods in that each feature is represented by a vector that shows the contexts in which that feature occurs. Then, all the features in a context to be clustered are replaced by their associated vectors, and these are averaged together to represent the context. Note that you can also cluster the feature vectors directly to identify sets of related features.

This release represents a major step forward in the functionality of SenseClusters. Much of the work in providing LSA support was carried out by Mahesh Joshi this past spring and summer. As has always been the case over the last two years, Anagha Kulkarni played a large role in this release, and she has included many improvements to automated cluster stopping and other areas in 0.95.

Please give this a try, and let us know if you have any comments or questions! If you aren't certain if your problem can be approached using SenseClusters, please let us know what you would like to do and maybe we can help you get started.

Cordially,
Ted, Anagha, and Mahesh

====================================================================

ChangeLog: http://www.d.umn.edu/~tpederse/Code/Changelog.SenseClusters-v0.95.txt

Installation Instructions: http://www.d.umn.edu/~tpederse/Code/SenseClusters-v0.95-INSTALL.txt

Related Publications (includes links to data you can use): http://www.d.umn.edu/~tpederse/senseclusters-pubs.html

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
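
The second order representation described in the 0.95 announcement can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not SenseClusters code: the helper names (`tokenize`, `cooccurrence_vectors`, `second_order`) are invented here, co-occurrence is counted over whole contexts, and the regular expression stands in for the user-defined Perl token definitions the announcement mentions.

```python
import re
from collections import defaultdict


def tokenize(text, pattern=r"\w+"):
    """Split text with a regex; swapping the pattern changes the token scheme."""
    return re.findall(pattern, text.lower())


def cooccurrence_vectors(corpus, vocab):
    """Map each word to counts of the vocab words sharing a context with it."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = defaultdict(lambda: [0] * len(vocab))
    for text in corpus:
        tokens = [t for t in tokenize(text) if t in index]
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][index[other]] += 1
    return vectors


def second_order(context, vectors, dim):
    """Replace each known word by its vector and average the vectors."""
    rows = [vectors[t] for t in tokenize(context) if t in vectors]
    if not rows:
        return [0.0] * dim
    return [sum(column) / len(rows) for column in zip(*rows)]


# Hypothetical training text and vocabulary.
vocab = ["bank", "loan", "money", "river", "water"]
corpus = ["river bank water", "bank money loan"]
word_vectors = cooccurrence_vectors(corpus, vocab)

print(second_order("money loan", word_vectors, len(vocab)))
print(tokenize("bank", r"\w{2}"))  # 2 letter tokens instead of words
```

In the LSA variant described above, the only change to this picture would be what the vectors contain: each feature's vector would record the contexts it occurs in rather than the words it occurs with, and the averaging step stays the same.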
|
From: ted p. <tpederse@d.umn.edu> - 2006-07-15 05:03:06
|
Greetings all, I am pleased to report that Anagha has finished her MS thesis, which means she is now officially a Master of Science! :) Congratulations on a job very well done! Her thesis is entitled: Unsupervised Context Discrimination and Cluster Stopping and is available from : http://www.d.umn.edu/~tpederse/senseclusters-pubs.html or http://www.d.umn.edu/~tpederse/masters.html This is the most complete (and best) description of the automatic cluster stopping methods that are now available in SenseClusters. It also contains a great deal of other significant content, including a new and impressive set of experiments on newsgroup data, name conflate data, word sense data, and manually annotated web search data! (All of this data is available at http://www.d.umn.edu/~tpederse/Data/anagha-thesis-data.zip btw). So, please do check this out, and also join me in wishing Anagha well as she finishes her work here at UMD, and prepares to move on to CMU!! Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-07-12 15:22:30
|
Greetings all,

I wanted to mention that there will be two SenseClusters related events at AAAI in Boston next week.

First, I will be presenting a tutorial called "Language Independent Methods of Clustering Similar Contexts (with applications)" that will take place on Monday July 17, from 2-6 pm. This is meant to be a general overview of the methodology that underlies SenseClusters. You can see the material from this tutorial (and previous ones) at: http://www.d.umn.edu/~tpederse/SCTutorial.html

Second, Anagha Kulkarni will be presenting a poster entitled "How many different "John Smiths", and who are they?" which is all about name discrimination and how we have tackled that with SenseClusters. The poster will be presented on Wednesday evening, July 19, as a part of the demo/poster session. Here is the paper that accompanies the poster : http://www.d.umn.edu/~tpederse/Pubs/aaai06-anagha-poster.pdf

So, if you are in Boston for AAAI, please do check these out, and stop by and say hi!

Cordially,
Ted and Anagha

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-07-08 05:55:49
|
We are very pleased to announce the release of SenseClusters version 0.93. This version marks our first steps towards supporting Latent Semantic Analysis in addition to our native SenseClusters methods. In this version we now support word clustering (feature clustering really, as it is not limited to just unigrams or single words) that is based on a feature by context representation. In other words, features are clustered based on the contexts in which they occur. These matrices can optionally be reduced with SVD prior to clustering. We refer to this as LSA feature clustering. These feature by context representations are what we believe characterizes LSA, and makes it different from our native SenseClusters methods. We have supported a form of word clustering prior to this release, and it is based on a word by word representation, that is words are clustered based on the words with which they occur. You can download version 0.93 from sourceforge: http://sourceforge.net/projects/senseclusters/ As a preview, in version 0.95 we will have support for doing context discrimination "the LSA way". The features found in contexts to be discriminated will be represented by vectors that show which contexts those features occur in, thus providing a second way of doing order 2 representations. At present our native SenseClusters order 2 methodology is based on replacing the words in the contexts to be clustered with vectors showing the words with which they occur. There are some other significant changes in version 0.93, among them that SenseClusters now requires the use of Perl 5.8.5 or better. The most current version of Perl is 5.8.8 now, and 5.8.5 is several years old, so it is probably time to upgrade anyway if you are running something less than 5.8.5. Also, we have attempted to clarify the installation instructions further. We will continue to work on that in 0.95, hopefully making SenseClusters much easier to install. 
We think the instructions are quite a bit better now, so please check them out: http://www.d.umn.edu/~tpederse/Code/SenseClusters-v0.93-INSTALL.txt The more detailed ChangeLog for 0.93 can be found here: http://www.d.umn.edu/~tpederse/Code/Changelog.SenseClusters-v0.93.txt Please let us know if there are any questions, and please do plan on upgrading to 0.93, or trying it out on the web interface: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi We would be happy to answer any questions or receive any comments you might have. Enjoy, Ted, Mahesh, and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
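
To make the "feature by context" idea in the 0.93 announcement concrete, here is a toy Python sketch. It is an illustration under assumptions rather than anything taken from SenseClusters: the names `feature_by_context` and `cosine` are invented, features are limited to single white-space separated words, cosine similarity is just one possible way to compare rows, and the optional SVD reduction is left out.

```python
import math


def feature_by_context(contexts, features):
    """One row per feature, one column per context; cells hold occurrence counts."""
    return {
        feature: [context.lower().split().count(feature) for context in contexts]
        for feature in features
    }


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical contexts; each becomes one column of the matrix.
contexts = [
    "the bank approved the loan",
    "a loan from the bank",
    "the river flooded the bank",
    "the river water rose",
]
matrix = feature_by_context(contexts, ["loan", "river", "water", "bank"])

for name, row in matrix.items():
    print(name, row)
print(cosine(matrix["loan"], matrix["river"]))   # no shared contexts
print(cosine(matrix["river"], matrix["water"]))  # one shared context
```

Features whose rows point in similar directions occur in similar contexts, which is the property a clustering step would exploit. A word by word representation, by contrast, would have words on both axes.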
|
From: ted p. <tpederse@d.umn.edu> - 2006-06-18 00:29:17
|
We are pleased to announce the release of SenseClusters version 0.91. This release includes a number of significant improvements to our web interface, and hopefully simplifies the setup of the web interface if you would like to run your own version of that.

You can download this new version of SenseClusters at: http://sourceforge.net/projects/senseclusters/

BTW, please note that you do not need to install the Web interface if you don't want to, ours is always available at: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi

The main change to the web interface visible to users is that it now provides plots (as pdf files) that illustrate the cluster stopping decision making process, showing essentially the change in the criterion function values and where our different measures elect to stop clustering.

Also note that we have cleaned up our FAQ a little bit, and would welcome new questions to include in that.

You can find the more detailed ChangeLog below. Please let us know if you have any questions or comments!

Enjoy,
Ted and Anagha

==================================================================

Changes made in Sense-Clusters version 0.89 during version 0.91

Ted Pedersen tpederse@d.umn.edu
Anagha Kulkarni kulka020@d.umn.edu

1. Added config.txt under SC-cgi dir and now the settings for PATH, PERL5LIB, complete path to SC-cgi and SC-htdocs and name of the cgi dir are read by second.cgi, fifth.cgi and callwrap.pl from this single file. - Anagha

2. Modified fourth.cgi to include the missing case for --cluststop "gap" option setting. - Anagha

3. Included plot generation scripts under SC-cgi dir and updated the callwrap.pl accordingly. - Anagha

4. Modified /Web/README.Web.pod to indicate the following pre-requisites for the plot generation: gnuplot, latex and ps2pdf - Anagha

5. Updated /Web/README.Web.pod for the new config.txt related changes. - Anagha

6. Updated Docs/FAQs.pod - Anagha

7. Added FAQs.html to Docs/HTML dir. - Anagha

(Changelog-v0.89to0.91 Last Updated on 06/16/2006 by Anagha)

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-05-28 04:38:53
|
We are pleased to announce the release of SenseClusters version 0.89. This includes a small but important fix to 0.87, which itself included a small but important fix to 0.85. So, you probably want to make sure you are running 0.89 to avoid these small but important problems or discrepancies that we found in the earlier releases!

You can download this version from : http://senseclusters.sourceforge.net/ or http://www.d.umn.edu/~tpederse/senseclusters.html

Here are the Changelogs for both 0.89 and 0.87:

First, in 0.87 :

Changes made in Sense-Clusters version 0.85 during version 0.87

Ted Pedersen tpederse@d.umn.edu
Anagha Kulkarni kulka020@d.umn.edu

1. Fixed a bug in clusterstopping.pl related to the case of empty column, i.e., when a feature(s) does not occur in any of the contexts/instances. -Anagha

2. Updated INSTALL and Makefile.PL to require v0.03 of Algorithm::RandomMatrixGeneration. -Anagha

(Changelog-v0.85to0.87 Last Updated on 05/16/2006 by Anagha)

------------------------------------------------------------------------

And then in 0.89:

Changes made in Sense-Clusters version 0.87 during version 0.89

Ted Pedersen tpederse@d.umn.edu
Anagha Kulkarni kulka020@d.umn.edu

1. Modified the Makefile.PL and INSTALL document to require v0.04 of Algorithm::RandomMatrixGeneration instead of 0.03 -Anagha

2. Changed the default precision from 4 to 6 in discriminate.pl and /Web/SC-cgi/first.cgi -Anagha

(Changelog-v0.87to0.89 Last Updated on 05/27/2006 by Anagha)

------------------------------------------------------------------------

Let us know if you have any questions, comments, or requests!

Enjoy!
Ted and Anagha

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-05-09 01:07:36
|
We are pleased to announce the release of version 0.85 of SenseClusters. This release features our adaptation of the Gap Statistic, a state of the art method for automatically identifying the number of clusters in a given set of data. You can download this version from the links provided at : http://senseclusters.sourceforge.net/ or http://www.d.umn.edu/~tpederse/senseclusters.html You can also find the web interface to version 0.85 available at these links. With the Gap Statistic, there are now 4 different methods of finding the number of clusters automatically in SenseClusters. We will be presenting a demo of all of these at NAACL in New York City on June 6. You can see the paper that describes what we are demoing here: Automatic Cluster Stopping with Criterion Functions and the Gap Statistics (Pedersen and Kulkarni), Appears in the Proceedings of the Demonstration Session of the Human Language Technology Conference and the Sixth Annual Meeting of the North American Chapter of the Association for Computational Linguistics, June 6, 2006, New York City. http://www.d.umn.edu/~tpederse/Pubs/naacl06-demo.pdf So, please check out this new version, and if you are at NAACL please visit our demo! We will also have Knoppix CDs available with SenseClusters already installed, so you can run on your own PC without having to install. Please let us know if you have any questions or comments! Enjoy, Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-03-22 03:29:02
|
I will be attending EACL in Trento, Italy April 3-7, and I will be doing three different presentations that revolve around SenseClusters. Please plan on attending any or all of these. There is one paper, one tutorial, and one demo, so you get a little bit of everything.

First, on April 3 I will present the following paper at the Cross Language Induction Workshop http://www.site.uottawa.ca/~diana/eacl2006-clki-workshop.html :

Improving Name Discrimination : A Language Salad Approach (Pedersen, Kulkarni, Angheluta, Kozareva, and Solorio) - Appears in the Proceedings of the EACL 2006 Workshop on Cross-Language Knowledge Induction, April 3, 2006, Trento, Italy. http://www.d.umn.edu/~tpederse/Pubs/eacl2006-salad.pdf

This is very fun work that I like very much, where we have mixed together English with Bulgarian, Romanian and Spanish in order to improve name discrimination. As crazy as that sounds, it works pretty well. :)

Second, the next day (April 4) I will present a tutorial that focuses on the methods that are implemented in SenseClusters. This tutorial will also feature the unveiling and debut of our new SenseClusters Live! CD. This is a Knoppix based Linux distribution that includes SenseClusters and lots of data, and you can run it from the CD without having to install Linux or SenseClusters on your hard drive. I will have extra CDs available so even if you don't come to the tutorial you can get one, and we will also have an iso version of this posted so if you aren't at EACL you can download it and burn it onto a CD, just like you do for Linux.

Here's a short description of the tutorial, and it will be on the afternoon of April 4. http://eacl06.itc.it/tutorials/tutorial.htm#TU03

Third, the *next* day (April 5) I will present a demo of the new cluster stopping techniques found in SenseClusters. Those are described in the following paper:

Selecting the "Right" Number of Senses Based on Clustering Criterion Functions (Pedersen and Kulkarni), Appears in the Proceedings of the Posters and Demo Program of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, April 5-7, 2006, Trento, Italy. http://www.d.umn.edu/~tpederse/Pubs/eacl2006-demo.pdf

If you haven't already gotten your SenseClusters Live! CD by this time, please stop by and see the demo and get a CD. I will be in Demo Session 2 on April 5, and it looks like there are quite a few demos of interest at all the sessions so please plan on visiting several of them. http://eacl06.itc.it/posters-demos/posters.htm

So, if you are at EACL please do come to some or all of these events. They are all really different so you won't get bored (I promise :)! Your questions or comments on any of the above are of course most welcome.

See you in Trento!
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-02-09 00:00:27
|
We are pleased to announce the release of version 0.83 of SenseClusters. We have made a larger version increment than usual (from 0.73 to 0.83) to make the point that there is significant new functionality in the package as of 0.83. You can download this new version from: http://www.d.umn.edu/~tpederse/senseclusters.html or http://senseclusters.sourceforge.net In particular, we have incorporated support for automatically identifying the number of clusters in a given data set. There are three methods provided, and they are described more completely in the following paper that will appear at EACL (in conjunction with a demo) in April: Selecting the "Right" Number of Senses Based on Clustering Criterion Functions (Pedersen and Kulkarni), To appear in the Proceedings of the Posters and Demo Program of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics, April 3-7, 2006, Trento, Italy. http://www.d.umn.edu/~tpederse/Pubs/eacl2006-demo.pdf You can also try out this new functionality on our web interface, available at: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi Please do give this a try. This is a very significant enhancement to the package. Your comments are particularly welcome as we seek to improve and expand our ability to automatically identify the number of clusters in a given set of data. Enjoy, Ted and Anagha ======================================================================== Below is a copy of the ChangeLog for Version 0.83. 1. Added Toolkit/clusterstop/clusterstopping.pl -Anagha 2. Integrated clusterstopping.pl with discriminate.pl -Anagha 3. Added test-cases for clusterstopping.pl -Anagha 4. Modified web-interface to support clusterstopping -Anagha 5. Modified/added documentation for cluster stopping: README.SC.pod, README.Toolkit.pod, discriminate.html, clusterstopping.html -Anagha 6. Removed /svd/pdlsvd.pl and related threads -Anagha 7. 
Fixed a bug about pattern matching in format_clusters.pl -Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-02-07 22:42:38
|
The SenseClusters web interface is back to its usual home: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi Please note that the version installed here is now 0.83, which is our new version that will be released in the coming days. It includes new support for automatically determining the number of clusters, which we think is rather nice. The previous site I mentioned (on a machine called talisker) may remain active, but please be advised that this is normally a development machine and may not have the correct or current version available. So please use the web interface at: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi Enjoy! Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2006-01-25 13:45:13
|
We are pleased to announce the release of SenseClusters version 0.73. You can download this from: http://www.d.umn.edu/~tpederse/senseclusters.html or http://senseclusters.sourceforge.net This release includes a number of small but important bug fixes that are enumerated below (in the Changelog). Please let us know if you have any questions, suggestions, or concerns! ------------- Changes made in Sense-Clusters version 0.71 during version 0.73 Ted Pedersen tpederse@d.umn.edu Anagha Kulkarni kulka020@d.umn.edu 1. Added check for empty token file. -Anagha 2. Added check to identify unknown/misspelled options and to abort after giving relevant message. -Anagha 3. Added rowmodel and colmodel options supported by Cluto. -Anagha 4. Modified clusterlabeling.pl to create the temp files in current-working-directory instead of the installation dir. -Anagha 5. Fixed the problem of uploading external stoplist file for the web-interface. -Anagha (Changelog-v0.71to0.73 Last Updated on 01/23/2006 by Anagha) ------------- Enjoy, Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-09-21 04:38:42
|
We are pleased to announce the release of version 0.71 of SenseClusters! You may download this new version from : http://senseclusters.sourceforge.net ... or ... http://www.d.umn.edu/~tpederse/senseclusters.html This version contains a number of significant fixes to existing problems or glitches, and also cleans up and reorganizes some of the code. In particular, we have brought in two programs from the SenseTools package (preprocess.pl and nsp2regex.pl) thus eliminating the need to install SenseTools (which contains a number of other programs that SenseClusters never uses). In addition, we have tried to make some of the more detailed program documentation more clearly visible via the following: http://senseclusters.sourceforge.net/SenseClusters-Code-README.html The web interface has been updated to the new version, so you can check out v0.71 without installing if you wish: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi The detailed list of changes for this version can be found at: http://www.d.umn.edu/~tpederse/Code/Changelog.SenseClusters-v0.71.txt Please let us know if you have any questions or concerns about the new version! Enjoy, Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-06-13 19:23:56
|
We are pleased to announce the release of version 0.69 of SenseClusters. This is a small set of changes to 0.67, so if you have already upgraded to 0.67 then you might wish to consider this as an optional upgrade. However, if you are at 0.65 or earlier, you should seriously consider the upgrade to 0.69. The differences between 0.65 and previous versions and the now current 0.69 are considerable. You can find the new version at : http://senseclusters.sourceforge.net In 0.69, you will find a rewritten version of the README that describes SenseClusters in more general terms, taking it from being a word sense discrimination package to a more generic clustering contexts package. Making this functionality more apparent was one of the objectives of the 0.67 release, and the README has now been updated to reflect that. In addition, the Makefile.PL has undergone some small changes, as have the installation instructions (INSTALL). Enjoy, Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-06-07 03:05:25
|
We are very pleased to announce the release of version 0.67 of SenseClusters ( http://sourceforge.net/projects/senseclusters/ ) This version includes several significant changes, among them the inclusion of support for word clustering and headless clustering in the web interface (in addition to our support for target word discrimination, which has always existed in SenseClusters). You can use the web interface at: http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi In addition, we have clarified the different features that we support and made them more consistent in order 1 and order 2 representations. We have renamed the order 1 co-occurrence feature that existed in version 0.65 and before as a target co-occurrence, since these features must always include a target word. We have also added the more traditional form of co-occurrence feature to order 1, and then target co-occurrence features to order 2. Thus, as of version 0.67, we support 3 kinds of features in both order 1 and order 2 representations: 1) bigram (default in both order 1 and order 2) 2) co-occurrences 3) target co-occurrences In addition, we continue to support unigram features in order 1. You can find the new version 0.67 at : http://sourceforge.net/projects/senseclusters/ Please let us know if you have any questions or concerns about this release! Cordially, Ted and Anagha -- Ted Pedersen http://www.d.umn.edu/~tpederse |
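The three feature kinds named in the 0.67 announcement above can be illustrated with a small Python sketch. This is not SenseClusters code (the package itself is written in Perl), and the function names, the `window` parameter, and the reading of a bigram as an ordered pair and a co-occurrence as an unordered pair are assumptions made for illustration. The only detail taken directly from the announcement is that a target co-occurrence must always include the target word.

```python
def bigrams(tokens, window=2):
    """Ordered pairs (w1, w2) where w2 follows w1 within `window` tokens.

    window=2 means strictly adjacent words (hypothetical parameter).
    """
    return {
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, min(i + window, len(tokens)))
    }


def cooccurrences(tokens, window=2):
    """Unordered pairs of distinct words that occur within the window."""
    return {frozenset(pair) for pair in bigrams(tokens, window) if pair[0] != pair[1]}


def target_cooccurrences(tokens, target, window=2):
    """Co-occurrences restricted to pairs that include the target word."""
    return {pair for pair in cooccurrences(tokens, window) if target in pair}


if __name__ == "__main__":
    context = "the river bank was muddy".split()
    print(sorted(bigrams(context)))
    print(sorted(sorted(p) for p in cooccurrences(context)))
    print(sorted(sorted(p) for p in target_cooccurrences(context, "bank", window=3)))
```

In this sketch a headless context (one with no marked target word) can still yield bigrams and co-occurrences, while target co-occurrences only make sense when a target word is given.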
|
From: ted p. <tpederse@d.umn.edu> - 2005-03-09 17:54:43
|
Dear SenseClusterians, We are happy to announce the release of version 0.65 of SenseClusters. You can find the new release at: http://www.d.umn.edu/~tpederse/senseclusters.html or http://sourceforge.net/projects/senseclusters/ There are a number of significant changes included in this release. A detailed ChangeLog appears below, but a few highlights include... 1) It is now possible to perform SVD on order 1 feature vectors. This was not possible prior to v0.65. 2) Cluster labeling has been further improved to provide both discriminating and descriptive features. (Development of cluster labeling is ongoing, so expect further enhancements). 3) The web interface continues to evolve and support more features, and hopefully provide a cleaner and easier to use format. You can see that here, and it is included in the package so you could install it locally too. http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi Please check out the new version of SenseClusters, and let us know if you have any comments or questions! Cordially, Ted and Anagha ===================================================================== Detailed ChangeLog ===================================================================== Changes made in Sense-Clusters version 0.63 during version 0.65 1. Added the logic for discriminating/unique cluster labels to -Anagha Toolkit/clusterlabel/clusterlabeling.pl 2. Added --format option functionality for sparse representation of matrix in Toolkit/vector/wordvec.pl -Anagha 3. Added the logic for filtering out 0's from the input matrix as well as the ones introduced because of formatting to wordvec.pl -Anagha 4. Re-organized controls and made some visual changes to Web/SC-cgi/index.cgi Web/SC-cgi/second.cgi Web/SC-cgi/third.cgi Web/SC-cgi/fourth.cgi -Anagha 5. Added the following validations to discriminate.pl 1. test scope option valid only for input data with-head tags. 2. train scope option valid only for train data with-head tags. -Anagha 6. 
Modified the logic for checking if discriminate.pl executed without errors in Web/SC-cgi/callwrap.pl -Anagha 7. Modified Toolkit/clusterlabel/clusterlabeling.pl to give both the kind of labels - Descriptive and Discriminating And so removed the --unique option -Anagha 8. Modified Web/SC-cgi/callwrap.pl and /Web/SC-cgi/fourth.cgi for the changes mentioned in point #7 above -Anagha 9. Changed the rank options default value to 10 from none -Anagha 10. Augmented the logic for removing the Null columns to /Toolkit/vector/order1vec.pl, thus now svd can be used with order 1 context representation. -Anagha 11. Turned on the Taint mode for the SenseClusters web-interface and implemented the required changes in the *.cgi files. -Anagha 12. Added a few clustering related option validations to SC web-interface -Anagha 13. Opened up the option of performing svd with 1st order context representation for the web-interface -Anagha 14. Added a test-case for the modified order1vec.pl -Anagha 15. Added the svd branch for order1 representation in the flow diagram: /Docs/Flows/flowchart.fig and /Docs/Flows/flowchart.pdf -Anagha 16. Added the lower bound of 10 for svd's dimensionality reduction. -Anagha 17. Modified the logic of checking if discriminate.pl executed without errors in Web/SC-cgi/callwrap.pl -Anagha (Changelog-v0.63to0.65 Last Updated on 03/08/2005 by Anagha) -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-02-12 05:51:19
|
We are pleased to announce the release of SenseClusters version 0.63! You can find links to the new version and the web interface (updated to run 0.63) at: http://senseclusters.sourceforge.net The most significant change in this version is the addition of cluster labeling - SenseClusters will now generate a label for each of your discovered clusters automatically. This is just the first iteration of our efforts in cluster labeling, so your feedback will be most appreciated. Enjoy, Ted, Anagha, and Amruta --- The complete changelog for this version follows: Changes made in Sense-Clusters version 0.61 during version 0.63 Anagha Kulkarni kulka020@d.umn.edu Ted Pedersen tpederse@d.umn.edu Amruta Purandare am...@cs... 1. Added the cluster labeling program (clusterlabeling.pl) to Toolkit/clusterlabel/ -Anagha 2. Modified discriminate.pl to now also include the cluster labeling program (clusterlabeling.pl). -Anagha 3. Modified the Docs/Flows/flowchart.fig and Docs/Flows/flowchart.pdf to reflect addition of clusterlabeling.pl -Anagha 4. Modified the Makefile.PL to reflect addition of clusterlabeling.pl -Anagha 5. Added test cases for clusterlabeling.pl under Testing/clusterlabel/clusterlabeling -Anagha 7. Added option in discriminate.pl to create dendrogram trees of clusters. -Amruta 8. Added Test and Train scope options to the web-interface -Anagha 9. Added Format/Precision option to the web-interface -Anagha 10. Modified Web/SC-cgi/callwrap.pl to include clusterlabeling.pl and for handling the problem of browser timeout for large i/p data. -Anagha 11. Added default stopfile to /Docs dir, and updated README.Docs.pod to document this change. -Amruta 12. Fixed mis-formatting of Senseval-2 example in FAQs.pod. -Amruta (Changelog-v0.61to0.63 Last Updated on 02/11/2005 by Amruta) -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-01-20 17:15:40
|
We are pleased to announce the release of SenseClusters v0.61. This can largely be considered a documentation and clean up release. No functionality has been added or changed, but some clarifications have been added to various parts of the documentation and source code comments. You can download 0.61 from... http://senseclusters.sourceforge.net http://www.d.umn.edu/~tpederse/senseclusters.html Changes made in SenseClusters version 0.59 during version 0.61 1. removed <!DOCTYPE corpus SYSTEM "lexical-sample.dtd"> from -Anagha Demos/eng-lex-samp.evaluation.xml.gz and Demos/eng-lex-sample.training.xml.gz 2. Scripts/traverse.sh was moved to Docs/HTML/ during v0.57 -Anagha but was not removed from here. So removed it. 3. Added an entry for README.user_data.pod under man pages section -Anagha Makefile.PL 4. Modified the files containing Amruta's previous email-id and -Anagha University. 5. Modified the Docs/Flows/flowchart.fig and Docs/Flows/flowchart.pdf -Anagha to reflect the addition of format_clusters.pl 6. Updated the version information in Toolkit/evaluate/report.pl -Anagha (Changelog-v0.59to0.61 Last Updated on 01/19/2005 by Anagha) -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2005-01-20 17:05:47
|
The SenseClusters web interface has a new URL, which reflects the new organization as of v0.59. http://marimba.d.umn.edu/cgi-bin/SC-cgi/index.cgi -- Ted Pedersen http://www.d.umn.edu/~tpederse |
|
From: ted p. <tpederse@d.umn.edu> - 2004-12-18 20:35:30
|
We have been working on a web interface to SenseClusters, and have reached the point where we have a beta version that is online: http://marimba.d.umn.edu/cgi-bin/SC-WEB/data_form.cgi It is up and running and we think sufficiently interesting to mention. If you have the opportunity to try this out, your feedback would be most appreciated! We have put together a few sample data files that you can use to get started. You can find that data at : http://www.d.umn.edu/~tpederse/Data/SC-WEB-DATA.tar.gz You could also use the data that is distributed in SenseClusters in the Demos files, or any data you have used in the past. The format of the data for the web interface is exactly the same as for the command line interface. In fact, the web interface is running the command line interface, so the parameters and functionality are the same, although the web interface is a bit more convenient. Find links to both the interface and the sample data at: http://www.d.umn.edu/~tpederse/senseclusters.html The web interface will be evolving over the next few weeks, so please stay tuned, and please let us know if you have any ideas or opinions on what we should include, exclude, and how we should organize things. Enjoy! Ted, Anagha, and Amruta -- Ted Pedersen http://www.d.umn.edu/~tpederse |