| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2024-12-24 | 3.0 kB | |
| v1.10.1 - rebuild with UD 2.15 source code.tar.gz | 2024-12-24 | 1.3 MB | |
| v1.10.1 - rebuild with UD 2.15 source code.zip | 2024-12-24 | 1.6 MB | |
| Totals: 3 Items | | 2.8 MB | 3 |
In this release, we rebuild all of the models with UD 2.15, allowing for new languages such as Georgian, Komi Zyrian, Low Saxon, and Ottoman Turkish. We also add an Albanian model composed of the two available UD treebanks and an Old English model based on a prototype dataset not yet published in UD.
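Once the rebuilt models are downloaded, a newly added language works like any other. A minimal sketch, assuming `ka` is the language code Stanza uses for Georgian (substitute the code for Komi Zyrian, Low Saxon, Ottoman Turkish, Albanian, or Old English as appropriate):

```python
import stanza

# Download the Georgian models built from UD 2.15
# ("ka" is assumed here to be the Georgian language code)
stanza.download("ka")

# Build a pipeline and annotate a short document
nlp = stanza.Pipeline("ka")
doc = nlp("გამარჯობა")  # "hello" in Georgian
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.lemma, word.upos)
```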
Other notable changes:
- Include a contextual lemmatizer in English for `'s` -> `be` or `have` in the `default_accurate` package (see the first sketch after this list). A HI (Hindi) model is also built, with others potentially to follow. Now with fewer bugs at startup. https://github.com/stanfordnlp/stanza/pull/1422
- Upgrade the FR NER model to a gold edited version of WikiNER: https://huggingface.co/datasets/danrun/WikiNER-fr-gold https://github.com/stanfordnlp/stanza/commit/ad1f938276ef81ac9a602d7f1f21f50fd67e5d24
- PyTorch compatibility: set `weights_only=True` when loading models (see the second sketch after this list). https://github.com/stanfordnlp/stanza/pull/1430 https://github.com/stanfordnlp/stanza/issues/1429
- augment MWT tokenization to accommodate unexpected `'` characters, including `"` used in `"s`: https://github.com/stanfordnlp/stanza/pull/1437 https://github.com/stanfordnlp/stanza/issues/1436
- when training the lemmatizer, take advantage of `CorrectForm` annotations in the UD treebanks: https://github.com/stanfordnlp/stanza/commit/dbdf429aff4175fec33856501e6899e96b390e86
- add hand-lemmatized French verbs and English words to the "combined" lemmatizers, thanks to Prof. Lapalme: https://github.com/stanfordnlp/stanza/commit/99f7038634101ea7b92140696c8383a333af1cbc
- add VLSP 2023 constituency dataset: https://github.com/stanfordnlp/stanza/commit/1159d0db8ea1d20c6cf9fb37f8fa8676e0f60f49
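A minimal sketch of picking up the new contextual English lemmatizer via the `default_accurate` package; the example sentences are illustrative, chosen so that `'s` resolves to `have` in one and `be` in the other:

```python
import stanza

# The contextual lemmatizer ships in the default_accurate package
stanza.download("en", package="default_accurate")
nlp = stanza.Pipeline("en", package="default_accurate")

# "'s" should be lemmatized as "have" or "be" depending on context
doc = nlp("He's done it before.  He's a doctor.")
for sentence in doc.sentences:
    print([(word.text, word.lemma) for word in sentence.words])
```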
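The `weights_only` change follows PyTorch's own hardening of `torch.load`. A generic sketch of the pattern, not Stanza's internal loading code (the `model.pt` path is a placeholder for illustration):

```python
import torch

# Save a small state dict so the example is self-contained
torch.save({"weight": torch.zeros(2, 2)}, "model.pt")

# weights_only=True restricts unpickling to tensors and basic
# containers, avoiding execution of arbitrary pickled code from
# untrusted checkpoint files
checkpoint = torch.load("model.pt", weights_only=True, map_location="cpu")
print(checkpoint["weight"].shape)
```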
Bugfixes:
- `raise_for_status` earlier when failing to download something, so that the proper error gets displayed (see the sketch after this list). Thank you @pattersam https://github.com/stanfordnlp/stanza/pull/1432
- Fix the usage of transformers where an unexpected character at the end of a sentence was not properly handled: https://github.com/stanfordnlp/stanza/commit/53081c28ba3128fc89ad36919762a54f6cb88f77
- reset the start/end character annotations on tokens which are predicted to be MWT by the tokenizer, but not processed as such by the MWT processor: https://github.com/stanfordnlp/stanza/commit/1a36efb53135e53dd40ad550bc3a659c81b15980 https://github.com/stanfordnlp/stanza/issues/1436
- similar to the start/end char issue, fix a situation where a token's text could disappear if the MWT processor didn't split a word: https://github.com/stanfordnlp/stanza/commit/215c69e53bf9f11e174b82bb064767749f7dd403
- missing text for a Document does not cause the NER model to crash: https://github.com/stanfordnlp/stanza/commit/07326289ce0efef1ba17a0632c011652f884363c https://github.com/stanfordnlp/stanza/issues/1428
- tokenize URLs with unexpected TLDs into single tokens rather than splitting them up: https://github.com/stanfordnlp/stanza/commit/f59ccd86b9d146737dd5c0325ac31e4da814ddfa https://github.com/stanfordnlp/stanza/issues/1423
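For context on the download fix: `requests.Response.raise_for_status` raises an `HTTPError` for 4xx/5xx responses, so calling it before consuming the body surfaces the real HTTP error instead of a confusing failure later when parsing a truncated or HTML error page. A generic sketch of the pattern, not Stanza's actual download logic (the function name and arguments are hypothetical):

```python
import requests

def download_file(url: str, dest: str) -> None:
    resp = requests.get(url, stream=True)
    # Raise immediately on a 4xx/5xx response so the genuine HTTP
    # error is reported, rather than a later failure on a bad body
    resp.raise_for_status()
    with open(dest, "wb") as fout:
        for chunk in resp.iter_content(chunk_size=1 << 16):
            fout.write(chunk)
```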