Bump stanza from 1.6.1 to 1.7.0 #254

dependabot · 2023-12-04T05:01:44Z

Bumps stanza from 1.6.1 to 1.7.0.

Release notes

v1.7.0: Neural coref!

Neural coref processor added!

Conjunction-Aware Word-Level Coreference Resolution: https://arxiv.org/abs/2310.06165 (original implementation: https://github.com/KarelDO/wl-coref/tree/master)

Updated form of Word-Level Coreference Resolution: https://aclanthology.org/2021.emnlp-main.605/ (original implementation: https://github.com/vdobrovolskii/wl-coref)

If you use Stanza's coref module in your work, please be sure to cite both of the above papers.

Special thanks to vdobrovolskii, who graciously agreed to allow for integration of his work into Stanza, to @KarelDO for his support of his training enhancement, and to @Jemoka for the LoRA PEFT integration, which makes the finetuning of the transformer based coref annotator much less expensive.

Currently there is one model provided, a transformer based English model trained from OntoNotes. The provided model is currently based on Electra-Large, as that is more harmonious with the rest of our transformer architecture. When we have LoRA integration with POS, depparse, and the other processors, we will revisit the question of which transformer is most appropriate for English.

Future work includes ZH and AR models from OntoNotes, additional language support from UD-Coref, and lower-cost non-transformer models.

stanfordnlp/stanza#1309
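
As a rough sketch of how the coref output might be consumed (this is not part of the release notes): the helper below assumes each chain exposes `representative_text` and `mentions`, and each mention exposes `sentence`, `start_word` and `end_word`. Those attribute names, the `doc.coref` attribute, and the `"tokenize,coref"` processors string are assumptions to verify against the Stanza documentation; `summarize_chains` and `run_coref_demo` are hypothetical names used only for illustration.

```python
def summarize_chains(chains):
    """Return one (representative_text, mention_positions) pair per chain.

    Assumed attribute names: chain.representative_text, chain.mentions,
    mention.sentence, mention.start_word, mention.end_word.
    """
    summary = []
    for chain in chains:
        # Report the raw indices as stored; no assumption is made here about
        # whether end_word is inclusive or exclusive.
        positions = [(m.sentence, m.start_word, m.end_word) for m in chain.mentions]
        summary.append((chain.representative_text, positions))
    return summary


def run_coref_demo(text="Maria bought a book. She read it on the train."):
    # Imported lazily so summarize_chains can be used without stanza installed.
    import stanza

    # Assumption: coref is requested explicitly because it is an optional
    # processor, and the English coref model has already been downloaded.
    nlp = stanza.Pipeline("en", processors="tokenize,coref")
    doc = nlp(text)
    return summarize_chains(doc.coref)
```

Calling `run_coref_demo()` requires the transformer-based English model described above, so it is kept separate from the pure helper.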

Interface change: English MWT

English now has an MWT model by default. Text such as won't is now marked as a single token, split into two words, will and not. Previously it was expected to be tokenized into two pieces, but the Sentence object containing that text would not have a single Token object connecting the two pieces. See https://stanfordnlp.github.io/stanza/mwt.html and https://stanfordnlp.github.io/stanza/data_objects.html#token for more information.

Code that iterates with "for word in sentence.words" will continue to work as before, but "for token in sentence.tokens" will now produce one object for each MWT such as won't, cannot, Stanza's, etc.

Pipeline creation will not change, as MWT is automatically (but not silently) added at Pipeline creation time if the language and package include MWT.

stanfordnlp/stanza#1314
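
To make the token/word distinction concrete, here is a minimal sketch, assuming only the `sentence.tokens`, `token.text`, `token.words` and `word.text` attributes described in the Stanza data objects documentation. The function names `token_word_pairs` and `run_mwt_demo` are hypothetical, and the demo assumes the English models are already downloaded.

```python
def token_word_pairs(sentence):
    """Pair each token's surface text with the texts of the words it expands to.

    For an ordinary token the list holds one word; for an MWT it holds several.
    """
    return [(token.text, [word.text for word in token.words])
            for token in sentence.tokens]


def run_mwt_demo(text="I won't go."):
    # Imported lazily so token_word_pairs can be used without stanza installed.
    import stanza

    # MWT is listed explicitly here for clarity; per the release notes it
    # would also be added automatically when the package includes it.
    nlp = stanza.Pipeline("en", processors="tokenize,mwt")
    doc = nlp(text)
    return [token_word_pairs(sentence) for sentence in doc.sentences]
```

Code that only needs syntactic words can keep iterating over `sentence.words`; a helper like this is useful mainly where token-level offsets or surface forms matter.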

Other updates

NetworkX representation of enhanced dependencies. This allows for easier usage of Semgrex on enhanced dependencies; searching over enhanced dependencies requires CoreNLP >= 4.5.6. stanfordnlp/stanza#1295 stanfordnlp/stanza#1298

Sentence ending punct tags improved for English to avoid labeling non-punct as punct (and POS is switched to using a DataLoader) stanfordnlp/stanza#1000 stanfordnlp/stanza#1303

Optional rewriting of MWT after the MWT processing step, which gives the user more control over fixing common errors. We still encourage posting issues on GitHub so we can fix them for everyone! stanfordnlp/stanza#1302

Remove deprecated output methods such as conll_as_string and doc2conll_text. Use "{:C}".format(doc) instead. stanfordnlp/stanza@e01650f
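
For callers migrating off the removed helpers, a thin wrapper around the format spec may be enough. This is a sketch only; the name `to_conllu` is hypothetical and not part of Stanza.

```python
def to_conllu(doc):
    """Render a document as CoNLL-U text using the "C" format spec.

    Works for any object whose __format__ understands "C", which the
    release notes state is the case for Stanza documents.
    """
    return "{:C}".format(doc)
```

In an f-string the same call can be written as `f"{doc:C}"`, which is standard Python formatting behavior.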

Mixed OntoNotes and WW NER model for English is now the default. Future versions may include CoNLL 2003 and CoNLL++ data as well.

Sentences now have a doc_id field if the document they are created from has a doc_id. stanfordnlp/stanza#1314

Optional processors added in cases where the user may not want the model we have run by default. For example, conparse for Turkish (limited training data) or coref for English (the only available model is the transformer model) stanfordnlp/stanza#1314

Updated requirements

Support dropped for python 3.6 and 3.7. The peft module used for finetuning the transformer used in the coref processor does not support those versions.

Added peft as an optional dependency to transformer-based installations.

Added networkx as a dependency for reading enhanced dependencies. Added toml as a dependency for reading the coref config.

Commits

5948c9f Oops, the default GRC model was miswritten
7b5ebae Add a depparse column for the scores of running the GRC parser with a transfo...
a4e2af5 Initial column of POS scores for different GRC models. Need to do depparse a...
0f0139c Oops, needed to drop the usage of pos_batch_size from prepare_depparse_treeba...
a3b5a75 Switch GRC to Perseus - the Proiel dataset doesn't have any punctuation
d208732 Switch to default coref model of electra-large
a31162c Add documentation on the papers and original github repos
8d0e716 Don't create a model from scratch (which involves reading the training data) ...
ca15f56 This is a very useful log line - print it out at INFO
178ad5c Add the coref chains to the dict output. This means we turn the attachment i...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore <dependency name> major version will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
@dependabot ignore <dependency name> minor version will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
@dependabot ignore <dependency name> will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
@dependabot unignore <dependency name> will remove all of the ignore conditions of the specified dependency
@dependabot unignore <dependency name> <ignore condition> will remove the ignore condition of the specified dependency and ignore conditions

Bumps [stanza](https://github.com/stanfordnlp/stanza) from 1.6.1 to 1.7.0.
- [Release notes](https://github.com/stanfordnlp/stanza/releases)
- [Commits](stanfordnlp/stanza@v1.6.1...v1.7.0)

---
updated-dependencies:
- dependency-name: stanza
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

dependabot bot added the "dependencies" (Pull requests that update a dependency file) and "python" (Pull requests that update Python code) labels Dec 4, 2023

benknoll-umn merged commit 7aa972b into main Dec 4, 2023
5 checks passed

benknoll-umn deleted the dependabot/pip/stanza-1.7.0 branch December 4, 2023 18:10
