
| Name | Modified | Size |
|------|----------|------|
| pytext-nlp-0.3.0.tar.gz | 2019-12-05 | 295.6 kB |
| pytext_nlp-0.3.0-py3-none-any.whl | 2019-12-05 | 417.7 kB |
| PyText v0.3.0.tar.gz | 2019-11-27 | 1.7 MB |
| PyText v0.3.0.zip | 2019-11-27 | 1.9 MB |
| README.md | 2019-11-27 | 6.3 kB |

Totals: 5 items, 4.4 MB

New Features

RoBERTa and XLM-R

  • Integrate XLM-R into PyText (#1120)
  • Consolidate BERT, XLM and RoBERTa Tensorizers (#1119)
  • Add XLM-R for joint model (#1135)
  • Open source RoBERTa (#1032)
  • Simple Transformer module components for RoBERTa (#1043)
  • RoBERTa models for document classification (#933)
  • Enable MLM training for RobertaEncoder (#1126)
  • Standardize RoBERTa Tensorizer Vocab Creation (#1113)
  • Make RoBERTa usable in more tasks, including QA (#1017)
  • RoBERTa-QA JIT (#1088)
  • Unify GPT2BPE Tokenizer (#1110)
  • Add Google SentencePiece as a Tokenizer (#1106)
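The MLM training support listed above follows the standard masked-language-model recipe. A minimal sketch of the masking step in plain PyTorch (the token ids and `MASK_IDX` value are made up for illustration; this is not PyText's actual implementation):

```python
import torch

torch.manual_seed(0)

MASK_IDX = 4                              # hypothetical id of the <mask> token
tokens = torch.tensor([5, 6, 7, 8, 9, 10, 11, 12])

# BERT/RoBERTa-style MLM: randomly select ~15% of positions to mask
mask = torch.rand(tokens.shape) < 0.15

inputs = tokens.masked_fill(mask, MASK_IDX)
# the loss is computed only at masked positions; -100 is the
# default ignore_index of nn.CrossEntropyLoss
targets = tokens.masked_fill(~mask, -100)
```

The encoder then predicts the original token at each masked position, with unmasked positions excluded from the loss via `ignore_index`.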

TorchScript support

  • General torchscript module (#1134)
  • Support torchscriptify XLM-R (#1138)
  • Add support for torchscriptification of XLM intent slot models (#1167)
  • Script XLM tensorizer (#1118)
  • Refactor ScriptTensorizer with general tensorize API (#1117)
  • ScriptXLMTensorizer (#1123)
  • Add support for TorchScript export of IntentSlotOutputLayer and CRF (#1146)
  • Refactor ScriptTensorizer to support both text and tokens input (#1096)
  • Add torchscriptify API in tokenizer and tensorizer (#1055)
  • Add more stats in torchscript latency script (#1044)
  • Exported RoBERTa torchscript model includes both traced_model and pre-processing logic (#1013)
  • Native TorchScript Wordpiece Tokenizer Op for BERT SQuAD QA; torchscriptify BertSQUADQAModel (#879)
  • TorchScript-ify BERT training (#887)
  • Modify return signature of TorchScript BERT (#1058)
  • Implement BertTensorizer and RoBERTaTensorizer in TorchScript (#1053)
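The general idea behind "torchscriptifying" a model is to compile the Python `nn.Module` into a serializable TorchScript program so inference no longer needs the Python runtime. A minimal sketch with `torch.jit.script` (the `TinyDocModel` class is a hypothetical stand-in, not a PyText model):

```python
import torch
import torch.nn as nn

class TinyDocModel(nn.Module):
    """Hypothetical stand-in for a PyText document model."""
    def __init__(self, vocab_size: int = 100, dim: int = 8, n_classes: int = 3):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, dim)
        self.decoder = nn.Linear(dim, n_classes)

    def forward(self, tokens: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.embedding(tokens, offsets))

model = TinyDocModel().eval()
scripted = torch.jit.script(model)   # compile module + control flow to TorchScript

tokens = torch.tensor([1, 2, 3, 4])
offsets = torch.tensor([0, 2])       # two documents: tokens [1, 2] and [3, 4]
assert torch.allclose(model(tokens, offsets), scripted(tokens, offsets))
```

The scripted module can then be saved with `scripted.save(...)` and loaded in a Python-free C++ service, which is what the export work above enables for the real PyText models.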

Others

  • FairseqModelEnsemble class (#1116)
  • Inverse Sqrt Scheduler (#1150)
  • Lazy modules (#1039)
  • Adopt Fairseq MemoryEfficientFP16Optimizer in PyText (#910)
  • Add RAdam (#952)
  • Add AdamW (#945)
  • Unify FP16 and FP32 APIs (#1006)
  • Add precision-at-recall metric (#1079)
  • Add PandasDataSource (#1098)
  • Support testing Caffe2 model (#1097)
  • Add contextual feature support to export for Seq2Seq models
  • Convert matmuls to quantizable nn.Linear modules (#1304)
  • PyTorch eager mode implementation (#1072)
  • Implement Blockwise Sparsification (#1050)
  • Support Fairseq FP16Optimizer (#1008)
  • Make FP16OptimizerApex a wrapper on Apex/amp (#1007)
  • Remove vocab from CUDA (#955)
  • Add dense input to XLMModel (#997)
  • Replace tensorboardX with torch.utils.tensorboard (#1003)
  • Mention mixed precision training support (#643)
  • Sparsification for CRF transition matrix (#982)
  • Add dense feature normalization to Char-LSTM TorchScript model (#986)
  • Cosine similarity support for BERT pairwise model training (#967)
  • Combine training data from multiple sources (#953)
  • Support visualization of word embeddings in TensorBoard (#969)
  • Decouple decoder and output layer creation in BasePairwiseModel (#973)
  • Drop rows with insufficient columns in TSV data source (#954)
  • Add use_config_from_snapshot option (load config from snapshot or current task) (#970)
  • Add predict function for NewTask (#936)
  • Use create_module to create CharacterEmbedding (#920)
  • Add XLM-based joint model
  • Add ConsistentXLMModel (#913)
  • Optimize Gelu module for Caffe2 export (#918)
  • Save best model's sub-modules when enabled (#912)
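The AdamW optimizer added above is also available upstream as `torch.optim.AdamW` (decoupled weight decay). A toy sketch of a training loop using it (the model and data are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy linear-regression problem, purely illustrative
model = nn.Linear(4, 2)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=0.01)

x = torch.randn(64, 4)
target = x @ torch.randn(4, 2)

initial = nn.functional.mse_loss(model(x), target).item()
for _ in range(100):
    loss = nn.functional.mse_loss(model(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

assert loss.item() < initial   # loss decreases on this toy problem
```

Unlike classic Adam with L2 regularization, AdamW applies weight decay directly to the parameters rather than through the gradient, which is why it is exposed as a separate optimizer.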

Documentation / Usability

  • XLM-R tutorial in notebook (#1159)
  • Update XLM-R OSS tutorial and add Google Colab link (#1168)
  • Update "raw_text" to "text" in tutorial (#1010)
  • Simplify the tutorial (add git clone) (#1037)
  • Changes to make tutorial code simpler (#1002)
  • Fix datasource tutorial example (#998)
  • Handle long documents in squad qa datasource and models (#975)
  • Fix pytext tutorial syntax (#971)
  • Use torch.equal() instead of "==" in Custom Tensorizer tutorial (#939)
  • Remove and mock doc dependencies because readthedocs is OOM (#983)
  • Fix Circle CI build_docs error (#959)
  • Add OSS integration tests: DocNN (#1021)
  • Print model into the output log (#1127)
  • Migrate pytext/utils/torch.py logic into pytext/torchscript/ for long term maintainability (#1082)
  • Demo datasource fix + cleanup (#994)
  • Documentation on the config files and config-related commands (#984)
  • Config adapter old data handler helper (#943)
  • Nicer gen_config_impl (#944)
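The tutorial fix above that swaps `"=="` for `torch.equal()` reflects a real PyTorch pitfall: `==` on tensors is elementwise and returns a tensor, not a bool. A small illustration:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2, 3])

# "==" is elementwise: it returns a bool tensor, so using it in an
# "if" statement raises an error for multi-element tensors
elementwise = a == b
assert elementwise.tolist() == [True, True, True]

# torch.equal() compares sizes and values and returns a single bool
assert torch.equal(a, b)
assert not torch.equal(a, torch.tensor([1, 2, 4]))
```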

Deprecated Features

  • Remove DocModel_Deprecated (#916)
  • Remove RNNGParser_Deprecated, SemanticParsingTask_Deprecated, SemanticParsingCppTask_Deprecate, RnngJitTask
  • Remove QueryDocumentTask_Deprecated (#926)
  • Remove LMTask_Deprecated and LMLSTM_Deprecated (#882)
  • CompositionDataHandler to fb/deprecated (#963)
  • Delete deprecated Word Tagging tasks, models and data handlers (#910)

Bug Fixes

  • Fix caffe2 predict (#1103)
  • Fix bug when tensorizer is not defined (#1169)
  • Fix multitask metric reporter for lr logging (#1164)
  • Fix broken gradients logging and add lr logging to tensorboard (#1158)
  • Minor fix in blockwise sparsifier (#1130)
  • Fix clip_grad_norm API (#1143)
  • Fix for roberta squad tensorizer (#1137)
  • Fix multilabel metric reporter (#1115)
  • Fix prepare_input in tensorizer (#1102)
  • Fix unk bug in exported model (#1076)
  • Fp16 fixes for byte-lstm and distillation (#1059)
  • Fix TypeError in clip_grad_norm_ ("'>' not supported between instances of 'float' and 'NoneType'") in the grad_norm > max_norm > 0 check (#1054)
  • Fix context in multitask (#1040)
  • Fix regression in ensemble trainer caused by recent fp16 change (#1033)
  • ReadTheDocs OOM fix with CPU Torch (#1027)
  • Dimension mismatch after setting max sequence length (#1154)
  • Allow null learning rate (#1156)
  • Don't fail on 0 input (#1104)
  • Remove side effect during pickling of PickleableGPT2BPEEncoder
  • Set onnx==1.5.0 to fix CircleCI build temporarily (#1014)
  • Complete training loop gracefully even if no timing is reported (#1128)
  • Propagate min_freq for vocab correctly (#907)
  • Fix gen-default-config with Model param (#917)
  • Fix torchscript export for PyText modules (#1125)
  • Fix label_weights in DocModel (#1081)
  • Fix label_weights in bert models (#1100)
  • Fix config issues with Python 3.7 (#1066)
  • Temporary fix for Fairseq dependency (#1026)
  • Fix MultipleData by making tensorizers able to initialize from multiple data sources (#972)
  • Fix bug in copy_unk (#964)
  • Fix division-by-zero bug in MLM metric reporter (#968)
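The clip_grad_norm_ TypeError fixed above comes from comparing a float gradient norm against an unset (None) max_norm. A minimal sketch of the guarded clipping pattern using PyTorch's `torch.nn.utils.clip_grad_norm_` (the surrounding variables are made up for illustration):

```python
import torch
from torch.nn.utils import clip_grad_norm_

w = torch.randn(10, requires_grad=True)
(w ** 2).sum().backward()            # grad = 2*w, norm usually well above 1

max_norm = 1.0                       # may be None when clipping is disabled
# check for None before comparing or clipping, avoiding the
# "'>' not supported between 'float' and 'NoneType'" TypeError
if max_norm is not None and max_norm > 0:
    clip_grad_norm_([w], max_norm)

assert w.grad.norm().item() <= max_norm + 1e-4
```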
Source: README.md, updated 2019-11-27