Highlights
- Easier training on Cloud TPUs with TorchPrime
- A new Pallas-based kernel for ragged paged attention, enabling further optimizations on vLLM TPU (#8791)
- Usability improvements
- Experimental interoperability with JAX operations (#8781, #8789, #8830, #8878); see the sketch after this list
- Re-enabled the GPU CI build (#8593)
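The JAX interoperability work is still experimental and its surface may change. The sketch below is a hedged illustration only: the `xb.call_jax` entry point and its signature are assumptions here, not a documented, stable API.

```python
# Hedged sketch of the experimental JAX interop (#8781, #8789, #8830, #8878).
# The `xb.call_jax` bridge and its signature are assumptions, not a stable API.
import jax.numpy as jnp
import torch
import torch_xla
import torch_xla.core.xla_builder as xb


def jax_fn(a, b):
    # A pure JAX computation to be embedded into the XLA graph.
    return jnp.sin(a) + jnp.cos(b)


device = torch_xla.device()
a = torch.randn(4, 4, device=device)
b = torch.randn(4, 4, device=device)

# Assumed bridge: trace the JAX function and call it on XLA tensors.
c = xb.call_jax(jax_fn, (a, b))
torch_xla.sync()  # materialize the pending computation
print(c.shape)
```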
Stable Features
- Operator lowering
  - Lower `as_strided_copy` to use a fast path with `slice` (#8374)
  - Lower `_conj_copy` (#8686)
- Support splitting a physical axis in the SPMD mesh (#8698); see the sharding sketch after this list
- Support placeholder tensors (#8785)
- Dynamo/AOTAutograd traceable flash attention (#8654)
- C++11 ABI builds are now the default
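As a quick illustration of SPMD mesh construction and sharding, here is a minimal sketch; the mesh shape, axis names, and partition spec below are illustrative choices, and splitting a physical axis (#8698) goes through the same `Mesh` interface.

```python
# Minimal SPMD sharding sketch; mesh shape and axis names are illustrative.
import numpy as np
import torch
import torch_xla
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # enable SPMD execution mode

num_devices = xr.global_runtime_device_count()
# Build a 2D logical mesh over all devices, e.g. ("data", "model").
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1), ("data", "model"))

t = torch.randn(8, 128, device=torch_xla.device())
# Shard dim 0 across the "data" axis; dim 1 maps to the trivial "model" axis.
xs.mark_sharding(t, mesh, ("data", "model"))
```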
Experimental Features
- Gated Recurrent Unit (GRU) implemented with `scan` (#8777); see the `scan` sketch after this list
- Introduce `apply_xla_patch_to_nn_linear` to improve `einsum` performance (#8793)
- Support splitting a physical axis in the SPMD mesh (#8698)
- Enable default buffer donation for step barriers (#8721, #8982)
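The scan-based GRU (#8777) builds on the `scan` primitive in `torch_xla.experimental.scan`. The sketch below shows `scan` itself on a trivial cumulative sum, not the GRU implementation; it is a minimal illustration under that assumption.

```python
# Minimal sketch of torch_xla's scan primitive (the building block behind the
# scan-based GRU in #8777). This computes a cumulative sum, not a GRU.
import torch
import torch_xla
from torch_xla.experimental.scan import scan


def step(carry, x):
    # carry: running sum; x: current slice along the leading dimension.
    new_carry = carry + x
    return new_carry, new_carry  # (next carry, per-step output)


device = torch_xla.device()
init = torch.zeros(4, device=device)
xs = torch.ones(10, 4, device=device)  # scanned over the leading dimension

final_carry, ys = scan(step, init, xs)
torch_xla.sync()
print(final_carry)  # tensor of 10s
print(ys.shape)     # (10, 4)
```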
Usability
- Better profiling control: the start and end of a profiling session can now be controlled through the new profiler API (#8743); see the profiling sketch after this list
- API to query the number of cached compilation graphs (#8822)
- Improved host-to-device transfers (#8849)
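For context, here is a minimal profiling sketch using the long-standing profiler server and trace annotations in `torch_xla.debug.profiler`; the new explicit session start/stop control from #8743 lives in the same module, and its exact entry points are not shown here.

```python
# Hedged profiling sketch using the existing profiler API; the explicit
# session start/stop control added in #8743 builds on the same module.
import torch
import torch_xla
import torch_xla.debug.profiler as xp

# Start the profiler server so a capture client (e.g. TensorBoard) can attach.
server = xp.start_server(9012)

device = torch_xla.device()
x = torch.randn(1024, 1024, device=device)

for step in range(3):
    with xp.StepTrace("train_step", step_num=step):
        with xp.Trace("matmul"):
            x = x @ x
torch_xla.sync()
```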
Bug fixes
- Fix a bug in `tensor.flatten` (#8680)
- Fix `cummax` reduction over 0-sized dimensions (#8653)
- Fix a dk/dv autograd error in TPU flash attention (#8685); see the flash attention sketch after this list
- Fix a bug in flash attention where `kv_seq_len` should divide `block_k_major` (#8671)
- [scan] Ensure inputs into `fn` are not `device_data` IR nodes (#8769)
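Several of these fixes touch the TPU flash attention kernel. Below is a minimal, hedged usage sketch; shapes are chosen so the sequence length is compatible with the kernel's block sizes, and exact signature details may vary by version.

```python
# Hedged sketch of calling the TPU flash attention kernel touched by the
# fixes above. seq_len is a multiple of the kernel's block sizes.
import torch
import torch_xla
from torch_xla.experimental.custom_kernel import flash_attention

device = torch_xla.device()
batch, heads, seq_len, head_dim = 2, 8, 1024, 128

q = torch.randn(batch, heads, seq_len, head_dim, device=device, requires_grad=True)
k = torch.randn(batch, heads, seq_len, head_dim, device=device, requires_grad=True)
v = torch.randn(batch, heads, seq_len, head_dim, device=device, requires_grad=True)

out = flash_attention(q, k, v, causal=True)
out.sum().backward()  # exercises the dk/dv autograd path fixed in #8685
torch_xla.sync()
```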
Libtpu stable version
- Pin the 2.7 release to stable libtpu version `0.0.11.1`
Deprecations
- Deprecate `torch.export`; instead, use torchax to export graphs to StableHLO for full dynamism support
- Remove `torch_xla.core.xla_model.xrt_world_size`; replace with `torch_xla.runtime.world_size`
- Remove `torch_xla.core.xla_model.get_ordinal`; replace with `torch_xla.runtime.global_ordinal`
- Remove `torch_xla.core.xla_model.parse_xla_device`; replace with `_utils.parse_xla_device`
- Remove `torch_xla.experimental.compile`; replace with `torch_xla.compile` (see the migration sketch after this list)
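A minimal migration sketch for the removed and renamed helpers listed above, assuming a single-process run (the reported values are per-process under a multiprocessing launcher):

```python
# Migration sketch for the removed xla_model helpers and experimental.compile.
import torch
import torch_xla
import torch_xla.runtime as xr

# torch_xla.core.xla_model.xrt_world_size()  ->  torch_xla.runtime.world_size()
# torch_xla.core.xla_model.get_ordinal()     ->  torch_xla.runtime.global_ordinal()
print("world size:", xr.world_size())
print("global ordinal:", xr.global_ordinal())


def step_fn(x):
    return (x * 2).sum()


# torch_xla.experimental.compile  ->  torch_xla.compile
compiled_step = torch_xla.compile(step_fn)
y = compiled_step(torch.ones(4, device=torch_xla.device()))
print(y)
```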