cuDF v25.06.00

Name                            Modified      Size
README.md                       2025-06-05    24.9 kB
v25.06.00 source code.tar.gz    2025-06-05    8.1 MB
v25.06.00 source code.zip       2025-06-05    10.6 MB
Totals: 3 items                               18.7 MB

🚨 Breaking Changes

  • Remove cudf.BaseIndex (#18751) @mroeschke
  • Implement BIT_COUNT unary operation (#18589) @ttnghia
  • Expose column chunk metadata in read_parquet_metadata() (#18579) @mhaseeb123
  • Fix overflow for MERGE_M2 groupby aggregation (#18546) @ttnghia
  • Deduplicate parquet physical type enums (#18526) @mhaseeb123
  • Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
  • Promote Parquet type enums to enum classes (#18441) @mhaseeb123
  • Move parquet schema types and structs to public headers (#18424) @mhaseeb123
  • Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
  • Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
  • Deprecate nvtext subword tokenizer (#18334) @davidwendt
  • Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
  • Remove extraneous modules from top level cudf namespace (#18287) @mroeschke
  • Add Keep Option Parameter to Distinct (#18237) @warrickhe
  • Update to CCCL 2.8.x with no CCCL patches (#18235) @bdice
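
For the Distinct keep option (#18237), the sketch below illustrates what keep semantics commonly mean when deduplicating rows by key. It is plain Python, not cuDF code: the function name `distinct_by_key` and the option strings "first", "last", "any" and "none" are assumptions made for illustration, and the parameter added by the PR may be named or exposed differently.

```python
from collections import Counter


def distinct_by_key(rows, keep="any"):
    """Deduplicate (key, payload) rows by key.

    Hypothetical helper for illustration only:
      keep="first": keep the first row seen for each key
      keep="last":  keep the last row seen for each key
      keep="any":   keep one arbitrary row per key (here: the first)
      keep="none":  drop every key that occurs more than once
    """
    first, last, counts = {}, {}, Counter()
    for i, (key, _payload) in enumerate(rows):
        first.setdefault(key, i)  # index of the first occurrence
        last[key] = i             # index of the most recent occurrence
        counts[key] += 1

    if keep in ("first", "any"):
        indices = first.values()
    elif keep == "last":
        indices = last.values()
    elif keep == "none":
        indices = [i for key, i in first.items() if counts[key] == 1]
    else:
        raise ValueError(f"unknown keep option: {keep!r}")

    # Return the surviving rows in their original order.
    return [rows[i] for i in sorted(indices)]


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
kept_first = distinct_by_key(rows, keep="first")
kept_last = distinct_by_key(rows, keep="last")
kept_none = distinct_by_key(rows, keep="none")
```

With the sample rows, "first" and "last" differ only in which of the two "a" rows survives, while "none" removes both.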

🐛 Bug Fixes

  • Disable pytest benchmark for Narwhals CI job (#19074) @Matt711
  • Avoid undefined behaviour in rolling_store_output_functor (#19069) @wence-
  • Filter out pkg_resources UserWarning to make nightly CI pass (#19058) @Matt711
  • Pin deltalake to <1.0.0 (#19017) @Matt711
  • [BUG] Incorrectly getting the caller's frame when searching for locals and globals in cudf.pandas (#18979) @Matt711
  • Ensure gc fixture is used in custreamz test (#18915) @TomAugspurger
  • Fix a potential segfault in PQ reader's number of rows per source calculation (#18906) @mhaseeb123
  • Fix Dataframe getitem when MultiIndex columns exist (#18880) @galipremsagar
  • Ensure eq/ne between Columns in public objects don't return bool (#18875) @mroeschke
  • Fix fencepost error in Repartition task generation (#18854) @wence-
  • Fix cudf_polars pl.col(...).len() always excluding null values (#18849) @mroeschke
  • Throw a descriptive exception in Parquet reader when trying to read files with more than two billion rows (#18835) @mhaseeb123
  • Skip a decompression test (#18825) @vuule
  • Update strings benchmarks to use alloc_size column/table function (#18822) @davidwendt
  • Fix host decompression of empty DEFLATE data (#18805) @vuule
  • Avoid going OOM in test_row_limit_exceed_raises by using dummy array (#18802) @Matt711
  • Fix host decompression of empty Snappy data (#18800) @vuule
  • Skip test that fails due to polars issue (#18787) @wence-
  • Ensure scalar dtype is always set in from_py (#18780) @vyasr
  • Fix reading of Snappy compressed Avro files (#18774) @vuule
  • Fix missing semicolon in label_bins.cu (#18765) @evanramos-nvidia
  • Fix noexcept annotations on strings_column_view (#18763) @wence-
  • Fix integer overflows in pylibcudf from_column_view_of_arbitrary (#18758) @wence-
  • Fix overflow case and clean up some logic (#18734) @vyasr
  • Link to nvtx3::nvtx3-cpp instead of nvToolsExt (#18730) @jakirkham
  • Revise DaskIntegration protocol to align with rapidsmpf (#18720) @rjzamora
  • Fix skip_compression option in the Parquet writer with host compression (#18714) @vuule
  • Add missing header (#18671) @vyasr
  • Revert "Set flag to always use unsafe atomic storage" (#18657) @PointKernel
  • Fix optional operator* called on a disengaged value in clamp.cu (#18655) @davidwendt
  • Add missing header to host_memory.cpp (#18649) @alliepiper
  • Fix device compression when writing Parquet files without using nvCOMP (#18644) @vuule
  • Add CUDA_ARCHITECTURES setting to cpp-linters script (#18637) @davidwendt
  • Pin to cython<3.1 (#18617) @wence-
  • Fix DataFrame.memory_usage output order (#18595) @mroeschke
  • Set flag to always use unsafe atomic storage (#18590) @PointKernel
  • Update KvikIO S3 endpoint usage (#18565) @kingcrimsontianyu
  • Skip cuml third-party integration tests that may segfault (#18561) @Matt711
  • Allow .iloc with cuDF objects as column indexers (#18558) @mroeschke
  • Fix overflow for MERGE_M2 groupby aggregation (#18546) @ttnghia
  • Add back cudf root (#18544) @vyasr
  • Change default memory resource for 'distributed' cudf-polars (#18531) @rjzamora
  • Fix copy-on-write buffer separation and cleanup (#18530) @galipremsagar
  • Fix cpp examples cmake to use the rapids_config.cmake (#18501) @davidwendt
  • Rename rapidsmp to rapidsmpf (#18493) @rjzamora
  • Fix compilation with the C++20 standard (#18486) @vuule
  • Fix an error when reading some compressed Parquet V2 files (#18478) @vuule
  • Support title-case characters in strings capitalize() and title() APIs (#18457) @davidwendt
  • Ensure DataFrame column label operations reset label_dtype (#18452) @mroeschke
  • Fix a segfault when reading a Parquet file with unsupported compression type (#18451) @vuule
  • Fix logger macros (#18444) @vyasr
  • Fix auto-detection of compression type in host-side decompression (#18440) @shrshi
  • Use delete not free to release data allocated with new (#18412) @wence-
  • Fix synchronization issues in host compression and decompression (#18395) @vuule
  • Update Dask array-conversion handling (#18382) @rjzamora
  • Fixed indexing on empty DataFrame with no columns (#18381) @TomAugspurger
  • Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) @TomAugspurger
  • Fix index of right table in unary operators in AST, in Joins (#18333) @karthikeyann
  • Add offsetalator to contiguous-split (#18312) @davidwendt
  • Support large strings in nvtext vocabulary-tokenizer (#18283) @davidwendt
  • Handle empty aggregations in multi-partition cudf.polars group_by (#18277) @TomAugspurger
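
As background for the MERGE_M2 fix (#18546), the sketch below shows the standard way to merge partial (count, mean, M2) triples, where M2 is the sum of squared deviations from the mean. It is plain Python under stated assumptions: the helper names `m2_partial` and `merge_m2` are hypothetical, and this is the textbook pairwise-merge formula rather than libcudf's implementation. The notes do not say where the overflow occurred; in fixed-width arithmetic a product such as n_a * n_b can exceed the integer range, which is one possible source, but that is an assumption.

```python
import math


def m2_partial(values):
    """Return (count, mean, M2) for one chunk; M2 = sum((x - mean)**2)."""
    n = len(values)
    if n == 0:
        return 0, 0.0, 0.0
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return n, mean, m2


def merge_m2(a, b):
    """Merge two (count, mean, M2) partial results into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Python integers do not overflow; fixed-width code would typically
    # convert the counts to floating point before forming this product.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


left = m2_partial([1.0, 2.0, 3.0])
right = m2_partial([4.0, 5.0, 6.0, 7.0])
merged = merge_m2(left, right)
whole = m2_partial([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

The merged triple matches the one computed over the full data, and the sample variance follows as M2 / (count - 1).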

📖 Documentation

  • Docs for streaming executor options (#18934) @quasiben
  • Fix some duplicate toctree issues and improve groupby docs (#18580) @vyasr
  • [DOC] Running libcudf benchmarks and comparing output results (#18548) @Matt711
  • Fix doxygen usage of the contraction for it is (#18517) @davidwendt
  • Clarify @brief tag as description/title on documentation guide (#18515) @davidwendt
  • [DOC] Improve clarity in parquet APIs set_row_groups and set_columns parquet (#18466) @Matt711
  • Add a usage page to cudf-polars documentation (#18460) @Matt711
  • [DOC] Fix typo in CONTRIBUTING.md on build type tests (#18456) @JigaoLuo
  • improve docs related to documentation contribution (#18418) @ncclementi
  • Add restart kernel note in cudf pandas docs (#18374) @ncclementi

🚀 New Features

  • Add CLI argument to enable RMM async memory resource in PDS-H (#18899) @pentschev
  • Scan a headerless CSV file with column names provided (#18816) @Matt711
  • Add fast paths for DataFrame.to_cupy (#18801) @Matt711
  • Require numba-cuda>=0.11.0 (#18770) @brandon-b-miller
  • Create a pylibcudf Column from a python iterable (#18768) @Matt711
  • Support ConditionalJoin via broadcasting in cudf-polars streaming engine (#18723) @rjzamora
  • Experimental PQ reader utility to calculate total rows in input row groups (#18716) @mhaseeb123
  • Extend explain_query to support printing the logical plan (pre lowered plan) (#18708) @Matt711
  • Reuse libcudf dependencies for Java JNI build when they are available (#18682) @ttnghia
  • Add alloc_size member function to cudf::column and cudf::table (#18639) @davidwendt
  • Print the physical cudf-polars plan in pdsh.py (#18635) @rjzamora
  • String Transform Examples (#18616) @lamarrr
  • Add streaming support for group_by -> n_unique to cudf-polars (#18606) @rjzamora
  • Export cudf compiler flags and definitions (#18604) @ttnghia
  • Implement BIT_COUNT unary operation (#18589) @ttnghia
  • Expose column chunk metadata in read_parquet_metadata() (#18579) @mhaseeb123
  • Add APIs to check ORC and Parquet compression support at runtime (#18578) @vuule
  • Add Distinct support to the cudf-polars streaming executor (#18576) @rjzamora
  • Add support for large list host Arrow data conversion (#18562) @vyasr
  • Implement BITWISE_AGG aggregations (bitwise AND, OR and XOR) for sort-based groupby and reduction (#18551) @ttnghia
  • Implement row group pruning with bloom filters in experimental PQ reader (#18545) @mhaseeb123
  • Implement row group pruning with stats in experimental PQ reader (#18543) @mhaseeb123
  • [JNI] Expose row-wise sha1 api (#18540) @warrickhe
  • Add Sort + head/tail support to streaming cudf-polars executor (#18538) @rjzamora
  • Add multi-partition MapFunction support to cudf-polars (#18523) @rjzamora
  • Adds support for writing raw UTF-8 characters (without escaping) in the JSON writer (#18508) @Matt711
  • Support reading from device buffers in the pylibcudf IO APIs (#18496) @Matt711
  • Support multi-partition Select operations with aggregations (#18492) @rjzamora
  • Implemented String Output & User-data Support for Transforms (#18490) @lamarrr
  • Add a utility to bulk set multiple null masks (#18489) @mhaseeb123
  • High level interface for experimental PQ reader and implementation of metadata APIs (#18480) @mhaseeb123
  • Added pylibcudf.utilities.is_ptds_enabled (#18467) @TomAugspurger
  • Add a public API for copying a table_view to device array (#18450) @Matt711
  • Support cudf-polars cast_time_unit (#18442) @brandon-b-miller
  • Support creating a pylibcudf Column from a host array (#18425) @Matt711
  • Move parquet schema types and structs to public headers (#18424) @mhaseeb123
  • Add optional dtype argument to Scalar.from_any (#18415) @Matt711
  • Expose cudf::chunked_pack in pylibcudf (#18411) @wence-
  • Add support for long string columns in cudf::contiguous_split (#18393) @nvdbaranec
  • Implemented String Input support for Transforms and Removed jit::column_device_view (#18378) @lamarrr
  • Automatically dispatch between host and device decompression/compression based on the number of buffers (#18363) @vuule
  • Expose join hash table load factor (#18361) @PointKernel
  • Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
  • Sort-based inner join for high-multiplicity tables (#18318) @shrshi
  • Support constructing pylibcudf Columns and Tables from views into arbitrary objects (#18314) @vyasr
  • Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
  • Support cudf-polars isoyear and week (isoweek) (#18265) @brandon-b-miller
  • Add Keep Option Parameter to Distinct (#18237) @warrickhe
  • Add rapidsmp shuffle support to cudf-polars (#18231) @rjzamora
  • Support cudf-polars strftime (#18181) @brandon-b-miller
  • Add benchmark for join operations with low build table cardinality (#18105) @shrshi
  • Add nvtext substring deduplication APIs (Part 2) (#18104) @davidwendt
  • Support include_file_paths in cudf polars (#18057) @Matt711
  • Add support for the Arrow device capsule interfaces (#15370) @vyasr
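
The BIT_COUNT unary operation (#18589) and the BITWISE_AGG aggregations (#18551) can be pictured with the plain-Python sketch below. It shows the semantics only and calls no cuDF API: the helper names `bit_count` and `bitwise_groupby` are hypothetical, and the handling of negative values (two's complement at a fixed width) and of nulls (skipped) are assumptions, not behavior confirmed by these notes.

```python
import operator
from collections import defaultdict
from functools import reduce


def bit_count(x, width=32):
    """Count set bits of x viewed as a `width`-bit two's-complement value."""
    mask = (1 << width) - 1
    return bin(x & mask).count("1")


def bitwise_groupby(keys, values, op):
    """Reduce values per key with bitwise AND, OR or XOR.

    None stands in for a null and is skipped (an assumption); a group
    whose values are all None does not appear in the result.
    """
    ops = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if value is not None:
            groups[key].append(value)
    return {key: reduce(ops[op], vals) for key, vals in groups.items()}


keys = ["a", "b", "a", "b", "a"]
values = [0b1100, 0b0101, 0b1010, 0b0011, None]

and_result = bitwise_groupby(keys, values, "and")
or_result = bitwise_groupby(keys, values, "or")
xor_result = bitwise_groupby(keys, values, "xor")
```

A whole-column reduction is the same fold with a single group.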

🛠️ Improvements

  • use 'rapids-init-pip' in wheel CI, other CI changes (#18902) @jameslamb
  • Avoid RecursionError in custreamz test (#18887) @TomAugspurger
  • Update NumPy dependency in cudf.pandas-catboost integration test (#18870) @Matt711
  • CPU only execution for PDSH (#18869) @quasiben
  • Remove more top level cudf imports in core (#18862) @mroeschke
  • Remove top level cudf imports in core (#18857) @mroeschke
  • Add CUDF_INSTALL_DIR for JAVA build script (#18852) @pxLi
  • Call the correct from_pandas in hdf reader (#18850) @galipremsagar
  • Update __all__ in cudf_polars/dsl/ir.py (#18848) @Matt711
  • Upload examples conda package (#18847) @vyasr
  • Add retries to prevent failures in occasionally slow CI runs (#18843) @galipremsagar
  • Finish CUDA 12.9 migration and use branch-25.06 workflows (#18839) @bdice
  • Remove toplevel import cudf from window/tools/join directories (#18833) @mroeschke
  • Remove toplevel import cudf from cudf/io files (#18829) @mroeschke
  • Update pdsh benchmark script to support explain-only (#18826) @TomAugspurger
  • Refactor UDF utils and add a hook to enable NRT when necessary (#18823) @brandon-b-miller
  • Fix memory access error in nvtext::edit_distance (#18821) @davidwendt
  • Update to clang 20 (#18818) @bdice
  • Reduce more data sizes of Python tests (#18814) @mroeschke
  • Mark DataFrame.dtypes as an _external_only_api (#18809) @mroeschke
  • Change calls to thrust::swap to cuda::std::swap (#18808) @davidwendt
  • Move implemented BaseIndex methods over to Index (#18807) @mroeschke
  • Improve pandas version fetching script (#18793) @galipremsagar
  • Change cudf::sort googlebench benchmarks to nvbench (#18786) @davidwendt
  • Only warn in cudf.pandas if rmm mode explicitly set and rmm already configured (#18785) @jcrist
  • Quote head_rev in conda recipes (#18784) @bdice
  • Move RangeIndex implementation below Index (#18777) @mroeschke
  • Remove unnecessary _Ravelled class (#18771) @Matt711
  • Remove pytest-rerunfailures (#18766) @mroeschke
  • Replace from_arrow with direct calls Column/Table constructors in pylibcudf and cudf-polars tests (#18762) @Matt711
  • CUDA 12.9 use updated compression flags (#18755) @robertmaynard
  • fix(rattler): add librmm to host for libcudf to fix overlinking error (#18754) @gforsyth
  • Remove the file name from the output in cudf-polars' explain APIs (#18752) @Matt711
  • Remove cudf.BaseIndex (#18751) @mroeschke
  • Support creating a pylibcudf Column from a general ndarray (#18744) @Matt711
  • Improve lowering of Distinct IR nodes for high-cardinality data (#18725) @rjzamora
  • Simplify Numba-CUDA MVC logic (#18724) @bdice
  • Test with CUDA 12.9.0 (#18721) @bdice
  • Add more cudf.Series microbenchmarks (#18718) @Matt711
  • Run unit-tests-cudf-pandas on branch-25.06 for nightly tests (#18717) @davidwendt
  • Move test_large_unique_categories_repr to benchmarks (#18715) @galipremsagar
  • Allow pylibcudf.Column to consume objects exposing __arrow_c_stream__ (#18712) @mroeschke
  • Switch from printing to logging (#18711) @vyasr
  • Add Python tests for different compression implementations (#18710) @vuule
  • Remove redundant xfails in cuml integration tests (#18699) @Matt711
  • ci: run unit-tests-cudf-pandas on branch-25.06 workflow (#18692) @gforsyth
  • Exclude librmm.so from auditwheel (#18691) @bdice
  • Add C++ tests for different compression implementations (#18690) @vuule
  • Improve runtime of cuDF Python unit tests (#18689) @mroeschke
  • Require at least numba-cuda 0.10.1 (#18688) @brandon-b-miller
  • Add nvidia-cuda-{nvrtc, nvcc} as a dependency for cuDF wheels (#18686) @brandon-b-miller
  • Support rolling aggregations in in-memory cudf-polars execution (#18681) @wence-
  • Replace parquet_blocksize with target_partition_size (#18669) @rjzamora
  • Skip test_large_unique_categories_repr in CI (#18666) @bdice
  • Locally import pyarrow.dataset and fsspec for import cudf performance (#18663) @mroeschke
  • Disable arm64 python tests (#18662) @galipremsagar
  • Pin numba-cuda>=0.9.0,!=0.10.0 due to CI hangs on ARM (#18661) @mroeschke
  • Fix compile warnings in Java JNI (#18660) @ttnghia
  • Drop Empty nodes from IR graph (#18658) @rjzamora
  • Add support for Python 3.13 (#18648) @gforsyth
  • Cleanup libcudf detail/aggregation.hpp/.cuh (#18642) @davidwendt
  • Skip all known pytest failures in pandas-tests (#18641) @galipremsagar
  • Preserve partitioning after Filter and Projection in cudf-polars (#18638) @rjzamora
  • Support quantile in cudf-polars grouped aggregations (#18634) @wence-
  • Deprecate Series.nullmask, Series.nullable, Series.from_categorical, Series.from_masked_array, cudf.isclose (#18631) @mroeschke
  • Access private objects by importing from module instead of cudf.core/util namespace (#18629) @mroeschke
  • Replace unnecessary cudf::size_of() calls with sizeof() (#18628) @davidwendt
  • Improve cold cache dropping (#18626) @kingcrimsontianyu
  • Improve default config values for cudf-polars streaming (#18623) @rjzamora
  • Add gtest error check for nvtext::wordpiece_tokenize (#18621) @davidwendt
  • Polars dataframe serialize using chunked pack (#18614) @madsbk
  • xfail all known errors in pandas-test suite (#18612) @galipremsagar
  • Add TemporalBaseColumn as a parent class to DatetimeColumn and TimedeltaColumn (#18611) @mroeschke
  • Update cudf::cast internal function to use sizeof instead of cudf::size_of (#18607) @davidwendt
  • Move cudf/utils/utils.py methods to appropriate locations (#18605) @mroeschke
  • pylibcudf.Column: add device_buffer_size and register a dask.sizeof function for cudf-polars Column and DataFrame (#18602) @madsbk
  • Use cached_property for Datetime and Timedelta column properties (#18601) @mroeschke
  • Annotate and simplify from_arrow (#18600) @mroeschke
  • Enable reporting peak memory usage for gtests (#18599) @davidwendt
  • Prune methods from Frame that are specific to subclasses (#18597) @mroeschke
  • Switch tensorflow integration tests to use 12.x (#18596) @galipremsagar
  • refactor: use libnvcomp from libkvikio wheel to unblock Python 3.13 upgrade (#18593) @gforsyth
  • Add temporary pdsh benchmarks to cudf_polars.experimental (#18592) @rjzamora
  • Update numba-cuda dependency to >=0.9.0 (#18591) @brandon-b-miller
  • use 'certifi' certificates in fetch_pandas_versions script (#18588) @jameslamb
  • Add nvtext substring duplication APIs (Part 1) (#18585) @davidwendt
  • Bump polars version to <1.29 (#18581) @Matt711
  • Allow datetime.timedelta objects in pylibcudf.Scalar.from_py (#18577) @mroeschke
  • Rework strings split_helper utility for better reuse (#18575) @davidwendt
  • Additional tests strings for strings split APIs (#18574) @davidwendt
  • Support datetime.datetime objects in pylibcudf.Scalar.from_py (#18572) @mroeschke
  • Store Python scalars instead of PyArrow Scalars in cudf_polars Literal expr (#18563) @mroeschke
  • Support plc.Scalar.from_py(None) and plc.Scalar.from_py(int, float type) (#18559) @mroeschke
  • Add xfail window function tests for cudf_polars (#18557) @btepera
  • Add fast paths to Series.to_cupy and Series.values (#18555) @Matt711
  • Reduce cudf-polars pyarrow usage (#18554) @vyasr
  • Avoid possible invalid kernel grid error in cudf::set_null_masks if no bitmasks to set (#18553) @mhaseeb123
  • Adjust cudf Python groupby test for cuCollections update (#18550) @mroeschke
  • Refactor scan test I/O logic into shared make_partitioned_source helper (#18542) @Matt711
  • Download build artifacts from Github for CI jobs (#18539) @VenkateshJaya
  • Update hypothesis version (#18537) @galipremsagar
  • Make Python testing dependencies more specific to pylibcudf vs cudf (#18535) @mroeschke
  • Pin hypothesis<6.131.1 due to performance issues (#18532) @mroeschke
  • Deduplicate parquet physical type enums (#18526) @mhaseeb123
  • Reduce the number of miscellaneous pandas unit tests run with cudf.pandas (#18524) @mroeschke
  • Improve nvtext::tokenize_with_vocabulary performance (#18522) @davidwendt
  • Make pylibcudf.Column.from_rmm_buffer a Python staticmethod (#18521) @mroeschke
  • Add more short circuit checks for .equals (#18520) @mroeschke
  • Add synchronous task scheduler to cudf-polars (#18519) @rjzamora
  • Don't fetch dlpack headers when building cuDF Python (#18518) @mroeschke
  • Refactor polars configuration (#18516) @TomAugspurger
  • Refactor internal strings utility to separate header and definition file (#18514) @davidwendt
  • Fix print() keyword argument in cudf pandas test (#18513) @trxcllnt
  • Improve performance of strings split-record on whitespace (#18510) @davidwendt
  • Use cuda::std::iter_value_t instead of thrust iterator traits (#18509) @miscco
  • Remove redundant task-graph logic for streaming GroupBy (#18507) @rjzamora
  • Replace GPU_ARCHS build variable by CMAKE_CUDA_ARCHITECTURES (#18506) @ttnghia
  • Optimize pandas metadata generation to reduce memory pressure (#18505) @galipremsagar
  • Replace deprecated host_buffer in favor of host_span in SourceInfo (#18503) @Matt711
  • Add pylibcudf.Column.from_rmm_buffer (#18502) @mroeschke
  • Replace thrust functors with libcu++ ones (#18500) @miscco
  • Rename cudf-polars executors (#18499) @rjzamora
  • Remove casting functions in pylibcudf utils (#18497) @Matt711
  • Increase wheel size limit. (#18487) @bdice
  • Add CategoricalIndex.from_codes (#18485) @mroeschke
  • Split join header (#18484) @shrshi
  • Fix unspecified behavior involving move semantics and order of evaluation (#18481) @kingcrimsontianyu
  • Remove need for to_cudf_compatible_scalar (#18477) @mroeschke
  • Rerun flaky pytests in CI (#18476) @galipremsagar
  • Vendor RAPIDS.cmake (#18473) @bdice
  • Add ARM conda environments. (#18470) @bdice
  • Bump polars version to <1.28 (#18469) @Matt711
  • Add sink support in cudf_polars (#18468) @mroeschke
  • Enable rapidsmpf spilling in cudf-polars (#18461) @madsbk
  • Promote Parquet type enums to enum classes (#18441) @mhaseeb123
  • Consolidate logic in DataFrame.init for listlike arguments (#18439) @mroeschke
  • Update compression formats supported in JSON reader (#18438) @shrshi
  • Disabled Jitify Minification (#18436) @lamarrr
  • Fix printing decimal128 types that are zero (#18435) @trxcllnt
  • Replace direct use of nvCOMP and of its adapter with the higher-level decompression API (#18434) @vuule
  • Add more cudf.DataFrame constructor pytest benchmarks (#18433) @mroeschke
  • Test against stable tags for narwhals (#18431) @Matt711
  • Refcount-based dropping of cached evaluations in cudf-polars executor (#18430) @wence-
  • Replace Thrust iterator facilities with libcu++ ones (#18427) @miscco
  • Remove numpy requirement when converting 2d cuda array interface objects to pylibcudf Columns (#18426) @Matt711
  • Share more cudf.Column methods for indices_of/isin (#18423) @mroeschke
  • Switch the ptr type in gpumemoryview from Py_ssize_t to uintptr_t (#18419) @Matt711
  • Add strings::extract_single API (#18417) @davidwendt
  • Add to_arrow_host_stringview interop API (#18416) @davidwendt
  • Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
  • Allow polars arrow conversion to produce string_view (#18413) @wence-
  • Change dask_cudf.to_parquet behavior for local filesystems (#18408) @rjzamora
  • Add rank and label_bin methods to ColumnBase (#18407) @mroeschke
  • Improve performance of strings::like for long strings (#18406) @davidwendt
  • Automatic single-partition fallback in cudf-polars (#18405) @rjzamora
  • Remove _sync suffix from hostdevice types (#18404) @vuule
  • Use owning Arrow types in C++ to expose data to Python (#18402) @vyasr
  • add static push and pop methods to NvtxRange (#18401) @zpuller
  • Deprecate cudf.Scalar (#18394) @mroeschke
  • Bump polars version to <1.27 (#18387) @Matt711
  • Branch 25.06 merge 25.04 (#18380) @Matt711
  • Silence warning by setting BUILD_SHARED_LIBS (#18371) @vyasr
  • Rewrite groupby aggregations in cudf-polars to simplify evaluation (#18369) @wence-
  • Pass stream through when taking ownership from libcudf (#18367) @wence-
  • Expose new grouped_range_rolling API in pylibcudf (#18365) @wence-
  • Avoid patching sort algorithms from CCCL (#18364) @miscco
  • Deprecate old nvtext::normalize_characters (#18360) @davidwendt
  • refactor(rattler): enable strict channel priority for builds (#18358) @gforsyth
  • Optimize sequences by introducing make_offsets_child_column (#18357) @ustcfy
  • Decompress all data in a single decompress_page_data when reading Parquet input in a single chunk (#18352) @vuule
  • Moving wheel builds to specified location and uploading build artifacts to Github (#18346) @VenkateshJaya
  • Performance improvement for to_lower/to_upper for multi-byte UTF-8 characters (#18345) @davidwendt
  • Branch 25.06 merge branch 25.04 (#18344) @vyasr
  • Use dask-cuda for cudf-polars experimental testing (#18343) @rjzamora
  • Deprecate nvtext subword tokenizer (#18334) @davidwendt
  • Remove cudf.Scalar in as_column (#18331) @mroeschke
  • Add tests for cudf.polars to be able to work on a cpu-only machine (#18327) @galipremsagar
  • Allow cudf.DataFrame.from_pylibcudf to accept a pylibcudf.io.TableWithMetadata (#18319) @mroeschke
  • Avoid stateful construction in DataFrame.__init__ (#18306) @mroeschke
  • Improve the groupby performance for extremely low cardinality (#18290) @PointKernel
  • Remove extraneous modules from top level cudf namespace (#18287) @mroeschke
  • Require type annotations in cudf.polars (#18285) @TomAugspurger
  • Removing unnecessary StreamSynchronization in reading (#18279) @JigaoLuo
  • Update to CCCL 2.8.x with no CCCL patches (#18235) @bdice
  • Reduce register pressure for compute_column_kernel (#18226) @matal-nvidia
  • Use the mapped buffer for all read operations in the memory-mapped source; switch default source to the kvikIO one (#18204) @vuule
  • Improve test coverage in the catboost integration tests (#18126) @Matt711
  • Create file sources in parallel (#18094) @vuule
  • Enable stumpy_distributed tests (#17969) @galipremsagar
  • Refactor distinct join to use primitive row operators when proper (#17726) @PointKernel
  • Update chunked parquet reader benchmarks (#16543) @sdrp713
Source: README.md, updated 2025-06-05