KubeRay v1.5.0-rc.0

Name                                          Modified    Size
kuberay_1.5.0-rc.0_checksums.txt              2025-10-30  438 Bytes
kubectl-ray_v1.5.0-rc.0_darwin_amd64.tar.gz   2025-10-30  34.3 MB
kubectl-ray_v1.5.0-rc.0_darwin_arm64.tar.gz   2025-10-30  33.0 MB
kubectl-ray_v1.5.0-rc.0_linux_amd64.tar.gz    2025-10-30  34.1 MB
kubectl-ray_v1.5.0-rc.0_linux_arm64.tar.gz    2025-10-30  32.1 MB
README.md                                     2025-10-30  88.0 kB
v1.5.0-rc.0 source code.tar.gz                2025-10-30  9.1 MB
v1.5.0-rc.0 source code.zip                   2025-10-30  9.6 MB
Totals: 8 items                                           152.3 MB
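
The listing includes a checksums file alongside the kubectl-ray archives. A minimal sketch of checking a downloaded archive against it follows. It assumes the file uses the common "<sha256 hex digest>  <file name>" layout, one entry per line. That layout is an assumption and is not confirmed by this page. The helper names parse_checksums, sha256_of and verify are hypothetical and are not part of KubeRay.

```python
import hashlib
from pathlib import Path


def parse_checksums(text: str) -> dict[str, str]:
    """Map file name -> lowercase hex digest.

    Assumes sha256sum-style lines: '<digest>  <name>' (the name may carry
    a leading '*' for binary mode). This layout is an assumption.
    """
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = digest.lower()
    return entries


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(archive: Path, checksums_file: Path) -> bool:
    """Return True when the archive's digest matches its listed entry."""
    expected = parse_checksums(checksums_file.read_text())
    want = expected.get(archive.name)
    if want is None:
        raise KeyError(f"{archive.name} is not listed in {checksums_file.name}")
    return sha256_of(archive) == want
```

For example, verify(Path("kubectl-ray_v1.5.0-rc.0_linux_amd64.tar.gz"), Path("kuberay_1.5.0-rc.0_checksums.txt")) would return True only if the archive's SHA-256 digest equals the listed one, again under the assumed file layout.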

Changelog

  • [d59cbd] Fix rayClusterScaleExpectation deletion to use request object when instance is nil (#4039)
  • [480e12] Inject the --block option to ray start command automatically (#932)
  • [785077] Remove ray-cluster.without-block.yaml (#675)
  • [38ac16] [Telemetry] Inject env identifying KubeRay. (#562)
  • [97425f] AGC gateway api example (#4076)
  • [dc203e] Add DeepSeek example RayService (#3838)
  • [7c0aa6] Add FAQ page (#1150)
  • [dcb97c] Add Grafana Dashboard for KubeRay Operator (#3676)
  • [bf7e49] Add Helm chart unit tests to ray-cluster (#3374)
  • [113909] Add Helm chart unittests to CI (#3280)
  • [d0e8b5] Add KubeRay e2e Test for custom idleTimeoutSeconds with v2 Autoscaler (#2725)
  • [6818a0] Add KubeRay related blogs (#1147)
  • [042e6b] Add NumOfHosts to RayCluster helm-chart template (#1969)
  • [5adc91] Add NumOfHosts to WorkerGroupSpec (CRD change only) (#1834)
  • [80a6d5] Add Ray cluster spec for TPU pods (#1292)
  • [abb529] Add RayCluster YAML for verl example (#3833)
  • [f232b5] Add RayClusterProvisioned Condition Type (#2301)
  • [cbaf5d] Add RayClusterReady Condition Type (#2271)
  • [732453] Add RayJob training example using pytorch resnet image classifier (#2107)
  • [a0afea] Add RayService Manifests for Stable Diffusion TPU Examples (#2198)
  • [753dc0] Add RayService sample test (#1377)
  • [c83511] Add TPU to Known Custom Accelerators for generated rayStartCommand (#2495)
  • [f05fa2] Add ray.io/originated-from labels (#1830)
  • [87407a] Add a document for profiling (#1299)
  • [08792c] Add a document to outline the default settings for rayStartParams in Kuberay (#1057)
  • [b92d95] Add a flag to enable/disable worker init container injection (#1069)
  • [885108] Add a grouping for 'google.golang.org/*' to avoid inconsistency between sub-projects (#3470)
  • [ceb9f0] Add a sample RayJob to fine-tune a PyTorch lightning text classifier (#1891)
  • [a85149] Add a test util function for killing the head Pod and wait (#3890)
  • [073de1] Add a util function to convert string and bytes array (#2621)
  • [3e20a9] Add a variant of the ray data processing job with GCSFuse CSI driver (#2401)
  • [7fe405] Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. (#1102)
  • [8b61b7] Add all and worker node type to kubectl ray log (#2442)
  • [966d9b] Add apply configurations to generated client (#1818)
  • [ef0129] Add basic Helm chart unittests for kuberay-operator (#3253)
  • [cd239a] Add basic e2e test for kubectl plugin (#2287)
  • [e1edb4] Add batch-scheduler option, deprecate enable-batch-scheduler option (#2300)
  • [e330c0] Add common containerEnv section to Helm Chart (#1932)
  • [70ef24] Add consistency check for deepcopy generated files (#1127)
  • [36267e] Add dashboard component to master (#3566)
  • [cb2914] Add deletecollection for multi-namespace role (#2) (#2231)
  • [8bb322] Add dependabot.yml for enabling "Dependabot version updates" (#3357)
  • [9a0b9d] Add dnsConfig to head, worker and additional workers (#2377)
  • [80ce66] Add documentation for API Server monitoring (#1479)
  • [0bd28e] Add documentations for the release process of Helm charts (#723)
  • [7f3fe8] Add e2e KubeRay operator upgrade test (#3060)
  • [e9f315] Add e2e test for kubectl ray job submit (#2614)
  • [4fc48c] Add e2e test make sure resource quota error is surfaced (#3087)
  • [ff4592] Add end to end tests to apiserver (#1460)
  • [4b0f7c] Add env and patch permission. (#740)
  • [1bcfa9] Add env variable comment to kuberay-operator
  • [d93c3c] Add example and tutorial to explain how to create custom metrics for Prometheus (#914)
  • [a34a42] Add flag leader-election-namespace (#1624)
  • [a2ebc6] Add gofumpt instructions from internal doc (#1180)
  • [044008] Add instruction to skip unit tests in DEVELOPMENT.md (#1171)
  • [abafd1] Add kubectl plugin with basic command and deprecate cli (#2243)
  • [3e6860] Add kubectl ray cluster log command (#2296)
  • [12babc] Add kubectl ray create cluster (#2607)
  • [61a282] Add kubectl ray delete rayservice/job/cluster (#2635)
  • [8c64e6] Add kubectl-plugin pre-commit (#2255)
  • [11c75e] Add kuberay operator servicemonitor (#3717)
  • [25d556] Add kubernetes dependency in python client library (#998)
  • [c8f826] Add kubernetes event to inform user of upgrade strategy (#2592)
  • [106f8f] Add missing labels on RayCluster TPU manifests (#1987)
  • [4a12d7] Add more grouping to resolve inconsistencies when bumping versions (#3554)
  • [785602] Add rayVersion in the RayCluster chart (#975)
  • [402176] Add rayjob yaml generation to ray job submit command (#2644)
  • [d22d75] Add release command and guidance for KubeRay cli (#834)
  • [e9544f] Add reminders to avoid RBAC synchronization bug (#576)
  • [08da59] Add seccompProfile to KubeRay operator deployment for PSS compliance (#3931)
  • [522807] Add seccompProfile.type=RuntimeDefault to kuberay-operator. (#1955)
  • [b5b423] Add structured config and default sidecar container configuration (#1822)
  • [224a44] Add support for openshift routes (#1183)
  • [43ed24] Add support for parsing neuron core resource limit and pass it as ray… (#2409)
  • [2de3fe] Add support for pvcs to apiserver (#1118)
  • [3cc611] Add support for tolerations, env, annotations and labels (#1070)
  • [aeba37] Add test for autoscaler and its desired state (#2601)
  • [76633c] Add test for configurable k8s job backoff limit (#2134)
  • [865aff] Add tools and docs for changelog generator (#833)
  • [e36183] Add top-level Labels and Resources Structured fields to HeadGroupSpec and WorkerGroupSpec (#4106)
  • [36102a] Add topology spread constraints test for RayCluster (#2472)
  • [658bd9] Add unit test for cluster get and add steps in workflows (#2263)
  • [e6722b] Add v4 TPU manifests samples (#1968)
  • [33ccc9] Add v6e TPU Ray CR Manifests (#2445)
  • [b22792] Add vLLM TPU example RayService manifest (#3000)
  • [f8ed87] Add validating webhook (#1584)
  • [ecd6ec] Add validation for RAY_enable_autoscaler_v2 environment variable (#3963)
  • [d950d5] Add volcano taskSpec annotations to pod (#1754)
  • [925eff] Add workerGroupSpec.idleTimeoutSeconds to v1 RayCluster CRD (#2558)
  • [4e1454] Added Pod securityContext value to Helm charts (#2160)
  • [1f728c] Added Python API server client (#1561)
  • [280902] Added Ray-Serve Config For LLMs (#3517)
  • [bc9067] Added security to the API server (#1677)
  • [ccd88c] Added support for ephemeral volumes and ingress creation support (#1409)
  • [803374] Adding API server support for service account (#1148)
  • [9af821] Adding a test for the document for the Pod security standard (#866)
  • [6e4ac2] Adding capability to create ray cluster with serve support -clean (#1672)
  • [d10103] Adding example of manually setting up NGINX Ingress (#699)
  • [61adf5] Align Init Container's ImagePullPolicy with Ray Container's ImagePullPolicy (#1080)
  • [761559] Align RayJob's ManagedBy with RayCluster's ManagedBy. (#2630)
  • [584da5] Alkanso/python client (#901)
  • [59d703] Allow E2E tests to run with arbitrary k8s cluster (#1306)
  • [c857ca] Allow annotations in ray cluster helm chart (#574)
  • [847585] Allow app.kubernetes.io/component to be overridden (#3198)
  • [828afb] Allow configuration of restartPolicy (#2197)
  • [153f35] Allow manually creating init containers in Kuberay helm charts (#1287)
  • [ff66bc] Allow to install and remove operator via scripts (#1545)
  • [4892ac] Api server makefile (#1301)
  • [f0b5ea] Api server refactor/allow multiple job statuses in jobe2e (#3363)
  • [7de5f1] Api server refactor/allow multiple job statuses in servicee2e (#3375)
  • [6901e4] Best practice for fault-tolerant redis with kuberay (#2684)
  • [be4f98] Build Headless Service for Multi-Host TPU Worker Pods (#1920)
  • [51b64f] Buildkite autoscaler e2e (#2199)
  • [6c235d] Bump @babel/runtime from 7.24.1 to 7.27.1 in /dashboard (#3591)
  • [530318] Bump Kubernetes dependencies to v0.34.x (#4147)
  • [91245a] Bump braces from 3.0.2 to 3.0.3 in /dashboard (#3590)
  • [385814] Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)
  • [410e8f] Bump github.com/Masterminds/semver/v3 in /ray-operator (#3500)
  • [168dd4] Bump github.com/emicklei/go-restful in /ray-operator (#1348)
  • [2590a0] Bump github.com/jarcoal/httpmock from 1.2.0 to 1.4.0 in /ray-operator (#3536)
  • [196f95] Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /apiserver (#3475)
  • [00b4b1] Bump github.com/prometheus/client_golang in /apiserver (#3394)
  • [2a2042] Bump github.com/rs/zerolog from 1.33.0 to 1.34.0 in /apiserver (#3393)
  • [4a4471] Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 in /kubectl-plugin (#3499)
  • [587c6f] Bump go to 1.22.4 to fix ray-operator vulnerabilities (#2325)
  • [00c926] Bump go.mongodb.org/mongo-driver from 1.3.4 to 1.5.1 in /apiserver (#1407)
  • [f6a5c7] Bump golang.org/x/net from 0.14.0 to 0.17.0 in /experimental (#1701)
  • [7e7262] Bump golang.org/x/net from 0.26.0 to 0.33.0 in /proto (#2723)
  • [3a6aac] Bump golang.org/x/net from 0.33.0 to 0.38.0 in /experimental (#3407)
  • [4a3a37] Bump golang.org/x/net in /cli (#1405)
  • [a8f730] Bump golang.org/x/net in /proto (#1345)
  • [aafe2e] Bump golang.org/x/net to v0.33.0 fix upstream vulnerability (#2799)
  • [26deb4] Bump golang.org/x/sys in /cli (#1347)
  • [2292e6] Bump golang.org/x/sys in /proto (#1346)
  • [53b702] Bump golang.org/x/text from 0.3.5 to 0.3.8 in /proto (#1344)
  • [a9255c] Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /cli (#2229)
  • [8bdd7d] Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /experimental (#2248)
  • [0d1629] Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /cli (#1993)
  • [7d49b2] Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /experimental (#1992)
  • [667142] Bump google.golang.org/protobuf from 1.34.2 to 1.36.6 in /experimental (#3395)
  • [877832] Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 in /apiserver (#3391)
  • [f605b6] Bump nanoid from 3.3.7 to 3.3.11 in /dashboard (#3589)
  • [05b77e] Bump next from 15.2.3 to 15.2.4 in /dashboard (#3709)
  • [1c07bc] Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.20.4 in /apiserver (#3392)
  • [06ccd0] Bump the golangci-lint version in the api server makefile (#1342)
  • [16e44d] Bump the google-golang group across 5 directories with 3 updates (#3493)
  • [102d9e] Bump the kubernetes group across 3 directories with 9 updates (#3390)
  • [8fcfb9] Bump tj-actions/verify-changed-files in /.github/workflows (#1795)
  • [3738f7] CVE fix - Upgrade golang.org/x/net (#2081)
  • [df0565] Change Kuberay operator Deployment strategy type to Recreate (#566)
  • [085dbb] Change the rules in role.yaml and multiple_namespaces_role.yaml to use the same template in _helpers.tpl to ensure consistency. (#2244)
  • [7c6aed] Changes required make a build after update of component-base (#3004)
  • [bf3fd6] Check existing pods for suspended RayCluster before calling DeleteCollection (#1745)
  • [db42cc] Chore: fix indentation issues in RayJob sample YAML (#3874)
  • [046d4c] Clean up WorkersToDelete field during the CI test (#1763)
  • [8282e6] Configuration Test Framework Prototype (#605)
  • [f52b8b] Connect Ray client with TLS using Nginx Ingress on Kind cluster (#1051)
  • [86506d] Convert byte slice and string without copy (#2628)
  • [1c6c4a] Correct sumGPUs to include MIGs in count (#3933)
  • [cce10a] Cross-reference docs. (#703)
  • [4b5085] Customize the Prometheus export port (#954)
  • [561a09] Delete [raycluster|rayjob|rayservice]_types_test.go unnecessary tests (#2935)
  • [3a7a17] Delete ray_v1alpha1_rayjob.batch-inference.yaml (#1360)
  • [7bc9c9] Dependencies: Upgrade golang.org/x packages (#1281)
  • [983137] Deprecate Kuberay CLI for Ray Kubectl plugin (#2246)
  • [197fcc] Do not update pod labels if they haven't changed (#1304)
  • [1cbac5] Documentation and example for running simple NLP service on kuberay (#1340)
  • [2ac9c4] Don't print redundant time unit in the log message (#2335)
  • [ee0a89] Don’t assign the rayv1.Failed to the State field (#2258)
  • [cf1c6f] Downgrade kind from v0.20.0 to v0.11.1 (#1313)
  • [33ba38] Drop unused configmaps/status permission + configurable binary path (#2478)
  • [d1d9e2] Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876)
  • [efed87] Enhancements to e2e test, adding Autoscaling (#1765)
  • [dbcc68] Ensure all temp files are deleted after the compatibility test (#886)
  • [649074] Ensure container ports without names are also included in the head node service (#891)
  • [e93ebc] Example Pod to connect Ray client to a remote Ray cluster with TLS enabled (#994)
  • [2d5200] Example RayCluster spec with Labels and label_selector API (#4136)
  • [28d07c] Expose entire head pod Service to the user (#1040)
  • [c610f7] Expose security context in helm chart. (#773)
  • [e430a9] Exposing Serve Service (#1117)
  • [dc17fb] Exposing min/max replica counts for default worker group (#1963)
  • [ba50bf] Fall back to CPU requests if limit is not specified (#2365)
  • [f6b4f1] Feature/cron scheduling rayjob 2426 (#3836)
  • [35fe6f] Fix CI (#1145)
  • [271b25] Fix FromAsCasing warning. (#2830)
  • [f5fb7d] Fix Log to indicate we are Using DashboardPort in RayService (#2001)
  • [1ced2b] Fix RayCluster auth sample to include --config-file in kube-rbac-proxy (#2604)
  • [bc0562] Fix apiserver linter (#3296)
  • [348ef3] Fix broken link in documentation (#3697)
  • [7e21b5] Fix duplicated volume issue (#690)
  • [389ba0] Fix finalizer typo and re-create manifests (#631)
  • [c9665d] Fix for Sample YAML Config Test - 2.4.0 Failure due to 'suspend' Field (#1096)
  • [418247] Fix for deprecate-cli deploy error (#2251)
  • [3571d5] Fix in HeadPod Service Generation logic which was causing frequent reconciliation (#1056)
  • [0d848f] Fix incorrect comment in raycluster_controller.go (#3003)
  • [5769a6] Fix issue where unescaped semicolons caused task execution failures. (#3691)
  • [19ddf0] Fix issue with head pod not monitored by Prometheus under certain conditions (#963)
  • [3c53af] Fix issue with operator OOM restart (#946)
  • [7dcdb2] Fix light weight job submitter e2e flaky test (#4092)
  • [5bf70e] Fix logging issue for FetchHeadServiceURL (#2216)
  • [648d84] Fix misconfiguration. (#602)
  • [926515] Fix mkDocs (#1448)
  • [ed4b75] Fix ray nightly image env var setup (#3826)
  • [2b8947] Fix release actions (#1323)
  • [422098] Fix typo (#1232)
  • [8b2acf] Fix typo (#1241)
  • [940449] Fix typo in DEVELOPMENT.md (#1698)
  • [ed7f3d] Fix upgrade gomega (#3483)
  • [047699] Fix v6e TPU Scripts and RayJob CRs (#2447)
  • [242c7b] Fix versioning in sample manifests (#1857)
  • [8d25e9] Fix/make helm and kustomize consistent (#2624)
  • [321f98] Fix: Helm lint and test CI failed (#3505)
  • [cc6b7b] Fix: Typo (#1295)
  • [86f896] Fixed download URL for Helm chart (#573)
  • [4432b7] Fixed processing of job submitter (#1562)
  • [1ec290] Fixed the issue with jobSubmitter resources (#1676)
  • [2c26cf] Fixes to shorten generated Route name with consideration for namespace (#1883)
  • [91361e] Fixing Python client handling of env from (#1845)
  • [5c9db5] Flip Min and max replicas for apiserver workerNodeSpec (#1638)
  • [477282] Follow up 3992: Remove logs and add comments (#4006)
  • [e7f0c2] Generate RayCluster Hash on KubeRay Version Change (#2320)
  • [cb86f9] Get details of only declarative serve apps (#4084)
  • [de675e] Handle nil HostPath type in GetVolumeHostPathType and add unit tests. (#3965)
  • [2d38f5] Helm chart ray-cluster template reference fix (#1469)
  • [607ac1] Helm: add service type configuration to head group for ray-cluster (#614)
  • [bdbf37] Improve Grafana Dashboard (#3734)
  • [1634d7] Improve flexibility in RayCluster yaml test (#1812)
  • [eb66a2] Improve log message wording when service already exists during reconciliation (#4096)
  • [753429] Improve the observability of the init container (#1149)
  • [419987] Include KUBERAY_VERSION in the user-agent (#2042)
  • [9c55fc] Increase head node memory limit for RayService sample to avoid OOM (#4089)
  • [656602] Increase rayJob e2e timeout (#4124)
  • [8fb4ee] Increased time precision using uint (#1675)
  • [6823da] Init dashboardClientFunc and httpProxyClientFunc by the config arg (#2092)
  • [87dde2] Inject cluster name as an environment variable into head and worker pods (#934)
  • [928d69] Integrate with rayci (#3215)
  • [738801] Integration: KAI Scheduler (#3886)
  • [ca9348] Kuberay 0.5.0 docs validation update docs for GCS FT (#1004)
  • [c7edea] Make KubeRay Operator Image FIPS compliant (#1633)
  • [9662bd] Make k8s job backoff limit configurable for RayJob (#2091)
  • [8ec59e] Make sure kubectl ray logs only get ray container logs (#2649)
  • [1d98fe] MobileNet example (#1175)
  • [2bb04c] Move BatchSchedulerManager into reconciler option (#3935)
  • [5fde3c] Move matching labels to association.go (#2734)
  • [249610] Numerous fixes to the API server to make RayJob APIs working (#1447)
  • [974bed] One word typo fix in docs and README (#1068)
  • [a1ef76] Only build/push Multi Arch images when merging to master (#1764)
  • [510827] Only try once in HTTP health check commands (#3469)
  • [e7fbf7] Operator support for openShift (#1371)
  • [fe2940] Parametrize ray operator makefile to support other container engines (#1121)
  • [2c97ac] Pin operator version in single namespace installation (#1210)
  • [d0b633] Pin to working config + stable release (#3885)
  • [44fc97] Post release 1.0.0 (#1651)
  • [fc1e2d] Post release 1.1.0 (#2040)
  • [14f96f] Properly set env field based on containerEnv values (#2175)
  • [56b4d1] Publish Multi Arch images (#1716)
  • [346ddd] Ray serve gke gateway ingress (#1978)
  • [b8f6d0] RayCluster Headless Worker Service Should PublishNotReadyAddresses (#2375)
  • [0a3c18] RayCluster Helm: Make volumeMounts and volumes optional for workers (#1689)
  • [15daa5] RayCluster updates status frequently (#1211)
  • [8e3296] RayClusterProvisioned status should be set while cluster is being provisioned for the first time (#2304)
  • [362da3] RayJob Volcano Integration (#3972)
  • [acafbf] RayJob: don't delete submitter job when ShutdownAfterJobFinishes=true (#1881)
  • [0216b3] RayJob: inject RAY_DASHBOARD_ADDRESS environment variable for user-provided submitter templates (#1852)
  • [621e9c] RayService event can't set redis password in both GCSFaultTolerance and rayStartParam (#3153)
  • [795db0] RayService object's Status is being updated due to frequent reconciliation (#1065)
  • [aeb8b0] RayService: Omits Min and Max replicas from hash calculation (#2172)
  • [6c4a77] Rayjob event can't set redis password in both GCSFaultTolerance and rayStartParam (#3093)
  • [26372c] Read cluster domain from env (#951)
  • [62ad93] Refactor Apiserver e2e run in cluster (#3529)
  • [f0ff2c] Refactor UpgradeStrategy to UpgradeSpec.Type (#2678)
  • [79c6c2] Refactor configuration test framework to follow Pylint conventions (#671)
  • [ec642e] Refactor multiple cases in single test function with array (#2857)
  • [160ab1] Refactor to Ensure Consistent Use of CRDType (#1892)
  • [a7197c] Refactor validateRayServiceSpec (#2711)
  • [5f158a] Release v0.5.0 doc validation (#997)
  • [31c1e6] Release v0.5.0 doc validation part 2 (#999)
  • [9dd516] Release v0.5.0 python client library validation (#1006)
  • [e4e872] Release v0.6.0 doc validation (#1271)
  • [f256dd] Remove GOARCH in ray-operator/Dockerfile to support multi-arch images (#1442)
  • [728e1c] Remove ray-pod.tls.yaml (#3762)
  • [22cc61] Remove default option for batch scheduler name (#2371)
  • [c3b17f] Remove extraneous arguments from examples (#2051)
  • [e2e420] Remove generate target from build/test targets (#1874)
  • [910943] Remove helm-chart-releaser (#721)
  • [16fd58] Remove ingress.enabled from KubeRay operator chart (#812)
  • [cc2e14] Remove kustomize from helm, as it is not required (#1370)
  • [2ae757] Remove miniReplicas in raycluster-cluster.yaml (#1473)
  • [fffe77] Remove preStop hooks from Ray CR Samples (#2724)
  • [fb7a48] Remove redundant log line that is failing golangci-lint (#2366)
  • [1dbd94] Remove unnecessary raycluster log in kai-scheduler logger (#3997)
  • [21a361] Remove unused fields from KubeRay operator and RayCluster charts (#839)
  • [5b0b9a] Remove unused icon from dashboard (#3599)
  • [ffda62] Remove vLLM examples in favor of Ray Serve LLM (#3786)
  • [8be0a2] Removed use of BUILD_FLAGS in apiserver makefile (#1336)
  • [082389] Reorganize python client library (#984)
  • [e4cf15] Replace kubectl wait command with RayClusterAddCREvent (#705)
  • [4b6f1d] Reuse contexts across ray operator controllers (#1126)
  • [c30fae] Revert "Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)" (#4031)
  • [e77b09] Revert "Disable async serve handler in Ray Service cluster (#447)" (#606)
  • [8b4782] Revert "Feature/cron scheduling rayjob 2426 (#3836)" (#3911)
  • [ba1a00] Revert "Fix issue where unescaped semicolons caused task execution failures. (#3691)" (#3771)
  • [a16f91] Revert "[BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)" (#2529)
  • [064e0e] Revert "[Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)" (#3573)
  • [493eb8] Revert "[CI] Skip redis raycluster test (#1465)" (#1490)
  • [537374] Revert "[CRD] Delete CRD v1alpha1 (#1771)" (#1784)
  • [d8ffec] Revert "[release] Update Ray image to 2.34.0 (#2303)" (#2413)
  • [347934] Revert "kubectl ray job submit: provide empty entrypoint (#3127)" (#3165)
  • [5d38ed] Revise sample configs, increase memory requests, update Ray versions (#761)
  • [b2a701] Rewrite detached actor test with go (#2722)
  • [4c2c04] Set imagePullPolicy in manager.yaml (#1710)
  • [6359d3] Show cluster name in kubectl get rayjob (#2065)
  • [d0683a] Single go.mod file (#3640)
  • [7f77e4] Standardize imports of github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1 (#1112)
  • [979b90] Support --address flag for kubectl ray job submit (#3922)
  • [72a63a] Support Apache YuniKorn as one batch scheduler option (#2184)
  • [869409] Support disable leader election for manager go binary via Values.yaml to mitigate kuberay restarts (#2262)
  • [f27e4a] Support for Image pull policy (#2101)
  • [df5577] Support gang scheduling with Apache YuniKorn (#2396)
  • [79f757] Support json structured logging (#1912)
  • [86abaa] Support suspension of RayClusters (#1711)
  • [55b99e] Support to set QPS and burst by configuration. (#3969)
  • [413b8a] Support uppercase default resource names for top-level Resources (#4137)
  • [dbd6b7] TPU Multi-Host Support (#1913)
  • [3a2be0] Update APIServer docs for release v0.4.0 (#778)
  • [ff8929] Update Autoscaler YAML for the Autoscaler tutorial (#1400)
  • [91921f] Update CHANGELOG for v1.0.0 (#1650)
  • [0becdd] Update Dockerfile to address closed CVEs (#1488)
  • [2057d7] Update Dockerfiles to address CVE-2023-44487 (HTTP/2 Rapid Reset) (#1540)
  • [25eb75] Update GCS fault tolerance YAML (#1404)
  • [87c554] Update KubeRay release documentation (#3226)
  • [0c16aa] Update KubeRay versions. (#821)
  • [7f986b] Update Kuberay doc to version 1.0.0 rc.0 (#1441)
  • [ba5f7e] Update RayCluster values.yaml (#3950)
  • [fbdf31] Update RayServices section title (#3906)
  • [6d0c63] Update TPU Ray CR manifests to use Ray 2.41.0 (#2965)
  • [714aea] Update V6e TPU Ray Samples (#2448)
  • [729c1b] Update Volcano integration doc (#1380)
  • [02135a] Update apiserver chart location in readme (#896)
  • [b438b5] Update bug-report.yml (#1906)
  • [af8fb0] Update contribution doc to show users how to reach out via slack (#936)
  • [aae9fa] Update doc and base image for Go 1.19 (#1330)
  • [06a056] Update feature-request.yml (#1907)
  • [328325] Update gcs-ft.md (#777)
  • [247b7c] Update grafana dashboards to ray 2.49.2 + add README instructions on how to do the update (#4111)
  • [bab00b] Update kind version (#1957)
  • [bde5e9] Update kuberay mcad integration doc (#1373)
  • [38e352] Update latest release to v1.0.0-rc.0 in tests (#1467)
  • [15ce56] Update operator development instruction (#1458)
  • [be1037] Update overwrite-container-cmd example (#1722)
  • [7a185a] Update ray operator Dockerfile (#1213)
  • [944a04] Update ray-operator documentation and image version in ray-cluster.heterogeneous.yaml (#585)
  • [b12a72] Update samples to use Ray 2.41.0 images (#2964)
  • [2e35bf] Update securityContext values.yaml for kuberay-operator to safe defaults. (#1896)
  • [e4a964] Update swagger-initializer.js (#2543)
  • [12c0a9] Update test config (#654)
  • [e6b292] Update update-ray-job.kueue-toy-sample.yaml (#3782)
  • [b9f020] Update v6e-256 KubeRay Sample (#2466)
  • [6b12c1] Updated API server documentation (#1435)
  • [71984f] Updated default timeout seconds for probes (#2265)
  • [f14414] Updates to the apiserver swagger-ui (#1410)
  • [279349] Updating logrus and net packages in go.mod (#1495)
  • [e3bdc8] Upgrade Kubernetes dependencies to v0.28.3 and Golang to 1.20 (#1648)
  • [f2d94f] Upgrade dependencies to address CVEs (#1865)
  • [b73daa] Upgrade golang linter for precommit hook (#3319)
  • [1213d1] Upgrade manifests kustomize v5 (#2352)
  • [31d8a8] Upgrade to Go 1.19 (#1325)
  • [ce960e] Upgrade to address High CVEs (#1731)
  • [9be8ab] Use Go 1.24.0 in go module (#3835)
  • [a4893a] Use ImplementationSpecific in ray-cluster.separate-ingress.yaml (#3781)
  • [838bc1] Use a default user agent 'kuberay-operator' instead of the default user-agent from controller-runtime (#1982)
  • [0fa7d3] Use ctrl log and create logger in function in kai-scheduler (#3995)
  • [484530] Use ctrl logger in Volcano scheduler to include context (#4023)
  • [747708] Use helm-docs to generate README for chart kuberay-operator automatically (#3331)
  • [c1dbdf] Use standard golang image as build image and distroless image as base image for kuberay operator. (#1967)
  • [2e173a] Use webhook.CustomValidator instead of deprecated webhook.Validator. (#2803)
  • [6cbb5d] Use longer exec probe timeouts for Head pods (#2353)
  • [f0abc1] [0.4.0 Release] Minor doc improvements (#780)
  • [6f5047] [0.4.0 release] Update changelog for KubeRay 0.4.0 (#836)
  • [c45fcf] [1/N] [Lint] Group imports by sections (#3428)
  • [732a67] [1/N][apiserver] Fix half of linter issues for apiserver (#3328)
  • [b16de0] [2.5.0 Release] Change version numbers 2.4.0 -> 2.5.0 (#1151)
  • [a068e7] [2/N] [Lint] Group imports by sections (#3429)
  • [894470] [2/N] [apiserver] Fix second-half apiserver lint (#3338)
  • [05c5e6] [3/N] [Lint] Group imports by sections (#3430)
  • [1eac37] [API Server] Add Ray Job output - start/end time and ray cluster name (#2533)
  • [f3353b] [API Server] Add security context to Ray Cluster (#2538)
  • [773a47] [API Server] Add v2 related helm (#3677)
  • [846416] [API Server] consolidate e2e test (#3674)
  • [a8ec75] [APIServer][Docs] Identify API server as community-managed and optional (#753)
  • [5c0e2e] [APIserver] [Ray Job] Added Job submission support to the API server (#1639)
  • [796bf0] [Apiserver] Determine the minimum resource requirements for KubeRay API server e2e tests (#3526)
  • [2ba0dd] [Apiserver] Set the right amount of resource in e2e test (#3465)
  • [a361dc] [Apiserver] Use Eventually from Gomega instead of wait from apimachinery (#3433)
  • [af6a00] [Apiserver][Refactor] Use polling in autoscaler e2e test (#3402)
  • [5f5197] [Autoscaler V2] Polish Autoscaler V2 YAML (#2064)
  • [d125ab] [Autoscaler] Improve TestRayClusterAutoscalerAddNewWorkerGroup (#3682)
  • [9e14ba] [Autoscaler] Print the value of WorkerGroupSpec.Replicas (#3005)
  • [c15949] [Autoscaler][Sample] Add comment for AUTOSCALER_UPDATE_INTERVAL_S (#3294)
  • [759ab3] [Autoscaler][Sample] Add comment for RAY_LOGGER_LEVEL (#4104)
  • [3f69f0] [Autoscaler][Test] Fix flaky idleTimeoutSeconds test (#2862)
  • [9c5579] [BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)
  • [9c28b7] [Benchmark] KubeRay memory / scalability benchmark (#1324)
  • [8ad2c1] [Bug] Add default value for entrypoint flags in job_submit.go (#3808)
  • [20636f] [Bug] All worker Pods are deleted if using KubeRay v1.0.0 CRD with KubeRay operator v1.1.0 image (#2087)
  • [7ad3ac] [Bug] Allow zero replica for workers for Helm (#968)
  • [258646] [Bug] Autoscaler doesn't support TLS (#1119)
  • [cceb7a] [Bug] Avoid assigning an entry to a map that is nil (#1715)
  • [ec4018] [Bug] Change image repository for make deploy (#2059)
  • [f56c66] [Bug] Clean up WorkersToDelete after the scaling process finishes (#1747)
  • [39562b] [Bug] Enable ResourceQuota by adding Resources for the health-check init container (#1043)
  • [3f7b34] [Bug] Fail to create ingress due to the deprecation of the ingress.class annotation (#646)
  • [7fd392] [Bug] Fix RayCluster with an overridden app.kubernetes.io/name (#2147) (#2166)
  • [af0c7a] [Bug] Fix flakiness of RayService e2e tests (#1385)
  • [b0096b] [Bug] Fix flaky sample YAML tests (#1590)
  • [f3ec71] [Bug] Fix flaky test: should be able to update all Pods to Running (#893)
  • [c42013] [Bug] Fix null map handling in BuildServiceForHeadPod function (#1095)
  • [f1e961] [Bug] Fix rebase error (#1897)
  • [c683ad] [Bug] Fix the filename of text summarizer YAML (#1415)
  • [cf41e2] [Bug] Issue with glibc version GLIBC_2.34 and GLIBC_2.32 not found in earlier operator tags (#2272)
  • [e4d483] [Bug] KubeRay does not work on M1 macs. (#869)
  • [791ea3] [Bug] KubeRay operator failed to watch endpoint (#2080)
  • [c22fbf] [Bug] KubeRay operator fails to get serve deployment status due to 500 Internal Server Error (#1173)
  • [7aea94] [Bug] KubeRay tries to create ClusterRoleBinding when singleNamespaceInstall and rbacEnable are set to true (#1190)
  • [a0e59b] [Bug] Long image pull time will trigger blue-green upgrade after the head is ready (#1231)
  • [e2a6ae] [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_detached_actor flaky (#619)
  • [1ab5a0] [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_ray_serve flaky (#650)
  • [d46b43] [Bug] Modification of nameOverride will cause label selector mismatch for head node (#572)
  • [cbc9b0] [Bug] Pod reconciliation fails if worker pod name is supplied (#587)
  • [47b4e8] [Bug] Ray operator crashes when specifying RayCluster with resources.limits but no resources.requests (#2077)
  • [52af13] [Bug] RayService restarts repeatedly with Autoscaler (#1037)
  • [ac56e3] [Bug] RayService with GCS FT HA issue (#1551)
  • [2bd5c9] [Bug] Re-enable flaky kubectl plugin e2e test "should reconnect after pod connection is lost" (#3116)
  • [79c7c8] [Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go (#3124)
  • [a87f9a] [Bug] Reconciler error when changing the value of nameOverride in values.yaml of helm installation for Ray Cluster (#1966)
  • [0cabd1] [Bug] Service (Serve) changing port from 8000 to 9000 doesn't work (#1081)
  • [60de97] [Bug] Shallow copy causes different worker configurations (#714)
  • [01b488] [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted (#4141) (#4156)
  • [5dab94] [Bug] Submitter K8s Job fails even though the RayJob has a JobDeploymentStatus Complete and a JobStatus SUCCEEDED (#1919)
  • [d05964] [Bug] TestRayServiceInPlaceUpdate is flaky (#2620)
  • [457d67] [Bug] Update wait function in test_detached_actor (#635)
  • [82c925] [Bug] autoscaler not working properly in rayjob (#1064)
  • [3581b9] [Bug] client_golang used by KubeRay has a vulnerability (#728)
  • [2b136c] [Bug] compatibility test for the nightly Ray image fails (#1055)
  • [118673] [Bug] error: git cmd when following docs (#831)
  • [ddb5e5] [Bug] fix RayActorOptionSpec.items.spec.serveConfig.deployments.rayActorOptions.memory int32 data type (#1220)
  • [c88002] [Bug] kubectl plugin e2e test is flaky (#3147)
  • [17264a] [Bug] label rayNodeType is useless (#698)
  • [067295] [Bug] rayStartParams is required at this moment. (#1031)
  • [bc6be0] [Bug][Autoscaler] Operator does not remove workers (#1139)
  • [0d813b] [Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)
  • [deec37] [Bug][Doc] Increase default operator resource requirements, improve docs (#727)
  • [ca929e] [Bug][Doc] fix the link error of operator document (#1046)
  • [d632ac] [Bug][GCS FT] Clean up the Redis key before the head Pod is deleted (#1989)
  • [2019b4] [Bug][GCS FT] Worker pods crash unexpectedly when gcs_server on head pod is killed (#1036)
  • [0e959c] [Bug][RayCluster] Fix RAY_REDIS_ADDRESS parsing with redis scheme and multiple addresses (#1556)
  • [664b19] [Bug][RayJob] Avoid nil pointer dereference (#1756)
  • [5da4a0] [Bug][RayJob] Check dashboard readiness before creating job pod (#1381) (#1429)
  • [5a974f] [Bug][RayJob] Fix FailedToGetJobStatus by allowing transition to Running (#1583)
  • [f10673] [Bug][RayJob] RayJob with custom head service name (#1332)
  • [9b26ba] [Bug][RayService] KubeRay does not recreate Serve applications if a head Pod without GCS FT recovers from a failure. (#1420)
  • [c9802e] [Bug][apiserver] fix apiserver create rayservice missing serve port (#734)
  • [72ca16] [Bug][breaking change] Unauthorized 401 error on fetching Ray Custom Resources from K8s API server (#1128)
  • [f6a172] [Bug][k8s compatibility] k8s v1.20.7 ClusterIP svc do not updated under RayService (#1110)
  • [387535] [Bug][kubectl-plugin] Wrong behavior for InteractiveMode RayJob with BackoffLimit set (#3555)
  • [99505a] [Build][kubectl-plugin] Add release script for kubectl plugin (#2407)
  • [4b7575] [CI] Add kind-in-Docker test to Buildkite CI (#1243)
  • [1d1b8c] [CI] Add apiserver e2e test to buildkite (#3351)
  • [ba6a7a] [CI] Add shellcheck and fix error of it (#2933)
  • [4ca05a] [CI] Add workflow to manually trigger release image push (#801)
  • [268a77] [CI] Auto download golang tools in pre-commit (#2917)
  • [bd7feb] [CI] Bump Go version to 1.23 to support E2E Operator Version Upgrade tests (#3406)
  • [0e5338] [CI] Change Pre-commit-shellcheck-to-shellcheck-py (#2974)
  • [4db24e] [CI] Composable kube resource logger when test failed (#3070)
  • [7542c5] [CI] Create release tag for ray-operator Go module (#1574)
  • [e595ee] [CI] Deflaky TestRayServiceGCSFaultTolerance (#2660)
  • [03f1a2] [CI] Don't need to publish the security proxy image (#1885)
  • [b56a97] [CI] Don't push new images to DockerHub (#1923)
  • [05e927] [CI] Downgrade runner image from ubuntu-latest to ubuntu-22.04 (#2714)
  • [00abf6] [CI] Enable testifylint empty rule (#2908)
  • [1830a6] [CI] Enable testifylint error-nil rule (#2907)
  • [3e9788] [CI] Enable testifylint expected-actual rule (#2914)
  • [67ed6c] [CI] Enable testifylint float-compare rule (#2910)
  • [17d606] [CI] Enable testifylint require-error rule (#2909)
  • [bc2bd7] [CI] Enable testifylint bool-compare rule (#2911)
  • [2ac2a9] [CI] Enable testifylint formatter rule (#2915)
  • [cdee6f] [CI] Enable testifylint len rule (#2945)
  • [4d2795] [CI] Enable testifylint rule (#2896)
  • [67596d] [CI] Fix MultiArch image push (#3575)
  • [a1e8c5] [CI] Fix RayService CI (#2525)
  • [2a9e64] [CI] Fix apiserver test in image-release process (#1880)
  • [02909a] [CI] Fix autoscaler e2e test flakiness caused by timeout (#3668)
  • [894f31] [CI] Fix image release pipeline (#1878)
  • [0ac994] [CI] Fix lint error (require-error) (#2931)
  • [535a40] [CI] Fix variable initializations used in test case declarations (#1775)
  • [abd3f8] [CI] Fix: /etc/docker/daemon.json: No such file or directory (#3565)
  • [f3ed17] [CI] Generate CRD json schema separately in pre-commit (#2930)
  • [08b990] [CI] Install kuberay operator in buildkite test (#1308)
  • [4bd2da] [CI] Jail flaky test: TestRayServiceInPlaceUpdate (#2638)
  • [fa6772] [CI] Make release.yaml only be triggered manually (#2798)
  • [353e87] [CI] Move e2e tests to buildkite (#2639)
  • [cce897] [CI] Only run test_ray_serve for Ray 2.6.0 and later (#1288)
  • [c7a689] [CI] Pin crd-ref-docs to v0.0.10 (#1988)
  • [39a848] [CI] Pin go version in CRD consistency check (#794)
  • [3db8d2] [CI] Pin kustomize to v5.3.0 (#2067)
  • [4f8505] [CI] Publish KubeRay operator / apiserver images to Quay (#1307)
  • [dec813] [CI] Reenable rayjob sample yaml latest test (#1464)
  • [2c5a6d] [CI] Refactor pipeline and test RayCluster sample yamls (#1321)
  • [77d0bb] [CI] Remove RayService tests from compatibility-test.py (#1395)
  • [56cdfb] [CI] Remove compatibility-test.py and modified CI (#2882)
  • [0e9d17] [CI] Remove create tag step from release (#3249)
  • [629bc8] [CI] Remove extraPortMappings from kind configurations (#1366)
  • [2e2350] [CI] Remove test_security.py and all python test dependencies in CI (#3123)
  • [1fdf04] [CI] Remove unnecessary kind load $RAY_IMAGE for e2e sample YAML tests (#1863)
  • [085c29] [CI] Remove unnecessary release.yaml workflow (#1168)
  • [a1cf47] [CI] Remove unnecessary sample YAML symbolic links (#2118)
  • [0b6152] [CI] Replace lint CI with pre-commit (#2129)
  • [df7cfe] [CI] Run sample job YAML tests in buildkite (#1315)
  • [4bb122] [CI] Skip kubectl plugin flaky e2e tests (#2800)
  • [84c35a] [CI] Skip redis raycluster test (#1465)
  • [21058d] [CI] Skip the flaky compatibility test test_detached_actor until https://github.com/ray-project/ray/issues/41343 (#1694)
  • [75a63a] [CI] Split Autoscaler e2e tests into 2 buildkite runners (#3715)
  • [028828] [CI] Stop publishing images to DockerHub (#1926)
  • [da763f] [CI] Stop to publish new images to DockerHub (#1702)
  • [83f309] [CI] Unjail TestRayServiceInPlaceUpdate (#2650)
  • [4fbdb9] [CI] Update latest ray version 2.5.0 -> 2.6.3 (#1320)
  • [0561ba] [CI] Upload logs as artifacts to BuildKite (#3405)
  • [ef7cf5] [CI] Use golang:1.24-bookworm (Debian 12) in CI for Python-3.11 support (#3949)
  • [e801dc] [CI] Use quay as the default image registry (#1939)
  • [9e37e1] [CI] Verify kubectl in kind-in-docker step (#1305)
  • [a9aa9a] [CI] apply resource logger to ray cluster test (#3075)
  • [945698] [CI] apply resource logger to ray service test (#3081)
  • [23c9e5] [CI] dump failed test k8s resources (#3025)
  • [3114a0] [CI] fix locust versions (#3100)
  • [60bc89] [CI] fix missing Go module release step (#3644)
  • [c76402] [CI] split rayservice e2e test into another runner and decrease timeout to 30m (#2667)
  • [e9073f] [CI] stream operator logs from kind in go e2e tests (#1793)
  • [03969c] [CI]: Kuberay operator e2e tests (#1575)
  • [f123a4] [CI]: change kubectl plugin e2e test to buildkite (#2861)
  • [bd5376] [CI][#2905] Improvement: enable testifylint compares rule (#2977)
  • [f82e7e] [CI][Buildkite] An example test for Buildkite (#919)
  • [1a8895] [CI][Buildkite] Fix the PATH issue (#952)
  • [0e1c24] [CI][GitHub-Actions] Upgrade actions/upload-artifact to v4 (#2373)
  • [cbde87] [CI][HELM] Use chart-testing to install Helm charts (#3412)
  • [e96ded] [CI][Hotfix] Increase the timeout of Test E2E from 30m to 1h (#2664)
  • [1d4a40] [CI][RayService] deflaky the TestAutoscalingRayService (#3119)
  • [d723f5] [CRD] Delete CRD v1alpha1 (#1771)
  • [77e299] [CRD] Inject CRD version to the Autoscaler sidecar container (#1496)
  • [96c4d6] [CRD] Set maxDescLen to 0 (#1449)
  • [7b00ac] [CRD] Sync v1alpha1 CRD with v1 CRD (#1788)
  • [b7bc7a] [CRD][1/n] Create v1 CRDs (#1481)
  • [1184bc] [CRD][2/n] Update from CRD v1alpha1 to v1 (#1482)
  • [7336ea] [Chore] Add RayJob InteractiveMode sample yaml (#3062)
  • [491fbd] [Chore] Add golangci-lint rules (#2128)
  • [d901fd] [Chore] Add kubectl plugin and dashboard to components in issue template (#3678)
  • [49a572] [Chore] Add pre-commit hooks (#2127)
  • [ca98d1] [Chore] Create example Modin RayJob (#2221)
  • [e0318a] [Chore] Delete redundant pod existence checking (#2113)
  • [41c9e9] [Chore] Fix golangci-lint rule: gosec (#2163)
  • [fb5842] [Chore] Fix lint errors caused by casting int to int32 (#2368)
  • [445b94] [Chore] Improve the appearance of compute resources status in the output of kubectl describe (#1802)
  • [2b31c3] [Chore] Make error as a local variable (#2841)
  • [80ab11] [Chore] Modify pre-commit yaml to allow golangci-lint version with prefix "v" (#2824)
  • [b16fb3] [Chore] Remove CHANGELOG.md (#3819)
  • [3471f9] [Chore] Remove duplicate make command (#4145)
  • [e02751] [Chore] Run operator outside the cluster (#2090)
  • [5d3d9d] [Chore] Turn off golangci-lint rules except ray-operator (#2138)
  • [7a4353] [Chore] Turn off no-commit-to-branch rule (#2139)
  • [7cc354] [Chore] Upgrade Ray to 2.46.0 follow-up (#3722)
  • [35e913] [Chore] Use Ray 2.9.0 for Apache YuniKorn example (#2427)
  • [949875] [Chore] Use new golangci-lint rules only for ray-operator (#2152)
  • [6eeca3] [Chore] Use safe YAML for helm-chart-verify-rbac (#2230)
  • [589414] [Chore] make err as local variable in if-statement (#2718)
  • [d2ae62] [Chore] make ingressClassName as a local variable (#2815)
  • [dd46cb] [Chore] remove redundant var declaration (#2811)
  • [635003] [Chore] remove unnecessary line break in log (#2709)
  • [5db301] [Chore] specify the capacity on calling make (#2719)
  • [20ed56] [Chore] update comment for headGroupSpec and entrypoint (#2802)
  • [d97e37] [Chore][CI] Limit the release-image-build github workflow to only take tag as input (#3117)
  • [9b0eda] [Chore][CI] Remove StreamKubeRayOperatorLogs (#2637)
  • [0c09b0] [Chore][CI] Upgrade ray version to 2.40 except for TestRayServiceInPlaceUpdate (#2629)
  • [7b8197] [Chore][Comment] Fix wrong comment (#2294)
  • [54ba28] [Chore][Linter] Upgrade golangci-lint to 1.60.3 (#2362)
  • [784b7f] [Chore][Log] Delete error loggings right before returned errors (#2103)
  • [b08a5a] [Chore][Minor] Add .gitignore to kubectl-plugin (#2383)
  • [ca7db1] [Chore][RayJob] Remove the TODO of verifying the schema of RayJobInfo because it is already correct (#1911)
  • [351485] [Chore][Sample-yaml] Upgrade pytorch-lightning to 1.8.5 for ray-job.pytorch-distributed-training.yaml (#3796)
  • [296d48] [Chore][Samples] Rename ray-cluster.mini.yaml and add workerGroupSpecs (#2100)
  • [708d75] [Chore][YuniKorn] Add sample yaml file for Apache YuniKorn (#2412)
  • [135f12] [Chore][kubectl-plugin] Fix wrong homepage link in krew template file (#2461)
  • [ab1736] [Chore][precommit] Replace grep with awk in pre-commit hooks for BSD compatibility (#2541)
  • [ea0b9c] [Community] Add KubeRay community guide (#3859)
  • [38a07e] [Community][2/N] Governance model (#3977)
  • [30c5d7] [Compatibility] Update Redis image for compatibility tests (#2852)
  • [c88b17] [DOCS] Apiserver improve docs readability (#3564)
  • [d1b07d] [DOCS] KubeRay APIServer V2 document (#3594)
  • [4ac20b] [DOCS] document step to do before running e2e test (#3385)
  • [aeab36] [Dashboard-client] Add proper error checking in dashboard client (#3953)
  • [39d7e7] [Dashboard-client] replace http method from string to constant (#3961)
  • [b87480] [Doc] Add helm update command to chart validation step in release process (#1165)
  • [656584] [Doc] Add a YAML to explain why some worker pod are not ready in RayService (#3139)
  • [f5e0ef] [Doc] Add blogs and talks to readme (#1691)
  • [1359dd] [Doc] Add git fetch --tags command to release instructions (#1164)
  • [41018b] [Doc] Add gke bucket yaml (#1372)
  • [44ff72] [Doc] Cannot build kuberay with Go 1.16 (#575)
  • [e52dd3] [Doc] Copyedit dev guide (#1012)
  • [ffac2c] [Doc] Delete unused docs (#1440)
  • [83fea9] [Doc] Deprecate ServiceUnhealthySecondThreshold and DeploymentUnhealthySecondThreshold (#1688)
  • [e9a269] [Doc] Develop Ray Serve Python script on KubeRay (#1250)
  • [9c53a7] [Doc] Fix Doc Typos (#2060)
  • [739134] [Doc] Fix Yaml Typos (#2049)
  • [856a33] [Doc] Fix release doc format (#1578)
  • [b26f10] [Doc] Fix the order of comments in sample Job YAML file (#1242)
  • [1ee5f9] [Doc] GKE GPU cluster setup (#1223)
  • [04388d] [Doc] Improve DEVELOPMENT.md by adding more guidances (#1794)
  • [c16cac] [Doc] Improve FAQ page and RayService troubleshooting guide (#1225)
  • [3b8160] [Doc] Improve RayService doc (#1235)
  • [cb1248] [Doc] Reference helm chart version in helm-chart/kuberay-operator/README.md.gotmpl with go template (#3763)
  • [73eef7] [Doc] Remove KubeRay CLI references and add Python client details (#2521)
  • [3754d3] [Doc] Support CRD docs generation (#1625)
  • [cc1ff4] [Doc] Support consistency check for API reference in CI (#1655)
  • [d78d34] [Doc] Update README (#1433)
  • [be22ec] [Doc] Update README (#3695)
  • [6e1f1b] [Doc] Update nav to include missing files and reorganize nav (#1011)
  • [9425e7] [Doc] Update release docs (#1621)
  • [6c0fbb] [Doc] Update version from 0.4.0 to 0.5.0 on remaining kuberay docs files (#1018)
  • [7a1e32] [Doc] Upload a screenshot for the Serve page in Ray dashboard (#1236)
  • [adde70] [Doc] [RayJob] Add documentation for submitterPodTemplate (#1228)
  • [d55dfc] [Doc] add ray cluster uv sample yaml (#3720)
  • [f3ebea] [Doc][CI] Align K8s version in Doc and CI with minimal required version (#3628)
  • [98496f] [Doc][Fix] correct the indentation of storageClass in ray-cluster.persistent-redis.yaml (#3780)
  • [167a71] [Doc][Website] Add complete document link (#1224)
  • [fa26bb] [Doc][Website] Update KubeRay introduction and fix layout issues (#1042)
  • [843041] [Docs] Add kubectl plugin create cluster sample yaml config files (#3804)
  • [fd4ab9] [Docs] Align development guide with Makefile docker-build logic (#3248)
  • [89e980] [Docs] Correct command to load KubeRay operator image (#3387)
  • [192d1e] [Docs] Revise release note docs (#835)
  • [36f32e] [Docs] Update Security Guidance on Dashboard Ingress (#1413)
  • [053264] [Docs] add sample RayCluster using kube-rbac-proxy for dashboard access control (#2578)
  • [ebf8a5] [Docs] add sample RayCluster with FluentBit sidecar to persist Ray logs (#2602)
  • [c69314] [Docs] update development md (#3230)
  • [7fb46a] [Docs][Development] Delete linting docs (#2145)
  • [f37a4c] [Docs][kubectl-plugin] Add doc for install via Krew (#2458)
  • [dcbdbf] [Docs][kubectl-plugin] Add instructions for downloading from GitHub release (#2450)
  • [06367a] [Docs][ray-operator] Add types of tests and debug tips to development doc (#3401)
  • [0a56cd] [Enhancement] GPU RayCluster doesn't work on GKE Autopilot (#1470)
  • [eb59de] [Enhancement] Remove unused variables in constant.go (#1474)
  • [e00970] [Experimental] Fix Makefile tool check: replace -s with test -s (#3970)
  • [9e6836] [FEAT] show event message when raycluster not found in clusterSelector in rayjob (#4125)
  • [9321b2] [FIX][DOC] development markdown example (#2687)
  • [35b96f] [Feat] Add e2e test for applying ray-job.interactive-mode.yaml (#3779)
  • [b81af7] [Feat] Add sample yaml for RayJob clusterSelector config (#2505)
  • [6186a7] [Feat] Deprecate ForcedClusterUpgrade (#2075)
  • [f3430b] [Feat] Remove RayService sample YAML Python tests (#2565)
  • [227876] [Feat]: Add a field to configure whether to add a proxy actor on the head Pod to the K8s serve service or not (#2598)
  • [5d3bce] [Feat][Kubectl-Plugin] Implement kubectl session for RayJob and RayService (#2379)
  • [678635] [Feat][Kubectl-Plugin] Implement kubectl ray job submit (#2394)
  • [ea314d] [Feat][RayCluster] Introduce the RayClusterStatus.Conditions field (#2214)
  • [d2b333] [Feat][RayCluster] Make the Head service headless (#2117)
  • [ca39dc] [Feat][RayCluster] Use a new RayClusterReplicaFailure condition to reflect the result of reconcilePods (#2259)
  • [cc94c6] [Feat][RayJob] Delete RayJob CR after job termination (#2225)
  • [cf4a87] [Feat][RayJob] UserMode SubmissionMode (#2364)
  • [6079dc] [Feat][Sample-yaml] Deprecated python sample yaml test cleanup (#2507)
  • [bc61ad] [Feat][apiserver] Support CORS config (#3711)
  • [84839a] [Feat][kubectl-plugin] Add Long, Example, shell completion for kubectl ray log (#2405)
  • [4e3340] [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray get node & workergroup (#3154)
  • [f69885] [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray session (#2390)
  • [800ac1] [Feat][kubectl-plugin] Add instructions for static shell completion (#2384)
  • [bee1b7] [Feat][kubectl-plugin] Add kubectl ray version command (#2424)
  • [32d8cd] [Feat][kubectl-plugin] Create cluster with TPUs (--worker-tpu, --num-of-hosts) and TPUs' validation (#3258)
  • [090fad] [Feat][kubectl-plugin] Include LICENSE file into kubectl plugin tar (#2422)
  • [6e8b0b] [Feat][kubectl-plugin] Retry port-forward when connection lost (#2704)
  • [52e330] [Feat][kubectl-plugin] Support -v flag for kubectl ray job submit (#3524)
  • [39d42f] [Feature] Add Kubernetes manifest validation in pre-commit. (#2380)
  • [f7edc2] [Feature] Add ManagedBy field to RayCluster (#2597)
  • [e6af2c] [Feature] Add ManagedBy field to RayJob (#2589)
  • [4bce73] [Feature] Add a chart-test script to enable chart lint error reproduction on laptop (#563)
  • [99abcc] [Feature] Add a flag to make zero downtime upgrades optional (#1564)
  • [0bbdec] [Feature] Add allow CORS in apiserversdk (#4059)
  • [3fe960] [Feature] Add an e2e test for Autoscaler to scale up by manually updating (#2634)
  • [9d2566] [Feature] Add an e2e test for K8s Job submitter failures (#2688)
  • [96d1ac] [Feature] Add an example for RayService high availability (#1566)
  • [dcaf6a] [Feature] Add apiserver unit test(pkg/util/cluster.go) (#3348)
  • [d56356] [Feature] Add cleanup for terminated RayJob/RayCluster metrics (#3923)
  • [ed4442] [Feature] Add default init container in workers to wait for GCS to be ready (#973)
  • [6b3836] [Feature] Add e2e test for UpdateRayService function (#3446)
  • [668795] [Feature] Add e2e test for setting RayCluster deletion delay in RayService (#3912)
  • [0474e8] [Feature] Add e2e tests for Autoscaler V2 (#2588)
  • [c85646] [Feature] Add eslint and Prettier to ray dashboard (#3975)
  • [5990b0] [Feature] Add initializing timeout for RayService (#4143)
  • [cd9b2e] [Feature] Add python client test to action (#993)
  • [a0ee1c] [Feature] Add service account section in helm chart (#969)
  • [f45155] [Feature] Add timeout for apiserver grpc server (#3427)
  • [39e802] [Feature] Add timestamps for logs in e2e tests (#3006)
  • [7db8f6] [Feature] Add unit test for update service request validation (#3546)
  • [e11a9b] [Feature] Adding RAY_CLOUD_INSTANCE_ID as unique id for Ray node (#1759)
  • [de8bc2] [Feature] Allow RayCluster Helm chart to specify different images for different worker groups (#1352)
  • [002e37] [Feature] Allow custom labels&annotations for kuberay operator (#1276)
  • [c13498] [Feature] Auto detect MIG GPUs and pass them into Ray’s logical resources. (#3567)
  • [34e394] [Feature] Consistency check for RBAC (#577)
  • [633ff6] [Feature] Define a general-purpose cleanup method for CREvent (#849)
  • [e12886] [Feature] Disable zero downtime upgrade for a RayService using RayServiceSpec (#2468)
  • [13eb7b] [Feature] Display reconcile failures as events (ServiceAccount) (#2290)
  • [b6b00c] [Feature] Docker support for chart-testing (#623)
  • [260085] [Feature] Enable namespaced installs via helm chart (#860)
  • [40775c] [Feature] Expose initContainer image in RayCluster chart (#674)
  • [2ee95c] [Feature] Fix auto upgrade prometheus (#3449)
  • [6cbb8e] [Feature] Fix dependency upgrade for gomock (#3558)
  • [471489] [Feature] Improve and fix Prometheus & Grafana integrations (#895)
  • [244003] [Feature] Improve observability for flaky RayJob test (#1587)
  • [c6df15] [Feature] Improve the observability of integration tests (#775)
  • [551de6] [Feature] Include CR UID in kuberay metrics (#4003)
  • [c6bafa] [Feature] Make Ray and Logs links proxy to their Ray dashboards (#4112)
  • [3aebd8] [Feature] Make head serviceType optional (#851)
  • [1ed0b7] [Feature] Make replicas optional for WorkerGroupSpec (#1443)
  • [3bb01e] [Feature] Manually fix controller runtime package upgrade (#3448)
  • [a53d94] [Feature] Manually fix net package upgrade (#3447)
  • [1a94b4] [Feature] Manually upgrade k8s package group (#3486)
  • [8c8222] [Feature] Move some functions from prototype test framework to a new utils file (#837)
  • [c552d3] [Feature] Override the block option of rayStartParams to true (#1718)
  • [2fb946] [Feature] Print KubeRay logs in Buildkite runner when tests fail (#2690)
  • [49e752] [Feature] Provide multi-arch images for apiserver and security proxy (#4131)
  • [78b982] [Feature] REP 54: Add PodName to the HeadInfo (#2266)
  • [ad06bb] [Feature] Ray container must be the first application container (#1379)
  • [fd27b7] [Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration (#750)
  • [4fdb87] [Feature] RayService HA test - GCS fault tolerance + kill GCS process (#2590)
  • [dd7ed9] [Feature] Refactor test framework & test kuberay-operator chart with configuration framework (#759)
  • [ffcf70] [Feature] Remove Docker container and NodePort from compatibility test (#844)
  • [3129b8] [Feature] Remove checking CRD in Volcano scheduler initialization (#4011)
  • [d0debd] [Feature] Replace service name with Fully Qualified Domain Name (#938)
  • [1d3f53] [Feature] Run config tests with the latest release of KubeRay operator (#858)
  • [ea6e8d] [Feature] Running end-to-end tests on local machine (#589)
  • [8a35f1] [Feature] Separate controller namespace and CRD namespaces for KubeRay-Operator Dashboard (#4088)
  • [fd06b5] [Feature] Set default appProtocol for Ray head service to tcp (#668)
  • [6691b7] [Feature] Split ray.io/originated-from into ray.io/originated-from-cr-name and ray.io/originated-from-crd (#1864)
  • [a9beaf] [Feature] Support ARM image for test (#2699)
  • [f22a75] [Feature] Support Volcano Network Topology Aware Scheduling for kuberay (#4105)
  • [d6aef8] [Feature] Support Volcano for batch scheduling (#755)
  • [017e58] [Feature] Support configurable RayCluster deletion delay in RayService (#3864)
  • [baccb0] [Feature] Support environment variables for KubeRay operator chart (#978)
  • [a45e4a] [Feature] Support for overwriting the generated ray start command with a user-specified container command (#1704)
  • [6c9f85] [Feature] Support inject specific env vars to all Ray containers in all RayCluster CRs by configuration (#4103)
  • [9bc5d8] [Feature] Support suspend in RayJob (#926)
  • [4aa53f] [Feature] Sync for manifests and helm chart (#564)
  • [56b2f6] [Feature] Sync logs to local file (#632)
  • [ca6d79] [Feature] TLS authentication (#989)
  • [b4b1ce] [Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones (#678)
  • [65a770] [Feature] Test sample RayService YAML to catch invalid or out of date one (#731)
  • [71e260] [Feature] The default ImagePullPolicy should be IfNotPresent (#947)
  • [f6a401] [Feature] Upgrade ginkgo (#3503)
  • [37cf2a] [Feature] Upgrade golang version (#3461)
  • [962077] [Feature] Upgrade grpc gateway version manually (#3491)
  • [1be2ae] [Feature] Upgrade net package (#3485)
  • [f4412f] [Feature] Use image of Ray head container as the default Ray Autoscaler container (#1401)
  • [92c290] [Feature] Validation of RayFTEnabled is false and GcsFaultToleranceOption is not nil (#2726)
  • [fa7491] [Feature] Warn Users When Updating the RayClusterSpec in RayJob CR (#1778)
  • [36b112] [Feature] Watch CR in multiple namespaces with namespaced RBAC resources (#1106)
  • [27728d] [Feature] [API Server] Support activeDeadlineSeconds in API Server RayJob resource (#3335)
  • [bc17cd] [Feature] [Fix] Ensure Correct Logs Display for Go Test Logs in Buildkite Runner (#2837)
  • [bbdff7] [Feature] [KubeRay DashBoard] Reimplement and replace the Compute Template section in the New Job (#4119)
  • [89f5fb] [Feature] [RayJobs] Use finalizers to implement stopping a job upon cluster deletion (#735)
  • [b6bcf1] [Feature] [scheduler-plugins] Support second scheduler mode (#3852)
  • [09aad7] [Feature] integrate RayDashboard with apiserver V2 (#4054)
  • [2db5c5] [Feature] update yarn version from v1 to latest (#3945)
  • [f2d7c1] [Feature]: Add a new event type FailedToDeleteWorkerPodCollection (#2680)
  • [536ca3] [Feature][APIServer v2] Support Compute Template in APIServer v2 (#3959)
  • [491c48] [Feature][APIServer] Support decimal memory values in KubeRay APIServer (#3956)
  • [8a31bf] [Feature][APIServer] add retry for http client (#3551)
  • [5a766f] [Feature][Doc] Access S3 bucket from Pods in EKS (#958)
  • [2a84a4] [Feature][Doc] End-to-end KubeRay Operator development process on Kind (#826)
  • [f4b282] [Feature][Doc] Explain that RBAC should be synchronized manually (#641)
  • [1c648a] [Feature][Doc] Kubeflow integration (#937)
  • [3ac1b5] [Feature][Docs] AWS Application Load Balancer (ALB) support (#658)
  • [056474] [Feature][Docs] Explain how to specify container command for head pod (#912)
  • [cfa120] [Feature][GCS FT] Best-effort redis cleanup job for 5 minutes (#1766)
  • [72ba3a] [Feature][GCS FT] Clean up Redis once a GCS FT-Enabled RayCluster is deleted (#1412)
  • [310911] [Feature][Helm] Align the key of minReplicas and maxReplicas (#663)
  • [0adc50] [Feature][Helm] Enable sidecar configuration in Helm chart (#604)
  • [4e9fdb] [Feature][Helm] Expose the autoscalerOptions (#666)
  • [5ca90b] [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979)
  • [9835cc] [Feature][Observability] Scrape Autoscaler and Dashboard metrics (#1493)
  • [692138] [Feature][Ray-operator] Improve RayJob validation for shutdownAfterJobFinishes and ttlSecondsAfterFinished (#3653)
  • [5231db] [Feature][RayCluster]: Deprecate the RayCluster .Status.State field (#2288)
  • [d02579] [Feature][RayCluster]: Generate GCS FT Redis Cleanup Job creation events (#2382)
  • [5062a8] [Feature][RayCluster]: Implement the HeadReady condition (#2261)
  • [b5f14f] [Feature][RayCluster]: introduce RayClusterSuspending and RayClusterSuspended conditions (#2403)
  • [b2dbb1] [Feature][RayJob] Remove the deprecated RuntimeEnv from CRD. Use RuntimeEnvYAML instead. (#1792)
  • [fab00b] [Feature][RayJob] Support light-weight job submission (#1893)
  • [809bfb] [Feature][RayJob] Support light-weight job submission with entrypoint_num_cpus, entrypoint_num_gpus and entrypoint_resources (#1904)
  • [6d5020] [Feature][RayJob] Use RayContainerIndex instead of 0 (#1427)
  • [73e6c5] [Feature][RayJob]: Generate submitter and RayCluster creation/deletion events (#2389)
  • [1283a6] [Feature][RayService] Set default ports (#3262)
  • [72e993] [Feature][autoscaler v2] Set RAY_NODE_TYPE_NAME when starting ray node (#1973)
  • [da78df] [Feature][kubectl-plugin] Expose setting shutdownAfterJobFinishes and ttlSecondsAfterFinished in ray job submit (#3627)
  • [22c2b4] [Feature][kubectl-plugin] Implement kubectl ray session (#2298)
  • [c86b03] [Feature][kubectl-plugin] Quick fix for Job Submission ID (#2469)
  • [4e5a91] [Feature][kubectl-plugin] add KubeRay operator version query (#2443)
  • [6ca956] [Feature][kubectl-plugin] e2e test for 'kubectl ray log' (#2486)
  • [1bc821] [Feature][kubectl-plugin] return usage error when no entrypoint input (#2503)
  • [5cb2f5] [Feature][kubectl-plugin] 'ray log' command: Add check and cleanup directory when no ray node exist (#2473)
  • [fdf725] [Fix] Adjust crd path to verify changed files (#3103)
  • [a56b09] [Fix] Consistent parsing of custom accelerator resources (#2464)
  • [6e70fd] [Fix] Directly fail if RayJob metadata is invalid (#3981)
  • [2f2c1a] [Fix] RayCluster fails to transit Status.State to Ready when numOfHosts > 1 (#3353)
  • [230081] [Fix] Standardize Buildkite Display Format Across All Tests (#2992)
  • [906810] [Fix] Update Ray Service Troubleshooting Link (#2727)
  • [fd9c90] [Fix] Use go 1.22 on Buildkite autoscaler e2e tests (#2211)
  • [7d53e7] [Fix] changelog-generator.py failed to parse some commit messages (#3818)
  • [955922] [Fix][CI] E2E tests do not reflect error (#3021)
  • [795f79] [Fix][CI] Fix ray operator image build error by setting up docker buildx (#3750)
  • [93e32d] [Fix][CI] Fix revive error (#2183)
  • [5124ef] [Fix][CI] Redirect stderr to stdout in Test Autoscaler E2E (nightly operator) (#3074)
  • [f6bf32] [Fix][CI] kubectl plugin krew index CI error (#3015)
  • [dea87f] [Fix][Envtest] Decorate container nodes with Ordered (#2285)
  • [b903d4] [Fix][HelmChart] Move service.headService -> head.headService in values.yaml (#1998)
  • [990ffe] [Fix][Helm] Fix ClusterRole for volcano if .Values.batchScheduler.name is set (#2474)
  • [c7fe15] [Fix][Operator] Explicitly wait for pod not found for satisfying the delete scale expectation (#3520)
  • [6c168a] [Fix][RayCluster] Make the RayClusterReplicaFailureReason to capture the correct reason (#2282)
  • [084368] [Fix][RayCluster] fix missing pod name in CreatedWorkerPod and FailedToCreateWorkerPod events (#3057)
  • [96fbbc] [Fix][RayJob] Invalid quote for RayJob submitter (#2949)
  • [40f5dd] [Fix][RayService] Raise error if spec.rayClusterConfig.headGroupSpec.headService.metadata.name is set (#2440)
  • [efbd35] [Fix][RayService] Use LRU cache for ServeConfigs (#2683)
  • [baa2cc] [Fix][Release] Fix Krew release indentation error (#3823)
  • [b8c4e5] [Fix][Release] Fix KubeRay dashboard image build pipeline (#3702)
  • [a69252] [Fix][Sample-Yaml] Increase ray head CPU resource for pytorch mnist (#2330)
  • [f68779] [Fix][kubectl-plugin] Create separate namespaces for each kubectl plugin e2e test (#2745)
  • [3efef2] [Fix][kubectl-plugin] Don't print wrapped error for job submit startup (#3027)
  • [c8d34f] [Fix][kubectl-plugin] Fix no context nil error SIGSEGV in tests (#2892)
  • [909f66] [Fix][kubectl-plugin] Release bot opens PRs to Krew repo with unexpected whitespace changes (#3090)
  • [abb0bf] [Fix][kubectl-plugin] Remove controller-runtime logger warning in kubectl ray job submit (#3669)
  • [b8484a] [Fix][kubectl-plugin] Remove filepath.Clean for ray job submit workingDir (#3518)
  • [029cd7] [Fix][kubectl-plugin] make tests use a temporary kube config (#2894)
  • [a86088] [Fix][kubectl-plugin] ray job submit runtime-env-json null error (#3063)
  • [c09415] [Fix][kubectl-plugin]: make version handle digests (#2876)
  • [d8b7c6] [Fix][precommit] Fix pre-commit golangci-lint always success (#2140)
  • [4b4682] [Fix] remove broken link in doc (#3519)
  • [a614b1] [Follow Up][Test] Support to set QPS and burst by configuration (#3999)
  • [4bbaa0] [GCS FT] Add e2e tests for configuring GCS FT with annotations (#2766)
  • [1c9de2] [GCS FT] Consider the case of sidecar containers (#1386)
  • [fe26dc] [GCS FT] Enhance observability of redis cleanup job (#1709)
  • [55b1d3] [GCS FT] Give readiness / liveness probes good default values (#1364)
  • [6fa2d3] [GCS FT] Improve GCS FT cleanup UX (#1592)
  • [7f95a6] [GCS FT] More validations for configuring GCS FT with envs and annotations (#2772)
  • [a81ea8] [GCS FT] Redis e2e cleanup check (#2773)
  • [937297] [GCS FT] Unify configuring Gcs FT into a single function (#2755)
  • [e79e0b] [GCS FT][Refactor] Redefine the behavior for deleting Pods and stop listening to Kubernetes events (#1341)
  • [637522] [Golang] Remove go get (#1283)
  • [10cc89] [Grafana] Add a Cluster variable to the Grafana Dashboard to enable filtering of different RayClusters (#2685)
  • [f6637d] [Grafana] Add flag for enabling auto load dashboards (#3689)
  • [9f013a] [Grafana] Allow auto-load dashboard jsons (#3643)
  • [e89ae3] [Grafana] Update Grafana dashboard (#2106)
  • [848d40] [Grafana] Update Grafana dashboard (#3726)
  • [3425b4] [Grafana] Use PodMonitor instead of ServiceMonitor for the Head Node to avoid metric duplication (#2689)
  • [9e4e70] [Grafana] Use Range option instead of instant (#4062)
  • [627f52] [Grafana][Observability] Embed Grafana dashboard panels into Ray dashboard (#1278)
  • [17ee13] [HELM] Add Helm unit tests for chart kuberay-apiserver (#3361)
  • [9658af] [HELM] Define name templates for all resources (#3381)
  • [5265ee] [HELM] Fix serviceAccount name inconsistency in templates (#3451)
  • [75ea7a] [HELM] Typo correction (operatorComand -> operatorCommand) (#3450)
  • [22f570] [Helm Chart] Set honorLabel of serviceMonitor to true (#3805)
  • [9d4686] [Helm] Add gcsFaultToleranceOptions in RayCluster chart (#3881)
  • [ef9206] [Helm] Add missing environment variables to operator chart (#3867)
  • [611496] [Helm] Add priorityClassName for kuberay-operator chart (#3703)
  • [3a512d] [Helm] Clean up RayCluster Helm chart ahead of KubeRay 0.4.0 release (#751)
  • [799f07] [Helm] Enable leader election when leaderElectionEnabled is not set (#2284)
  • [9296c2] [Helm] Make Kube Client QPS and Burst configurable for kuberay-operator (#4002)
  • [ae9198] [Helm] Make reconcile concurrency configurable for kuberay-operator (#3962)
  • [b65e4a] [Helm] Use helm-docs to generate README for chart api-server automatically (#3916)
  • [a099da] [Helm] Use helm-docs to generate README for chart ray-cluster automatically (#3887)
  • [6db864] [Helm] add sizeLimit for emptyDir (#2532)
  • [cde251] [Helm][RBAC] Introduce the option crNamespacedRbacEnable to enable or disable the creation of Role/RoleBinding for RayCluster preparation (#1162)
  • [831b55] [Helm][ray-cluster] Fix parsing envFrom field in additionalWorkerGroups (#1039)
  • [3a925f] [Hotfix] Extend Autoscaler e2e tests timeout (#3665)
  • [981c94] [Hotfix] Increase the timeout of the ProxyActor health check (#2082)
  • [165291] [Hotfix][Bug] Avoid unnecessary zero-downtime upgrade (#1581)
  • [1fe5ae] [Hotfix][Bug] suspend is not a stateless operation (#1741)
  • [9ad6b1] [Hotfix][CI] Pin setup-envtest dep (#2038)
  • [00dc45] [Hotfix][release blocker][RayJob] HTTP client from submitting jobs before dashboard initialization completes (#1000)
  • [12b9df] [Kueue] Add a sample YAML for Kueue toy sample (#1956)
  • [4fc179] [Logging] Avoid using fmt.Sprintf inside logging functions (#2508)
  • [df0386] [Logging] Remove duplicate info in CR logs (#2531)
  • [23b08e] [Logging] add context info for yunikorn logger (#2522)
  • [105e88] [Metric] kuberay_job_deployment_status (#3656)
  • [d40692] [Metrics] Remove serviceMonitor.yaml (#3795)
  • [fdd4bd] [Minor] Remove redundant variable (#2281)
  • [7905bc] [N/N][Lint] Group imports by sections (#3454)
  • [077529] [Nit] Remove redundant code snippet (#1810)
  • [f77ee0] [Perf] Add NUM_WORKERS and CPUS_PER_WORKER env to the mnist workload (#2126)
  • [afab55] [Perf] Add a CPU-based image resizing workload using Ray Data (#2135)
  • [c099de] [Perf] Add a CPU-based training workload (#2116)
  • [b5f237] [Perf] Improve perf-test YAMLs and README (#2110)
  • [c83b1b] [Post Ray 2.2.0 Release] Update Ray versions to Ray 2.2.0 (#822)
  • [8da54d] [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925)
  • [473dfd] [Post Ray 2.4 Release] Update Ray versions to Ray 2.4.0 (#1049)
  • [cc4155] [Post Ray 2.7.0 Release] Update Ray versions to Ray 2.7.0 (#1423)
  • [666679] [Post Ray 2.8.0 Release] Update Ray versions to Ray 2.8.0 (#1678)
  • [df3cc3] [Post release v0.5.0] Remove block from rayStartParams (#1015)
  • [ba814e] [Post release v0.5.0] Remove block from rayStartParams for python client and KubeRay operator tests (#1050)
  • [67a0f4] [Post release v0.5.0] Remove serviceType (#1013)
  • [4234e5] [Post release v0.5.0] Update CHANGELOG.md (#1026)
  • [dfc197] [Post release v0.5.0] Update release doc (#1028)
  • [72d1c2] [Post release v0.6.0] Update CHANGELOG.md (#1274)
  • [ded945] [Post v0.5.0] Remove init containers from YAML files (#1010)
  • [74496a] [Post v1.0.0-rc.1] Reenable sample YAML tests for latest release and update some docs (#1544)
  • [4eed01] [Post v1.1.0] Run the sample YAML tests with KubeRay v1.1.0 (#2039)
  • [022ff0] [Prometheus] Add kuberay_cluster_provisioned_duration_seconds metric (#3212)
  • [240910] [Prometheus] Add kuberay_cluster_info metric (#3535)
  • [f7102b] [Prometheus] Add serviceMonitor for KubeRay Operator (#3530)
  • [6cefc4] [Prometheus] Refactor kuberay_cluster_provisioned_duration_seconds (#3497)
  • [238cb4] [Quay] Sanity check for KubeRay repository setup (#1300)
  • [fb1463] [REFACTOR]: refactor execute pod cmd with client-go function (#2467)
  • [eef1d8] [Ray 2.3.0] Update --redis-password for RayCluster (#929)
  • [827814] [Ray 2.9.0 Release] Update Ray versions from 2.8.0 to 2.9.0 (#1770)
  • [f652d5] [Ray Observability] Disk usage in Dashboard (#1152)
  • [7d0eae] [Ray-operator] Feature flag login bash (#3679)
  • [d4784a] [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841)
  • [d1eeaa] [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850)
  • [944b60] [RayCluster] Add multi-host indexing labels (#3998)
  • [ffac34] [RayCluster] Add serviceName to status.headInfo (#2089)
  • [6463f2] [RayCluster] IsAutoscalingEnabled takes RayClusterSpec (#3111)
  • [13df01] [RayCluster] Make headpod name back to non-deterministic (#3872)
  • [b9d8b1] [RayCluster] Make headpod name deterministic (#3028)
  • [04f5b7] [RayCluster] Toggle usage of deterministic/non-deterministic head pod name with feature flag (#3873)
  • [d169f5] [RayCluster] Update sample yamls to use the new gcsFaultToleranceOptions option (#2856)
  • [829aad] [RayCluster] Validate GCSFaultToleranceOptions and redis password (#2754)
  • [eba145] [RayCluster] Validate RayClusterSpec for empty containers and GCS FT (#2749)
  • [8d60d6] [RayCluster] don't allow overriding ray.io/cluster label (#2555)
  • [8feef9] [RayCluster] e2e test for GCS FT with Redis Username (#2855)
  • [9bd31c] [RayCluster] grant pods and pods/resize patch permissions for IPPR (#3960)
  • [28c729] [RayCluster] improve generated pod names for Ray clusters
  • [a78896] [RayCluster] support suspending worker groups (#2663)
  • [f11a1f] [RayCluster] yunikorn batchscheduler respect gang scheduling (#4075)
  • [94636b] [RayCluster] Upgrade volcano to 1.11.0 (#3159)
  • [836248] [RayCluster][CI] add e2e tests for RayClusterStatusCondition (#2661)
  • [c62910] [RayCluster][CI] add e2e tests for the RayClusterSuspended status condition (#2686)
  • [801f08] [RayCluster][Expectation] Add a test to ensure expectations work well during scaling down (#3543)
  • [c2f382] [RayCluster][Feature] Make RayClusterStatusConditions feature gate Beta and enabled by default (#2562)
  • [7a768f] [RayCluster][Feature] add GcsFaultToleranceOptions to the RayCluster CRD [1/N] (#2715)
  • [991b9c] [RayCluster][Feature] add redis password to head pod from GcsFaultToleranceOptions (#2731)
  • [0055bf] [RayCluster][Feature] add redis username to head pod from GcsFaultToleranceOptions (#2760)
  • [7bb82d] [RayCluster][Feature] reject redis username to head pod outside of GcsFaultToleranceOptions (#2796)
  • [82e255] [RayCluster][Feature] setup GCS FT annotations and the RAY_REDIS_ADDRESS env by the GcsFaultToleranceOptions (#2721)
  • [d86ea6] [RayCluster][Feature] skip suspending worker groups if the in-tree autoscaler is enabled (#2748)
  • [a4d7dd] [RayCluster][Fix] Add expectations of RayCluster (#2150)
  • [42f299] [RayCluster][Fix] DesiredReplicas, MinReplicas and MaxReplicas should respect workerGroupSpec.Suspend (#2728)
  • [6c1c16] [RayCluster][Fix] evicted head-pod can be recreated or restarted (#2217)
  • [b5bcb8] [RayCluster][Fix] leave .Status.State untouched when there is a reconcile error (#2622)
  • [ae880c] [RayCluster][Refactor] use RayClusterAllPodsAssociationOptions instead (#2756)
  • [17809b] [RayCluster][Status][1/n] Remove ClusterState Unhealthy (#2068)
  • [4da183] [RayJob] Add Cluster Name For Rayjob. (#2046)
  • [8f0619] [RayJob] Add Failure Feedback (log and event) for Failed k8s Creation Task (#2306)
  • [fe981a] [RayJob] Add JobDeploymentStatusFailed Status and Reason Field to Enhance Observability for Flyte/RayJob Integration (#1942)
  • [1f44bd] [RayJob] Add Tests for Atomic Suspend Operation (#2050)
  • [bb5b78] [RayJob] Add RayJobInfo to RayJob CRD status (#3673)
  • [f53b42] [RayJob] Add additional print columns for RayJob (#1895)
  • [775715] [RayJob] Add default CPU and memory for job submitter pod (#1319)
  • [738642] [RayJob] Add e2e sample yaml test for shutdownAfterJobFinishes (#1269)
  • [aa1736] [RayJob] Add field to expose entrypoint num cpus in rayjob (#1359)
  • [5de4a4] [RayJob] Add runtime env YAML field (#1338)
  • [8682b2] [RayJob] Add spec.backoffLimit for retrying RayJobs with new clusters (#2192)
  • [9a4de5] [RayJob] ClusterSelector shouldn't support SidecarMode (#4074)
  • [b33642] [RayJob] Deflaky RayJob e2e tests (#2963)
  • [af4f6a] [RayJob] Delete the Kubernetes Job and its Pods immediately when suspending (#1791)
  • [9382c1] [RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container (#1375)
  • [58a3ff] [RayJob] Enhance RayJob DeletionStrategy to Support Multi-Stage Deletion (#4040)
  • [528abc] [RayJob] Fix RayJob status reconciliation (#1539)
  • [370fc4] [RayJob] Follow up of RayJob deletion policy PR (#2763)
  • [edfc34] [RayJob] Improve dashboard client log (#1903)
  • [4fb457] [RayJob] Inject RAY_SUBMISSION_ID env variable for user provided submitter template (#1868)
  • [91fcd3] [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943)
  • [f191a7] [RayJob] RayJob deletion policy validation (#2771)
  • [931f97] [RayJob] Refactor Rayjob E2E Tests to Use Server-Side Apply (#1927)
  • [34a8d9] [RayJob] Rewrite RayJob envtest (#1916)
  • [2281d9] [RayJob] Set missing CPU limit (#1899)
  • [f9b2cb] [RayJob] Set the timeout of the HTTP client from 2 mins to 2 seconds (#1910)
  • [7639b9] [RayJob] Sidecar Mode (#3971)
  • [2583d8] [RayJob] Submit job using K8s job instead of checking Status and using DashboardHTTPClient (#1177)
  • [007412] [RayJob] Support ActiveDeadlineSeconds (#1933)
  • [f5d713] [RayJob] Support deletion policies based on job status (#3731)
  • [c78c75] [RayJob] Transition to Complete if the JobStatus is STOPPED (#1855)
  • [6b027b] [RayJob] Unified checkBackoffLimitAndUpdateStatusIfNeeded codepath and add an e2e test for retry (#2215)
  • [55a668] [RayJob] UserMode -> InteractiveMode and check rayjob.spec.jobId instead of annotation (#2446)
  • [024aae] [RayJob] Validate RayJob spec (#1813)
  • [608768] [RayJob] Validate whether runtimeEnvYAML is a valid YAML string (#1898)
  • [c9fa01] [RayJob] Yunikorn Integration (#3948)
  • [d0f1c3] [RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay (#1361)
  • [7f15e1] [RayJob] add Failing RayJob in HTTPMode e2e test for rayjob with retry (#2242)
  • [27b1dc] [RayJob] add Failing submitter K8s Job e2e test for rayjob with retry (#2226)
  • [bd33d5] [RayJob] add Light-weight RayJob Submitter (#3943)
  • [1efaf6] [RayJob] add RayJob pass Deadline e2e-test with retry (#2241)
  • [7e04b2] [RayJob] allow create verb for services/proxy, which is required for HTTPMode (#2321)
  • [010630] [RayJob] avoid RayCluster resource leak in k8s job mode (#3903) (#4080)
  • [0544f8] [RayJob] implement deletion policy API (#2643)
  • [72a776] [RayJob] remove redundant RayJob status-transition logs in reconciler (#3976)
  • [1c5f3e] [RayJob]: Add RayJob with RayCluster spec e2e test (#1636)
  • [631cd7] [RayJob]: Always use target RayCluster image as default RayJob submitter image (#1548)
  • [b0fee8] [RayJob][10/n] Add finalizer to the RayJob when the RayJob status is JobDeploymentStatusNew (#1780)
  • [c0b6b0] [RayJob][Chore] make err as a local variable (#2789)
  • [0274fa] [RayJob][Doc] Fix RayJob sample config. (#807)
  • [bcc8c0] [RayJob][Fix] Use --no-wait for job submission to avoid carrying the error return code to the log tailing (#3216)
  • [c45d95] [RayJob][Kueue] Move limitation check to validateRayJobSpec (#1854)
  • [0ed5e7] [RayJob][Refactor] use ray job status and ray job logs to be tolerant of duplicated job submissions (#2579)
  • [ce4ec2] [RayJob][Status][1/n] Redefine the definition of JobDeploymentStatusComplete (#1719)
  • [4ff389] [RayJob][Status][11/n] Refactor the suspend operation (#1782)
  • [349068] [RayJob][Status][12/n] Resume suspended RayJob (#1783)
  • [448e33] [RayJob][Status][13/n] Make suspend operation atomic by introducing the new status Suspending (#1798)
  • [a9c7ab] [RayJob][Status][14/n] Decouple the Initializing status and Running status (#1801)
  • [f65466] [RayJob][Status][15/n] Unify the codepath for the status transition to Suspended (#1805)
  • [83327f] [RayJob][Status][16/n] Refactor Running status (#1807)
  • [1eed06] [RayJob][Status][17/n] Unify the codepath for status updates (#1814)
  • [c55f3c] [RayJob][Status][18/n] Control the entire lifecycle of the Kubernetes submitter Job using KubeRay (#1831)
  • [d5d7e5] [RayJob][Status][19/n] Transition to Complete if the K8s Job fails (#1833)
  • [ba4203] [RayJob][Status][2/n] Redefine ready for RayCluster to avoid using HTTP requests to check dashboard status (#1733)
  • [8760d9] [RayJob][Status][3/n] Define JobDeploymentStatusInitializing (#1737)
  • [62bbc1] [RayJob][Status][4/n] Remove some JobDeploymentStatus and updateState function calls (#1743)
  • [1594e8] [RayJob][Status][5/n] Refactor getOrCreateK8sJob (#1750)
  • [d49a7a] [RayJob][Status][6/n] Redefine JobDeploymentStatusComplete and clean up K8s Job after TTL (#1762)
  • [59503c] [RayJob][Status][7/n] Define JobDeploymentStatusNew explicitly (#1772)
  • [cac764] [RayJob][Status][8/n] Only a RayJob with the status Running can transition to Complete at this moment (#1774)
  • [6af407] [RayJob][Status][9/n] RayJob should not pass any changes to RayCluster (#1776)
  • [3e64de] [RayJob][Test] make sure annotation populated to RayCluster (#3199)
  • [7f33c1] [RayJob][Test] refactor TestValidateRayJobSpec with table test (#3223)
  • [834aed] [RayService] Add New Status: NumServeEndpoints (#1901)
  • [bbb65b] [RayService] Add RayService High Availability Test Doc (#1986)
  • [e062d0] [RayService] Add RayService alb ingress CR (#1169)
  • [1c276f] [RayService] Add a safeguard and remove the dead code to ensure that both clusters are not empty before reconciling serve (#2778)
  • [c13949] [RayService] Add an envtest for RayService happy path (#2868)
  • [c80779] [RayService] Add an envtest for autoscaler (#2872)
  • [62faf2] [RayService] Add checks of RayService conditions in e2e tests (#2864)
  • [8db4f6] [RayService] Add e2e tests (#1167)
  • [0ee398] [RayService] Add logs and remove in-place update for the TestOldHeadPodFailDuringUpgrade e2e test (#2819)
  • [44c0d5] [RayService] Add support for multi-app config in yaml-string format (#1156)
  • [77a102] [RayService] Add unit tests for isZeroDowntimeUpgradeEnabled (#2871)
  • [355de9] [RayService] Add zero-downtime triggered test after rayVersion is updated (#2881)
  • [de3e03] [RayService] Address Recent Flakiness in RayService Zero Downtime Rollout Test (#1979)
  • [a6cf6e] [RayService] Allow updating WorkerGroupSpecs without rolling out new cluster (#1734)
  • [f46f32] [RayService] Always check the readiness of head Pods for both pending / active clusters if cluster exists (#2783)
  • [edd332] [RayService] Avoid Duplicate Serve Service (#1867)
  • [1a4254] [RayService] Avoid passing RayServiceStatus to functions in reconcileServe (#2828)
  • [f93296] [RayService] Avoid sending health check requests to the head Pod when excludeHeadPodFromServeSvc is true (#2776)
  • [80cab4] [RayService] Calculate status based on K8s resources (#2818)
  • [47d55f] [RayService] Change runtime env for e2e autoscaling test (#1178)
  • [850fd4] [RayService] Compare cached hashed config before triggering update (#655)
  • [5a5534] [RayService] Create k8s events after creating/updating k8s resources (#2873)
  • [41ee4d] [RayService] Deflaky RayService envtest (#2962)
  • [2970e3] [RayService] Deprecate the built-in ingress support of RayService (#1843)
  • [a03f72] [RayService] Fixed issue where the custom serve port is not reflected in the serve health check for worker Pods (#1816)
  • [deb29b] [RayService] Ignore deployments status to decide whether to deploy serve application (#1014)
  • [33ee67] [RayService] Mark ServiceStatus as deprecated (#2863)
  • [18bee5] [RayService] Merge initConditions into calculateConditions (#2866)
  • [0143fe] [RayService] More envtests that follow the most common scenario in the RayService code path (#2880)
  • [96a2ce] [RayService] Move HTTP Proxy's Health Check to Readiness Probe for Workers (#1808)
  • [81d760] [RayService] Move cleanUpRayClusterInstance from reconcileRayCluster to Reconcile (#2838)
  • [e1bee8] [RayService] Move the cluster switch logic from reconcileServe to Reconcile (#2777)
  • [495c0a] [RayService] Move the update of RayClusterStatus to calculateStatus (#2826)
  • [a31e09] [RayService] Passing serve applications to calculateStatus and avoid calling Status().Update(...) inside reconcileServe (#2831)
  • [5b8e9c] [RayService] Refactor createRayClusterInstance (#2874)
  • [126598] [RayService] Refactor reconcileRayCluster to avoid updating CR status in the function (#2859)
  • [bb3166] [RayService] Refactor updateRayClusterInstance (#2875)
  • [ab9344] [RayService] Refactor envtests (#2888)
  • [8e1b92] [RayService] Refactor fake http proxy client and test (#2636)
  • [e616dc] [RayService] Refactor to Rely More on RayService Status in RayService E2E Tests (#1928)
  • [bccb35] [RayService] Refactor unit tests for ShouldPrepareNewCluster (#2928)
  • [17a534] [RayService] Remove WaitForServeDeploymentReady (#2842)
  • [26cdac] [RayService] Remove HealthLastUpdateTime from ServeDeploymentStatus (#2825)
  • [9263dc] [RayService] Remove updateStatusForActiveCluster (#2827)
  • [9c9797] [RayService] Remove everything related to Ray Serve V1 API (#1790)
  • [019a6c] [RayService] Remove outdated env tests (#2886)
  • [3c080d] [RayService] Remove serve v1 API (#1779)
  • [9e4aa8] [RayService] Remove the dependencies between constructRayClusterForRayService and the reconciler to make it more unit testable (#2853)
  • [7ef565] [RayService] Rename Restarting to PreparingNewCluster (#2785)
  • [395950] [RayService] Revisit the conditions under which a RayService is considered unhealthy and the default threshold (#1293)
  • [2f8ee7] [RayService] Setting observedGeneration inside calculateStatus (#2869)
  • [0df4d8] [RayService] Skip update events without change (#811)
  • [c3f373] [RayService] Stable Diffusion example (#1181)
  • [b0649c] [RayService] Submit requests to the Dashboard after the head Pod is running and ready (#1074)
  • [2acc21] [RayService] Support Incremental Zero-Downtime Upgrades (#3166)
  • [794040] [RayService] Track whether Serve app is ready before switching clusters (#730)
  • [64da63] [RayService] Trim Redis Cleanup job less than 63 chars (#2846)
  • [7fd79f] [RayService] Unify multi-app and single-app codepath (#1787)
  • [46355e] [RayService] Unify the cluster switch over logic together (#2805)
  • [ecd153] [RayService] Update docs to use multi-app (#1179)
  • [25f787] [RayService] Use DashboardPort for RayService instead of DashboardAgentPort (#1742)
  • [f7cf95] [RayService] Use Ready condition in e2e tests (#2849)
  • [8ea39d] [RayService] Use Ready condition in e2e tests (#2854)
  • [4e912b] [RayService] Use original ClusterIP for new head service (#2343)
  • [9be883] [RayService] Use waitGroup to ensure goroutine completion in rayservice_ha_test (#2657)
  • [b753f1] [RayService] a safeguard for preventing overriding the pending cluster during an upgrade (#2887)
  • [f88b2f] [RayService] adapter vllm 0.6.1.post2 (#2823)
  • [b66763] [RayService] don't update serveConfigV2 in current ray cluster if ray… (#3559)
  • [a61267] [RayService] e2e for check the readiness of head Pods for both pending / active clusters (#2806)
  • [8f75ad] [RayService] e2e for redeploying RayServe application after recreating a new Head Pod (#2834)
  • [78d030] [RayService] fix kubebuilder printcolumn annotations for RayService (#1981)
  • [0056fb] [RayService] make RayClusterSpec required (#3169)
  • [19924c] [RayService] make checkIfNeedSubmitServeApplications more unit testable (#2822)
  • [e11fe5] [RayService] refactor envtest by adding a util function rayServiceTemplate (#2833)
  • [d64bf5] [RayService] reword the comment on ServiceStatus = rayv1.Running (#2848)
  • [2e8f53] [RayService][Bug] Serve Service May Select Pods That Are Actually Unready for Serving Traffic (#1856)
  • [19054c] [RayService][Doc] RayService troubleshooting handbook (#1221)
  • [73f4f2] [RayService][HA] Fix flaky tests (#1823)
  • [6c2281] [RayService][Health-Check][1/n] Offload the health check responsibilities to K8s and RayCluster (#1656)
  • [4557a0] [RayService][Health-Check][2/n] Remove the hotfix to prevent unnecessary HTTP requests (#1658)
  • [aa42f8] [RayService][Health-Check][3/n] Update the definition of HealthLastUpdateTime for DashboardStatus (#1659)
  • [07d14d] [RayService][Health-Check][4/n] Remove the health check for Ray Serve applications (#1660)
  • [584132] [RayService][Health-Check][5/n] Remove unused variable deploymentUnhealthySecondThreshold (#1664)
  • [ed56a9] [RayService][Health-Check][6/n] Remove ServiceUnhealthySecondThreshold (#1665)
  • [276776] [RayService][Health-Check][7/n] Remove LastUpdateTime from multiple places (#1666)
  • [c54c3d] [RayService][Health-Check][8/n] Add readiness / liveness probes (#1674)
  • [aad2fc] [RayService][Hotfix] Hotfix for Flaky Zero Downtime Rollout Test (#1837)
  • [dd7789] [RayService][Observability] Add actionable logging messages for users when they do not specify ports for Ray Serve (#1218)
  • [384a92] [RayService][Observability] Add more logging for RayService troubleshooting (#1230)
  • [45d3a4] [RayService][Observability] Add more logging about networking issues (#1282)
  • [881008] [RayService][Refactor] Avoid flooding Kubernetes events (#2546)
  • [3c8904] [RayService][Refactor] Change the ServeConfigs to nested map (#2591)
  • [75dbbd] [RayService][Refactor] Remove ctrlResult (#2545)
  • [c62058] [RayService][Status][1/n] Remove DashboardStatus (#1839)
  • [0575bd] [RayService][Status][2/n] Remove WaitForDashboard (#1840)
  • [57c639] [RayService][Test] create curl pod waiting until running (#3740)
  • [594eaf] [RayService][Test] make sure annotation populated to RayCluster (#3210)
  • [c3b335] [RayService][Test] util for creating empty RayClusterSpec in test (#3182)
  • [39d145] [RayService][refactor] Remove updateState (#2705)
  • [da6b35] [Refactor] Add a util function IsAutoscalingEnabled and refactor validations of RayJob deletion policy (#2775)
  • [c81496] [Refactor] Define the value type of the concurrent map explicitly to avoid type conversion (#1789)
  • [c5d7de] [Refactor] Do not use RAYCLUSTER_DEFAULT_REQUEUE_SECONDS_ENV as timeout of status check in tests (#1755)
  • [b89882] [Refactor] Eliminate redundant range variable capture with Go 1.22 scoped iteration (#4044)
  • [7a9622] [Refactor] Encapsulate RayCluster metrics in a custom Prometheus collector (#3310)
  • [0b7290] [Refactor] Encapsulate RayJob metrics in a custom Prometheus collector (#3444)
  • [b875b8] [Refactor] Extract KubectlApplyYaml and yaml deserialization to support package (#2498)
  • [59ae10] [Refactor] Fix CreatedWorkerPod for worker Pod deletion event and refactor logs (#2346)
  • [f38951] [Refactor] Follow-up for PR 1930 (#2124)
  • [a616a4] [Refactor] Format API server Makefile for consistency (#3435)
  • [a83d3c] [Refactor] Improve API server developer experience (#3458)
  • [4ff831] [Refactor] Improve developer experience of API server e2e-test (#3466)
  • [4492fe] [Refactor] Make port name variables consistent and meaningful (#1389)
  • [7f02eb] [Refactor] Merge raycluster_gcs_ft_test.go and raycluster_gcsft_test.go (#3008)
  • [298539] [Refactor] Move ValidateRayJobStatus to validation.go and create its unit test (#2813)
  • [8c53bd] [Refactor] Move ValidateRayClusterSpec to validation.go and its unit test to validation_test.go (#2790)
  • [8dd249] [Refactor] Move validateRayClusterStatus function to validation.go and move unit test to validation_test.go (#2780)
  • [84f736] [Refactor] Move constant.go from common to utils to avoid circular dependency (#1726)
  • [3d1c6c] [Refactor] Move function ValidateRayJobSpec to validation.go and its unit test (#2812)
  • [28ab5c] [Refactor] Move functions that don’t rely on the controller to non-controller member functions (#2747)
  • [086702] [Refactor] Move test name from map key to struct field (#2865)
  • [5ccf36] [Refactor] Move validateRayServiceSpec to validation.go and its unit test to validation_test.go (#2816)
  • [3a1fed] [Refactor] Parameterize TestGetAndCheckServeStatus (#1450)
  • [b77582] [Refactor] RayJob Spec ClusterSelector validation logic (#4032)
  • [83104b] [Refactor] Refactor testRayJob global variable to avoid test side effects (#4017)
  • [3d533b] [Refactor] Remove Dashboard Agent service (#1207)
  • [374874] [Refactor] Remove any unnecessary logger (#1894)
  • [bafb00] [Refactor] Remove cleanupInvalidVolumeMounts (#2104)
  • [03eb92] [Refactor] Remove duplicate definition of get_ray_cluster_status (#3608)
  • [eee9d9] [Refactor] Remove global utils.GetRayXXXClientFuncs (#1727)
  • [500799] [Refactor] Rename EnableAgentService to EnableServeService (#1673)
  • [76889c] [Refactor] Rename raycluster_controller_fake_test.go to XXX_unit_test.go (#2074)
  • [4836d0] [Refactor] Renaming RayHttpProxyClient attribute UseProxy [#1980] (#2093)
  • [542f24] [Refactor] Replace Hard-Coded HTTP Values with Constants (#2702)
  • [0f2f44] [Refactor] Rewrite RayCluster envtest (#1949)
  • [dcc8b7] [Refactor] Run golangci-li...
Source: README.md, updated 2025-10-30