Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. The container images in this repository are the basis for the pre-built images used when running Spark jobs on Amazon SageMaker through the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and this repository serves as a reference for those who want to build their own customized Spark containers for use in Amazon SageMaker.

Features

  • This project is licensed under the Apache-2.0 License
  • The simplest way to get started with the SageMaker Spark Container is to use the pre-built images via the SageMaker Python SDK
  • To build and test the SageMaker Spark container yourself, you will need to set up a local development environment
  • Many pre-built SageMaker Spark images are available
  • Builds the pre-built container images that are used when running Spark jobs on Amazon SageMaker
  • Apache Spark provides high-level APIs in Scala, Java, Python, and R
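
As a rough illustration of the "pre-built images via the SageMaker Python SDK" route mentioned above, the sketch below collects the arguments for the SDK's PySparkProcessor class and submits a script. The job name, Spark framework version, instance settings, role ARN, and script name are all placeholder assumptions rather than values taken from this project, and actually running a job requires the sagemaker package plus valid AWS credentials.

```python
"""Sketch: running a PySpark script on a pre-built SageMaker Spark image."""


def build_processor_kwargs(role_arn, framework_version="3.1",
                           instance_count=2, instance_type="ml.m5.xlarge"):
    """Collect constructor arguments for PySparkProcessor.

    Every default here is an illustrative assumption, not a recommendation.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "base_job_name": "spark-preprocess",      # hypothetical job name prefix
        "framework_version": framework_version,   # selects the pre-built Spark image
        "role": role_arn,                         # IAM role the job runs under
        "instance_count": instance_count,
        "instance_type": instance_type,
        "max_runtime_in_seconds": 1200,
    }


def run_job(role_arn, script_path, arguments=None):
    """Submit script_path as a Spark application (needs AWS credentials)."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(**build_processor_kwargs(role_arn))
    processor.run(submit_app=script_path, arguments=arguments or [])
    return processor
```

A call such as run_job("arn:aws:iam::123456789012:role/ExampleRole", "preprocess.py") would then start the job; both the role ARN and the script name are hypothetical. Keeping the argument construction separate from the submission makes the configuration easy to inspect before anything is launched.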

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Frameworks, Python Business Performance Management Software, Python Data Analytics Tool, Python Stream Processing Tool

Registered

2022-07-04