Open Source Alternatives to AWS SageMaker


8.0k stars · 598 forks · 23 contributors · 114 issues · Last commit 3y ago

Cortex provides cloud infrastructure for deploying, managing, and scaling machine learning models in production. It supports real-time, asynchronous, and batch workloads, with automated cluster management and CI/CD integrations.

•Serverless workloads: Respond to requests in real time and autoscale based on the number of in-flight requests.

•Async processing: Handle requests asynchronously and autoscale based on request queue length.

•Batch processing: Execute distributed, fault-tolerant batch processing jobs on demand.

•Cluster autoscaling: Scale clusters elastically with CPU and GPU instances.

•Spot instances: Run workloads on spot instances, with on-demand instances as an automated backup.

•Environments: Create multiple clusters with different configurations.

•Provisioning: Provision clusters with declarative configuration or a Terraform provider.

•Metrics: Send metrics to any monitoring tool or use pre-built Grafana dashboards.

•Logs: Stream logs to any log management tool or use the pre-built CloudWatch integration.

•EKS: Cortex runs on top of EKS to scale workloads reliably and cost-effectively.

•VPC: Deploy clusters into a VPC on your AWS account to keep your data private.

•IAM: Integrate with IAM for authentication and authorization workflows.

•Model serving: Deploy machine learning models as real-time workloads and scale inference across CPU or GPU instances.

•MLOps: Create services that continuously retrain and evaluate models to maintain their accuracy over time.

•Microservices: Scale compute-intensive microservices without dealing with timeouts or resource limits.

•Image, video, and audio processing: Scale data processing pipelines to handle large structured or unstructured data sets.

Cortex is built specifically for AWS, relying on EKS for scaling, VPC for keeping data private, and IAM for authentication and authorization.
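The two autoscaling signals described above (in-flight requests for real-time workloads, queue length for async workloads) can both be read as "load divided by a per-replica target". The sketch below illustrates that idea only; it is not Cortex's actual algorithm, and the function name `desired_replicas` and all of its parameters are hypothetical.

```python
import math


def desired_replicas(load: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Return how many replicas are needed so that each one handles roughly
    `target_per_replica` units of load, clamped to [min_replicas, max_replicas].

    Hypothetical helper for illustration; not part of any Cortex API.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(load / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Real-time workload: load is the number of in-flight requests.
print(desired_replicas(load=45, target_per_replica=8))    # 6

# Async workload: load is the request queue length (capped at max_replicas).
print(desired_replicas(load=500, target_per_replica=20))  # 10

# No load: the count never drops below min_replicas.
print(desired_replicas(load=0, target_per_replica=8))     # 1
```

A production autoscaler would typically also smooth the load signal over a time window and delay scale-downs to avoid thrashing; those details are omitted here.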

3.6k stars · 243 forks · 80 contributors · 106 issues · Last commit 1y ago

Ploomber is a platform for building and deploying enterprise-grade data applications. It lets you develop iteratively and deploy anywhere, using any major framework, and it adds authentication, access controls, and real-time usage analytics on top of your app.

•Enterprise Authentication: Add enterprise-grade authentication with a single click without modifying your app’s source code.

•Custom Domains: Serve your app from a custom domain or subdomain for a professional appearance and better brand alignment.

•Real-time Analytics: Access a real-time analytics dashboard to understand how customers are using your app.

•Framework Flexibility: Build with any major framework or use Docker for maximum flexibility.

•VPC Deployment: Enhance security with Virtual Private Cloud deployment capabilities.

•IP Whitelisting: Restrict access to your app by allowing only specific IP ranges.

•Static IP: Deploy your app with a static IP for secure database connections and more.

•Stripe Integration: Quickly integrate payment processing into your app without extra coding.

Ploomber combines building and deploying data applications in a single platform, so you can focus on development. It supports both cloud and on-premises deployments and provides the security and hosting features listed above.
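To make the IP whitelisting feature concrete, the sketch below shows how a range check of this kind can be written with Python's standard `ipaddress` module. It illustrates the general technique only and is not Ploomber's implementation; the function name `is_allowed` and the example ranges (taken from reserved documentation address blocks) are assumptions.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Return True if `client_ip` falls inside any CIDR range in `allowed_ranges`.

    Hypothetical helper for illustration; not part of any Ploomber API.
    """
    addr = ipaddress.ip_address(client_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_ranges]
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed allowlists are safe to check this way.
    return any(addr in net for net in networks)


# Example allowlist: one /24 office range and one single host.
ALLOWED = ["203.0.113.0/24", "198.51.100.7/32"]

print(is_allowed("203.0.113.42", ALLOWED))  # True
print(is_allowed("198.51.100.7", ALLOWED))  # True
print(is_allowed("198.51.100.8", ALLOWED))  # False
```

In a hosted platform this check would normally run at the load balancer or proxy layer rather than inside the application, which is why it requires no changes to the app's source code.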

3.5k stars · 262 forks · 33 contributors · Last commit 1y ago

Towhee is an open-source machine learning pipeline framework that encodes unstructured data into embeddings. It aims to make neural data processing pipelines simple and fast, so you can focus on your core tasks rather than on data processing details.

•Easy to Use: Use Towhee's Python API to prototype a pipeline, then let Towhee automatically optimize it for production environments.

•Various Modalities: From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities.

•Blazing Fast: Towhee provides end-to-end pipeline optimizations, covering everything from data decoding/encoding to model inference, which the project says can make pipeline execution 10x faster.

•SOTA Models: Towhee provides 700+ pre-trained embedding models spanning 5 fields (CV, NLP, Multimodal, Audio, Medical), 15 tasks, and 140+ model architectures, including BERT, CLIP, ViT, SwinTransformer, and data2vec.

•Fully Integrated with Ecosystems: Towhee provides out-of-the-box integration with your favorite libraries, tools, and frameworks, making development quick and easy.

•Pythonic API: Towhee includes a Pythonic method-chaining API for describing custom data processing pipelines. It also supports schemas, making processing unstructured data as easy as handling tabular data.

Towhee processes and encodes unstructured data into embeddings, combining state-of-the-art models with integrations for existing libraries, tools, and frameworks.
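The method-chaining, schema-based style mentioned above can be illustrated with a small self-contained sketch. This is not Towhee's API: the `Pipeline` class, its `map` and `output` methods, and the toy `tokenize` and `embed` functions are all hypothetical stand-ins, and the "embedding" here is just two counts rather than a model output.

```python
from typing import Any, Callable, List, Tuple


class Pipeline:
    """Hypothetical mini-pipeline: each step reads one named column and
    writes another, so data flows through a simple schema of column names."""

    def __init__(self, input_col: str) -> None:
        self._input_col = input_col
        self._output_col = input_col
        self._steps: List[Tuple[str, str, Callable[[Any], Any]]] = []

    def map(self, src: str, dst: str, fn: Callable[[Any], Any]) -> "Pipeline":
        """Add a step that computes column `dst` from column `src`."""
        self._steps.append((src, dst, fn))
        return self  # returning self is what makes chaining possible

    def output(self, col: str) -> "Pipeline":
        """Choose which column the pipeline returns."""
        self._output_col = col
        return self

    def __call__(self, value: Any) -> Any:
        row = {self._input_col: value}
        for src, dst, fn in self._steps:
            row[dst] = fn(row[src])
        return row[self._output_col]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def embed(tokens: List[str]) -> List[int]:
    # Toy stand-in for a model: [token count, total characters].
    return [len(tokens), sum(len(t) for t in tokens)]


p = (
    Pipeline("text")
    .map("text", "tokens", tokenize)
    .map("tokens", "vec", embed)
    .output("vec")
)

print(p("Towhee encodes unstructured data"))  # [4, 29]
```

Naming the columns at each step is what gives the pipeline a tabular feel: any later step can read any earlier column, not only the output of the step directly before it.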

AWS SageMaker alternatives.

Cortex

Ploomber

Zilliz's Towhee