clip-vit-large-patch14-336 is a vision-language model developed by OpenAI as part of the CLIP (Contrastive Language–Image Pre-training) family. It pairs a ViT-L/14 Vision Transformer image encoder, which splits each 336×336 input image into 14×14-pixel patches, with a Transformer text encoder to learn joint representations of images and text. Although the training data has not been publicly released, the model was trained with a contrastive objective on a large web-scale collection of image-text pairs, aligning visual and textual features in a shared embedding space. This enables zero-shot image classification, text-based image search, and image-text retrieval without any task-specific training.
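
Zero-shot classification works by scoring an image against a set of candidate text prompts and picking the best match. The following is a minimal sketch using the Hugging Face Transformers library; the checkpoint id openai/clip-vit-large-patch14-336 is the published Hugging Face model, while the example image URL and candidate labels are illustrative placeholders.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the published checkpoint and its paired processor.
model_id = "openai/clip-vit-large-patch14-336"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Illustrative example image and candidate labels (placeholders).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# The processor resizes the image to 336x336 and tokenizes the prompts.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability distribution over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the label set is plain text, it can be changed freely at inference time, which is what makes the classification zero-shot.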

Features

  • Vision Transformer architecture with 336×336 input resolution
  • Supports zero-shot image classification and retrieval
  • Joint image-text embedding space for multi-modal retrieval and similarity tasks (see the retrieval sketch after this list)
  • Compatible with Hugging Face Transformers and PyTorch
  • Fine-tunable for domain-specific vision-language tasks
  • Serves as a base for fine-tuned adapters and downstream vision-language applications
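
For retrieval, images and text are embedded separately into the shared space and compared by cosine similarity. The sketch below assumes a few local image files (photo1.jpg, etc.) purely for illustration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14-336"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Hypothetical local files standing in for an image collection.
image_paths = ["photo1.jpg", "photo2.jpg", "photo3.jpg"]
images = [Image.open(p) for p in image_paths]
query = "a sunset over the ocean"

with torch.no_grad():
    # Encode images and the text query into the same embedding space.
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)

# L2-normalize so that the dot product equals cosine similarity.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = (text_embeds @ image_embeds.T).squeeze(0)

best = scores.argmax().item()
print(f"Best match for '{query}': {image_paths[best]} (score {scores[best]:.3f})")
```

The same pattern scales to larger collections by precomputing and storing the normalized image embeddings, then embedding only the query at search time.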

Categories

AI Models

Additional Project Details

Registered

2025-07-01