Showing 187 open source projects for "text encoding"

  • 1
    Text Encoding Initiative

    TEI produces the TEI Guidelines and associated software

    The TEI is an international and interdisciplinary standard used by libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.
    Downloads: 6 This Week
  • 2
    Tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models

    tiktoken is a high-performance tokenizer library (based on byte-pair encoding, BPE) designed for use with OpenAI’s models. It encodes text to token IDs and decodes token IDs back to text efficiently, with minimal overhead. Because tokenization is a fundamental step in preparing text for models, tiktoken is optimized for speed, memory, and correctness in model contexts (e.g. matching OpenAI’s internal tokenization). The repo supports multiple encodings (e.g. ...
    Downloads: 3 This Week
  • 3
    Endroid QR Code

    QR Code Generator

    Endroid QR Code is a PHP library that allows developers to generate QR codes with customizable parameters. It supports creating QR codes in various formats, including PNG and SVG, and offers options for encoding URLs, text, or other data. The library is flexible and easy to integrate into applications that require QR code generation, such as ticketing systems or payment gateways.
    Downloads: 9 This Week
  • 4
    cryptii

    Web app and framework offering modular conversion and encoding

    Web app and framework offering modular conversion, encoding and encryption. Translations are done client-side without any server interaction. The framework and web app aim to support a wide variety of ciphers, formats, algorithms and methods (called 'Bricks') while keeping them easily combinable. There are currently two types of Bricks: Encoders and Viewers. Encoders manipulate content by encoding or decoding it in a specific way with specific settings, while Viewers let users access and edit the content fed into or output by Encoders in a certain way and format. ...
    Downloads: 2 This Week
  • 5
    httpexpect

    End-to-end HTTP and REST API testing for Go

    ...Basically, httpexpect is a set of chainable builders for HTTP requests and assertions for HTTP responses and payload, on top of net/http and several utility packages. URL path construction, with simple string interpolation provided by go-interpol package. URL query parameters (encoding using go-querystring package). Headers, cookies, payload: JSON, urlencoded or multipart forms (encoding using form package), plain text. Custom reusable request builders and request transformers. Type-specific assertions, supported types: object, array, string, number, boolean, null, datetime. Regular expressions. Simple JSON queries (using subset of JSONPath), provided by jsonpath package. ...
    Downloads: 1 This Week
  • 6
    PHP QR Code

    A PHP QR Code generator and reader with a user-friendly API

    chillerlan/php-qrcode is a modern, flexible PHP library for generating QR codes. It supports various customization options such as size, encoding, error correction, and logo embedding. The library is PSR-compliant and built for ease of use in modern PHP projects, making it suitable for generating QR codes in web apps, receipts, and authentication systems.
    Downloads: 10 This Week
  • 7
    Ksoup

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.
    Downloads: 4 This Week
  • 8
    zpdf

    Zero-copy PDF text extraction library written in Zig

    ...It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. It also understands both classic cross-reference tables and newer cross-reference streams, including PDF 1.5+ features, and it offers configurable strict vs permissive error handling depending on whether you prioritize correctness or robustness.
    Downloads: 1 This Week
  • 9
    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into token IDs and decode tokens back into text.
    Downloads: 0 This Week
  • 10
    SentencePiece

    Unsupervised text tokenizer for Neural Network-based text generation

    SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]) with the extension of direct training from raw sentences.
    Downloads: 7 This Week
  • 11
    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ...By default, ripgrep will respect your .gitignore and automatically skip hidden files, hidden directories, and binary files. ripgrep has first-class support on Windows, macOS and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep. ripgrep supports arbitrary input preprocessing filters, which could be PDF text extraction, decompression of less commonly supported formats, decrypting, automatic encoding detection and so on. In other words, use ripgrep if you like speed, filtering by default, fewer bugs and Unicode support.
    Downloads: 66 This Week
  • 12
    Render

    Go package for easily rendering JSON, XML, binary data, and HTML

    ...XML: Uses the encoding/xml package to marshal data into an XML-encoded response. Binary data: Passes the incoming data straight through to the http.ResponseWriter. Text: Passes the incoming string straight through to the http.ResponseWriter. Render comes with a variety of configuration options. By default, Render will attempt to load templates with a '.tmpl' extension from the "templates" directory.
    Downloads: 1 This Week
  • 13
    International Components for Unicode

    The home of the ICU project source code

    ...ICU is released under a nonrestrictive open-source license that is suitable for use with both commercial software and with other open-source or free software. Convert text data to or from Unicode and nearly any other character set or encoding. ICU's conversion tables are based on charset data collected by IBM over the course of many decades and are the most complete available anywhere. Compare strings according to the conventions and standards of a particular language, region or country. ICU's collation is based on the Unicode Collation Algorithm plus locale-specific comparison rules from the Common Locale Data Repository, a comprehensive source for this type of data.
    Downloads: 21 This Week
  • 14
    MIME Component

    Allows manipulating MIME messages

    ...It is commonly used for handling email content and attachments in Symfony applications. The component supports building complex email structures, including multi-part messages, and correctly encoding text and binary data. Mime is an essential part of Symfony Mailer but can be utilized independently to handle MIME message generation.
    Downloads: 0 This Week
  • 15
    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual tokenizers. ...
    Downloads: 0 This Week
  • 16
    Nette Utility Classes

    Lightweight utilities for string & array manipulation, image handling

    In the nette/utils package you will find a set of useful classes for everyday use: lightweight utilities for string & array manipulation, image handling, safe JSON encoding/decoding, validation, slug and strong password generation, etc.
    Downloads: 0 This Week
  • 17
    Yjs

    Shared data types for building collaborative software

    Yjs is a high-performance, open-source CRDT (Conflict-free Replicated Data Type) implementation for building collaborative, real-time applications. It enables multiple users to edit shared data structures—such as text documents, arrays, maps, and XML trees—synchronously and offline. Yjs is network-agnostic and works with WebRTC, WebSocket, or any other transport layer, making it ideal for collaborative editors, whiteboards, and design tools. Its compact updates and powerful reconciliation...
    Downloads: 3 This Week
  • 18
    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible...
    Downloads: 2 This Week
  • 19
    Kafdrop

    Kafka Web UI

    Kafdrop is a web UI for viewing Kafka topics and browsing consumer groups. The tool displays information such as brokers, topics, partitions, and consumers, and lets you view messages. This project is a reboot of Kafdrop 2.x, dragged kicking and screaming into the world of Java 17+, Kafka 2.x, Helm and Kubernetes. It's a lightweight application that runs on Spring Boot and is dead-easy to configure, supporting SASL and TLS-secured brokers.
    Downloads: 11 This Week
  • 20
    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 3 This Week
  • 21
    Loro

    Make your JSON data collaborative and version-controlled with CRDTs

    loro is a high-performance CRDT (Conflict-free Replicated Data Type) engine designed for building collaborative applications that sync in real time across multiple peers or devices. Written in Rust, loro is designed to be compact, fast, and embeddable in a wide range of environments, from desktop to mobile to web via WebAssembly. Its architecture supports multiple data types like text, maps, and lists, and it offers automatic conflict resolution with minimal data overhead. It is ideal for...
    Downloads: 0 This Week
  • 22
    Redpanda Console

    Redpanda Console is a developer-friendly UI for managing your workload

    ...Explore your topics' messages in our message viewer through ad-hoc queries and dynamic filters. Find any message you want using JavaScript functions to filter messages. Supported encodings are JSON, Avro, Protobuf, XML, MessagePack, Text and Binary (hex view). The encoding used (except Protobuf) is detected automatically. Redpanda is a Kafka®-compatible streaming data platform that is proven to be 10x faster and 6x lower in total costs. It is also JVM-free, ZooKeeper®-free, Jepsen-tested, and source available. A single binary with built-in everything. No ZooKeeper®. ...
    Downloads: 8 This Week
  • 23
    eslint-plugin-unicorn

    More than 100 powerful ESLint rules

    More than 100 powerful ESLint rules. You might want to check out XO, which includes this plugin. Each rule has emojis denoting whether it belongs to the recommended configuration, whether some problems reported by the rule are automatically fixable by the --fix command line option, and whether some problems reported by the rule are manually fixable by editor suggestions. Use a preset config or configure each rule in package.json.
    Downloads: 1 This Week
  • 24
    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 3 This Week
  • 25
    Mini QR

    Create & scan cute qr codes easily

    Mini QR is a web app focused on making QR codes feel friendly and design-forward, combining a polished QR generator with a built-in scanner so you can both create and decode codes in the same place. It emphasizes customization so the QR you generate can match a brand, event theme, or personal style, including color and styling controls, framed layouts with labels, and the ability to add a logo image. Because QR reliability matters as much as looks, it exposes practical settings like error...
    Downloads: 12 This Week