A powerful tool for creating datasets for LLM fine-tuning
This dataset code generates mathematical question and answer pairs
Photorealistic Synthetic Dataset for Holistic Indoor Scene
Unsplash images made available for research and machine learning
JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus
Passport Index 2023: visa requirements for 199 countries, in .csv
The ExDARK dataset is the largest collection of low-light images
The first large-scale public benchmark dataset for image harmonization
Framework to easily create LLM-powered bots over any dataset
GeoIP lookup over DAG-CBOR dataset loaded from IPFS
Hub of ready-to-use datasets for ML models
Fluid, elastic data abstraction and acceleration for BigData/AI apps
Julia implementation of a Parquet columnar file format reader
Dataset Management Framework, a Python library and a CLI tool to build
Tooling for the Common Objects In 3D dataset
Data and tools for generating and inspecting OLMo pre-training data
Unified open dataset enabling cross-embodiment learning for robotics
A dataset consisting of 15,140 ChatGPT prompts from Reddit
Automatically find issues in image datasets
A tool for semi-automatic cell type classification, harmonization
Image polygonal annotation with Python
Easily turn large sets of image URLs into an image dataset
Import public NYC taxi and for-hire vehicle (Uber, Lyft)
Synthetic data curation for post-training and data extraction
Save and load data in the HDF5 file format from Julia