Import public NYC taxi and for-hire vehicle (Uber, Lyft) trip data
An AI-powered data science team of agents
Simple tools for data cleaning in R
Links to everything you'd ever want to learn about data engineering
An end-to-end Data Scientist
Basic to intermediate Python data science guide
Analytics for developers; set up analytics in 30 seconds
CSV Lint, a Notepad++ plug-in for syntax highlighting
ExtractThinker is a Document Intelligence library for LLMs
The open source mesh processing system
Clean Jupyter notebooks of outputs, metadata, and empty cells
FDUPES is a program for identifying or deleting duplicate files
Data and tools for generating and inspecting OLMo pre-training data
Converts books written in Markdown to HTML, LaTeX/PDF and EPUB
Miller is like awk, sed, cut, join, and sort for name-indexed data
Java dataframe and visualization library
PandasAI is a Python library that integrates generative AI
Automated Tool for Optimized Modelling
Scan and remove junk files, caches, logs, and more
Scalable data preprocessing and curation toolkit for LLMs
Large Model Application Development in Practice, Part 1
Master the essential skills needed to recognize and solve problems
Cleans HTML to avoid XSS attacks
A natural language interface for computers
Jupyter notebooks that walk you through the fundamentals of ML