image-to-text

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.

Updated Feb 3, 2026
Python

Flame-Code-VLM / Flame-Code-VLM

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

react open-source front-end ai vue deep-learning frontend code-generation image-to-text vlm frontend-development multimodal data-synthesis design-to-code llm vision-language-model deepseek image-to-code screen-to-code

Updated Jan 27, 2026
Python

thanhkeke97 / RSTGameTranslation

🎮 Real-time Game Translation Tool | OCR + AI Translation | Windows Gaming | Open Source

game translator ocr translation image-to-text ocr-recognition autotranslate game-translation

Updated Feb 5, 2026
C#

HorizonWind2004 / reconstruction-alignment

[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.

image-editing image-generation unified image-to-text text-to-image diffusion bagel understanding multimodal comfy aigc comfyui

Updated Feb 5, 2026
Python

zapolnoch / node-tesseract-ocr

A Node.js wrapper for the Tesseract OCR API

ocr tesseract text-recognition image-to-text

Updated Jul 13, 2023
JavaScript

google / imageinwords

Data release for the ImageInWords (IIW) paper.

evaluation dataset image-captioning dataset-generation image-to-text image-descriptions image-text human-annotation t2i i2t detailed-descriptions detailed-annotations

Updated Nov 17, 2024
JavaScript

shoryasethia / markdrop

A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other functionality. Markdrop is available on PyPI.

open-source pdf-to-text image-to-text marker agents pypi-package table-to-text markitdown llm pdf-to-markdown docling markdrop

Updated Jul 5, 2025
Python

Yushi-Hu / tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

image-to-text text-to-image visual-question-answering large-language-models

Updated Apr 29, 2024
Python

NormXU / nougat-latex-ocr

Codebase for fine-tuning / evaluating nougat-based image2latex generation models

image-to-text

Updated Sep 25, 2024
Python

yardstick17 / image_text_reader

The module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or unevenly sized, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.

ocr tesseract-ocr image-to-text image-reader read-image ocr-text-reader

Updated Apr 3, 2019
Python

nateshmbhat / card-scanner-flutter

A Flutter package for fast, accurate, and secure credit card and debit card scanning

dart card-scanning ai image-processing ml credit-card flutter image-to-text debit-card credit-card-scaning image-re card-scanner card-scanner-library

Updated Nov 5, 2025
Swift

BEPb / image_to_ascii

Everything is very simple: you either download a picture file or specify its link when running a Python script. As output you get a text file, and you can immediately preview the result of the conversion on the command line.

python converter script convert conversion python3 cmd image-to-text py cmdline video-to-text image-to gif-to-ascii

Updated Apr 13, 2023
Python

NanoNets / ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

python pdf ocr tesseract pdf-to-text image-to-text textract pdf-to-csv pdf-to-json searchable-pdf pytesseract-ocr extract-table table-extract image-to-text-converter extract-text-from-image extract-text-from-pdf

Updated Dec 2, 2022
Jupyter Notebook

mshdabiola / NotePad

Notepad is a multi-module Jetpack Compose note-taking app with a sketch pad, voice recorder, and image capture

kotlin android-app image-to-text modulization room-persistence-library room-database github-actions jetpack-compose hilt-dependency-injection kotin-coroutines

Updated Oct 27, 2025
Kotlin

MIMICLab / L-Verse

L-Verse: Bidirectional Generation Between Image and Text

deep-learning pytorch transformer image-captioning image-to-text text-to-image vq-vae pytorch-lightning l-verse

Updated Apr 1, 2025
Python

Cross2pro / DeepSeek-OCR-Dashboard

An out-of-the-box local Web UI for DeepSeek-OCR. Built with FastAPI + Vue.js, it supports PDF/Image uploads, progress tracking, and result visualization with bounding boxes. Easily experience the power of a top-tier OCR model.

ocr computer-vision text-recognition research-tool image-to-text optical-character-recognition document-analysis multimodal math-ocr latex-ocr llm large-language-model pdf-ocr deepseek deepseek-ocr ocr-webservice

Updated Dec 6, 2025
Python

untrix / im2latex

Solution to OpenAI's im2latex Request for Research

machine-learning computer-vision deep-learning neural-network tensorflow generative-model sequence-to-sequence image-to-text ocr-recognition encoder-decoder im2latex image-to-markup

Updated Apr 23, 2024
Jupyter Notebook

Here are 361 public repositories matching this topic...

thiagoalessio / tesseract-ocr-for-php

killkimno / MORT

lucidrains / CoCa-pytorch

PaddlePaddle / PaddleMIX
