MobileTransformers is a lightweight, modular framework built on ONNX Runtime for running and adapting large language models (LLMs) directly on mobile and edge devices. It supports on-device parameter-efficient fine-tuning (PEFT), efficient inference, quantization, weight merging, and direct inference from merged models. It also includes advanced generation techniques such as Retrieval-Augmented Generation (RAG) backed by vector databases, and KV-cache with embedding reuse. The framework also provides export scripts that convert custom Hugging Face small and large language models (SLMs/LLMs) for on-device deployment with custom PEFT methods.
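
To make "weight merging" concrete, here is a minimal sketch of what merging an adapter into a base weight can look like. It assumes a LoRA-style adapter, which is only one possible PEFT method and is not confirmed by the description above. The names `matmul`, `merge_lora`, and `apply` are hypothetical helpers written for this illustration; they are not part of the MobileTransformers API.

```python
# Illustrative only: assumes a LoRA-style adapter (W' = W + (alpha / r) * B @ A).
# All function names here are hypothetical, not MobileTransformers APIs.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the base weight and return the merged matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape: (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Toy 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # (rank, in_features)
lora_b = [[0.5], [1.0]]      # (out_features, rank)
alpha, rank = 2.0, 1

merged = merge_lora(weight, lora_a, lora_b, alpha, rank)

# Unmerged path: base output plus the scaled adapter output.
x = [1.0, 1.0]
adapter_out = apply(lora_b, apply(lora_a, x))
unmerged = [b + (alpha / rank) * a for b, a in zip(apply(weight, x), adapter_out)]

# After merging, a single matrix-vector product gives the same result.
assert apply(merged, x) == unmerged
```

Under this assumption, the merged matrix replaces the base weight and the adapter is no longer needed at inference time, which is the usual reason for running inference directly from a merged model.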