add pc commandline mls and full multimodal android app #3167

Open: wants to merge 13 commits into base: master
9 changes: 9 additions & 0 deletions README.md
@@ -4,6 +4,15 @@

[MNN Homepage](http://www.mnn.zone)

## News 🔥
- [2025/01/23] We released our full multimodal LLM Android App: [MNN-LLM-Android](./project/android/apps/MnnLlmApp/README.md), covering text-to-text, image-to-text, audio-to-text, and text-to-image generation.
<p align="center">
<img width="20%" alt="Icon" src="./project/android/apps/MnnLlmApp/assets/image_home.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./project/android/apps/MnnLlmApp/assets/image_diffusion.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./project/android/apps/MnnLlmApp/assets/image_sound.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./project/android/apps/MnnLlmApp/assets/image_image.jpg" style="margin: 0 10px;">
</p>

## Intro
MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for inference and training on-device. At present, MNN has been integrated into more than 30 apps of Alibaba Inc, such as Taobao, Tmall, Youku, DingTalk, Xianyu, etc., covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity distribution, security risk control. In addition, MNN is also used on embedded devices, such as IoT.

89 changes: 89 additions & 0 deletions project/android/apps/MnnLlmApp/README.md
@@ -0,0 +1,89 @@
# MNN-LLM Android App
[Chinese Version](./README_CN.md)
## Introduction
This is our full multimodal large language model (LLM) Android app.

<p align="center">
<img width="20%" alt="Icon" src="./assets/image_home.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_diffusion.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_sound.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_image.jpg" style="margin: 0 10px;">
</p>


### Features

+ **Multimodal Support:** Enables functionality across diverse tasks, including text-to-text, image-to-text, audio-to-text, and text-to-image generation (via diffusion models).

+ **CPU Inference Optimization:** MNN-LLM demonstrates exceptional performance in CPU benchmarks on Android, achieving prefill speedups of 8.6x over llama.cpp and 20.5x over fastllm, with decoding speeds 2.3x and 8.9x faster, respectively. The following compares llama.cpp and MNN-LLM running Qwen-7B inference on Android.
<p align="center">
<img width="60%" src="./assets/compare.gif" style="margin: 0 10px;">
</p>

+ **Broad Model Compatibility:** Supports multiple leading model providers, such as Qwen, Gemma, Llama (including TinyLlama and MobileLLM), Baichuan, Yi, DeepSeek, InternLM, Phi, ReaderLM, and Smolm.

+ **Privacy First:** Runs entirely on-device, ensuring complete data privacy with no information uploaded to external servers.


# How to Use
+ You can download the app from [Releases](#releases) or [build it yourself](#development).
+ After installing the application, you can browse all supported models, download them, and interact with them directly within the app.
+ Additionally, you can access your chat history in the sidebar and revisit previous conversations seamlessly.

!!!warning!!! This version has been tested only on the OnePlus 13 and Xiaomi 14 Ultra; its stability on other devices cannot be guaranteed. Because of the demanding performance requirements of large language models (LLMs), many budget or low-spec devices may experience slow inference, application instability, or outright failure to run. If you encounter any issues, please feel free to open an issue for assistance.


# Development
+ Clone the repository:
```shell
git clone https://github.com/alibaba/MNN.git
```
+ Build library:
```shell
cd project/android
mkdir build_64
../build_64.sh "-DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_USE_LOGCAT=true -DMNN_OPENCL=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true -DLLM_SUPPORT_AUDIO=true -DMNN_BUILD_AUDIO=true -DMNN_BUILD_DIFFUSION=ON -DMNN_SEP_BUILD=ON"
```
+ Copy the built libraries into the LLM Android app project:
```shell
find . -name "*.so" -exec cp {} ../apps/MnnLlmApp/app/src/main/jniLibs/arm64-v8a/ \;
```
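The `find ... -exec cp {} DEST \;` step above copies every shared library produced by the build into the app's `jniLibs` directory, while leaving everything else behind. As a hedged, self-contained illustration of that pattern (using throwaway temp directories and placeholder file names rather than the real `build_64`/`jniLibs` paths):

```shell
# Sandbox demo of the `find ... -exec cp {} DEST \;` pattern used above.
# The file names here are placeholders, not the actual build outputs.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/libMNN.so" "$src/libexample.so" "$src/build.log"

# Copy only the shared libraries, exactly as in the step above:
find "$src" -name "*.so" -exec cp {} "$dst"/ \;

ls "$dst"   # only the .so files appear; build.log is left behind
rm -rf "$src" "$dst"
```

Running `find` from the build directory rather than `cp *.so` also picks up libraries in nested subdirectories, which is why the README uses it here.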
+ Build the Android app project and install it:
```shell
cd ../apps/MnnLlmApp/
./gradlew installDebug
```

# Releases
## Version 0.1
+ Click here to [download](https://meta.alicdn.com/data/mnn/mnn_llm_app_debug_0_1.apk)
+ This is our first publicly released version. You can:
  + search all supported models, and download and chat with them in the app;
  + use the diffusion model:
    + stable-diffusion-v1-5
  + use the audio model:
    + qwen2-audio-7b
  + use the visual models:
    + qwen-vl-chat
    + qwen2-vl-2b
    + qwen2-vl-7b




# About MNN-LLM
MNN-LLM is a versatile inference framework designed to optimize and accelerate the deployment of large language models on both mobile devices and local PCs, addressing challenges such as high memory consumption and computational cost through innovations including model quantization, hybrid storage, and hardware-specific optimizations. In CPU benchmarks, MNN-LLM excels, achieving prefill speedups of 8.6x over llama.cpp and 20.5x over fastllm, complemented by decoding speeds 2.3x and 8.9x faster, respectively. In GPU-based assessments, MNN-LLM's performance declines slightly relative to MLC-LLM, particularly when using Qwen2-7B with shorter prompts, due to MLC-LLM's advantageous symmetric quantization technique; it still achieves up to 25.3x faster prefill and 7.1x faster decoding than llama.cpp, and 2.8x and 1.7x improvements over MLC-LLM, respectively.
For more detailed information, please refer to the paper: [MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices](https://dl.acm.org/doi/pdf/10.1145/3700410.3702126)


# Acknowledgements
This project is built upon the following open-source projects:

+ [progress-dialog](https://github.com/techinessoverloaded/progress-dialog)
+ [okhttp](https://github.com/square/okhttp)
+ [retrofit](https://github.com/square/retrofit)
+ [Android-SpinKit](https://github.com/ybq/Android-SpinKit)
+ [expandable-fab](https://github.com/nambicompany/expandable-fab)
+ [Android-Wave-Recorder](https://github.com/squti/Android-Wave-Recorder)
86 changes: 86 additions & 0 deletions project/android/apps/MnnLlmApp/README_CN.md
@@ -0,0 +1,86 @@
# MNN LLM Android App

## Introduction
This is our full multimodal language model (LLM) Android app.

<p align="center">
<img width="20%" alt="Icon" src="./assets/image_home.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_diffusion.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_sound.jpg" style="margin: 0 10px;">
<img width="20%" alt="Icon" src="./assets/image_image.jpg" style="margin: 0 10px;">
</p>


### Feature Highlights

+ **Multimodal Support:** Offers a range of tasks, including text-to-text, image-to-text, audio-to-text, and text-to-image generation (via diffusion models).

+ **CPU Inference Optimization:** On Android, MNN-LLM delivers outstanding CPU performance, with prefill speeds 8.6x faster than llama.cpp and 20.5x faster than fastllm, and decoding speeds 2.3x and 8.9x faster, respectively. The figure below compares llama.cpp and MNN-LLM.
<p align="center">
<img width="60%" src="./assets/compare.gif" style="margin: 0 10px;">
</p>


+ **Broad Model Compatibility:** Supports many leading model providers, including Qwen, Gemma, Llama (covering TinyLlama and MobileLLM), Baichuan, Yi, DeepSeek, InternLM, Phi, ReaderLM, and Smolm.

+ **Runs Locally:** Runs entirely on-device, ensuring data privacy; no information is uploaded to external servers.


# Usage
+ You can download the app from [Releases](#releases) or [build it yourself](#development).
+ After installing the app, you can browse all supported models, download the ones you need, and interact with them directly in the app.
+ In addition, you can access your chat history from the sidebar and easily review and manage previous conversations.

!!!warning!!! This version has so far been tested only on the OnePlus 13 and Xiaomi 14 Ultra. Because large language models (LLMs) place heavy demands on device performance, many low-spec devices may encounter slow inference, application instability, or outright failure to run; stability on other devices cannot be guaranteed. If you run into problems while using the app, please feel free to open an issue for help.


# Development
+ Clone the repository:
```shell
git clone https://github.com/alibaba/MNN.git
```
+ Build the library:
```shell
cd project/android
mkdir build_64
../build_64.sh "-DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_USE_LOGCAT=true -DMNN_OPENCL=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true -DLLM_SUPPORT_AUDIO=true -DMNN_BUILD_AUDIO=true -DMNN_BUILD_DIFFUSION=ON -DMNN_SEP_BUILD=ON"
```
+ Copy the built libraries into the LLM Android app project:
```shell
find . -name "*.so" -exec cp {} ../apps/MnnLlmApp/app/src/main/jniLibs/arm64-v8a/ \;
```
+ Build the Android app project and install it:
```shell
cd ../apps/MnnLlmApp/
./gradlew installDebug
```

# Releases
## Version 0.1
+ Click here to [download](https://meta.alicdn.com/data/mnn/mnn_llm_app_debug_0_1.apk)
+ This is our first publicly released version. You can:
  + search all supported models, and download and chat with them in the app;
  + use the diffusion model:
    + stable-diffusion-v1-5
  + use the audio model:
    + qwen2-audio-7b
  + use the visual models:
    + qwen-vl-chat
    + qwen2-vl-2b
    + qwen2-vl-7b




# About MNN-LLM
MNN-LLM is a versatile inference framework designed to optimize and accelerate the deployment of large language models on mobile devices and local PCs. Through innovations such as model quantization, hybrid storage, and hardware-specific optimizations, it addresses challenges like high memory consumption and computational cost. In CPU benchmarks, MNN-LLM performs exceptionally well: its prefill speed is 8.6x faster than llama.cpp and 20.5x faster than fastllm, while decoding is 2.3x and 8.9x faster, respectively. In GPU-based evaluations, MNN-LLM's performance declines slightly on shorter prompts with Qwen2-7B due to MLC-LLM's advantageous symmetric quantization technique; it still achieves prefill 25.3x faster and decoding 7.1x faster than llama.cpp, and 2.8x and 1.7x improvements over MLC-LLM, respectively. For more detailed information, please refer to the paper: [MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices](https://dl.acm.org/doi/pdf/10.1145/3700410.3702126)


# Acknowledgements
This project is built upon the following open-source projects:
+ [progress-dialog](https://github.com/techinessoverloaded/progress-dialog)
+ [okhttp](https://github.com/square/okhttp)
+ [retrofit](https://github.com/square/retrofit)
+ [Android-SpinKit](https://github.com/ybq/Android-SpinKit)
+ [expandable-fab](https://github.com/nambicompany/expandable-fab)
+ [Android-Wave-Recorder](https://github.com/squti/Android-Wave-Recorder)
1 change: 1 addition & 0 deletions project/android/apps/MnnLlmApp/app/.gitignore
@@ -0,0 +1 @@
/build
85 changes: 85 additions & 0 deletions project/android/apps/MnnLlmApp/app/build.gradle
@@ -0,0 +1,85 @@
plugins {
id 'com.android.application'
id 'org.jetbrains.kotlin.android'
}

android {
namespace 'com.alibaba.mnnllm.android'
compileSdk 34

defaultConfig {
applicationId "com.alibaba.mnnllm.android"
minSdk 26
targetSdk 34
versionCode 1
versionName "0.1"

testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
externalNativeBuild {
cmake {
cppFlags '-std=c++17'
}
}

ndk {
//noinspection ChromeOsAbiSupport
abiFilters "arm64-v8a" // Include only arm64-v8a
}
}

// buildTypes, compileOptions, and kotlinOptions are android-level blocks;
// nesting them inside defaultConfig is invalid and breaks the Gradle sync.
buildTypes {
release {
minifyEnabled false
proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
}
}
compileOptions {
sourceCompatibility JavaVersion.VERSION_1_8
targetCompatibility JavaVersion.VERSION_1_8
}
kotlinOptions {
jvmTarget = "1.8"
}
externalNativeBuild {
cmake {
path file('src/main/cpp/CMakeLists.txt')
version '3.22.1'
}
}
buildFeatures {
viewBinding true
}
}

dependencies {
// implementation 'uk.co.chrisjenx:calligraphy:2.2.0'
implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk8"
implementation 'androidx.appcompat:appcompat:1.7.0'
implementation 'com.google.android.material:material:1.12.0'
implementation 'androidx.constraintlayout:constraintlayout:2.2.0'
implementation 'com.github.techinessoverloaded:progress-dialog:1.5.1'
implementation 'com.squareup.okhttp3:okhttp:4.12.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.okhttp3:logging-interceptor:4.9.3'
implementation 'com.squareup.retrofit2:converter-scalars:2.9.0'
implementation 'com.github.ybq:Android-SpinKit:1.4.0'
implementation 'com.nambimobile.widgets:expandable-fab:1.2.1'
implementation 'com.github.squti:Android-Wave-Recorder:2.0.1'
testImplementation 'junit:junit:4.13.2'
androidTestImplementation 'androidx.test.ext:junit:1.2.1'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.6.1'
}
21 changes: 21 additions & 0 deletions project/android/apps/MnnLlmApp/app/proguard-rules.pro
@@ -0,0 +1,21 @@
# Add project specific ProGuard rules here.
# You can control the set of applied configuration files using the
# proguardFiles setting in build.gradle.
#
# For more details, see
# http://developer.android.com/guide/developing/tools/proguard.html

# If your project uses WebView with JS, uncomment the following
# and specify the fully qualified class name to the JavaScript interface
# class:
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
# public *;
#}

# Uncomment this to preserve the line number information for
# debugging stack traces.
#-keepattributes SourceFile,LineNumberTable

# If you keep the line number information, uncomment this to
# hide the original source file name.
#-renamesourcefileattribute SourceFile
@@ -0,0 +1,26 @@
package com.alibaba.mnnllm.android.demo;

import android.content.Context;

import androidx.test.platform.app.InstrumentationRegistry;
import androidx.test.ext.junit.runners.AndroidJUnit4;

import org.junit.Test;
import org.junit.runner.RunWith;

import static org.junit.Assert.*;

/**
* Instrumented test, which will execute on an Android device.
*
* @see <a href="http://d.android.com/tools/testing">Testing documentation</a>
*/
@RunWith(AndroidJUnit4.class)
public class ExampleInstrumentedTest {
@Test
public void useAppContext() {
// Context of the app under test.
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
// Must match the applicationId from build.gradle, not the test's own package.
assertEquals("com.alibaba.mnnllm.android", appContext.getPackageName());
}
}
57 changes: 57 additions & 0 deletions project/android/apps/MnnLlmApp/app/src/main/AndroidManifest.xml
@@ -0,0 +1,57 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools">
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.FOREGROUND_SERVICE"/>
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC"/>
<uses-permission android:name="android.permission.POST_NOTIFICATIONS"/>
<uses-permission android:name="android.permission.READ_MEDIA_IMAGES"
tools:ignore="SelectedPhotoAccess" />
<uses-permission android:name="android.permission.READ_MEDIA_AUDIO"/>
<uses-permission android:name="android.permission.RECORD_AUDIO" />

<application
android:allowBackup="true"
android:icon="@drawable/ic_launcher"
android:label="@string/app_name"
android:roundIcon="@drawable/ic_launcher"
android:supportsRtl="true"
android:name="com.alibaba.mnnllm.android.DemoApplication"
android:theme="@style/Theme.MnnLlmApp"
tools:targetApi="31">
<uses-native-library android:name="libOpenCL.so"
android:required="true"/>
<provider
android:name="androidx.core.content.FileProvider"
android:authorities="${applicationId}.fileprovider"
android:exported="false"
android:grantUriPermissions="true">
<meta-data
android:name="android.support.FILE_PROVIDER_PATHS"
android:resource="@xml/file_paths" />
</provider>
<activity android:name="com.alibaba.mnnllm.android.chat.ChatActivity"
android:exported="true"
android:configChanges="orientation|screenSize"
>
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.DEFAULT" />
</intent-filter>
</activity>

<service android:name="com.alibaba.mls.api.download.DownlodForegroundService"
android:foregroundServiceType="dataSync"
/>

<activity android:name=".MainActivity"
android:exported="true"
android:configChanges="orientation|screenSize">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
</activity>
</application>

</manifest>