Qwen/Qwen3.5-0.8B-GGUF
Alibaba Qwen3.5 sub-1B multimodal model. Text+image+video understanding with 262K context. Apache 2.0. Built for lightweight agentic assistants.
Alibaba Qwen3.5 2B edge-optimized model. Hybrid Gated DeltaNet+Attention architecture, 256K context, Apache 2.0. Built for tool-calling agents and multimodal workflows.
Microsoft Phi-4-mini distilled for edge reasoning. 3.8B params, 128K context, MIT license. Optimized for agentic tool-calling and multilingual tasks.
google/gemma-4-E4B-it
Google DeepMind multimodal instruction model. 4.5B effective params, 128K context, text+image+audio. Native function calling, configurable thinking modes, Apache 2.0.
To get started, install the `transformers` library:
```
pip install transformers
```

Then, use the following snippet to load the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-4-E4B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Your inference code here...
```

google/gemma-4-E4B-base
Knowledge Distillation (Logits)
Flickr30k (Conceptual)
Multimodal Generation
| Metric | Student Model | Teacher Model |
|---|---|---|
| Model Size | 6.1 GB | 8.5 GB |
| BLEU Score | 28.5 | 30.1 |
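The table above compares a distilled student against its teacher. As a rough illustration of the logit-based knowledge distillation named above, here is a minimal pure-Python sketch: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence loss. The logits, vocabulary size, and temperature value below are hypothetical, not taken from the models listed here.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    The T^2 factor is the usual convention that keeps gradient
    magnitudes comparable as the temperature changes.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical per-token logits over a tiny 4-entry vocabulary.
teacher = [4.0, 1.5, 0.5, -1.0]
student = [3.0, 2.0, 0.0, -0.5]
print(round(distillation_loss(student, teacher), 4))
```

A higher temperature softens both distributions, exposing the teacher's relative preferences among low-probability tokens; in practice this soft loss is usually mixed with the standard cross-entropy on ground-truth labels.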