NVIDIA Nemotron-3 Nano 4B

nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF

A 4B-parameter model from NVIDIA, distilled from Nemotron-3-Large and optimized for low-latency, on-device text generation.

How to Use

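To get started, install the `transformers` library, along with `torch` and `gguf`, which `transformers` needs in order to read GGUF checkpoints:

pip install transformers torch gguf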

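Then, use the following snippet to load the model. Because this repository holds GGUF files, pass the name of the file to load via `gguf_file`:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF"
# Placeholder: replace with the actual .gguf filename listed in the repository.
gguf_file = "<model-file>.gguf"

# Both calls take gguf_file so that transformers reads the GGUF checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

# Your inference code here...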
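
A minimal sketch of what the inference step might look like, assuming the checkpoint loads through `transformers` at all (this card does not confirm that the architecture is supported by the GGUF loader). The helper name `generate_text` and the default `max_new_tokens=64` are illustrative assumptions, and `gguf_file` is left as a parameter because the actual filename is not given here.

```python
def generate_text(prompt, model_id, gguf_file, max_new_tokens=64):
    """Load a GGUF checkpoint with transformers and return the generated text.

    gguf_file must name a .gguf file that really exists in the repository;
    it is intentionally a parameter rather than a hard-coded guess.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

    # Tokenize the prompt into PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_text("Hello", "nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF", gguf_file=...)` with the real filename would download the weights on first use, so expect the first call to be slow.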

Available Versions

Tag / Variant	Size	Format	Download
nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M	2.8 GB	GGUF	Link
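
The tag in the table joins a repository id and a quantization label with a colon. The sketch below splits such a tag, assuming that `repo:quant` convention holds for every variant; `split_variant_tag` is a hypothetical helper, not part of any library.

```python
def split_variant_tag(tag):
    """Split 'owner/repo:QUANT' into (repo_id, quant); quant is None if absent."""
    repo_id, sep, quant = tag.rpartition(":")
    if not sep:
        # No colon present: the whole string is the repository id.
        return tag, None
    return repo_id, quant


repo_id, quant = split_variant_tag("nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M")
print(repo_id)  # nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF
print(quant)    # Q4_K_M
```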

Model Details

Attribute	Value
Teacher Model	Nemotron-3-Large
Distillation Method	Knowledge Distillation (Logits)
Training Dataset	Flickr30k (Conceptual)
Primary Task	Multimodal Generation
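
Logit-based knowledge distillation typically trains the student to match the teacher's softened output distribution. This card does not state the loss or its hyperparameters, so the following is only a generic sketch of the common temperature-scaled KL formulation; the temperature of 2.0 and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is scaled by temperature**2, a common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Made-up logits for a three-token vocabulary.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))
```

The loss is zero when the student's logits reproduce the teacher's distribution and grows as the two diverge. In practice it is often mixed with a standard cross-entropy term on ground-truth labels.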

Performance Metrics (Example)

Metric	Student Model	Teacher Model
Model Size	2.8 GB	8.5 GB
BLEU Score	28.5	30.1
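
To put the table in relative terms, the short calculation below derives the size reduction and the share of the teacher's BLEU score that the student retains. It uses only the example figures from the table; the variable names are illustrative.

```python
# Example figures taken from the table above.
student = {"size_gb": 2.8, "bleu": 28.5}
teacher = {"size_gb": 8.5, "bleu": 30.1}

size_reduction = 1 - student["size_gb"] / teacher["size_gb"]
compression = teacher["size_gb"] / student["size_gb"]
bleu_retained = student["bleu"] / teacher["bleu"]

print(f"Size reduction: {size_reduction:.1%}")  # 67.1%
print(f"Compression:    {compression:.2f}x")    # 3.04x
print(f"BLEU retained:  {bleu_retained:.1%}")   # 94.7%
```

On these example numbers, the student is roughly a third of the teacher's size while giving up 1.6 BLEU points.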