Fine-Tuned Models: Private Customization

Fine-tune foundation models on proprietary data inside TEEs. Better accuracy, zero data leakage. Keep your training data, gradients, and custom weights encrypted with hardware-enforced privacy.

Start Fine-Tuning Contact Us

LoRA & PEFT

Multi-GPU training

Sealed checkpoints

Training attestations

Fast deployment

Gradient privacy

Why It Matters

Why Private Fine-Tuning Matters

Custom performance demands private corp data; Phala lets you use it safely.

Training data contains business secrets

Traditional cloud infrastructure exposes sensitive information to operators and administrators.

More Information

Fine-tuned weights encode proprietary knowledge

Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.

More Information

Model gradients can leak training examples

End-to-end encryption protects data in transit, at rest, and critically during computation.

More Information

Vendors should never see your data or weights

Cryptographic verification ensures code integrity and proves execution in genuine TEE hardware.

More Information

How It Works

End-to-end confidential fine-tuning with hardware attestation and encrypted artifacts.

TEE Verified & Unsealed

QLoRA 2× Faster

Auto Safety Checks

Encrypted Export

Base Model Loaded

Dataset Streamed Privately

Optional DPO/GRPO

OpenAI Endpoint Ready

🔐

Remote Attestation

TEE Verified & Keys Unsealed

📦

Load Base Model

Llama / Mistral / Qwen

🗂️

Encrypted Dataset

streamed privately

🔥Fine-Tuning Loop Unsloth

Apply QLoRA

Train with Unsloth

2× Faster · 70% Less VRAM

Optional DPO/GRPO

Safety Checks

PII, Toxicity, Bias

🧾

Export Encrypted LoRA

+ Attestation Report

🚀

Deploy on Phala TEE

OpenAI / HF endpoint

Fine-Tuning LLaMA 3 with Unsloth on Phala Cloud

7-Step Tutorial: Confidential fine-tuning with hardware attestation and encrypted artifacts

Environment Setup

Install Unsloth and Hugging Face libraries with GPU support

Loading Chat Dataset Securely

Mount and load encrypted fine-tuning dataset in conversational format

Loading LLaMA 3 with Unsloth

Load base model with 4-bit quantization and memory optimization

Applying LoRA Adapters

Add Low-Rank Adapters to attention and feed-forward layers

Fine-Tuning with TRL

Supervised fine-tuning using HuggingFace TRL SFTTrainer

Merging LoRA into FP16 Weights

Merge LoRA adapters into base model for deployment

Saving and Uploading Model

Push merged model to Hugging Face Hub for inference

# Install Unsloth and Hugging Face libraries
pip install unsloth transformers accelerate trl datasets

# (Optional) Ensure PyTorch 2.1 with CUDA 12.1 is installed for H200 GPU
pip install torch==2.1.0

# Verify that the GPU is accessible
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_properties(0).name)"

Medical Care AI

HEALTHCARE / HIPAA COMPLIANCE

Fine-tune medical chatbots on patient conversations Private training on sensitive healthcare data with TEE isolation

Read case study

SaaS Sales AI

SALES / CRM TRAINING

Train sales assistants on conversation data Fine-tune on proprietary sales dialogues without data leakage

Read case study

HR Recruitment AI

HR / CONFIDENTIAL TRAINING

Fine-tune hiring models on resume data Private training on candidate information with compliance guarantees

Read case study

Industry-Leading Enterprise Compliance

Meeting the highest compliance requirements for your business

Explore Our Solutions

Discover how Phala Network enables privacy-preserving AI across different use cases

Private AI Data

Monetize and analyze sensitive data with TEEs and remote attestation—without exposing the raw data

Private AI Inference

Run AI models with end-to-end encryption to protect user inputs, outputs, and model IP

Fine-Tuned Models

Train and deploy custom AI models in secure enclaves while protecting your proprietary data

Confidential Training

Train AI models on sensitive data without exposing it, ensuring privacy and compliance

AI Agents

Build autonomous AI agents with cryptographic privacy guarantees for enterprise workflows

FAQ

Frequently Asked Questions

Everything you need to know about Private Fine-Tuning

What's the difference between full fine-tuning and LoRA fine-tuning?

Full fine-tuning updates all parameters of the base model, which requires huge compute and storage. LoRA (Low-Rank Adaptation) inserts small, trainable layers that capture the changes during fine-tuning, while keeping the base model frozen. On Phala, LoRA fine-tuning is the default because it's 10× faster, 10× cheaper, and can fit on a single GPU — yet can later be merged into full weights for deployment.

How does Unsloth improve fine-tuning performance?

Unsloth rewrites Transformer internals with custom Triton kernels, FlashAttention 2, and optimized quantization (4-bit QLoRA). It reduces VRAM use by up to 70% and increases training throughput by 1.5–2×. On Phala's H200 GPU TEEs, you can fine-tune large models like LLaMA 3 efficiently in real time.

Can I fine-tune large models like LLaMA 3 or Mistral with limited GPUs?

Yes. Phala Cloud allocates H200 or A100 enclaves with sufficient memory, and Unsloth's QLoRA compression lets 8B–13B models fine-tune comfortably on a single GPU. Multi-GPU distributed training is also supported via dstack orchestration.

How long does fine-tuning take on Phala Cloud?

It depends on model size and dataset volume. As a rule of thumb, an 8B model with 100k chat samples typically completes in 4–6 hours on a single H200, using LoRA fine-tuning. Full-weight merges add only a few minutes after training.

Is my training data encrypted during fine-tuning?

Yes. On Phala, all data is encrypted at rest, in transit, and in use. Your dataset is only decrypted inside a hardware Trusted Execution Environment (TEE) after remote attestation confirms the correct code. Even Phala's operators can't view your data.

Can others see or copy my fine-tuned model?

No. The entire training job runs in an isolated enclave, and the model artifacts are encrypted. Only you (the job owner) can export or share the resulting weights after attestation.

What compliance frameworks does Phala's training support?

Phala's confidential compute model aligns with the technical requirements of GDPR, HIPAA, and SOC 2. Remote attestation and audit logs provide verifiable proofs that your data was processed securely.

Does fine-tuning inside a TEE reduce model quality or performance?

Not in practice. The GPU TEE overhead is typically under 5%, and all kernels run natively on the H200 hardware. You get native performance with added privacy guarantees — no need to trade off accuracy for security.

Can I merge LoRA adapters into full weights for deployment?

Yes. After training, you can call save_pretrained_merged() in Unsloth to combine LoRA deltas with base weights, producing a full FP16 model ready for inference. This makes deployment easier — no LoRA adapter loading required.

Who owns the fine-tuned model after training?

You do. Phala acts only as the confidential runtime provider. The base model license (e.g., LLaMA's non-commercial license) still applies, but the fine-tuned derivative is your intellectual property.

How do I deploy my fine-tuned model?

You can deploy it directly to Phala's Inference TEEs, which expose OpenAI-compatible APIs. Alternatively, push it to Hugging Face Hub or your private registry, then run inference from your preferred stack.

Can I verify where and how the fine-tune was executed?

Yes. Each Phala fine-tune job generates a cryptographic attestation report — signed by the enclave hardware vendor — proving that your training ran on genuine secure hardware with a verified code base.

Start Private Fine-Tuning Today

Customize LLMs on your proprietary data with hardware-enforced confidentiality and zero-knowledge guarantees.

Deploy on Phala

LoRA/PEFT support
Multi-GPU training
Sealed checkpoints
Training attestations
24/7 technical support

Build AI People Can Trust.

Subscribe to our newsletter