Fine-Tuned Models: Private Customization

Fine-tune foundation models on proprietary data inside TEEs. Better accuracy, zero data leakage. Keep your training data, gradients, and custom weights encrypted with hardware-enforced privacy.

LoRA & PEFT
Multi-GPU training
Sealed checkpoints
Training attestations
Fast deployment
Gradient privacy
Why It Matters

Why Private Fine-Tuning Matters

Custom performance demands private corp data; Phala lets you use it safely.

Data security

Training data contains business secrets

Traditional cloud infrastructure exposes sensitive information to operators and administrators.

More Information
Confidential computing

Fine-tuned weights encode proprietary knowledge

Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.

More Information
Zero-trust architecture

Model gradients can leak training examples

End-to-end encryption protects data in transit, at rest, and critically during computation.

More Information
Attestation

Vendors should never see your data or weights

Cryptographic verification ensures code integrity and proves execution in genuine TEE hardware.

More Information

How It Works

Unsloth

End-to-end confidential fine-tuning with hardware attestation and encrypted artifacts.

🔐
Remote Attestation
TEE Verified & Keys Unsealed
📦
Load Base Model
Llama / Mistral / Qwen
🗂️
Encrypted Dataset
streamed privately
🔥Fine-Tuning LoopUnsloth
Apply QLoRA
Train with Unsloth
2× Faster · 70% Less VRAM
Optional DPO/GRPO
Safety Checks
PII, Toxicity, Bias
🧾
Export Encrypted LoRA
+ Attestation Report
🚀
Deploy on Phala TEE
OpenAI / HF endpoint

Fine-Tuning LLaMA 3 with Unsloth on Phala Cloud

7-Step Tutorial: Confidential fine-tuning with hardware attestation and encrypted artifacts

1

Environment Setup

Install Unsloth and Hugging Face libraries with GPU support

2

Loading Chat Dataset Securely

Mount and load encrypted fine-tuning dataset in conversational format

3

Loading LLaMA 3 with Unsloth

Load base model with 4-bit quantization and memory optimization

4

Applying LoRA Adapters

Add Low-Rank Adapters to attention and feed-forward layers

5

Fine-Tuning with TRL

Supervised fine-tuning using HuggingFace TRL SFTTrainer

6

Merging LoRA into FP16 Weights

Merge LoRA adapters into base model for deployment

7

Saving and Uploading Model

Push merged model to Hugging Face Hub for inference

# Install Unsloth and Hugging Face libraries
pip install unsloth transformers accelerate trl datasets

# (Optional) Ensure PyTorch 2.1 with CUDA 12.1 is installed for H200 GPU
pip install torch==2.1.0

# Verify that the GPU is accessible
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_properties(0).name)"

Industry-Leading Enterprise Compliance

Meeting the highest compliance requirements for your business

AICPA SOC 2ISO 27001CCPAGDPR
FAQ

Frequently Asked Questions

Everything you need to know about Private Fine-Tuning

1

What's the difference between full fine-tuning and LoRA fine-tuning?

Full fine-tuning updates all parameters of the base model, which requires huge compute and storage. LoRA (Low-Rank Adaptation) inserts small, trainable layers that capture the changes during fine-tuning, while keeping the base model frozen. On Phala, LoRA fine-tuning is the default because it's 10× faster, 10× cheaper, and can fit on a single GPU — yet can later be merged into full weights for deployment.

2

How does Unsloth improve fine-tuning performance?

Unsloth rewrites Transformer internals with custom Triton kernels, FlashAttention 2, and optimized quantization (4-bit QLoRA). It reduces VRAM use by up to 70% and increases training throughput by 1.5–2×. On Phala's H200 GPU TEEs, you can fine-tune large models like LLaMA 3 efficiently in real time.

3

Can I fine-tune large models like LLaMA 3 or Mistral with limited GPUs?

Yes. Phala Cloud allocates H200 or A100 enclaves with sufficient memory, and Unsloth's QLoRA compression lets 8B–13B models fine-tune comfortably on a single GPU. Multi-GPU distributed training is also supported via dstack orchestration.

4

How long does fine-tuning take on Phala Cloud?

It depends on model size and dataset volume. As a rule of thumb, an 8B model with 100k chat samples typically completes in 4–6 hours on a single H200, using LoRA fine-tuning. Full-weight merges add only a few minutes after training.

5

Is my training data encrypted during fine-tuning?

Yes. On Phala, all data is encrypted at rest, in transit, and in use. Your dataset is only decrypted inside a hardware Trusted Execution Environment (TEE) after remote attestation confirms the correct code. Even Phala's operators can't view your data.

6

Can others see or copy my fine-tuned model?

No. The entire training job runs in an isolated enclave, and the model artifacts are encrypted. Only you (the job owner) can export or share the resulting weights after attestation.

7

What compliance frameworks does Phala's training support?

Phala's confidential compute model aligns with the technical requirements of GDPR, HIPAA, and SOC 2. Remote attestation and audit logs provide verifiable proofs that your data was processed securely.

8

Does fine-tuning inside a TEE reduce model quality or performance?

Not in practice. The GPU TEE overhead is typically under 5%, and all kernels run natively on the H200 hardware. You get native performance with added privacy guarantees — no need to trade off accuracy for security.

9

Can I merge LoRA adapters into full weights for deployment?

Yes. After training, you can call save_pretrained_merged() in Unsloth to combine LoRA deltas with base weights, producing a full FP16 model ready for inference. This makes deployment easier — no LoRA adapter loading required.

10

Who owns the fine-tuned model after training?

You do. Phala acts only as the confidential runtime provider. The base model license (e.g., LLaMA's non-commercial license) still applies, but the fine-tuned derivative is your intellectual property.

11

How do I deploy my fine-tuned model?

You can deploy it directly to Phala's Inference TEEs, which expose OpenAI-compatible APIs. Alternatively, push it to Hugging Face Hub or your private registry, then run inference from your preferred stack.

12

Can I verify where and how the fine-tune was executed?

Yes. Each Phala fine-tune job generates a cryptographic attestation report — signed by the enclave hardware vendor — proving that your training ran on genuine secure hardware with a verified code base.

Start Private Fine-Tuning Today

Customize LLMs on your proprietary data with hardware-enforced confidentiality and zero-knowledge guarantees.

Deploy on Phala
  • LoRA/PEFT support
  • Multi-GPU training
  • Sealed checkpoints
  • Training attestations
  • 24/7 technical support