Fine-tune foundation models on proprietary data inside TEEs. Better accuracy, zero data leakage. Keep your training data, gradients, and custom weights encrypted with hardware-enforced privacy.
Domain-specific performance demands proprietary data; Phala lets you use it safely.

Traditional cloud infrastructure exposes sensitive information to operators and administrators.
Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.
End-to-end encryption protects data in transit, at rest, and, critically, during computation.
Cryptographic verification ensures code integrity and proves execution in genuine TEE hardware.

7-Step Tutorial: Confidential fine-tuning with hardware attestation and encrypted artifacts (an end-to-end code sketch follows the steps)
Install Unsloth and Hugging Face libraries with GPU support
Mount and load encrypted fine-tuning dataset in conversational format
Load base model with 4-bit quantization and memory optimization
Add Low-Rank Adapters to attention and feed-forward layers
Run supervised fine-tuning with Hugging Face TRL's SFTTrainer
Merge LoRA adapters into base model for deployment
Push merged model to Hugging Face Hub for inference
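
Mapped onto code, the seven steps follow Unsloth's standard QLoRA workflow. The sketch below is illustrative rather than a drop-in job: the base model name, dataset path, Hub repo id, token, and hyperparameters are placeholders to adapt to your own run.

    # Step 1 (shell): pip install unsloth
    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Step 2: load the dataset mounted inside the enclave (placeholder path).
    # Conversational records are assumed to be pre-rendered into a "text"
    # field, e.g. via the tokenizer's chat template.
    dataset = load_dataset("json", data_files="/data/train.jsonl", split="train")

    # Step 3: load a 4-bit quantized base model (placeholder name).
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Step 4: attach LoRA adapters to attention and feed-forward projections.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        lora_dropout=0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # Step 5: supervised fine-tuning with TRL's SFTTrainer.
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            num_train_epochs=1,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
    trainer.train()

    # Steps 6-7: merge the LoRA deltas into full FP16 weights, then push.
    model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
    model.push_to_hub_merged("your-org/your-model", tokenizer,
                             save_method="merged_16bit", token="hf_...")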

Meeting the highest compliance requirements for your business
Discover how Phala Network enables privacy-preserving AI across different use cases
Everything you need to know about Private Fine-Tuning
What's the difference between LoRA and full fine-tuning?
Full fine-tuning updates every parameter of the base model, which requires enormous compute and storage. LoRA (Low-Rank Adaptation) keeps the base model frozen and inserts small trainable layers that capture the changes during fine-tuning. On Phala, LoRA fine-tuning is the default because it's roughly 10× faster and 10× cheaper and fits on a single GPU, yet the adapters can later be merged into full weights for deployment.
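
The cost gap is easy to see from a back-of-the-envelope count of trainable parameters; the matrix size and rank below are illustrative, not Phala defaults.

    # One 4096x4096 weight matrix, LoRA rank 16 (illustrative numbers).
    d, k, r = 4096, 4096, 16

    full_ft = d * k      # full fine-tuning trains every weight: 16,777,216
    lora = r * (d + k)   # LoRA trains only B (d x r) and A (r x k): 131,072

    print(f"LoRA trains {full_ft // lora}x fewer parameters per matrix")  # 128x
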
What is Unsloth, and why is it faster?
Unsloth rewrites Transformer internals with custom Triton kernels, FlashAttention 2, and optimized 4-bit quantization (QLoRA). It cuts VRAM use by up to 70% and raises training throughput by 1.5–2×. On Phala's H200 GPU TEEs, this makes it practical to fine-tune large models like Llama 3 efficiently.
Can I fine-tune an 8B–13B model on a single GPU?
Yes. Phala Cloud allocates H200 or A100 enclaves with sufficient memory, and Unsloth's QLoRA compression lets 8B–13B models fine-tune comfortably on a single GPU. Multi-GPU distributed training is also supported via dstack orchestration.
How long does a fine-tuning job take?
It depends on model size and dataset volume. As a rule of thumb, LoRA fine-tuning an 8B model on 100k chat samples typically completes in 4–6 hours on a single H200. Merging to full weights adds only a few minutes after training.
Is my training data kept private?
Yes. On Phala, all data is encrypted at rest, in transit, and in use. Your dataset is decrypted only inside a hardware Trusted Execution Environment (TEE), and only after remote attestation confirms the expected code is running. Even Phala's operators can't view your data.
Can anyone else access my fine-tuned weights?
No. The entire training job runs in an isolated enclave, and the model artifacts are encrypted. Only you (the job owner) can export or share the resulting weights after attestation.
Does this help with compliance requirements like GDPR, HIPAA, or SOC 2?
Phala's confidential compute model aligns with the technical requirements of GDPR, HIPAA, and SOC 2. Remote attestation and audit logs provide verifiable proof that your data was processed securely.
Does running in a TEE slow down training?
Not in practice. GPU TEE overhead is typically under 5%, and all kernels run natively on the H200 hardware. You get near-native performance with added privacy guarantees, and model accuracy is unaffected.
Can I merge the LoRA adapters into the base model after training?
Yes. After training, call save_pretrained_merged() in Unsloth to combine the LoRA deltas with the base weights, producing a full FP16 model ready for inference. This simplifies deployment: no LoRA adapter loading is required.
Who owns the fine-tuned model?
You do. Phala acts only as the confidential runtime provider. The base model's license (e.g., Meta's Llama community license) still applies, but the fine-tuned derivative is your intellectual property.
How do I deploy the fine-tuned model for inference?
You can deploy it directly to Phala's Inference TEEs, which expose OpenAI-compatible APIs. Alternatively, push it to Hugging Face Hub or your private registry and run inference from your preferred stack.
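
For example, any OpenAI-compatible client can talk to a deployed inference TEE; the base URL, API key, and model id below are placeholders.

    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-endpoint.example/v1",  # your deployed endpoint
        api_key="YOUR_API_KEY",
    )
    response = client.chat.completions.create(
        model="your-org/your-model",  # the fine-tuned model you deployed
        messages=[{"role": "user", "content": "Summarize our onboarding policy."}],
    )
    print(response.choices[0].message.content)
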
Can I prove that training ran in a genuine TEE?
Yes. Each Phala fine-tune job generates a cryptographic attestation report, signed by the enclave hardware vendor, proving that your training ran on genuine secure hardware with a verified code base.
Customize LLMs on your proprietary data with hardware-enforced confidentiality and verifiable attestation.
Deploy on Phala