Confidential Training

Train Large-Scale Models in TEEs

Why It Matters

Why Confidential Training Matters

Centralized training infrastructure exposes datasets and model IP. Phala enables consortium learning and regulated-industry training with hardware isolation.

Data security

Training data contains regulated information

Traditional cloud infrastructure exposes sensitive information to operators and administrators.

Confidential computing

Model checkpoints encode proprietary techniques

Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.

Zero-trust architecture

Gradient updates can leak training examples

End-to-end encryption protects data in transit, at rest, and, critically, during computation.

Attestation

Consortium partners need data-custody guarantees

Cryptographic verification ensures code integrity and proves execution in genuine TEE hardware.

Training Workflow

How It Works

TensorFlow · PyTorch

Bring your existing TensorFlow or PyTorch training pipeline and run it inside confidential clusters with minimal changes.

Secure Data Loading

Load training datasets directly from private sources inside TEEs. Your sensitive data never leaves the secure enclave during the entire training pipeline.
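
A minimal sketch of this step, assuming the phalanetwork/training image from the deployment example below; the script and manifest filenames are illustrative. The shard digests recorded here correspond to the dataset_hashes values that later appear in the attestation report.

secure-load.sh
# Record shard digests before the data enters the enclave;
# these correspond to the report's dataset_hashes field
sha256sum data/consortium/*.jsonl > dataset-manifest.sha256

# Mount the dataset read-only so the enclave can read it
# but the host cannot tamper with it mid-run
docker run -d \
  --name phala-data-loader \
  --device=/dev/tdx_guest \
  -v "$(pwd)/data:/data:ro" \
  phalanetwork/training:latest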


Encrypted Gradient Flow

Training gradients stay encrypted end to end. Hardware attestation proves that model updates were never exposed outside the enclave.
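
As a hedged illustration, gradient synchronization in a run like the one below is an ordinary PyTorch distributed launch executed inside the enclave; train.py is a placeholder for your training script, and the rendezvous address matches the deployment example.

# Standard torchrun launch inside the confidential container;
# NCCL gradient traffic never crosses the enclave boundary in plaintext
torchrun \
  --nnodes=8 \
  --nproc_per_node=1 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=10.0.1.100:29500 \
  train.py --config /data/llama-70b.json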


Verifiable Attestation Reports

Every training run generates cryptographic proof that your data remained confidential throughout the process.
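
As an illustrative offline check, assuming the signed report and the TEE public signing key have already been downloaded (filenames are placeholders), the signature can be verified with standard OpenSSL tooling:

# Verify the report signature against the TEE public signing key
openssl dgst -sha256 \
  -verify tee-signing-key.pem \
  -signature attestation.sig \
  attestation-report.json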

Global TEE Infrastructure

Train on distributed TEE clusters worldwide. Scale your confidential training workloads across secure data centers with hardware-level isolation.

Deployment Example

Deploy Multi-GPU Training

Launch distributed pre-training jobs on confidential GPU clusters. Slurm and Kubernetes templates are available with TEE attestation and sealed checkpoint storage built in.

train-cluster.sh
# Deploy distributed training on TEE cluster
docker run -d \
  --name phala-training \
  --gpus all \
  --device=/dev/tdx_guest \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/checkpoints:/checkpoints" \
  -e WORLD_SIZE=8 \
  -e RANK=0 \
  -e MASTER_ADDR=10.0.1.100 \
  -e MASTER_PORT=29500 \
  -e MODEL_CONFIG=/data/llama-70b.json \
  -e 'TRAINING_DATA=/data/consortium/*.jsonl' \
  -e CHECKPOINT_DIR=/checkpoints \
  phalanetwork/training:latest

# Monitor training progress
docker logs -f phala-training

# Training output from sealed environment
# Epoch 1/10: Loss 2.134 | Throughput 1.2M tok/s
# Epoch 2/10: Loss 1.876 | Throughput 1.2M tok/s
# Checkpoint saved: /checkpoints/epoch-2.bin
# Attestation signed: 0x8a9b7c6d...
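
The command above starts rank 0 only. One hedged way to bring up the remaining ranks implied by WORLD_SIZE=8, assuming SSH access to the other TEE nodes (hostnames and script name are illustrative):

launch-workers.sh
# Start worker ranks 1-7 on the remaining TEE nodes
# (remaining env vars as in the rank-0 command above)
for RANK in $(seq 1 7); do
  ssh "tee-node-$RANK" docker run -d \
    --name phala-training \
    --gpus all \
    --device=/dev/tdx_guest \
    -e WORLD_SIZE=8 \
    -e RANK="$RANK" \
    -e MASTER_ADDR=10.0.1.100 \
    -e MASTER_PORT=29500 \
    phalanetwork/training:latest
done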
Verification Example

Verify Training Lineage

Generate cryptographic proofs of your training process. Verify cluster attestation, dataset hashes, and reproducible build IDs for auditors and consortium partners.

verify-lineage.sh
# Get cluster attestation and training lineage
curl -X POST https://cloud-api.phala.network/api/v1/training/verify \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "train-consortium-llama-70b",
    "verify_cluster_attestation": true,
    "verify_dataset_hashes": true,
    "verify_checkpoint_lineage": true
  }'

# Attestation proves sealed training
{
  "verified": true,
  "cluster_size": 8,
  "tee_type": "Intel TDX",
  "dataset_hashes": [
    "0x8a9b7c6d...",
    "0x1a2b3c4d..."
  ],
  "checkpoint_lineage": "llama-70b-base -> epoch-10.bin",
  "reproducible_build_id": "0xfe7d8c9b...",
  "timestamp": "2025-01-15T14:30:00Z"
}
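
One way an auditor might cross-check the report locally, assuming the JSON response above was saved to report.json and jq is installed; the awk step adds the 0x prefix used in the sample response.

check-hashes.sh
# Recompute shard digests and diff them against the report
sha256sum data/consortium/*.jsonl | awk '{print "0x" $1}' | sort > local-hashes.txt
jq -r '.dataset_hashes[]' report.json | sort > reported-hashes.txt
diff local-hashes.txt reported-hashes.txt && echo "dataset lineage verified"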

Industry-Leading Enterprise Compliance

Meeting the highest compliance requirements for your business

AICPA SOC 2 · ISO 27001 · CCPA · GDPR
FAQ

Frequently Asked Questions

Everything you need to know about Confidential Training

1. What's the performance overhead for multi-GPU TEE training?

GPU TEE overhead is typically 5-15% compared to bare metal. Memory encryption happens at hardware speed with Intel TDX/AMD SEV. High-speed RDMA interconnect keeps gradient synchronization efficient even across encrypted enclaves.

2. How does cost compare to standard cloud training?

TEE infrastructure adds a 10-20% premium over standard GPU instances. However, consortium learning splits costs across partners while maintaining data custody, often making it more economical than each party training separately.

3. How do consortium partners maintain data custody?

Each partner's data stays in separate sealed storage. The training orchestrator coordinates gradient updates without exposing raw data across parties, and remote attestation proves proper isolation before any party sends datasets, as sketched below.
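
A hedged sketch of that handshake, reusing the verification endpoint from the example above; upload_dataset stands in for whatever transfer tooling a partner actually uses.

partner-check.sh
# Refuse to upload anything unless the cluster attestation verifies
curl -s -X POST https://cloud-api.phala.network/api/v1/training/verify \
  -H "Content-Type: application/json" \
  -d '{"job_id": "train-consortium-llama-70b", "verify_cluster_attestation": true}' \
  | jq -e '.verified == true' && upload_dataset ./partner-shards/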

4. Can cloud operators access training data or gradients?

No. Hardware memory encryption in TEEs prevents any operator access to runtime state. Gradients are computed and synchronized inside encrypted enclaves with cryptographic proofs of isolation.

5. What prevents gradient-based training data leakage?

Gradients are computed inside TEEs and never leave in plaintext. Differential privacy techniques can be applied within the enclave. Only final model checkpoints are exported with signed attestation lineage.

6. What distributed training patterns are supported?

Tensor parallelism, data parallelism, pipeline parallelism, and hybrid strategies. Phala's confidential scheduler supports PyTorch FSDP, DeepSpeed, and Megatron-LM inside TEE clusters; see the launch sketch below.
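
For instance, a DeepSpeed launch inside the confidential container might look like the following; train.py and ds_config.json are placeholders, and the availability of the deepspeed launcher inside the image is an assumption.

# Standard DeepSpeed launch executed inside the TEE container
docker exec phala-training \
  deepspeed --num_gpus=8 \
  train.py --deepspeed --deepspeed_config ds_config.json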

7. Can we use existing training scripts?

Yes, with minimal changes. Wrap your training code in our confidential container and configure attestation policies; standard frameworks (PyTorch, TensorFlow, JAX) run as-is inside TEEs. A wrapping sketch follows below.
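
As an illustration of that wrapping step, assuming the phalanetwork/training image accepts an arbitrary command (the paths and entrypoint behavior are assumptions, not a documented interface):

# Run an unmodified train.py inside the confidential container
docker run -d \
  --gpus all \
  --device=/dev/tdx_guest \
  -v "$(pwd)/train.py:/app/train.py:ro" \
  phalanetwork/training:latest \
  python /app/train.py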

8. How do we monitor training inside TEEs?

Enclave-safe telemetry exports training metrics without exposing sensitive data. TensorBoard and Weights & Biases integrations are available, with differential privacy filters for metric publishing.

Start Confidential Training Today

Train large-scale AI models on sensitive datasets with multi-GPU TEE clusters and hardware-enforced encryption.

Deploy on Phala
  • Multi-GPU TEE clusters
  • Consortium learning support
  • Sealed checkpoint storage
  • Reproducible attestation
  • 24/7 technical support