Private AI Inference
Centralized inference can log prompts and leak IP. Phala enclaves ensure no operator—cloud or vendor—can peek.

Traditional cloud infrastructure exposes sensitive information to operators and administrators.
Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.
End-to-end encryption protects data in transit, at rest, and critically during computation.
Cryptographic verification ensures code integrity and proves execution in genuine TEE hardware.
GPU TEE Protection
Zero-Trust Inference
GPU TEEs with Intel TDX and AMD SEV provide hardware-level memory encryption: your model weights, user prompts, and inference outputs stay encrypted in use, inside attested GPU enclaves. Not even cloud admins or hypervisors can inspect runtime state.
Privacy as a human right, by design. Requests are routed into the enclave over mTLS; the service emits usage receipts and never stores plaintext. OpenAI-compatible endpoints come with verifiable attestation and zero-logging guarantees.
Access the latest frontier AI models with cryptographic privacy protection
Discover how leading companies leverage Phala's confidential AI to build exceptional digital experiences while maintaining complete data privacy and regulatory compliance.
Deploy confidential AI inference with the flexibility of cloud and the security of on-premise infrastructure.
End-to-end encrypted
Hardware-attested routing
by OpenAI
by DeepSeek
by Qwen
OpenAI-compatible APIs with advanced capabilities running in TEEs
Use OpenAI-compatible SDK to access 200+ models with hardware-enforced privacy. Drop-in replacement with zero code changes.
from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://api.redpill.ai/api/v1",
)

response = client.chat.completions.create(
    model="phala/deepseek-chat-v3-0324",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is your model name?"},
    ],
    stream=True,
)

# With stream=True the response is an iterator of chunks, not a single message
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Every response includes cryptographic proof from NVIDIA and Intel TEE hardware. Verify the attestation to ensure secure execution.
import requests
import jwt

# Fetch the attestation report for the model's deployment
response = requests.get(
    "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-v3",
    headers={"Authorization": f"Bearer {api_key}"},
)
report = response.json()

# Verify the GPU attestation with the NVIDIA Remote Attestation Service
gpu_response = requests.post(
    "https://nras.attestation.nvidia.com/v3/attest/gpu",
    headers={"Content-Type": "application/json"},
    data=report["nvidia_payload"],
)

# Check the verification result: the second element carries one JWT per GPU
gpu_tokens = gpu_response.json()[1]
for gpu_id, token in gpu_tokens.items():
    decoded = jwt.decode(token, options={"verify_signature": False})
    assert decoded.get("measres") == "success"
    print(f"{gpu_id}: Verified ✓")

Choose the perfect privacy-first AI solution tailored to your needs
Private AI assistants for individuals who value data sovereignty and zero-logging guarantees.
OpenAI-compatible APIs with TEE protection—drop-in replacement with hardware-enforced privacy.
Scalable confidential AI infrastructure with compliance, auditability, and flexible deployment options.
Meeting the highest compliance requirements for your business
Discover how Phala Network enables privacy-preserving AI across different use cases
Everything you need to know about Private AI Inference
Phala uses GPU Trusted Execution Environments (TEEs) with Intel TDX and AMD SEV to encrypt all prompts, outputs, and model weights during inference. Not even cloud providers or system administrators can access data in use—only the attested enclave can decrypt your inputs.
No. Hardware-level memory encryption (Intel TDX/AMD SEV) prevents any operator—including Phala, cloud providers, or root users—from reading runtime memory. Data is encrypted from the moment it enters the TEE until it leaves.
Model weights are loaded directly into encrypted GPU memory inside TEEs. They never touch disk or CPU in plaintext. Each deployment is sealed with cryptographic measurements (mrenclave) you can verify before sending data.
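The measurement check described above can be sketched as a simple pin-and-compare step. This is a hypothetical sketch: the report endpoint mirrors the attestation example on this page, and the `mrenclave` field name is an assumption to verify against current documentation.

```python
# Hypothetical sketch: pin the expected enclave measurement and refuse to
# send data unless it matches. The "mrenclave" field name and endpoint
# parameters are assumptions modeled on the attestation example above.

def enclave_matches(report: dict, expected_mrenclave: str) -> bool:
    """Compare the sealed measurement in an attestation report to a pinned value."""
    return report.get("mrenclave") == expected_mrenclave

def fetch_report(api_key: str, model: str) -> dict:
    import requests  # same HTTP client as the attestation example above
    resp = requests.get(
        "https://api.redpill.ai/v1/attestation/report",
        params={"model": model},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Usage: verify before sending any prompts
# if not enclave_matches(fetch_report(api_key, "phala/deepseek-v3"), EXPECTED):
#     raise RuntimeError("enclave measurement mismatch")
```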
Attestation proofs are cryptographic signatures from the CPU/GPU proving the exact code and environment running inside the TEE. Verify them via /v1/attestation endpoints before sending prompts—ensuring no tampering or backdoors exist.
Under 5 minutes. Use Docker containers with pre-configured TEE images, or deploy via Phala Cloud's one-click interface. No custom firmware or low-level TEE programming required.
No. Phala provides drop-in OpenAI-compatible API endpoints (base_url = https://api.redpill.ai/v1). Use the same SDKs (openai-python, openai-node) and just point to Phala's attested endpoints.
Yes. Upload weights encrypted with your key, and Phala will load them into TEE memory without ever decrypting them in transit. Use /v1/attestation to verify the deployment before sending prompts.
5-15% compared to bare-metal GPUs. Memory encryption happens at hardware speed with Intel TDX/AMD SEV, so most workloads see negligible impact. Batching and caching reduce overhead further.
Yes. Private inference is ideal for HIPAA/GDPR-regulated industries. Patient records or legal documents never leave the TEE in plaintext, and attestation proofs provide audit trails for compliance.
Embed your internal docs (HR policies, financial reports) into TEE-protected RAG pipelines. Employees query via private endpoints, and neither Phala nor cloud providers can read the documents or queries.
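A TEE-protected RAG query along these lines can be sketched with the same OpenAI-compatible SDK used earlier on this page. This is a hedged sketch, not a confirmed recipe: it assumes an embeddings model is available behind the same endpoint, and `<EMBEDDING_MODEL>` is a placeholder, not a real model ID.

```python
# Hypothetical sketch of a private RAG query over an OpenAI-compatible
# endpoint. "<EMBEDDING_MODEL>" and "<API_KEY>" are placeholders.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def private_rag_answer(question: str, docs: list[str]) -> str:
    from openai import OpenAI  # same SDK as the inference example above
    client = OpenAI(api_key="<API_KEY>", base_url="https://api.redpill.ai/api/v1")

    # Embed the query and candidate documents via the private endpoint
    def embed(texts):
        resp = client.embeddings.create(model="<EMBEDDING_MODEL>", input=texts)
        return [d.embedding for d in resp.data]

    q_vec = embed([question])[0]
    doc_vecs = embed(docs)
    best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

    # Ask the chat model to answer grounded in the best-matching document
    prompt = f"Answer using this document:\n{docs[best]}\n\nQuestion: {question}"
    resp = client.chat.completions.create(
        model="phala/deepseek-chat-v3-0324",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Because both the embedding and chat calls go through the attested endpoint, neither the documents nor the queries leave the TEE in plaintext.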
Up to 128k tokens for most models (e.g., Qwen2.5-72B, DeepSeek-V3). For longer documents, use chunking strategies or contact us for enterprise deployments with extended context windows.
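A chunking strategy for documents beyond the context window can be as simple as a sliding character window. This sketch uses a rough ~4 characters-per-token approximation for English text; that ratio is an assumption, not an exact tokenizer count.

```python
# Simple sliding-window chunking for documents that exceed the context
# window. The ~4 chars/token ratio is a rough heuristic for English text.

def chunk_text(text: str, max_tokens: int = 1000, overlap_tokens: int = 100) -> list[str]:
    """Split text into overlapping chunks sized by an approximate token budget."""
    max_chars = max_tokens * 4
    step = max_chars - overlap_tokens * 4  # advance leaves an overlap region
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks
```

Each chunk can then be sent to the private endpoint independently, with the overlap preserving context across chunk boundaries.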
Yes. OpenRouter uses Phala for confidential enterprise routes, NEAR AI for verifiable ML inference, and OODA AI (NASDAQ-listed) for decentralized GPU TEE deployments. See case studies above.
Deploy confidential LLM endpoints with hardware-enforced encryption and zero-logging guarantees.
Deploy on Phala