Lambda is a 12-year-old San Francisco company best known for offering graphics processing units (GPUs) on demand as a service to machine learning researchers and AI model builders and trainers.
But today it's taking its offerings a step further with the launch of the Lambda Inference API (application programming interface), which the company claims is the lowest-cost service of its kind on the market. The API allows enterprises to deploy AI models and applications into production for end users without worrying about procuring or maintaining compute.
The launch complements Lambda's existing focus on providing GPU clusters for training and fine-tuning machine learning models.
"Our platform is fully verticalized, meaning we can pass dramatic cost savings to end users compared to other providers like OpenAI," said Robert Brooks, Lambda's vice president of revenue, in a video call interview with VentureBeat. "Plus, there are no rate limits inhibiting scaling, and you don't have to talk to a salesperson to get started."
In fact, as Brooks told VentureBeat, developers can head over to Lambda's new Inference API webpage, generate an API key, and get started in less than five minutes.
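To illustrate how little setup that involves, here is a minimal sketch of calling an OpenAI-compatible chat-completions endpoint with a Lambda API key. The base URL, endpoint path, and model identifier below are illustrative assumptions for demonstration, not values confirmed in this article:

```python
import json
import os
import urllib.request

# Assumed base URL for demonstration purposes only.
BASE_URL = "https://api.lambdalabs.com/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a chat-completions HTTP request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request(
        os.environ.get("LAMBDA_API_KEY", "sk-demo"),
        "llama3.1-405b-instruct",  # assumed model identifier
        "Summarize the launch of the Lambda Inference API.",
    )
    # urllib.request.urlopen(req) would send the request; omitted here.
    print(req.full_url)
```

The pattern mirrors the common OpenAI-style convention, which is why developers familiar with other inference providers can typically swap in a new base URL and key and be running in minutes.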
Lambda's Inference API supports leading-edge models such as Meta's Llama 3.3 and 3.1, Nous's Hermes-3, and Alibaba's Qwen 2.5, making it one of the most accessible options for the machine learning community. The full list is available here and includes:
Pricing begins at $0.02 per million tokens for smaller models like Llama 3.2-3B-Instruct and scales up to $0.90 per million tokens for larger, state-of-the-art models such as Llama 3.1-405B-Instruct.
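A quick back-of-the-envelope calculation shows what that pricing means in practice. The two rates below come from the figures quoted above; the token volumes are illustrative:

```python
# Per-million-token cost calculator. The $0.02 and $0.90 rates come from
# the article's quoted pricing; the 10M-token volume is an illustrative example.

def inference_cost(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of processing `tokens` at a given rate."""
    return tokens / 1_000_000 * price_per_million

# 10 million tokens on a small model at $0.02 per million tokens:
small = inference_cost(10_000_000, 0.02)   # $0.20
# The same volume on a large model at $0.90 per million tokens:
large = inference_cost(10_000_000, 0.90)   # $9.00

print(f"small model: ${small:.2f}, large model: ${large:.2f}")
```

Even at the top of the range, 10 million tokens costs single-digit dollars, which is the kind of arithmetic behind Lambda's low-cost positioning.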
As Lambda cofounder and CEO Stephen Balaban put it recently on X, "Stop wasting money and start using Lambda for LLM Inference." Balaban published a graph comparing Lambda's per-token cost for serving AI models via inference against rivals in the space.
Furthermore, unlike many other services, Lambda's pay-as-you-go model ensures customers pay only for the tokens they use, eliminating the need for subscriptions or rate-limited plans.
Closing the AI loop
Lambda has a decade-plus history of supporting AI advancements with its GPU-based infrastructure.
From its hardware ...