# Selecting an LLM

> Choose the right AI model for your Playlab project

<div className="pl-badges">
  <span className="pl-badge pl-badge--neutral">Updated</span>
</div>

<Tip>
  **New:** `Gemini 3.5 Flash` is now available in Playlab — frontier-level coding and agentic performance at Flash-tier speed. `GPT 5.4` is also available. `Gemini 2.5 Pro` and `Gemini 2.5 Flash` have been deprecated in favor of the Gemini 3 family.
</Tip>

<Note>
  **New Models Added Regularly!** We're constantly adding and updating models to give Playlabbers access to the latest AI capabilities. Our goal is to provide more open weight models and eventually open source models to give you maximum flexibility and control over your applications.
</Note>

## What is this feature?

You can now build on top of even more LLMs in Playlab! More than a dozen AI models are available for your Playlab apps, and we do our best to keep the latest models on offer.

<Warning>
  Changing the LLM may impact the performance of your app.
</Warning>

## Rationale for the feature

This feature allows Playlab users to experiment with and leverage the unique strengths of various AI models from different providers all within Playlab.

As you build, you may find that some models perform better than others at particular tasks, so you can select the model that best fits your needs. The more models are available, the more likely you are to find one that fits. We believe that Playlabbers should have access to frontier models as we build in community.

## Understanding Model Types

Before selecting a model, it's helpful to understand the different categories of AI models available:

| **Frontier Models**                                                                                                 | **Open Weight Models**                                                                            | **Open Source Models**                                                        |
| ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| Cutting-edge, proprietary models developed by major AI companies                                                    | Models with publicly available parameters (weights) that can be downloaded and run independently  | Fully open models where both weights and training code are publicly available |
| Typically offer the most advanced capabilities and are continuously updated with the latest research breakthroughs  | While training code may not be available, you have more control over deployment and customization | Offer maximum transparency and customization potential                        |
| Examples: Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, GPT-5 Mini, Gemini 3.1 Pro, Gemini 3.5 Flash, Gemini 3 Flash | Examples: Llama models, DeepSeek R1, Kimi K2.5, Qwen 3, Mistral Large 3                           | Coming soon!                                                                  |

## How do I access these models?

<Steps>
  <Step title="Click the LLM selector">
    In the top left, click the current LLM (Claude Sonnet 4.6 by default).

    <Frame>
      <img height="500" src="https://mintcdn.com/playlabai/_D6SZKjykVV5vZr9/images/Selectinganllm.gif?s=70f9071b02d2e6ab71b92a4aed958a7a" data-path="images/Selectinganllm.gif" />
    </Frame>
  </Step>

  <Step title="Choose your model">
    From the menu, select which LLM you want to build on top of. Each model shows its knowledge cutoff date so you can pick based on how current you need the model to be. You can read more about available models below in greater detail.

    <Frame>
      <img src="https://mintcdn.com/playlabai/MB54mLvHimQQLexo/images/cutoffdate.png?fit=max&auto=format&n=MB54mLvHimQQLexo&q=85&s=e54b7386070a8781d4e681e7daa04827" width="870" height="378" data-path="images/cutoffdate.png" />
    </Frame>
  </Step>

  <Step title="Build and Test">
    See how the model you chose impacts your app. Continue trying out different models to find the "best" fit for your app.
  </Step>
</Steps>

## Which models should I use?

Now that you know how to select models, here are some strengths and tradeoffs of each:

<CardGroup cols={2}>
  <Card title="Claude Opus 4.6 (Anthropic)" icon="crown">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Advanced model for complex analysis, longer tasks with many steps, and higher-order math and coding.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2025</p>
    <p><strong>Strengths:</strong> Unmatched intelligence and reasoning depth. Superior performance on complex multi-step problems. Exceptional analytical and coding capabilities. Best-in-class for higher-order math and extended tasks.</p>
    <p><strong>Trade Offs:</strong> Slower response times and higher cost. Best reserved for tasks that truly require maximum capability.</p>
  </Card>

  <Card title="Claude Sonnet 4.6 (Anthropic)" icon="diamond">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Latest version of the Claude Sonnet series, with the highest intelligence across most tasks.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2025</p>
    <p><strong>Strengths:</strong> Highest intelligence across most tasks. Superior instruction following and nuance understanding. Exceptional balance of speed and capability. Best-in-class for most applications requiring high quality output.</p>
    <p><strong>Trade Offs:</strong> More expensive than smaller models. May be more than needed for very simple tasks.</p>
  </Card>

  <Card title="Claude Haiku 4.5 (Anthropic)" icon="leaf">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Near-frontier intelligence at blazing speeds with extended thinking and exceptional cost-efficiency.</p>
    <p><strong>Knowledge Cutoff:</strong> July 2025</p>
    <p><strong>Strengths:</strong> Blazing fast response times with extended thinking capabilities. Near-frontier intelligence at exceptional cost-efficiency. Excellent for quick questions and lightweight tasks.</p>
    <p><strong>Trade Offs:</strong> Less capable than Sonnet or Opus models. May struggle with complex multi-step reasoning and advanced analysis.</p>
  </Card>

  <Card title="Claude 4.6 Sonnet (Reasoning) (Anthropic)" icon="brain">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Work through difficult problems using careful, step-by-step reasoning.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2025</p>
    <p><strong>Strengths:</strong> Exceptional step-by-step reasoning capabilities. Stronger at math and coding. Very good at explaining its thought process.</p>
    <p><strong>Trade Offs:</strong> Slower response times. Not as optimized for creative tasks. Consider Claude Sonnet 4.6 or Claude Opus 4.6 for better overall performance.</p>
  </Card>

  <Card title="GPT 5.4 (OpenAI)" icon="star">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> OpenAI's latest coding and reasoning model.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2025</p>
    <p><strong>Strengths:</strong> State-of-the-art coding and reasoning performance. Exceptional problem-solving capabilities. Superior instruction following and nuance understanding.</p>
    <p><strong>Trade Offs:</strong> Slower response times and higher cost. May be unnecessary for simple tasks. Premium pricing for cutting-edge capabilities.</p>
  </Card>

  <Card title="GPT-5 Mini (OpenAI)" icon="star">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Faster model for well-defined tasks.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2025</p>
    <p><strong>Strengths:</strong> Fast response times for well-defined tasks. Cost-effective for regular applications. Strong performance across most tasks without premium overhead.</p>
    <p><strong>Trade Offs:</strong> Slightly reduced capabilities compared to GPT 5.4. May not excel at the most complex reasoning challenges requiring maximum model capacity.</p>
  </Card>

  <Card title="Gemini 3.1 Pro (Google)" icon="robot">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Google's most powerful thinking model with maximum response accuracy and state-of-the-art performance.</p>
    <p><strong>Knowledge Cutoff:</strong> Jan 2025</p>
    <p><strong>Strengths:</strong> Maximum response accuracy and state-of-the-art performance. Exceptional reasoning and problem-solving. Superior performance on complex analytical tasks. Enhanced creative and coding capabilities. Best-in-class for applications requiring advanced Google AI.</p>
    <p><strong>Trade Offs:</strong> Slower response times compared to Flash models. Higher cost for premium capabilities. May be unnecessary for simple tasks.</p>
  </Card>

  <Card title="Gemini 3.5 Flash (Google)" icon="bolt">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> Google's frontier-tier Flash model for agentic and coding workflows. Outperforms Gemini 3.1 Pro on coding and agentic benchmarks while running roughly four times faster.</p>
    <p><strong>Knowledge Cutoff:</strong> Jan 2026</p>
    <p><strong>Strengths:</strong> Frontier-level performance on coding and agentic tasks at Flash-tier speed. Strong multimodal understanding across text, image, audio, and video. 1M-token input context for long-horizon, multi-step workflows.</p>
    <p><strong>Trade Offs:</strong> Roughly 3x the per-token cost of Gemini 3 Flash. Dynamic thinking on by default may add latency for very simple prompts.</p>
  </Card>

  <Card title="Gemini 3 Flash (Google)" icon="bolt">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Frontier Model</span></div>
    <p><strong>Description:</strong> General purpose model optimized for fast response times.</p>
    <p><strong>Knowledge Cutoff:</strong> Jan 2025</p>
    <p><strong>Strengths:</strong> Extremely fast response times. Strong general-purpose performance. Good for simple instruction following and high volume tasks.</p>
    <p><strong>Trade Offs:</strong> Not ideal for multi-step problem solving or complex instruction following. May miss nuance in instructions.</p>
  </Card>

  <Card title="Mistral Large 3 (Mistral)" icon="robot">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Mistral's 675B parameter flagship model with strong multilingual capabilities.</p>
    <p><strong>Knowledge Cutoff:</strong> Oct 2024</p>
    <p><strong>Strengths:</strong> Strong reasoning and analytical capabilities. Excellent multilingual support. Open weight flexibility for customization and deployment.</p>
    <p><strong>Trade Offs:</strong> May not match top frontier models on the most demanding tasks. Performance varies by domain.</p>
  </Card>

  <Card title="Kimi K2.5 (Moonshot)" icon="rocket">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Advanced open weight model that excels in using tools.</p>
    <p><strong>Knowledge Cutoff:</strong> \~Apr 2024</p>
    <p><strong>Strengths:</strong> Excellent tool usage capabilities. Good for applications requiring API integrations. Strong technical reasoning.</p>
    <p><strong>Trade Offs:</strong> May be specialized for tool use rather than general conversation. Performance varies on creative tasks.</p>
  </Card>

  <Card title="DeepSeek R1 (DeepSeek)" icon="droplets">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Open weight model designed for efficiency.</p>
    <p><strong>Knowledge Cutoff:</strong> July 2024</p>
    <p><strong>Strengths:</strong> Cost-effective and efficient. Good for applications where budget is a primary concern. Open weight flexibility.</p>
    <p><strong>Trade Offs:</strong> May not match performance of frontier models on complex tasks. Limited compared to more advanced models.</p>
  </Card>

  <Card title="Llama 4 Maverick (Meta)" icon="infinity">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Advanced open-weight model for reasoning, math, and general knowledge.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2024</p>
    <p><strong>Strengths:</strong> Strong reasoning capabilities for math and general knowledge. Open weight benefits. Good performance across diverse tasks.</p>
    <p><strong>Trade Offs:</strong> Not as fast as smaller models. May require more specific prompting for best results.</p>
  </Card>

  <Card title="Llama 4 Scout (Meta)" icon="infinity">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Powerful for multi-document analysis, cross-lingual understanding, and context-aware reasoning.</p>
    <p><strong>Knowledge Cutoff:</strong> Aug 2024</p>
    <p><strong>Strengths:</strong> Excellent at analyzing multiple documents simultaneously. Strong cross-lingual capabilities. Advanced contextual understanding.</p>
    <p><strong>Trade Offs:</strong> May be slower for simple tasks. Specialized for document analysis rather than general usage.</p>
  </Card>

  <Card title="Llama 3.3 70B Instruct (Meta)" icon="infinity">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Advanced model for reasoning, math, and general knowledge.</p>
    <p><strong>Knowledge Cutoff:</strong> Dec 2023</p>
    <p><strong>Strengths:</strong> Strong for general, well-balanced use cases. Performs well in math. Effective at following clear instructions. Open weight flexibility.</p>
    <p><strong>Trade Offs:</strong> Slower than smaller models. Does not follow instructions as well as Claude/GPT models.</p>
  </Card>

  <Card title="Qwen 3 (Alibaba)" icon="robot">
    <div className="pl-badges"><span className="pl-badge pl-badge--neutral">Open Weight Model</span></div>
    <p><strong>Description:</strong> Large-scale Qwen3 model with 235B parameters, optimized for instruction following and reasoning tasks.</p>
    <p><strong>Knowledge Cutoff:</strong> Oct 2023</p>
    <p><strong>Strengths:</strong> Excellent multilingual support. Strong performance on reasoning and instruction following tasks. Good balance of performance and efficiency. Open weight flexibility.</p>
    <p><strong>Trade Offs:</strong> May not match frontier model performance on highly specialized tasks. Performance varies depending on language and domain.</p>
  </Card>
</CardGroup>
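
If the knowledge cutoff matters for your app, it can help to compare the dates side by side. The snippet below is only an illustrative sketch, not a Playlab feature or API: the dictionary and the `models_current_as_of` helper are hypothetical names, and the dates are copied from the cards above at month precision. Kimi K2.5 is left out because its cutoff is listed as approximate.

```python
from datetime import date

# Knowledge cutoffs copied from the cards above (month precision, day set to 1).
# This table is a hand-maintained illustration, not data served by Playlab.
KNOWLEDGE_CUTOFFS = {
    "Claude Opus 4.6": date(2025, 8, 1),
    "Claude Sonnet 4.6": date(2025, 8, 1),
    "Claude Haiku 4.5": date(2025, 7, 1),
    "Claude 4.6 Sonnet (Reasoning)": date(2025, 8, 1),
    "GPT 5.4": date(2025, 8, 1),
    "GPT-5 Mini": date(2025, 8, 1),
    "Gemini 3.1 Pro": date(2025, 1, 1),
    "Gemini 3.5 Flash": date(2026, 1, 1),
    "Gemini 3 Flash": date(2025, 1, 1),
    "Mistral Large 3": date(2024, 10, 1),
    "DeepSeek R1": date(2024, 7, 1),
    "Llama 4 Maverick": date(2024, 8, 1),
    "Llama 4 Scout": date(2024, 8, 1),
    "Llama 3.3 70B Instruct": date(2023, 12, 1),
    "Qwen 3": date(2023, 10, 1),
}


def models_current_as_of(earliest: date) -> list[str]:
    """Return models whose cutoff is on or after `earliest`, newest first."""
    matches = [
        (cutoff, name)
        for name, cutoff in KNOWLEDGE_CUTOFFS.items()
        if cutoff >= earliest
    ]
    # Newest cutoff first; ties are broken alphabetically by model name.
    matches.sort(key=lambda pair: (-pair[0].toordinal(), pair[1]))
    return [name for _, name in matches]


if __name__ == "__main__":
    print(models_current_as_of(date(2025, 8, 1)))
```

A recent cutoff is only one factor; the strengths and trade-offs on each card still apply.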

## Tips for Selecting the Right Model

Selecting a model can be tricky. That's why we encourage you to play and experiment as you build to find the model that is the best fit for your context.

### Selection Considerations

<Accordion title="What is an ideal response time for your app?">
  This will allow you to pick larger or smaller models that meet those needs. Claude Sonnet 4.6 and GPT-5 Mini offer excellent balance, while Claude Opus 4.6, Gemini 3.1 Pro, and GPT 5.4 prioritize quality over speed. Claude Haiku 4.5 and Gemini 3 Flash excel at speed for simple tasks.
</Accordion>

<Accordion title="How complex is your task?">
  For simple Q\&A or content generation, lighter models like Claude Haiku 4.5 or Gemini 3 Flash may suffice. For balanced everyday tasks, Claude Sonnet 4.6 or GPT-5 Mini are ideal. For the most complex multi-step reasoning, choose Claude Opus 4.6, GPT 5.4, or Gemini 3.1 Pro. For agentic and coding workflows at Flash-tier speed, Gemini 3.5 Flash is a strong choice.
</Accordion>

<Accordion title="What level of accuracy does your app require?">
  Accuracy-critical use cases, such as data analysis or HR operations, might require Claude Opus 4.6, GPT 5.4, Gemini 3.1 Pro, or other powerful models even if they're slower. Use cases that require creativity or open-ended responses work well with GPT 5.4, GPT-5 Mini, Claude Sonnet 4.6, or creative-focused models.
</Accordion>

<Accordion title="Do you need open weights or source code access?">
  If you need model customization, local deployment, or transparency into model operations, consider open weight models like Llama 4 series, Qwen 3, DeepSeek R1, or Mistral Large 3. For maximum performance and latest capabilities, frontier models like Claude Opus 4.6, GPT 5.4, Claude Sonnet 4.6, or Gemini 3.1 Pro are typically best. Consider your long-term deployment and customization needs when choosing between proprietary and open models.
</Accordion>
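
The considerations above can be read as a small decision rule. The sketch below is one possible way to write that rule down, assuming three complexity levels ("simple", "everyday", "complex") and a yes/no open weights requirement. The `suggest_models` function is hypothetical and not part of Playlab; it only restates the suggestions from this section as a starting shortlist.

```python
def suggest_models(complexity: str, open_weights: bool = False) -> list[str]:
    """Return a starting shortlist based on the considerations in this guide.

    complexity: "simple", "everyday", or "complex" (labels assumed for this sketch).
    open_weights: True if customization, local deployment, or transparency matters most.
    """
    frontier_by_complexity = {
        "simple": ["Claude Haiku 4.5", "Gemini 3 Flash"],
        "everyday": ["Claude Sonnet 4.6", "GPT-5 Mini"],
        "complex": ["Claude Opus 4.6", "GPT 5.4", "Gemini 3.1 Pro"],
    }
    if complexity not in frontier_by_complexity:
        raise ValueError(f"unknown complexity level: {complexity!r}")
    if open_weights:
        # The guide suggests these whenever open weights are a requirement.
        return ["Llama 4 Maverick", "Llama 4 Scout", "Qwen 3", "DeepSeek R1", "Mistral Large 3"]
    return frontier_by_complexity[complexity]


if __name__ == "__main__":
    print(suggest_models("everyday"))
    print(suggest_models("complex", open_weights=True))
```

Treat the result as a list of candidates to test in your app, not as a final answer.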

### Best Practices

<Accordion title="Try to match your model with your use case:">
  - **Everyday applications:** Claude Sonnet 4.6, Claude Haiku 4.5, or GPT-5 Mini provide the best balance of performance and efficiency.
  - **Critical/Complex applications:** Claude Opus 4.6, GPT 5.4, or Gemini 3.1 Pro for highest accuracy and reasoning capability.
  - **Creative applications:** GPT 5.4, GPT-5 Mini, or Claude Sonnet 4.6 for creative tasks.
  - **Problem-solving tools:** Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Gemini 3.1 Pro, or Llama 4 Maverick.
  - **Document analysis:** Claude Opus 4.6, Claude Sonnet 4.6, or Llama 4 Scout for multi-document or cross-lingual analysis.
  - **Technical/Coding tasks:** Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, or Kimi K2.5 for tool usage.
  - **Educational explanation:** Claude Sonnet 4.6, GPT-5 Mini, Llama 3.3 70B Instruct, Llama 4 Maverick, or those with strong explanatory capabilities.
  - **High-volume applications:** Balance quality with speed using Claude Sonnet 4.6, Claude Haiku 4.5, Gemini 3 Flash, or GPT-5 Mini.
  - **Budget-conscious applications:** Claude Haiku 4.5, Qwen 3, DeepSeek R1, Mistral Large 3, or other open weight models for cost-effective solutions.
  - **Research/Experimentation:** Open weight models like Llama 4 series, Qwen 3, or Mistral Large 3 for flexibility.
</Accordion>

<Accordion title="Test out multiple models for apps that you are building:">
  Changing a model may change the performance of an app in Playlab. Test multiple models before finalizing, as performance can vary significantly on your specific tasks. Run A/B tests as you build to continually evaluate model performance. Consider starting with Claude Sonnet 4.6 or GPT-5 Mini as your baseline for most applications. Test both frontier and open weight models to find the best fit for your needs.
</Accordion>
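
One lightweight way to run the A/B comparison described above is to send the same test prompts to each candidate, rate the responses yourself, and average the ratings per model. The sketch below assumes you record ratings by hand on a 1 to 5 scale. The `rank_models` helper, the model labels, and the sample scores are all hypothetical placeholders, not Playlab functionality or real results.

```python
from collections import defaultdict
from statistics import mean


def rank_models(ratings: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Average the ratings per model and return (model, average) pairs, best first."""
    scores_by_model = defaultdict(list)
    for model, score in ratings:
        scores_by_model[model].append(score)
    averages = {model: mean(scores) for model, scores in scores_by_model.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder ratings (1 = poor, 5 = excellent) for the same two test prompts.
# Replace these with your own observations from testing your app.
sample_ratings = [
    ("model-a", 5),
    ("model-a", 4),
    ("model-b", 4),
    ("model-b", 3),
]

if __name__ == "__main__":
    for model, average in rank_models(sample_ratings):
        print(f"{model}: {average:.2f}")
```

Using the same prompts for every model keeps the comparison fair, and a handful of prompts drawn from real use of your app is usually more informative than generic ones.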

<Accordion title="Additional best practices:">
  - Remix apps while you experiment so that you don't affect the original app.
  - Review activity to see how multiple models handle similar tasks.
  - If you're building a suite of apps, use faster models like Claude Haiku 4.5 for simple queries and reserve powerful models like Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, or Gemini 3.1 Pro for complex tasks.
  - Consider cost implications: newer frontier models like Claude Opus 4.6, Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT 5.4 may be more expensive but offer better performance.
  - For production apps requiring customization, evaluate open weight models like Qwen 3, Mistral Large 3, and Kimi K2.5 alongside frontier options.
  - Keep track of which models work best for your specific use cases to build your own selection guidelines.
</Accordion>

## FAQ

<Accordion title="Will switching models affect my existing app?">
  Yes, changing the LLM can impact the performance of your app. Different models have different strengths and trade-offs, so it's important to test your app with the new model before finalizing the change.
</Accordion>

<Accordion title="How do I know which model is best for my specific use case?">
  We recommend experimenting with different models for your specific use case. Consider factors like response time requirements, complexity level of tasks, accuracy needs, and whether you need open weights. You can implement A/B testing to evaluate model performance. For most applications, Claude Sonnet 4.6 or GPT-5 Mini are great starting points.
</Accordion>

<Accordion title="Can I use different models for different parts of my app suite?">
  Yes! We recommend using faster models like Claude Haiku 4.5 for simple queries and reserving more powerful models like Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Gemini 3.1 Pro, or Claude 4.6 Sonnet (Reasoning) for complex tasks if you're building a suite of apps.
</Accordion>

<Accordion title="When should I choose Claude Opus 4.6 vs Claude Sonnet 4.6 vs Claude Haiku 4.5?">
  Choose Claude Opus 4.6 for the most demanding tasks requiring maximum intelligence, reasoning depth, and nuanced understanding. It's the most powerful model in the Claude family. Choose Claude Sonnet 4.6 for most applications where you need excellent intelligence with a good balance of performance and efficiency. It is the new default for all Playlab apps. Choose Claude Haiku 4.5 for fast, lightweight tasks requiring quick response times.
</Accordion>

<Accordion title="What's the difference between GPT 5.4 and GPT-5 Mini?">
  GPT 5.4 is OpenAI's latest coding and reasoning model with top capabilities across all domains. GPT-5 Mini is a faster model for well-defined tasks with better cost efficiency. Choose GPT 5.4 when you need maximum capability and GPT-5 Mini when you need speed and cost-effectiveness.
</Accordion>

<Accordion title="What's the difference between frontier, open weight, and open source models?">
  Frontier models are cutting-edge proprietary models with the latest capabilities but require API access. Open weight models have publicly available parameters, allowing more control and customization. Open source models provide both weights and training code. Choose based on your needs for performance vs. customization and transparency.
</Accordion>

<Accordion title="When should I consider open weight models like Llama 4, Qwen 3, Mistral Large 3, or DeepSeek R1?">
  Consider open weight models when you need model customization, local deployment, cost control for high-volume applications, or transparency into model operations. They're also great for research and experimentation. However, frontier models typically offer better performance for most production applications.
</Accordion>

## We Want Your Feedback!

<Info>
  Have you tried building with different LLMs? We'd love to hear about your experience with the new models and which ones work best for your use cases!

  Contact us at [support@playlab.ai](mailto:support@playlab.ai)
</Info>

***

Last updated: 05-22-2026
