Agnes-2.0-Flash
Agnes-2.0-Flash is a fast and efficient language model developed by Sapiens AI, designed for agent workflows, tool calling, coding tasks, reasoning, multi-turn conversations, image understanding, and high-frequency production use cases. Agnes-2.0-Flash achieved strong performance on the Claw-Eval benchmark, ranking 9th on the General Leaderboard with a Pass^3 score of 60.9%, demonstrating strong autonomous agent capabilities among mainstream language models.Model Overview
Agnes-2.0-Flash is optimized for fast, reliable, and cost-efficient language generation, agent task execution, and image understanding. The model supports the following capabilities:| Capability | Description |
|---|---|
| Chat Completion | Generate high-quality responses for conversations and applications |
| Multi-turn Conversation | Maintain context continuity across multiple turns |
| Image URL Input | Accept image content through publicly accessible image URLs |
| Image Understanding | Understand image content, analyze screenshots, and extract visual information |
| Tool Calling | Call external tools and functions for agent workflows |
| Agent Workflows | Support planning, execution, and multi-step task completion |
| Coding Tasks | Assist with code generation, debugging, explanation, and refactoring |
| Reasoning | Handle structured reasoning, task decomposition, and decision-making |
| Streaming Output | Return responses in real time for a better user experience |
| OpenAI-Compatible API | Use a request structure compatible with the OpenAI Chat Completions API |
Use Cases
Agnes-2.0-Flash is suitable for the following scenarios:| Scenario | Example Use Cases |
|---|---|
| AI Assistant | General Q&A, daily assistant, productivity support |
| Autonomous Agents | Multi-step task execution, planning, and tool usage |
| Coding Assistant | Code generation, debugging, refactoring, and explanation |
| Workflow Automation | Task decomposition, process automation, and execution planning |
| Customer Support | FAQ answering, customer service chatbots, service automation |
| Search and Q&A | Search-based answers, summarization, information extraction |
| Content Generation | Marketing copy, articles, product descriptions, scripts |
| Developer Tools | API assistant, documentation assistant, coding copilot |
| AI-Native Applications | Consumer apps, productivity tools, agent applications |
| Image Understanding | Image description, screenshot analysis, visual Q&A, information extraction |
API Information
Endpoint
| Item | Description |
|---|---|
| API Endpoint | https://apihub.agnes-ai.com/v1/chat/completions |
| Request Method | POST |
| Content-Type | application/json |
| Authentication | Bearer Token |
| Authentication Header | Authorization: Bearer YOUR_API_KEY |
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name. Use agnes-2.0-flash |
| messages | array | Yes | Conversation message array, including system, user, and assistant messages |
| messages[].content | string / array | Yes | Message content. It can be a plain text string or an array containing text and image_url content blocks |
| temperature | number | No | Controls output randomness. Lower values produce more deterministic results |
| top_p | number | No | Controls nucleus sampling. Lower values make the output more focused |
| max_tokens | number | No | Maximum number of tokens to generate in the response |
| stream | boolean | No | Whether to enable streaming output |
| tools | array | No | Tool definitions for tool-calling workflows |
| tool_choice | string / object | No | Controls whether and how the model uses tools |
| chat_template_kwargs | object | No | Extension field for enabling Thinking and other capabilities in OpenAI-compatible requests |
| thinking | object | No | Field for enabling Thinking mode in Anthropic-compatible requests |
Image URL Input Support
Agnes-2.0-Flash supports image input through image URLs. Developers can pass both text instructions and an image URL in the samemessages request, allowing the model to understand, analyze, answer questions about, or extract information from the image.
Supported input types:
| Input Type | Format | Description |
|---|---|---|
| Text | text | Plain text instruction or question |
| Image URL | image_url | Pass image content through a publicly accessible image URL |
Image Content Structure
When using image URL input,messages[].content should use an array structure. Each content block represents one type of input.
Request Examples
1. Basic Chat Completion Request
Use this request to generate a standard chat completion response.2. Streaming Output Request
Use this request to enable streaming output.3. Tool Calling Request
Use this request for agent workflows that require external tool calls.4. Image URL Input Request
Use this request to pass an image through an image URL and let the model understand or analyze the image content.Response Format
Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique ID of the completion request |
| object | string | Object type, usually chat.completion |
| created | integer | Request timestamp |
| model | string | Model used for the request |
| choices | array | List of generated response results |
| choices[].index | integer | Index of the response result |
| choices[].message | object | Assistant message object |
| choices[].message.role | string | Role of the message sender |
| choices[].message.content | string | Response content generated by the model |
| choices[].finish_reason | string | Reason why generation stopped |
| usage | object | Token usage information |
| usage.prompt_tokens | integer | Number of input tokens |
| usage.completion_tokens | integer | Number of output tokens |
| usage.total_tokens | integer | Total number of tokens used |
Enable Thinking for Coding Tasks
For coding, debugging, reasoning, and agent workflows, it is recommended to enable Thinking mode to improve code quality, task decomposition, and problem-solving performance.OpenAI-Compatible Request
When using the OpenAI-compatible API format, addchat_template_kwargs.enable_thinking to the request body:
Anthropic-Compatible Request
When using the Anthropic-compatible API format, add thethinking field to the request body:
budget_tokens controls the maximum Thinking token budget. For common coding tasks, it is recommended to start with 2048. For more complex debugging, refactoring, or multi-step agent tasks, you can increase the value as needed.
Features and Compatibility
Agnes-2.0-Flash supports the following capabilities:- Chat Completion
- Multi-turn conversation
- System prompt
- Image URL input
- Image understanding
- Streaming output
- Tool calling
- Agent workflows
- Coding tasks
- Reasoning tasks
- JSON-style output
- OpenAI Chat Completions API-compatible request structure
Best Practices
Prompt Writing Tips
For better results, provide clear instructions, sufficient context, and the expected output format.Example: Product Copy Generation
text
Example: Coding Task
For coding tasks, provide the programming language, framework, error message, and expected behavior.text
Example: Agent Workflow
For agent workflows, clearly describe the goal, available tools, and task constraints.text
Example: Image Understanding Task
For image understanding tasks, clearly state what the model should focus on, such as overall description, text extraction, UI analysis, object recognition, or structured output.text
Recommended Prompt Structure
Use the following structure to organize prompts:text
Example
text
Image Understanding Prompt Example
text
Image URL Usage Tips
- The image URL must be publicly accessible.
- If the image URL requires login, authentication, or has hotlink protection, the model may not be able to read it.
- It is recommended to use standard image formats such as JPG, JPEG, PNG, or WebP.
- For screenshots, error images, or product UI images, add text instructions to clarify what the model should focus on.
- Image URL input can be used together with tool calling, streaming output, and agent workflows.
Model Limits
| Item | Value |
|---|---|
| Context | 256K |
| Max Output | 65.5K |
Pricing
| Type | Price | Current Price |
|---|---|---|
| Input Tokens | $0.03 / 1M tokens | $0 / 1M tokens |
| Output Tokens | $0.15 / 1M tokens | $0 / 1M tokens |
Notes
- Use
agnes-2.0-flashas the model name. - A basic Chat Completion request must include
modelandmessages. messages[].contentcan be a plain text string or an array containing text and an image URL.- To input an image, use
image_urland provide a publicly accessible image URL. - To enable streaming responses, set
streamtotrue. - For tool-calling workflows, provide
toolsand optionallytool_choice. temperaturecontrols randomness. Lower values are better for deterministic tasks, while higher values are better for creative generation.- Agnes-2.0-Flash is suitable for production applications that require fast responses, strong task completion, image understanding, and reliable agent performance.