- BAML is a domain-specific language for generating structured outputs from LLMs
  - with good DX
- you can build reliable agents, chatbots with RAG, MCP, etc.
High-level Developer Walkthrough
main.baml

```baml
class WeatherAPI {
  city string @description("the user's city")
  timeOfDay string @description("As an ISO8601 timestamp")
}

function UseTool(user_message: string) -> WeatherAPI {
  client "openai-responses/gpt-5-mini"
  prompt #"
    Extract.... {# we will explain the rest in the guides #}
  "#
}
```
- Call your function in any language
```python
from baml_client import b
from baml_client.types import WeatherAPI

def main():
    weather_info = b.UseTool("What's the weather like in San Francisco?")
    print(weather_info)
    assert isinstance(weather_info, WeatherAPI)
    print(f"City: {weather_info.city}")
    print(f"Time of Day: {weather_info.timeOfDay}")

if __name__ == '__main__':
    main()
```
Enable reliable tool-calling with any model
- BAML includes a schema-aligned parsing algorithm to support the flexible outputs LLMs can produce
  - e.g. a JSON blob, chain-of-thought, etc.
Schema Aligned Parsing
- instead of relying on the model to strictly follow our desired format
- write a parser that generously reads the output text
  - and applies error-correction techniques
  - with knowledge of the original schema
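To make the idea concrete, here is a minimal, hypothetical sketch of this kind of generous, schema-guided parsing — not BAML's actual implementation. The `SCHEMA` dict, the `sap_parse` function, and the regex fixes are all illustrative assumptions:

```python
import json
import re

# Illustrative schema mirroring the WeatherAPI class: field name -> Python type.
SCHEMA = {"city": str, "timeOfDay": str}

def sap_parse(raw: str) -> dict:
    # 1. Strip surrounding prose ("yapping"): keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    candidate = raw[start:end + 1]
    # 2. Error correction: quote bare keys and drop trailing commas.
    candidate = re.sub(r'([{,]\s*)(\w+)(\s*:)', r'\1"\2"\3', candidate)
    candidate = re.sub(r',\s*([}\]])', r'\1', candidate)
    data = json.loads(candidate)
    # 3. Align with the schema: drop superfluous keys, coerce value types.
    return {k: t(data[k]) for k, t in SCHEMA.items() if k in data}

messy = ('Sure! Here is the JSON:\n'
         '{city: "San Francisco", timeOfDay: "2024-01-01T12:00:00Z", note: "extra",}')
print(sap_parse(messy))
# → {'city': 'San Francisco', 'timeOfDay': '2024-01-01T12:00:00Z'}
```

A strict `json.loads` would reject this output outright; reading it leniently against a known schema recovers the intended object.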
- instead of `jsonschema`, a more compressed format, `bamlschema`, is used to define schemas
  - this works because BAML is not as strict, so structural characters can be omitted
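As a rough illustration of the compression, compare the earlier `WeatherAPI` class to an equivalent JSON Schema. The JSON Schema below is my own assumed translation, not one emitted by BAML:

```python
import json

# Assumed JSON Schema equivalent of the WeatherAPI class above.
json_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "the user's city"},
        "timeOfDay": {"type": "string", "description": "As an ISO8601 timestamp"},
    },
    "required": ["city", "timeOfDay"],
}

# The BAML class, verbatim, as a string for comparison.
baml_schema = '''class WeatherAPI {
  city string @description("the user's city")
  timeOfDay string @description("As an ISO8601 timestamp")
}'''

# The BAML form omits the quoting and nesting JSON Schema requires,
# so describing the same shape costs fewer prompt characters/tokens.
print(len(json.dumps(json_schema)), "vs", len(baml_schema))
```

The same shape in fewer characters means fewer tokens spent on the schema in every prompt.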
What is SAP?
- the idea is that the model will make mistakes, so we build a parser robust enough to handle those mistakes
- in the context of structured data extraction
  - we have a schema to guide us
Postel's Law, coined by Jon Postel, creator of TCP/IP: "Be conservative in what you do, be liberal in what you accept from others."
Some error correction techniques we use in SAP include:
- Handling unquoted strings
- Handling unescaped quotes and newlines in strings
- Filling in missing commas, colons, brackets, and keys
- Casting fractions to floats
- Removing superfluous keys from objects
- Stripping yapping
- Picking the best of several candidates when the LLM produces multiple outputs
- Completing partial objects (due to streaming)
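The last technique is worth a closer look. During streaming, the model has only emitted a prefix of the final object, so a parser can close whatever quotes and brackets are still open to get a valid partial value at every step. A hypothetical sketch (the `complete_partial` helper is an assumption, not BAML's API):

```python
import json

def complete_partial(fragment: str) -> str:
    """Append the closers needed to make a streamed JSON prefix parseable."""
    stack, in_string, escaped = [], False, False
    for ch in fragment:
        if escaped:
            escaped = False          # previous char was a backslash
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]":
                stack.pop()
    # Close an unterminated string first, then unwind open brackets.
    return fragment + ('"' if in_string else "") + "".join(reversed(stack))

# A chunk cut off mid-value, as a streaming response would deliver it.
chunk = '{"city": "San Fra'
print(json.loads(complete_partial(chunk)))
# → {'city': 'San Fra'}
```

This only handles prefixes that end inside a value; a fragment cut off right after a `:` would need further correction, which is exactly where the other techniques in the list come in.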