Streaming batches
Throughout the docs, we mostly rely on asyncio.gather for simplicity and developer experience, but it's also possible to stream batch results as they complete rather than waiting for every batch to finish before moving on!
asyncio.gather vs asyncio.as_completed
Before looking at how you can stream batch results, here is a quick reminder of the two main asyncio result collection methods.
asyncio.gather: Executes tasks concurrently and waits for the entire group to finish before returning. It preserves the original submission order of results, making it the most convenient choice when your next step requires the complete dataset.
asyncio.as_completed: Yields awaitables as tasks finish (fastest first). This reduces "time to first result" by letting you process early completions immediately, at the cost of a loop that handles results one by one.
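A minimal, self-contained sketch of the difference, using asyncio.sleep as a stand-in for a real request (the fetch coroutine and its delays are illustrative, not part of batchling):

```python
import asyncio


async def fetch(delay: float) -> float:
    """Simulate a request that takes `delay` seconds, returning its delay."""
    await asyncio.sleep(delay)
    return delay


async def main() -> None:
    # gather: results come back together, in submission order
    results = await asyncio.gather(fetch(0.2), fetch(0.1))
    print(results)  # [0.2, 0.1] -- submission order preserved

    # as_completed: results are yielded as tasks finish (fastest first)
    for task in asyncio.as_completed([fetch(0.2), fetch(0.1)]):
        print(await task)  # prints 0.1 first, then 0.2


asyncio.run(main())
```

With gather, nothing is available until the slowest task (0.2 s) finishes; with as_completed, the 0.1 s result can be processed while the other is still running.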
How to stream batches
If you need to stream batch results as they become available (for logging, tracking progress, saving intermediate results to a database, ...), you can stream batches like so:
streaming_batches.py
import asyncio
import os

from dotenv import load_dotenv
from groq import AsyncGroq

from batchling import batchify

load_dotenv()


async def build_tasks() -> list:
    """Build an identical Groq request for two models to create two batches."""
    client = AsyncGroq(api_key=os.getenv("GROQ_API_KEY"))
    models = ["llama-3.1-8b-instant", "openai/gpt-oss-20b"]
    question = "Tell me a short joke, one sentence max."
    return [
        client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "user",
                    "content": question,
                }
            ],
        )
        for model in models
    ]


async def main() -> None:
    """Run the streaming batches example."""
    tasks = await build_tasks()
    processed_batches = 0
    for task in asyncio.as_completed(tasks):
        response = await task
        processed_batches += 1
        print(f"Processed batches: {processed_batches} / {len(tasks)}")
        print(f"{response.model} answer:\n{response.choices[0].message.content}\n")


async def run_with_batchify() -> None:
    """Run `main` inside `batchify` for direct script execution."""
    async with batchify(cache=False):
        await main()


if __name__ == "__main__":
    asyncio.run(run_with_batchify())
Output:
Processed batches: 1 / 2
llama-3.1-8b-instant answer:
A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?" The librarian replied, "It rings a bell, but I'm not sure if it's here or not."
Processed batches: 2 / 2
openai/gpt-oss-20b answer:
Why don't skeletons fight each other? They don't have the guts.