Backend Swapping

If you want to do mixture-of-agents or something along those lines, you can dynamically swap between backends.
import ProgramState from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

await s
    .add(s.system`You are a helpful assistant.`)
    .add(s.user`Tell me a joke`)
    .add(s.assistant`${s.gen('answer1')}`);

s.fromOpenAI({ modelName: 'gpt-4o' });

await s
    .add(s.user`Tell me a better one`)
    .add(s.assistant`No problem! ${s.gen('answer2')}`);
console.log(s.get('answer1'), '\n------\n', s.get('answer2'));

Choices

Force the LLM to pick between multiple choices. It's implemented by computing the token-length normalized log probabilities of all choices and selecting the one with the highest probability (full credit goes to the sglang folks for this).
import ProgramState from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

await s
    .add(s.system`You are an animal.`)
    .add(s.user`What are you?`)
    .add(
        s.assistant`I am a ${s.gen('answer', { choices: ['hippopotamus', 'giraffe'] })}`,
    );

console.log(s.get('answer'));
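Conceptually, the selection rule looks something like the sketch below. This is only an illustration of the scoring math, not the library's actual internals; assume the per-token logprobs for each choice come from the backend.
// Score a choice by its average per-token log probability, so longer
// choices aren't penalized just for having more tokens.
function scoreChoice(tokenLogprobs: number[]): number {
    return tokenLogprobs.reduce((sum, lp) => sum + lp, 0) / tokenLogprobs.length;
}

// Pick the choice with the highest length-normalized score.
function pickChoice(choices: { text: string; tokenLogprobs: number[] }[]) {
    return choices.reduce((best, c) =>
        scoreChoice(c.tokenLogprobs) > scoreChoice(best.tokenLogprobs) ? c : best,
    );
}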

Forking

Sometimes you reach a state in your program where the next generations don't depend on each other and can run in parallel.
import ProgramState from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

s.add(s.system`You are a helpful assistant.`)
    .add(s.user`How can I stay healthy?`)
    .add(s.assistant`Here are two tips for staying healthy:
        1. Balanced Diet. 2. Regular Exercise.\n\n`);

const forks = s.fork(2);
await Promise.all(
    forks.map((f, i) =>
        f
            .add(
                f.user`Now, expand tip ${(i + 1).toString()} into a paragraph.`,
            )
            .add(
                f.assistant`${f.gen('detailed_tip', { sampling_params: { max_new_tokens: 256 } })}`,
            ),
    ),
);

await s
    .add(s.user`Please expand.`)
    .add(s.assistant`Tip 1: ${forks[0]?.get('detailed_tip') ?? ''}
        Tip 2: ${forks[1]?.get('detailed_tip') ?? ''}
        In summary, ${s.gen('summary')}`);

console.log(s.get('summary'));
In this case, the expanded tips for 1 and 2 don’t depend on each other, so we can generate both at the same time using the fork function. Then, we can join them back into the parent ProgramState and continue generating from there.

Streaming

Enochian, of course, also supports streaming, so you can use it in a chat application frontend.
import ProgramState from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

const gen = s
    .add(s.system`You are a helpful assistant.`)
    .add(s.user`Tell me a joke`)
    .add(s.assistant`${s.gen('answer1', { stream: true })}`);
for await (const chunk of gen) {
    console.log(chunk.content);
}
console.log(s.get('answer1'), s.getMetaInfo('answer1'));
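As a sketch of how this could back a chat frontend, you might forward chunks over server-sent events. The HTTP server below is an illustrative assumption; only the stream: true option and chunk.content come from enochian-js:
import http from 'node:http';
import ProgramState from 'enochian-js';

http.createServer(async (_req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/event-stream' });
    const s = await new ProgramState().fromSGL('http://localhost:30000');
    const gen = s
        .add(s.user`Tell me a joke`)
        .add(s.assistant`${s.gen('answer', { stream: true })}`);
    // Forward each chunk to the browser as an SSE event
    for await (const chunk of gen) {
        res.write(`data: ${JSON.stringify(chunk.content)}\n\n`);
    }
    res.end();
}).listen(3001);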

Constrained decoding

For structured outputs, you can pass in either a regex, a JsonSchema, or a Zod schema inside SamplingParams. Zod schemas are the preferred way since they allow for type safety in .get.
import ProgramState from 'enochian-js';
import { z } from 'zod';

const schema = z.object({
    id: z.string().uuid(),
    name: z.string().min(2).max(50),
    age: z.number().int().min(0).max(120),
    email: z.string().email(),
    tags: z.array(z.string()).min(1).max(5),
    role: z.enum(['admin', 'user', 'guest']),
    settings: z
        .object({
            theme: z.enum(['light', 'dark']),
            notifications: z.boolean(),
        })
        .optional(),
    // You can't use z.date() since the LLM will generate a datestring, not a Date
    joinedAt: z.string().date(),
});

const s = await new ProgramState().fromSGL('http://localhost:30000');

await s
    .add(s.user`Describe a google employee's profile in json format`)
    .add(
        s.assistant`${s.gen('answer', { sampling_params: { zod_schema: schema } })}`,
    );
const profile = s.get('answer', { with: 'zod', schema });
console.log(profile);
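A regex constraint would look something like this sketch. The regex field name inside sampling_params is an assumption here, mirroring sglang's sampling parameter of the same name:
import ProgramState from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

// Assumption: the regex constraint is passed as sampling_params.regex
await s
    .add(s.user`What year did the French Revolution begin?`)
    .add(s.assistant`${s.gen('year', { sampling_params: { regex: '(17|18)\\d{2}' } })}`);

console.log(s.get('year'));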

Tool Calling

You can provide the LLM with tools that it can optionally use per generation. Use the createTools util, which allows for greater type inference when retrieving the result.
import ProgramState, { createTools } from 'enochian-js';
import { z } from 'zod';

const getWeatherParams = z.object({
    location: z.string()
});
function getWeather(args: z.infer<typeof getWeatherParams>) {
    return `It's cold and rainy in ${args.location}`;
}
const tools = createTools([{
    function: getWeather,
    name: 'getWeather',
    description: 'Gets the current weather at some location',
    params: getWeatherParams,
}]);

const s = await new ProgramState().fromSGL('http://localhost:30000');
await s
    .add(s.user`What's the weather like in Boston?`)
    .add(s.assistant`${s.gen('action', { tools })}`);

const action = s.get('action', { from: 'tools', tools });
console.log(action[0]);
It will return a ToolUseResponse typed object:
type ToolUseResponse = ({
    toolUsed: string;
    response: unknown;
} | {
    toolUsed: "respondToUser";
    response: string;
})[]
The LLM can either call one or multiple tools OR respond to the user, but not both. The reason you want to use the createTools util is that the generic ToolUseResponse type cannot infer the literal value of toolUsed or the response type without it. With the createTools util, hovering over action in your IDE should show something like:
const action: ({
    toolUsed: "respondToUser";
    response: string;
} | {
    toolUsed: "getWeather";
    response: string;
})[]
The LLM can either directly respond to the user or call the getWeather function, and enochian is able to infer that. If you wanted to build a weather-getting agent, you could write a very simple “game loop” like this:
while (true) {
    await s.add(
        s.assistant`${s.gen('action', {
            tools,
        })}`,
    );
    const action = s.get('action', { from: 'tools', tools });
    if (!action) {
        console.error(`No tool was used: ${action}`);
        break;
    }
    const respondToUserAction = action.find(a => a.toolUsed === 'respondToUser');
    if (respondToUserAction) {
        console.log(respondToUserAction.response);
        break;
    } else {
        for (const toolUsed of action) {
            s.add(
                s.user`${toolUsed.toolUsed} Result: ${JSON.stringify(toolUsed.response)}`,
            );
        }
    }
}
That will repeatedly call the LLM, giving it the responses to its tool uses, until it decides it can respond to the user directly.

Max Prompt Tokens

A downside of gradually building up conversation history is that your context length can grow much larger than you'd like. In applications where time to first token is critical, like a coding copilot, you need to stay under a certain threshold of tokens submitted to the LLM on each request. The .gen function provides a transform callback that receives every Message previously added to the ProgramState and lets you edit the array and return a new, trimmed-down array for .gen to use.
import ProgramState, { isUnderTokenThreshold, type Message } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');
s.add(s.system`You are a helpful assistant.`);

// Assume chatHistory is from a database query or something
for (const message of chatHistory) {
    if (message.role === 'user')
        s.add(s.user`${message.content}`);
    else
        s.add(s.assistant`${message.content}`);
}

await s
    .add(s.user`Tell me a joke about the conversation we just had.`)
    .add(s.assistant`${s.gen('joke', {
        transform: (messages) => {
            const newMessages: Message[] = [];
            // Keep the system prompt
            if (messages[0].role === 'system') newMessages.push(messages[0]);

            // Keep the most recent messages, preserving chronological order
            const recent: Message[] = [];
            let i = messages.length - 1;
            while (i >= 1 && isUnderTokenThreshold([...newMessages, ...recent], 100)) {
                recent.unshift(messages[i]);
                i -= 1;
            }
            return [...newMessages, ...recent];
        }
    })}`);
In this example we're removing the oldest messages from the chat history until we're under our max prompt token threshold. Enochian provides an isUnderTokenThreshold(messages: Message[], tokenThreshold: number, maxOutputTokens?: number) util function to make this check easy. The tokenThreshold is the number of tokens of the model's context window you want the prompt to take up. You can optionally provide a maxOutputTokens to reserve some number of tokens in the context window for your output tokens (since output tokens also count towards your request's context length). If maxOutputTokens is not passed in, zero tokens are reserved for the output and the model will generate until it hits the max length or an end-of-sequence token.
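For example, here's a minimal sketch of a standalone transform that reserves output tokens (the 4096-token window and 256-token reservation are illustrative assumptions, not library defaults):
import { isUnderTokenThreshold, type Message } from 'enochian-js';

// Drop the oldest messages until the prompt fits in an assumed 4096-token
// context window with 256 tokens reserved for the generation.
const reserveForOutput = (messages: Message[]): Message[] => {
    const trimmed = [...messages];
    while (trimmed.length > 1 && !isUnderTokenThreshold(trimmed, 4096, 256)) {
        trimmed.shift();
    }
    return trimmed;
};
// Usage: s.gen('joke', { transform: reserveForOutput })
You can also provide metadata to .add and access that metadata inside the transform function to trim tokens out that way.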
import ProgramState, { isUnderTokenThreshold, type Message } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');
s.add(s.system`An intelligent python programmer helping a friend program.`);

// Assume you have various pieces of context from your user's IDE
await s
    .add(s.user`Current file: ${currentFileContent}`, { type: 'fileContent' })
    .add(s.user`Code around cursor: ${codeAroundCursorContent}`, { type: 'codeAroundCursor' })
    .add(s.user`File metadata: ${fileMetadata}`, { type: 'fileMetadata' })
    .add(s.user`Help me finish off this piece of code: ${code}`, { type: 'userRequest' })
    .add(s.assistant`${s.gen('answer', {
        transform: (messages) => {
            // Higher number == higher priority
            const typesByPriority = {
                'userRequest': 4,
                'codeAroundCursor': 3,
                'fileContent': 2,
                'fileMetadata': 1,
            };
            const messagesByPriority = [...messages].sort(
                (m1, m2) => typesByPriority[m2.type] - typesByPriority[m1.type]
            );
            const newMessages: Message[] = [];
            let i = 0;
            while (i < messagesByPriority.length && isUnderTokenThreshold(newMessages, 100)) {
                newMessages.push(messagesByPriority[i]);
                i += 1;
            }
            return newMessages;
        }
    })}`);
Enochian also provides transform presets for common patterns. You can rank messages by embedding similarity:
import ProgramState, { trimBySimilarity } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

// Assume you have various pieces of context from your user's IDE
await s
    .add(s.user`Current file: ${currentFileContent}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`Code around cursor: ${codeAroundCursorContent}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`File metadata: ${fileMetadata}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`Help me finish off this piece of code: ${userRequestCode}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.assistant`${s.gen('answer', {
        transform: (messages) => trimBySimilarity(messages, 100)
    })}`);
Or use trimFromMiddle to remove messages starting from the middle of the history:
import ProgramState, { trimFromMiddle } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

// Assume chatHistory is from a database query or something
for (const message of chatHistory) {
    if (message.role === 'user')
        s.add(s.user`${message.content}`);
    else
        s.add(s.assistant`${message.content}`);
}

await s
    .add(s.user`Tell me a joke about the conversation we just had.`)
    .add(s.assistant`${s.gen('joke', {
        transform: (messages) => trimFromMiddle(messages, 100)
    })}`);
These presets can be combined however you like, since all they do is take in an array of Message and return the modified array. You can trim from the middle of the previous chat history, and then do embedding-similarity ranking on the current request:
import ProgramState, { trimFromMiddle, trimBySimilarity } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

// Assume chatHistory is from a database query or something
for (const message of chatHistory) {
    if (message.role === 'user')
        s.add(s.user`${message.content}`, { type: 'chatHistory' });
    else
        s.add(s.assistant`${message.content}`, { type: 'chatHistory' });
}

// Assume you have various pieces of context from your user's IDE
await s
    .add(s.user`Current file: ${currentFileContent}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`Code around cursor: ${codeAroundCursorContent}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`File metadata: ${fileMetadata}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.user`Help me finish off this piece of code: ${userRequestCode}`, { shouldBeRelevantTo: userRequestCode })
    .add(s.assistant`${s.gen('answer', {
        transform: (messages) => {
            // I want the chat history to use at most 50 tokens:
            const trimmedHistoryMessages = trimFromMiddle(
                messages.filter(m => m.type === 'chatHistory'), 50
            );
            // I want the current message to use the other 50 tokens:
            const currentRequestMessages = trimBySimilarity(
                messages.filter(m => m.shouldBeRelevantTo !== undefined), 50
            );
            return [...trimmedHistoryMessages, ...currentRequestMessages];
        }
    })}`);

Prefix Caching

Prefix caching is a common way to reduce the computation needed for LLM requests, and it is also very relevant to the max prompt token transform function. If a common prefix of a very frequently used prompt is cached (think long system prompts), you can skip the entire computation of that prefix, effectively removing it from your max prompt token restriction. For that reason, if you specify messages with the metadata { probablyPrefixCached: true }, they will NOT be provided to the transform function. Use this wisely!
import ProgramState, { trimFromStart } from 'enochian-js';

const s = await new ProgramState().fromSGL('http://localhost:30000');

s.add(s.system`${longSystemPrompt}`, { probablyPrefixCached: true });

// Assume chatHistory is from a database query or something
for (const message of chatHistory) {
    if (message.role === 'user')
        s.add(s.user`${message.content}`);
    else
        s.add(s.assistant`${message.content}`);
}

await s
    .add(s.user`Tell me a joke about the conversation we just had.`)
    .add(s.assistant`${s.gen('joke', {
        // Will not remove the system prompt even if it's longer than 100 tokens
        transform: (messages) => trimFromStart(messages, 100)
    })}`);
You should probably always use probablyPrefixCached: true for system prompts and any common prefixes in your prompts. Most providers will automatically do prefix caching for you, as it's also advantageous for them to reduce the amount of computation they have to do per request.