AI Tokens and the Role of Randomness
When you type a prompt, AI breaks your words into tiny pieces called tokens. These tokens determine how much the AI can process at once and why responses sometimes vary. Understanding this helps you use AI more effectively.
How does AI read what you write?
When you type a question or request into an AI system, it breaks your text into smaller pieces called tokens. These are the building blocks the AI uses to understand language.
Some tokens are complete words. Others are parts of words or punctuation marks. When you write "Let's create a policy brief today," the AI sees pieces such as "create," "a," "policy," "brief" and "today," while "Let's" may itself split into smaller pieces like "Let" and "'s."
This approach balances efficiency with flexibility. If the AI processed one character at a time, it would need enormous computing power and work slowly. If it only understood complete words, it couldn't handle variations like plurals or new terminology.
Tokens let AI systems understand and generate natural responses across different contexts. This matters when you're working with specialised government terminology or creating documents that need consistent language.
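This splitting can be illustrated with a deliberately simplified sketch. Real systems use learned subword vocabularies (such as byte-pair encoding), so actual token boundaries differ; this toy version just separates words from punctuation:

```python
import re

def toy_tokenize(text):
    """Very simplified tokenizer: splits on words and punctuation.
    Real AI systems use learned subword vocabularies, so their
    actual token boundaries will differ from this."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("Let's create a policy brief today.")
print(tokens)
# ['Let', "'", 's', 'create', 'a', 'policy', 'brief', 'today', '.']
```

Notice that even this toy version splits "Let's" into three pieces, which is why token counts are usually higher than word counts.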
What is a context window?
Think of a context window as the AI's working memory. It's the amount of conversation the system can see and remember at any given moment.
As you continue a conversation, new information comes into view. Older information gradually moves out of focus. The AI doesn't forget by choice. Earlier details simply move outside its frame of reference.
This matters most when you're working on complex tasks. If you're drafting a lengthy briefing document or explaining a multi-step process, the AI might lose track of details you mentioned early in the conversation.
Different AI models have different context window sizes. Some can handle 4,000 tokens (roughly 3,000 words). Others manage 128,000 tokens (roughly 96,000 words) or more. Even large models have limits. When you exceed this limit, the AI can no longer see earlier parts of the conversation. They fall outside the window.
As this happens, long conversations lose coherence: discussions become fragmented as early context disappears. Large documents consume space quickly too. Uploading a lengthy document uses substantial tokens, leaving less room for conversation. Complex, multi-step projects may need to be broken into separate conversations to maintain clarity.
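One way to picture how older turns fall outside the window is a sketch that keeps only the most recent messages fitting a token budget. The rough 0.75 words-per-token ratio mirrors the estimates above; `fit_context` and `approx_tokens` are hypothetical names for illustration, not a real API:

```python
def fit_context(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count
    fits in the window; older messages drop out first."""
    kept, total = [], 0
    for msg in reversed(messages):          # newest first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                           # older messages no longer fit
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

# Rough heuristic: about 0.75 words per token
approx_tokens = lambda msg: round(len(msg.split()) / 0.75)

history = ["early detail " * 50, "middle point " * 50, "latest question " * 10]
window = fit_context(history, max_tokens=200, count_tokens=approx_tokens)
# The earliest message no longer fits, so only the later two remain
```

The point of the sketch is the direction of the loop: the system fills the window from the newest message backwards, which is exactly why details from the start of a long conversation vanish first.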
How can you work within these limits?
Keep prompts focused. Clear, concise prompts use fewer tokens. This preserves space for responses and maintains context.
When you reference something from earlier in the conversation, briefly restate the key point. This helps keep critical information in view. Don't assume the AI still sees details from the beginning of a long conversation.
Don't try to fit everything into one conversation. Begin a new session when shifting topics. If you're creating a long document, work on it in parts rather than all at once. This keeps each conversation focused and within the context window.
If you need to continue a lengthy conversation, ask the AI to summarise key points. Use that summary to start a new conversation with preserved context. This gives you a fresh context window while maintaining continuity.
Only include documents that are essential for the current task. Every uploaded document consumes tokens. Be strategic about what you add to the conversation.
How do memory features work?
Some AI systems now offer memory features that work differently from context windows. ChatGPT has had memory since April 2024, Gemini added it in February 2025, and Claude added it in September 2025.
These memory features store information between conversations. The AI can search past chats to recall your preferences, project details, or previous discussions. This is separate from the context window. The context window is what the AI sees in the current conversation. Memory is what the AI can search from previous conversations.
Claude's memory starts each conversation with a blank slate, then searches past conversations when needed. ChatGPT automatically loads memory at the start of every conversation. Both approaches have the same goal: preserving context across sessions.
Memory features don't replace context windows. Both work together. Within a conversation, the context window determines what the AI can see. Between conversations, memory features let it recall information from past sessions.
Neither creates true understanding or persistent beliefs. They're mechanisms for accessing text from current or past interactions. The AI still processes everything as patterns, not knowledge.
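The distinction between the two mechanisms can be sketched as a toy data structure. This is purely illustrative, not any vendor's real API: the context window is a short, always-visible buffer, while memory is a larger store consulted only on demand:

```python
class ToyAssistant:
    """Illustrative only: context window vs. cross-session memory."""

    def __init__(self, window_size):
        self.context = []          # what the model sees right now
        self.memory = []           # searchable text from past sessions
        self.window_size = window_size

    def say(self, message):
        self.context.append(message)
        # Older turns fall out once the window is full
        self.context = self.context[-self.window_size:]

    def end_session(self):
        # Store the transcript so later sessions can search it
        self.memory.extend(self.context)
        self.context = []

    def recall(self, keyword):
        # Memory search: retrieve matching lines from past sessions
        return [m for m in self.memory if keyword in m]

assistant = ToyAssistant(window_size=2)
assistant.say("project alpha kickoff")
assistant.say("budget review")
assistant.say("next steps")        # "project alpha kickoff" drops out
assistant.end_session()
matches = assistant.recall("budget")
```

Within a session, only the last `window_size` turns are visible; after the session ends, everything moves to the searchable store. That mirrors the difference described above between seeing and recalling.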
Why do AI responses vary?
Ask an AI the same question twice. You'll likely get different answers each time. This isn't a flaw. It's by design.
Every time an AI responds, it makes choices about which words to use. These choices are based on patterns learned during training. The system introduces some randomness into these choices. This is intentional.
Without randomness, AI responses would feel mechanical and repetitive. Every answer to the same question would be identical. With too much randomness, responses might feel unfocused or inconsistent.
How does temperature affect responses?
This randomness is governed by a parameter often called "temperature." Temperature controls how predictable or creative the AI's responses are.
At lower temperatures, the AI becomes more predictable. It tends to choose the most statistically likely tokens. Responses are more consistent and focused. This works well for factual questions, technical explanations, or tasks requiring precision. When you need reliable, consistent outputs, lower temperature helps.
At higher temperatures, the AI explores less probable options. This creates more creative and diverse responses. This works well for brainstorming, creative writing, or generating multiple alternatives. When you want varied perspectives or creative solutions, higher temperature encourages this.
Most AI systems use a moderate temperature by default. This balance creates responses that feel natural and engaging while remaining reliable and accurate. You typically don't need to adjust this unless you have specific requirements for consistency or creativity.
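Temperature's effect can be shown with the standard softmax calculation that turns a model's raw scores into probabilities. The scores below are invented for illustration; the mechanism is general:

```python
import math

def apply_temperature(logits, temperature):
    """Convert raw scores (logits) into a probability distribution.
    Lower temperature sharpens it toward the top choice; higher
    temperature flattens it, giving unlikely tokens more chance."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]   # hypothetical scores for three candidate tokens
low = apply_temperature(logits, 0.3)
high = apply_temperature(logits, 2.0)
# At temperature 0.3 the top token takes almost all the probability;
# at 2.0 the three options are much closer together
```

The same three scores produce very different distributions: that single dividing step is what shifts the model between predictable and exploratory behaviour.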
What is AI's non-determinism?
Non-determinism means the AI doesn't produce identical outputs for identical inputs. Each time you ask a question, the AI generates a fresh response. It's not retrieving a stored answer. It's creating new text based on probabilities.
This means consistency requires effort. If you need consistent responses across multiple interactions, you may need to explicitly request this in your prompts. You can't assume the AI will phrase things the same way twice.
Verification remains essential. You cannot assume that because the AI gave you an answer once, it will give you the same answer again. Different phrasings might emphasise different points or omit different details.
But non-determinism also brings benefits. It allows for exploring different perspectives and generating diverse ideas. When you're brainstorming or looking for alternatives, the variation becomes useful rather than problematic.
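A minimal sketch of why identical inputs yield different outputs: the model samples from a probability distribution rather than retrieving a stored answer. The tokens and probabilities here are invented for illustration:

```python
import random

def sample_token(tokens, probs, rng):
    """Pick one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["policy", "briefing", "document"]
probs = [0.5, 0.3, 0.2]

rng = random.Random()   # no fixed seed: each run can differ
first = [sample_token(tokens, probs, rng) for _ in range(5)]
second = [sample_token(tokens, probs, rng) for _ in range(5)]
# `first` and `second` will often differ, even though the input
# probabilities were identical both times
```

Nothing about the inputs changed between the two runs; only the random draws did. That is non-determinism in miniature, and it is why verification matters for factual outputs while variation helps for brainstorming.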
What does this mean for you?
These technical details shape every interaction you have with AI.
Tokens determine how much information the AI can process. Context windows set boundaries on conversation length. Randomness creates variation in responses.
None of these are limitations to work around. They're characteristics to understand and work with effectively. When you know how AI processes language, you can structure your work to get better results. Keep prompts focused. Manage long conversations strategically. Expect variation in responses.
These aren't obstacles. They're simply how the technology works.