Never Hit Your Claude Limit Again
The habits power users swear by to stretch their plans
Most Claude users assume they are hitting limits because they ask Claude too many questions. The real cause is usually three leaks that drain your allowance.
🔁 Conversation rereads. Claude reloads every previous message on every new reply.
📁 File reloads. Cowork reads every file in your folder before each task.
⚙️ Background features. Web search, connectors, and extended thinking add tokens to every reply, even when unused.
The 30 habits below close all three.
Pick a few. Skip the rest until those become muscle memory.
Conversation Management
1. Start a new chat every time the topic changes
LinkedIn post, then a client proposal, then a dinner recipe, all inside one chat? Claude is rereading the LinkedIn post and the proposal every time it thinks about your dinner.
The rule: new topic = new chat. No exceptions.
That older context contributes nothing to your current question. You keep paying for it anyway. The friction of clicking “New chat” is far smaller than the cost of dragging dead context behind you for the rest of the session.
2. Summarize and start fresh every 15 to 20 messages
The 98.5% figure from the intro came from a developer who actually tracked his sessions. Most paid users never measure their ratio. The numbers are usually worse than they assume.
Around the 15 to 20 message mark, run this prompt:
Summarize this conversation so I can continue it in a fresh session. Include: every decision we made, every constraint I gave you, the current state of the work, and what the next step is. Keep it under 250 words.
Copy the summary. Open a new chat. Paste it as message one. You keep the substance and shed the dead weight.
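To see why the reset pays off, here is a rough sketch of how reread costs pile up. It assumes every exchange adds a fixed number of tokens and that the full history is read again on each reply. The 300-token exchange size and the 350-token summary are illustrative numbers, not measurements, and reread_tokens is a made-up helper name.

```python
def reread_tokens(num_messages: int, tokens_per_message: int, carried_over: int = 0) -> int:
    """Total input tokens paid when each new message re-sends the whole history.

    carried_over is context already present at the start (e.g. a pasted summary).
    """
    total = 0
    history = carried_over
    for _ in range(num_messages):
        history += tokens_per_message  # the new exchange joins the history
        total += history               # the full history is read for this reply
    return total


# Illustrative assumptions: 300 tokens per exchange, 350-token summary.
one_long_chat = reread_tokens(40, 300)
two_short_chats = reread_tokens(20, 300) + reread_tokens(20, 300, carried_over=350)

print(one_long_chat)    # 246000
print(two_short_chats)  # 133000
```

Under these assumptions, the same 40 exchanges cost a little over half as much once you split them in two, even after paying to carry the summary forward.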
3. Edit your original message instead of sending a follow-up
When Claude misunderstands you, the instinct is to send “no, I meant something different.” That follow-up gets added to the conversation history forever. Claude rereads it on every future reply.
In Chat, the fix takes three clicks:
Hover over your original message
Click the pencil/edit icon
Fix the wording and regenerate
The old exchange gets replaced rather than stacked. Nothing accumulates. Bonus: this is also the cleanest way to back out when Claude takes a session in the wrong direction.
4. Restart from an earlier message instead of stacking corrections
A 30-message Cowork session can burn over 200,000 tokens. Most of that is rereads, not new output. Sending more correction messages only makes the rereads heavier.
The fix depends on the surface:
In Chat: the edit button replaces the wrong message
In Cowork: prompts cannot be edited inline. Use “Restart conversation from here” on an earlier message instead
The further up you restart, the more tokens you reclaim. If the whole session is unsalvageable, just open a fresh one and paste a one-line summary of what you actually need. Clean slate, almost zero token cost.
Prompting Habits
5. Use short prompts and let Claude ask the questions
A 500-word prompt costs well over 500 tokens (a token is typically a bit shorter than a word), and you pay that every single time Claude reloads the conversation. That happens on every reply.
Better: write a short prompt that names the task, then let Claude pull context from you with the AskUserQuestion tool. Clicks cost almost nothing. Walls of typed context cost a lot.
The 30-word version that works for almost any Cowork task:
I want to [task] to [success criteria]. Read my folder. Ask me questions using the AskUserQuestion tool before you start.
6. Dictate prompts instead of typing them
Typed prompts tend to be lazy. People write things like “make it better” or “change the tone,” which forces Claude to guess and forces you to send three more correction messages when it guesses wrong. Each correction reloads the full conversation history.
Use a voice-to-text tool like Wispr Flow and dictate instead. Speaking naturally produces fuller context in a single shot.
Typed (vague):
Make it more casual.
Dictated (specific):
The tone feels stiff. I want it casual, like I am texting a friend who runs a 200-person company. Keep the data, but only redo section two.
Fewer back-and-forth messages = fewer context reloads = fewer tokens spent.
7. Bundle multiple tasks into a single prompt
Every separate prompt forces Claude to reread the entire conversation. Three quick questions sent one after another means three full context reloads.
Three prompts (three reloads):
“Summarize this article.”
“List the main points.”
“Suggest a headline.”
One prompt (one reload):
Summarize this article in three sentences, list the five most important points as bullets, and suggest two possible headlines.
You pay for one reload instead of three. The output is often more coherent because Claude sees the whole picture in one pass.
8. Specify your format upfront instead of reformatting later
Common pattern that doubles your output costs: Claude writes a full response, you decide you want bullets instead, and you ask Claude to “redo that as a list.” Now Claude regenerates the entire response, and you pay the full output cost twice for content that was already correct.
Front-load the format. Tell Claude exactly how you want it the first time:
Respond as a bulleted list, no preamble.
Give me three short paragraphs, no headers.
Format as a table with columns for “name,” “deadline,” and “owner.”
Keep it under 200 words total.
Five extra seconds to specify saves an entire regeneration cycle. Same logic applies to length, tone, and section ordering.
Settings and Features
9. Turn off features you are not using for the current task
Web search, connectors, extended thinking, and Explore mode all add tokens to every reply, even when they contribute nothing to the answer.
Default to everything off. Turn features on per task, not per account.
When you do use connectors, narrow the query as tightly as possible:
✅ Search Slack from the last seven days for messages about the Q2 launch.
❌ Search Slack for anything about launches.
The first pulls a small relevant slice. The second pulls a flood of tokens you did not need.
10. Set User Preferences and Styles to skip session setup
Without saved context, every new chat burns three to five messages on setup. “I am a marketer, I write casually, I prefer short paragraphs...” Hundreds of tokens before any real work happens, repeated across every conversation.
Two settings to configure once:
Settings → Personal Preferences. Write your role, tone, and audience in 100 words or less.
Model selector → Styles. Pick “Concise” or build a custom style.
Both persist across chats without consuming session context. One configuration. Permanent savings.
11. Let Memory remember who you are so you stop repeating yourself
Memory rolled out to all users including the free tier on March 2, 2026. Claude scans your chat history and quietly builds a profile of your role, preferences, and recurring topics, updating about every 24 hours. You can also tell Claude “remember that I prefer X” mid-conversation to update it in real time.
Without Memory on, you type “I am a marketer who writes casually for SaaS founders” into every new chat. With Memory on, that context is already loaded.
Manage entries at: Settings → Capabilities → Memory
⚠️ One note for Cowork users: each Cowork project keeps its own separate memory, so context does not carry between Cowork projects. Use a notes file in your folder for that surface instead.
12. Use Incognito mode for sensitive one-off prompts
Incognito chats are free for everyone. Click the ghost icon in the upper right corner of a new chat. Nothing in that chat saves to your history, feeds Memory, or trains the model.
Use Incognito for:
Sensitive medical or legal questions
Confidential work matters
One-off tasks unrelated to your usual work
Anything you do not want shaping your long-term Memory profile
Bonus benefit: it prevents Memory pollution where one off-topic chat starts skewing how Claude treats you in future conversations. Note that Incognito is not available inside Projects.
13. Do not paste URLs when web search is enabled
This catches everyone by surprise. When web search is on and you paste a URL, Claude does not glance at the page. It retrieves the entire article. Every paragraph, image caption, sidebar, and footer, all loaded into context.
A single news article can quietly add 5,000 to 10,000 tokens before Claude has even started answering.
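If you want a feel for the gap between a whole page and the passage you actually need, a back-of-the-envelope estimate is enough. The sketch below assumes about four characters per token, a common rule of thumb for English text and not Claude's actual tokenizer. The sample strings are stand-ins, and estimate_tokens is a hypothetical helper.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed rule of thumb."""
    return round(len(text) / chars_per_token)


# Stand-ins for a full article and the one passage you actually need.
full_page = "word " * 6000  # about 30,000 characters
excerpt = "word " * 300     # about 1,500 characters

print(estimate_tokens(full_page))  # 7500
print(estimate_tokens(excerpt))    # 375
```

Copying the relevant passage into the chat keeps you near the second number instead of the first.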
Monitoring and Plan Management
14. Watch your real numbers in Settings → Usage
Most people guess at how much of their allowance is left. Pro, Max, Team, and seat-based Enterprise users can see exactly where they stand by opening Settings → Usage.
You will see real-time progress bars for:
Your current 5-hour session
Your weekly limits
Opus consumption (separate bar)
Build a habit of checking before starting a heavy task. If you are at 70% of your weekly Opus limit on a Tuesday, that is the signal to switch to Sonnet for the rest of the week. The feedback loop turns invisible spending into something you can manage, like watching your bank balance instead of guessing.
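The Tuesday example is simple pace arithmetic. Here is a minimal sketch, assuming your weekly limit reset two days earlier (your actual reset day may differ) and using a hypothetical weekly_pace helper.

```python
def weekly_pace(used_fraction: float, days_elapsed: float, days_in_week: int = 7) -> float:
    """Ratio of usage to elapsed time; above 1.0 means you are ahead of an even pace."""
    elapsed_fraction = days_elapsed / days_in_week
    return used_fraction / elapsed_fraction


# 70% used with two of seven days gone (assumed reset day, see above).
pace = weekly_pace(0.70, days_elapsed=2)
print(round(pace, 2))  # 2.45
```

A pace of 2.45 means you are spending almost two and a half times faster than the week allows, which is why switching models early matters.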
15. Spread sessions across the day to use the full rolling window
Claude limits operate on a rolling 5-hour window. If you burn through your entire allowance in one morning marathon, your remaining capacity for the day mostly goes unused while you wait for the window to roll off.
Better pattern: split work into 2 to 3 smaller sessions across morning, afternoon, and evening. By the time you return, your earlier usage has partially aged out and you have more headroom available.
This matters most on lower-priced plans where the limits are tighter.
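A toy model makes the headroom effect concrete. It assumes usage simply ages out five hours after it happens, which is how the window is described here; the real accounting may differ. The usage units and the usage_in_window helper are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # the window length named in the text


def usage_in_window(events: list[tuple[datetime, int]], now: datetime) -> int:
    """Sum the usage units whose timestamps fall inside the last five hours."""
    return sum(units for when, units in events if now - WINDOW < when <= now)


day = datetime(2026, 4, 1)
# Hypothetical usage units: one morning marathon versus three spread-out sessions.
marathon = [(day.replace(hour=9), 90)]
spread = [
    (day.replace(hour=9), 30),
    (day.replace(hour=14), 30),
    (day.replace(hour=19), 30),
]

print(usage_in_window(marathon, day.replace(hour=11)))  # 90
print(usage_in_window(spread, day.replace(hour=15)))    # 30
```

Same total work in both cases, but the spread-out pattern never has more than one session counting against the window at once.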
16. Shift heavy work to off-peak hours
In late March 2026, Anthropic made the policy official: the same conversation eats more of your 5-hour window during weekday peak hours than it does off-peak. Around 7% of users now hit session walls they would not have hit before.
Peak hours to avoid (weekdays):
Time zone | Peak window
Pacific (PT) | 5 a.m. to 11 a.m.
Eastern (ET) | 8 a.m. to 2 p.m.
GMT | 1 p.m. to 7 p.m.
Save token-heavy work (long Cowork sessions, document analysis, research) for evenings, early mornings, and weekends. Use the morning rush for lighter Chat work and quick lookups. Queue up the deep work for after dinner.
Your weekly totals stay the same. Only how those totals get distributed across the day has changed.
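If you would rather not do the time zone math in your head, a small check can do it for you. This sketch hard-codes the GMT window listed above and assumes weekdays are judged in GMT. It ignores daylight saving shifts, and is_peak is a hypothetical helper.

```python
from datetime import datetime, timezone

PEAK_START_HOUR_GMT = 13  # 1 p.m. GMT, per the window listed above
PEAK_END_HOUR_GMT = 19    # 7 p.m. GMT


def is_peak(moment: datetime) -> bool:
    """True if a timezone-aware datetime falls in the weekday peak window."""
    gmt = moment.astimezone(timezone.utc)
    is_weekday = gmt.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and PEAK_START_HOUR_GMT <= gmt.hour < PEAK_END_HOUR_GMT


print(is_peak(datetime(2026, 4, 1, 14, 30, tzinfo=timezone.utc)))  # True  (Wednesday afternoon)
print(is_peak(datetime(2026, 4, 1, 21, 0, tzinfo=timezone.utc)))   # False (Wednesday evening)
print(is_peak(datetime(2026, 4, 4, 14, 30, tzinfo=timezone.utc)))  # False (Saturday)
```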
17. Buy Extra Usage instead of impulse-upgrading your plan
When you hit your limit, the obvious move is upgrading from Pro to Max. Often, that is the wrong financial decision.
Every paid plan can enable Extra Usage, which lets you keep working past your included allowance at standard pay-as-you-go API rates. Cheaper than a tier upgrade if you only hit limits occasionally.
To set it up, open Settings → Billing → Extra Usage, add a payment method, and you stop hitting hard walls.
Before jumping to Max: track your Pro usage for two to three weeks first. If you are not consistently hitting Pro limits at least weekly, Extra Usage is almost always cheaper than the upgrade.
⚠️ If you subscribed through the mobile app, you have to enable Extra Usage on the web version specifically.
Cowork-Specific Habits
18. Trim your “About Me” file to under 2,000 words
Cowork loads your context files at the start of every task. A 20,000-word personal background file gets reloaded on every single session, before any real work begins.
Two-step fix:
First, trim the file ruthlessly. Under 2,000 words.
Second, end each working session by asking Claude to write a short notes file:
Write a session-notes.md file with the key decisions we made today and the next two steps. Save it to the folder.
At the start of the next session, tell Claude to read that notes file first. You carry context forward without re-explaining anything from scratch.
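A quick word count tells you how far your file is from the 2,000-word target. This is a minimal sketch: the sample text stands in for your real file, and words_over_limit is a hypothetical helper that counts whitespace-separated words.

```python
WORD_LIMIT = 2000  # the ceiling suggested above


def words_over_limit(text: str, limit: int = WORD_LIMIT) -> int:
    """Return how many words the text exceeds the limit by (0 if within it)."""
    return max(0, len(text.split()) - limit)


# Stand-in for the contents of a context file; in practice, read the file from disk.
sample = "background " * 2600
print(words_over_limit(sample))  # 600
```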
19. Only include the files Cowork actually needs for the task
Every file in your Cowork folder is read before each task. Dumping 50 files into a project so Claude has “everything it might need” means paying to read all 50 files on every prompt.
There’s a second cost too: when folders get too heavy, Cowork starts skimming files loosely instead of reading them carefully. You pay more and get worse output.
The rule: if Claude does not need a file for this specific task, take it out.
For quick work that needs no files at all (like drafting an email through a connector), start the session with zero folders selected. Zero file context = zero file tokens, before you have even typed your first word.
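Before you start a task, it helps to see what is actually sitting in the folder. The sketch below lists files largest first so the heavy ones stand out. It measures bytes on disk, which is only a loose proxy for tokens, and both folder_report and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def folder_report(folder: Path) -> list[tuple[str, int]]:
    """List (relative file name, size in bytes) for each file, largest first."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    sizes = [(str(p.relative_to(folder)), p.stat().st_size) for p in files]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Demo on a throwaway folder; point folder_report at your own folder instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "brief.md").write_text("x" * 1200)
    (root / "old-export.csv").write_text("x" * 90000)
    for name, size in folder_report(root):
        print(f"{size:>8}  {name}")
```

Anything near the top of the list that the current task does not need is a candidate to move out.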
20. Set Global Instructions in Cowork once and forget them
Cowork has a feature most users never find: Settings → Cowork → Edit Global Instructions. Anything you write there becomes a permanent rule Claude follows on every Cowork task, across every folder, forever.
Put recurring stuff there:
Your name and role
Preferred file naming conventions
Standard output style
Languages and tools you use
Default formats for deliverables
One thoughtful 200-word setup eliminates hundreds of repeated tokens across every future Cowork session.
For project-specific rules that should not apply globally, use folder-level instructions instead. They override Global Instructions only inside that folder.
21. Tell Cowork to “codify this” so it saves knowledge automatically
Cowork can write to its own memory files inside your working folder. It maintains files called claude.md and memory.md that persist across sessions.
The magic phrase: “remember this” or “codify this principle.”
Use it after Cowork does something well, or right after you correct a mistake:
Codify this principle in your memory file so you do not repeat the mistake.
Next session, that lesson is already loaded. Over a few weeks of using this habit, your Cowork project develops a customized rulebook. No manual file editing on your part.
Output Control
22. Edit specific sections instead of redoing the whole output
When section three of a report is wrong, asking Claude to redo the whole report regenerates every word of output, including the parts that were already fine. On a 2,000-token report, that’s 2,000 output tokens spent again to fix something small.
Be surgical:
Only redo section three. Keep every other section exactly as written. No commentary, no explanations, just the corrected section.
That last sentence saves more than people expect. Claude defaults to verbose preambles (“Happy to help! Here is the updated section...”). Every word of friendly throat-clearing is tokens you are paying for.
23. Highlight text in Artifacts before requesting changes
A useful upgrade landed in late 2025 and is now standard. When you ask Claude to change an Artifact, the system picks between three modes:
Create. Builds a new artifact from scratch.
Update. Does targeted inline edits using string replacement (cheapest).
Rewrite. Regenerates the whole artifact (most expensive).
To trigger update mode: select the specific sentence or paragraph you want changed before sending your request. Claude reads the selection as scope and patches just that part. Saying “change only the second paragraph” works similarly.
Avoid: vague prompts like “make this better” or “polish this.” Those almost always trigger a full rewrite and burn the entire artifact’s token cost again.
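The cost gap between an update and a rewrite is easy to picture with plain string replacement. This sketch only illustrates the idea; it is not how Claude implements artifact edits, and apply_update is a hypothetical helper.

```python
def apply_update(artifact: str, old: str, new: str) -> str:
    """Patch one passage in place; fail loudly if the target is missing or ambiguous."""
    if artifact.count(old) != 1:
        raise ValueError("target text must appear exactly once")
    return artifact.replace(old, new)


artifact = "Intro paragraph. " * 50 + "Revenue grew 5% in Q1. " + "Closing paragraph. " * 50
patched = apply_update(artifact, "grew 5%", "grew 8%")

# An update only has to emit the old and new snippets; a rewrite emits everything.
update_chars = len("grew 5%") + len("grew 8%")
rewrite_chars = len(patched)
print(update_chars, rewrite_chars)  # 14 1823
```

The narrower and more specific the target, the closer your request stays to the first number.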
Choosing the Right Tool
24. Match the Claude product and model to the task
Chat is the lightest-weight surface. Cowork is the heaviest. Code sits in between. Each one has different per-interaction costs, and using the wrong one for a task is the equivalent of driving a freight truck to pick up groceries.
Quick match guide:
Task | Tool | Model
Quick factual question | Chat | Haiku or Sonnet
Report built from your files | Cowork | Opus
Chart from a CSV | Code | Sonnet
Brainstorming or planning | Chat | Sonnet
File creation (docx, pptx, xlsx) | Cowork | Opus
Quick code snippet | Chat | Sonnet
Two seconds of thought before you open a session can cut its cost in half.
25. Use Sonnet or Haiku for simple tasks and save Opus for deep work
Opus with extended thinking is heavy machinery. Using it for grammar checks, quick reformats, or short brainstorms is the equivalent of renting a crane to hang a picture frame. Sonnet and Haiku handle that work at a fraction of the cost.
The 30-second rule: if Claude can answer a task in under 30 seconds, it probably does not need Opus.
Tasks that almost never need Opus:
Grammar and spelling checks
Reformatting existing text
Quick factual lookups
Simple summaries
Casual brainstorming
One-line code fixes
Switch models in the dropdown before you start. The option is sometimes hidden under “More models” if you do not see it immediately.
26. Plan in Chat, then build in Cowork
File creation in Cowork uses significantly more of your usage allowance than ordinary Chat messages. If you walk into Cowork and start brainstorming structure, debating assumptions, and revising outlines, you are doing cheap work in an expensive environment.
The workflow that saves the most tokens:
Open Chat first. Plan the structure, lock in assumptions, agree on what each section should contain.
Switch to Cowork only once you know exactly what you want built.
Instruct Cowork to construct the finalized plan.
Think in the cheap product. Build in the expensive one.
27. Use other AI tools for tasks Claude is weak at
Claude does not generate images. If you spend five messages trying to coax a visual out of it and end up with text-based workarounds, that is five messages of tokens spent on a task that was never going to succeed.
When to reach for another tool:
Image generation → Gemini or ChatGPT
Real-time search and breaking news → Grok or ChatGPT
Video generation → Sora or Runway
Pick the right tool for each job rather than burning your Claude allowance on tasks better suited elsewhere. Saving tokens sometimes means not using Claude at all.
Other Quick Wins
28. Convert files to plain text before uploading
A single PDF page costs 1,500 to 3,000 tokens. Screenshots run higher. A document you upload to four different chats burns through that cost four separate times, even though the content never changed.
Open the file. Copy the text you actually need. Paste it into a plain text or markdown document. Upload that instead. For images, crop tight to only the portion that matters before attaching.
Quick conversion workflow:
Type doc.new in your URL bar (creates a fresh Google Doc)
Paste the relevant text
File → Download → Markdown (.md)
Upload the .md file to Claude
A 15-page PDF that would cost tens of thousands of tokens shrinks to a clean text file under 2,000 tokens. Five minutes of prep work.
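The math behind that 15-page example, using the per-page range quoted above. The pdf_cost_range helper is a hypothetical name, and the four-upload case mirrors the earlier example of reusing one document across chats.

```python
PDF_TOKENS_PER_PAGE = (1500, 3000)  # low and high per-page estimates quoted above


def pdf_cost_range(pages: int, uploads: int = 1) -> tuple[int, int]:
    """Return the (low, high) token estimate for a PDF uploaded a number of times."""
    low, high = PDF_TOKENS_PER_PAGE
    return pages * low * uploads, pages * high * uploads


print(pdf_cost_range(15))             # (22500, 45000)
print(pdf_cost_range(15, uploads=4))  # (90000, 180000)
```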
29. Use Projects to cache files you reference often
Uploading the same PDF to five different chats means Claude tokenizes that document five separate times. The file did not change. Your usage budget did.
Use Projects for any file you reference repeatedly. Upload once into the Project. Every conversation inside that Project can reference it without re-tokenizing.
Bonus: on paid plans, Projects use retrieval. Claude pulls only the relevant chunks rather than loading the whole document into context. For contracts, brand guides, research papers, or anything you consult often, this single change can cut your usage significantly.
30. Automate recurring work with scheduled tasks
If you run the same weekly report, daily digest, or monthly research summary, doing it manually inside a growing Cowork session is the most expensive way to handle it.
Use the /schedule plugin. Set the task once with the parameters and cadence, then let it run on its own.
A clean prompt for a weekly setup:
Every Monday at 8 a.m., pull the last seven days of Slack messages from #q2-launch, summarize the top three blockers, and post the result to my Inbox.
You stop paying the setup cost every week. You also stop letting those sessions balloon into 40-message threads that drain your allowance.
The 5 habits that pay off fastest
You will not adopt all 30 at once. Don’t try.
The 30 above are a menu, not a checklist. Most paid users who break the cycle of weekly limit-hitting do it by changing three to five behaviors that compound across every session.
Most users who think they need to upgrade to Max actually need three habits and a Tuesday afternoon.
These five give you the biggest cut for the least effort:
⚡ Habit 1: New chat per topic. No setup. No toggle. Just a habit. Stops you from paying to reread unrelated context for the rest of the day.
🎛️ Habit 9: Turn off features you are not using. One default setting flip. Stops web search, connectors, and extended thinking from quietly adding tokens to every reply.
🧠 Habit 11: Turn on Memory. One toggle. Eliminates the “I am a marketer who writes casually for SaaS founders” setup from every new chat, forever.
🔗 Habit 13: No URL pastes when web search is on. A single article can quietly add up to 10,000 tokens to a conversation. Copy the section you need instead.
⚖️ Habit 25: Sonnet for simple, Opus for deep. Two clicks before you open a session. Stops you from renting a crane to hang a picture frame.
Apply those five this week. By Friday, you will already be paying for fewer rereads than the version of you that started reading this.



