Great post! o1/pro is the first model I've used that can do high-level software architecture well:
- As you noted, give it all the context -- all relevant code files + existing design docs (RepoPrompt is great for this).
- Ramble about the problem into speech2text for a while
- At the end tell it to present multiple alternatives + reasons to use/not use
The breakthrough capability is its lack of sycophancy -- it's the first model I've used where, when I disagree with it, it will hold its ground and convince me that it is right.
Another tip is to have it break up the implementation into discrete steps, outputting all context for each one. Then paste them into Cursor Composer one at a time for the actual coding.
Someone else was saying that after each step, they go back to o1 and have it review the code that Cursor wrote. Still need to try that one out!
great responses, thanks for sharing
Dude, RepoPrompt is 🔥. IMO, RP needs to be incorporated directly into ChatGPT.
"Someone else was saying that after each step, they go back to o1 and have it review the code that Cursor wrote. Still need to try that one out!" I need to try this too, but so far everything has been functional ☺️
I def need to try making Cursor code it out in stages/sections from a fully fleshed-out start... I've been patching shit with band-aids, and I'm finally getting the system down:
- sharpen the axe for 50 min... let Cursor chop the tree for 10 min.
How does Gemini Deep Research compare?
Hmm, you mention adding PDFs to the prompt, but I don't have this feature in ChatGPT Plus (and it's not only me: https://www.reddit.com/r/OpenAI/comments/1hwli30/pdf_file_uploads_on_o1/). Is that a Pro feature?
I also cannot upload PDFs/Excel files to o1-pro, only image files (photos).
How much disconnected rambling can o1 handle? Can I just speak stream-of-thought about all the discussions and back-and-forth ideas that happened for a product feature and dump them? I'm clear on the final output that I want. I'm just wondering if o1 can handle all this extra discussion context.
lots! i ramble all the time and just dump the transcript
o1 only accepts ~32k tokens, right? So maybe dumping a bunch of thoughts might be better done in a separate chat?
Source: llm-stats.com
I'm not sure about o1, but o1-Pro has HUGE context: like ~150K!
I brain dump as a stream of consciousness in voice mode, then refine with typed prompts in the same chat. Then I take that as "context" to use in the template featured in this post.
Hi there.
Thanks for this article. :)
I was wondering if you could spare a moment and help me out. I am looking for a portable (or not) local LLM/SLM installation so I can build an LLM (or SLM, small language model) AI assistant (agent) on my laptop. I'll then feed it various ebooks/data/articles and see whether it can speed up (or improve) my learning.
Can you please suggest a solution if you know of any? I appreciate it very much!
I'm searching for a solution and am currently reading through the links below:
https://docs.anythingllm.com/installation-desktop/overview
https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
https://semaphoreci.com/blog/local-llm
for local inference, many folks now use ollama. but you seem to also want a RAG UI - in which case the options you list probably work. personally, i don't care about local, and use Claude Projects or NotebookLM.
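e.g. a minimal sketch with the ollama python client (model name is just an example; assumes `pip install ollama`, the Ollama daemon running, and a model already pulled):

```python
# minimal sketch, assuming `pip install ollama` and e.g. `ollama pull llama3.1`
import ollama

response = ollama.chat(
    model="llama3.1",  # example model name; swap for whatever you pulled
    messages=[{"role": "user", "content": "Summarize these notes: ..."}],
)
print(response["message"]["content"])
```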
Thank you.
Thank you. This is the one piece of advice I needed to flip from "nah" to "wow, can't wait for o3". Glad they featured you in the Neuron; I can see this going viral. 🔥
Excellent article, with nice insights!
TL;DR of my following remark: do not use advanced voice mode for o1.
Forming the detailed inputs is likely best done via voice input, as described in the article. Be aware that using ChatGPT's built-in advanced voice mode is not recommended: there is no way to correct wrongly transcribed passages or add details you did not think of initially, because the AI starts replying as soon as it detects a pause. I would fully recommend using the open-source Whisper model on your desktop, or the voice-to-text capability provided by many virtual phone keyboards (be aware that the latter may process your audio in the cloud). That way you can edit your request to make it complete, correct errors, and then send it to o1 to think about it and provide a useful answer.
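For illustration, a minimal sketch using the open-source whisper package (file name and model size are placeholders; assumes `pip install openai-whisper` and ffmpeg installed):

```python
# minimal sketch, assuming `pip install openai-whisper` and ffmpeg on PATH;
# everything runs locally, so no cloud processing of your audio
import whisper

model = whisper.load_model("base")       # "small"/"medium" trade speed for accuracy
result = model.transcribe("ramble.m4a")  # placeholder file name
print(result["text"])                    # edit this text, then paste it into o1
```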
Hi Ben, that was nice to read. Thanks for sharing.
I don't think the lack of streaming is a technology problem. Given the advisory capabilities you described, o1 must be writing (in memory) and rewriting the answer constantly, moving away from the next-token-prediction paradigm, and I think that is what could make streaming impossible. Perhaps a good streaming strategy for this kind of model would be to "act like a human" and stream its thought process during the conversation. Have you seen Sam Altman in interviews, how he thinks before answering, starts in broad terms, and narrows down the answer as he goes? Perhaps this could be reproduced in some capacity in the streaming process of such models.
Probably I am missing something very obvious but how do you attach files to o1-pro on ChatGPT?
To clarify: you can attach up to 4 images. If you have a PDF, use a scrolling-screenshot tool. I managed to push 20 pages of unselectable content this way and it worked. 👌🏻
Thanks - no idea why we cannot attach files directly
If I have to guess, it's to make it harder to prompt-inject it into jailbreaking, since a better reasoning model would be more dangerous if handled by unethical people. I hope this is the reason, because the other most likely option is that it has multimodal limitations: OpenAI were on a timer to release something, and maybe multimodal abilities dropped the benchmarks significantly.
Gemini is multimodal from the ground up, whereas OpenAI's models had multimodality trained in later.
In any case, this is me throwing guesses out there, there's nothing to back this up tbh.
Can't attach in o1-pro.
o1-Pro doesn't have attachment options, but it does have a HUGE token limit in your prompt.
So, write the first parts of this prompt format, then copy/paste your "Context/Attachment-Data" from another chat/doc at the end of your prompt, and submit.
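For example, a rough sketch of how such a prompt could be laid out, loosely following the Goal / Return Format / Warnings / Context Dump template from the post (section contents are placeholders):

```
Goal: <the one thing you want, stated up front>

Return format: <exactly how the answer should be structured>

Warnings: <pitfalls to avoid, things to double-check>

Context dump: <everything else -- transcripts, docs, pasted code>
```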
TLDR:
Just dump everything in the chat
Thanks!!!! :)
Great write-up. I started switching back and forth between 4o and o1 in my ChatGPT Plus account to get a feel for the differences. I love the output when I give it a larger initial prompt. Thanks for the post!
By telling o1 what you want, it gives you what you asked for. Society loves a good echo chamber.
o1 congratulated me on creating one of the most innovative solutions in combating climate change: eliminating the need for refrigerators in residential homes.
How was I going to do this? By proposing a solution utilizing subterranean refrigeration techniques.
There's plenty of subterranean real estate available below residential homes, and by creating a system of tunnels we will not only be able to grow food but also store it and deliver it directly to people's basements.
It then offered to build me a business plan.
It gives you what you want.
I see that as successful automation: using the chat to work through things, then collecting my refined thoughts into an ask for the desired output.
The analogy to a junior hire is perfect. If I do the work to give good instructions then it saves time. Give bad instructions and waste time.
In my one day of testing via the API, I have found this view to be accurate, which for my use case is great: I use GPT-4o for a conversation to gather information, then o1 to generate the desired result, all in a UX that allows that step to be long-running and async. This saved me a few days of implementing a multi-step reasoner independently.
Regarding the tonality of the output, my approach will be to transform the dry output into the appropriate tonality and lingo. This is all non-coding but does involve a mix of narrative text and structured data.
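For illustration, a rough sketch of that two-stage pattern with the openai Python SDK (model names and prompts are illustrative, not my actual setup; assumes `pip install openai` and OPENAI_API_KEY set):

```python
# rough sketch of the GPT-4o -> o1 two-stage pattern described above
from openai import OpenAI

client = OpenAI()

# stage 1: a fast, chatty model interviews the user and gathers context
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Help me spec out this feature: ..."}],
)
gathered = chat.choices[0].message.content

# stage 2: hand the accumulated context to the reasoning model in one shot
# (this call is slow, so in the real UX it runs long-running and async)
result = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": f"Context:\n{gathered}\n\nGenerate the final result."}],
)
print(result.choices[0].message.content)
```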
One of those pieces that comes at a time when everyone has been using something in a certain way, but on their own, and then a well-timed blog cements it into community fact.
Using o1 is best for deep, singular tasks with a lot of context.
Claude and others are better at quickly generating code; o1 is best at analyzing it and fixing a small bug.
what is GA? such unnecessary acronym usage
General Availability
love it! thanks