30 Comments

Great post! o1/pro is the first model I've used that can do high-level software architecture well:

- As you noted, give it all the context -- all relevant code files + existing design docs (RepoPrompt is great for this).

- Ramble about the problem into speech-to-text for a while

- At the end tell it to present multiple alternatives + reasons to use/not use

The breakthrough capability is the lack of sycophancy -- it's the first model I've used that, when I disagree with it, will hold its ground and convince me that it's right.

Another tip is to have it break the implementation up into discrete steps, outputting all the context needed for each one. Then paste them into Cursor Composer one at a time for the actual coding. (There's a rough scripted version of this sketched below.)

Someone else was saying that after each step, they go back to o1 and have it review the code that Cursor wrote. Still need to try that one out!
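If you'd rather script that decomposition step than run it through the ChatGPT UI, here's a rough sketch using the `openai` Python package -- the model name, file path, and prompt wording are all placeholders, not a transcript of an actual setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# All relevant code files + design docs, concatenated into one context blob
context = open("context_dump.txt").read()

response = client.chat.completions.create(
    model="o1",  # o1-pro is only exposed via the separate Responses API
    messages=[{
        "role": "user",
        "content": (
            context
            + "\n\nBreak the implementation into discrete steps. "
            "For each step, restate ALL the context needed to do it in "
            "isolation, so each one can be pasted into a coding agent "
            "on its own."
        ),
    }],
)
print(response.choices[0].message.content)
```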

great responses, thanks for sharing

Dude, RepoPrompt is 🔥. IMO, RP needs to be incorporated directly into ChatGPT.

"Someone else was saying that after each step, they go back to o1 and have it review the code that Cursor wrote. Still need to try that one out!" I need to try this too, but so far everything has been functional ☺️

I def need to try making Cursor code it out in stages/sections from a fully fleshed-out start... I've been patching shit with band-aids, and I'm finally getting the system down:

- sharpen the axe for 50 min... let Cursor chop the tree for 10 min.

How does Gemini Deep Research compare?

Hmm, you mention adding PDFs to the prompt, but I don't have this feature in ChatGPT Plus (and it's not only me: https://www.reddit.com/r/OpenAI/comments/1hwli30/pdf_file_uploads_on_o1/). Is that a Pro feature?

I also can't upload PDFs/Excel files to o1-pro, only image files (photos).

How much disconnected rambling can o1 handle? Can I just speak stream-of-thought about all the discussions and back-and-forth ideas that happened for a product feature, and dump them? I'm clear on the final output that I want; I'm just wondering if o1 can handle all this extra discussion context.

lots! i ramble all the time and just dump the transcript

o1 only accepts ~32k tokens, right? So maybe dumping a bunch of thoughts might be better done in a separate chat?

Source: llm-stats.com
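One way to sanity-check whether a brain dump will fit before pasting it: count the tokens locally with `tiktoken`. A minimal sketch (the filename is a placeholder, and treat any specific limit as an assumption that varies by plan and model):

```python
import tiktoken

# o200k_base is the encoding used by OpenAI's recent models (gpt-4o, o1)
enc = tiktoken.get_encoding("o200k_base")

transcript = open("brain_dump.txt").read()
print(f"{len(enc.encode(transcript))} tokens")  # compare against the model's context limit
```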

I'm not sure about o1, but o1-Pro has a HUGE context window: something like ~150K!

I brain-dump as a stream of consciousness in voice mode, then refine with typed prompts in the same chat. Then I take that as "context" to use in the template featured in this post.

Hi there.

Thanks for this article. :)

I was wondering if you could spare a moment and help me out. I'm looking for a portable (or not) local LLM/SLM installation, so I can build an LLM (or SLM -- small language model) AI assistant (agent) on my laptop. I'll then feed it various ebooks/data/articles and see how it can speed up (or improve) my learning.

Can you please suggest a solution if you know of any? I appreciate it very much!

I'm searching for a solution and am currently reading through the links below:

https://docs.anythingllm.com/installation-desktop/overview

https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f

https://semaphoreci.com/blog/local-llm

for local inference, many folks now use Ollama. but you seem to also want a RAG UI - in which case the options you list probably work. personally, i don't care about local, and use Claude Projects or NotebookLM.
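if you do go the Ollama route, here's a rough sketch of hitting its local HTTP API after an `ollama pull` -- the model name and prompt are just examples:

```python
import requests

# Ollama serves a local HTTP API on port 11434 by default
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # any model you've pulled with `ollama pull`
        "prompt": "Summarize the key ideas in this chapter: ...",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```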

Thank you.

Thank you. This is the one piece of advice I needed to flip from "nah" to "wow, can't wait for o3". Glad they featured you in The Neuron; I can see this going viral. 🔥

Excellent article, with some nice insights!

TL;DR of my remark below: do not use Advanced Voice Mode for o1.

Forming the detailed inputs is likely best done via voice input, as described in the article. Be aware, though, that ChatGPT's built-in Advanced Voice Mode is not recommended for this: there is no way to correct wrongly transcribed passages or add details you didn't think of initially, because the AI starts replying as soon as it detects a pause. I'd fully recommend running the open-source Whisper model on your desktop, or using the voice-to-text capability provided by many virtual phone keyboards (be aware that the latter usually processes audio in the cloud). That way you can edit your request until it's complete, correct any errors, and only then send it to o1 to think about and answer.
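For reference, a minimal local-Whisper sketch using the open-source `openai-whisper` package (it needs ffmpeg installed; the model size and filename are placeholders):

```python
import whisper

# "base" trades accuracy for speed; "medium" or "large" transcribe better but run slower
model = whisper.load_model("base")

# Transcribe the recorded ramble, then edit the text by hand before sending it to o1
result = model.transcribe("ramble.m4a")
print(result["text"])
```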

Hi Ben, that was nice to read. Thanks for sharing.

I think the problem with unavailable streaming is not technological. Given the advisory capabilities you described, o1 must be writing and rewriting the answer constantly (in memory), moving away from the next-token-prediction paradigm, and I think that is what could make streaming impossible. Perhaps a good streaming strategy for this kind of model would be to "act like a human" and stream its thought process during the conversation. Have you seen Sam Altman himself in interviews, how he thinks before answering, starts in broader terms, and narrows the answer down while he is thinking? Perhaps this could be reproduced in some capacity in the streaming process of such models...

I'm probably missing something very obvious, but how do you attach files to o1-pro in ChatGPT?

To be precise, you can attach up to 4 images. If you have a PDF, use a scrolling-screenshot tool. I managed to push 20 pages of unselectable content through this way, and it worked. 👌🏻
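A scripted alternative to the screenshot tool, in case it's useful: render the PDF pages with `pdf2image` (which needs the poppler utilities installed) and stack them into one tall image. The filenames and DPI here are placeholders:

```python
from pdf2image import convert_from_path
from PIL import Image

# Render each PDF page to a PIL image, then stack them into one tall image --
# a scripted stand-in for a scrolling-screenshot tool
pages = convert_from_path("design_doc.pdf", dpi=100)

width = max(p.width for p in pages)
tall = Image.new("RGB", (width, sum(p.height for p in pages)), "white")

y = 0
for page in pages:
    tall.paste(page, (0, y))
    y += page.height

tall.save("design_doc_stitched.png")
```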

Thanks - no idea why we cannot attach files directly

If I had to guess, it's to make it harder to jailbreak via prompt injection, since a better reasoning model would be more dangerous in the hands of unethical people. I hope that's the reason, because the other most likely option is that it has multimodal limitations: OpenAI were on a timer to release something, and maybe multimodal abilities dropped the benchmarks significantly.

Gemini is multimodal from the ground up, whereas OpenAI's models had multimodality trained in afterwards.

In any case, these are just guesses; there's nothing to back this up, tbh.

You can't attach files in o1-pro.

o1-Pro doesn't have attachment options, but it does have a HUGE token limit for your prompt.

So write the first parts of this prompt format, then copy/paste your "context/attachment data" from another chat/doc at the end of your prompt, and then submit.

TL;DR:

Just dump everything in the chat.
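A tiny sketch of one way to build that copy/paste blob from a project directory -- the directory name, extensions, and header format are all just illustrative choices:

```python
from pathlib import Path

# Concatenate relevant files into one labeled blob to paste at the end of the prompt
EXTENSIONS = {".py", ".md", ".toml"}

chunks = []
for path in sorted(Path("my_project").rglob("*")):
    if path.is_file() and path.suffix in EXTENSIONS:
        chunks.append(f"===== {path} =====\n{path.read_text()}")

Path("context_dump.txt").write_text("\n\n".join(chunks))
```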

Thanks!!!! :)

Great write-up. I started switching back and forth between 4o and o1 in my ChatGPT Plus account to get a feel for the differences. I love the output when I give it a larger initial prompt. Thanks for the post!

By telling o1 what you want, it gives you what you asked for. Society loves a good echo chamber.

o1 congratulated me on creating one of the most innovative solutions in combating climate change by eliminating the need for refrigerators in residential homes.

How was I going to do this? By proposing a solution utilizing subterranean refrigeration techniques.

There's plenty of subterranean real estate available below residential homes, and by creating a system of tunnels we will not only be able to grow food but also store it and deliver it directly to people's basements.

It then offered to build me a business plan.

It gives you what you want.

I see that as successful automation: using the chat to work through things, then collecting my refined thoughts into an ask for the desired output.

The analogy to a junior hire is perfect. If I do the work to give good instructions, it saves time. Give bad instructions, and I waste time.

In my one day of testing via the API, I have found this view to be accurate, which for my use case is great: I use gpt-4o for a conversation to gather information, then o1 to generate the desired result, all in a UX that allows that step to be long-running and async. This saved me a few days of implementing a multi-step reasoner independently.

Regarding the tonality of the output: my approach will be to transform the dry output into whatever tonality and lingo is appropriate. This is all non-coding, but it does involve a mix of narrative text and structured data.
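For anyone curious, that two-stage (plus tone-transform) pattern looks roughly like this with the `openai` Python package -- the prompts are placeholders, and this is a sketch of the pattern described above, not a prescribed recipe:

```python
from openai import OpenAI

client = OpenAI()

# Stage 1: a fast conversational model gathers and structures the information
gathered = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Here are my notes for the report: ..."}],
).choices[0].message.content

# Stage 2: the reasoning model does the heavy lifting on the collected context
result = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": gathered + "\n\nNow produce the full report."}],
).choices[0].message.content

# Stage 3: rewrite the dry output in the desired tonality and lingo
polished = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Rewrite this in a warm, plain-spoken tone:\n\n" + result}],
).choices[0].message.content
print(polished)
```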

One of those pieces that comes along when everyone has been using something a certain way, but each on their own, and then a well-timed blog post cements it into community fact.

Using o1 is best for deep, singular tasks with a lot of context.

Claude and others are better at quickly generating code. o1 is best at analyzing it and fixing a small bug.

what is GA? such unnecessary acronym usage

General Availability

love it! thanks
