ChatGPT o1 represents such a leap forward that OpenAI have reset the counter back to 1
OpenAI just released its latest update, ChatGPT o1 (codename ‘Strawberry’), and it’s not just a minor upgrade. This new o1 model is designed to excel where previous models have fallen short, mainly in complex reasoning and problem-solving, which could make it well suited to use in audit. With its ability to think through solutions before answering, solve mathematical problems and handle complex reasoning, ‘Strawberry’ could be the secret ingredient that takes auditing to the next level.
What Makes ChatGPT o1 Different?
The o1 model doesn’t just spit out answers like other LLMs: it’s designed to reason, cross-check, and think through complicated problems before giving a comprehensive answer. This is a genuinely new approach for ChatGPT. Previous iterations, such as GPT-4o, were impressive in many respects, but they fell short when faced with truly complex, multi-step challenges.
In fact, according to OpenAI, while GPT-4o solved only 13% of problems on a qualifying exam for the International Mathematical Olympiad (IMO), the o1 model achieved a staggering 83% success rate. In competitive coding environments, the o1 model reached the 89th percentile, completely outpacing its predecessor.
But it isn’t just suited for use in maths or coding. The o1 model is built to take its time with requests (sometimes over 2 minutes!), carefully refining its answers and following a train of thought through, before responding. It can perform all the usual AI tasks but also see the bigger picture. This could be a powerful weapon in the arsenal of any auditor.
How Could the o1 Model be Used in Audit Testing?
ChatGPT o1, with its enhanced reasoning capabilities, may significantly improve audit testing in ways GPT-4o cannot. For example, in preliminary analytical reviews, ChatGPT o1 could automatically pull relevant financial data, compare it against industry benchmarks, and highlight any deviations that warrant further investigation. This deeper analytical ability would allow auditors to identify potential risks earlier in the audit process. Similarly, in stock testing, ChatGPT o1 may be able to not only verify inventory records but also predict obsolescence based on market trends, offering auditors insights that would be difficult to capture with less advanced models.
Additionally, ChatGPT o1’s ability to aid in fraud detection could be a game changer for auditing. It could be used to analyze journal entries for unusual patterns or timing, helping to identify instances of management override more effectively than GPT-4o. In trade debtors testing, ChatGPT o1 could take a more sophisticated approach to analyzing payment histories and customer credit risk, flagging potential bad debts that might otherwise be missed. These improvements across various audit tasks show how ChatGPT o1’s advanced reasoning could elevate the accuracy and efficiency of financial audits.
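The analytical-review idea above is essentially deviation flagging, which is easy to sketch deterministically for comparison. The following is a minimal, hypothetical example (the benchmark figures, ratio names and 20% threshold are all made up for illustration, not taken from the article):

```python
# Illustrative preliminary analytical review: compare a client's key
# ratios against industry benchmarks and flag relative deviations above
# a threshold. All figures and names are hypothetical.

BENCHMARKS = {"gross_margin": 0.42, "debtor_days": 45, "stock_turnover": 6.0}
THRESHOLD = 0.20  # flag relative deviations above 20%

client_ratios = {"gross_margin": 0.31, "debtor_days": 48, "stock_turnover": 5.8}

def flag_deviations(ratios, benchmarks, threshold=THRESHOLD):
    """Return {ratio: relative deviation} for ratios straying from benchmark."""
    flags = {}
    for name, value in ratios.items():
        bench = benchmarks[name]
        deviation = abs(value - bench) / bench
        if deviation > threshold:
            flags[name] = round(deviation, 2)
    return flags

print(flag_deviations(client_ratios, BENCHMARKS))
```

A model like o1 would sit on top of a check like this, suggesting which ratios and thresholds matter for the client and explaining the flagged deviations, rather than replacing the arithmetic itself.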
Putting ChatGPT o1 to the Test
I tasked o1 with the challenge of performing an asset depreciation recalculation. I copied and pasted an example fixed asset register into ChatGPT, told it the accounting policies per the accounts, and asked it to do the following:
Perform a depreciation recalculation test looking at:
1. Whether the depreciation rates applied in the fixed asset register are in line with the accounting policy.
2. Whether the depreciation stated on the fixed asset register has been calculated correctly.
3. Whether the depreciation rates applied are consistent and in line with those of other companies located in the North West of England with the same SIC code.
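For context, the first two checks are mechanical enough to express in a few lines of code. Here is a minimal sketch, with a hypothetical two-asset register and illustrative policy rates (including a deliberate error so something gets flagged); none of these figures come from the actual test:

```python
# Hypothetical sketch of checks 1 and 2: verify rates against policy and
# recalculate straight-line depreciation, allowing a rounding tolerance.

POLICY_RATES = {"Plant & machinery": 0.15, "Motor vehicles": 0.25}
TOLERANCE = 1.0  # allowable rounding difference, in currency units

register = [
    # (asset, category, cost, rate applied, depreciation per register)
    ("Lathe", "Plant & machinery", 12_000, 0.15, 1_800),
    ("Van",   "Motor vehicles",    20_000, 0.25, 5_050),  # deliberate error
]

def check_register(rows):
    """Return a list of findings: policy-rate mismatches and recalculation differences."""
    findings = []
    for asset, category, cost, rate, charge in rows:
        if rate != POLICY_RATES[category]:
            findings.append(f"{asset}: rate {rate:.0%} differs from policy")
        expected = cost * rate
        if abs(expected - charge) > TOLERANCE:
            findings.append(
                f"{asset}: register charge {charge} vs recalculated {expected:.0f}"
            )
    return findings

for finding in check_register(register):
    print(finding)
```

What made the o1 test interesting wasn’t this arithmetic but the step a script can’t do: interpreting a messy register layout, choosing explanations for discrepancies, and benchmarking rates against other companies. Check 3 has no deterministic equivalent at all.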
ChatGPT o1 efficiently tackled the depreciation recalculation, demonstrating impressive speed and analytical prowess. The fixed asset register was fairly shoddy and I thought it might struggle to interpret it. However, after spending 45 seconds ‘thinking’, it proceeded to summarise the extensive fixed asset register by category, picking out the key details for each asset, such as cost, depreciation method, useful life, and depreciation rate, and performed a meticulous line-by-line recalculation of the depreciation charges. Throughout the process, it compared its calculations with the figures in the register, highlighting any discrepancies and providing explanations for them, such as rounding differences or fully depreciated assets. It also offered reasonable depreciation rates based on the accounting policies of other companies with that SIC code (although admittedly, when pressed further, it wouldn’t give any more details on which companies…). Despite this, the conclusions it reached were thorough and insightful, confirming the accuracy of most depreciation calculations while recommending areas for further review.
What stood out was its ability to understand and interpret the fixed asset register layout, and most importantly, it got the figures right! Quite often, previous iterations of ChatGPT showed promising signs, but the answers they produced were rarely correct. This was not the case with o1!
This structured approach, and the ability to handle complex financial data, highlight how ChatGPT o1’s advanced reasoning capabilities lead to more precise and comprehensive answers in intricate accounting tasks, making it a genuinely useful aid in audit testing.
I ran the same test with ChatGPT 4o for comparison, and the results were noticeably different. ChatGPT 4o struggled to interpret the layout of the fixed asset register, taking much longer to identify which depreciation rates applied to specific assets. It repeatedly made errors, often having to restart the entire process. The experience felt as though it was chasing after irrelevant details and getting tangled up in its own logic. In contrast, ChatGPT o1’s 45 seconds of ‘thinking’ allowed it to explore different approaches before diving into the task, resulting in a much smoother and more accurate performance.
So is Strawberry ‘The Special o1’ for auditing?
The new iteration of ChatGPT does feel like a big step up from previous models. Its ability to “think” through complex tasks is notably advanced. In the asset depreciation recalculation, “Strawberry” demonstrated a remarkable capacity to understand and interpret the intricate layout of the fixed asset register. It systematically categorized assets, accurately applied accounting policies, and performed meticulous line-by-line calculations. Not only did it get the calculations right—a feat that earlier versions struggled with—but it also provided thorough explanations and highlighted areas for further review.
This level of performance suggests that “Strawberry” possesses advanced reasoning capabilities that are highly beneficial for auditing tasks. Its proficiency in handling complex financial data, attention to detail, and ability to draw insightful conclusions can significantly enhance the efficiency and accuracy of audit processes.
Looking ahead, when OpenAI updates ChatGPT o1 to support attachments and full multimodal inputs and outputs, it may elevate its capabilities to an even higher level. The ability to process and analyze attachments such as spreadsheets, PDFs, and other documents commonly used in auditing could make the AI an even more powerful tool in the auditor’s toolkit. This would streamline workflows, reduce the time spent on manual data entry, and potentially uncover insights that might be overlooked by human analysis.
However, it’s important to consider that these advanced functionalities may come with increased costs. The o1 model, with its enhanced features, might be more expensive to implement and utilize. Organizations will need to weigh the benefits of these capabilities against the additional expenses to determine if the investment aligns with their operational needs and budget constraints.
Moreover, while “Strawberry” shows immense promise, it’s essential to recognize that auditing is not solely about number-crunching. Auditing requires professional judgment, ethical considerations, and the ability to understand nuanced business contexts—areas where human expertise is irreplaceable. AI tools like ChatGPT o1 can handle repetitive and data-intensive tasks but may not fully grasp the subtleties involved in every auditing scenario. So we are still in with a job… For now!