Jakob Nielsen’s April 1st: joke or vision?


Image above: AI picturing two UX researchers, a fine pencil-and-ink drawing in black and white (DALL·E) – by Jerome Bertrand © 2023

GPT-5 Will Do User Testing soon!

‘For even better results from AI-run user testing, use different language models for the two roles in a usability study [i.e. GPT-4 and Mistral].’

Jakob Nielsen, April 1st, 2024

When I read the full article in Nielsen’s Substack newsletter UX Roundup, dated April 1st, I thought... huh? Has Jakob’s tone of voice about AI for UX changed?

Read the quoted and sourced article for yourself. I wonder what you think …

UX Roundup: TikTok Participation Inequality | GPT-5 UX | 2 UXers > 1 | Jakob Live

JAKOB NIELSEN
APR 1 2024
Source: Jakob Nielsen’s Substack
‘GPT-5 Will Do User Testing
‘I think OpenAI’s marketing department must have tired of me speculating that the Abominable Snowman must be responsible for their many confused product names.’

OpenAI’s head of naming strategy leads a meeting in the OpenAI marketing department. “Can we change to a more catchy name than ‘GPT’ for version 5?” he asks the team, but the Marketing Director vetoes the radical idea of product names that customers can understand. (Midjourney)

OpenAI marketing recently invited me to preview their upcoming GPT-5 release to get me on their good side. While the release date remains shrouded in secrecy, it is clear that GPT-5 will create a leap forward in machine IQ.

Of most interest to me is that this improved AI capability will finally allow the ultimate in discount usability: the complete removal of expensive humans from all user research. It’s common to pay study participants around $100 just so that we can watch them stumble through our product for an hour. And even though UX salaries dropped by 11% in 2023, they are still too high for it to be economical to pay a human usability specialist to spend an hour watching that user before he or she can write the report about how users misused our design (after wasting 4 more expensive staff-hours watching 4 more users).

Unfortunately, as I explained in my recent article, “What AI Can and Cannot Do for UX,” current AI (GPT-4 level) is not smart enough to simulate users. We can’t simply ask an AI to use our software and have another AI analyze the results, even though this would cost a pittance.

Today, I can report that the forthcoming GPT-5 clocks in as being smarter than a user, which is admittedly not a very high bar.’

Once OpenAI ships GPT-5, its increased intelligence will allow us to replace costly humans in usability testing. Both the test user and the test facilitator give better results when they are AI. (Ideogram)
Even better, we can also employ GPT-5 to replace human study facilitators. This allows many benefits:

  • All user test sessions can run simultaneously, in parallel, through multithreading.
  • Since AI is so much faster than humans, a one-hour study will take about 10 minutes, even with the current slow AI response times. Once the software has been tuned and GPUs are replaced with AI-specific chips optimized for inference compute, I expect the time to simulate a one-hour study to drop to about 1 minute.
  • The combination of multi-tasking and time-dilation means that we will receive the completed report with usability test findings 1 minute after we have specified the UI we want to be tested. This fast turn-around will be a boon to iterative design, allowing UX designers to crank through maybe 50 design iterations in a day. (Remember that each design version will be produced by Generative UI in a minute or so, not by slow hand-tweaking of Figma prototypes.)
  • Since machines have infinite vigilance, you will avoid the downside of human test facilitators who may miss a user action while thinking about and writing notes about the previous user action.
  • While I’m describing qualitative user research, an added investment in AI compute will allow us to also complete quantitative studies reasonably fast, with less than an hour needed to run about 100 users through a measurement study.
  • A final advantage is that we can program the AI facilitator to treat all the AI users precisely the same way, which removes a source of bias (and associated noise in the data) when a human facilitator attempts (but fails) to behave identically with a hundred different people coming into the lab.
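The first bullet’s claim that all sessions can run in parallel can be illustrated with a minimal, runnable sketch. The `run_simulated_session` function below is a hypothetical stand-in: a real implementation would call two language-model APIs inside it, while this stub just returns a canned finding so the orchestration pattern stands on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def run_simulated_session(session_id: int) -> dict:
    """Hypothetical stand-in for one AI-vs-AI usability session.

    In practice this is where the AI 'user' and AI 'facilitator'
    would exchange messages; here we return a canned finding.
    """
    return {
        "session": session_id,
        "findings": [f"simulated user {session_id} hesitated at checkout"],
    }


def run_study(n_sessions: int = 5) -> list[dict]:
    # All sessions run concurrently instead of occupying
    # one hour-long lab slot each.
    with ThreadPoolExecutor(max_workers=n_sessions) as pool:
        return list(pool.map(run_simulated_session, range(n_sessions)))


reports = run_study()
```

Because `pool.map` preserves input order, the combined report lists sessions in a stable order even though they executed concurrently.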
All these benefits to UX will accrue as soon as GPT-5 is released. We should expect hugely improved computer usability as soon as the following day, since about 50 design iterations can be tested daily.

Many UX experts recommend that you should not have a designer perform user testing on his or her own design because it’s hard to remain objective when watching users mistreat your beloved creation. I somewhat disagree with this common recommendation because of the advantages of employing UX “unicorns” who can keep all the information about both the design and the research findings within one human head and thus avoid the communication overhead of writing reports and having meetings. However, I agree with the risk of bias, which means we need to take extra care when interpreting unicorn study findings.

Using AI for all aspects of user testing eliminates this conundrum. One AI can be the user, and a completely different AI from another vendor can be the facilitator, retaining full objectivity. However, this advance must await the release of GPT-5 level AI from competing vendors. Experience from the GPT-4 era is that this catchup may take almost a year, which will be an insufferable wait since we will be accustomed to lightspeed AI-driven UI advances.

As more high-end AI models are released, you should experiment with using different AI products to serve as users for different persona segments in your market. For example, if you target teenage users, you could use xAI’s Grok, with its infamously snarky and irreverent personality.

For even better results from AI-run user testing, use different language models for the two roles in a usability study. If (as illustrated here) the user is GPT-5, then the facilitator could be Mistral for a more independent and less biased analysis of the user’s actions. However, this improved research method must await the release of a new version of Mistral with level-5 AI capability. (Ideogram)
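The two-role setup Nielsen describes, one model acting as the user and an independent model as the facilitator, can be sketched as a simple dialogue loop. Everything below is a hypothetical illustration: `simulated_user` and `facilitator` are stubs where, in practice, calls to two different vendors’ model APIs would go, keeping the analysis independent of the model being observed.

```python
from typing import Callable


def simulated_user(task: str) -> str:
    # Stub for the 'user' model: in a real study this would be one
    # vendor's LLM role-playing a persona attempting the task.
    return f"I clicked the first button I saw after reading: {task!r}"


def facilitator(task: str, user_reply: str) -> dict:
    # Stub for the 'facilitator' model from a different vendor:
    # it observes the user's action and records a finding.
    completed = "button" in user_reply
    return {"task": task, "observation": user_reply, "completed": completed}


def run_session(
    user: Callable[[str], str],
    moderator: Callable[[str, str], dict],
    tasks: list[str],
) -> list[dict]:
    findings = []
    for task in tasks:
        reply = user(task)                       # user model attempts the task
        findings.append(moderator(task, reply))  # facilitator model analyzes it
    return findings


log = run_session(simulated_user, facilitator, ["Find the pricing page"])
```

Because the two roles are plain callables, swapping in a different model for either side (say, a Mistral-backed facilitator watching a GPT-backed user) changes only which function is passed to `run_session`, not the loop itself.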

Architecture photography and visual design - Jerome Bertrand a.k.a. Prosper Jerominus

About the author

Bonjour! Hello! I’m Jerome Bertrand, a French-born UX and AI designer, educator, and photographer based in the Netherlands.
I am the founder and operator of kinokast.eu.
Through my blogs, I share educational journeys across the design world while offering my two cents of coaching insights on personal development, best design practices, innovative methodologies, and critical thinking – insights I hope are worth revisiting regularly.
You can listen to my AI-driven podcasts and join and interact with the aiPods that I research, curate, and organise around carefully selected topics. My Bio