Challenges and Future Adoption
Ryan discusses the challenges GPT-4 faced in a strict comparison task and shares examples of incorrect evaluations. He cautions against relying too heavily on large language models for peer review but acknowledges their potential for simplifying mundane tasks. Prompt injection is a concern, but one that human oversight can address. The lab leans towards using GPT-4 as a valuable tool in the review process while still emphasizing the importance of human judgment.
From this podcast

Data Skeptic
Automated Peer Review
Related Questions
What's your opinion on using large language models (LLMs) for scientific research, especially for generating new hypotheses, as discussed in the episode "Does ChatGPT “Think”? A Cognitive Neuroscience Perspective with Anna Ivanova - 620" and the clip "Language Model Insights"?
Are AI models biased, as discussed in the episode "Proof That ChatGPT Has Gotten Worse?" and the clip "AI Performance Decline"?
Do you think an industrious paper writer might start using models to review a paper before submitting it?