Google has announced it’s testing a new AI-powered search tool, Scholar Labs, that’s designed to answer detailed research questions. But its demonstration highlighted a bigger question about finding “good” science studies. How much will scientists trust a tool that forgoes typical ways of gauging a study’s popularity with the scientific establishment in favor of reading the relationships between words to help surface good research?
The new search tool uses AI to identify the main topics and relationships in a user’s query and is currently available to a limited set of logged-in users. The demo video from Scholar Labs featured a question about brain-computer interfaces (BCIs). I have a PhD in BCIs, so I was eager to see what Scholar Labs pulled up.
The first result was a review of BCI research published in 2024 in a journal called Applied Sciences. Scholar Labs includes explanations for why the results matched the query, so it pointed out that the paper discusses research into a noninvasive signal called the electroencephalogram and surveys some of the field's leading algorithms.
Scholar Labs uses AI to surface science papers that Google says best match the user’s research question. Screenshot: Google Scholar Labs
But I noticed that Scholar Labs lacks filters for the common metrics used to separate “good” studies from “not-so-good” ones. One metric is the number of times a study has been cited by other studies since its publication, which loosely translates to the paper’s popularity. Citation counts are also tied to time: A recently published study might have zero citations or rack up hundreds within a few months, while a study from the ’90s may tout thousands. Another metric is a science journal’s “impact factor.” Journals that publish widely cited studies have a higher impact factor and thus a reputation for being more rigorous or meaningful to the scientific community. Applied Sciences self-reports an impact factor of 2.5. Nature, for comparison, says its impact factor is 48.5.
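For context, the standard two-year impact factor is essentially an average citation rate. A simplified sketch of the calculation, using illustrative numbers rather than either journal’s actual counts:

\[
\mathrm{IF}_{2024} = \frac{\text{citations received in 2024 by articles published in 2022–2023}}{\text{citable articles published in 2022–2023}} = \frac{5000}{2000} = 2.5
\]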
The original Google Scholar has an option for ranking studies by “relevancy” and lists the number of citations for each result. The goal of the new Scholar Labs is to dig up “the most useful papers for the user’s research quest,” Google spokesperson Lisa Oguike told The Verge. It does so by ranking papers the way researchers themselves do, Google says: by “weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.”
However, the new Scholar Labs will not sort or limit results based on a paper’s citation count or a journal’s impact factor, Oguike told The Verge.
Image: Google Scholar
“Impact factors and citation counts depend on the research area of the papers and it can be hard for most users to guess suitable values in the context of specific research questions,” Oguike wrote. “Limiting by impact factor or citation counts can often miss key papers — in particular, papers in interdisciplinary/adjacent fields/journal or recently published articles,” Oguike added.
Metrics like citation count and impact factor are “pretty coarse assessments of a paper’s quality,” Matthew Schrag, an associate professor of neurology at Vanderbilt University Medical Center, said in an interview with The Verge, agreeing with Google’s reasoning. They “speak more about the social context of the paper” than its quality, although “those two things hopefully are correlated,” he said.
Schrag, who researches Alzheimer’s disease, is one of the many scientist-sleuths who have flagged dubious data in published science studies. The efforts of data sleuths like Schrag, along with closer attention from the scientific community at large, have led to studies being pulled from well-regarded journals over doctored images, corrections issued by Nobel Prize winners, and federal investigations into faked data.
Still, it’s difficult not to lean on citation count or a journal’s reputation to casually vet a study, especially when entering a new field. James Smoliga, a professor of rehabilitation sciences at Tufts University and a frequent user of the original Google Scholar, finds himself treating highly cited papers as more trustworthy. “I’m guilty of it just like everybody else is,” he told The Verge. He does so despite having debunked the methods used in a study with thousands of citations. “And I know myself that’s not the case but yet I still fall for that trap because what else am I going to do?”
I repeated the Scholar Labs demo query about BCI research for stroke patients in PubMed, a leading repository of biomedical and health research run by the US National Institutes of Health’s National Library of Medicine. Unlike Scholar Labs, PubMed relies extensively on filters and search terms connected with Boolean ORs and ANDs. I narrowed my results to review articles of clinical research, meaning studies done on humans, from the past five years, and I excluded preprints, which are studies posted directly to a repository like arXiv or bioRxiv without having gone through peer review by other scientists. Two of the six results focused exclusively on the electroencephalogram as the primary type of noninvasive BCI used to help stroke patients.
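For a sense of what that looks like in practice, a query approximating those filters might read something like the sketch below. The field tags are real PubMed syntax ([tiab] for title/abstract, [pt] for publication type, [mh] for MeSH terms, [dp] for publication date), but the exact terms here are illustrative, and PubMed’s sidebar exposes most of them as clickable filters:

```
("brain-computer interface"[tiab] OR "brain computer interface"[tiab])
AND stroke[tiab]
AND review[pt]
AND humans[mh]
AND 2020:2025[dp]
NOT preprint[pt]
```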
PubMed allows users to filter search results by factors like publication date, article type, and peer review status. Screenshot: PubMed
Users can ask for “recent” papers or specify a period of time in their query, and Scholar Labs uses the “full-text of research papers” to find results that match, Oguike added.
Google is calling Scholar Labs a “new direction for us” and says it plans to incorporate user feedback in the future. It has a waitlist for access.
Schrag thinks AI-powered search, like that of the new Scholar Labs, has a place in the scientific ecosystem. It could, in theory, cast a wider net to surface papers that would otherwise slip through the cracks, or add context about a paper’s popularity across social media platforms, he added. Studies need holistic appraisal, he said, and that’s something AI might be able to help with. “You have to have a sense of what the standards in the field are in terms of rigor and whether a study meets that,” he added.
Ultimately, scientists are responsible for determining what science is impactful, Schrag said. It requires reading and engaging with science literature “to be the final arbiters and not to let algorithms be the final arbiter of what we consider high quality.”