一些学术论文提出,在解读肺炎、皮肤癌等疾病的医学影像时,人工智能(AI)比人类医生的能力更强。但近期的研究对该结论提出了质疑。
医学期刊《英国医学杂志》3月发表的一篇论文发现,许多相关研究言过其实,夸大了AI的实际效果。这一发现意义重大,原因是医疗行业正在寻求借助AI技术加快疾病诊断速度,而该发现动摇了行业变革的理论基础。
同时,科技行业也因热衷于开发和兜售用于医学影像分析的AI技术而备受质疑。该论文的作者担心,狂热的企业与投资者可能会在相关技术得到充分审查之前便试图将其推向市场。
麦乌拉·纳杰德兰是这篇论文的合著者之一,他说:“我们并非不尊重风险资本家,他们在许多创新项目的融资过程中有着重要作用,但显然他们最关心的始终还是如何尽快将产品推向市场。虽然我们怀有同样的热情,但我们也非常清楚,要想大规模推广相关技术,必须首先确保其安全性和有效性。”
这篇论文还涉及了导致3万多美国人死亡的新冠疫情。有研究者声称,已开发出比人类更快的AI系统,来通过胸部CT扫描诊断病人是否感染了新冠病毒。
最近《英国医学杂志》回顾了近100项有关AI深度学习技术的研究,该技术已被应用到各种疾病的医学扫描中,包括黄斑变性、结核病和几种癌症。
最后发现,有77项研究在缺少随机试验的情况下比较了AI系统与人类医生的表现,并在其摘要或结语中给出了具体评价,其中,23项研究表示AI在诊断特定疾病时的表现比临床医生“更优秀”。
论文合著者、非盈利机构斯克利普斯研究所创始人兼董事埃里克·托普尔表示,这些研究的一个主要问题是“其中许多都有人为的痕迹”,相关研究人员只是在声称其技术的表现“比医生好”而已。他解释说,在现实生活中,AI和人类医生并不是非此即彼的关系。总要有医生来检查诊断结果,因此对比AI和人类医生的表现本身就是一件很荒诞的事。
托普尔说:“总有人很热衷于拿机器和医生来做比较,问题在于你不可能把解读医学影像的工作完全交给机器来做。如果真碰上威胁生命或者比较严重的疾病,还是得有医生来判断。”
他补充道:“我想说的是,如果你看了所有这些论文,你会发现其中多达90%的论文都是在进行人机比较,真的没必要这么做。”
英国国家健康研究所的临床医学研究员纳杰德兰表示,宣扬AI(相较于人类医生)的优势可能会对公众造成误导。
纳杰德兰说:“现在外面的炒作很多,这些炒作又通过媒体很快变成各种如‘AI即将取代医生’的传言流入患者耳中。”
他表示,除了进行人机比较这一核心谬误,这些论文最大的问题在于未能遵循医疗专业人士过去十年一直在努力打造的更为严格的报告标准。例如,这些论文一般都未使用多个数据集来衡量其深度学习模型的准确性,这就导致其研究对象十分有限,未能包括各种不同人群。
在查阅近期发表的一些关于使用深度学习技术通过胸部CT扫描诊断新冠肺炎的论文时,澳大利亚皇家阿德莱德医院医学影像研究室主任卢克·奥克登·雷纳也注意到了类似的问题。与《英国医学杂志》论文描述的那些问题多多的医学影像研究一样,新冠肺炎相关论文的结论也是建构在十分有限的数据之上,无法代表全体人群的实际情况,带有选择性偏差问题。
在其中一篇论文中奥克登·雷纳注意到,研究人员开发了一种深度学习系统,该系统能够基于从同济大学附属医院1014名患者处采集到的数据识别新冠病毒。这些患者均已通过传统拭子测试确诊患有新冠肺炎,并且也接受了胸部CT扫描确认其肺部是否已被感染。
也就是说,研究人员训练深度学习系统时用的可能是偏斜数据。医生很可能正是因为怀疑这些患者患有与新冠病毒相关的肺部疾病才让他们去做了胸部CT扫描。同样的技术在筛查无肺部感染症状的患者时可能就没什么用了。
奥克登·雷纳在发给《财富》的邮件中写道:“一般而言,数据集越准确、越全面,其用处也就越大。”
他认为,就新冠肺炎而言,现有检测手段已经十分有效,AI技术应该用于其它更重要的任务之上,研究者完全没有必要就使用深度学习技术诊断新冠肺炎发表论文。
奥克登·雷纳还在邮件中表示:“只靠CT扫描筛查新冠肺炎效果可能并不好。如果在现有医疗流程中有哪些瓶颈问题是AI可以解决的,那就需要专门收集与该问题相关的数据。”
托普尔同意奥克登·雷纳的观点,他表示:“在使用CT扫描判断肺部是否可能感染新冠病毒方面,算法是有用武之地的,但我们不一定要做CT扫描。”
托普尔解释说,随着传统检测工具全球供应量的增加,这些工具已然成为比CT扫描更容易获得的检测手段,而且成本还更低。
托普尔表示,近期发表的这些AI医学影像研究给我们提了个醒,在评估自己的发现时,我们应当始终抱有怀疑精神。从本质上说,这些论文都是关于AI技术在当前医疗体系中潜在应用前景的初步研究,但研究者仍然需要开展更深入的临床试验,验证相关技术的有效性。
在初步研究之后,研究者通常会进行更为正式的学术研究,即前瞻性研究。托普尔表示:“研究者不能直接闷头去做前瞻性研究,也不应夸大自己的研究结论。”(财富中文网)
译者:梁宇
审校:夏林
Artificial intelligence is better at analyzing medical images for illnesses like pneumonia and skin cancer than doctors are, according to a number of academic papers. But that conclusion is being called into question by recent research.
A paper published in March in the medical journal The BMJ found that many of those studies exaggerated their conclusions, making A.I. technologies seem more effective than they were in reality. The finding is significant because it undermines a huge ongoing shift in the health care industry, which is looking to use technology to diagnose ailments more quickly.
It also calls into question a tech industry that is scrambling to develop and sell A.I. technology for analyzing medical imagery. The paper’s authors are worried that overzealous companies and their investors may push to sell the technology before it has been thoroughly vetted.
“With no disrespect to venture capitalists—obviously they’re an important part of the funding process for a lot of this innovation—but obviously their enthusiasm is always to try and get things to market as quickly as possible,” says Myura Nagendran, a coauthor of the BMJ paper. “While we share that enthusiasm, we’re also acutely aware of how important it is to make sure these things are safe and work effectively if we institute them en masse.”
The finding also touches on the current coronavirus pandemic, which has claimed over 30,000 lives in the U.S. Some researchers maintain that they’ve developed A.I. systems that are faster than humans at examining chest CT scans for COVID-19 infections.
The recent BMJ review looked at nearly 100 studies of a type of artificial intelligence called deep learning that had been used on medical scans of various disorders, including macular degeneration, tuberculosis, and several types of cancer.
The review found that 77 studies that lacked randomized testing included specific comments in their abstracts, or summaries, comparing their A.I. system’s performance to that of human doctors. Of those, 23 said that their A.I. was “superior” to clinical physicians at diagnosing certain illnesses.
One of the main problems with these papers is that there’s “an artificial, contrived nature of a lot of these studies” in which researchers basically claim that their technology “outperformed a doctor,” says Eric Topol, one of the BMJ paper’s authors and the founder and director of the nonprofit Scripps Research Translational Institute. It’s absurd to compare an A.I.’s performance to that of human doctors, he explains, because in the real world, the choice between an A.I. system and a human doctor is not an either-or situation. Doctors will always review the findings.
“There’s this kind of nutty inclination to pit machines versus doctors, and that’s really a consistent flaw because it’s not only going to be machines that do readings of medical images,” Topol says. “You’re still going to have oversight if there’s anything reported that’s life-threatening or serious.”
Topol adds, “The point I’m just getting at, is if you look at all these papers, the vast majority—90%—do the man-versus-machine comparison, and it really isn’t necessary to do that.”
Nagendran, an academic clinical fellow for the U.K.’s National Institute for Health Research, says that studies describing A.I.’s superiority to human doctors can mislead people.
“There’s been a lot of hype out there, and that can very quickly translate through the media into stories that patients hear, saying things like, ‘It’s just around the corner, the A.I. will be seeing you rather than your doctor,’” says Nagendran.
Besides the core fallacy of pitting A.I. versus humans, one of the big problems is that these papers typically fail to follow the more robust reporting standards that health care professionals have been trying to establish over the past decade, Nagendran says. One sore point, for instance, is that the papers generally fail to measure the accuracy of their deep-learning models on multiple data sets covering different populations of people, relying instead on just a limited number.
Luke Oakden-Rayner, a director of medical imaging research at the Royal Adelaide Hospital in Australia, noticed a similar problem when he examined a handful of recently published papers on using deep learning to diagnose COVID-19 via chest CT scans. Like the faulty medical imaging studies that the BMJ paper described, the coronavirus-related papers based their conclusions on a limited amount of data that was not representative of the entire population, a problem that’s known as selection bias.
In one paper, Oakden-Rayner noted, researchers developed a deep-learning system to recognize the coronavirus from data taken from 1,014 patients at Tongji University in Shanghai. These patients were diagnosed as having COVID-19 via the conventional swab tests used to detect the illness; they also had chest CT scans to see whether the infection had reached their lungs.
But that deep-learning system was likely trained on skewed data. Doctors probably suspected those patients were having lung problems related to COVID-19, which is why they ordered CT scans of the patients’ chests. The same technology would be unlikely to work well with people who have COVID-19 but don’t have any symptoms in their lungs.
“As a general rule more accurate and complete data sets are more useful,” Oakden-Rayner says in an email to Fortune.
Oakden-Rayner questioned the need for A.I. researchers to even publish papers about using deep learning to diagnose the coronavirus, explaining that current testing is already effective and that there are more important jobs that A.I. can help with.
“Simply detecting COVID-19 on CT scans is unlikely to be very helpful,” Oakden-Rayner says in the email. “If there is a bottleneck that A.I. can solve in the medical workflow, then data for that task specifically will need to be collected.”
Topol agrees with Oakden-Rayner, saying, “It can be useful to have an algorithm review of a CT scan of lungs as to whether they are potentially related to COVID, but you don’t really need a CT scan.”
More conventional testing tools are increasingly being distributed worldwide, making them more available than CT scans, which are more expensive, Topol explains.
The takeaway from all these recent A.I. medical imaging studies is that people should use some skepticism in considering their findings, says Topol. These are essentially preliminary research papers that highlight potential uses of A.I. in the current health care system, but researchers still need deeper clinical trials to verify the technology’s effectiveness.
“You can’t just go right ahead to a prospective study,” Topol says regarding a more formal type of academic study that typically follows preliminary research. “You just don’t want to overstate the conclusions.”