基于多模态医学大数据的消化内镜诊断和预测的基础模型的开发与临床应用

注册号:

Registration number:

ChiCTR2600117908 

最近更新日期:

Date of last refresh:

2026-01-29 18:05:29 

注册时间:

Date of Registration:

2026-01-29 00:00:00 

注册号状态:

预注册

Registration Status:

Prospective registration

注册题目:

基于多模态医学大数据的消化内镜诊断和预测的基础模型的开发与临床应用

Public title:

Development and Clinical Application of a Foundational Model for Multimodal Medical Data-Based Diagnosis and Prediction in Digestive Endoscopy

注册题目简写:

English Acronym:

研究课题的正式科学名称:

基于多模态医学大数据的消化内镜诊断和预测的基础模型的开发与临床应用

Scientific title:

Development and Clinical Application of a Foundational Model for Multimodal Medical Data-Based Diagnosis and Prediction in Digestive Endoscopy

研究课题代号(代码):

Study subject ID:

在二级注册机构或其它机构的注册号:

The registration number of the Partner Registry or other register:

申请注册联系人:

周平红 

研究负责人:

周平红 

Applicant:

Zhou Pinghong 

Study leader:

Zhou Pinghong 

申请注册联系人电话:

Applicant telephone:

+86 136 8197 1063

研究负责人电话:

Study leader's telephone:

+86 136 8197 1063

申请注册联系人传真 :

Applicant Fax:

研究负责人传真:

Study leader's fax:

申请注册联系人电子邮件:

Applicant E-mail:

zhou.pinghong@zs-hospital.sh.cn

研究负责人电子邮件:

Study leader's E-mail:

zhou.pinghong@zs-hospital.sh.cn

申请单位网址(自愿提供):

Applicant website(voluntary supply):

研究负责人网址(自愿提供):

Study leader's website(voluntary supply):

申请注册联系人通讯地址:

上海市徐汇区枫林路180号

研究负责人通讯地址:

上海市徐汇区枫林路180号

Applicant address:

180 Fenglin Road Shanghai China

Study leader's address:

180 Fenglin Road Shanghai China

申请注册联系人邮政编码:

Applicant postcode:

研究负责人邮政编码:

Study leader's postcode:

申请人所在单位:

复旦大学附属中山医院

Applicant's institution:

Zhongshan Hospital, Fudan University

研究负责人所在单位:

复旦大学附属中山医院

Affiliation of the Leader:

Zhongshan Hospital, Fudan University

是否获伦理委员会批准:

Approved by ethic committee:

Yes

伦理委员会批件文号:

Approved No. of ethic committee:

B2025-145R

伦理委员会批件附件:

Approved file of Ethical Committee:

查看附件View

批准本研究的伦理委员会名称:

复旦大学附属中山医院医学伦理委员会

Name of the ethic committee:

Ethics Committee of Zhongshan Hospital Fudan University

伦理委员会批准日期:

Date of approval by ethics committee:

2025-04-08 00:00:00

伦理委员会联系人:

高鑫

Contact Name of the ethic committee:

Gao Xin

伦理委员会联系地址:

上海市徐汇区枫林路180号

Contact Address of the ethic committee:

180 Fenglin Road Shanghai China

伦理委员会联系人电话:

Contact phone of the ethic committee:

+86 21 3158 7871

伦理委员会联系人邮箱:

Contact email of the ethic committee:

研究实施负责(组长)单位:

复旦大学附属中山医院

Primary sponsor:

Zhongshan Hospital, Fudan University

研究实施负责(组长)单位地址:

上海市徐汇区枫林路180号

Primary sponsor's address:

180 Fenglin Road Shanghai China

试验主办单位(项目批准或申办者):

Secondary sponsor:

国家:

中国

省(直辖市):

上海市

市(区县):

上海市

Country:

China

Province:

Shanghai

City:

Shanghai

单位(医院):

复旦大学附属中山医院

具体地址:

上海市徐汇区枫林路180号

Institution (hospital):

Zhongshan Hospital, Fudan University

Address:

180 Fenglin Road Shanghai China

经费或物资来源:

自筹

Source(s) of funding:

Self-raised

研究疾病:

消化系统疾病,未特指的  

Target disease:

Unspecified diseases of the digestive system

研究疾病代码:

DE2Z

Target disease code:

DE2Z

研究类型:

诊断试验

Study type:

Diagnostic test

研究所处阶段:

其它 

Study phase:

N/A

研究设计:

诊断试验诊断准确性 

Study design:

Diagnostic test accuracy study

研究目的:

本研究旨在开发并验证一个基于多模态医学大数据的消化内镜诊断和预测基础模型,评估其在实际临床应用中的准确性和效能。研究的目标人群为具有消化系统疾病高风险或已确诊消化系统疾病的患者,涵盖多种消化道疾病类型,包括胃癌、食管癌、结直肠癌、消化道间叶来源肿瘤、炎症性肠病(IBD)及其他常见或复杂病变。 通过本研究,计划回答以下核心临床科学问题: 1)基于多模态数据(如内镜图像、病理数据、生化检测结果和临床信息等)的消化内镜诊断基础模型是否能够提供与传统金标准诊断方法相媲美的诊断准确性。2)与单一模态数据(如仅内镜图像)相比,该多模态基础模型是否在不同消化道疾病的诊断中表现更优。3)基础模型在不同患者群体(如早期病变患者、复杂病变患者及高风险人群)中的效能和临床应用潜力如何。此外,本研究还将通过与当前临床广泛采用的金标准诊断方法(如病理检查、CT/MRI影像分析等)的对比,评估基础模型在以下方面的价值:诊断准确性:模型是否能够实现对消化系统常见病和肿瘤的早期发现和精准诊断。预测灵敏性:模型是否能够通过整合多模态数据,有效预测疾病的进展和患者的长期预后。个体化应用:模型是否能够在复杂病变情况下,通过数据融合提供个性化的诊疗支持。 通过本研究,我们期望推动基础模型在消化系统疾病诊断与预测中的实际应用,为提高消化道疾病的诊断效率与准确性、优化患者管理和改善长期预后提供科学依据。  

Objectives of Study:

This study aims to develop and validate a foundation model for digestive endoscopy diagnosis and prediction based on multimodal medical big data, and to evaluate its accuracy and efficacy in real-world clinical use. The target population consists of patients at high risk of digestive system diseases or already diagnosed with them, covering a range of digestive tract disease types, including gastric cancer, esophageal cancer, colorectal cancer, mesenchymal tumors of the digestive tract, inflammatory bowel disease (IBD), and other common or complex lesions.

The study is planned to answer the following core clinical questions:
1) Can a foundation model for digestive endoscopy diagnosis based on multimodal data (such as endoscopic images, pathological data, biochemical test results, and clinical information) provide diagnostic accuracy comparable to that of traditional gold-standard diagnostic methods?
2) Compared with single-modality data (such as endoscopic images alone), does the multimodal foundation model perform better in diagnosing different digestive tract diseases?
3) What are the efficacy and clinical application potential of the foundation model in different patient groups (such as patients with early-stage lesions, patients with complex lesions, and high-risk groups)?

In addition, by comparison with the gold-standard diagnostic methods widely used in current clinical practice (such as pathological examination and CT/MRI image analysis), the study will evaluate the value of the foundation model in the following respects:
Diagnostic accuracy: Can the model achieve early detection and accurate diagnosis of common digestive system diseases and tumors?
Predictive sensitivity: Can the model effectively predict disease progression and patients' long-term prognosis by integrating multimodal data?
Individualized application: Can the model provide personalized diagnosis and treatment support through data fusion in cases of complex lesions?

Through this study, we expect to promote the practical application of foundation models in the diagnosis and prediction of digestive system diseases, and to provide a scientific basis for improving the diagnostic efficiency and accuracy of digestive tract diseases, optimizing patient management, and improving long-term prognosis.

药物成份或治疗方案详述:

 

Description for medicine or protocol of treatment in detail:

 

纳入标准:

Inclusion criteria:

排除标准:

受试者正在接受可能影响研究结果的其他临床试验。

Exclusion criteria:

The subjects are participating in other clinical trials that may affect the study results.

研究实施时间:

Study execute time:

From 2025-04-08 00:00:00 To 2026-08-31 00:00:00  

征募观察对象时间:

Recruiting time:

From 2026-02-01 00:00:00 To 2026-08-31 00:00:00

诊断试验:

Diagnostic Tests:

金标准或参考标准(即可准确诊断某疾病的单项方法或多项联合方法,在本研究中用于诊断是否有该病的临床参考标准):

病理诊断;内镜诊断;影像学检查

Gold Standard or Reference Standard (The clinical reference standards required to establish the presence or absence of the target condition in the tested population in present study):

Pathological diagnosis; endoscopic diagnosis; imaging examination

指标试验(即本研究的待评估诊断试验,无论为方法、生物标志物或设备,均请列出名称):

一个基于多模态医学大数据的消化内镜诊断和预测基础模型

Index test:

A foundation model for digestive endoscopy diagnosis and prediction based on multimodal medical big data

目标人群(可以是某种疾病患者或正常人群,详细描述其疾病特征,注意应纳入符合分布特点的全序列病例,具有良好的代表性)

在指定时间段内(2009年1月1日至2024年12月31日),所有前往所在医院就诊并曾行消化内镜检查的患者

例数:

Sample size:

2000000

Target condition (The target condition is a particular disease or disease stage that the index test is intended to identify. Please specify the characteristics in detail; the population should cover the complete disease spectrum and be well representative):

All patients who visited the hospital within the specified time period (from January 1, 2009, to December 31, 2024) and had undergone digestive endoscopy

容易混淆的疾病人群(即与目标疾病不易区分的一种或多种不同疾病,应避免采用正常人群对照的病例-对照设计):

例数:

Sample size:

0

Population with a condition difficult to distinguish from the target condition (a case-control design that uses the normal population as controls should be avoided):

None

研究实施地点:

Countries of recruitment and research settings:

国家:

中国

省(直辖市):

上海市 

市(区县):

 

Country:

China

Province:

Shanghai

City:

单位(医院):

复旦大学附属中山医院 

单位级别:

三甲 

Institution (hospital):

Zhongshan Hospital, Fudan University

Level of the institution:

Tertiary A

测量指标:

Outcomes:

指标中文名:

灵敏度

指标类型:

主要指标

Outcome:

Sensitivity

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

阴性预测值

指标类型:

主要指标

Outcome:

Negative predictive value

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

阳性预测值

指标类型:

主要指标

Outcome:

Positive predictive value

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

校准曲线

指标类型:

次要指标

Outcome:

Calibration plot

Type:

Secondary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

特异度

指标类型:

主要指标

Outcome:

Specificity

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

推理时间

指标类型:

主要指标

Outcome:

Inference Time

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

再现性

指标类型:

主要指标

Outcome:

Reproducibility

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

ROC曲线下面积

指标类型:

主要指标

Outcome:

Area Under the Receiver Operating Characteristic Curve

Type:

Primary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:

指标中文名:

综合判别指数

指标类型:

次要指标

Outcome:

Integrated Discrimination Index

Type:

Secondary indicator

测量时间点:

测量方法:

Measure time point of outcome:

Measure method:
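The accuracy outcomes listed above (sensitivity, specificity, positive and negative predictive value, and area under the ROC curve) have standard definitions, but the registration leaves the measure-method fields blank. The following is therefore only a minimal Python sketch of those textbook definitions, not the investigators' actual analysis code. The function names `diagnostic_metrics` and `roc_auc` and the toy labels are hypothetical and introduced purely for illustration.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from binary labels (1 = disease present).

    y_true is the reference-standard result, y_pred the index-test result.
    A metric whose denominator is zero is returned as NaN.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among diseased
        "specificity": ratio(tn, tn + fp),  # true negatives among non-diseased
        "ppv": ratio(tp, tp + fp),          # diseased among test-positives
        "npv": ratio(tn, tn + fn),          # non-diseased among test-negatives
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a diseased case scores above a non-diseased one.

    Ties count as one half. This pairwise form is O(n_pos * n_neg) and is meant
    for illustration only; a rank-based implementation is needed at large scale.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not study data.
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    model_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(diagnostic_metrics(reference, predicted))
    print(roc_auc(reference, model_scores))
```

Note that positive and negative predictive values depend on disease prevalence in the evaluated cohort, so they are not directly transferable to populations with a different case mix.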

采集人体标本:

Collecting sample(s) from participants:

标本中文名:

组织:

Sample Name:

None

Tissue:

人体标本去向

其它  

说明

Fate of sample:

Others

Note:

征募研究对象情况:

Recruiting status:

尚未开始

Not yet recruiting

年龄范围:

Participant age:

最小 Min age 18 years
最大 Max age 100 years

性别:

男女均可

Gender:

Both

随机方法(请说明由何人用什么方法产生随机序列):

Randomization Procedure (please state who generates the random number sequence and by what method):

None

是否公开试验完成后的统计结果:

Calculated Results after the Study Completed public access:

公开/Public

盲法:

Blinding:

试验完成后的统计结果(上传文件):

Calculated Results after the Study Completed (upload file):

是否共享原始数据:

IPD sharing

否No

共享原始数据的方式(说明:请填入公开原始数据日期和方式,如采用网络平台,需填该网络平台名称和网址):

The way of sharing IPD (include metadata and protocol; if using a web-based public database, please provide the URL):

None

数据采集和管理(说明:数据采集和管理由两部分组成,一为病例记录表(Case Record Form, CRF),二为电子采集和管理系统(Electronic Data Capture, EDC),如ResMan即为一种基于互联网的EDC:

数据治理/数据管理计划: 本研究为回顾性分析,旨在评估基于多模态医学大数据的消化内镜诊断基础模型的效能。为确保数据的完整性、准确性和安全性,特制定如下数据治理和数据管理计划。 1. 数据采集的时间和人群范围 时间范围:数据将回顾性地收集自2009年1月1日至2024年12月31日期间,在多个具备消化内镜诊疗能力的医疗中心就诊的患者。 人群范围:纳入所有年龄≥18岁并在此期间接受过消化内镜检查(包括普通内镜、放大内镜、染色内镜、超声内镜)、病理检查、影像学检查(如CT、MRI、PET-CT)、生化检测和临床随访的患者。包括以下患者群体: (1)各种消化道疾病患者(如胃癌、食管癌、结直肠癌、胃肠道息肉、炎症性肠病、消化性溃疡等)。 (2)门诊就诊,仅有轻微消化功能障碍或不适,无其他消化道疾病病史,且最终诊断为胃肠镜检查无异常或慢性浅表性胃炎的个体作为对照人群。 2. 关键字段 为支持多模态数据分析和模型开发,本研究将采集以下关键字段: 人口学信息:年龄、性别、体重、身高、既往病史、幽门螺杆菌感染史、家族史、吸烟史、饮酒史等。 消化内镜图像数据:包括普通内镜、放大内镜、染色内镜及超声内镜生成的原始图像数据(如DICOM格式)。涵盖胃、十二指肠、结肠等部位的影像,重点为病灶部位的高分辨率影像(如黏膜病变、息肉、溃疡或肿瘤)。 病理诊断数据:获取消化道活检或手术标本的病理诊断结果以及图像数据(如SVS、TIFF格式),包括良性和恶性病变的组织学特征。包括胃癌、结直肠癌、腺瘤性息肉及其他消化道病变的组织学分类和分级。 影像学检查数据:CT和MRI影像数据:采集DICOM格式的消化道CT和MRI影像,用于评估肿瘤浸润深度、淋巴结转移及远处转移情况。PET-CT影像数据:获取消化道肿瘤代谢活性和全身转移情况的影像数据,用于辅助分期和治疗决策。 生化检测数据:包括CEA(癌胚抗原)、CA19-9(糖类抗原19-9)等肿瘤标志物,以及其他与消化道疾病相关的血液检测结果。 临床信息数据:包括患者的症状(如腹痛、呕血、黑便等)、体征、既往治疗记录、药物使用情况和截止2024年12月31日的随访数据。涵盖诊断信息、疾病分期、治疗方案及预后情况。存储为CSV或者Doc格式。 3. 数据治理计划 数据收集和存储:所有数据将从各参与医疗中心的数据管理中心(或网络中心、信息中心、PACS等,取决于各医学单位的名称)系统中以加密技术或随机代码遮蔽或移除患者个人信息脱敏后导出至复旦大学CFFF智能计算平台进行分析。数据收集过程将遵循各中心的伦理批准和数据保护条例,确保患者隐私和数据安全。 数据清洗和预处理:在数据导入数据库后,将进行数据清洗和预处理,处理缺失值、异常值和不一致的数据。所有数据字段将进行标准化,以确保不同中心数据的兼容性和可比性。 数据访问和使用:数据访问将严格控制在研究团队内部,所有数据访问和使用活动将被记录和监控,确保合规性和数据安全。 数据质量控制:定期进行数据质量审查和监控,确保数据的准确性和完整性。研究团队将制定详细的数据治理手册,规范数据的收集、处理和分析流程,并进行定期培训,以确保团队成员对数据治理标准的理解和执行。 数据安全和隐私保护:所有数据将在脱敏后存储和传输,使用加密技术保护数据安全。患者的个人信息将被移除或加密处理,确保隐私得到最大程度的保护。 缺失数据的处理: 数据追踪和治理:在数据采集阶段,研究者将建立数据检查流程,及时识别并补充缺失的记录,确保在可能的情况下最小化数据缺失的发生。同时,数据将通过标准化的数据管理平台存储,定期进行质量检查,以确保数据完整性和准确性。 缺失数据处理策略: 对于无法避免的缺失数据,研究者将根据具体情况采取合适的处理方法。首先,将对缺失数据的模式进行分析,评估其是否是随机缺失(Missing Completely at Random, MCAR)、条件随机缺失(Missing at Random, MAR)或非随机缺失(Missing Not at Random, MNAR)。 单一插补:对于少量且随机缺失的数据,将采用单一插补技术,例如均值插补或中位数插补,以保持样本的完整性。 多重插补:对于重要变量且具有一定缺失比例的数据,将使用多重插补(Multiple Imputation)技术。多重插补通过生成多个不同的插补数据集,并对其进行独立分析,再综合结果以减少插补的不确定性和偏倚。 处理策略选择的理由: 单一插补方法适用于缺失比例较低且随机分布的数据,能够简化处理过程,同时避免丢失过多的样本。 多重插补则适用于缺失率较高的重要变量,因为它能够更好地模拟数据的真实分布,减少偏倚对研究结果的影响。因此,多重插补将作为本研究的主要处理方法,以确保结果的稳健性和可靠性。

Data collection and management (a standard data collection and management system includes a Case Record Form, CRF, and an Electronic Data Capture, EDC, system):

Data Governance/Data Management Plan: This study is a retrospective analysis that aims to evaluate the performance of a foundation model for digestive endoscopy diagnosis based on multimodal medical big data. To ensure the integrity, accuracy, and security of the data, the following data governance and data management plan has been formulated.

1. Time and Population Scope of Data Collection
Time Scope: Data will be retrospectively collected from patients who visited multiple medical centers with digestive endoscopy diagnosis and treatment capabilities between January 1, 2009, and December 31, 2024.
Population Scope: All patients aged 18 or above who, during this period, underwent digestive endoscopy (including conventional endoscopy, magnifying endoscopy, chromoendoscopy, and endoscopic ultrasonography), pathological examination, imaging examination (such as CT, MRI, and PET-CT), biochemical tests, and clinical follow-up will be included. The patient groups include:
(1) Patients with various digestive tract diseases (such as gastric cancer, esophageal cancer, colorectal cancer, gastrointestinal polyps, inflammatory bowel disease, peptic ulcer, etc.).
(2) As the control group, individuals who visited the outpatient department with only mild digestive dysfunction or discomfort, had no history of other digestive tract diseases, and whose final diagnosis was normal gastrointestinal endoscopy findings or chronic superficial gastritis.

2. Key Fields
To support multimodal data analysis and model development, the following key fields will be collected:
Demographic Information: Age, gender, weight, height, past medical history, Helicobacter pylori infection history, family history, smoking history, alcohol consumption history, etc.
Digestive Endoscopy Image Data: The original image data (for example, in DICOM format) generated by conventional endoscopy, magnifying endoscopy, chromoendoscopy, and endoscopic ultrasonography. These cover images of the stomach, duodenum, colon, and other sites, with a focus on high-resolution images of lesion sites (such as mucosal lesions, polyps, ulcers, or tumors).
Pathological Diagnosis Data: The pathological diagnosis results and image data (for example, in SVS or TIFF format) of digestive tract biopsy or surgical specimens, including the histological characteristics of benign and malignant lesions. This includes the histological classification and grading of gastric cancer, colorectal cancer, adenomatous polyps, and other digestive tract lesions.
Imaging Examination Data: CT and MRI image data: DICOM-format CT and MRI images of the digestive tract, used to evaluate the depth of tumor invasion, lymph node metastasis, and distant metastasis. PET-CT image data: images showing the metabolic activity of digestive tract tumors and systemic metastasis, used to assist staging and treatment decision-making.
Biochemical Test Data: Tumor markers such as CEA (carcinoembryonic antigen) and CA19-9 (carbohydrate antigen 19-9), as well as other blood test results related to digestive tract diseases.
Clinical Information Data: Patients' symptoms (such as abdominal pain, hematemesis, melena, etc.), physical signs, past treatment records, medication use, and follow-up data as of December 31, 2024. These cover diagnostic information, disease staging, treatment plans, and prognosis, and will be stored in CSV or Doc format.

3. Data Governance Plan
Data Collection and Storage: All data will be exported from the data management center (or network center, information center, PACS, etc., depending on the naming used at each institution) of each participating medical center to the CFFF Intelligent Computing Platform of Fudan University for analysis. Before export, the data will be de-identified by using encryption or random codes to mask or remove patients' personal information. The data collection process will comply with the ethical approvals and data protection regulations of each center to ensure patient privacy and data security.
Data Cleaning and Pre-processing: After the data are imported into the database, data cleaning and pre-processing will be carried out to handle missing values, outliers, and inconsistent data. All data fields will be standardized to ensure the compatibility and comparability of data from different centers.
Data Access and Use: Data access will be strictly restricted to the research team. All data access and use activities will be recorded and monitored to ensure compliance and data security.
Data Quality Control: Regular data quality reviews and monitoring will be carried out to ensure the accuracy and integrity of the data. The research team will prepare a detailed data governance manual to standardize the data collection, processing, and analysis procedures, and will conduct regular training to ensure that team members understand and implement the data governance standards.
Data Security and Privacy Protection: All data will be stored and transmitted only after de-identification, and encryption will be used to protect data security. Patients' personal information will be removed or encrypted to ensure maximum privacy protection.

Handling of Missing Data:
Data Tracking and Governance: During the data collection stage, the researchers will establish a data inspection process to promptly identify and fill in missing records, minimizing missing data wherever possible. The data will also be stored on a standardized data management platform, with regular quality checks to ensure data integrity and accuracy.
Missing Data Handling Strategies: For unavoidable missing data, the researchers will adopt handling methods appropriate to the specific situation. First, the pattern of missing data will be analyzed to evaluate whether it is Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR).
Single Imputation: For a small amount of randomly missing data, single imputation techniques, such as mean or median imputation, will be used to maintain the integrity of the sample.
Multiple Imputation: For important variables with a certain proportion of missing data, multiple imputation will be used. Multiple imputation generates several different imputed data sets, analyzes each independently, and then combines the results to reduce the uncertainty and bias introduced by imputation.
Reasons for Selecting Handling Strategies: Single imputation is suitable for data with a low proportion of randomly distributed missing values; it simplifies processing and avoids losing too many samples. Multiple imputation is suitable for important variables with a high missing rate because it better reflects the real distribution of the data and reduces the impact of bias on the study results. Multiple imputation will therefore be the main handling method in this study to ensure the robustness and reliability of the results.
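The plan names median imputation and multiple imputation but does not say how the per-data-set results will be combined. A common choice is Rubin's rules, and the Python sketch below illustrates them under that assumption; it is not taken from the registration. The helper names `median_impute` and `pool_rubin` and the example numbers are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def median_impute(values: Sequence[Optional[float]]) -> List[float]:
    """Single imputation: replace each missing value (None) with the observed median."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine results from m imputed data sets with Rubin's rules.

    estimates: the point estimate (e.g. a sensitivity) from each imputed data set
    variances: the squared standard error of that estimate in each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled = statistics.fmean(estimates)      # mean of the per-data-set estimates
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return pooled, total


if __name__ == "__main__":
    # Toy values, not study data.
    print(median_impute([1.0, None, 3.0, 5.0]))
    print(pool_rubin([0.80, 0.82, 0.84], [0.0004, 0.0004, 0.0004]))
```

The between-imputation term is what distinguishes multiple from single imputation: it carries the uncertainty about the missing values into the final standard error, whereas median imputation treats the filled-in values as if they had been observed.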

数据与安全监察委员会:

Data and Safety Monitoring Committee:

暂未确定/Not yet

注册人:

Name of Registration:

 2026-01-29 18:05:18