基于机器学习的中文文本分类器辅助SRMD合理用药审查研究

注册号:

Registration number:

ChiCTR2600126317 

最近更新日期:

Date of Last Refreshed on:

2026-06-07 10:16:57 

注册时间:

Date of Registration:

2026-06-07 00:00:00 

注册号状态:

补注册

Registration Status:

Retrospective registration

注册题目:

基于机器学习的中文文本分类器辅助SRMD合理用药审查研究

Public title:

Machine Learning-Based Chinese Text Classification to Support Medication Appropriateness Review in Stress-Related Mucosal Disease

注册题目简写:

English Acronym:

研究课题的正式科学名称:

基于机器学习的中文文本分类器辅助SRMD合理用药审查研究

Scientific title:

Machine Learning-Based Chinese Text Classification to Support Medication Appropriateness Review in Stress-Related Mucosal Disease

研究课题代号(代码):

Study subject ID:

在二级注册机构或其它机构的注册号:

The registration number of the Partner Registry or other register:

申请注册联系人:

姜伟 

研究负责人:

姜伟 

Applicant:

Wei Jiang 

Study leader:

Wei Jiang 

申请注册联系人电话:

Applicant telephone:

+86 15067028867

研究负责人电话:

Study leader's telephone:

+86 570 3121128

申请注册联系人传真 :

Applicant Fax:

研究负责人传真:

Study leader's fax:

申请注册联系人电子邮件:

Applicant E-mail:

javenwei1995@163.com

研究负责人电子邮件:

Study leader's E-mail:

justview1995@163.com

申请单位网址(自愿提供):

Applicant website(voluntary supply):

研究负责人网址(自愿提供):

Study leader's website(voluntary supply):

申请注册联系人通讯地址:

浙江省衢州市闽江大道100号

研究负责人通讯地址:

浙江省衢州市闽江大道100号

Applicant address:

100 Minjiang Avenue, Smart New City, Quzhou, Zhejiang

Study leader's address:

100 Minjiang Avenue, Smart New City, Quzhou, Zhejiang

申请注册联系人邮政编码:

Applicant postcode:

研究负责人邮政编码:

Study leader's postcode:

申请人所在单位:

温州医科大学附属衢州医院(衢州市人民医院)

Applicant's institution:

The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital

研究负责人所在单位:

衢州市人民医院

Affiliation of the Leader:

Quzhou People's Hospital

是否获伦理委员会批准:

Approved by ethic committee:

Yes

伦理委员会批件文号:

Approved No. of ethic committee:

2026-研013

伦理委员会批件附件:

Approved file of Ethical Committee:


批准本研究的伦理委员会名称:

衢州市人民医院医学伦理审查委员会

Name of the ethic committee:

Ethics Committee of Quzhou People’s Hospital

伦理委员会批准日期:

Date of approved by ethic committee:

2026-01-09 00:00:00

伦理委员会联系人:

余洁

Contact Name of the ethic committee:

Yu Jie

伦理委员会联系地址:

浙江省衢州市闽江大道100号

Contact Address of the ethic committee:

100 Minjiang Avenue, Smart New City, Quzhou, Zhejiang

伦理委员会联系人电话:

Contact phone of the ethic committee:

+86 570 3123305

伦理委员会联系人邮箱:

Contact email of the ethic committee:

yj411@126.com

研究实施负责(组长)单位:

衢州市人民医院

Primary sponsor:

Quzhou People's Hospital

研究实施负责(组长)单位地址:

浙江省衢州市闽江大道100号

Primary sponsor's address:

100 Minjiang Avenue, Smart New City, Quzhou, Zhejiang

试验主办单位(项目批准或申办者):

Secondary sponsor:

国家:

中国

省(直辖市):

浙江省

市(区县):

Country:

China

Province:

Zhejiang

City:

单位(医院):

衢州市人民医院

具体地址:

浙江省衢州市

Institution (hospital):

Quzhou People's Hospital

Address:

100 Minjiang Avenue, Smart New City, Quzhou, Zhejiang

经费或物资来源:

自选课题(自筹)

Source(s) of funding:

Self-funded by the research team

研究疾病:

应激相关性黏膜病变  

Target disease:

Stress-Related Mucosal Disease (SRMD)

研究疾病代码:

Target disease code:

研究类型:

观察性研究

Study type:

Observational study

研究所处阶段:

其它 

Study phase:

N/A

研究设计:

析因分组(即根据危险因素或暴露因素分组) 

Study design:

Factorial 

研究目的:

为了弥补现有的合理用药机器审查多为基于可编码信息而忽略了非结构化的自然语言病历等信息,导致潜在的用药风险被遗漏的缺陷,本文提出并验证一种基于文本分类的机器学习框架,以自动识别病历中“预防性”与“非预防性”用药指征,进而辅助判断质子泵抑制剂(PPIs)预防性使用的合理性。  

Objectives of Study:

Existing automated rational drug use review systems rely mainly on codifiable information and overlook unstructured natural-language content such as electronic health record (EHR) notes, so potential medication risks can be missed. To address this gap, this study proposes and validates a text classification-based machine learning framework that automatically identifies “prophylactic” and “non-prophylactic” medication indications in EHRs, thereby supporting assessment of whether prophylactic proton pump inhibitor (PPI) use is appropriate.

药物成份或治疗方案详述:

 

Description for medicine or protocol of treatment in detail:

 

纳入标准:

Inclusion criteria:

排除标准:

Exclusion criteria:

None

研究实施时间:

Study execute time:

From 2025-12-01 00:00:00 To 2025-12-31 00:00:00  

征募观察对象时间:

Recruiting time:

From 2025-12-02 00:00:00 To 2025-12-20 00:00:00

干预措施:

Interventions:

组别:

有预防性使用质子泵抑制剂特征的病历段数 vs 非预防性使用质子泵抑制剂特征的病历段数(407 vs 3288,共3695)

样本量:

57

Group:

Number of medical record sections with features of prophylactic proton pump inhibitor use vs. number of medical record sections with features of non-prophylactic proton pump inhibitor use (407 vs. 3288; 3695 in total)

Sample size:

干预措施:

干预措施代码:

Intervention:

None

Intervention code:

研究实施地点:

Countries of recruitment and research settings:

国家:

中国

省(直辖市):

浙江省 

市(区县):

 

Country:

China

Province:

Zhejiang

City:

单位(医院):

衢州市人民医院 

单位级别:

三级甲等 

Institution (hospital):

Quzhou People's Hospital

Level of the institution:

Tertiary A

测量指标:

Outcomes:

指标中文名:

f1-score

指标类型:

主要指标

Outcome:

F1-score

Type:

Primary indicator

测量时间点:

模型在独立测试集上评估完成时

测量方法:

使用 Python 中 scikit-learn 库的 f1_score 函数,基于测试集的精确率和召回率计算,是二者的调和平均数。

Measure time point of outcome:

At the end of model evaluation on the independent test set

Measure method:

Calculated using the f1_score function from the scikit-learn library in Python, as the harmonic mean of precision and recall on the test set.
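As a dependency-free illustration of the quantity described above (not the scikit-learn implementation the record names), the sketch below counts true positives, false positives, and false negatives for a binary task and derives precision, recall, and their harmonic mean. The label vectors, the function name `precision_recall_f1`, and the use of 1 for the "prophylactic" class are all hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical test-set labels: 1 = prophylactic, 0 = non-prophylactic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1 by maximizing only precision or only recall.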

指标中文名:

召回率

指标类型:

主要指标

Outcome:

Recall

Type:

Primary indicator

测量时间点:

模型在独立测试集上评估完成时

测量方法:

使用 Python 中 scikit-learn 库的 recall_score 函数,基于测试集的预测标签与真实标签计算,评估所有真实为阳性的样本中被正确预测为阳性的比例。

Measure time point of outcome:

At the end of model evaluation on the independent test set

Measure method:

Calculated using the recall_score function from the scikit-learn library in Python, based on predicted labels and true labels on the test set, to evaluate the proportion of actual positive samples that the model correctly predicted as positive.

指标中文名:

精确率

指标类型:

主要指标

Outcome:

Precision

Type:

Primary indicator

测量时间点:

模型在独立测试集上评估完成时

测量方法:

使用 Python 中 scikit-learn 库的 precision_score 函数,基于测试集的预测标签与真实标签计算,评估所有被预测为阳性的样本中实际为阳性的比例。

Measure time point of outcome:

At the end of model evaluation on the independent test set

Measure method:

Calculated using the precision_score function from the scikit-learn library in Python, based on predicted labels and true labels on the test set, to evaluate the proportion of true positives among all predicted positives.

指标中文名:

准确率

指标类型:

次要指标

Outcome:

Accuracy

Type:

Secondary indicator

测量时间点:

模型在独立测试集上评估完成时

测量方法:

使用 Python 中 scikit-learn 库的 accuracy_score 函数,基于测试集的预测标签与真实标签计算,评估模型正确分类的样本比例。

Measure time point of outcome:

At the end of model evaluation on the independent test set

Measure method:

Calculated using the accuracy_score function from the scikit-learn library in Python, based on predicted labels and true labels on the test set, to evaluate the proportion of correctly classified samples.
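With classes as imbalanced as the group sizes reported in this record (407 prophylactic vs. 3288 non-prophylactic sections), accuracy alone can look high for a model that never finds a positive case. The sketch below is only an illustration: it applies a hypothetical "always non-prophylactic" baseline to the overall group counts, since the record does not give the class counts of the test set itself.

```python
# Overall group sizes taken from the Group field of this record.
N_PROPHYLACTIC = 407
N_NON_PROPHYLACTIC = 3288

# 1 = prophylactic, 0 = non-prophylactic.
y_true = [1] * N_PROPHYLACTIC + [0] * N_NON_PROPHYLACTIC

# Hypothetical baseline that predicts the majority class for every section.
y_pred = [0] * len(y_true)

# Accuracy: share of sections whose predicted label equals the true label.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Recall on the positive class: share of prophylactic sections recovered.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / N_PROPHYLACTIC

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

The baseline scores an accuracy of about 0.89 while its recall on the prophylactic class is zero, which is why accuracy is best read alongside precision, recall, and F1.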

采集人体标本:

Collecting sample(s) from participants:

标本中文名:

组织:

Sample Name:

NA

Tissue:

人体标本去向

其它  

说明

Fate of sample:

Others

Note:

征募研究对象情况:

Recruiting status:

结束/Completed

年龄范围:

Participant age:

最小 Min age 0 years
最大 Max age years

性别:

男女均可

Gender:

Both

随机方法(请说明由何人用什么方法产生随机序列):

由研究人员使用 Python 中的 scikit-learn 库,调用 train_test_split 函数,设置 test_size=0.2,并指定随机种子(random_state)以确保可重复性,对全部样本进行 8:2 的训练集与测试集划分。

Randomization Procedure (please state who generates the random number sequence and by what method):

The research team used the train_test_split function from the scikit-learn library in Python, with test_size=0.2, to split the entire dataset 8:2 into training and test sets. A fixed random seed (random_state) was specified to ensure reproducibility.
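The idea behind a seeded 8:2 split can be sketched with the Python standard library alone. This is not the scikit-learn routine named above and may assign samples differently; the function name `shuffle_split`, the seed value 42, and the placeholder segments are assumptions, and the record does not state the actual seed or whether the split was stratified by class.

```python
import random


def shuffle_split(samples, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed, then cut off a test portion."""
    indices = list(range(len(samples)))
    # A dedicated Random instance keeps the split reproducible and
    # independent of the global random state.
    random.Random(seed).shuffle(indices)

    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical placeholder data: ten record segments with binary labels.
segments = [f"segment-{i}" for i in range(10)]
labels = [1 if i % 5 == 0 else 0 for i in range(10)]

X_train, X_test, y_train, y_test = shuffle_split(segments, labels)
print(len(X_train), len(X_test))
```

Calling the function again with the same seed returns the same partition, which is the property the fixed random seed is meant to guarantee.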

是否公开试验完成后的统计结果:

Calculated Results after the Study Completed public access:

不公开/Private

盲法:

开放标签

Blinding:

Open-label study

是否共享原始数据:

IPD sharing

是Yes

共享原始数据的方式(说明:请填入公开原始数据日期和方式,如采用网络平台,需填该网络平台名称和网址):

优先通过ResMan 平台(www.medresman.org.cn)实现共享,将去标识化后的数据上传至该平台,供公众在线浏览;同时,将数据同步存储至 Dryad 或 Figshare 等国际通用学术数据平台,并获取数据集唯一标识(如 DOI)。

The way of sharing IPD (include metadata and protocol; if using a web-based public database, please provide the URL):

The de-identified data will be shared primarily through the ResMan platform (www.medresman.org.cn), allowing public online browsing. Meanwhile, the data will be synchronously stored in international academic data repositories such as Dryad or Figshare to obtain a unique dataset identifier (e.g., DOI). The ResMan platform only supports browsing; for data download, applicants must contact the research team directly, and the data will be provided after review and approval. Data stored in international repositories can be accessed and downloaded directly via DOI, with download permissions set to "open access" (for non-commercial academic use).

数据采集和管理(说明:数据采集和管理由两部分组成,一为病例记录表(Case Record Form, CRF),二为电子采集和管理系统(Electronic Data Capture, EDC),如ResMan即为一种基于互联网的EDC:

本研究数据来源于合作医院的医院信息系统(HIS)。数据采集后,先完成病例记录表(CRF)的规范化填写,将元数据(如数据采集时间、采集人员信息)完整转录至 CRF 表;再通过文本向量化技术(TF-IDF)将非结构化文本数据转换为结构化特征,最终存储于本地加密 MySQL 数据库(数据库加密算法:AES-256,访问需通过双重身份验证)。数据管理采用 “CRF 表 + ResMan 平台” 的双轨模式,ResMan 平台用于在线实时质量控制(如数据逻辑校验、缺失值提醒),本地数据库用于数据备份,所有数据操作均留存日志(包括操作人、操作时间、操作内容),确保数据可追溯。

Data collection and Management (A standard data collection and management system includes a CRF and an electronic data capture system):

The data for this study come from the Hospital Information System (HIS) of the partner hospitals. After collection, the Case Report Form (CRF) is completed in a standardized manner, with metadata (e.g., data collection time, collector information) transcribed in full into the CRF. Unstructured text is then converted into structured features by TF-IDF text vectorization and stored in a locally encrypted MySQL database (encryption algorithm: AES-256; access requires two-factor authentication). Data management follows a dual-track "CRF + ResMan platform" model: the ResMan platform provides online real-time quality control (e.g., data logic checks, missing value reminders), and the local database serves as the data backup. All data operations are logged (operator, operation time, and operation content) to ensure traceability.
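To illustrate how TF-IDF turns free text into numeric features, here is a minimal standard-library sketch. It is a textbook variant (raw term count multiplied by a smoothed inverse document frequency) over character bigrams, not the study's actual pipeline: the record does not describe the tokenizer, the TF-IDF settings, or the library used for vectorization. The three example snippets and the function names are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a string into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_matrix(docs):
    """Return (vocabulary, rows), one weight per vocabulary term per doc."""
    tokenized = [char_bigrams(doc) for doc in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocab = sorted(doc_freq)
    # Smoothed IDF: terms found in fewer documents get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


# Hypothetical record snippets, invented for illustration only.
docs = ["预防应激性溃疡", "治疗胃溃疡出血", "预防性使用"]
vocab, rows = tfidf_matrix(docs)

first = dict(zip(vocab, rows[0]))
print(len(vocab), round(first["应激"], 3), round(first["溃疡"], 3))
```

In the first snippet the bigram "应激" appears in no other document and so receives a larger weight than "溃疡", which also occurs in the second snippet. Rows like these are the structured features a downstream classifier would consume.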

数据与安全监察委员会:

Data and Safety Monitoring Committee:

无/No

注册人:

Name of Registration:

 2026-06-07 10:15:58