Abstract: Objective To evaluate the feasibility and performance ceiling of machine learning models that use traditional medicinal property features as input variables for predicting the antitumor activity of traditional Chinese medicine (TCM), and to identify the core medicinal property features with the highest statistical contribution. Methods An annotated dataset of antitumor TCM was constructed by integrating Chinese Materia Medica (《中药学》), the SymMap database, and literature retrieved from multiple platforms. Five categories of features (four natures, five flavors, meridian tropism, toxicity, and efficacy) were extracted and one-hot encoded. Within a nested cross-validation framework, eight models were trained and their hyperparameters tuned with a Bayesian optimization algorithm based on the tree-structured Parzen estimator (TPE). Generalization performance was then validated on an independent test set, and the models were interpreted with the Shapley additive explanations (SHAP) method. Results The mean area under the curve (AUC) values of the eight models ranged from 0.594 to 0.629. On the independent test set, the AUC values of all models fell between 0.510 and 0.565. SHAP analysis showed that "bitter flavor", "liver meridian", and "pungent flavor" were core medicinal property features that remained stable across models. Conclusion Traditional medicinal property features have some statistical predictive power for the antitumor activity of TCM, but the information they carry has a clear upper limit. This study provides a performance baseline and a reference for hypothesis generation for subsequent multimodal fusion research.
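To make the encoding step and the reported metric concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a small, hypothetical vocabulary for the five feature categories (the paper's actual feature list, toxicity grading, and efficacy terms are not given in this record) and a made-up herb record. The `auc` helper computes AUC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, so 0.5 corresponds to chance-level ranking.

```python
# Illustrative vocabulary only: every category value below is an assumption,
# not the feature list used in the study.
VOCAB = {
    "nature": ["cold", "cool", "warm", "hot"],
    "flavor": ["sour", "bitter", "sweet", "pungent", "salty"],
    "meridian": ["liver", "lung", "stomach"],
    "toxicity": ["toxic"],
    "efficacy": ["heat-clearing", "blood-activating"],
}

# One binary column per (category, value) pair, in a fixed order.
COLUMNS = [f"{cat}={val}" for cat, vals in VOCAB.items() for val in vals]


def encode(herb):
    """Return a 0/1 vector; a herb may carry several values per category."""
    return [
        1 if val in herb.get(cat, ()) else 0
        for cat, vals in VOCAB.items()
        for val in vals
    ]


def auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical herb record, used only to show the shape of the encoding.
herb_a = {
    "nature": {"cold"},
    "flavor": {"bitter", "pungent"},
    "meridian": {"liver"},
}

if __name__ == "__main__":
    vector = encode(herb_a)
    print([name for name, bit in zip(COLUMNS, vector) if bit])
    # Toy labels and scores, not data from the study.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Read against this definition, the test-set AUC range of 0.510 to 0.565 reported above sits only slightly above chance-level ranking, which is consistent with the abstract's conclusion that medicinal property features alone carry limited information.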
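Basic information:
DOI: 10.19811/j.cnki.ISSN2096-6628.2026.03.014
CLC number: R273
Citation:
[1] HU Xinyu, QIAO Yuanhao, XIE Hongting, et al. Exploratory study on the predictive potential of antitumor Chinese medicines based on traditional medicinal property theory[J]. 中医肿瘤学杂志, 2026, 8(02): 104-112. DOI: 10.19811/j.cnki.ISSN2096-6628.2026.03.014.
Funding:
General Program of the National Natural Science Foundation of China (No. 8257153067); High-Level Traditional Chinese Medicine Hospital Construction Project of Wangjing Hospital, China Academy of Chinese Medical Sciences (No. WJZJ-202305); Science and Technology Innovation Project of the China Academy of Chinese Medical Sciences (No. CI2026A03811)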
2026-03-25