Comparison of Three Data Mining Algorithms in Knowledge Discovery of Electronic Medical Records

Abstract: [Objective] To discover disease risk factors from heterogeneous electronic medical record data and to provide a reference for data mining and knowledge discovery. [Methods] Clinical electronic medical record data combining several kinds of data structure were selected. Three data mining algorithms (decision tree, logistic regression, and neural network) were each used to build a disease risk factor prediction model, and the three models were then compared and evaluated statistically. [Results] The decision tree model achieved higher precision and recall than the logistic regression and neural network models and had the best overall performance, although the differences among the three were small. [Limitations] No optimized selection of electronic medical record attributes was performed. [Conclusions] The decision tree outperformed logistic regression and the neural network in discovering risk factors and predicting disease. The study establishes a knowledge discovery framework for heterogeneous data sources based on data mining algorithms, which may serve as a reference for future domain knowledge discovery, knowledge base construction, and the selection of data mining algorithms.
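
The abstract compares the three models on precision and recall. As a minimal sketch of how such a comparison can be computed, the Python snippet below derives precision, recall, and F1 from true and predicted labels. The labels, the names `model_a`, `model_b`, `model_c`, and the helper `precision_recall_f1` are hypothetical placeholders, not the study's data, models, or results, and the F1 score is an added summary measure that the abstract does not mention.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical toy labels (1 = disease present); NOT data from the study.
y_true: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions from three stand-in models; NOT the study's results.
predictions: Dict[str, List[int]] = {
    "model_a": [1, 1, 1, 0, 0, 0, 0, 1],
    "model_b": [1, 1, 0, 0, 0, 0, 1, 1],
    "model_c": [1, 1, 1, 0, 0, 0, 1, 1],
}

scores = {name: precision_recall_f1(y_true, y_pred) for name, y_pred in predictions.items()}

for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In practice the predictions would come from fitted classifiers evaluated on held-out records, and a statistical test would be needed to judge whether small differences between models are meaningful.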

Version history

[V1] 2017-10-11 13:20:06 chinaXiv:201711.01190V1