[1] Improved Clustering Methodology for Lung Cancer Disease Prediction
Alka Kumari & Dr. Megha Kamble
Big data is defined as a large quantity of data that requires new technologies
and methodologies to obtain value from it through various types of
processing. Today, an effective big data tool is required to process the large amount of
data and extract useful patterns that can support the diagnosis of critical
diseases such as lung cancer. Various big data tools are useful for data
storage and processing. Hadoop is one such tool, and it provides an efficient way to
deal with large amounts of data. Much research has been carried out to
process lung cancer data for easier prediction of the disease. Lung cancer is the second
deadliest disease among all types of cancer. Predicting lung cancer is not an easy task
because it depends on multiple attributes. In this paper, we work with
a lung cancer dataset that contains noisy values, missing values and
high-dimensional data, which makes it unsuitable for a classification approach. We therefore
apply an improved clustering methodology in an unsupervised manner. With the
help of a modified foggy k-means methodology, the lung
cancer dataset becomes easier to handle, so we can obtain better prediction results than with the existing
methodology. With the help of the C4.5 classification approach, we can obtain a better solution
for lung cancer prediction, which may help in easier diagnosis and suitable treatment
of the disease.
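
As a rough illustration of the unsupervised stage described above, the following minimal Python sketch fills missing values with column means and then groups the records with standard k-means. It is not the modified foggy k-means methodology of this paper, whose details are not given here. The function names, the mean imputation, the farthest-first initialization and the toy records are all assumptions made for illustration, and the C4.5 classification stage is not shown.

```python
import math


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: missing values are encoded as None; a column with no observed
    values falls back to 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else float(row[j]) for j in range(n_cols)]
        for row in rows
    ]


def farthest_first_init(points, k):
    """Deterministic initialization: start from the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iters=100):
    """Standard (Lloyd's) k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical toy records with two numeric attributes and two missing values.
records = [
    [1.0, 2.0],
    [1.2, None],
    [0.8, 2.2],
    [9.0, 8.0],
    [None, 8.4],
    [9.2, 7.8],
]

points = impute_column_means(records)
centroids, labels = kmeans(points, k=2)
print(labels)
```

In a full pipeline of the kind outlined in this paper, the resulting cluster labels would be passed on to a classifier; that step, and any handling of noisy values, is left out of this sketch.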