IJLNCT

Big data refers to data sets so large that new technologies and methodologies are required to extract value from them. An effective big data tool is needed to process such volumes of data and extract useful patterns that can support the diagnosis of critical diseases such as lung cancer. Various big data tools are available for storing and processing data; Hadoop is one such tool, and it provides an efficient way to handle large amounts of data. A considerable body of research has processed lung cancer data to make the disease easier to predict. Lung cancer is among the deadliest of all cancers, and predicting it is difficult because the outcome depends on multiple attributes. In this paper, we work with a lung cancer dataset that contains noisy values, missing values, and high-dimensional data, which makes it unsuitable for direct classification. We therefore apply an improved, unsupervised clustering methodology. The modified foggy k-means methodology makes the lung cancer dataset easier to handle, so that we can obtain better prediction results than with the existing methodology. Using the C4.5 classification approach, we can then obtain a better prediction of lung cancer, which may help in diagnosing the disease and choosing suitable treatment.
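The pipeline outlined above (cleaning missing values, unsupervised clustering, then a C4.5-style attribute evaluation) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the modified foggy k-means procedure is not specified here, so plain Lloyd's k-means stands in for it; missing values are filled by column-mean imputation; and the function names (`impute_missing`, `kmeans`, `gain_ratio`) and the toy records are hypothetical. The gain ratio shown is the split criterion that C4.5 uses to choose attributes, not a full decision tree.

```python
import math
import random
from collections import Counter


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column.
    (Mean imputation is an assumption made for this sketch.)"""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, used as a stand-in for foggy k-means.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, target):
    """C4.5 split criterion for a categorical feature:
    information gain divided by split information."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0
    return (entropy(target) - remainder) / split_info


# Hypothetical toy records with two numeric attributes; None marks a missing value.
rows = [[1.0, 2.0], [1.2, None], [0.8, 2.1],
        [8.0, 9.0], [None, 9.2], [8.3, 8.8]]
clean = impute_missing(rows)
centroids, labels = kmeans(clean, k=2)
print("cluster labels:", labels)

# Hypothetical categorical attribute scored against a known class label.
print("gain ratio:", gain_ratio(["y", "y", "n", "n"], [1, 1, 0, 0]))
```

In a setup like this, the cluster labels could serve as the target passed to `gain_ratio`, so that attributes are ranked by how well they separate the clusters. Whether the paper combines the two stages in that way is not stated in the abstract.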