
Scientific Seminar

9:00 AM – 10:00 AM: Dr. Nguyen Sy Dung presents "Clustering Validity Index Deriving from Accumulation of Fuzzy-set-based Local Risk"

Abstract:

Evaluating clustering validity to establish an optimal cluster data space (CDS) is a vital task in many fields related to data mining. Many clustering validity indexes (CVIs) have been proposed; however, their effectiveness is unstable, especially on noisy databases, and still needs improvement. Here, we present a new CVI, named fRisk2, derived from the accumulation of fuzzy-set-based local risk, and propose a new algorithm, fRisk2-bA, for determining the optimal number of data clusters (Copt_CVI) in centroid-based clustering. First, we use fuzzy set relationships to track two unwanted distribution trends in the CDS created during the clustering process, which yields two features named fL(C) and fH(C). We then describe and theoretically prove the variation tendencies of fL(C) and fH(C), and from them establish and demonstrate the variation characteristics of fRisk2. On this theoretical basis, we clarify the relationship between fRisk2 and clustering validity so that fRisk2 can be used effectively to evaluate the CDS and determine Copt_CVI. The survey results indicate that the outstanding advantages of fRisk2 are its higher accuracy, stability, and convergence, even when the databases contain noise, especially measurement noise.
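For readers unfamiliar with how a CVI is used to pick the number of clusters, the general workflow can be sketched as follows. This is a minimal illustration only: the abstract does not define fRisk2, fL(C), fH(C), or fRisk2-bA, so none of them is reproduced here. The sketch swaps in a simple compactness-over-separation ratio (in the spirit of a crisp Xie-Beni index, lower is better) and a plain k-means with farthest-point initialization. All function names and the toy data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, c):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < c:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, m) for m in centroids))
        )
    return centroids


def kmeans(points, c, max_iter=100):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, c)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(c), key=lambda k: sq_dist(p, centroids[k])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(c):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def stand_in_cvi(points, centroids, labels):
    """Stand-in validity index (NOT fRisk2): mean squared distance to the own
    centroid divided by the smallest squared centroid separation."""
    compactness = sum(
        sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    separation = min(
        sq_dist(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    )
    return compactness / separation if separation > 0 else float("inf")


def select_number_of_clusters(points, candidates):
    """Sweep candidate cluster counts C and keep the one with the best index."""
    scores = {}
    for c in candidates:
        centroids, labels = kmeans(points, c)
        scores[c] = stand_in_cvi(points, centroids, labels)
    best_c = min(scores, key=scores.get)
    return best_c, scores


# Hypothetical toy data: three tight, well-separated groups in the plane.
offsets = [(-0.2, 0.0), (0.2, 0.0), (0.0, -0.2), (0.0, 0.2), (0.0, 0.0)]
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
toy_points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_c, scores = select_number_of_clusters(toy_points, range(2, 6))
print(best_c, scores)
```

On this toy data the sweep is expected to favor three clusters, since splitting a group shrinks the centroid separation and merging groups inflates the compactness term. A noise-robust index such as the one proposed in the talk would take the place of `stand_in_cvi` in the same loop.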