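Implementing the K-Means++ Algorithm in Python
K-Means++ is an initialization method for K-Means clustering. Plain K-Means is sensitive to where its initial centers land, so K-Means++ aims to improve the clustering result by choosing reasonable, well-separated starting points.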
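The basic K-Means procedure is:
- Randomly select K initial cluster centers.
- For each data point, compute its distance to every cluster center and assign it to the nearest one.
- For each cluster, compute the mean of all its data points and use it as the new cluster center.
- Repeat steps 2 and 3 until the cluster centers stop changing or the maximum number of iterations is reached.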
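The steps above can be sketched in a few plain-Python functions. This is a minimal illustration rather than a definitive implementation: the function names (`assign_points`, `update_centers`, `kmeans`) are made up for this example, the initial centers are passed in instead of being drawn at random, and keeping the old center for an empty cluster is an assumption.

```python
import math


def assign_points(points, centers):
    """Step 2: return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]


def update_centers(points, labels, centers):
    """Step 3: move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(p[i] for p in members) / len(members) for i in range(len(old)))
        )
    return new_centers


def kmeans(points, centers, max_iterations=100):
    """Step 4: repeat assignment and update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        labels = assign_points(points, centers)
        new_centers = update_centers(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign_points(points, centers)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, labels = kmeans(points, [(0, 0), (10, 0)])
print(centers, labels)
```

With these four points and the two starting centers shown, the loop stops after the second pass because the centers no longer change.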
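The basic idea of K-Means++ is:
- Randomly select one data point as the first cluster center.
- For each data point, compute its distance to the nearest already-chosen cluster center and store the distances in a list.
- Treat the distance list as a probability distribution (the standard formulation weights each point by its squared distance) and sample a new cluster center from it, so points far from the existing centers are more likely to be picked.
- Repeat steps 2 and 3 until K cluster centers have been selected.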
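As a compact illustration of the seeding steps, here is a minimal sketch that weights each point by its squared distance to the nearest chosen center. The function name `kmeans_pp_init` and its `rng` parameter are assumptions made for this example; falling back to a uniform choice when every weight is zero is also an assumption.

```python
import math
import random


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from `points` using squared-distance weighting."""
    rng = rng or random.Random()
    # Step 1: the first center is a uniformly random data point.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Step 2: squared distance from each point to its nearest chosen center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(weights) == 0:
            # Assumption: if every point coincides with a chosen center,
            # fall back to a uniform draw.
            centers.append(rng.choice(points))
        else:
            # Step 3: sample the next center with probability proportional
            # to its weight.
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


print(kmeans_pp_init([(0, 0), (0, 0), (5, 5)], 2))
```

A point that is already a center has weight zero, so it cannot be drawn again while any other point has a positive weight.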
Below is an example implementation in Python:
import math
import random


class KMeansPlusPlus:
    """K-Means clustering with K-Means++ initialization."""

    def __init__(self, data_points, K, max_iterations):
        self.data_points = data_points
        self.K = K
        self.max_iterations = max_iterations
        self.random = random.Random()

    def cluster(self):
        # Initialize centroids list
        centroids = []

        # Randomly select first centroid. randrange excludes the upper bound,
        # so the index is always valid.
        first = self.data_points[self.random.randrange(len(self.data_points))]
        centroids.append(Centroid(list(first.coordinates)))

        # Select remaining centroids
        for _ in range(1, self.K):
            # Squared distance of each data point to its nearest centroid
            # (standard K-Means++ weights candidates by squared distance)
            distances = [dp.distance_to_nearest_centroid(centroids) ** 2
                         for dp in self.data_points]
            sum_distances = sum(distances)

            # Sample a point with probability proportional to its weight by
            # walking the cumulative sum until it reaches r
            r = self.random.random() * sum_distances
            cumulative = 0.0
            chosen = None
            for dp, d in zip(self.data_points, distances):
                if d == 0:
                    continue  # already a centroid, never selected
                cumulative += d
                chosen = dp
                if r <= cumulative:
                    break
            if chosen is None:
                # Every point coincides with a centroid; pick uniformly
                chosen = self.random.choice(self.data_points)
            centroids.append(Centroid(list(chosen.coordinates)))

        # Run K-Means algorithm
        kmeans = KMeans(self.data_points, centroids, self.max_iterations)
        return kmeans.cluster()


class DataPoint:
    """A point to be clustered, with a reference to its assigned centroid."""

    def __init__(self, coordinates):
        self.coordinates = coordinates
        self.centroid = None  # assigned by KMeans.cluster()

    def distance_to_nearest_centroid(self, centroids):
        # Smallest Euclidean distance from this point to any centroid
        return min(
            (self.euclidean_distance(c.coordinates) for c in centroids),
            default=float("inf"),
        )

    def euclidean_distance(self, coordinates):
        # Accumulate the squared difference over every dimension
        sum_squared_distance = 0.0
        for own, other in zip(self.coordinates, coordinates):
            sum_squared_distance += (own - other) ** 2
        return math.sqrt(sum_squared_distance)


class Centroid:
    """A cluster center together with the points currently assigned to it."""

    def __init__(self, coordinates):
        self.coordinates = coordinates
        self.data_points = []

    def update_coordinates(self):
        num_data_points = len(self.data_points)
        if num_data_points == 0:
            # An empty cluster has no mean; keep the current position
            return
        new_coordinates = [0.0] * len(self.coordinates)
        # Sum the coordinates of all assigned points
        for data_point in self.data_points:
            for i in range(len(new_coordinates)):
                new_coordinates[i] += data_point.coordinates[i]
        # Divide by the point count to obtain the mean
        for i in range(len(new_coordinates)):
            new_coordinates[i] /= num_data_points
        self.coordinates = new_coordinates


class KMeans:
    """Standard K-Means iterations starting from the given centroids."""

    def __init__(self, data_points, centroids, max_iterations):
        self.data_points = data_points
        self.centroids = centroids
        self.max_iterations = max_iterations

    def cluster(self):
        for _ in range(self.max_iterations):
            # Clear data points belonging to each centroid
            for centroid in self.centroids:
                centroid.data_points.clear()

            # Assign each data point to nearest centroid
            for data_point in self.data_points:
                nearest_centroid = min(
                    self.centroids,
                    key=lambda c: data_point.euclidean_distance(c.coordinates),
                )
                nearest_centroid.data_points.append(data_point)
                data_point.centroid = nearest_centroid

            # Update centroid coordinates, remembering the old positions
            previous = [list(c.coordinates) for c in self.centroids]
            for centroid in self.centroids:
                centroid.update_coordinates()

            # Stop early once no centroid has moved
            if all(list(c.coordinates) == old
                   for c, old in zip(self.centroids, previous)):
                break

        return self.centroids
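The listing creates its generator with `random.Random()` and no seed, so each run may pick different initial centers. If repeatable runs are wanted, one option is to pass a fixed seed. The sketch below only demonstrates that two generators built from the same seed produce the same draws; the seed value 42 is arbitrary.

```python
import random

# Two generators created with the same seed yield identical sequences,
# so any index selection driven by them is repeatable across runs.
rng_a = random.Random(42)
rng_b = random.Random(42)

draws_a = [rng_a.randrange(100) for _ in range(5)]
draws_b = [rng_b.randrange(100) for _ in range(5)]

print(draws_a == draws_b)  # True
```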
This article is reposted from: 学新通技术网