Airbnb Python: Using Data Science to Plan Your Next Boston Airbnb Trip

2023-11-30 00:40


Introduction

Boston is the capital and most populous city of the State of Massachusetts in the United States. Its economy, culture, history, and education attract hundreds of thousands of tourists each year. I had long been eager to visit this beautiful city and finally made a plan for this March. However, an unexpected global pandemic locked me down in NYC and delayed my plan. While staying home, I have been planning my next trip to Boston Airbnb with data science techniques. I think that infusing data science into a trip plan makes it more scientific and more interesting. If you are interested, you can treat this blog as a fun and possibly insightful guide for your next trip to Boston.

The open dataset I use comes from here and was compiled on 10 June 2020. The original dataset consists of 3440 listings with 16 features across 25 Boston neighborhoods. In this post, I provide data visualization and machine learning solutions for three main questions that you may care about:

  1. Location: Which regions give you more choices, or where are you more likely to stay, in Boston Airbnb?
  2. Room Type: What types of rooms are most popular to stay in?
  3. Price: Which features are important in influencing price? Can you predict the price of a Boston Airbnb?

Preliminary Data Visualization

First, I review the pairwise relationships using seaborn, which gives general information and patterns among 9 useful numerical features.

Figure 1.

Some insightful points (see Figure 1):

  1. Latitude: From south (42.25) to north (42.40), the number of Boston Airbnb listings increases.
  2. Longitude: From west (-71.15) to east (-71.00), the number of Boston Airbnb listings increases.
  3. The number of reviews and reviews per month are positively correlated.

Then I applied a Spearman correlation heatmap (Figure 2) to review the correlations among the 9 features.
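As a minimal sketch of what a Spearman correlation measures, the toy table below (synthetic values, assumed column names) contains one column that is a monotone but non-linear function of another. Spearman works on ranks, so it reports a perfect correlation there, while Pearson does not.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
reviews = rng.poisson(20, n)

toy = pd.DataFrame({
    "number_of_reviews": reviews,
    # Strictly increasing in number_of_reviews, but not linear.
    "reviews_per_month": np.sqrt(reviews + 1.0),
    "minimum_nights": rng.integers(1, 30, n),
})

# Rank-based correlation matrix; this is the table a heatmap would colour.
spearman = toy.corr(method="spearman")
pearson = toy.corr(method="pearson")

# To draw it: seaborn.heatmap(spearman, annot=True)
```

The matrix is symmetric with ones on the diagonal, which is why heatmaps such as Figure 2 mirror across the diagonal.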

Figure 2. Spearman Correlation Heatmap with 9 features

Some insightful points:

  1. Latitude is positively correlated with price (r=0.31) and longitude (r=0.30).
  2. The number of reviews and reviews per month are positively correlated (r=0.44).
  3. Availability 365 and calculated host listings count are positively correlated (r=0.25).

Furthermore, for analytical purposes, I also deal with outliers by removing the rows where the price is above $500, create dummy variables for the room type feature, and exclude the minor room types (Figure 3): Hotel room and Shared room.
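These cleaning steps might look roughly like the following in pandas. The six rows are made up and the column names `price` and `room_type` are assumptions; the sketch filters first and encodes afterwards, which gives the same result as encoding first.

```python
import pandas as pd

# Made-up rows; only the structure mirrors the listings table.
raw = pd.DataFrame({
    "price": [80, 150, 650, 120, 40, 300],
    "room_type": ["Private room", "Entire home/apt", "Entire home/apt",
                  "Hotel room", "Shared room", "Entire home/apt"],
})

# 1) Drop price outliers above $500.
kept = raw[raw["price"] <= 500]

# 2) Keep only the two major room types.
kept = kept[kept["room_type"].isin(["Entire home/apt", "Private room"])]

# 3) Turn room_type into 0/1 dummy columns.
dummies = pd.get_dummies(kept["room_type"], prefix="room_type", dtype=int)
clean = pd.concat([kept.drop(columns="room_type"), dummies], axis=1)
```

Each remaining row has exactly one of the two dummy columns set to 1, which is why the two room types show mirror-image correlations with price.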

Figure 3.

Therefore, the new Spearman correlation heatmap with 11 features (Figure 4) should be more accurate.

Figure 4. Spearman Correlation Heatmap with 11 features

Lastly, the more important points:

  1. Price and room type: Entire home/apt is positively correlated with price (r=0.67), and Private room is negatively correlated with price (r=-0.66). The average price for an Entire home/apt is higher than the average price for a Private room (shown later).
  2. Latitude is positively correlated with price. As latitude increases from south to north, Airbnb prices tend to increase.
  3. The number of reviews and reviews per month are positively correlated.
  4. The number of reviews and reviews per month are negatively correlated with minimum nights (which are required by hosts).

Location: Which regions give you more choices, or where are you more likely to stay, in Boston Airbnb?

Figure 5 shows the number of Airbnb listings across 25 different neighborhoods in Boston.

The top 5 neighborhoods with the most Airbnb listings are Dorchester, Downtown, Jamaica Plain, Roxbury, and Back Bay.

Figure 5. Number of Airbnb Listings Across Different Neighborhoods in Boston

Figure 6 shows the proportion of Airbnb listings across neighborhoods in Boston.

Remarkably, Dorchester has a clearly higher proportion of Airbnb listings than the other neighborhoods, at 12%.
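Counts and proportions per neighborhood, as plotted in Figures 5 and 6, can be obtained with `value_counts`. The series below is a tiny made-up sample and the column name `neighbourhood` is an assumption, so the numbers do not match the real data.

```python
import pandas as pd

# Tiny made-up sample of the neighbourhood column.
neighbourhoods = pd.Series(
    ["Dorchester"] * 3 + ["Downtown"] * 2 + ["Back Bay"] * 1,
    name="neighbourhood",
)

# Listings per neighbourhood, sorted from most to fewest.
counts = neighbourhoods.value_counts()

# Same ranking expressed as a share of all listings.
shares = neighbourhoods.value_counts(normalize=True)

# Names of the top-ranked neighbourhoods (top 2 here, top 5 in the post).
top = counts.head(2).index.tolist()
```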

Figure 6.

Figure 7, a density plot, shows the distribution of Airbnb listings across Boston. The brightest area has the highest number of listings. You can also review the actual map of Boston Airbnb in Figure 8.

I find that Boston Airbnb listings are most densely concentrated between longitude -71.08 (west) and -71.06 (east) and between latitude 42.34 (south) and 42.36 (north).
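One simple way to locate the densest area numerically is a 2D histogram over longitude and latitude. The points below are synthetic (a cluster placed by assumption inside the range quoted above, plus uniform background), and the 15 x 15 grid is an arbitrary choice; a kernel density plot like Figure 7 is a smoother version of the same idea.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic coordinates: one tight cluster plus scattered background points.
cluster = rng.normal(loc=[-71.065, 42.355], scale=0.005, size=(200, 2))
background = np.column_stack([
    rng.uniform(-71.15, -71.00, 50),
    rng.uniform(42.25, 42.40, 50),
])
points = np.vstack([cluster, background])  # columns: longitude, latitude

# Count listings per grid cell (each cell is 0.01 degrees wide here).
counts, lon_edges, lat_edges = np.histogram2d(
    points[:, 0], points[:, 1],
    bins=15, range=[[-71.15, -71.00], [42.25, 42.40]],
)

# The cell with the most listings and its longitude/latitude bounds.
ix, iy = np.unravel_index(counts.argmax(), counts.shape)
densest_lon = (lon_edges[ix], lon_edges[ix + 1])
densest_lat = (lat_edges[iy], lat_edges[iy + 1])
```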

Figure 7.

Figure 8. Inside Airbnb

Look at them closely in Figure 9, a scatterplot of Airbnb listings in 25 different neighborhoods across Boston. Longitude and latitude are represented on the x-axis and y-axis, respectively.

Figure 9.

If you want to know which neighborhoods give you a higher chance of finding your Airbnb, the top 5 neighborhoods and their locations, shown in Figure 10, provide useful information for your plan.

Figure 10.

Room Type: What types of rooms are most popular to stay in?

Figure 11 shows that Entire home/apt and Private room are the most available room types by number of listings.

Figure 11.

If you consider the minimum nights to stay, Figure 12 shows that minimum nights (required by hosts) of 91, 1, and 2 probably give travelers more choices.

Figure 12.

If you look at Figure 13 and Figure 14, you will find about 576 Airbnb listings that are not available on any of the 365 days (they are either very popular or permanently closed).

Interestingly, there are also about 452 Airbnb listings available on all 365 days.
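Counting the two extremes of availability is a pair of boolean filters. The values below are invented and the column name `availability_365` is an assumption.

```python
import pandas as pd

# Invented availability values, one per listing (days bookable per year).
availability = pd.Series([0, 0, 365, 120, 365, 0, 48], name="availability_365")

# Listings with no bookable day in the year.
never_available = int((availability == 0).sum())

# Listings bookable on every day of the year.
always_available = int((availability == 365).sum())
```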

Figure 13.

Figure 14.

Next, let's review some statistics about the different room types.

Figure 15 shows that travelers tend to stay longer in Shared rooms than in Private rooms and Entire homes/apts.

Figure 15.

Figure 16 and Figure 17 both confirm that Private room and Entire home/apt have a higher average number of reviews than Shared rooms and Hotel rooms.

This may indicate that Entire home/apt and Private room are the popular room types to choose from.

Figure 16.

Figure 17.

Lastly, Figure 18 shows the average days of availability in a year by room type.

The Shared room has a much lower availability of 131.94 days. But it only has 16 listings, so the data would be biased and should not be treated as decisive evidence of popularity.

Compared to Hotel room, Private room and Entire home/apt have fewer days of availability in a year. In particular, Entire home/apt has about 30 fewer days of availability than Hotel room. Therefore, we can probably assume that Entire home/apt is more popular.
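Per-room-type averages like those in Figures 15 to 18 can be computed with a single `groupby`. The table below is invented and the column names are assumptions; including the listing count next to each mean makes it easy to spot groups, such as Shared room, that are too small to trust.

```python
import pandas as pd

# Invented rows; only the column layout mirrors the listings table.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Hotel room", "Shared room"],
    "availability_365": [100, 140, 150, 170, 160, 130],
    "number_of_reviews": [40, 60, 30, 50, 10, 5],
    "minimum_nights": [2, 4, 1, 3, 1, 30],
})

# One row per room type: group size plus the mean of each statistic.
summary = toy.groupby("room_type").agg(
    listings=("availability_365", "size"),
    mean_availability=("availability_365", "mean"),
    mean_reviews=("number_of_reviews", "mean"),
    mean_min_nights=("minimum_nights", "mean"),
)
```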

Figure 18.

Price: Which features are important in influencing price? Can you predict the price of a Boston Airbnb?

After dealing with outliers, dummy variables, and missing values, I use 3357 observations and 11 variables to build three models: Linear Regression, Lasso Regression, and Random Forests. The response variable is the price.
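A minimal sketch of fitting and scoring the three model types with scikit-learn follows. The data is synthetic, the feature names are assumptions, and the 80/20 split and forest size are arbitrary choices rather than the settings used in this post; only `alpha=0.1` for Lasso is taken from the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and a price that depends on room type and latitude.
rng = np.random.default_rng(3)
n = 600
X = pd.DataFrame({
    "latitude": rng.uniform(42.25, 42.40, n),
    "longitude": rng.uniform(-71.15, -71.00, n),
    "entire_home": rng.integers(0, 2, n),
})
price = (80 + 100 * X["entire_home"] + 300 * (X["latitude"] - 42.25)
         + rng.normal(0, 20, n))

# Hold out 20% of the rows for evaluation (an assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.1),
    "Random Forests": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record R-squared and MAE on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
```

On this linear toy data the linear models are expected to do well; which model wins on the real listings depends on the data.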

Figure 19.

Meanwhile, I find that the average price of a Private room is about $81.22. The average price of an Entire home/apt is much higher, at about $189.38.

The actual average price for Boston Airbnb in the test dataset is about $147.85, and the predicted average price is about $149.53. These predictions come from the Random Forests model. You can also check the distributions of actual vs. predicted prices for Boston Airbnb in Figure 20.

Figure 20. Actual Prices vs. Predicted Prices

Model Performances

Figure 21.

The R-squared (R²) of 0.549 indicates that the Random Forests model explains the variability of the response data best among the three models. The Mean Absolute Error (MAE) of 40.423 indicates that the Random Forests model has a lower absolute difference between predictions and actual observations than the other two models, which means it has lower prediction errors. Random Forests is therefore the best of the three models.
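For reference, the standard definitions of the two metrics are given below, where the symbols are introduced here for illustration: y_i is the actual price of listing i in the test set, ŷ_i is the predicted price, ȳ is the mean actual price, and n is the number of test listings.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```

Read this way, an MAE of 40.423 means the predictions are off by about $40 on average, and an R² of 0.549 means the model accounts for a little over half of the variance in price.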

Feature Explanation: Coefficients of Regression Model, Tree-built Feature Importance Method, Shapley Value

Figure 22.

Figure 22 shows the coefficients in Lasso Regression (alpha=0.1). As the variables latitude, room type Entire home/apt, and longitude increase, the response variable price increases. Conversely, an increase in room type Private room leads to a decrease in the price. The coefficients also agree with my expectation that location and room type are important influences on price.

Figure 23.

Figure 23 shows the feature importance ranking plot from the Random Forests model. Room type Entire home/apt, latitude, and longitude are still the most important features for predicting the price. Interestingly, calculated host listings count and host id are bigger influencers in Random Forests.
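One possible reason a feature can rank high in a forest while mattering little in a linear model is a non-linear effect. The sketch below builds synthetic data where a hypothetical centred feature, `host_listings_centred`, affects price in a U shape; that shape is an assumption chosen for illustration, not a finding about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n = 400
X = pd.DataFrame({
    # Alternating 0/1 dummy for the room type.
    "entire_home": np.tile([0, 1], n // 2),
    # Hypothetical centred feature with a U-shaped effect on price.
    "host_listings_centred": np.linspace(-1.0, 1.0, n),
})
price = (100 * X["entire_home"]
         + 150 * X["host_listings_centred"] ** 2
         + rng.normal(0, 5, n))

lasso = Lasso(alpha=0.1).fit(X, price)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, price)

# Linear view: a signed coefficient per feature.
coefficients = pd.Series(lasso.coef_, index=X.columns)

# Tree view: non-negative importances that sum to 1.
importances = pd.Series(forest.feature_importances_, index=X.columns)
```

Here the Lasso coefficient of the U-shaped feature stays close to zero because its increasing and decreasing halves cancel out in a straight-line fit, while the forest still assigns it a sizeable importance.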

Figure 24.

I also use Shapley values to analyze and explain the Random Forests predictions of Airbnb prices.

As Figure 24 indicates, the Shapley value plot can further show the positive and negative relationships of the predictors with the target variable price [1].

  • Feature importance: Variables are ranked in descending order.
  • Impact: The horizontal location shows whether the effect of that value is associated with a higher or lower prediction.
  • Original value: Color shows whether that variable is high (in red) or low (in blue) for that observation.
  • Correlation: A high value of "Room type Entire home/apt" has a high and positive impact on the price. The "high" comes from the red color, and the "positive" impact is shown on the x-axis. Similarly, "minimum nights" is negatively correlated with the target variable price.
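To make the idea behind the plot concrete, the sketch below computes exact Shapley values by brute force for a tiny hand-written price function. Everything in it is hypothetical: the function `predict`, its coefficients, the listing, and the baseline. Features that are "absent" from a coalition are replaced by their baseline value, which is a simplification, and this is not the algorithm the SHAP library uses for tree models.

```python
from itertools import combinations
from math import factorial


def predict(row):
    """Hypothetical price model with an interaction between two features."""
    entire_home, latitude, min_nights = row
    north = latitude - 42.30
    return (80 + 100 * entire_home + 200 * north
            + 300 * entire_home * north - 0.5 * min_nights)


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in the subset take the listing's value, others the baseline.
        mixed = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


listing = (1, 42.36, 2)     # entire home, further north, 2 minimum nights
baseline = (0, 42.30, 30)   # private room, further south, 30 minimum nights
phi = shapley_values(predict, listing, baseline)
```

The three values add up to the gap between the listing's prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. The cost grows exponentially with the number of features, which is why practical tools rely on faster model-specific or approximate methods.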

Figure 25.

Figure 25 is a simpler version of the Shapley value plot, showing the average impact of each variable on the model's output price in descending order, ignoring whether the effect on the price is positive or negative. Shapley values can also be used to explain more complex models such as deep learning. Next time, you can use this algorithm to explain your black-box deep learning models to your audience.

Conclusion:

Data science can not only help make business decisions but also make life more interesting and scientific. Based on the data science for Boston Airbnb so far, I will apply this guidance to my next trip to Boston:

  1. Location: Which regions give you more choices, or where are you more likely to stay, in Boston Airbnb?

The top 5 neighborhoods to choose from are Dorchester, Downtown, Jamaica Plain, Roxbury, and Back Bay. Geographically speaking, you would want to look between longitude -71.08 (west) and -71.06 (east) and between latitude 42.25 (south) and 42.40 (north) in Boston.

  2. Room Type: What types of rooms are most popular to stay in?

Generally speaking, you have a higher chance of finding your Airbnb among the room types Entire home/apt and Private room. Comparing the two, Entire home/apt has higher numbers in terms of listings and average reviews per month. Private room has higher numbers in terms of average minimum nights, average number of reviews, and average days of availability in 365 days.

  3. Price: Which features are important in influencing price? Can you predict the price of a Boston Airbnb?

Lasso Regression and Random Forests both agree that location (longitude and latitude) and room type (Entire home/apt and Private room) are important for predicting the prices of Boston Airbnb.

Interestingly, the feature importance function in Random Forests and the Shapley values both suggest that calculated host listings count is important, while its coefficient in the Lasso Regression is zero.

If you care about the price, you might choose a Private room for your next trip. Otherwise, keep in mind that from southwest to northeast, the price of Boston Airbnb tends to increase.

This is my first post on Medium. I hope it helps! I welcome feedback and constructive criticism. You can contact me on LinkedIn: https://www.linkedin.com/in/lanxiao12.

Before you go, the code can be found on my GitHub here. Happy coding, happy life!

Special thanks to Menoua Keshishian.

Reference:

[1] Dr. Dataman, Explain Your Model with the SHAP Values (2019), Towards Data Science

Translated from: https://towardsdatascience.com/using-data-science-to-make-your-next-trip-on-boston-airbnb-952030cad433
