Analyzing Where New Airbnb Users Book

2023-10-27 18:59

This article walks through an exploratory analysis of the Airbnb New User Bookings data, where the task is to predict where a new user will make their first booking. It is intended as a practical reference for developers working through the same problem.

First, import the packages.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
import datetime
import os
import seaborn as sns

-----------------------------------------------------------------------------------

Next, load the data.

train = pd.read_csv('/Users/qinpeng/Documents/airbnb/train_users_2.csv',sep=',')
test = pd.read_csv('/Users/qinpeng/Documents/airbnb/test_users.csv',sep=',')

-----------------------------------------------------------------------------------

Inspect the data.

train.head()
train.shape
test.shape
print(train.shape)
print(test.shape)
train.info()
test.info()
(213451, 16)
(62096, 15)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 213451 entries, 0 to 213450
Data columns (total 16 columns):
id                         213451 non-null object
date_account_created       213451 non-null object
timestamp_first_active     213451 non-null int64
date_first_booking         88908 non-null object
gender                     213451 non-null object
age                        125461 non-null float64
signup_method              213451 non-null object
signup_flow                213451 non-null int64
language                   213451 non-null object
affiliate_channel          213451 non-null object
affiliate_provider         213451 non-null object
first_affiliate_tracked    207386 non-null object
signup_app                 213451 non-null object
first_device_type          213451 non-null object
first_browser              213451 non-null object
country_destination        213451 non-null object
dtypes: float64(1), int64(2), object(13)
memory usage: 26.1+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62096 entries, 0 to 62095
Data columns (total 15 columns):
id                         62096 non-null object
date_account_created       62096 non-null object
timestamp_first_active     62096 non-null int64
date_first_booking         0 non-null float64
gender                     62096 non-null object
age                        33220 non-null float64
signup_method              62096 non-null object
signup_flow                62096 non-null int64
language                   62096 non-null object
affiliate_channel          62096 non-null object
affiliate_provider         62096 non-null object
first_affiliate_tracked    62076 non-null object
signup_app                 62096 non-null object
first_device_type          62096 non-null object
first_browser              62096 non-null object
dtypes: float64(2), int64(2), object(11)
memory usage: 7.1+ MB

So train has 213,451 observations and 16 variables, while test has 62,096 observations and 15 variables.

We can also see how many values each column is missing; note in particular:

date_first_booking         0 non-null float64

Why is this column entirely empty? Because the booking destination is exactly what we have to predict, date_first_booking is blank for every user in the test set.
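A quick check confirms this (a minimal sketch, using the columns from the info() output above):

print(test.date_first_booking.isnull().all())   # True: no test user has a first-booking date
print(train.date_first_booking.isnull().sum())  # 124543 train users never booked either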

-----------------------------------------------------------------------------------

train_features = train.columns
test_features = test.columns
np.setdiff1d(train_features, test_features)  # the column train has that test lacks: country_destination
dac_train = train.date_account_created.value_counts()
dac_test = test.date_account_created.value_counts()
print('training dataset:\n')
print(dac_train.describe())
print('\n' + '***' * 15 + '\n')
print('test dataset:\n')
print(dac_test.describe())
print('training dataset:')
print(dac_train.head())
print(dac_train.tail())
training dataset:

count    1634.000000
mean      130.630967
std       139.327895
min         1.000000
25%        15.000000
50%        79.000000
75%       201.000000
max       674.000000
Name: date_account_created, dtype: float64

*********************************************

test dataset:

count      92.000000
mean      674.956522
std       122.568116
min       401.000000
25%       606.750000
50%       662.000000
75%       739.000000
max      1105.000000
Name: date_account_created, dtype: float64

training dataset:
2014-05-13    674
2014-06-24    670
2014-06-25    636
2014-05-20    632
2014-05-14    622
Name: date_account_created, dtype: int64
2010-03-09    1
2010-04-24    1
2010-01-23    1
2010-02-14    1
2010-04-11    1
Name: date_account_created, dtype: int64

The output shows that accounts in train were created on 1,634 distinct days, versus 92 distinct days in test; in train the daily count ranges from a minimum of 1 to a maximum of 674.

The busiest day in train was 2014-05-13, with 674 accounts created.

-----------------------------------------------------------------------------------

The train set covers 2010-01-01 through 2014-06-30.

The test set covers 2014-07-01 through 2014-09-30.

dac_train_date = pd.to_datetime(dac_train.index)
print('the start date of training dataset is :{}'.format(dac_train_date.min()))
print('the end date of training dataset is :{}'.format(dac_train_date.max()))
dac_test_date = pd.to_datetime(dac_test.index)
print('the start date of test dataset is :{}'.format(dac_test_date.min()))
print('the end date of test dataset is :{}'.format(dac_test_date.max()))
the start date of training dataset is :2010-01-01 00:00:00
the end date of training dataset is :2014-06-30 00:00:00
the start date of test dataset is :2014-07-01 00:00:00
the end date of test dataset is :2014-09-30 00:00:00

-----------------------------------------------------------------------------------

dac_train_day = dac_train_date - dac_train_date.min()
dac_test_day = dac_test_date - dac_test_date.min()
print(dac_train_day)
print(dac_test_day)
TimedeltaIndex(['1593 days', '1635 days', '1636 days', '1600 days', '1594 days',
                '1614 days', '1601 days', '1627 days', '1622 days', '1641 days',
                ...
                '18 days', '27 days', '2 days', '0 days', '1 days',
                '67 days', '113 days', '22 days', '44 days', '100 days'],
               dtype='timedelta64[ns]', length=1634, freq=None)
TimedeltaIndex(['22 days', '21 days', '16 days', '23 days', '17 days',
                '20 days', '57 days', '56 days', '28 days', '36 days',
                '29 days', '42 days', '41 days', '58 days', '35 days',
                '27 days', '15 days', '24 days', '77 days', '30 days',
                '44 days', '49 days', '71 days', '37 days', '19 days',
                '38 days', '43 days', '64 days', '34 days', '55 days',
                '85 days', '18 days', '26 days', '84 days', '91 days',
                '31 days', '63 days', '90 days', '51 days', '25 days',
                '69 days', '65 days', '39 days', '48 days', '52 days',
                '7 days', '46 days', '14 days', '50 days', '70 days',
                '83 days', '79 days', '45 days', '60 days', '1 days',
                '76 days', '72 days', '33 days', '13 days', '2 days',
                '6 days', '59 days', '61 days', '66 days', '78 days',
                '53 days', '8 days', '47 days', '9 days', '54 days',
                '80 days', '40 days', '73 days', '0 days', '74 days',
                '62 days', '32 days', '68 days', '88 days', '67 days',
                '82 days', '86 days', '75 days', '81 days', '87 days',
                '10 days', '89 days', '5 days', '11 days', '3 days',
                '4 days', '12 days'],
               dtype='timedelta64[ns]', freq=None)
These are the offsets, in days, of each account-creation date from the earliest date in the respective dataset.

-----------------------------------------------------------------------------------

plt.scatter(dac_train_day.days,dac_train.values,color = 'r',label= 'train dataset')
plt.scatter(dac_test_day.days,dac_test.values,color = 'b',label = 'test dataset')
plt.title('Accounts created vs day')
plt.xlabel('Days')
plt.ylabel('Accounts created')
plt.legend(loc = 'upper left')
[Figure: accounts created per day, train (red) vs. test (blue)]
The number of accounts created keeps climbing over time.

-----------------------------------------------------------------------------------

tra_train_df = train.timestamp_first_active.astype(str).apply(lambda x:datetime.datetime(int(x[:4]),int(x[4:6]),int(x[6:8]),int(x[8:10]),int(x[10:12]),int(x[12:])))
tra_test_df = test.timestamp_first_active.astype(str).apply(lambda x:datetime.datetime(int(x[:4]),int(x[4:6]),int(x[6:8]),int(x[8:10]),int(x[10:12]),int(x[12:])))
This converts a raw timestamp such as 20090319043255 into the standard format 2009-03-19 04:32:55.
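pandas can also parse this fixed-width format directly (an alternative sketch, not the post's code):

tra_train_df = pd.to_datetime(train.timestamp_first_active.astype(str), format='%Y%m%d%H%M%S')
tra_test_df = pd.to_datetime(test.timestamp_first_active.astype(str), format='%Y%m%d%H%M%S')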

-----------------------------------------------------------------------------------

print(train[train.age<15].age.shape)
print(train[train.age>80].age.shape)
print(test[test.age<15].age.shape)
print(test[test.age>80].age.shape)
(57,)
(2771,)
(2,)
(417,)

So train has 57 users younger than 15 and 2,771 older than 80, while test has 2 and 417 respectively; these look like outliers.

-----------------------------------------------------------------------------------

plt.scatter(train.age.value_counts().index.values,train.age.value_counts().values,color='r',label='training')
plt.scatter(test.age.value_counts().index.values,test.age.value_counts().values,color='b',label='test')
plt.title('Counts at different ages')
plt.xlabel('Age')
plt.ylabel('Counts of id')
plt.legend(loc = 'upper right',fontsize = 15)
[Figure: count of users at each age, train (red) vs. test (blue)]
Most ages fall between 0 and a bit over 100, but a small cluster sits around 2000: users who apparently typed a birth year into the age field. The oldest plausible user looks to be about 117.
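To look at those values directly (a quick sketch):

print(train[train.age > 1000].age.describe())  # clustered near 2000: birth years entered as ages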
-----------------------------------------------------------------------------------

age_train = [train[train.age.isnull()].age.shape[0],
             train.query('age<15').age.shape[0],
             train.query('age>=15 & age<=80').age.shape[0],
             train.query('age>80').age.shape[0]]
age_test = [test[test.age.isnull()].age.shape[0],
            test.query('age<15').age.shape[0],
            test.query('age>=15 & age<=80').age.shape[0],
            test.query('age>80').age.shape[0]]
columns = ['Null','age<15','age','age>80']
fig, (ax1,ax2) = plt.subplots(1,2,sharex=True,sharey=True,figsize=(10,5))
sns.barplot(columns,age_train,ax=ax1)
sns.barplot(columns,age_test,ax=ax2)
ax1.set_title('training dataset')
ax2.set_title('test dataset')
ax1.set_ylabel('counts')
[Figure: counts per age bucket for train and test]

The bar charts split age into buckets: nulls in one, under-15 in another, over-80 in a third, and the normal range in between; in the original plots, green marked the normal range, blue the nulls, and red the "thousand-year-old" outliers.

-----------------------------------------------------------------------------------

ohe_feats = ['gender','signup_method','signup_flow','language','affiliate_channel','affiliate_provider','first_affiliate_tracked','signup_app','first_device_type','first_browser']
def feature_barplot(feature, df_train = train, df_test = test, figsize=(10,5), rot = 90, saveimg = False):
    feat_train = df_train[feature].value_counts()
    feat_test = df_test[feature].value_counts()
    fig_feature, (axis1, axis2) = plt.subplots(1, 2, sharex=True, sharey=True, figsize=figsize)
    sns.barplot(feat_train.index.values, feat_train.values, ax=axis1)
    sns.barplot(feat_test.index.values, feat_test.values, ax=axis2)
    axis1.set_xticklabels(axis1.xaxis.get_majorticklabels(), rotation=rot)
    axis2.set_xticklabels(axis2.xaxis.get_majorticklabels(), rotation=rot)
    axis1.set_title(feature + ' of training dataset')
    axis2.set_title(feature + ' of test dataset')
    axis1.set_ylabel('Counts')
    plt.tight_layout()
    if saveimg:
        figname = feature + '.png'
        fig_feature.savefig(figname, dpi=75)
train.gender.value_counts()

Gender breakdown:

-unknown-    95688
FEMALE       63041
MALE         54440
OTHER          282
Name: gender, dtype: int64
In train there are over 95k unknowns, about 63k women, and about 54k men.
test.gender.value_counts()

-unknown-    33792
FEMALE       14483
MALE         13769
OTHER           52
Name: gender, dtype: int64

In test there are over 33k unknowns, about 13.8k men, and about 14.5k women.

feature_barplot('gender',saveimg= True)

-----------------------------------------------------------------------------------

feature_barplot('signup_method')

Signup method: basic is the most common, then facebook, with google last.

-----------------------------------------------------------------------------------

for feat in ohe_feats:
    feature_barplot(feature=feat)

Gender distribution.

Signup method distribution.

signup_flow: which page the user signed up from (0 is the home-page signup flow).

language: English comes first (Airbnb is a US company, after all), with Chinese second.

affiliate_channel: where the user was referred from.

affiliate_provider: the paid-marketing channel.

signup_app: browser (Web) signups are the most common.

first_device_type: Mac Desktop leads, perhaps because desktop Macs are common in the US.

first_browser: the browser used when visiting Airbnb.

-----------------------------------------------------------------------------------

The takeaway so far: the age outliers and the missing gender values both need handling.

Next, look at the sessions data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
import datetime
from datetime import date
import seaborn as sns
from sklearn.preprocessing import *
from sklearn.preprocessing import LabelEncoder  # maps string labels to integers in range(n_classes)
from sklearn.cross_validation import StratifiedShuffleSplit  # dataset splitting (moved to sklearn.model_selection in sklearn >= 0.20)
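The post never shows df_sessions being loaded; a plausible sketch, assuming the standard Kaggle sessions.csv sits next to the other files (the rename is an assumption: the raw file keys sessions by user_id, while the frame below uses id):

df_sessions = pd.read_csv('/Users/qinpeng/Documents/airbnb/sessions.csv')
df_sessions = df_sessions.rename(columns={'user_id': 'id'})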
df_sessions.head()

	action	action_type	action_detail	device_type	secs_elapsed	id
0	lookup	NaN	NaN	Windows Desktop	319.0	d1mm9tcy42
1	search_results	click	view_search_results	Windows Desktop	67753.0	d1mm9tcy42
2	lookup	NaN	NaN	Windows Desktop	301.0	d1mm9tcy42
3	search_results	click	view_search_results	Windows Desktop	22141.0	d1mm9tcy42
4	lookup	NaN	NaN	Windows Desktop	435.0	d1mm9tcy42

Note that ids repeat: a single user can generate many session rows.

df_sessions.shape
(10567737, 6)

How many users are there?

dgr_sess = df_sessions.groupby(['id'])
135,483 users produced the 10M+ rows.
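To confirm the count (a quick sketch):

print(len(dgr_sess))             # 135483 groups, one per distinct id
print(df_sessions.id.nunique())  # the same figure (null ids are excluded by both)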

Check how many nulls the sessions data contains.

df_sessions.isnull().sum()
action             79626
action_type      1126204
action_detail    1126204
device_type            0
secs_elapsed      136031
id                 34496
dtype: int64

Handle the missing values by filling them with an explicit 'NAN' string; the gaps show up variously as NaN, None, and so on, so a single fill value unifies them.

print('Working on Session data...')
df_sessions.action = df_sessions.action.fillna('NAN')
df_sessions.action_type = df_sessions.action_type.fillna('NAN')
df_sessions.action_detail = df_sessions.action_detail.fillna('NAN')
df_sessions.device_type = df_sessions.device_type.fillna('NAN')

-----------------------------------------------------------------------------------

act_freq = 100
act = dict(zip(df_sessions.action.value_counts().index,df_sessions.action.value_counts().values))
df_sessions.action = df_sessions.action.apply(lambda x: 'OTHER' if act[x] < act_freq else x)
f_act = df_sessions.action.value_counts().argsort()                # rank each action by frequency
f_act_detail = df_sessions.action_detail.value_counts().argsort()  # rank each action_detail by frequency
f_act_type = df_sessions.action_type.value_counts().argsort()      # rank each action_type by frequency
f_dev_type = df_sessions.device_type.value_counts().argsort()      # rank each device_type by frequency
print(f_act.shape)
print(f_act_detail.shape)
print(f_act_type.shape)
print(f_dev_type.shape)

Because many actions are rare, anything occurring fewer than 100 times is grouped into OTHER; categories that are too fine-grained would not generalize well in the model.
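The same grouping can also be written vectorized (an alternative sketch, not the post's code):

counts = df_sessions.action.value_counts()
df_sessions.action = df_sessions.action.where(df_sessions.action.map(counts) >= act_freq, 'OTHER')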

df_sessions.action.value_counts()

-----------------------------------------------------------------------------------

samples = []
cont = 0
n_users = len(dgr_sess)
for g in dgr_sess:
    if cont % 1000 == 0:
        print('%s from %s' % (cont, n_users))  # progress report
    gr = g[1]        # this user's session rows
    l = []
    l.append(g[0])   # the user id
    l.append(len(gr))  # number of session rows
    sev = gr.secs_elapsed.fillna(0).values

    # per-action counts, plus distinct-action count and mean/std of the per-action tallies
    c_act = [0] * len(f_act)
    for i, v in enumerate(gr.action.values):
        c_act[f_act[v]] += 1
    _, c_act_uqc = np.unique(gr.action.values, return_counts=True)
    c_act += [len(c_act_uqc), np.mean(c_act_uqc), np.std(c_act_uqc)]
    l = l + c_act

    # the same for action_detail
    c_act_detail = [0] * len(f_act_detail)
    for i, v in enumerate(gr.action_detail.values):
        c_act_detail[f_act_detail[v]] += 1
    _, c_act_det_uqc = np.unique(gr.action_detail.values, return_counts=True)
    c_act_detail += [len(c_act_det_uqc), np.mean(c_act_det_uqc), np.std(c_act_det_uqc)]
    l = l + c_act_detail

    # per-action_type counts, plus log of the elapsed seconds accumulated per type
    l_act_type = [0] * len(f_act_type)
    c_act_type = [0] * len(f_act_type)
    for i, v in enumerate(gr.action_type.values):
        l_act_type[f_act_type[v]] += sev[i]
        c_act_type[f_act_type[v]] += 1
    l_act_type = np.log(1 + np.array(l_act_type)).tolist()
    _, c_act_type_uqc = np.unique(gr.action_type.values, return_counts=True)
    c_act_type += [len(c_act_type_uqc), np.mean(c_act_type_uqc), np.std(c_act_type_uqc)]
    l = l + c_act_type + l_act_type

    # per-device_type counts
    c_dev_type = [0] * len(f_dev_type)
    for i, v in enumerate(gr.device_type.values):
        c_dev_type[f_dev_type[v]] += 1
    c_dev_type.append(len(np.unique(gr.device_type.values)))
    _, c_dev_type_uqc = np.unique(gr.device_type.values, return_counts=True)
    c_dev_type += [len(c_dev_type_uqc), np.mean(c_dev_type_uqc), np.std(c_dev_type_uqc)]
    l = l + c_dev_type

    # summary statistics of secs_elapsed, plus a histogram over its log bins
    l_secs = [0] * 5
    l_log = [0] * 15
    if len(sev) > 0:
        l_secs[0] = np.log(1 + np.sum(sev))
        l_secs[1] = np.log(1 + np.mean(sev))
        l_secs[2] = np.log(1 + np.std(sev))
        l_secs[3] = np.log(1 + np.median(sev))
        l_secs[4] = l_secs[0] / float(l[1])
        log_sev = np.log(1 + sev).astype(int)
        l_log = np.bincount(log_sev, minlength=15).tolist()
    l = l + l_secs + l_log

    samples.append(l)
    cont += 1
Since 135,483 users account for the 10M+ rows, the loop walks the groupby object one user at a time: f_act indexes every known action, c_act counts how often this user performed each one, and np.unique lists the distinct actions in the user's group so that the count, mean, and standard deviation of those per-action tallies can be appended. The same pattern repeats for action_detail, action_type (which also accumulates elapsed seconds), and device_type, followed by summary statistics of secs_elapsed.
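For the simpler per-user statistics the same idea can be sketched far more compactly with named aggregation (needs pandas >= 0.25; the loop above additionally builds the full per-category count vectors):

sess_stats = df_sessions.groupby('id').agg(
    n_rows=('action', 'size'),        # total session rows per user
    n_actions=('action', 'nunique'),  # distinct actions per user
    secs_total=('secs_elapsed', 'sum'),
    secs_mean=('secs_elapsed', 'mean'),
)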
samples = np.array(samples)
samp_ar = samples[:,1:].astype(np.float16)
samp_id = samples[:,0]
col_names = []
for i in range(len(samples[0]) - 1):
    col_names.append('c_' + str(i))
df_agg_sess = pd.DataFrame(samp_ar, columns=col_names)
df_agg_sess['id'] = samp_id
df_agg_sess.index = df_agg_sess.id

df_agg_sess.shape
(135483, 335)

The aggregated session features now cover 135,483 users with 335 columns.

-----------------------------------------------------------------------------------

Reload the train and test datasets.

train = pd.read_csv('/Users/qinpeng/Documents/airbnb/train_users_2.csv')
test = pd.read_csv('/Users/qinpeng/Documents/airbnb/test_users.csv')
train_row = train.shape[0]
labels = train['country_destination'].values
id_test = test['id']
train.drop(['country_destination','date_first_booking'], axis = 1,inplace=True)
test.drop(['date_first_booking'], axis=1,inplace=True)

This drops three columns in total: country_destination and date_first_booking from train, and date_first_booking from test.

df = pd.concat([train,test],axis = 0,ignore_index=True)
df.shape

Concatenate train and test into one DataFrame.

-----------------------------------------------------------------------------------

tfa = df.timestamp_first_active.astype(str).apply(lambda x:datetime.datetime(int(x[:4]),int(x[4:6]),int(x[6:8]),int(x[8:10]),int(x[10:12]),int(x[12:])))
df['tfa_year'] = np.array([x.year for x in tfa])
df['tfa_month'] = np.array([x.month for x in tfa])
df['tfa_day'] = np.array([x.day for x in tfa])
df['tfa_wd'] = np.array([x.isoweekday() for x in tfa])  # isoweekday: day of the week, Monday=1 .. Sunday=7
df_tfa_wd = pd.get_dummies(df.tfa_wd,prefix='tfa_wd') # get_dummies is 'one hot encoding'
df = pd.concat((df,df_tfa_wd),axis = 1)
df.drop(['tfa_wd'],axis = 1,inplace=True)
df.head()

Parse the timestamp into a standard datetime, extract year, month, and day, derive the weekday, and one-hot encode it.

Y = 2000  # a reference year used to map dates onto northern-hemisphere seasons
seasons = [(0, (date(Y, 1, 1), date(Y, 3, 20))),    # winter
           (1, (date(Y, 3, 21), date(Y, 6, 20))),   # spring
           (2, (date(Y, 6, 21), date(Y, 9, 22))),   # summer
           (3, (date(Y, 9, 23), date(Y, 12, 20))),  # autumn
           (0, (date(Y, 12, 21), date(Y, 12, 31)))] # winter

This defines the date ranges of the four seasons.

def get_season(dt):
    dt = dt.date().replace(year=Y)  # project onto the reference year
    return next(season for season, (start, end) in seasons if start <= dt <= end)

df['tfa_season'] = np.array([get_season(x) for x in tfa])
df_tfa_season = pd.get_dummies(df.tfa_season, prefix = 'tfa_season')
df = pd.concat((df,df_tfa_season),axis = 1)
df.drop(['tfa_season'],axis = 1,inplace=True)
df.head()

The get_season helper (defined first, since the code above calls it) projects any date onto the reference year and returns its season; the season column is then one-hot encoded.
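A couple of spot checks of the helper (a quick sketch):

print(get_season(datetime.datetime(2014, 7, 4)))    # 2 (summer)
print(get_season(datetime.datetime(2010, 12, 25)))  # 0 (winter)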

dac = pd.to_datetime(df.date_account_created)
df['dac_year'] = np.array([x.year for x in dac])
df['dac_month'] = np.array([x.month for x in dac])
df['dac_day'] = np.array([x.day for x in dac])
df['dac_wd'] = np.array([x.isoweekday() for x in dac])
df_dac_wd = pd.get_dummies(df.dac_wd, prefix='dac_wd')
df = pd.concat((df,df_dac_wd),axis = 1)
df.drop(['dac_wd'],axis = 1,inplace=True)

Split date_account_created the same way, derive the weekday, and one-hot encode it.

df['dac_season'] = np.array([get_season(x) for x in dac])
df_dac_season = pd.get_dummies(df.dac_season, prefix='dac_season')
df = pd.concat((df,df_dac_season),axis = 1)
df.drop(['dac_season'],axis = 1,inplace=True)
df.head()

-----------------------------------------------------------------------------------

dt_span = dac.subtract(tfa).dt.days
plt.scatter(dt_span.value_counts().index.values,dt_span.value_counts().values)
[Figure: scatter of the day-gap distribution between account creation and first activity]

This shows the gap, in days, between account creation and first activation.

Next, write a helper that buckets the gap: exactly -1 day becomes OneDay, between -1 and 30 days becomes OneMonth, 30 to 365 days becomes OneYear, and everything else becomes Other.

Why -1? Possibly a server-side update pushed the recorded creation date to the following day, which would explain why -1 is the most frequent value.

This lumps the very infrequent values into a single category.

def get_span(dt):
    if dt == -1:
        return 'OneDay'
    elif (dt < 30) & (dt > -1):
        return 'OneMonth'
    elif (dt >= 30) & (dt <= 365):
        return 'OneYear'
    else:
        return 'Other'
df['dt_span'] = np.array([get_span(x) for x in dt_span])
df_dt_span = pd.get_dummies(df.dt_span,prefix='dt_span')
df = pd.concat((df,df_dt_span),axis = 1)
df.drop(['dt_span'],axis = 1,inplace=True)
df.head()
One-hot encode the OneDay / OneMonth / OneYear / Other buckets.

-----------------------------------------------------------------------------------
df.drop(['date_account_created','timestamp_first_active'],axis = 1, inplace=True)

Drop the two raw time columns: the account-creation date and the first-active timestamp.


Next, handle age.

av = df.age.values
av = np.where(np.logical_and(av < 2000, av > 1900), 2018 - av, av)  # values between 1900 and 2000 are birth years: convert them to ages
df['age'] = av
age = df.age
age.fillna(-1,inplace=True)
div = 15
def get_age(age):
    if age < 0:
        return 'NA'
    elif age < div:
        return div
    elif age <= div * 2:
        return div * 2
    elif age <= div * 3:
        return div * 3
    elif age <= div * 4:
        return div * 4
    elif age <= div * 5:
        return div * 5
    elif age <= 110:
        return div * 6
    else:
        return 'Unphysical'

This buckets age into bands. My reasoning for the 1900-2000 "ages": those users almost certainly entered a birth year, so subtracting from 2018 (this year) recovers their actual age, turning part of the outlier information into something useful.
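A few spot checks of the bucketing (given div = 15):

print(get_age(-1))   # 'NA'  (filled-in missing value)
print(get_age(28))   # 30    (the 15-30 bucket)
print(get_age(100))  # 90    (anything from 76 to 110 falls in the top bucket)
print(get_age(150))  # 'Unphysical'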

df['age'] = np.array([get_age(x) for x in age])
df_age = pd.get_dummies(df.age,prefix='age')
df_age.head()

One-hot encode the age buckets.

df = pd.concat((df,df_age),axis = 1)
df.drop(['age'],axis = 1,inplace=True)
df.head()

Then drop the original age column.

-----------------------------------------------------------------------------------

Process the remaining categorical columns.

feat_toOHE = ['gender','signup_method','signup_flow','language','affiliate_channel','affiliate_provider','first_affiliate_tracked','signup_app','first_device_type','first_browser']
for f in feat_toOHE:
    df_ohe = pd.get_dummies(df[f],prefix=f,dummy_na=True)
    df.drop([f],axis = 1,inplace=True)
    df = pd.concat((df,df_ohe),axis = 1)

Each of these columns is one-hot encoded, with dummy_na=True adding an indicator column for missing values.

df.shape
(275547, 208)

The user-level frame now has 208 feature columns.

df_all = pd.merge(df,df_agg_sess,how='left')
df_all = df_all.drop(['id'],axis = 1)
df_all = df_all.fillna(-2)
df_all['all_null'] = np.array([sum(r < 0) for r in df_all.values])

Merge df with the df_agg_sess table (a left join on id).

Drop the id column and fill the remaining missing values with -2.

Add an all_null column that counts the negative entries in each row, i.e. how many of that user's values were originally missing.

-----------------------------------------------------------------------------------

Xtrain = df_all.iloc[:train_row,:]
Xtest = df_all.iloc[train_row:,:]
le = LabelEncoder()
le.fit(labels)
ytrain = le.transform(labels)
print(train.shape[0] == Xtrain.shape[0])
print(test.shape[0] == Xtest.shape[0])
Xtrain.to_csv('/Users/qinpeng/Documents/airbnb/Airbnb_Xtrain_v2.csv')
Xtest.to_csv('/Users/qinpeng/Documents/airbnb/Airbnb_Xtest_v2.csv')
labels.tofile('/Users/qinpeng/Documents/airbnb/Airbnb_Ytrain_v2.csv',sep='\n',format='%s')
Save the processed datasets.
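When these files are read back for modeling, the saved labels are the raw country codes, so the encoder can simply be re-fit on them to recover the mapping (a sketch, assuming the paths above):

ytrain_str = pd.read_csv('/Users/qinpeng/Documents/airbnb/Airbnb_Ytrain_v2.csv', header=None)[0]
le = LabelEncoder()
ytrain = le.fit_transform(ytrain_str)
# le.inverse_transform() later turns model predictions back into country codes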

That concludes this article on analyzing where new Airbnb users book. I hope it proves helpful to fellow developers!


