728x90
반응형

1. 모평균의 차에 대한 소표본 가설검정

 

EX-01) 배기량이 2000cc인 차량의 rpm이 3000cc인 차량의 rpm보다 높은지 유의수준 5%에서 조사하라.

 

H_0 : m_2000 <= m_3000

 

|X_m_2000 - |Y_m_3000 <= 0 (상단측 검정)

 

X = np.arange(-5,5 , .01)

fig = plt.figure(figsize=(15,15))


A = [2360, 2330 , 2350 , 2430 , 2380 , 2360]
B = [2250 , 2230 , 2300 , 2240 , 2260 , 2340]



MEANS_A = np.mean(A)
# print(MEANS_A)
STDS_A = np.std(A , ddof=1)
# print(STDS_A**2)
MEANS_B =  np.mean(B)
# print(MEANS_B)
STDS_B = np.std(B , ddof=1)
# print(STDS_B**2)
n_A = len(A) #표본개수
n_B = len(B)
dof = n_A+n_B-2
dof_2 = [dof] #자유도c

MEANS = round(MEANS_A - MEANS_B,4)
STDS = round(math.sqrt(( (n_A-1) * (STDS_A**2) + (n_B-1) * (STDS_B**2))/(dof)),4) # 합동표본표준편차

ax = sns.lineplot(x = X , y=scipy.stats.t(dof_2).pdf(X) )
trust = 95 #신뢰도
trust = round( (1- trust/100) , 4)
t_r =  scipy.stats.t(dof_2).ppf(1- trust)
print(t_r)
t_l = scipy.stats.t(dof_2).ppf(trust)
print(t_l)

E = round(float(t_r * STDS * math.sqrt((1/n_A + 1/n_B))),4)


ax.set_title('두 모평균 차에 대한 소표본 상단측 검정' , fontsize = 15)
# =========================================================

ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X<=t_r)  , facecolor = 'orange') # x값 , y값 , 0 , X조건 인곳 , 색깔
area = round(float(scipy.stats.t(dof_2).cdf(t_r)),4)


plt.annotate('' , xy=(0, .2), xytext=(-2.5 , .25)  , arrowprops = dict(facecolor = 'black'))


ax.text(-4.6 , .28, f'평균(MEANS_A - MEANS_B) = {MEANS}\n'  +f' n = {n_A} , m = {n_B}  \n 합동표본분산' +r'$(s^{2})$ = ' + '\n' + r'$S _{p}^{2}=\dfrac{1}{n+m-2}\left[ \left( n-1\right) S _{1}^{2}+\left( m-1\right) S_{2}^{2}\right]$' + f'\n = {STDS}\n' +r'오차한계 $e_{%d} = t_{\dfrac{\alpha}{2}}*{s}*\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$' % ((1-  trust)*100 ) +f'= {E} \n\n' + r'T = $\dfrac{\overline{X} - \overline{Y} - ({\mu}_{1} - {\mu}_{2})}{s_{p} * \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$ ~ t(n+m-2)'  ,fontsize=15)

plt.annotate('' , xy=(0, .25), xytext=(1.5 , .25)  , arrowprops = dict(facecolor = 'black'))


ax.text(1.6 , .25, r'$P(t_{%.3f}>T)$' % (trust) + f'= {area}\n' + r'신뢰구간 = ( -$\infty$ , MEANS + $e_{\alpha}$)' +f'\n' + r' = $(-\infty, {%.4f} + {%.4f})$' % (MEANS, E )  +f'\n' +r'$ = (-\infty , {%.4f})$' % (MEANS+E)  ,fontsize=15)

# ax.vlines(x = t_r ,ymin=0 , ymax= scipy.stats.t(dof_2).pdf(t_r) , colors = 'black')
ax.vlines(x = t_r ,ymin=0 , ymax= scipy.stats.t(dof_2).pdf(t_r) , colors = 'black')




# plt.annotate('' , xy=(t_r, .007), xytext=(2.5 , .1)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_r, .007), xytext=(2.5 , .1)  , arrowprops = dict(facecolor = 'black'))
# ax.text(1.71 , .13, r'$t_{\dfrac{\alpha}{2}} = {%.4f}$' % t_r + '\n' +r'$\dfrac{\alpha}{2}$ =' + f'{round(float(1- scipy.stats.t(dof_2).cdf(t_r)),3)}',fontsize=15)
ax.text(2.71 , .13, r'$t_{{\alpha}} = {%.4f}$' % t_r + '\n' +r'${\alpha}$ =' +f'{round(float(1- scipy.stats.t(dof_2).cdf(t_r)),3)}',fontsize=15)

ax.text(t_r - 1 , 0.02 , r'$t_r$' + f'= {t_r}'  , fontsize = 13)
# ax.text(t_l + .2 , 0.02 , r'$t_l$' + f'= {t_l}'  , fontsize = 13)

#======



#==================================== 가설 검정 ==========================================



t_1 = round((MEANS)/ (STDS * math.sqrt((1/n_A + 1/n_B))),4)

print(t_1)
t_1 = abs(t_1)
area = round(1- float(scipy.stats.t(dof_2).cdf(t_1) ),4)
ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where =  (X>=t_1) , facecolor = 'skyblue') # x값 , y값 , 0 , X조건 인곳 , 색깔
ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X>=t_r) , facecolor = 'red') # x값 , y값 , 0 , X조건 인곳 , 색깔

ax.vlines(x= t_1, ymin= 0 , ymax= stats.t(dof_2).pdf(t_1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -t_1, ymin= 0 , ymax= stats.t(dof_2).pdf(-t_1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.t(dof_2).pdf(t_1) /2
plt.annotate('' , xy=(t_1, annotate_len), xytext=(-t_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
# plt.annotate('' , xy=(-t_1, annotate_len), xytext=(t_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5, annotate_len+0.03 , f'P-value : \nP(T>={t_1}) \n = {area}',fontsize=15)

mo = '모평균'
#
# ax.text(-4.6 , .22, r'T = $\dfrac{\overline{X} - {\mu}}{\dfrac{s}{\sqrt{n}}}$' + f'= { round((MEANS - MO_MEAN)/(STDS / math.sqrt(n)),4) }' ,fontsize=15)






b = ['t-(n={})'.format(i) for i in dof_2]
plt.legend(b , fontsize = 15)

모평균 차에 대한 소표본 상단축 검정

H_0 : m_2000 <= m_3000

 

|X_m_2000 - |Y_m_3000 <= 0 (상단측 검정)

 

p-value : 0.0006

 

alpha : 0.05

 

p-value < alpha ==> 0.0006 <  0.05 ==> 귀무가설 H_0 : X_m_2000 <= Y_m_3000 기각한다. 즉 , 유의수준 5%에서2000cc인 차량의 rpm이 3000cc 차량의 rpm보다 높다고 할 수 있다.

 

 

EX-02) 두 회사의 커피에 함유된 카페인의 양이 같은지 유의수준 5%에서 조사하라.

 

A = [ n= 8 , |x = 109 , s_1 = 루트(4.25) ]

B = [m = 6 , |y = 107 , s_2 = 루트(4.36) ]

 

H_0 : A_m = B_m (양측검정)

X = np.arange(-5,5 , .01)

fig = plt.figure(figsize=(15,15))


#
# A = "1073 1067 1103 1122 1057 1096 1057 1053 1089 1102 1100 1091 1053 1138 1063 1120 1077 1091"
# A = list(map(int, A.split(' ')))


# A = [2360, 2330 , 2350 , 2430 , 2380 , 2360]
# B = [2250 , 2230 , 2300 , 2240 , 2260 , 2340]



MEANS_A = 109
# print(MEANS_A)
STDS_A = math.sqrt(4.25)
# print(STDS_A**2)
MEANS_B =  107
# print(MEANS_B)
STDS_B = math.sqrt(4.36)
# print(STDS_B**2)
n_A = 8 #표본개수
n_B = 6
dof = n_A+n_B-2
dof_2 = [dof] #자유도c

MEANS = round(MEANS_A - MEANS_B,4)
STDS = round(math.sqrt(( (n_A-1) * (STDS_A**2) + (n_B-1) * (STDS_B**2))/(dof)),4) # 합동표본표준편차

ax = sns.lineplot(x = X , y=scipy.stats.t(dof_2).pdf(X) )
trust = 95 #신뢰도
trust = round( (1- trust/100)/2 , 4)
t_r =  scipy.stats.t(dof_2).ppf(1- trust)
print(t_r)
t_l = scipy.stats.t(dof_2).ppf(trust)
print(t_l)

E = round(float(t_r * STDS * math.sqrt((1/n_A + 1/n_B))),4)


ax.set_title('두 모평균 차에 대한 양측검정' , fontsize = 15)
# =========================================================

ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X<t_r) & (X>t_l) , facecolor = 'orange') # x값 , y값 , 0 , X조건 인곳 , 색깔
ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X>=t_r) | (X<=t_l) , facecolor = 'skyblue') # x값 , y값 , 0 , X조건 인곳 , 색깔
area = round(float(scipy.stats.t(dof_2).cdf(t_r) - scipy.stats.t(dof_2).cdf(t_l)),4)


plt.annotate('' , xy=(0, .2), xytext=(-2.5 , .25)  , arrowprops = dict(facecolor = 'black'))
ax.text(-4.6 , .28, f'평균(MEANS_A - MEANS_B) = {MEANS}\n'  +f' n = {n_A} , m = {n_B}  \n 합동표본분산' +r'$(s^{2})$ = ' + '\n' + r'$S _{p}^{2}=\dfrac{1}{n+m-2}\left[ \left( n-1\right) S _{1}^{2}+\left( m-1\right) S_{2}^{2}\right]$' + f'\n = {STDS}\n' +r'오차한계 $e_{%d} = t_{\dfrac{\alpha}{2}}*{s}*\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$' % ((1-  trust*2)*100 ) +f'= {E} \n\n' + r'T = $\dfrac{\overline{X} - \overline{Y} - ({\mu}_{1} - {\mu}_{2})}{s_{p} * \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$ ~ t(n+m-2)'  ,fontsize=15)

plt.annotate('' , xy=(0, .25), xytext=(1.5 , .25)  , arrowprops = dict(facecolor = 'black'))
ax.text(1.6 , .25, r'$P(t_{%.3f}<T<t_{%.3f})$' % (trust , 1-trust) + f'= {area}\n' + r'신뢰구간 = (MEANS -$e_{\alpha}$ , MEANS + $e_{\alpha}$)' +f'\n' + r' = $({%.4f} - {%.4f} , {%.4f} + {%.4f})$' % (MEANS, E , MEANS , E)  +f'\n' +r'$ = ({%.4f} , {%.4f})$' % (MEANS-E , MEANS+E)  ,fontsize=15)

ax.vlines(x = t_r ,ymin=0 , ymax= scipy.stats.t(dof_2).pdf(t_r) , colors = 'black')
ax.vlines(x = t_l ,ymin=0 , ymax= scipy.stats.t(dof_2).pdf(t_l) , colors = 'black')




plt.annotate('' , xy=(t_r, .007), xytext=(2.5 , .1)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_l, .007), xytext=(-3.5 , .1)  , arrowprops = dict(facecolor = 'black'))
ax.text(1.71 , .13, r'$t_{\dfrac{\alpha}{2}} = {%.4f}$' % t_r + '\n' +r'$\dfrac{\alpha}{2}$ =' + f'{round(float(1- scipy.stats.t(dof_2).cdf(t_r)),3)}',fontsize=15)
ax.text(-3.71 , .13, r'$t_{\dfrac{\alpha}{2}} = {%.4f}$' % t_l + '\n' +r'$\dfrac{\alpha}{2}$ =' +f'{round(float(scipy.stats.t(dof_2).cdf(t_l)),3)}',fontsize=15)

ax.text(t_r - 1 , 0.02 , r'$t_r$' + f'= {t_r}'  , fontsize = 13)
ax.text(t_l + .2 , 0.02 , r'$t_l$' + f'= {t_l}'  , fontsize = 13)




#==================================== 가설 검정 ==========================================



t_1 = round((MEANS )/ (STDS  *math.sqrt((1/n_A) + (1/n_B))),4)

print(t_1)
t_1 = abs(t_1)
area = round(float(scipy.stats.t(dof_2).cdf(-t_1) + 1 - (scipy.stats.t(dof_2).cdf(t_1))),4)
ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X>=t_1) | (X<=-t_1) , facecolor = 'skyblue') # x값 , y값 , 0 , X조건 인곳 , 색깔
ax.fill_between(X, scipy.stats.t(dof_2).pdf(X) , 0 , where = (X>=t_r) | (X<=-t_r) , facecolor = 'red') # x값 , y값 , 0 , X조건 인곳 , 색깔

ax.vlines(x= t_1, ymin= 0 , ymax= stats.t(dof_2).pdf(t_1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= -t_1, ymin= 0 , ymax= stats.t(dof_2).pdf(-t_1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.t(dof_2).pdf(t_1) /2
plt.annotate('' , xy=(t_1, annotate_len), xytext=(-t_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-t_1, annotate_len), xytext=(t_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5 , annotate_len+0.03 , f'P-value : \nP(T<={-t_1}) + P(T>={t_1}) \n = {area}',fontsize=15)

# mo = '모평균'
#
# ax.text(-4.6 , .22, r'T = $\dfrac{\overline{X} - {\mu}}{\dfrac{s}{\sqrt{n}}}$' + f'= { t_1 }' ,fontsize=15)






b = ['t-(n={})'.format(i) for i in dof_2]
plt.legend(b , fontsize = 15)

모평균 차에 대한 소표본 양측 검정

H_0 : m_A = m_B (양측 검정)

 

p-value : 0.0992

 

alpha : 0.05

 

p-value > alpha ==> 0.0992 >  0.05 ==> 귀무가설 H_0 : m_A = m_B  채택한다. 즉 , 유의수준 5%에서 두 회사의 카페인의 양이 같다.

 

 

728x90
반응형

+ Recent posts