★표본비율의 차에 따른 검정★합동표본비율★표본분산 모를때는 표본비율의 차로 검정★기초통계학-[연습문제 06 -14]

2023. 1. 16. 17:57

728x90

23. 남녀 직장인이 받는 스트레스에 대해 조사한 결과가 다음과 같다. 이것을 근거로 남녀 직장인이 받는 스트레스에 차이가 있는지 유의수준 5%에서 조사하라.

https://knowallworld.tistory.com/343

★합동표본비율★양측검정, 상단측검정, 하단측검정 구분하기★모비율 차의 검정★기초통계학-

https://knowallworld.tistory.com/334 ★모평균 , 모비율의 차에 대한 신뢰구간★모비율에 대한 오차한계 및 신뢰구간★기초통계학-[ 16. 2014년에 초,중,고 학생 116000명을 대상으로 우리나라 통일에 대해

knowallworld.tistory.com

A = pd.DataFrame(columns = ['조사인원' , '스트레스를 받은 인원' ] , index = ['남자' , '여자'])
A['조사인원'] = [1650 , 1235 ]
A['스트레스를 받은 인원'] = [1137, 806]
A

H_0 : p_1 = p_2 (양측검정)

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도

ax.set_title('양측검정' ,fontsize = 18)
p1 = 1137/1650
p2 = 806/1235

n = 1650
m = 1235

RATIO = round(p1 - p2,4)
print(RATIO)
SAMPLE_PROP = (1137 + 806) / (n+m) #합동표본비율

STDS = math.sqrt( SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m))
print(STDS)


#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf(1 - (1-(trust/100))) ,3 )


ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x<=t_1) & (x>=-t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- (1- scipy.stats.norm.cdf(t_1))

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP({-t_1}<=Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= -t_1, ymin= 0 , ymax= stats.norm.pdf(-t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
area = round((1-area)/2,6)
ax.text(1 + t_1 , annotate_len+0.02 , r'$z_{\alpha/2} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha/2 = {%.4f}$' % area,fontsize=15)
ax.text(-2.5 -t_1 , annotate_len , r'$z_{\alpha/2} = $' + f'{-t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha/2 = {%.3f}$' % area,fontsize=15)


plt.annotate('' , xy=(t_1, annotate_len-0.02), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-t_1, annotate_len-0.02), xytext=(-1-t_1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))


ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$' +' , \n'+ r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .15, f'신뢰구간 모비율간 차에 대한 {trust}%의 합동표본비율 : \n' + r'$Z = \dfrac{\widehat{p}_1 - \widehat{p}_2}{ \sqrt{\widehat{p}\widehat{q} * (\dfrac{1}{n} + \dfrac{1}{m}})  }$' + f'=  { round((p1 - p2) / math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * ( (1 / n) + (1/m))),4) }',fontsize=15)

ax.text(2 , .35 , r'합동표본비율의 $ \widehat{p} = \dfrac{x + y}{n + m}$' ,fontsize= 15)

# ax.text(-5.5 , .25, f'귀무가설 : 남자(u_1)의 입원일수가 여자(u_2)의 입원일수 보다 작다.\n u_1 < u_2'  , fontsize = 15)
# ax.text(1 , .35, f' 신뢰구간 {trust}% 에 대한 오차 한계( ' +r'$e_{%d}) : $ '% trust + f'\n' + r'${%.3f}\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$ = '% t_1 + f'{round((t_1)*math.sqrt(p1*(1-p1)/n + p2*(1-p2)/m),3)} ' , fontsize= 15)



# ax.text(-5.5 , .25,  r'신뢰구간 L = $2*{%.3f}\dfrac{\sigma}{\sqrt{n}} $' % t_1 ,fontsize=15)
#
# ax.text(-5.5 , .3,  r'오차한계 = ${%.3f}\dfrac{\sigma}{\sqrt{n}} $' % t_1,fontsize=15)


#============================================표본평균의 정규분포화 =========================================================

z_1 = round(RATIO/STDS  , 4)
print(z_1)
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )

ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1) | (x<=-z_1) , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=t_1) | (x<=-t_1) , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area =  scipy.stats.norm.cdf(-z_1) + 1 - (scipy.stats.norm.cdf(z_1))


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
plt.annotate('' , xy=(z_1, annotate_len), xytext=(-z_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-z_1, annotate_len), xytext=(z_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5 , annotate_len+0.01 , f'P-value : \nP(Z<={-z_1}) + P(Z>={z_1}) \n = {round(area,4)}',fontsize=15)
# ax.text(0 , annotate_len+0.01 ,f'P(Z>={-z_1})',fontsize=15)



b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')

P-value = 0.0386

alpha = 0.05

귀무가설 H_0 : 남자의 스트레스 = 여자의 스트레스

p-value < alpha ==> 0.0386 < 0.05 ==> 귀무가설 기각 ==> 남자의 스트레스와 여자의 스트레스는 동일하지 않다.

24. A와 B 두 도시 간, 특정 정당 지지율에 차이가 있는지 알아보기 위하여 두 도시에서 500명씩 임의로 추출하여 지지도를 조사한 결과, A 도시에서 275명, B 도시에서 244명이 지지하는 것으로 조사되었다. 이 자료를 근거로 두 도시 간의 지지도에 차이가 있는지 유의수준 5%에서 조사하라.

^p_A = 275/500

^p_B = 244/500

==> 표준편차 모른다. ==> 표본비율간 차이에 따른 검정 시행

H_0 : p_A = p_B(양측검정)

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도

ax.set_title('양측검정' ,fontsize = 18)
p1 = 275/500
p2 = 244/500

n = 500
m = 500

RATIO = round(p1 - p2,4)
print(RATIO)
SAMPLE_PROP = (275 + 244) / (n+m) #합동표본비율

STDS = math.sqrt( SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m))
print(STDS)


#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf(1 - (1-(trust/100))) ,3 )


ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x<=t_1) & (x>=-t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- (1- scipy.stats.norm.cdf(t_1))

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP({-t_1}<=Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= -t_1, ymin= 0 , ymax= stats.norm.pdf(-t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
area = round((1-area)/2,6)
ax.text(1 + t_1 , annotate_len+0.02 , r'$z_{\alpha/2} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha/2 = {%.4f}$' % area,fontsize=15)
ax.text(-2.5 -t_1 , annotate_len , r'$z_{\alpha/2} = $' + f'{-t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha/2 = {%.3f}$' % area,fontsize=15)


plt.annotate('' , xy=(t_1, annotate_len-0.02), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-t_1, annotate_len-0.02), xytext=(-1-t_1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))


ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$' +' , \n'+ r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .15, f'신뢰구간 모비율간 차에 대한 {trust}%의 합동표본비율 : \n' + r'$Z = \dfrac{\widehat{p}_1 - \widehat{p}_2}{ \sqrt{\widehat{p}\widehat{q} * (\dfrac{1}{n} + \dfrac{1}{m}})  }$' + f'=  { round((p1 - p2) / math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * ( (1 / n) + (1/m))),4) }',fontsize=15)

ax.text(2 , .35 , r'합동표본비율의 $ \widehat{p} = \dfrac{x + y}{n + m}$' ,fontsize= 15)

# ax.text(-5.5 , .25, f'귀무가설 : 남자(u_1)의 입원일수가 여자(u_2)의 입원일수 보다 작다.\n u_1 < u_2'  , fontsize = 15)
# ax.text(1 , .35, f' 신뢰구간 {trust}% 에 대한 오차 한계( ' +r'$e_{%d}) : $ '% trust + f'\n' + r'${%.3f}\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$ = '% t_1 + f'{round((t_1)*math.sqrt(p1*(1-p1)/n + p2*(1-p2)/m),3)} ' , fontsize= 15)



# ax.text(-5.5 , .25,  r'신뢰구간 L = $2*{%.3f}\dfrac{\sigma}{\sqrt{n}} $' % t_1 ,fontsize=15)
#
# ax.text(-5.5 , .3,  r'오차한계 = ${%.3f}\dfrac{\sigma}{\sqrt{n}} $' % t_1,fontsize=15)


#============================================표본평균의 정규분포화 =========================================================

z_1 = round(RATIO/STDS  , 4)
print(z_1)
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )

ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1) | (x<=-z_1) , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=t_1) | (x<=-t_1) , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area =  scipy.stats.norm.cdf(-z_1) + 1 - (scipy.stats.norm.cdf(z_1))


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
plt.annotate('' , xy=(z_1, annotate_len), xytext=(-z_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-z_1, annotate_len), xytext=(z_1/2 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5 , annotate_len+0.01 , f'P-value : \nP(Z<={-z_1}) + P(Z>={z_1}) \n = {round(area,4)}',fontsize=15)
# ax.text(0 , annotate_len+0.01 ,f'P(Z>={-z_1})',fontsize=15)



b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')

P-value = 0.0498

alpha = 0.05

귀무가설 H_0 : 두 도시간 지지도에 차이가 없다.

p-value < alpha ==> 0.0498 < 0.05 ==> 귀무가설 기각 ==> 두 도시간 지지도에 차이가 있다.

25. 어떤 단체에서 국영 TV의 광고 방송에 대한 찬반을 묻는 조사를 실시하였다. 대도시에 거주하는 사람들 2055명중 1312명이 찬성하였고, 농어촌에 거주하는 사람 800명 중 486명이 찬성하였다. 도시 사람의 찬성률이 농어촌 사람의 찬성률 보다 큰지 유의수준 5%에서 조사하라.

^p_1 = 1312/2055

^p_2 = 486/800

p1 > p2

==> H_0 : p_1 <= p_2 (상단측 검정)

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도
p1 = 1312/2055
p2 = 486/800

n = 2055
m = 800


ax.set_title('상단측검정' ,fontsize = 18)
RATIO = round(p1 - p2,4)
SAMPLE_PROP = (1312 + 486) / (n+m) #합동표본비율

#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf(1-(1-(trust/100))) ,3 )


ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x<=t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = scipy.stats.norm.cdf(t_1)

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP(Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -t_1, ymin= 0 , ymax= stats.norm.pdf(-t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

area = round((1-area) ,3)
# ax.text(1 + t_1 , annotate_len+0.02 , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha = {%.3f}$' % area,fontsize=15)
ax.text(1.5 +t_1 , annotate_len+0.02 , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha = {%.3f}$' % area,fontsize=15)


# plt.annotate('' , xy=(t_1, annotate_len), xytext=(t_1+ 1 , annotate_len+0.02)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_1, annotate_len), xytext=(1+t_1 , annotate_len+0.02)  , arrowprops = dict(facecolor = 'black'))


ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$' +' , \n'+ r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .15, f'신뢰구간 모비율간 차에 대한 {trust}%의 합동표본비율 : \n' + r'$Z = \dfrac{\widehat{p}_1 - \widehat{p}_2}{ \sqrt{\widehat{p}\widehat{q} * (\dfrac{1}{n} + \dfrac{1}{m}})  }$' + f'=  { round((p1 - p2) / math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * ( (1 / n) + (1/m))),4) }',fontsize=15)



# ax.text(-5.5 , .25, f'귀무가설 : 남자(u_1)의 입원일수가 여자(u_2)의 입원일수 보다 작다.\n u_1 < u_2'  , fontsize = 15)
ax.text(2 , .35 , r'합동표본비율의 $ \widehat{p} = \dfrac{x + y}{n + m}$' ,fontsize= 15)



#============================================표본평균의 정규분포화 =========================================================


STDS = math.sqrt( SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m))
# print(math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m) ))
# print(RATIO)
# print(f'STDS :{STDS}' )

z_1 = round(RATIO/STDS  , 4)
# z_1 =(9.075 - 7.114) / 1.6035
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )
print(z_1)
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1)  , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=t_1)  , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- scipy.stats.norm.cdf(z_1)


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
plt.annotate('' , xy=(z_1, annotate_len), xytext=(z_1- 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
# plt.annotate('' , xy=(-z_1, annotate_len), xytext=(1-z_1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5 , annotate_len+0.01 , f'P-VALUE: \n   P(Z>={z_1})  = {round(area,4)}',fontsize=15)


b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')
#
# print( (9.075 - 7.114) / 1.6035)

P-value = 0.0623

alpha = 0.05

귀무가설 H_0 : 도시 사람의 찬성률이 농어촌 사람의 찬성률보다 작거나 같다.

p-value > alpha ==> 0.0623 > 0.05 ==> 귀무가설 채택 ==> 도시 사람의 찬성률이 농어촌 사람의 찬성률보다 크지 않다.

26. 남학생 650명과 여학생 555명을 대상으로 학업을 위하여 아르바이트를 하고 있는지 설문 조사를 실시한 결과, 남학생 403명과 여학생 389명이 아르바이트를 하고 있는 것으로 조사되었다. 아르바이트하는 남학생의 비율이 여학생의 비율보다 낮은지 유의수준 1%에서 조사하라.

^p_1 = 403/650

^p_2 = 389/555

==> ^p_1 - ^p_2 < 0 ==> 하단측 검정

==> 표본편차 존재 x ==> 표본비율의 차에 따른 검정 시행

p1 < p2

==> H_0 : p_1 >= p_2 ==> 아르바이트 하는 남학생의 비율이 여학생의 비율보다 크거나 같다.

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도
p1 = 403/650
p2 = 389/555

n = 650
m = 555


ax.set_title('하단측검정' ,fontsize = 18)
RATIO = round(p1 - p2,4)
SAMPLE_PROP = (403 + 389) / (n+m) #합동표본비율

#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf(1-(1-(trust/100))) ,3 )


ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x>=-t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- scipy.stats.norm.cdf(-t_1)

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP(Z<={-t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
ax.vlines(x= -t_1, ymin= 0 , ymax= stats.norm.pdf(-t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -t_1, ymin= 0 , ymax= stats.norm.pdf(-t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

area = round((1-area) ,3)
# ax.text(1 + t_1 , annotate_len+0.02 , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha = {%.3f}$' % area,fontsize=15)
ax.text(-1.5 -t_1 , annotate_len+0.02 , r'$z_{\alpha} = $' + f'{-t_1}\n' + r'$H_{0}$의 기각역' + f'\n'+ r'$\alpha = {%.3f}$' % area,fontsize=15)


# plt.annotate('' , xy=(t_1, annotate_len), xytext=(t_1+ 1 , annotate_len+0.02)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(-t_1, annotate_len), xytext=(-1-t_1 , annotate_len+0.02)  , arrowprops = dict(facecolor = 'black'))


ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$' +' , \n'+ r'$\left(\widehat{p}_{1} - \widehat{p}_{2}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\widehat{p}_{1}\widehat{q}_{1}}{n}+\dfrac{\widehat{p}_{2}\widehat{q}_{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .15, f'신뢰구간 모비율간 차에 대한 {trust}%의 합동표본비율 : \n' + r'$Z = \dfrac{\widehat{p}_1 - \widehat{p}_2}{ \sqrt{\widehat{p}\widehat{q} * (\dfrac{1}{n} + \dfrac{1}{m}})  }$' + f'=  { round((p1 - p2) / math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * ( (1 / n) + (1/m))),4) }',fontsize=15)



# ax.text(-5.5 , .25, f'귀무가설 : 남자(u_1)의 입원일수가 여자(u_2)의 입원일수 보다 작다.\n u_1 < u_2'  , fontsize = 15)
ax.text(2 , .35 , r'합동표본비율의 $ \widehat{p} = \dfrac{x + y}{n + m}$' ,fontsize= 15)



#============================================표본평균의 정규분포화 =========================================================


STDS = math.sqrt( SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m))
# print(math.sqrt(SAMPLE_PROP * (1-SAMPLE_PROP) * (1/n + 1/m) ))
# print(RATIO)
# print(f'STDS :{STDS}' )

z_1 = round(RATIO/STDS  , 4)
# z_1 =(9.075 - 7.114) / 1.6035
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )
z_1 = abs(z_1)
print(z_1)
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x<=-z_1)  , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x<=-t_1)  , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = scipy.stats.norm.cdf(-z_1)


ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
plt.annotate('' , xy=(-z_1, annotate_len), xytext=(-z_1 +1.5 , annotate_len+0.02)  , arrowprops = dict(facecolor = 'black'))
# plt.annotate('' , xy=(-z_1, annotate_len), xytext=(1-z_1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1.5 , annotate_len+0.01 , f'P-VALUE: \n   P(Z<={-z_1})  = {round(area,4)}',fontsize=15)


b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')
#
# print( (9.075 - 7.114) / 1.6035)

P-value = 0.0016

alpha = 0.05

귀무가설 H_0 : 남학생의 비율이 여학생보다 높거나 같다.

p-value < alpha ==> 0.0016 < 0.05 ==> 귀무가설 기각 ==> 남학생의 비율이 여학생의 비율보다 낮다.

728x90

'기초통계 > 대표본 가설검정' 카테고리의 다른 글

★상단/하단 구분시 찾으려는 가설의 부등호 위치에 주목하자★표본평균의 차에 대한 가설 검정★기초통계학-[연습문제 05 -13] (0)	2023.01.16
★귀무가설 설정 중요!!!!!!!(부등호가 존재)★표본평균의 차에 대한 가설 검정★모평균의 차에 대한 가설 검정★기초통계학-[연습문제 04 -12] (0)	2023.01.16
★표본비율의 정규분포화★모비율의 가설 검정★기초통계학-[연습문제 03 -11] (0)	2023.01.16
★모평균의 가설 검정★기초통계학-[연습문제 02 -10] (1)	2023.01.16
★=이 있는것이 귀무가설★모평균의 양측검정★모평균 <= \|X 상단측 검정★모평균 >= \|X 하단측검정★기초통계학-[연습문제 01 -09] (2)	2023.01.16

뭐든지 다 알아보자

Menu

Category

Notice

Recent comments

Links