★상단/하단 구분시 찾으려는 가설의 부등호 위치에 주목하자★표본평균의 차에 대한 가설 검정★기초통계학-[연습문제 05 -13]

2023. 1. 16. 17:08

728x90

20. 12세 이하의 남자아이가 여자아이에 비해 주당 TV 시청 시간이 더 많은지 알아보기 위하여 조사한 결과 다음과 같다. 이것을 근거로 남자아이가 여자아이보다 TV를 더 많이 시청하는지 유의수준 5%에서 조사하라.

A = pd.DataFrame(columns = ['표본평균' , '모표준편차' , '표본의크기'] , index = ['남자아이' , '여자아이'])
A['표본평균'] = [14.5 , 13.7 ]
A['모표준편차'] = [2.1, 2.7]
A['표본의크기'] = [48 , 42]
A

H_0 : X <= Y (상단측 검정)

대립가설 : X>Y

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도

MEANS_A = 14.5
MEANS_B = 13.7
MEANS = round(MEANS_A-MEANS_B ,3)
std_a = 2.1
std_b = 2.7
n_a = 48
n_b = 42
STDS = round(math.sqrt( (std_a**2/n_a) + (std_b**2/n_b)),4)



ax.set_title('상단측검정' ,fontsize = 18)



#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf((trust/100)) ,3 )
print(t_1)

ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x<=t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = scipy.stats.norm.cdf(t_1)

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP(Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
# ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

# ax.text(1 + t_1 , annotate_len , r'$z_{\alpha/2} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역',fontsize=15)
area = 1-area
ax.text(1 + t_1  , annotate_len , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역'+f'\n' +r'$\alpha = {%.3f}$' % area,fontsize=15)


# plt.annotate('' , xy=(t_1, annotate_len-0.02), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_1, annotate_len), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))

ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left( \overline{x}-\overline{y}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' +' , '+ r'$\left( \overline{x}-\overline{y}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .2 , f'표준오차 :' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' + f'= {round(STDS,3)}' , fontsize = 15)

ax.text(-5.5 , .15,  r'오차한계 = ${%.3f}\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}} = $' % t_1 + f'{round(t_1 * STDS , 3)}',fontsize=15)


ax.text(2 , .3,  r'$\overline{X}_{1}$ = ' +f'{MEANS_A}'+   r' , $\overline{X}_{2}$ = ' +f'{MEANS_B}\n' + r'$\sigma_{1} = $' + f'{std_a}' + r', $\sigma_{2} = $' + f'{std_b}\n' + r'$\sqrt{n} = $' + f'{round(math.sqrt(n_a),3)}' + r', $\sqrt{m} = $' + f'{round(math.sqrt(n_b),3)}\n'
        + r'Z = $\dfrac{\overline{X}_{1} - \overline{X}_{2}} {\sqrt{\dfrac{\sigma_{1}^{2}}{n} + \dfrac{\sigma_{2}^{2}}{m}}}$'
        + f'= {round(MEANS / STDS,4)}' ,fontsize=15)


#============================================표본평균의 정규분포화 =========================================================


z_1 = round(MEANS/ STDS ,4)
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )
z_1 = abs(z_1)
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1)  , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x>=t_1) , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- scipy.stats.norm.cdf(z_1)


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
# plt.annotate('' , xy=(z_1, annotate_len), xytext=(z_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=( z_1 ,  annotate_len), xytext=(0, annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1 , annotate_len+0.01 , f'p-value \n P(Z>={z_1})  = {round(area,4)}',fontsize=15)


b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')

P-value = 0.0606

alpha = 0.05

H_0 : X <= Y (상단측 검정) ==> 남자의 TV시청 시간이 여자보다 작거나 같다.

p-value > alpha ==> 0.0606 > 0.05 ==> 귀무가설 채택 ==> 남자의 TV시청 시간이 여자보다 작거나 같다.

21. 어느 패스트푸드 가게에서 근무하는 종업원 A의 서비스 시간이 종업원 B보다 긴지 알아보기 위하여 조사한 결과 다음과 같았다. 이것을 근거로 종업원 A가 종업원 B보다 서비스 시간이 긴지 유의수준 1%에서 조사하라.

A = pd.DataFrame(columns = ['표본평균' , '모표준편차' , '표본의크기'] , index = ['종업원 A' , '종업원 B'])
A['표본평균'] = [4.5 , 4.3 ]
A['모표준편차'] = [0.45, 0.42]
A['표본의크기'] = [50 ,80]
A

H_0 : M_A <= M_B (상단측 검정)

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도

MEANS_A = 4.5
MEANS_B = 4.3
MEANS = round(MEANS_A-MEANS_B ,3)
std_a = 0.45
std_b = 0.42
n_a = 50
n_b = 80
STDS = round(math.sqrt( (std_a**2/n_a) + (std_b**2/n_b)),4)



ax.set_title('상단측검정' ,fontsize = 18)



#==========================================귀무가설 기각과 채택 ====================================================
trust = 99 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf((trust/100)) ,3 )
print(t_1)

ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x<=t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = scipy.stats.norm.cdf(t_1)

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP(Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
# ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

# ax.text(1 + t_1 , annotate_len , r'$z_{\alpha/2} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역',fontsize=15)
area = 1-area
ax.text(1 + t_1  , annotate_len , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역'+f'\n' +r'$\alpha = {%.3f}$' % area,fontsize=15)


# plt.annotate('' , xy=(t_1, annotate_len-0.02), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_1, annotate_len), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))

ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left( \overline{x}-\overline{y}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' +' , '+ r'$\left( \overline{x}-\overline{y}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .2 , f'표준오차 :' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' + f'= {round(STDS,3)}' , fontsize = 15)

ax.text(-5.5 , .15,  r'오차한계 = ${%.3f}\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}} = $' % t_1 + f'{round(t_1 * STDS , 3)}',fontsize=15)


ax.text(2 , .3,  r'$\overline{X}_{1}$ = ' +f'{MEANS_A}'+   r' , $\overline{X}_{2}$ = ' +f'{MEANS_B}\n' + r'$\sigma_{1} = $' + f'{std_a}' + r', $\sigma_{2} = $' + f'{std_b}\n' + r'$\sqrt{n} = $' + f'{round(math.sqrt(n_a),3)}' + r', $\sqrt{m} = $' + f'{round(math.sqrt(n_b),3)}\n'
        + r'Z = $\dfrac{\overline{X}_{1} - \overline{X}_{2}} {\sqrt{\dfrac{\sigma_{1}^{2}}{n} + \dfrac{\sigma_{2}^{2}}{m}}}$'
        + f'= {round(MEANS / STDS,4)}' ,fontsize=15)


#============================================표본평균의 정규분포화 =========================================================


z_1 = round(MEANS/ STDS ,4)
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )
z_1 = abs(z_1)
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1)  , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x>=t_1) , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- scipy.stats.norm.cdf(z_1)


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
# plt.annotate('' , xy=(z_1, annotate_len), xytext=(z_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=( z_1 ,  annotate_len), xytext=(0, annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1 , annotate_len+0.01 , f'p-value \n P(Z>={z_1})  = {round(area,4)}',fontsize=15)


b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')

P-value = 0.0057

alpha = 0.01

H_0 : M_A <= M_B (상단측 검정) ==> A의 근무시간이 B보다 작거나 같다.

p-value < alpha ==> 0.0057 < 0.01 ==> 귀무가설 기각 ==> A의 근무시간이 B보다 길다.

22. 울산 지역의 1인당 평균 소득이 서울보다 150만원 이상 더 많은지 알아보기 위하여 두 도시에 거주하는 사람을 각각 100명씩 임의로 선정하여 조사한 결과 다음 표와 같았다. 이것을 근거로 울산 지역의 평균 소득이 서울보다 150만원 이상 더 많은지 유의수준 5%에서 조사하라.

A = pd.DataFrame(columns = ['평균' , '표준편차' ] , index = ['울산' , '서울'])
A['평균'] = [1854 , 1684 ]
A['표준편차'] = [69.9, 73.3]
A

H_0 : 울산- 서울 >= 150 (상단측 검정) ==> 찾으려는 가설의 부등호 위치에 주목하자.

x = np.arange(-5,5 , .001)

fig = plt.figure(figsize=(15,8))
ax = sns.lineplot(x , stats.norm.pdf(x, loc=0 , scale =1)) #정의역 범위 , 평균 = 0 , 표준편차 =1 인 정규분포 플롯

# trust = 95 #신뢰도

MEANS_A = 1854
MEANS_B = 1684 + 150
MEANS = round(MEANS_A-MEANS_B ,3)
std_a = 69.9
std_b = 73.3
n_a = 100
n_b = 100
STDS = round(math.sqrt( (std_a**2/n_a) + (std_b**2/n_b)),4)



ax.set_title('상단측검정' ,fontsize = 18)



#==========================================귀무가설 기각과 채택 ====================================================
trust = 95 #신뢰도_유의수준

t_1  = round(scipy.stats.norm.ppf((trust/100)) ,3 )
print(t_1)

ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x<=t_1) , facecolor = 'orange') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = scipy.stats.norm.cdf(t_1)

plt.annotate('' , xy=(0, .2), xytext=(2 , .2)  , arrowprops = dict(facecolor = 'black'))
ax.text(2 , .2, r'$H_{0}$의 채택역' +f'\nP(Z<={t_1}) : {round(area,4)}',fontsize=15)

annotate_len = stats.norm.pdf(t_1, loc=0 , scale =1) /2
# ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))
ax.vlines(x= t_1, ymin= 0 , ymax= stats.norm.pdf(t_1, loc=0 , scale =1) , color = 'green' , linestyle ='solid' , label ='{}'.format(2))

# ax.text(1 + t_1 , annotate_len , r'$z_{\alpha/2} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역',fontsize=15)
area = 1-area
ax.text(1 + t_1  , annotate_len , r'$z_{\alpha} = $' + f'{t_1}\n' + r'$H_{0}$의 기각역'+f'\n' +r'$\alpha = {%.3f}$' % area,fontsize=15)


# plt.annotate('' , xy=(t_1, annotate_len-0.02), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=(t_1, annotate_len), xytext=(t_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))

ax.text(-5.5 , .25, f'신뢰구간 모평균에 대한 {trust}% : \n' + r'$\left( \overline{x}-\overline{y}\right)$' +f'-{t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' +' , '+ r'$\left( \overline{x}-\overline{y}\right)$' +f'+ {t_1}' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$',fontsize=15)

ax.text(-5.5 , .2 , f'표준오차 :' + r'$\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}}$' + f'= {round(STDS,3)}' , fontsize = 15)

ax.text(-5.5 , .15,  r'오차한계 = ${%.3f}\sqrt{\dfrac{\sigma_{1}^{2}}{n}+\dfrac{\sigma_{2} ^{2}}{m}} = $' % t_1 + f'{round(t_1 * STDS , 3)}',fontsize=15)


ax.text(2 , .3,  r'$\overline{X}_{1}$ = ' +f'{MEANS_A}'+   r' , $\overline{X}_{2}$ = ' +f'{MEANS_B}\n' + r'$\sigma_{1} = $' + f'{std_a}' + r', $\sigma_{2} = $' + f'{std_b}\n' + r'$\sqrt{n} = $' + f'{round(math.sqrt(n_a),3)}' + r', $\sqrt{m} = $' + f'{round(math.sqrt(n_b),3)}\n'
        + r'Z = $\dfrac{\overline{X}_{1} - \overline{X}_{2}} {\sqrt{\dfrac{\sigma_{1}^{2}}{n} + \dfrac{\sigma_{2}^{2}}{m}}}$'
        + f'= {round(MEANS / STDS,4)}' ,fontsize=15)


#============================================표본평균의 정규분포화 =========================================================


z_1 = round(MEANS/ STDS ,4)
# # z_2 = round((34.5 - 35) / math.sqrt(5.5**2 / 25) , 2)
# z_1 = round(scipy.stats.norm.ppf(1 - (1-(trust/100))/2) ,3 )
z_1 = abs(z_1)
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where = (x>=z_1)  , facecolor = 'skyblue') # x값 , y값 , 0 , x<=0 인곳 , 색깔
ax.fill_between(x, stats.norm.pdf(x, loc=0 , scale =1) , 0 , where =  (x>=t_1) , facecolor = 'red') # x값 , y값 , 0 , x<=0 인곳 , 색깔
area = 1- scipy.stats.norm.cdf(z_1)


ax.vlines(x= z_1, ymin= 0 , ymax= stats.norm.pdf(z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))
# ax.vlines(x= -z_1, ymin= 0 , ymax= stats.norm.pdf(-z_1, loc=0 , scale =1) , color = 'black' , linestyle ='solid' , label ='{}'.format(2))

annotate_len = stats.norm.pdf(z_1, loc=0 , scale =1) /2
# plt.annotate('' , xy=(z_1, annotate_len), xytext=(z_1+ 1 , annotate_len)  , arrowprops = dict(facecolor = 'black'))
plt.annotate('' , xy=( z_1 ,  annotate_len), xytext=(0, annotate_len)  , arrowprops = dict(facecolor = 'black'))
ax.text(-1 , annotate_len+0.01 , f'p-value \n P(Z>={z_1})  = {round(area,4)}',fontsize=15)


b= 'N(0,1)'

plt.legend([b] , fontsize = 15 , loc='upper left')

P-value = 0.0242

alpha = 0.05

H_0 : 울산- 서울 >= 150 (상단측 검정) ==> 울산의 1인당 평균소득이 서울보다 150만원 이상이다.

p-value < alpha ==> 0.0242 < 0.05 ==> 귀무가설 기각 ==> 울산의 1인당 평균소득이 서울보다 150만원 미만이다.

728x90

'기초통계 > 대표본 가설검정' 카테고리의 다른 글

★표본비율의 차에 따른 검정★합동표본비율★표본분산 모를때는 표본비율의 차로 검정★기초통계학-[연습문제 06 -14] (0)	2023.01.16
★귀무가설 설정 중요!!!!!!!(부등호가 존재)★표본평균의 차에 대한 가설 검정★모평균의 차에 대한 가설 검정★기초통계학-[연습문제 04 -12] (0)	2023.01.16
★표본비율의 정규분포화★모비율의 가설 검정★기초통계학-[연습문제 03 -11] (0)	2023.01.16
★모평균의 가설 검정★기초통계학-[연습문제 02 -10] (1)	2023.01.16
★=이 있는것이 귀무가설★모평균의 양측검정★모평균 <= \|X 상단측 검정★모평균 >= \|X 하단측검정★기초통계학-[연습문제 01 -09] (2)	2023.01.16

뭐든지 다 알아보자

Menu

Category

Notice

Recent comments

Links