Posts in the 'Research Notes/R Python' category

AI PCA :: (Fog) no difference before/after applying PCA on fog data 2024.05.10
Error in install.packages : Updating loaded packages 2023.09.04
unable to access index for repository https://mran.microsoft.com/snapshot/ 2023.09.04
Python :: Measuring model training time (code) 2022.10.06
Code :: Counting fog days and fog hours 2022.09.23
TypeError: 'int' object is not iterable 2022.09.23
Machine learning :: Evaluation metrics for binary classification 2022.07.29
Machine learning :: Decision trees 2022.07.29
Avoid Overfitting By Early Stopping With XGBoost In Python 2022.07.28
Difference between how='outer' and on='date' in pd.merge 2022.07.05
How to restore MinMaxScaler-scaled values to the original scale 2022.04.22
UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 2022.04.21
Difference between fit() and fit_generator() in Keras 2022.04.08
Looping with for using Python's built-in enumerate() function (reposted) 2022.03.31
Arithmetic operations on Series and DataFrame 2022.03.28
Summing the elements of a list (array) 2022.03.28
How to automatically fill in an evenly spaced time series 2022.03.18
WPS - adding SST 2022.03.17
Generating dates automatically 2022.03.11
How to print a df split by date 2022.03.11

AI PCA :: (Fog) no difference before/after applying PCA on fog data

airmaster 2024. 5. 10. 18:05

Andong fog data

35 dimensions --> 2 dimensions

Accuracy was identical before and after applying PCA.

Accuracy was also identical before and after applying kernel PCA.
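
A before/after comparison of this kind could be set up roughly as below. This is only a sketch: the fog data, the classifier, and the kernel used in the post are not given, so the synthetic 35-feature dataset, LogisticRegression, and the RBF kernel are all assumptions, and the accuracies it prints say nothing about the Andong data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fog data (assumption): 35 features, binary label.
X, y = make_classification(n_samples=600, n_features=35, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def holdout_accuracy(*reducers):
    """Fit scaler -> optional reducer -> classifier and return test accuracy."""
    model = make_pipeline(StandardScaler(), *reducers, LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


results = {
    "no PCA": holdout_accuracy(),
    "PCA (2 components)": holdout_accuracy(PCA(n_components=2)),
    "kernel PCA (2 components, RBF)": holdout_accuracy(KernelPCA(n_components=2, kernel="rbf")),
}
for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Keeping the scaler and the reducer inside one pipeline means both are fitted on the training split only. After fitting a plain PCA, its explained_variance_ratio_ attribute shows how much of the variance the two components retain.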

License: Attribution, NonCommercial, NoDerivatives

Error in install.packages : Updating loaded packages

airmaster 2023. 9. 4. 17:11

Cause: the antivirus program was blocking files from being moved out of the temporary folder.

Temporarily turning the antivirus off and running the install again lets it complete normally.

Reference: https://m.blog.naver.com/jjy0501/221300556843 (R package installation and update errors (1))

unable to access index for repository https://mran.microsoft.com/snapshot/

airmaster 2023. 9. 4. 16:56

An out-of-the-blue microsoft.com error?

Error

> install.packages("ggplot2")
Warning in install.packages :
  unable to access index for repository https://mran.microsoft.com/snapshot/2020-07-16/src/contrib:
  cannot open URL 'https://mran.microsoft.com/snapshot/2020-07-16/src/contrib/PACKAGES'
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/chpark/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
Warning in install.packages :
  unable to access index for repository https://mran.microsoft.com/snapshot/2020-07-16/src/contrib:
  cannot open URL 'https://mran.microsoft.com/snapshot/2020-07-16/src/contrib/PACKAGES'
Warning in install.packages :
  package ‘ggplot2’ is not available (for R version 4.0.2)
Warning in install.packages :
  unable to access index for repository https://mran.microsoft.com/snapshot/2020-07-16/bin/windows/contrib/4.0:
  cannot open URL 'https://mran.microsoft.com/snapshot/2020-07-16/bin/windows/contrib/4.0/PACKAGES'

Fix

Run the script below to point the CRAN repository at cran.r-project.org instead of the unreachable MRAN snapshot:

local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.r-project.org"
       options(repos=r)})

Then run

> install.packages("ggplot2")

and the package installs.

Reference: https://learn.microsoft.com/en-us/answers/questions/544231/unable-to-access-mran-is-there-currently-any-probl (Unable to access MRAN: is there currently any problem on your server? - Microsoft Q&A)

Python :: Measuring model training time (code)

airmaster 2022. 10. 6. 11:14

Code for measuring model training time while working with XGBoost

## 4.4.4 Timing (115)
# Assumes df, X_train, X_test, y_train and y_test are already defined.
import time

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

start = time.time()
df.info()
end = time.time()
elapsed = end - start
print('\nElapsed time: ' + str(elapsed) + ' s')

## 4.4.5 Speed comparison (152)
# IPython/Jupyter magics; they need `import numpy as np` and do not run in plain Python.
# %timeit -n 100 -r 3 sum(np.square(range(10000)))
# %%timeit -n 100 -r 3
# summing = 0
# for i in range(10000):
#     summing += i**2

## Gradient boosting classifier
# max_depth=2 and n_estimators=100 keep the model small
start = time.time()
gbr = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=2)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)
score = accuracy_score(y_test, y_pred)
print('Score: ' + str(score))
end = time.time()
elapsed = end - start
print('Elapsed time: ' + str(elapsed) + ' s')

## XGBoost classifier
# Note from the original: unrivaled speed among boosting models; 30x faster on GPU_0
start = time.time()
xg_reg = XGBClassifier(n_estimators=100, max_depth=2, use_label_encoder=False)
xg_reg.fit(X_train, y_train)
y_pred = xg_reg.predict(X_test)
score = accuracy_score(y_test, y_pred)
print('Score: ' + str(score))
end = time.time()
elapsed = end - start
print('Elapsed time: ' + str(elapsed) + ' s')
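
The start/end pattern above repeats for every model, so it can be wrapped once. The sketch below is one possible way to do that with a context manager; `timed` and `timings` are hypothetical names, and `time.perf_counter()` is used here in place of `time.time()` because it is a monotonic clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Time the enclosed block and optionally store the elapsed seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.4f} s")


timings = {}
with timed("sum of squares", timings):
    total = sum(i ** 2 for i in range(10_000))
```

Any model fit can go inside the `with` block in the same way, and `timings` then holds one entry per label for later comparison.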

Code :: Counting fog days and fog hours

2022. 9. 23. 16:04

TypeError: 'int' object is not iterable

airmaster 2022. 9. 23. 09:06

Error

TypeError: 'int' object is not iterable

Code

n_row = len(din['Phen_fog'])-1
print(n_row)
for i in n_row:
    if i == 1:
        print(i)

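Cause

In for i in n_row:, n_row has to be an iterable (a list, a Series, a range, ...), but here it is a single integer (int).

Other languages write counting loops as for i in (initial, end, increment), but a Python for statement walks over the elements of an iterable. To loop over indices, wrap the integer in range(), for example for i in range(n_row):.

Fix

Iterating over din['Phen_fog'] directly, as below, resolves the error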

for i in din['Phen_fog']:
    if i == 1:
        print(i)
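
The two working loop styles and the failing one can be compared side by side. In this sketch the list `phen_fog` is a hypothetical stand-in for din['Phen_fog'], with 1 meaning fog was observed.

```python
# Hypothetical stand-in for din['Phen_fog']: 1 = fog observed, 0 = no fog.
phen_fog = [0, 1, 1, 0, 1]

# 1) Index-based loop: range(n) turns the integer n into an iterable.
fog_indices = []
for i in range(len(phen_fog)):
    if phen_fog[i] == 1:
        fog_indices.append(i)

# 2) Value-based loop: iterate over the sequence itself.
fog_count = 0
for value in phen_fog:
    if value == 1:
        fog_count += 1

# 3) The original mistake: a bare int is not iterable.
try:
    for i in len(phen_fog):
        pass
except TypeError as exc:
    error_message = str(exc)

print(fog_indices, fog_count, error_message)
```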

Machine learning :: Evaluation metrics for binary classification

airmaster 2022. 7. 29. 17:50

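1. Accuracy

Evaluates model performance by how closely the predictions match the actual values.

For continuous data

It refers to how close the predicted values are to the actual values.

Accuracy makes up most of the evaluation.

For categorical data

It refers to how well the predicted categories match the actual ones.

Do not stop at accuracy; examine the details behind it.

2. Confusion matrix

A confusion matrix breaks a binary or multi-class label down into its individual classes and shows where the actual and predicted classes agree or disagree.

Rows are actual classes, columns are predicted classes:

	0	1
0	90	10
1	20	80

	Predicted negative	Predicted positive
Actual negative	TN	FP
Actual positive	FN	TP

Accuracy = (90+80)/200 = 85%

Evaluation metrics for binary classification

3. Precision

The accuracy of the positive predictions: of the samples predicted positive, the fraction that are actually positive.

4. Recall

The fraction of actual positive samples that the classifier correctly predicts as positive.

Also called sensitivity or the true positive rate.

Typical uses: cancer diagnosis, crime detection, illegal video detection, etc.

5. F-score

The harmonic mean of precision and recall, summarizing both metrics in one number.

Precision = TP/(TP + FP)

Recall = TP/(TP + FN)

F-score = 2/((1/precision) + (1/recall))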
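
As a worked example, the formulas can be applied to the 2x2 confusion matrix shown earlier (TN=90, FP=10, FN=20, TP=80). The variable names below are illustrative only.

```python
# Counts taken from the confusion matrix in this post.
tn, fp, fn, tp = 90, 10, 20, 80

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_score = 2 / ((1 / precision) + (1 / recall))

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f_score={f_score:.4f}")
# accuracy=0.8500 precision=0.8889 recall=0.8000 f_score=0.8421
```

Precision (0.889) is higher than recall (0.800) here because there are fewer false positives (10) than false negatives (20), a difference that the single accuracy figure of 85% hides.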

Machine learning :: Decision trees

airmaster 2022. 7. 29. 17:43

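Decision tree

A decision tree lays out the possible decision paths and their outcomes in a tree structure.

It resembles the game of twenty questions, where you close in on the target by asking one question after another.

Each question poses a condition as a binary (yes/no) choice.

It is a supervised learning technique that keeps splitting the range of the variables, classifying the population into a few subgroups or making predictions.

It starts at the root node at the top, branches on the way down, and continues to the final nodes.

Several algorithms exist for deciding which splitting criterion to choose first.

As an example, consider sketching a simple decision tree for the salaries of free agent (FA) players in professional baseball.

In the decision-making process, the tree expresses how the goals and the situation relate to each other, leading to a final decision. Drawing the decision rules as a tree makes it possible to classify the group of interest into a few subgroups or to make predictions.

It can be coded with if-else logic.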
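
To illustrate the if-else idea, here is a toy tree for the free agent salary example. The post does not give the actual tree, so the features (batting average, home runs), the thresholds, and the salary bands are all hypothetical.

```python
def predict_fa_salary_band(batting_average: float, home_runs: int) -> str:
    """Toy decision tree: every split is one yes/no question (hypothetical thresholds)."""
    if batting_average >= 0.300:      # root node question
        if home_runs >= 20:           # second-level question
            return "high"
        return "medium"
    if home_runs >= 20:
        return "medium"
    return "low"


# Hypothetical players: (batting average, home runs).
players = {
    "player_a": (0.320, 25),
    "player_b": (0.310, 8),
    "player_c": (0.250, 30),
    "player_d": (0.240, 5),
}
bands = {name: predict_fa_salary_band(*stats) for name, stats in players.items()}
print(bands)
```

Each return statement corresponds to a leaf node, and the nesting depth of the if statements corresponds to the depth of the tree. A learned tree has the same shape, except that the algorithm picks the questions and thresholds from the data.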

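Businesses commonly use it as a decision-support tool: it helps identify the risks (losses) and opportunities (gains) involved so that the best decision can be made.

Step 1

Building the tree: choose a splitting criterion and a stopping rule suited to the analysis goal and the data structure, then grow the tree.

Step 2

Pruning: remove branches that are likely to increase classification error or that embed inappropriate inference rules.

Step 3

Splitting: keep splitting until further splits are no longer useful or the minimum node size is reached.

Step 4

Validity assessment: evaluate the cross-validity of the tree model using gain or risk charts or test data.

Step 5

Interpretation and prediction: interpret the results and make predictions.

The visual presentation is useful not only for deciding on actions but also for planning for the future.

For this reason a decision tree is also called a relevance tree.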

Avoid Overfitting By Early Stopping With XGBoost In Python

airmaster 2022. 7. 28. 12:47

https://machinelearningmastery.com/avoid-overfitting-by-early-stopping-with-xgboost-in-python/

Difference between how='outer' and on='date' in pd.merge

airmaster 2022. 7. 5. 17:14

print(len(df))
date_data = pd.date_range(start='2016-01-01', end='2022-01-01',  freq='H')
dat  = date_data.to_list()
print(len(dat)-1)

52608

tmp = pd.merge(asos, aaos, how="outer")  
print(len(tmp))

52608

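The merged result matches the hourly date range built above: how="outer" keeps every timestamp from both frames.

tmp = pd.merge(asos, aaos, on="Date")  
print(len(tmp))

51385

# Caution!! With only on='Date', pd.merge uses its default how='inner', so timestamps missing from either frame are dropped and only 51385 rows remain.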
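
The row-count difference comes from the join type rather than from `on` itself. A small sketch with two hypothetical three-row frames (the column names `temp` and `visibility` are made up for illustration) shows the same effect:

```python
import pandas as pd

# Hypothetical hourly frames; each one is missing a different timestamp.
asos = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00"]),
    "temp": [1.0, 0.5, 0.2],
})
aaos = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01 01:00", "2016-01-01 02:00", "2016-01-01 03:00"]),
    "visibility": [800, 1200, 5000],
})

inner = pd.merge(asos, aaos, on="Date")               # default how="inner"
outer = pd.merge(asos, aaos, on="Date", how="outer")  # union of timestamps
outer = outer.sort_values("Date").reset_index(drop=True)

print(len(inner), len(outer))
print(outer)
```

The inner merge keeps only the two timestamps present in both frames, while the outer merge keeps all four and fills the missing side with NaN. Passing both `on` and `how` explicitly makes the intended join clear.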

How to restore MinMaxScaler-scaled values to the original scale

airmaster 2022. 4. 22. 15:20

https://www.inflearn.com/questions/224124
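
As a minimal sketch of the idea, scikit-learn's MinMaxScaler provides `inverse_transform`, which maps scaled values back to the original units. The price values below are hypothetical, and this is not necessarily the exact answer given in the linked thread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical price series standing in for the values being predicted.
prices = np.array([[100.0], [120.0], [90.0], [150.0]])

scaler = MinMaxScaler()              # scales each column to [0, 1]
scaled = scaler.fit_transform(prices)

# A model trained on scaled data predicts in the scaled range;
# inverse_transform maps such values back to the original units.
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(restored.ravel())
```

The same fitted scaler object must be used for both directions, and `inverse_transform` expects the same number of columns the scaler was fitted on. Using a separate scaler for the target column keeps restoring predictions simple.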

UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:

airmaster 2022. 4. 21. 15:59

Error

Occurred while modeling with XGBRegressor

UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:

Symptom:

The score comes out as nan

from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

# kfold was not defined in the original snippet; this 5-fold setup is an assumption.
kfold = KFold(n_splits=5, shuffle=True, random_state=2)

def regression_model(model):
    # Cross-validated RMSE on the standardized training features
    scores = cross_val_score(model, X_train_std, y_train, scoring='neg_mean_squared_error', cv=kfold)
#     scores = cross_val_score(model, X, y, scoring='mean_squared_error', cv=kfold)
    rmse = (-scores)**0.5
    return rmse.mean()

regression_model(XGBRegressor(booster='gblinear'))

Fix

Always standardize X before running the code above!

# Standardize the features (X) only, not the target!
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale.fit(X_train)  # fit on the training set only
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)

Difference between fit() and fit_generator() in Keras

airmaster 2022. 4. 8. 17:21

* Differences between fit() and fit_generator() when training in Keras

- fit() is similar to the fit method in sklearn: the whole dataset is passed to fit at once. Use it for small datasets that fit entirely in memory.

- fit_generator() does not take x and y directly; it loads the data through a generator. According to the official Keras documentation, the generator is used to avoid duplicate data when multiprocessing. It serves a practical purpose: use it when training on large datasets.
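
The generator side of this can be sketched in plain Python with no Keras calls. `batch_generator` and the toy data are hypothetical; a real generator would usually read each batch from disk instead of holding everything in lists. Note also that in recent TensorFlow/Keras releases fit_generator() is deprecated and fit() accepts generators directly.

```python
import random


def batch_generator(features, labels, batch_size, seed=0):
    """Yield (x_batch, y_batch) pairs forever, reshuffling at each epoch."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [features[i] for i in batch], [labels[i] for i in batch]


# Hypothetical toy data: 10 samples with 2 features each.
features = [[i, i * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]

gen = batch_generator(features, labels, batch_size=4)
steps_per_epoch = -(-len(features) // 4)  # ceil(10 / 4) = 3
first_epoch = [next(gen) for _ in range(steps_per_epoch)]
batch_sizes = [len(x) for x, _ in first_epoch]
print(batch_sizes)  # [4, 4, 2]
```

Because the generator never ends, the training loop needs to be told how many batches make up one epoch, which is the role of a steps-per-epoch setting.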

[Source] fit() and fit_generator() in Keras | author: 킵 (DevelopLog, blog.naver.com)

Looping with for using Python's built-in enumerate() function (reposted)

airmaster 2022. 3. 31. 12:24

https://www.daleseo.com/python-enumerate/
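
For quick reference, a minimal sketch of what enumerate() does in a for loop; the list values are hypothetical and the example is not taken from the linked article.

```python
fog_flags = [0, 1, 1, 0]  # hypothetical values

# enumerate() yields (index, element) pairs, so no manual counter is needed.
pairs = []
for index, flag in enumerate(fog_flags):
    pairs.append((index, flag))

# start=1 makes the counter begin at 1 instead of 0.
numbered = list(enumerate(["a", "b", "c"], start=1))

print(pairs)
print(numbered)
```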

Arithmetic operations on Series and DataFrame

airmaster 2022. 3. 28. 12:12

https://truman.tistory.com/88

Series arithmetic operates on matching index labels. Given obj1 = Series([1,2,3,4,5], index=['a','b','c','d','e']) and obj2 = Series([2,4,6,8,10], index=['a','b','d','f','g']): 1. Addition: obj1 + obj2 obj1.add(o..
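
The index alignment can be checked with the two Series quoted above. The excerpt is cut off at the `add` call, so the `fill_value=0` variant below is an assumption about a common way to use it, not a reconstruction of the linked post.

```python
import pandas as pd

obj1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
obj2 = pd.Series([2, 4, 6, 8, 10], index=["a", "b", "d", "f", "g"])

# Labels present in only one Series (c, e, f, g) give NaN.
plain_sum = obj1 + obj2

# fill_value=0 treats a missing label as 0 instead of producing NaN.
filled_sum = obj1.add(obj2, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Only the labels a, b and d exist in both Series, so only those positions get a numeric result with the plain `+` operator.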