Posts tagged 'Natural Language Processing' (Page 2)

Programming/Natural Language Processing 2017. 3. 3. 11:09

I plan to collect word2vec demo sites for different languages in this post as I come across them. Link: Korean word2vec demo site (1). Link: Korean word2vec/doc2vec demo site. Link: Finnish/English word2vec demo site.

How to use the sklearn CountVectorizer class

Programming/Natural Language Processing 2017. 2. 9. 13:34

How to use the sklearn CountVectorizer class. CountVectorizer is a class that converts documents into a token count matrix. Here, a feature is a token of the sentence: one of the tokens that come out of the analysis steps controlled by the CountVectorizer parameters shown below, such as analyzer, tokenizer, token_pattern, and stop_words. Link: CountVectorizer class documentation. Link: feature extraction documentation. class sklearn.feature_extraction.text.CountVectorizer(input=u'content', encoding=u'utf-8', decode_error=u'strict', strip_accents=..
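To make the excerpt above concrete, here is a minimal sketch of building a token count matrix with CountVectorizer. The two sample sentences and the stop word list are made up for illustration and are not from the original post; the default settings are assumed otherwise.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus (not from the original post).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Default settings: lowercase the text and keep tokens of 2+ word characters.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_tokens)

# vocabulary_ maps each token to its column index; sort tokens by that index.
tokens = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
dense = counts.toarray()

# The stop_words parameter drops the listed tokens before counting.
filtered = CountVectorizer(stop_words=["the", "on"])
filtered.fit(docs)
filtered_tokens = sorted(filtered.vocabulary_, key=filtered.vocabulary_.get)

print(tokens)
print(dense)
print(filtered_tokens)
```

Each row of the matrix corresponds to one document and each column to one token, so a cell holds how many times that token occurs in that document.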

sklearn TF-IDF vectorizer usage example

Programming/Natural Language Processing 2017. 2. 8. 15:36

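How to use the sklearn TfidfVectorizer class. TfidfVectorizer is a class that converts documents into a matrix of tf-idf features. It gives the same result as applying CountVectorizer to the documents and then applying TfidfTransformer. With the default settings, the idf is computed as $\text{idf}(t) = \log\frac{1 + n}{1 + \text{df}(t)} + 1$, where $n$ is the total number of documents, and $\text{df}(t)$ is the number of documents that contain term $t$. The resulting tf-idf vectors are then normalized by the Euclidean norm: $v_{norm} = v / \lVert v \rVert_2$. Here, a feature is a token of the sentence: one of the tokens that come out of the analysis steps controlled by the TfidfVectorizer parameters shown below, such as analyzer, tokenizer, token_pa..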
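The equivalence described in this excerpt can be checked with a short sketch. It assumes scikit-learn's default settings (smooth_idf=True, norm='l2', sublinear_tf=False), under which idf(t) = log((1 + n) / (1 + df(t))) + 1 and each row is scaled to unit Euclidean length. The sample sentences are made up for illustration and are not from the original post.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus (not from the original post).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# One step: TfidfVectorizer on the raw documents.
tfidf_direct = TfidfVectorizer().fit_transform(docs).toarray()

# Two steps: CountVectorizer, then TfidfTransformer on the counts.
counts = CountVectorizer().fit_transform(docs)
tfidf_two_step = TfidfTransformer().fit_transform(counts).toarray()
same = np.allclose(tfidf_direct, tfidf_two_step)

# Manual computation under the assumed default settings.
tf = counts.toarray()
n = tf.shape[0]                        # total number of documents
df = (tf > 0).sum(axis=0)              # documents containing each term
idf = np.log((1 + n) / (1 + df)) + 1   # smoothed idf
raw = tf * idf
manual = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # L2-normalize rows
matches_manual = np.allclose(manual, tfidf_direct)

print(same, matches_manual)
```

Changing smooth_idf, norm, or sublinear_tf changes the formula, so the manual check here only applies to the defaults.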
