Example of using Nutch & Solr on Linux (CentOS)

You can check whether Solr restarted correctly by opening the URL below.

http://localhost:8983/solr/#/ or http://<domain or IP>:8983/solr/#/
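The same check can be done from the command line instead of a browser. The following is a minimal sketch, assuming curl is installed and that Solr answers on /solr/ with HTTP 200 once it is up. The function names solr_admin_url and solr_is_up are hypothetical helpers, not part of Solr or Nutch. The "#/" part of the URL is a browser-side fragment, so it is left out of the request.

```shell
#!/bin/sh
# Hypothetical helper: build the Solr base URL for a host and port.
# Defaults (localhost, 8983) match the URL shown above.
solr_admin_url() {
    solr_host="${1:-localhost}"
    solr_port="${2:-8983}"
    printf 'http://%s:%s/solr/\n' "$solr_host" "$solr_port"
}

# Hypothetical helper: succeed only if Solr answers with HTTP 200.
# -s silences progress, -L follows redirects, -o /dev/null drops the body,
# -w '%{http_code}' prints only the final status code.
solr_is_up() {
    solr_code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 5 \
        "$(solr_admin_url "$@")")
    [ "$solr_code" = "200" ]
}

# Usage (commented out so that sourcing this file makes no network call):
# solr_is_up localhost 8983 && echo "Solr is up" || echo "Solr is not reachable"
```

If the check fails, it is probably worth looking at the Solr startup log before moving on to indexing.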

Now that Solr has restarted correctly, let's index the data we saved earlier.

[root]# bin/nutch solrindex http://127.0.0.1:8983/solr/ TestCrawl/crawldb -linkdb TestCrawl/linkdb TestCrawl/segments/*
Indexer: starting at 2014-05-06 01:08:20
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication

Indexer: finished at 2014-05-06 01:09:10, elapsed: 00:00:50
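Before running the indexing command above, it can help to confirm that the crawl directory has the layout the command expects. The following is a rough sketch, assuming the crawl output lives in one directory containing crawldb, linkdb, and segments, as TestCrawl does here. The function names crawl_dir_ready and solrindex_cmd are hypothetical. The command line itself is the one used in this post, with no extra options.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the crawl directory contains the
# three subdirectories that the solrindex command refers to.
crawl_dir_ready() {
    [ -d "$1/crawldb" ] && [ -d "$1/linkdb" ] && [ -d "$1/segments" ]
}

# Hypothetical helper: print the solrindex command from this post for a
# given Solr URL and crawl directory, so it can be reviewed before running.
solrindex_cmd() {
    idx_solr_url="$1"
    idx_crawl_dir="$2"
    printf 'bin/nutch solrindex %s %s/crawldb -linkdb %s/linkdb %s/segments/*\n' \
        "$idx_solr_url" "$idx_crawl_dir" "$idx_crawl_dir" "$idx_crawl_dir"
}

# Usage (run from the Nutch directory; commented out so nothing executes here):
# crawl_dir_ready TestCrawl && solrindex_cmd http://127.0.0.1:8983/solr/ TestCrawl
```

Printing the command rather than executing it keeps the sketch side-effect free; the printed line can be pasted into the shell once it looks right.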

Indexing finished without errors. Note that the indexed data can also be checked from the Solr admin page.

Select Query under collection1 and run a match-all search; you can confirm that fields such as content, title, and url were collected from the Nutch site.
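The same match-all search can also be sent straight to Solr's select handler. The following is a minimal sketch, assuming the default core name collection1 and the field names title, url, and content mentioned above. The function name solr_select_url is a hypothetical helper. The parameters q, wt, indent, rows, and fl are standard Solr query parameters.

```shell
#!/bin/sh
# Hypothetical helper: build a match-all query URL for a Solr core.
# Arguments (all optional): Solr base URL, core name, number of rows.
solr_select_url() {
    sel_base="${1:-http://localhost:8983/solr}"
    sel_core="${2:-collection1}"
    sel_rows="${3:-10}"
    # q=*:* matches every document; fl limits the returned fields.
    printf '%s/%s/select?q=*:*&wt=json&indent=true&rows=%s&fl=title,url,content\n' \
        "$sel_base" "$sel_core" "$sel_rows"
}

# Usage (commented out so that sourcing this file makes no network call):
# curl -s "$(solr_select_url)"
```

The URL must stay quoted when passed to curl, since it contains & and * characters that the shell would otherwise interpret.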
