  • Notes on using the open-source word2vec-api
    Programming / Natural Language Processing, 2017. 6. 28. 21:29




    Go to the GitHub repository


    Loading the model





    /home/wiki/word2vec-api# python word2vec-api.py --model GoogleNews-vectors-negative300.bin --binary BINARY --path /word2vec --host 0.0.0.0 --port 5000

    ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.





    Computing the similarity between two words




    curl "http://127.0.0.1:5000/word2vec/similarity?w1=Sushi&w2=Japanese"

    0.33475740674630033
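The number returned above is presumably a cosine similarity between the two word vectors. That is an assumption based on how word2vec similarity is usually defined, not something the post confirms. As a rough sketch of that calculation, using toy vectors invented purely for illustration:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, invented for illustration only;
# real GoogleNews vectors have 300 dimensions.
sushi = [1.0, 2.0, 0.0]
japanese = [2.0, 1.0, 1.0]
print(cosine_similarity(sushi, japanese))
```

Values close to 1 mean the vectors point in nearly the same direction, and values near 0 mean they are close to orthogonal.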




    Finding the most similar words


    By default, 10 results are returned. The topn option controls how many come back.

     


    curl "http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food"

    [["chinese", 0.49355363845825195], ["rajma_chawal", 0.4719219505786896], ["golgappas", 0.4700218737125397], ["bhel_puri", 0.4685877859592438], ["vegeterian", 0.4674953818321228], ["khana", 0.46610206365585327], ["mexican", 0.46595215797424316], ["daal_chawal", 0.46591341495513916], ["Rajasthani_cuisine", 0.46537309885025024], ["chicken_biriyani", 0.4648185074329376]]




     curl "http://127.0.0.1:5000/word2vec/most_similar?positive=indian&positive=food&topn=20"

    [["chinese", 0.49355363845825195], ["rajma_chawal", 0.4719219505786896], ["golgappas", 0.4700218737125397], ["bhel_puri", 0.4685877859592438], ["vegeterian", 0.4674953818321228], ["khana", 0.46610206365585327], ["mexican", 0.46595215797424316], ["daal_chawal", 0.46591341495513916], ["Rajasthani_cuisine", 0.46537309885025024], ["chicken_biriyani", 0.4648185074329376], ["daal", 0.46415504813194275], ["dahi", 0.4634069502353668], ["idli_dosa", 0.4621509313583374], ["Mughlai", 0.46195879578590393], ["pithas", 0.46179938316345215], ["pongal", 0.4604540467262268], ["kachoris", 0.46040695905685425], ["bhindi", 0.46031108498573303], ["vadas", 0.46014922857284546], ["tenderloin_sandwich", 0.45940056443214417]]
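The same query can be assembled and its response unpacked from Python. The sketch below only builds the URL and parses a response body; it does not contact the server. The helper names `most_similar_url` and `parse_most_similar` are hypothetical, and the sample body is a shortened copy of the output shown above.

```python
import json
from urllib.parse import urlencode

# Host, port and path taken from the curl commands in this post.
BASE_URL = "http://127.0.0.1:5000/word2vec"


def most_similar_url(positive, topn=None):
    """Build a most_similar URL, repeating `positive` once per word."""
    params = [("positive", word) for word in positive]
    if topn is not None:
        params.append(("topn", topn))
    return BASE_URL + "/most_similar?" + urlencode(params)


def parse_most_similar(body):
    """Turn the JSON response body into a list of (word, score) tuples."""
    return [(word, float(score)) for word, score in json.loads(body)]


print(most_similar_url(["indian", "food"], topn=20))

# Shortened copy of the response shown above.
sample = '[["chinese", 0.49355363845825195], ["rajma_chawal", 0.4719219505786896]]'
for word, score in parse_most_similar(sample):
    print(f"{word}\t{score:.4f}")
```

To fetch live results, the URL could be passed to `urllib.request.urlopen` while the server from the loading step is running.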






    I'm not sure exactly what this function does.




    curl "http://127.0.0.1:5000/word2vec/model?word=restaurant"

    "oFZJvTg5KL1D2188TLuNPXvpJL3YZI49bFeWvL8RjbzOo5s94I7aPIzzHj0zaIw7MFl6PTka7b3+sne8asDvPEPb37wJC5E7hwPIPKBWyby4yIU9CHRqvVwu4Tu5qUq9AEqeu6VGIL2HA8g8vnpmPWrA7z0ASh49XC5hvQeTpb1+QtU8SKx7PTkabb3vIOm8B5MlPk2cUjx5cTm9UwQVvb56Zr1C+pq87yDpPPnh2z1h//y8W02cvb8RDb62UJq6XcWHPKVGoD2v6Nc7SKx7PTRJ0TyWtJG9lrQRvW+wxrsiXpI9CHTqPL2ZIb10gWK9zCswvdu9PjxzoJ29rgeTvXlxOT3Oo5s8181nPBoVi73MK7A9bs+Bve8g6TyCMiw9eXE5PBYltLzfrRW+x1oUvWxXFr59YZA9nGZyOv3Rsr1HyzY9uanKvVV8gL2nvgs9vnrmvUy7DTyN1OO9dIHiPT4KxDy+eua9A6NOvE2c0r390bK9L3g1vM0Mdb2z2C69qhc8vS94Nb2HA0g9c6AdvU2c0r1SbW69Il6SPcg7WTvW7KI9x1qUvXlS/jyN1GO9tLnzu1GMKb1sV5a9r+hXPJHEurw0SdG97yBpPTuxk705Gu082GQOvUise73bvb68RHKGvSwfBb2HA0i80fzLPEy7jTwXBnk8vnrmu+4/pDubhS08ORptPBcG+Tzgjtq8gjKsPfQQQD1gHri8eXG5POCO2jyRxLq8O7ETPcwrMD0wWXq9ODkovRcGeb1VfIC8Um3uO+CO2rwJC5G9dIFiPWQOj70RVJg94I5aPbS58z2M8548nGbyPHOgHTy5qco6ualKPO4/pDzMKzC9vnrmvLZQmr0TzIM9GJ0fuxYltLzqT828s9guveZfdr3vIGm8TLsNPSmnmb0RVJi9pSflvBYltDwYnZ+9LB+FvbjIhT1Hy7Y9272+PWrfqj0veDU9GhULvURyBj00SVE8272+ve4/JD0HkyW9qhc8vEy7DT4SNd287yDpvaoXPDziJYE9c6CdvDBZej3RGwe97yBpPdu9vjxl79O6FwZ5O2rAb7zMK7C9+niCPVJt7jn+snc6+AAXvAkLkb0JCxE9SKx7PSW3wjuuBxM79BDAvOIlAT2+emY8lpVWPZ39GL0YnZ89YpYjvUis+zt+QlW9Um1uPFZdRT3gjto8PgrEvHlS/rvmX3Y9Ze9TPTkabb0Hk6U9vxGNvY3U47uv6Fc9tLlzvY5rCr7W7KI9p76LPRFUmL3R/Ms9360VPTBZer10geI87j+kPRidH73vIGk9+eHbu1wuYbyRxLq9CHTqPcg72bycZnI9SKz7PLmpSryv6Fc9vZmhPVV8gL2/EQ29272+vZuFrTwwWXq8e+mkvW7PgTxvsMa8hKoXO6Un5TuHIgM9A6POPHSBYjyTPKa9nf2YvQh06rxRjCk7YpajvGrfKryHA8i9NElRPMJqPTyd/Zg7kzwmvQxkQb1EcoY9pUYgvnSB4rsveDU9TZxSvcdalD36eIK9ualKvP6yd72lRqA9lpVWvVtNHD74ABc8c6CdPDNojDtiliM+bs8BPrjIBb3W7KI8VXwAvZHEOj1C+ho9"
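The output above looks like a Base64 string, so one plausible reading is that it is the raw word vector for the given word, encoded as 32-bit floats. That is an assumption, not something the post confirms. Under that assumption, and further assuming little-endian byte order, a decoder could be sketched as follows. The name `decode_vector` is hypothetical, and the demonstration uses a made-up 3-value vector rather than the real response.

```python
import base64
import struct


def decode_vector(payload):
    """Decode a Base64 string into a list of little-endian float32 values."""
    # The server wraps the string in double quotes (JSON), so strip them.
    raw = base64.b64decode(payload.strip('"'))
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of 4-byte floats")
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw))


# Round-trip demonstration with a made-up 3-dimensional vector.
toy = struct.pack("<3f", 0.5, -1.25, 2.0)
encoded = base64.b64encode(toy).decode("ascii")
print(decode_vector(encoded))  # [0.5, -1.25, 2.0]
```

If the assumption holds, decoding the response for restaurant should yield 300 values, matching the dimensionality suggested by the model file name GoogleNews-vectors-negative300.bin.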




    I'm not quite sure what this function does either.

    It seems to return the full set of words in the model,

    but calling it unleashes an enormous flood of text.





    curl "http://127.0.0.1:5000/word2vec/model_word_set"






    From the functions above, I picked out the ones I expect to use often,

    added Korean-language handling, and put the result up on a word2vec demo site.




    Go to the Korean word2vec demo site






