Natural Language Processing (NLP) in Elasticsearch
Natural language processing in Elasticsearch consists of several key steps that transform and clean input text in preparation for indexing and querying. The main language-processing techniques in Elasticsearch are described below:
Tokenization
Tokenization is the process of splitting text into smaller units called tokens. Each token is usually a word or a short phrase. Tokenizing text lets Elasticsearch index and match individual terms, which makes searching and querying faster.
Example: The text "Elasticsearch is a powerful search and analytics tool." is tokenized into: "Elasticsearch", "is", "a", "powerful", "search", "and", "analytics", "tool".
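The tokenization step above can be tried out with a small sketch. It builds the request body for the Elasticsearch `_analyze` API with the built-in `standard` tokenizer, and it also includes a rough local approximation so the result can be inspected without a cluster. The helper names `build_analyze_request` and `approximate_tokens` are illustrative assumptions, and the regex is only a stand-in: the real tokenizer follows Unicode text segmentation rules and may differ on edge cases.

```python
import json
import re


def build_analyze_request(text):
    """Body for POST /_analyze using the built-in standard tokenizer."""
    return {"tokenizer": "standard", "text": text}


def approximate_tokens(text):
    """Rough local stand-in (an assumption, not the real tokenizer):
    keep runs of word characters and drop punctuation."""
    return re.findall(r"\w+", text)


sentence = "Elasticsearch is a powerful search and analytics tool."

# JSON that could be sent to a running cluster as the _analyze request body.
print(json.dumps(build_analyze_request(sentence)))

# Local approximation of the resulting tokens.
print(approximate_tokens(sentence))
```

Sending the printed body to a cluster returns each token together with its position and character offsets, which is useful when checking how a field will be indexed.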
Stemming
Stemming is the process of reducing words to their base or root form. The goal is to normalize words that share a root so that they match one another, which helps searches return more relevant results.
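Example: The words "running" and "runs" are reduced to the base form "run". An irregular form such as "ran" is typically left unchanged by an algorithmic stemmer and needs a dictionary-based stemmer or an explicit override rule to map it to "run".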
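As a minimal sketch of the stemming step, the code below builds an `_analyze` request body that chains the built-in `lowercase` and `porter_stem` token filters. The function `toy_stem` is a deliberately tiny, hypothetical suffix stripper included only to make the idea visible locally; it is not the Porter algorithm and handles just the two suffixes in the example.

```python
def build_stem_request(text):
    """Body for POST /_analyze that lowercases and then stems each token."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", "porter_stem"],
        "text": text,
    }


def toy_stem(word):
    """Toy illustration only (assumption): strips '-ing' and a plural '-s'."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # Undo a doubled final consonant, e.g. "runn" becomes "run".
        if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
            w = w[:-1]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w


for word in ["running", "runs", "ran"]:
    print(word, "->", toy_stem(word))
```

The toy function leaves "ran" untouched, which mirrors the general limitation of suffix-based stemming with irregular verb forms.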
Stop Word Removal
Stop words are common words that occur very frequently, such as "is", "the", and "a". Elasticsearch can be configured to remove stop words from text in order to reduce index size and improve search performance.
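Example: In the sentence "The quick brown fox jumps over the lazy dog.", a stop word list containing "the" and "over" removes both words. Note that the default English stop word list includes "the" but not "over", so "over" has to be added through a custom list.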
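The stop word example can be sketched as follows. The request body defines a `stop` token filter inline with a custom list holding "the" and "over"; that list is an assumption chosen to match the example sentence. The local function `remove_stop_words` is a simplified illustration of the same filtering, not Elasticsearch code.

```python
import re

# Custom stop word list assumed for this example.
STOP_WORDS = {"the", "over"}


def build_stop_request(text):
    """Body for POST /_analyze with an inline custom stop filter."""
    return {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            {"type": "stop", "stopwords": sorted(STOP_WORDS)},
        ],
        "text": text,
    }


def remove_stop_words(text):
    """Local approximation: lowercase, split on non-word characters, filter."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


sentence = "The quick brown fox jumps over the lazy dog."
print(remove_stop_words(sentence))
```

Placing `lowercase` before the `stop` filter matters here, because the list is lowercase and the sentence starts with a capitalized "The".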
Synonyms
Synonym handling identifies words with the same meaning in order to broaden search results. Elasticsearch can be configured with synonym rules so that a query for one term also matches its synonyms.
Example: If a user searches for "big", Elasticsearch can also return results that contain "large" and "huge".
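One way to configure this, shown as a hedged sketch, is an index settings body that declares a `synonym_graph` token filter and a custom analyzer that uses it. The names `size_synonyms` and `size_search` are hypothetical, and `expand_term` is only a local illustration of what the rule "big, large, huge" means for a query term.

```python
import json

# A single equivalence rule: all three terms are treated as synonyms.
SYNONYM_RULE = "big, large, huge"


def build_synonym_settings(rule=SYNONYM_RULE):
    """Index settings with a synonym_graph filter (names are illustrative)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "size_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": [rule],
                    }
                },
                "analyzer": {
                    "size_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "size_synonyms"],
                    }
                },
            }
        }
    }


def expand_term(term, rule=SYNONYM_RULE):
    """Local illustration: return every term in the rule if `term` is in it."""
    group = [w.strip() for w in rule.split(",")]
    return group if term in group else [term]


print(json.dumps(build_synonym_settings(), indent=2))
print(expand_term("big"))
```

An analyzer like this would typically be used as a search-time analyzer, so that synonym rules can change without reindexing; whether that fits depends on the application.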
Compound Words
Compound word handling deals with words formed by joining several words together, which is common in compounding languages. Elasticsearch can break compound words into their separate parts so that each part is searchable.
Example: In German, the compound word "schwimmbad" (swimming pool) can be broken down into "schwimm" and "bad".
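The decomposition above can be sketched with the `dictionary_decompounder` token filter, which looks for dictionary entries inside each token. The word list below is an assumption limited to the two parts from the example, and `decompound` is a simplified local imitation of a dictionary lookup over substrings, not the actual filter implementation.

```python
# Minimal dictionary assumed for this example.
WORD_LIST = ["schwimm", "bad"]


def build_decompound_request(text):
    """Body for POST /_analyze with an inline dictionary_decompounder filter."""
    return {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            {"type": "dictionary_decompounder", "word_list": WORD_LIST},
        ],
        "text": text,
    }


def decompound(word, dictionary=WORD_LIST, min_sub=2, max_sub=15):
    """Local sketch: keep the original word and add every dictionary
    entry found as a substring, scanning from left to right."""
    known = set(dictionary)
    parts = [word]
    for start in range(len(word)):
        for length in range(min_sub, max_sub + 1):
            if start + length > len(word):
                break
            piece = word[start:start + length]
            if piece in known:
                parts.append(piece)
    return parts


print(decompound("schwimmbad"))
```

In this sketch the original token is kept alongside its parts, so a search for the full compound still matches as well as a search for either part.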
Phrase Search in Elasticsearch
Phrase search is a specialized search method in Elasticsearch that finds specific phrases whose words appear consecutively and in the same order in the text. This makes results more precise than matching each word independently.
Example: Given the text "Elasticsearch is a powerful search and analytics tool.", a phrase search for "search and analytics" returns only documents that contain that phrase with the words in exactly that order, such as the text above.
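The ordering requirement can be illustrated with a small local sketch. The function `contains_phrase` is a hypothetical helper that checks whether the phrase tokens occur consecutively and in order; it ignores analyzers, token positions across fields, and other details of how Elasticsearch actually evaluates phrases.

```python
import re


def tokens(text):
    """Lowercased word tokens; a rough stand-in for an analyzer."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(text, phrase):
    """True if the phrase tokens appear consecutively and in order."""
    doc, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    if n == 0:
        return False
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))


document = "Elasticsearch is a powerful search and analytics tool."
print(contains_phrase(document, "search and analytics"))
print(contains_phrase(document, "analytics and search"))
```

The first call matches because the three words are adjacent and in order; the second does not, even though the document contains all three words.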
To run a phrase search in Elasticsearch, you can use either the Match Phrase query or the Match Phrase Prefix query, depending on your needs. The Match Phrase query finds the exact phrase, while the Match Phrase Prefix query also matches the phrase when its last term is only a prefix of a word.
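The two queries can be written as search request bodies, sketched below as Python dictionaries. The field name "content" is an assumption, since the text does not name a field; `slop` is an optional `match_phrase` parameter that defaults to 0, meaning the terms must be adjacent.

```python
import json


def match_phrase(field, phrase, slop=0):
    """Search body for an exact phrase; slop=0 requires adjacent terms."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


def match_phrase_prefix(field, text):
    """Search body where the last term is treated as a prefix."""
    return {"query": {"match_phrase_prefix": {field: {"query": text}}}}


# Exact phrase: the words must appear in this order.
print(json.dumps(match_phrase("content", "search and analytics")))

# Prefix on the last term: "ana" can match "analytics".
print(json.dumps(match_phrase_prefix("content", "search and ana")))
```

The prefix form suits search-as-you-type input, where the user has not finished typing the final word.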