
[vector DB] Elasticsearch

Hayley Shim 2025. 9. 21. 11:51

Hello. This is a brief summary of what I understood after attending the Generative AI search workshop run by Elastic.

 

 

Problems with LLMs

- Hallucinations can occur when large-scale data is moved into the model

- Hallucinations do not originate in search; they originate in the LLM


vector search
1. dense vector search: nearest-neighbor matching over embeddings
2. sparse vector search: Elasticsearch's weighted key-value (token) expansion
- the history of previously viewed pages can be checked and folded into search
(a sketch of both query styles follows)
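A minimal sketch of the two query styles through the Python client; the index and field names (products, text_vec, text_sparse) are hypothetical, and the sparse example assumes an ELSER-style endpoint:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# 1) dense: approximate nearest-neighbor over a dense_vector field
dense_hits = es.search(index="products", knn={
    "field": "text_vec",                 # hypothetical dense_vector field
    "query_vector": [0.1, 0.2, 0.3],     # embedding of the query text
    "k": 10,
    "num_candidates": 100,
})

# 2) sparse: weighted key-value (token) expansion, e.g. ELSER
sparse_hits = es.search(index="products", query={
    "sparse_vector": {
        "field": "text_sparse",          # hypothetical sparse_vector field
        "inference_id": ".elser-2-elasticsearch",  # assumed built-in ELSER endpoint
        "query": "pages similar to ones I viewed before",
    }
})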

Using many vectors consumes a lot of memory: most of the data has to sit in RAM, and searching it puts load on the CPU
-> quantization techniques that compress vectors to 8-bit or 4-bit were introduced
-> Better Binary Quantization (BBQ)
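A sketch of what enabling BBQ looks like in a mapping, assuming a recent Elasticsearch release where the bbq_hnsw index option is available (index name is hypothetical):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

es.indices.create(
    index="docs-bbq",
    mappings={
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
                # binary quantization + HNSW; cuts RAM usage at some recall cost
                "index_options": {"type": "bbq_hnsw"},
            }
        }
    },
)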


Using dense vectors: multilingual

The semantic_text data type
Chunking: text is cut into chunks before embedding; if a chunk is too large, relevance drops, so semantic_text cuts it into smaller chunks
After embedding, the text becomes numbers (a vector)
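A sketch of a mapping with a semantic_text field, which makes Elasticsearch chunk and embed the text automatically through the inference endpoint created in the hands-on below:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

es.indices.create(
    index="restaurant_reviews",
    mappings={
        "properties": {
            "Review_semantic": {
                "type": "semantic_text",
                "inference_id": "my-e5-endpoint",  # endpoint created below
            }
        }
    },
)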

Search AI Inference
- the /_inference API creates and manages embedding/completion endpoints
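For example, listing endpoints and running an ad-hoc call from Python (method names assume a recent elasticsearch-py client that exposes the inference namespace):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

endpoints = es.inference.get()          # GET _inference: list all endpoints
resp = es.inference.inference(          # POST _inference/<id>: ad-hoc call
    inference_id="my-e5-endpoint",
    input="find me a good pasta place",
)
print(resp)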

Algorithms for hybrid search
1. Reciprocal Rank Fusion (RRF) - see the sketch below

2. Linear Weighted Score Fusion (Linear) - combines normalized scores with per-retriever weights
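RRF scores each document as the sum over retrievers of 1/(k + rank), where k is the rank_constant (60 by default in Elasticsearch). A sketch of a hybrid RRF query pairing a lexical and a semantic retriever; the plain-text field "Review" and the query string are hypothetical, and the retriever parameter assumes a recent client version:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

resp = es.search(
    index="restaurant_reviews",
    retriever={
        "rrf": {
            "retrievers": [
                # lexical arm
                {"standard": {"query": {"match": {"Review": "quiet brunch spot"}}}},
                # semantic arm over the semantic_text field
                {"standard": {"query": {"semantic": {
                    "field": "Review_semantic",
                    "query": "quiet brunch spot",
                }}}},
            ],
            "rank_constant": 60,        # the k in 1 / (k + rank)
            "rank_window_size": 50,
        }
    },
)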



Multimodal ingestion
- ColPali multi-vector

Data -> Search Results -> Reranked Search Results
Used for reranking: tagging enables efficient personalization (see the rerank sketch below)
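A sketch of the rerank stage expressed as a retriever, assuming a rerank-type inference endpoint named my-rerank-endpoint has been created (the endpoint name and the "Review" text field are hypothetical):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

resp = es.search(
    index="restaurant_reviews",
    retriever={
        "text_similarity_reranker": {
            # first-stage retrieval
            "retriever": {"standard": {"query": {"match": {"Review": "vegan options"}}}},
            "field": "Review",                     # text handed to the reranker
            "inference_id": "my-rerank-endpoint",  # hypothetical rerank endpoint
            "inference_text": "vegan options",
            "rank_window_size": 20,
        }
    },
)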



Elasticsearch E5 model hands-on
- GenAI VectorDB & RAG 101 - E5 (dense vector, multilingual)


vector: numbers that carry meaning
feature vector: e.g. 0, 1
similar items end up grouped together (toy example below)

semantic text: expresses the similarity of data such as text and images
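A toy illustration of that grouping: cosine similarity between meaning-carrying vectors (the three vectors are made up):

import math

def cosine(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(cosine(cat, kitten))  # ~0.99 -> same group
print(cosine(cat, car))     # ~0.01 -> different group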

Workshop slides: Elastic_VectorDB_and_RAG_Workshop_1.pdf
https://github.com/elastic/instruqt-workshops-take-away-assets/blob/main/search/genai-101/Elastic_VectorDB_and_RAG_Workshop_1.pdf


[Creating an inference endpoint]
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  }
}

The response echoes the endpoint settings, including the default sentence-based chunking:
{
  "inference_id": "my-e5-endpoint",
  "task_type": "text_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}

4 - Upload a dataset
- after uploading the data, choose semantic_text for the field
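Once the field is mapped as semantic_text, each indexed document is chunked and embedded automatically; a minimal sketch (the review text is made up):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

es.index(
    index="restaurant_reviews",
    document={"Review_semantic": "The pasta was excellent and the staff were friendly."},
)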

 

 

Chat Endpoint 

 

PUT _inference/completion/openai_chat_completions
{
  "service": "openai",
  "service_settings": {
    "api_key": "sk-12LS_PcCuEa_oMLi87kpxg",
    "model_id": "gpt-4o",
    "url": "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions"
  }
}

 

{
  "inference_id": "openai_chat_completions",
  "task_type": "completion",
  "service": "openai",
  "service_settings": {
    "model_id": "gpt-4o",
    "rate_limit": {
      "requests_per_minute": 500
    }
  }
}
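A quick smoke test of the completion endpoint through /_inference (the method shape is assumed for a recent elasticsearch-py client):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

resp = es.inference.inference(
    inference_id="openai_chat_completions",
    input="Say hello in one sentence.",
)
print(resp)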

 

 

 

 

 

 

This flow builds an LLM-backed chat application (RAG) over locally downloaded data; questions can then be put to the chatbot.

 

 

 

Code Editor

 

 

- RRF -> query -> nested (retriever, retrievers)...

 

## Install the required packages
## pip install -qU elasticsearch openai
import os
from elasticsearch import Elasticsearch
from openai import OpenAI
es_client = Elasticsearch(
    "<your-elasticsearch-url>",
    api_key=os.environ["ES_API_KEY"]
)
      
openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)
index_source_fields = {
    "restaurant_reviews": [
        "Review_semantic"
    ]
}

def get_elasticsearch_results(query):
    # `query` is the user's question; it is embedded server-side via my-e5-endpoint
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "Review_semantic.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "Review_semantic.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": "my-e5-endpoint",
                                        "model_text": query
                                    }
                                }
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "restaurant_reviews.Review_semantic",
                            "_source": [
                                "Review_semantic.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="restaurant_reviews", body=es_query)
    return result["hits"]["hits"]

def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(inner_hit['_source']['text'] for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits'])
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
  Instructions:
  
  - You are an assistant for asking about restaurant reviews.
  - Answer questions truthfully and factually using only the context presented.
  - If you don't know the answer, just say that you don't know, don't make up an answer.
  - You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
  - Use markdown format for code examples.
  - You are correct, factual, precise, and reliable.
  
  Context:
  {context}
  
  """
    return prompt

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "my question"
    elasticsearch_results = get_elasticsearch_results(question)
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
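To run the script, fill in the placeholder cluster URL and set the ES_API_KEY and OPENAI_API_KEY environment variables; note that the generated code calls gpt-3.5-turbo here even though the inference endpoint above was configured for gpt-4o.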

 

Query

- a vector query: the question text is embedded via the text_embedding endpoint (my-e5-endpoint) and matched with kNN against the chunk embeddings

def get_elasticsearch_results(query):
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "Review_semantic.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "Review_semantic.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": "my-e5-endpoint",
                                        "model_text": query
                                    }
                                }
                            }
                        },
                        # ... (rest identical to the full function above)
 

- fields like the one above are also used for image embedding, image captioning, and similar multimodal tasks

 

Concepts to explore further

 

MCP + search...

agents, tools...

 

 

 

 

References

 

https://github.com/markpudd
https://github.com/elastic/elasticsearch-labs
