Hello. These are brief notes on what I took away from attending Elastic's Generative AI search workshop.
Problems with LLMs
- Hallucinations can occur when large volumes of data are brought in
- Hallucinations originate in the LLM, not in Search
Vector search
1. Dense vector search: based on nearest-neighbor lookup
2. Sparse vector search: Elasticsearch's key-value approach
- Previously viewed page history can be checked and used in search
Heavy vector use consumes a lot of memory: large amounts of data sit in RAM, and searching that data puts load on the CPU
-> quantization to 8-bit or 4-bit has been introduced
-> Better Binary Quantization (BBQ)
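The memory trade-off above can be sketched with simple scalar quantization: storing each dimension as one int8 byte instead of a four-byte float. This is an illustrative sketch only, not Elastic's actual BBQ scheme (BBQ uses a more sophisticated binary encoding).

```python
# Illustrative int8 scalar quantization of an embedding vector.
# Not Elastic's BBQ algorithm; just shows the 4-bytes -> 1-byte idea.

def quantize_int8(vec):
    """Map each float to an int8 bucket via min-max scaling."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0
    return [round((v - lo) / scale) - 128 for v in vec], lo, scale

def dequantize_int8(q, lo, scale):
    """Approximately recover the original floats."""
    return [(b + 128) * scale + lo for b in q]

vec = [0.12, -0.45, 0.88, 0.03]
q, lo, scale = quantize_int8(vec)
approx = dequantize_int8(q, lo, scale)
# int8 storage is 1 byte per dimension vs 4 bytes for float32,
# at the cost of a small reconstruction error per dimension.
```

The search then runs on the compact int8 vectors, trading a little accuracy for 4x less RAM.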
Dense vector usage: multi-lingual
A data type called semantic_text
Chunking: text is split into chunks before embedding; if a chunk is too large, relevance drops, so semantic_text splits the text into smaller chunks
After embedding, each chunk becomes numbers (a vector)
Search AI Inference
/_inference
Algorithms for Hybrid Search
1. Reciprocal Rank Fusion(RRF)
2. Linear Weighted Score Fusion(Linear)
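RRF can be sketched in a few lines of plain Python: each result list contributes 1 / (k + rank) per document, and the sums decide the fused order. k=60 is the commonly cited default; Elasticsearch's `rrf` retriever applies the same idea server-side.

```python
# Sketch of Reciprocal Rank Fusion (RRF): fuse multiple ranked
# result lists by summing 1 / (k + rank) per document.

def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists. Returns the fused ranking."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]    # e.g. BM25 result order
semantic = ["d3", "d1", "d4"]   # e.g. kNN result order
fused = rrf_fuse([lexical, semantic])
# d1 ranks first: it appears near the top of both lists
```

Linear fusion instead combines the raw (normalized) scores with per-retriever weights; RRF only needs the ranks, which makes it robust when the two score scales differ.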
Multi-modal ingestion
- ColPali multi-vector
Data -> Search Results -> Reranked Search Results
Used for reranking: tagging enables efficient personalization
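The Data -> Search Results -> Reranked Search Results flow with tag-based personalization can be sketched as below. The tag names and the overlap scoring are illustrative assumptions, not the workshop's actual reranker.

```python
# Sketch of tag-based personalized reranking: stable-sort search hits
# by how many tags they share with the user's profile (assumed schema).

def rerank_by_tags(results, user_tags):
    """Reorder hits by tag overlap; ties keep their search order."""
    def overlap(hit):
        return len(set(hit["tags"]) & set(user_tags))
    return sorted(results, key=overlap, reverse=True)

results = [
    {"id": "a", "tags": ["seafood", "casual"]},
    {"id": "b", "tags": ["vegan", "quiet"]},
    {"id": "c", "tags": ["vegan", "casual"]},
]
reranked = rerank_by_tags(results, user_tags=["vegan", "casual"])
# "c" moves to the top: it matches both of the user's tags
```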
Elasticsearch e5 model hands-on
- GenAI VectorDB & RAG 101 - e5 (dense vector, multi-lingual)
vector: numbers that carry meaning
feature vector: e.g., 0s and 1s
Similar items cluster into groups
semantic text: expresses the similarity of data such as text and images
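The "similar items cluster together" idea rests on a similarity measure between embedding vectors, usually cosine similarity. Below is a minimal sketch; the toy 3-d vectors stand in for real e5 embeddings (which have hundreds of dimensions).

```python
# Cosine similarity between embedding vectors: 1.0 means same
# direction (same meaning), near 0 means unrelated.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (illustrative values only)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]
# cosine(cat, kitten) scores higher than cosine(cat, car),
# so "cat" and "kitten" land in the same semantic group
```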
https://github.com/elastic/instruqt-workshops-take-away-assets/blob/main/search/genai-101/Elastic_VectorDB_and_RAG_Workshop_1.pdf
[Create an inference endpoint]
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  }
}
---------------------------------------------------------------------------------------
{
  "inference_id": "my-e5-endpoint",
  "task_type": "text_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
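The `chunking_settings` above (sentence strategy, max 250 words per chunk, 1 sentence of overlap) can be approximated in plain Python. This is a simplified sketch of what Elasticsearch does internally for semantic_text fields, not its exact implementation.

```python
# Sketch of "sentence" chunking: pack sentences into chunks of at
# most max_chunk_size words, repeating the last sentence(s) of each
# chunk at the start of the next (sentence_overlap).
import re

def sentence_chunks(text, max_chunk_size=250, sentence_overlap=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, words = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and words + n > max_chunk_size:
            chunks.append(" ".join(current))
            # carry the overlap sentences into the next chunk
            current = current[-sentence_overlap:] if sentence_overlap else []
            words = sum(len(s.split()) for s in current)
        current.append(sent)
        words += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk is then embedded separately, which is why the search queries later target `Review_semantic.inference.chunks`.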
4 - Upload a dataset
- After uploading the data, select semantic_text
Chat Endpoint
PUT _inference/completion/openai_chat_completions
{
  "service": "openai",
  "service_settings": {
    "api_key": "sk-12LS_PcCuEa_oMLi87kpxg",
    "model_id": "gpt-4o",
    "url": "https://litellm-proxy-service-1059491012611.us-central1.run.app/v1/chat/completions"
  }
}
The process of wiring locally downloaded data to the LLM; you can then ask the chatbot questions about it.
Code Editor
- RRF -> query -> nesting (retriever, retrievers)...
## Install the required packages
## pip install -qU elasticsearch openai

import os
from elasticsearch import Elasticsearch
from openai import OpenAI

es_client = Elasticsearch(
    "<your-elasticsearch-url>",
    api_key=os.environ["ES_API_KEY"]
)

openai_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

index_source_fields = {
    "restaurant_reviews": [
        "Review_semantic"
    ]
}

def get_elasticsearch_results(query):
    ## `query` was missing from the original signature; the kNN
    ## query_vector_builder below needs it as the text to embed
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "Review_semantic.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "Review_semantic.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": "my-e5-endpoint",
                                        "model_text": query
                                    }
                                }
                            }
                        },
                        "inner_hits": {
                            "size": 2,
                            "name": "restaurant_reviews.Review_semantic",
                            "_source": [
                                "Review_semantic.inference.chunks.text"
                            ]
                        }
                    }
                }
            }
        },
        "size": 3
    }
    result = es_client.search(index="restaurant_reviews", body=es_query)
    return result["hits"]["hits"]

def create_openai_prompt(results):
    context = ""
    for hit in results:
        inner_hit_path = f"{hit['_index']}.{index_source_fields.get(hit['_index'])[0]}"
        ## For semantic_text matches, we need to extract the text from the inner_hits
        if 'inner_hits' in hit and inner_hit_path in hit['inner_hits']:
            context += '\n --- \n'.join(
                inner_hit['_source']['text']
                for inner_hit in hit['inner_hits'][inner_hit_path]['hits']['hits']
            )
        else:
            source_field = index_source_fields.get(hit["_index"])[0]
            hit_context = hit["_source"][source_field]
            context += f"{hit_context}\n"
    prompt = f"""
Instructions:
- You are an assistant for asking about restaurant reviews.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- You must always cite the document where the answer was extracted using inline academic citation style [], using the position.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
Context:
{context}
"""
    return prompt

def generate_openai_completion(user_prompt, question):
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": user_prompt},
            {"role": "user", "content": question},
        ]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "my question"
    ## Pass the question so it can be embedded for the kNN search
    elasticsearch_results = get_elasticsearch_results(question)
    context_prompt = create_openai_prompt(elasticsearch_results)
    openai_completion = generate_openai_completion(context_prompt, question)
    print(openai_completion)
Query
- vector query, text embedding
def get_elasticsearch_results(query):
    es_query = {
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "Review_semantic.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "Review_semantic.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": "my-e5-endpoint",
                                        "model_text": query
                                    }
                                }
                            }
                        },
- The field value above is also used today for image embedding, image captioning, and similar use cases
Additional concepts to look into
- MCP + search...
- agents, tools...
References
https://github.com/markpudd
https://github.com/elastic/elasticsearch-labs