To translate "Developer Guide to start with AI-ML: Pdf to ChatBot" into simplified Chinese while keeping HTML structure, you can use the following: ```html

开发者指南:从PDF到ChatBot的AI-ML入门

``` This HTML snippet maintains the structure while providing the translated text in simplified Chinese.

Sure, here's the translation of "Working with models, embeddings and vector database on local machine" in simplified Chinese while keeping HTML structure intact: ```html 在本地机器上使用模型、嵌入和向量数据库 ``` This HTML structure wraps the translated text within a `` tag, preserving the original structure specified.

介绍:

Certainly! Here's the translation of the provided text into simplified Chinese, keeping the HTML structure intact: ```html

在本博客中,我们将涵盖常见的机器学习术语,并构建一个简单的应用程序,允许用户免费从其本地机器上的PDF文件中提问。我已经在简历PDF上进行了测试,似乎提供了良好的结果。虽然可以进行进一步的优化以获得更好的性能,但这超出了本博客的范围。

``` This HTML snippet contains the translated text in simplified Chinese.

Sure, here's the translated text in simplified Chinese while maintaining HTML structure: ```html

什么是模型?在人工智能(AI)和机器学习(ML)的背景下,“模型”指的是在数据上训练的数学表示或算法,用于执行特定任务。这些模型学习数据中的模式和关系,并利用这些知识进行预测、分类或决策。

```

Sure, here's the simplified Chinese translation of the text while keeping the HTML structure intact: ```html

什么是嵌入?简单来说,嵌入是数据的紧凑数值表示形式。它们将复杂数据(如单词、句子或图像)转换为一组数字(向量),捕捉数据内的关键特征和关系。这使得机器学习模型更容易理解和处理数据。

``` This HTML snippet translates the provided English text into simplified Chinese and maintains the structure for easy integration into web pages or documents.

Sure, here is the translation of your text into simplified Chinese, while keeping the HTML structure: ```html

什么是嵌入模型?嵌入模型是一种工具,它将复杂数据(如单词、句子或图片)转换为称为嵌入的简单数值形式。

``` This HTML snippet retains the structure and translates the text correctly into simplified Chinese.

Here's the simplified Chinese translation of the provided text, keeping the HTML structure: ```html

什么是向量数据库?向量数据库是专门设计用于高效存储、索引和查询高维向量的数据库。这些向量通常由机器学习模型生成,以数值格式表示文本、图像或音频等数据。向量数据库针对涉及相似性搜索和最近邻查询的任务进行了优化,这些任务在推荐系统、图像检索和自然语言处理等应用中很常见。我们将嵌入存储在向量数据库中。向量数据库的示例包括:ChromaDB、Pinecone、ADX、FAISS…

``` This HTML snippet contains the translated text in simplified Chinese, maintaining the structure requested.
Representation of Embeddings

Sure, here is the translated text in simplified Chinese while keeping the HTML structure intact: ```html

什么是LLM?LLM,即大型语言模型,是一种人工智能模型,它基于从大量文本数据中学习的模式和信息来处理和生成类似人类的文本。它旨在理解和生成自然语言,因此在回答问题、撰写文章、翻译语言等任务中非常有用。LLM的例子包括GPT-3、GPT-4和BERT。

``` This HTML snippet translates the provided English text into simplified Chinese, structured in a paragraph (`

`) tag.

To translate "Workflow:" into simplified Chinese while keeping the HTML structure, you can use the following: ```html 工作流程: ``` This HTML code will display "工作流程:" in simplified Chinese, where "工作流程" means "workflow" and ":" is the colon punctuation mark.

Sure, here's the translation of the text you provided into simplified Chinese, while keeping the HTML structure intact: ```html

  1. 将PDF转换为文本。
  2. 创建嵌入 — 递归地将文件分割为块,并为每个块创建嵌入。 — 使用嵌入模型创建嵌入。在我们的案例中,我们使用由ollama库提供的model=”nomic-embed-text”。 — 将嵌入存储在向量数据库中(在我们的示例中,我们使用了chromaDB)。
  3. 获取用户的问题并为问题文本创建嵌入。
  4. 查询您的向量数据库,以查找数据库中的相似嵌入,指定您需要的结果数量。ChromaDB执行相似性搜索以获取最佳结果。
  5. 将用户问题和相似结果作为上下文传递给LLM模型以获得格式化输出。在我们的示例中,我们使用了model=”llama3”。
``` This HTML structure maintains the numbered list format and translates the content into simplified Chinese as requested.

To translate "Prerequisite:" into simplified Chinese while keeping the HTML structure intact, you can use the following: ```html 先决条件: ``` This HTML code specifies that the text "先决条件:" is in simplified Chinese (`lang="zh"`) while maintaining the surrounding HTML structure.

  • 在 HTML 结构中保持不变,将英文文本 "Install Python." 翻译为简体中文如下: 安装 Python.
  • Sure, here is the HTML structure with the text translated into simplified Chinese: ```html

    在本地运行模型,请从“https://ollama.com/”下载 Ollama。Ollama 是一个开源项目,旨在为您的本地机器提供一个强大且用户友好的平台,用于运行大型语言模型。

    ``` In simplified Chinese, the translated text is: "在本地运行模型,请从“https://ollama.com/”下载 Ollama。Ollama 是一个开源项目,旨在为您的本地机器提供一个强大且用户友好的平台,用于运行大型语言模型。"
  • Sure, here's the simplified Chinese translation while keeping the HTML structure intact: ```html 如果你希望,你可以使用由OpenAI和Huggingface提供的其他模型。 ``` In this translation: - "如果你希望" means "if you wish". - "你可以使用" means "you can use". - "由" means "provided by". - "其他模型" means "other models".
  • To translate "For quick start just run: ollama run llama3" into simplified Chinese while keeping the HTML structure, you can use the following: ```html

    快速开始只需运行:ollama run llama3

    ``` This HTML snippet maintains the structure and wraps the translated text in a paragraph (`

    `) tag.

  • Sure, here's the translated text in simplified Chinese while keeping the HTML structure intact: ```html

    要运行嵌入模型,请运行:ollama pull nomic-embed-text

    ```
  • Sure, here's the translated text in simplified Chinese while keeping the HTML structure intact: ```html

    运行嵌入模型的命令是:ollama pull nomic-embed-text- 从ollam库中选择合适的模型。

    ```
  • Sure, here's the translation of "Install jupyter and create .ipynb" in simplified Chinese, keeping the HTML structure: ```html 安装 jupyter 并创建 .ipynb 文件 ``` This maintains the original text in English, with "jupyter" transliterated into Chinese characters (朱庇特), as it's commonly understood in the context of the Jupyter Notebook.

在保持HTML结构的情况下,将以下英文文本翻译为简体中文: 测试安装:

#ollama runs on 11434 port by default.
res = requests.post('http://localhost:11434/api/embeddings',
json={
'model': 'nomic-embed-text',
'prompt': 'Hello world'
})

print(res.json())
# In our example we will be using a framework langchain.
# langchain provides library to interact with ollama.
Output: you will get embeddings as a response

Sure, here is the translation of the text into simplified Chinese while keeping the HTML structure: ```html

什么是LangChain?LangChain是一个旨在简化利用大语言模型(LLM)进行各种自然语言处理(NLP)任务开发的框架。它提供工具和抽象层,帮助开发者高效、有效地构建、管理和部署使用LLM的应用程序。

``` In this translation: - `

` tags are used for paragraph structure in HTML. - Chinese characters are used for the translation of the text.

在保持HTML结构的情况下,将以下英文文本翻译为简体中文: 安装依赖项并导入所需的库:

!pip3 install numpy, requests
!pip3 install chromadb
!pip3 install jupyter --upgrade
!pip3 install langchain-community

from PyPDF2 import PdfReader # used to extract text from pdf
from langchain.text_splitter import RecursiveCharacterTextSplitter # split text in smaller snippets
from langchain_community.llms import Ollama # interact with ollama local server

import requests
import chromadb
import numpy
import uuid

To translate "Define constants:" to simplified Chinese while keeping HTML structure intact, you would use the following: ```html 定义常量: ```

DIMENSION = len(res.json()['embedding']) # 768 dimensions for nomic-embed-text embedding model
EXTRACTED_TEXT_FILE_PATH = "pdf_txt.txt" # text extracted from pdf
PDF_FILE_PATH = "my_pdf.pdf"
CHUNK_SIZE = 150 # chunk size to create snippets
CHUNK_OVERLAP = 20 # check size to create overlap between snippets

Sure, here's the translation of "Initialize chromaDB:" in simplified Chinese while maintaining HTML structure: ```html 初始化 chromaDB: ``` This maintains the same message but presents it in simplified Chinese within an HTML context.

chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(name="test_chat_nomic")

Sure, here's the text translated to simplified Chinese while keeping the HTML structure: ```html 初始化Ollama嵌入并测试: ``` This maintains the HTML tags around the text content, providing the translation within the structure requested.

embeddings = (
OllamaEmbeddings(model="nomic-embed-text")
)

embedding = embeddings.embed_query("Hello World")
dimension = len(embedding)
print(len(embedding)) #768

在保留HTML结构的情况下,将英文文本“Extract text from pdf:”翻译为简体中文为:“从PDF中提取文本:”。

def extract_text_from_pdf(file_path: str):
# Open the PDF file using the specified file_path
reader = PdfReader(file_path)
# Get the total number of pages in the PDF
number_of_pages = len(reader.pages)

# Initialize an empty string to store extracted text
pdf_text = ""

# Loop through each page of the PDF
for i in range(number_of_pages):
# Get the i-th page
page = reader.pages[i]
# Extract text from the page and append it to pdf_text
pdf_text += page.extract_text()
# Add a newline after each page's text for readability
pdf_text += "\n"

# Specify the file path for the new text file
file_path = EXTRACTED_TEXT_FILE_PATH

# Write the content to the text file
with open(file_path, "w", encoding="utf-8") as file:
file.write(pdf_text)

Sure, here's the translation of "Create Embeddings and Save it to chromaDB collection" in simplified Chinese while keeping the HTML structure: ```html 创建嵌入并保存到 chromaDB 集合: ```

def create_embeddings(file_path: str):
# Initialize a list to store text snippets
snippets = []
# Initialize a CharacterTextSplitter with specified settings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)

# Read the content of the file specified by file_path
with open(file_path, "r", encoding="utf-8") as file:
file_text = file.read()

# Split the text into snippets using the specified settings
snippets = text_splitter.split_text(file_text)
print(len(snippets))

x = numpy.zeros((len(snippets), dimension), dtype= 'float32')
ids = []

for i, snippet in enumerate(snippets):
print(snippet)
embedding = embeddings.embed_query(snippet)
ids.append(get_uuid())
x[i] = numpy.array(embedding)

collection.add(embeddings = x,
documents=snippets,
ids=ids)

def get_uuid():
return str(uuid.uuid4())

在保持HTML结构不变的情况下,将英文文本"Execute:"翻译成简体中文为:"执行:"

extract_text_from_pdf(PDF_FILE_PATH)
create_embeddings(EXTRACTED_TEXT_FILE_PATH)

Sure, here's the text translated into simplified Chinese while maintaining HTML structure: ```html

要查看集合中的内容,请运行:

``` This HTML snippet renders the translated text "要查看集合中的内容,请运行:" in simplified Chinese.
data=collection2.get(include=['embeddings', 'documents', 'metadatas'])
print(data['embeddings'])
print(data['ids'])
print(data['metadatas'])
print(data['documents'])

在HTML结构中保持不变,将以下英文文本翻译成简体中文: 从ChromaDB中检索用户查询的n个匹配答案,并将这些答案作为上下文传递给LLM模型(例如LLaMA3),以便生成响应。

def answer_users_question(user_question):
embedding_arr = embeddings.embed_query(user_question)
result = collection.query(
query_embeddings= embedding_arr,
n_results=5
)

return frame_response(result['documents'][0], user_question)

def frame_response(results, ques):
joined_string = "\n".join(results)
prompt = joined_string + "\n Given this information, " + ques
llm = Ollama(
model="llama3"
)
return llm.invoke(prompt)

在HTML结构中保持不变,将以下英文文本翻译为简体中文: 开始一个无限循环,允许用户提问。

while True:

# Prompt the user to input a question
print("👤USER:")

# Read the user's question from the console
user_question = input("Enter question: ")
print(user_question)

# Print a separator for readability
print("----------------------")

# Check if the user entered an empty question
if user_question =="exit":

# If the user entered an empty question, exit the loop
break
else:

# If the user entered a question, proceed to generate a response
print("🤖 BOT:")

# Call the function to generate an answer based on the user's question
# and print the bot's response
print(answer_users_question(user_question=user_question))

# Print a separator for readability
print("----------------------")
Output: ChatBot

Sure, here is the translation of "Github: https://github.com/shashwat12june/pdftochatbot" in simplified Chinese while keeping the HTML structure: ```html

Github: https://github.com/shashwat12june/pdftochatbot

```

Certainly! Here's the translation of "THANK YOU" into simplified Chinese, keeping the HTML structure: ```html 谢谢您 ```

2024-06-30 04:41:09 AI中文站翻译自原文