Getting Started With Spring AI and PostgreSQL PGVector

Published: 2024-02-29 17:51

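Learn how to build a generative AI application in Java from scratch using Spring AI and PostgreSQL pgvector.

Spring AI is a new project in the Spring ecosystem that simplifies the creation of AI applications in Java. By combining Spring AI with PostgreSQL pgvector, you can build generative AI applications that draw insights from your own data.

First, this article introduces the Spring AI ChatClient, which uses the OpenAI GPT-4 model to generate recommendations based on user prompts. Next, it shows how to deploy PostgreSQL with the PGVector extension and perform vector similarity searches using the Spring AI EmbeddingClient and the Spring JdbcClient.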

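Adding the Spring AI Dependency

Spring AI supports many large language model (LLM) providers, and each LLM has its own Spring AI dependency.

Suppose you prefer working with OpenAI models and APIs. In that case, add the following dependency to your project: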

XML

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>{latest.version}</version>
</dependency>

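Also, at the time of writing, Spring AI is under active development, and the framework artifacts are released in the Spring Milestone and/or Snapshot repositories. So, if you still can't find Spring AI on https://start.spring.io/, add these repositories to the pom.xml file: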

XML

<repositories>
    <repository>
      <id>spring-milestones</id>
      <name>Spring Milestones</name>
      <url>https://repo.spring.io/milestone</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>spring-snapshots</id>
      <name>Spring Snapshots</name>
      <url>https://repo.spring.io/snapshot</url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
</repositories>

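Setting Up the OpenAI Module

The OpenAI module comes with several configuration properties for managing connection-related settings and fine-tuning the behavior of OpenAI models.

At a minimum, you need to provide your OpenAI API key, which Spring AI uses to access the GPT and embedding models. Once the key is created, add it to the application.properties file: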

Properties

spring.ai.openai.api-key=sk-...

Then, if necessary, you can select specific GPT and embedding models:

Properties

spring.ai.openai.chat.model=gpt-4
spring.ai.openai.embedding.model=text-embedding-ada-002

Finally, you can start communicating with the OpenAI model through Spring AI's ChatClient:

Java

// Inject the ChatClient bean
@Autowired
private ChatClient aiClient;

// Create a system message for ChatGPT explaining the task
private static final SystemMessage SYSTEM_MESSAGE = new SystemMessage(
    """
    You're an assistant who helps to find lodging in San Francisco.
    Suggest three options. Send back a JSON object in the format below.
    [{"name": "<hotel name>", "description": "<hotel description>", "price": <hotel price>}]
    Don't add any other text to the response. Don't add the new line or any other symbols to the response. Send back the raw JSON.
    """);

public void searchPlaces(String prompt) {
    // Create a Spring AI prompt with the system message and the user message
    Prompt chatPrompt = new Prompt(List.of(SYSTEM_MESSAGE, new UserMessage(prompt)));

    // Send the prompt to ChatGPT and get the response
    ChatResponse response = aiClient.generate(chatPrompt);

    // Get the raw JSON from the response and print it
    String rawJson = response.getGenerations().get(0).getContent();

    System.out.println(rawJson);
}

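As an experiment, if you pass the prompt "I'd like to stay near the Golden Gate Bridge", the searchPlaces method might return lodging recommendations like these: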

JSON

[
 {"name": "Cavallo Point", "description": "Historic hotel offering refined rooms, some with views of the Golden Gate Bridge, plus a spa & dining.", "price": 450},
 {"name": "Argonaut Hotel", "description": "Upscale, nautical-themed hotel offering Golden Gate Bridge views, plus a seafood restaurant.", "price": 300},
 {"name": "Hotel Del Sol", "description": "Colorful, retro hotel with a pool, offering complimentary breakfast & an afternoon cookies reception.", "price": 200}
]

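Starting Postgres With PGVector

If you run the previous code snippet with the ChatClient, you'll notice that the OpenAI GPT model usually takes more than 10 seconds to generate a response. The model has a broad and deep knowledge base, and producing a relevant response takes time.

Apart from the high latency, the GPT model might not have been trained on data that is relevant to your application workload. As a result, it might generate responses that are far from satisfactory for the user.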

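However, if you generate embeddings for a subset of your data and then let Postgres work with those embeddings, you can speed up the search and give users accurate responses grounded in that data.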
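The idea of generating embeddings ahead of time can be sketched in plain Java. This is a minimal illustration, not the article's implementation: the Listing record and the embedAll method are hypothetical names, and the embedder parameter stands in for whatever produces the vectors (for example, a call to an embedding model).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class EmbeddingBackfillSketch {

    // Hypothetical shape of a row that needs an embedding.
    record Listing(long id, String description) {}

    // Calls the embedder once per listing and keeps the results keyed by listing id,
    // so the vectors can be stored next to the rows they describe.
    static Map<Long, List<Double>> embedAll(List<Listing> listings,
                                            Function<String, List<Double>> embedder) {
        Map<Long, List<Double>> embeddings = new LinkedHashMap<>();
        for (Listing listing : listings) {
            embeddings.put(listing.id(), embedder.apply(listing.description()));
        }
        return embeddings;
    }
}
```

Because the embeddings are computed once and stored, a user request later only needs a single embedding call for the prompt itself.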

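The pgvector extension allows storing and querying vector embeddings in Postgres. The easiest way to get started with PGVector is to launch a Postgres instance with the extension in Docker: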

Shell

mkdir ~/postgres-volume/

docker run --name postgres \
    -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
    -p 5432:5432 \
    -v ~/postgres-volume/:/var/lib/postgresql/data -d ankane/pgvector:latest

Once the container is started, you can connect to it and enable the extension by executing the CREATE EXTENSION vector statement:

Shell

docker exec -it postgres psql -U postgres -c 'CREATE EXTENSION vector'

Lastly, add the Postgres JDBC driver dependency to the pom.xml file:

XML

<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>{latest.version}</version>
</dependency>

Configure the Spring DataSource by adding the following settings to the application.properties file:

Properties

spring.datasource.url = jdbc:postgresql://127.0.0.1:5432/postgres
spring.datasource.username = postgres
spring.datasource.password = password
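The similarity query in the next section reads from a table named airbnb_listing, whose definition the article does not show. The sketch below builds one plausible definition as a Java string. The id column, the column types, and the class and method names are assumptions; only name, description, price, and description_embedding come from the article. The default of 1536 reflects the vector size returned by text-embedding-ada-002.

```java
final class ListingSchemaSketch {

    // text-embedding-ada-002 returns 1536-dimensional vectors.
    static final int ADA_002_DIMENSIONS = 1536;

    // Builds DDL for a hypothetical airbnb_listing table. The id column and the
    // column types are assumptions; the other column names match the query columns.
    static String createTableSql(int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        return """
                CREATE TABLE IF NOT EXISTS airbnb_listing (
                    id BIGINT PRIMARY KEY,
                    name TEXT NOT NULL,
                    description TEXT,
                    price NUMERIC(10, 2),
                    description_embedding vector(%d)
                )
                """.formatted(dimensions);
    }
}
```

The statement returned by createTableSql(ListingSchemaSketch.ADA_002_DIMENSIONS) could be run once, for example through psql, after the vector extension has been enabled.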

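Performing Vector Similarity Search With Spring AI

At a minimum, vector similarity search is a two-step process.

First, you need to use an embedding model to generate a vector/embedding for a provided user prompt or other text. Spring AI provides the EmbeddingClient, which connects to embedding models from OpenAI or other providers and generates a vectorized representation of the text input: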

Java

// Inject the Spring AI Embedding client
@Autowired
private EmbeddingClient aiClient;

public List<Place> searchPlaces(String prompt) {
    // Use the Embedding client to generate a vector for the user prompt
    List<Double> promptEmbedding = aiClient.embed(prompt);
    ...

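Second, you use the generated embedding to perform a similarity search across the vectors stored in the Postgres database. For instance, you can use the Spring JdbcClient for this task: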

Java

@Autowired
private JdbcClient jdbcClient;

// Inject the Spring AI Embedding client
@Autowired
private EmbeddingClient aiClient;

public List<Place> searchPlaces(String prompt) {
    // Use the Embedding client to generate a vector for the user prompt
    List<Double> promptEmbedding = aiClient.embed(prompt);

    // Perform the vector similarity search
    StatementSpec query = jdbcClient.sql(
        "SELECT name, description, price " +
        "FROM airbnb_listing WHERE 1 - (description_embedding <=> :user_prompt::vector) > 0.7 " +
        "ORDER BY description_embedding <=> :user_prompt::vector LIMIT 3")
        .param("user_prompt", promptEmbedding.toString());

    // Return the recommended places
    return query.query(Place.class).list();
}
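The query passes the embedding to Postgres as text and casts it with ::vector, relying on the fact that List.toString() already produces a bracketed, comma-separated list. The sketch below makes that format explicit and also shows one possible shape for the Place type, which the article does not define. Both VectorLiteralSketch and the record's component types are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class VectorLiteralSketch {

    // Assumed shape of a query result; component names match the selected columns.
    record Place(String name, String description, double price) {}

    // Formats an embedding as a pgvector text literal such as [0.1,0.25,-3.0],
    // which can then be bound as a string parameter and cast with ::vector.
    static String toVectorLiteral(List<Double> embedding) {
        return embedding.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

For example, toVectorLiteral(List.of(0.1, 0.25, -3.0)) returns "[0.1,0.25,-3.0]".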

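The description_embedding column stores embeddings that were pre-generated for the Airbnb listing overviews held in the description column. These Airbnb embeddings were produced by the same model that Spring AI's EmbeddingClient uses for the user prompts.

Postgres uses PGVector to calculate the cosine distance (<=>) between the Airbnb and user prompt embeddings (description_embedding <=> :user_prompt::vector) and then returns only those Airbnb listings whose description has a similarity above 0.7 to the provided user prompt (> 0.7). Similarity here is 1 minus the cosine distance. In general it lies between -1 and 1, and for this use case it is treated as a value from 0 to 1: the closer it is to 1, the more related the vectors are.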
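To make the filter concrete, the arithmetic behind the <=> operator and the 1 - distance > 0.7 condition can be reproduced in plain Java. This is only an illustration of the math, not how PGVector is implemented, and the class and method names are hypothetical.

```java
final class CosineSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance, the quantity the <=> operator returns: 1 - similarity.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    // Mirrors the WHERE clause: 1 - distance > threshold.
    static boolean passesFilter(double[] a, double[] b, double threshold) {
        return 1.0 - cosineDistance(a, b) > threshold;
    }
}
```

Two vectors pointing in the same direction have a distance of 0 and pass the filter, while orthogonal vectors have a distance of 1 and are excluded.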

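Next Steps

Spring AI and PostgreSQL PGVector provide all the essential capabilities needed to build generative AI applications in Java. If you'd like to learn more, watch this hands-on tutorial. It guides you through creating a lodging recommendation service in Java from scratch, optimizing similarity searches with specialized indexes, and scaling with distributed Postgres (YugabyteDB).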

Original title: Getting Started With Spring AI and PostgreSQL PGVector

Original link:
https://dzone.com/articles/spring-ai-with-postgresql-pgvector

Author: Denis Magda

Translated by: LCR
