Using structured JSON output with OpenAI from the command line

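If you like using AI for everyday tasks from the command line, such as extracting data from unstructured text or even from images or audio, and you want to feed the results into other programs, the output has to be machine readable. Describing the desired output to the AI and giving it a few examples is a proven technique, but the AI can still go off track and deviate from the expected format, and then your pipeline stops right there. To prevent this, OpenAI offers a nifty feature called structured outputs: you provide a JSON schema, and OpenAI ensures that the model's response conforms to it. (By the way, solving this is a really interesting problem: every output token has to be checked to determine which of the tokens suggested by the LLM still have a continuation that conforms to the schema. That would be a fascinating project, but unfortunately it is far too much for a hobby project.)

I integrated this feature into chatgpt, my Swiss-army-knife-style command line tool for OpenAI's chat completion API, so it can be used from the command line. But I didn't stop there: since writing JSON schemas is a bit of a hassle, I added some shortcuts for common use cases.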

Of course, there are many more command line tools in my ChatGPT toolsuite that you might like.

An example

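As an example, let's extract the links from the slides of my AdaptTo 2024 talk about Composum AI in a machine readable format, pretending the slides weren't properly linked. We'll use multimodal input, so the first step is to convert the slides into images we can submit to OpenAI. We'll use suggestbash to get a command line suggestion: "suggestbash split talk.pdf into individual images" suggests pdftoppm -png talk.pdf slides, which generates the files slides-01.png to slides-31.png. Now you can use chatgpt to get the links from them, using image input.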

# collect one -i argument per slide image; an array keeps file names intact
args=()
for fil in slides-*; do args+=(-i "$fil"); done
chatgpt "${args[@]}" 'print urls of links in the image; if there are no links print nothing'

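That does work, but you might get a code block around the links, or comments and the like. You would have to discourage the AI from doing that in the prompt, or do something smarter. Let's see how we can avoid this.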
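
To make the problem concrete, here is a rough sketch of the kind of cleanup a free-form reply would need. The reply text is made up for illustration, the function name extract_urls_heuristically is hypothetical, and the grep pattern is only a heuristic, not a full URL parser:

```shell
# Made-up free-form reply with commentary around the links (illustration only).
reply='Sure! Here are the links I found:
1. https://ai.composum.com (project site)
2. https://www.composum.com/'

# Heuristic: a URL runs until the next whitespace or closing parenthesis.
extract_urls_heuristically() {
  grep -oE 'https?://[^[:space:])]+'
}

printf '%s\n' "$reply" | extract_urls_heuristically
```

Every new way the model decorates its answer needs another tweak to such a pattern, which is exactly what a schema-constrained response avoids.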

Using a JSON schema

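Now let's go the structured output way and provide a JSON schema. Since writing one by hand is tedious, it helps that the OpenAI playground offers an assistant that generates a schema for you when you select the response type json_schema. I'll use the description "output a list of URLs", and it creates this schema for me: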

{
  "name": "url_list",
  "schema": {
    "type": "object",
    "properties": {
      "urls": {
        "type": "array",
        "description": "A list of URLs.",
        "items": {
          "type": "string",
          "description": "A single URL."
        }
      }
    },
    "required": [
      "urls"
    ],
    "additionalProperties": false
  },
  "strict": true
}

So we write that to urlsschema.json. Now let's call chatgpt again, this time using $(printf -- '-i slides-%02d.png ' {1..31}) (kudos to ChatGPT) to generate the -i slides-01.png -i slides-02.png ... arguments, and passing the schema file:

chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rf urlsschema.json \
  'print urls of links in the image as JSON'

That nicely prints:

{"urls":["https://ai.composum.com","https://github.com/ist-dresden/composum-AI",
"https://www.composum.com/","https://www.stoerr.net/ai.html",
"https://github.com/ist-dresden/composum-nodes","https://www.stoerr.net/ai"]}

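Because the response is now a JSON object with a urls array, it can be handed straight to other programs. A minimal sketch, assuming jq is installed (jq is not part of the toolsuite) and using a copy of the output above in place of a live chatgpt call; the function name extract_urls is made up for this example:

```shell
# Copy of the response shown above; stands in for a live chatgpt call.
response='{"urls":["https://ai.composum.com","https://github.com/ist-dresden/composum-AI",
"https://www.composum.com/","https://www.stoerr.net/ai.html",
"https://github.com/ist-dresden/composum-nodes","https://www.stoerr.net/ai"]}'

# Print one URL per line; -r emits raw strings without JSON quotes.
extract_urls() {
  jq -r '.urls[]'
}

printf '%s\n' "$response" | extract_urls
```

In a real pipeline the printf would be replaced by the chatgpt call, and the resulting lines could go on to sort, xargs or a while read loop.
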
Shortcut arguments

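I added shortcuts that create the schema for you in two common use cases: a simple object with some string properties, and a list of simple objects with some string properties. Here is an excerpt from the chatgpt -h help text: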

Response Options:
  -rj             [R]esponse mode JSON: model outputs a JSON object
  -rf schemafile  Structured output: requests that the [r]esponse conforms 
                  to the given JSON schema read from a [f]ile.
  -ra attr1,...   Structured output [r]esponse - JSON with [a]ttributes: comma separated 
                  list of attributes to include in the JSON response.
                  Alternative to -rf - creates a simple schema with these attributes 
                  as string properties.
  -rar attr1,...  Structured output for JSON [r]esponse [ar]ray of objects with the 
                  given attributes - e.g. for extracting a list of entities from an input. 
                  Alternative to -rf and -ra , all are string properties.

If I want to extract the author and the talk title from the first slide, I can use the -ra option:

chatgpt -i slides-01.png -ra author,title \
    "Print talk author and talk title as JSON"

prints

{   "author": "Dr. Hans-Peter Störr, IST GmbH Dresden",   
    "title": "Composum AI - Supporting the Content Author with LLM" }

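Internally, the -ra option creates the JSON schema for that object for you, so you don't have to bother with it. Or let's create a list of objects: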

chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rar name,description \
    'print the slide name and a content description as JSON'

[{"name": "Slide 1", "description": "Introduction to the presentation and overview of the conference." },
 {"name": "Slide 2", "description": "Details on the talk's content, focusing on the functionalities of Composum AI."},
...]
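
The schema that -ra builds internally is not shown in this post. To give a rough idea of what the shortcut spares you from writing, here is a sketch that turns a comma-separated attribute list into a strict schema in the same shape as urlsschema.json. This is an illustration under assumptions, not the tool's actual implementation: the function name make_object_schema and the schema name "response" are hypothetical, and attribute names are assumed to contain no quotes or other special characters.

```shell
# Hypothetical helper: build a strict JSON schema with string properties
# from a comma-separated attribute list, e.g. "author,title".
# The body runs in a subshell, so IFS and the variables stay local.
make_object_schema() (
  IFS=,
  props="" required=""
  for attr in $1; do
    props="${props:+$props,}\"$attr\":{\"type\":\"string\"}"
    required="${required:+$required,}\"$attr\""
  done
  printf '{"name":"response","schema":{"type":"object","properties":{%s},"required":[%s],"additionalProperties":false},"strict":true}\n' "$props" "$required"
)

make_object_schema author,title
```

Redirecting that output into a file and passing the file with -rf should then be roughly equivalent to using -ra author,title directly.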

If you get confused by all the chatgpt options, how about using the built-in help feature -ha:

chatgpt -ha how can I take the prompt from prompt.mp3

Conclusion

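The chatgpt tool from my ChatGPT toolsuite makes it easy to use OpenAI's structured outputs to get machine readable results for the tasks you want the AI to perform. That opens up many possibilities for combining it with other tools, in the best Unix spirit of composing small tools to achieve great things. Give it a try! I've been using it to extract URLs from screenshots, categorize bank statements, extract information from web pages, ask ChatGPT quick questions from the command line, and for many other things.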