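Hybrid search
Hybrid search combines the results of a vector search and a keyword (BM25F) search by fusing the two result sets.
The Query Agent automatically transforms simple English questions into optimized Weaviate queries, with no manual query construction needed.
Basic hybrid search
Combine the results of a vector search and a keyword search. The search uses a single query string.
If a snippet doesn't work or you have feedback, please open a GitHub issue.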
# Assumes `client` is a connected Weaviate client (weaviate.WeaviateClient, Python client v4)
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(query="food", limit=3)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}
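Named vectors
Added in v1.24. A hybrid search on a collection that has named vectors must specify a target vector. Weaviate uses the query vector to search the target vector space.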
reviews = client.collections.use("WineReviewNV")
response = reviews.query.hybrid(
query="A French Riesling",
target_vector="title_country",
limit=3
)
for o in response.objects:
    print(o.properties)
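Explain the search results
To see how objects are ranked, set the explain score field in your query. The search ranking is part of the object metadata. Weaviate uses the score to order the search results.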
from weaviate.classes.query import MetadataQuery
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
alpha=0.5,
return_metadata=MetadataQuery(score=True, explain_score=True),
limit=3,
)
for o in response.objects:
    print(o.properties)
    print(o.metadata.score, o.metadata.explain_score)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"explainScore": "(bm25)\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.003968253968253968 to the score\n(hybrid) Document df958a90-c3ad-5fde-9122-cd777c22da6c contributed 0.012295081967213115 to the score",
"score": "0.016263336"
},
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document ec776112-e651-519d-afd1-b48e6237bbcb contributed 0.012096774193548387 to the score",
"score": "0.012096774"
},
"answer": "Famine",
"question": "From the Latin for \"hunger\", it's a period when food is extremely scarce"
},
{
"_additional": {
"explainScore": "(vector) [0.0223698 -0.02752683 -0.0061537363 0.0023812135 -0.00036100898 -0.0078375945 -0.018505432 -0.037500713 -0.0042215516 -0.012620432]... \n(hybrid) Document 98807640-cd16-507d-86a1-801902d784de contributed 0.011904761904761904 to the score",
"score": "0.011904762"
},
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute"
}
]
}
}
}
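The explainScore strings above report how much each result list contributed to an object's score. The sketch below reproduces those numbers under an assumption inferred from the output itself: each list contributes weight / (60 + rank), a reciprocal-rank formula with rank counted from 1, using a vector weight of 0.75 and a keyword weight of 0.25. The constant 60, the weights, and the helper name ranked_contribution are assumptions for illustration, not values read from Weaviate.

```python
# Sketch only: reproduces the "contributed ... to the score" values shown in the
# explainScore output above. RANK_CONSTANT and the weights are inferred from
# those numbers (assumption), not taken from Weaviate's source code.
RANK_CONSTANT = 60


def ranked_contribution(weight: float, rank: int) -> float:
    """Score that one result list adds to an object; rank is 1-based."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return weight / (RANK_CONSTANT + rank)


# "a closer grocer": ranked 1st by the vector search, 3rd by the keyword search.
vector_part = ranked_contribution(0.75, 1)   # assumed vector weight 0.75
keyword_part = ranked_contribution(0.25, 3)  # assumed keyword weight 0.25
print(vector_part, keyword_part, round(vector_part + keyword_part, 9))
```

The rounded sum of the two contributions, 0.016263336, equals the score reported for "a closer grocer"; ranks 2 and 3 with the 0.75 weight give the contributions shown for "Famine" and "Tofu".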
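Balance keyword and vector search
Hybrid search results can favor the keyword component or the vector component. To change the relative weights of the keyword and vector components, set the alpha value in your query.
An alpha of 1 is a pure vector search. An alpha of 0 is a pure keyword search.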
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
alpha=0.25,
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}
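As a rough illustration of the alpha description above, the sketch below turns alpha into a pair of weights and blends two component scores that are assumed to already share the same scale. The function names are hypothetical, and how Weaviate puts the component scores on a common scale depends on the fusion method.

```python
def alpha_weights(alpha: float) -> tuple[float, float]:
    """Return (keyword_weight, vector_weight) for a hybrid alpha value."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return 1.0 - alpha, alpha


def blend(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Weighted sum of two component scores that share the same scale."""
    keyword_weight, vector_weight = alpha_weights(alpha)
    return keyword_weight * keyword_score + vector_weight * vector_score


# alpha=0.25, as in the snippet above: the keyword score gets weight 0.75.
print(alpha_weights(0.25))
print(blend(keyword_score=1.0, vector_score=0.0, alpha=0.25))
```

With alpha=1.0 the keyword weight is 0, so only the vector score remains; with alpha=0.0 only the keyword score remains.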
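Change the fusion method
Starting in v1.24, relative score fusion is the default fusion method.
- To use the relative scores of the keyword and vector searches instead of the search rankings, use relative score fusion.
- To use autocut with the hybrid operator, use relative score fusion.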
from weaviate.classes.query import HybridFusion
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
fusion_type=HybridFusion.RELATIVE_SCORE,
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
}
]
}
}
}
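Relative score fusion works with the component scores rather than the rank positions. The sketch below shows the idea under stated assumptions: each result set is min-max normalized so its best score becomes 1 and its worst 0, then the normalized scores are weighted by alpha (vector) and 1 - alpha (keyword) and summed per object. Vector scores are assumed to be similarities where higher is better, the handling of a result set whose scores are all identical is an arbitrary choice, and the data and function names are made up for illustration.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores so the best becomes 1.0 and the worst 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Assumption: when every score is identical, treat them all as best.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def relative_score_fusion(keyword: dict[str, float], vector: dict[str, float],
                          alpha: float) -> list[tuple[str, float]]:
    """Fuse two scored result sets; returns (doc, score) pairs best-first."""
    fused: dict[str, float] = {}
    for doc, s in min_max_normalize(keyword).items():
        fused[doc] = fused.get(doc, 0.0) + (1 - alpha) * s
    for doc, s in min_max_normalize(vector).items():
        fused[doc] = fused.get(doc, 0.0) + alpha * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


keyword_scores = {"a": 4.0, "b": 2.0, "c": 1.0}  # e.g. made-up BM25 scores
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.5}   # e.g. made-up 1 - distance
for doc, score in relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    print(doc, round(score, 3))
```

Object "b" ranks first even though it is not the best keyword match, because it scores well in both result sets.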
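Keyword search operators
Added in v1.31. The keyword (BM25) search operator defines the minimum number of query tokens that must be present in an object for it to be returned. The options are and or or (the default).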
or
With the or operator, the search returns objects that contain at least minimumOrTokensMatch of the tokens in the search string.
from weaviate.classes.query import BM25Operator
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="Australian mammal cute",
bm25_operator=BM25Operator.or_(minimum_match=2),
limit=3,
)
for o in response.objects:
    print(o.properties)
and
With the and operator, the search returns objects that contain all of the tokens in the search string.
from weaviate.classes.query import BM25Operator
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="Australian mammal cute",
bm25_operator=BM25Operator.and_(), # Each result must include all tokens (e.g. "australian", "mammal", "cute")
limit=3,
)
for o in response.objects:
    print(o.properties)
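The difference between the two operators can be expressed as a matching rule over tokens. The sketch below only decides whether an object qualifies; it does not compute BM25 scores. It uses naive lowercase whitespace splitting instead of Weaviate's tokenizers, counts each distinct query token once (an assumption), and the function name qualifies is hypothetical.

```python
def qualifies(text: str, query: str, operator: str = "or", minimum_match: int = 1) -> bool:
    """Return True if `text` passes the keyword operator for `query`."""
    doc_tokens = set(text.lower().split())
    query_tokens = set(query.lower().split())
    matched = len(query_tokens & doc_tokens)
    if operator == "and":
        return matched == len(query_tokens)  # every query token must appear
    if operator == "or":
        return matched >= minimum_match  # at least `minimum_match` tokens
    raise ValueError(f"unknown operator: {operator}")


query = "Australian mammal cute"
docs = [
    "The koala is a cute Australian mammal",
    "A cute mammal from Africa",
    "An Australian bird",
]
for doc in docs:
    print(doc, qualifies(doc, query, "or", minimum_match=2), qualifies(doc, query, "and"))
```

The second text passes the or operator with a minimum of two matches (cute, mammal) but fails the and operator because it lacks australian.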
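Specify keyword search properties
Added in v1.19.0. The keyword part of a hybrid search can be directed to search only a subset of object properties. This does not affect the vector search part.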
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
query_properties=["question"],
alpha=0.25,
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "honey",
"question": "The primary source of this food is the Apis mellifera"
}
]
}
}
}
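Set weights on property values
Specify the relative value of an object's properties in the keyword search. Higher values increase a property's contribution to the search score.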
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
query_properties=["question^2", "answer"],
alpha=0.25,
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "a closer grocer",
"question": "A nearer food merchant"
},
{
"answer": "cake",
"question": "Devil's food & angel food are types of this dessert"
},
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other"
}
]
}
}
}
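The "question^2" syntax attaches a weight to a property name. As a simplified sketch of what the weight does, the code below parses the name^weight strings and multiplies each property's term count by its weight before summing. This mirrors the idea behind BM25F property weighting but leaves out IDF and length normalization and uses naive whitespace tokenization; the helper names are hypothetical.

```python
def parse_query_properties(props: list[str]) -> list[tuple[str, float]]:
    """Split "name^weight" strings into (name, weight); default weight is 1."""
    parsed = []
    for prop in props:
        name, sep, weight = prop.partition("^")
        parsed.append((name, float(weight) if sep else 1.0))
    return parsed


def weighted_term_count(obj: dict[str, str], term: str, props: list[str]) -> float:
    """Sum of per-property term counts, each multiplied by its weight."""
    total = 0.0
    for name, weight in parse_query_properties(props):
        tokens = obj.get(name, "").lower().split()
        total += weight * tokens.count(term)
    return total


props = ["question^2", "answer"]
cake = {"answer": "cake",
        "question": "Devil's food & angel food are types of this dessert"}
stores = {"answer": "food stores (supermarkets)",
          "question": "This type of retail store sells more shampoo & makeup than any other"}
print(weighted_term_count(cake, "food", props), weighted_term_count(stores, "food", props))
```

Each "food" in the question property counts double, so the cake object accumulates more weight than the object whose only match is in its answer.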
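Specify a search vector
The vector component of a hybrid search can use a query string or a query vector. To use a query vector instead of the query string, provide both a query vector (for the vector search) and a query string (for the keyword search) in the query.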
query_vector = [-0.02] * 1536 # Some vector that is compatible with object vectors
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
vector=query_vector,
alpha=0.25,
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Risotto",
"question": "From the Italian word for rice, it's a rice dish cooked with broth & often grated cheese"
},
{
"answer": "arrabiata",
"question": "Italian for \"angry\", it describes a pasta sauce spiced up with plenty of chiles"
},
{
"answer": "Fettucine Alfredo",
"question": "Ribbon-shaped noodles, sweet butter, cream, parmesan cheese & black pepper make up this pasta dish"
}
]
}
}
}
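A query vector is only compatible with the object vectors if it has the same number of dimensions and comes from the same embedding model. The sketch below checks the dimension and computes cosine distance (1 minus cosine similarity), assuming the collection uses the cosine metric, which is Weaviate's default. It cannot check that the vector comes from the right model, and cosine_distance is a hypothetical helper.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity); ranges from 0 to 2."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vector = [-0.02] * 1536       # same shape as in the snippet above
same_direction = [-0.5] * 1536      # parallel vector: distance close to 0
opposite_direction = [0.02] * 1536  # anti-parallel vector: distance close to 2
print(cosine_distance(query_vector, same_direction))
print(cosine_distance(query_vector, opposite_direction))
```

Passing a vector with a different length raises ValueError instead of returning a meaningless distance.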
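Vector search parameters
Added in v1.25. Note that the hybrid threshold (max_vector_distance) was introduced later, in v1.26.3.
You can specify vector similarity search parameters, as in a near text or near vector search, such as group by and move to / move away. Use the max vector distance parameter to set the equivalent of a distance threshold for the vector search component.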
from weaviate.classes.query import HybridVector, Move, HybridFusion
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="California",
max_vector_distance=0.4, # Maximum threshold for the vector search component
vector=HybridVector.near_text(
query="large animal",
move_away=Move(force=0.5, concepts=["mammal", "terrestrial"]),
),
alpha=0.75,
limit=5,
)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Rhinoceros",
"points": 400,
"question": "The \"black\" species of this large horned mammal can grasp twigs with its upper lip"
},
{
"answer": "the hippopotamus",
"points": 400,
"question": "Close relative of the pig, though its name means \"river horse\""
},
{
"answer": "buffalo",
"points": 400,
"question": "Animal that was the main staple of the Plains Indians economy"
},
{
"answer": "California",
"points": 200,
"question": "Its state animal is the grizzly bear, & the state tree is a type of redwood"
},
{
"answer": "California",
"points": 200,
"question": "This western state sent its first refrigerated trainload of oranges back east February 14, 1886"
}
]
}
}
}
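Hybrid search thresholds
Added in v1.25. The only available search threshold is max vector distance, which sets the maximum allowed distance for the vector search component.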
from weaviate.classes.query import HybridVector, Move, HybridFusion
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="California",
max_vector_distance=0.4, # Maximum threshold for the vector search component
alpha=0.75,
limit=5,
)
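Conceptually, max_vector_distance prunes the candidates of the vector search component, while the keyword results are left alone. A minimal sketch with made-up distances follows; whether an object exactly at the threshold is kept is an assumption here, and within_threshold is a hypothetical name.

```python
def within_threshold(vector_hits: dict[str, float],
                     max_vector_distance: float) -> dict[str, float]:
    """Keep vector-search hits whose distance does not exceed the threshold."""
    # Assumption: the comparison is inclusive (<=).
    return {doc: d for doc, d in vector_hits.items() if d <= max_vector_distance}


vector_hits = {"Rhinoceros": 0.21, "buffalo": 0.35, "Risotto": 0.62}  # made-up distances
keyword_hits = {"California": 7.4}  # keyword results are not filtered by the threshold
print(within_threshold(vector_hits, max_vector_distance=0.4), keyword_hits)
```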
Group results
Added in v1.25. Define criteria to group search results.
from weaviate.classes.query import GroupBy

# Grouping parameters
group_by = GroupBy(
    prop="round",  # group by this property
    objects_per_group=3,  # maximum objects per group
    number_of_groups=2,  # maximum number of groups
)
# Query
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
    alpha=0.75,
    query="California",
    group_by=group_by
)
for grp_name, grp_content in response.groups.items():
    print(grp_name, grp_content.objects)
Example response
The response is like this:
'Jeopardy!'
'Double Jeopardy!'
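The grouping limits can be pictured as a greedy pass over the ranked results: open a new group for each new property value until number_of_groups groups exist, and stop adding to a group once it holds objects_per_group objects. This is a simplified sketch with hypothetical names and made-up data; Weaviate's own grouping logic may select and order groups differently.

```python
def group_results(objects: list[dict], prop: str,
                  objects_per_group: int, number_of_groups: int) -> dict[str, list[dict]]:
    """Greedily group best-first results by the value of `prop`."""
    groups: dict[str, list[dict]] = {}
    for obj in objects:
        key = obj[prop]
        if key not in groups:
            if len(groups) >= number_of_groups:
                continue  # group limit reached: skip objects from new groups
            groups[key] = []
        if len(groups[key]) < objects_per_group:
            groups[key].append(obj)
    return groups


ranked = [  # made-up results, best first
    {"answer": "a", "round": "Jeopardy!"},
    {"answer": "b", "round": "Double Jeopardy!"},
    {"answer": "c", "round": "Jeopardy!"},
    {"answer": "d", "round": "Final Jeopardy!"},
    {"answer": "e", "round": "Jeopardy!"},
    {"answer": "f", "round": "Jeopardy!"},
    {"answer": "g", "round": "Double Jeopardy!"},
]
for name, members in group_results(ranked, "round", 3, 2).items():
    print(name, [o["answer"] for o in members])
```

"Final Jeopardy!" is dropped because two groups already exist, and "f" is dropped because its group is already full.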
limit & offset
Use limit to set a fixed maximum number of objects to return.
Optionally, use offset to paginate the results.
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
limit=3,
offset=1
)
for o in response.objects:
    print(o.properties)
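limit and offset behave like a slice over the ranked result list, so offset=1 with limit=3 skips the top hit and returns the next three. The sketch below assumes the ranking is identical across requests; paginate is a hypothetical helper operating on a plain list.

```python
def paginate(ranked: list, limit: int, offset: int = 0) -> list:
    """Return at most `limit` results, starting after the first `offset`."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    return ranked[offset:offset + limit]


ranked = ["a closer grocer", "Famine", "Tofu", "cake", "honey"]
page_size = 2
for offset in range(0, len(ranked), page_size):
    print(offset, paginate(ranked, limit=page_size, offset=offset))
```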
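Limit result groups
To limit results to groups with similar distances from the query, use the autocut filter. When you use autocut with hybrid search, specify the relative score fusion ranking method.
Autocut requires relative score fusion because it uses the actual similarity scores to detect cutoff points. Do not use autocut with ranked fusion, because that fusion method relies on rank positions rather than similarity scores.
To learn more about the different fusion algorithms, see the search operators reference page.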
from weaviate.classes.query import HybridFusion
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
fusion_type=HybridFusion.RELATIVE_SCORE,
auto_limit=1
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "Guards",
"question": "Life, Security, Shin",
"_additional": {
"score": "0.75"
},
},
# ... trimmed for brevity
]
}
}
}
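Autocut looks for jumps in the scores of the result list and cuts the list after a given number of jumps (auto_limit). The sketch below is a deliberately simplified stand-in: it defines a jump as a drop between neighbouring scores that is larger than the average drop across the list. That definition is an assumption for illustration, not Weaviate's actual heuristic, and the scores are made up.

```python
def autocut(scores: list[float], auto_limit: int) -> int:
    """Return how many results to keep, cutting after `auto_limit` jumps.

    Simplified assumption: a jump is any drop between neighbouring scores
    that is larger than the average drop over the whole (best-first) list.
    """
    if len(scores) < 2 or auto_limit < 1:
        return len(scores)
    drops = [a - b for a, b in zip(scores, scores[1:])]
    average_drop = sum(drops) / len(drops)
    jumps = 0
    for i, drop in enumerate(drops):
        if drop > average_drop:
            jumps += 1
            if jumps == auto_limit:
                return i + 1  # keep everything before this jump
    return len(scores)


scores = [0.75, 0.74, 0.72, 0.40, 0.39, 0.10]  # made-up relative scores
print(autocut(scores, auto_limit=1), autocut(scores, auto_limit=2))
```

With auto_limit=1 the list is cut at the first large drop (after 0.72), which is the behavior the snippet above requests.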
Filter results
To narrow your search results, use a filter.
from weaviate.classes.query import Filter
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="food",
filters=Filter.by_property("round").equal("Double Jeopardy!"),
limit=3,
)
for o in response.objects:
    print(o.properties)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "food stores (supermarkets)",
"question": "This type of retail store sells more shampoo & makeup than any other",
"round": "Double Jeopardy!"
},
{
"answer": "Tofu",
"question": "A popular health food, this soybean curd is used to make a variety of dishes & an ice cream substitute",
"round": "Double Jeopardy!"
},
{
"answer": "gastronomy",
"question": "This word for the art & science of good eating goes back to Greek for \"belly\"",
"round": "Double Jeopardy!"
}
]
}
}
}
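Conceptually, the filter restricts which objects are eligible before the ranking and the limit are applied, so every returned object matches the filter. The sketch below models this with an equality check on plain dictionaries; the data, the scores, and the function name filter_then_rank are made up for illustration.

```python
def filter_then_rank(objects: list[dict], prop: str, value: str,
                     scores: dict[str, float], limit: int) -> list[dict]:
    """Keep objects whose `prop` equals `value`, then return the top `limit`."""
    eligible = [o for o in objects if o.get(prop) == value]
    eligible.sort(key=lambda o: scores[o["answer"]], reverse=True)
    return eligible[:limit]


objects = [
    {"answer": "a closer grocer", "round": "Jeopardy!"},
    {"answer": "Tofu", "round": "Double Jeopardy!"},
    {"answer": "gastronomy", "round": "Double Jeopardy!"},
    {"answer": "food stores (supermarkets)", "round": "Double Jeopardy!"},
]
scores = {"a closer grocer": 0.9, "Tofu": 0.7, "gastronomy": 0.5,
          "food stores (supermarkets)": 0.8}  # made-up hybrid scores
top = filter_then_rank(objects, "round", "Double Jeopardy!", scores, limit=3)
print([o["answer"] for o in top])
```

The highest-scoring object overall is excluded because its round does not match the filter value.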
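Tokenization
Weaviate converts filter terms into tokens. The default tokenization is word. The word tokenizer keeps alphanumeric characters, converts them to lowercase, and splits the text on whitespace and other non-alphanumeric characters. It converts the string "Test_domain_weaviate" into "test", "domain", and "weaviate".
For details and additional tokenization methods, see Tokenization.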
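The word tokenization described above can be approximated with a regular expression that lowercases the text and keeps runs of letters and digits, treating everything else (spaces, punctuation, underscores) as a separator. This is an approximation for illustration; Weaviate's tokenizer may treat some characters differently, and word_tokenize is a hypothetical name.

```python
import re


def word_tokenize(text: str) -> list[str]:
    """Approximate "word" tokenization: lowercase, keep letter/digit runs."""
    # [^\W_] matches any Unicode letter or digit but not the underscore.
    return re.findall(r"[^\W_]+", text.lower())


print(word_tokenize("Test_domain_weaviate"))
print(word_tokenize("Hello, (beautiful) world"))
```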
Related pages
- Connect to Weaviate
- API reference: Search operators # Hybrid
- About hybrid fusion algorithms.
- For tutorials, see Queries.
- To search with the GraphQL API, see GraphQL API.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
