Search Documents
APIs to search documents in a Collection in OramaCore.
APIs
API Key type: read_api_key. Safe to expose to the public.
To search for documents in a collection, you can use the following API:
curl -X POST \
  'http://localhost:8080/v1/collections/{COLLECTION_ID}/search?api-key=<read_api_key>' \
  -H 'Content-Type: application/json' \
  -d '{ "term": "The quick brown fox" }'
The API will return a list of documents that match the search term. The documents will be sorted by relevance, with the most relevant documents appearing first.
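The same request can be sent from JavaScript. The sketch below relies only on the endpoint, the api-key query parameter, and the JSON body shown in the curl command. The helper names buildSearchRequest and search are hypothetical (they are not part of an official client), and the code assumes the server replies with a JSON body.

```javascript
// Build the URL and fetch options for the search endpoint shown above.
// `buildSearchRequest` is a hypothetical helper, not an OramaCore SDK function.
function buildSearchRequest(baseUrl, collectionId, readApiKey, body) {
  const url = new URL(
    `/v1/collections/${encodeURIComponent(collectionId)}/search`,
    baseUrl
  );
  url.searchParams.set('api-key', readApiKey);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Send the request with the built-in fetch (Node.js 18+ or a browser).
// ASSUMPTION: the response body is JSON.
async function search(baseUrl, collectionId, readApiKey, body) {
  const { url, options } = buildSearchRequest(baseUrl, collectionId, readApiKey, body);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP status ${response.status}`);
  }
  return response.json();
}
```

Separating request construction from the network call makes it possible to inspect the URL and body without a running server.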
Search Parameters
When performing a search, you can use the following parameters to customize the results:
Parameter | Description | Default |
---|---|---|
term | The search term. | - |
mode | The search mode. Can be fulltext, vector, or hybrid. | fulltext |
limit | The maximum number of documents to return. | 10 |
offset | The number of documents to skip. | 0 |
properties | The properties to search in. Should be an array of strings (for example: ["title", "description", "author.name"]). | All properties |
facets | A list of facets to return. See the Facets section. | - |
where | A filter to apply to the search results. See the Where Filters section. | - |
threshold | The percentage of matching terms required to return a document, expressed as a number between 0 and 1. See Understanding the Orama Threshold Property. | 0 |
exact | Whether to use exact matching. | false |
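As a quick illustration of the table, the sketch below merges caller-supplied parameters with the documented defaults and derives limit and offset for pagination. Both helpers are hypothetical, and the code assumes that omitting a parameter is equivalent to sending its default value.

```javascript
// Defaults copied from the parameter table above.
const SEARCH_DEFAULTS = {
  mode: 'fulltext',
  limit: 10,
  offset: 0,
  threshold: 0,
  exact: false,
};

// Hypothetical helper: shows the effective parameters once defaults are applied.
// ASSUMPTION: an omitted parameter behaves like its documented default.
function withSearchDefaults(params) {
  return { ...SEARCH_DEFAULTS, ...params };
}

// Hypothetical helper: request body for a 1-based page number,
// expressed with the limit and offset parameters.
function pageQuery(params, page, pageSize) {
  return { ...params, limit: pageSize, offset: (page - 1) * pageSize };
}
```

For example, pageQuery({ term: 'fox' }, 3, 10) skips the first 20 documents and asks for the next 10.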
Where Filters
At index time, OramaCore will index different datatypes in different ways. For example, a string will be indexed differently than a number or a boolean.
When performing a search, you can use the where parameter to filter the search results based on the datatype of the property.
Filtering Strings
OramaCore does not support filtering strings with more than 25 ASCII characters.
To filter strings, you can use the following API:
{
  "term": "John Doe",
  "where": {
    "job": "Software Engineer"
  }
}

const results = await collection.search({
  term: 'John Doe',
  where: {
    job: 'Software Engineer'
  }
})
Filtering Numbers
To filter numbers, you can use the following operators:
Operator | Description | Example |
---|---|---|
eq | Equal to | {"where": {"age": {"eq": 25}}} |
lt | Less than | {"where": {"age": {"lt": 25}}} |
lte | Less than or equal to | {"where": {"age": {"lte": 25}}} |
gt | Greater than | {"where": {"age": {"gt": 25}}} |
gte | Greater than or equal to | {"where": {"age": {"gte": 25}}} |
between | Between two values | {"where": {"age": {"between": [20, 30]}}} |
So a full query complete with a where filter might look like this:
{
  "term": "John Doe",
  "where": {
    "age": {
      "gte": 25
    }
  }
}

const results = await collection.search({
  term: 'John Doe',
  where: {
    age: {
      gte: 25
    }
  }
})
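To make the meaning of each numeric operator concrete, here is a small local model of the operator table. It is an illustration only and not OramaCore's implementation. Two things are assumptions: that between includes both bounds, and that a filter with several operators requires all of them to hold.

```javascript
// Local illustration of the numeric operators listed in the table above.
// ASSUMPTION: `between` is inclusive on both ends.
const NUMBER_OPERATORS = {
  eq: (value, target) => value === target,
  lt: (value, target) => value < target,
  lte: (value, target) => value <= target,
  gt: (value, target) => value > target,
  gte: (value, target) => value >= target,
  between: (value, [min, max]) => value >= min && value <= max,
};

// Returns true when `value` satisfies every operator present in `filter`.
// ASSUMPTION: multiple operators in one filter are combined with AND.
function matchesNumberFilter(value, filter) {
  return Object.entries(filter).every(([operator, operand]) => {
    const test = NUMBER_OPERATORS[operator];
    if (!test) throw new Error(`Unknown operator: ${operator}`);
    return test(value, operand);
  });
}
```

With this model, matchesNumberFilter(25, { gte: 25 }) is true and matchesNumberFilter(24, { gte: 25 }) is false.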
Filtering Booleans
To filter booleans, pass true or false as the value of the property:
Value | Description | Example |
---|---|---|
true | True | {"where": {"is_active": true}} |
false | False | {"where": {"is_active": false}} |
So a full query complete with a where filter might look like this:
{
  "term": "John Doe",
  "where": {
    "is_active": true
  }
}

const results = await collection.search({
  term: 'John Doe',
  where: {
    is_active: true
  }
})
Facets
OramaCore supports faceted search. You can use the facets parameter to get a list of facets for a given property.
Numeric Facets
The facets parameter can be used to get numeric facets. For example, to get a histogram of the price property, you can use the following query:
{
  "term": "Bluetooth Airbuds",
  "facets": {
    "price": {
      "ranges": [
        { "from": 0, "to": 50 },
        { "from": 50, "to": 100 },
        { "from": 100, "to": 200 },
        { "from": 200, "to": 500 },
        { "from": 500, "to": 1000 },
        { "from": 1000 }
      ]
    }
  }
}

const results = await collection.search({
  term: 'Bluetooth Airbuds',
  facets: {
    price: {
      ranges: [
        { from: 0, to: 50 },
        { from: 50, to: 100 },
        { from: 100, to: 200 },
        { from: 200, to: 500 },
        { from: 500, to: 1000 },
        { from: 1000 }
      ]
    }
  }
})
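Writing contiguous ranges by hand is repetitive, so the ranges array from the query above can also be generated from a list of breakpoints. The helper buildRanges below is hypothetical. It only constructs the request payload and makes no claim about how OramaCore treats values that fall exactly on a boundary.

```javascript
// Hypothetical helper: turns sorted breakpoints into a `ranges` array.
// Each pair of neighbouring breakpoints becomes { from, to }; the last
// breakpoint becomes an open-ended { from } range.
function buildRanges(breakpoints) {
  return breakpoints.map((from, index) => {
    const to = breakpoints[index + 1];
    return to === undefined ? { from } : { from, to };
  });
}

// Reproduces the price ranges used in the query above.
const priceFacet = { ranges: buildRanges([0, 50, 100, 200, 500, 1000]) };
```

The resulting priceFacet object can be passed as the value of facets.price in a search request.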
Boolean Facets
The facets parameter can also be used to get boolean facets. For example, to get a list of values for the available property, you can use the following query:
{
  "term": "Bluetooth Airbuds",
  "facets": {
    "available": {
      "true": true,
      "false": false
    }
  }
}

const results = await collection.search({
  term: 'Bluetooth Airbuds',
  facets: {
    available: {
      true: true,
      false: false
    }
  }
})
Understanding the Orama Threshold Property
The threshold property in Orama controls how many of the search terms a document must match to be included in the results of a search operation. It helps filter out potentially irrelevant results, especially with long search queries.
Example Data
Let's consider these four documents:
[
  { "title": "Blue t-shirt, slim fit" },
  { "title": "Blue t-shirt, regular fit" },
  { "title": "Red t-shirt, slim fit" },
  { "title": "Red t-shirt, oversize fit" }
]
Search Behavior Without Threshold
If we search for regular fit:
{
  "term": "regular fit"
}
OramaCore will return:
{
  "count": 4, // 4 results!
  "hits": [...],
  "elapsed": {...}
}
Why four results? While only one document contains the exact phrase "regular fit", OramaCore returns all documents that match any of the search terms. In this case, all documents contain the word "fit", so they're all included in the results.
How Threshold Works
The threshold property is a number between 0 and 1 representing the percentage of matching terms required for a document to be included in results:
- threshold: 0 (default) - Returns all documents matching ANY search term
- threshold: 1 - Returns only documents matching ALL search terms
- threshold: 0.5 - Returns documents with at least 50% of search terms
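The list above can be modeled with a few lines of JavaScript. This is a deliberately simplified sketch of the rule "keep a document when the fraction of matched search terms reaches the threshold"; it is not OramaCore's ranking algorithm and ignores scoring entirely. The tokenizer (lowercase, split on non-alphanumeric characters) and the requirement of at least one matching term are assumptions, and filterByThreshold is a hypothetical name.

```javascript
// ASSUMPTION: terms are lowercased and split on non-alphanumeric characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Simplified model of the threshold list above, NOT OramaCore's algorithm.
// A document is kept when it matches at least one search term and the
// fraction of matched search terms is at least `threshold`.
function filterByThreshold(documents, term, threshold = 0) {
  const queryTerms = [...new Set(tokenize(term))];
  return documents.filter((doc) => {
    const docTerms = new Set(tokenize(doc.title));
    const matched = queryTerms.filter((t) => docTerms.has(t)).length;
    return matched > 0 && matched / queryTerms.length >= threshold;
  });
}

// The four example documents from this section.
const documents = [
  { title: 'Blue t-shirt, slim fit' },
  { title: 'Blue t-shirt, regular fit' },
  { title: 'Red t-shirt, slim fit' },
  { title: 'Red t-shirt, oversize fit' },
];
```

Under this model, filterByThreshold(documents, 'slim fit', 1) keeps the two "slim fit" documents, while filterByThreshold(documents, 'slim fit') keeps all four because every title contains "fit".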
Examples
With threshold: 0 (default)
{
  "term": "slim fit"
}
Returns all documents containing either "slim" OR "fit" (all 4 documents in our example).
With threshold: 1
{
  "term": "slim fit",
  "threshold": 1
}
Returns only documents containing BOTH "slim" AND "fit" (only the 2 documents with "slim fit").
With threshold: 0.5
{
  "term": "slim fit",
  "threshold": 0.5
}
Prioritizes documents containing both "slim" and "fit", then returns 50% of documents containing either term.
Real-World Application
For large document collections (e.g., 1 million documents), using an appropriate threshold becomes crucial. Long search queries like "red t-shirt with long sleeves and a motorbike printed on the front" could match too many irrelevant documents without a proper threshold setting.