Azure Cognitive Search Indexers for Document Processing
Azure Cognitive Search is more than a search engine: its indexers form a document processing pipeline with built-in AI enrichment.
The Indexer Pipeline
Data Source → Skillset (AI Enrichment) → Index → Search Queries
Creating an Indexer
{
  "name": "document-indexer",
  "dataSourceName": "blob-documents",
  "targetIndexName": "documents-index",
  "skillsetName": "document-skills",
  "schedule": {
    "interval": "PT2H"
  },
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default"
    }
  },
  "fieldMappings": [
    {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id", "mappingFunction": {"name": "base64Encode"}},
    {"sourceFieldName": "metadata_storage_name", "targetFieldName": "filename"}
  ],
  "outputFieldMappings": [
    {"sourceFieldName": "/document/content", "targetFieldName": "content"},
    {"sourceFieldName": "/document/keyphrases", "targetFieldName": "keyphrases"},
    {"sourceFieldName": "/document/entities", "targetFieldName": "entities"}
  ]
}
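The definition above still has to be sent to the search service. A minimal sketch of that step, assuming the REST endpoint `PUT /indexers/{name}` with an `api-key` header; the service name `my-search-service`, the `<admin-key>` placeholder, the API version, and the helper name are illustrative assumptions, not values from this post. The request is only built, not sent, so it can be inspected safely.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumption: use the version your service supports


def build_create_indexer_request(service_name, admin_key, indexer_definition):
    """Build (but do not send) a PUT request that creates or updates an indexer."""
    name = indexer_definition["name"]
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{name}?api-version={API_VERSION}"
    )
    body = json.dumps(indexer_definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )


# Trimmed copy of the indexer definition; the mappings are omitted for brevity.
indexer = {
    "name": "document-indexer",
    "dataSourceName": "blob-documents",
    "targetIndexName": "documents-index",
    "skillsetName": "document-skills",
    "schedule": {"interval": "PT2H"},
}

# Hypothetical service name and key placeholder.
request = build_create_indexer_request("my-search-service", "<admin-key>", indexer)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The data source, index, and skillset that the indexer names must already exist on the service before this request can succeed.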
AI Skillset
{
  "name": "document-skills",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "inputs": [{"name": "text", "source": "/document/content"}],
      "outputs": [{"name": "keyPhrases", "targetName": "keyphrases"}]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
      "categories": ["Organization", "Person", "Location"],
      "inputs": [{"name": "text", "source": "/document/content"}],
      "outputs": [{"name": "entities", "targetName": "entities"}]
    }
  ]
}
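Each skill writes its outputs under its context (`/document` when none is set), using `targetName` as the node name. That is why the indexer's `outputFieldMappings` read from `/document/keyphrases` and `/document/entities`. The sketch below is a local sanity check, not part of any Azure SDK: the helper names are hypothetical, and it compares paths as plain strings without handling collection paths such as `/document/pages/*`.

```python
def enrichment_paths(skillset):
    """Return the enrichment-tree paths that the skills' outputs would create."""
    paths = set()
    for skill in skillset["skills"]:
        # A skill writes its outputs under its context, which defaults to /document.
        context = skill.get("context", "/document")
        for output in skill["outputs"]:
            # targetName is optional; the output's own name is used when absent.
            node = output.get("targetName", output["name"])
            paths.add(f"{context}/{node}")
    return paths


def unresolved_mappings(indexer, skillset):
    """List outputFieldMappings sources that no skill in the skillset produces."""
    # /document/content comes from document cracking, not from a skill.
    known = enrichment_paths(skillset) | {"/document/content"}
    return [
        mapping["sourceFieldName"]
        for mapping in indexer.get("outputFieldMappings", [])
        if mapping["sourceFieldName"] not in known
    ]


# Only the parts of the two definitions that matter for this check.
skillset = {
    "skills": [
        {"outputs": [{"name": "keyPhrases", "targetName": "keyphrases"}]},
        {"outputs": [{"name": "entities", "targetName": "entities"}]},
    ]
}
indexer = {
    "outputFieldMappings": [
        {"sourceFieldName": "/document/content", "targetFieldName": "content"},
        {"sourceFieldName": "/document/keyphrases", "targetFieldName": "keyphrases"},
        {"sourceFieldName": "/document/entities", "targetFieldName": "entities"},
    ]
}
# A deliberately misspelled source path, to show what the check reports.
broken = {
    "outputFieldMappings": [
        {"sourceFieldName": "/document/key_phrases", "targetFieldName": "keyphrases"}
    ]
}

print(unresolved_mappings(indexer, skillset))
print(unresolved_mappings(broken, skillset))
```

Running a check like this before deploying catches a mismatch between a skill's `targetName` and a mapping's `sourceFieldName` before it shows up as an empty field in the index.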
With this in place, the indexer runs every two hours, picks up new or changed documents in blob storage, and indexes them along with the AI-extracted key phrases and entities.
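The last stage of the pipeline is querying the index. A minimal sketch, assuming the Search Documents REST endpoint (`POST /indexes/{index}/docs/search`) and that the `filename` and `keyphrases` fields in `documents-index` are retrievable; the service name, the `<query-key>` placeholder, the API version, the search text, and the helper name are illustrative assumptions. As written, the request is built but not sent.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumption: use the version your service supports


def build_search_request(service_name, index_name, query_key, search_text):
    """Build (but do not send) a POST request for a full-text search query."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    )
    payload = {
        "search": search_text,
        # Field names taken from the indexer's field mappings.
        "select": "filename,keyphrases",
        "top": 5,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": query_key},
    )


# Hypothetical service name, key placeholder, and search text.
search_request = build_search_request(
    "my-search-service", "documents-index", "<query-key>", "quarterly report"
)
print(search_request.get_method(), search_request.full_url)
# To actually send it: urllib.request.urlopen(search_request)
```

A query key is enough for read-only requests like this one; the admin key is only needed for creating or changing indexers, skillsets, and indexes.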