In part 1 of this two-part series, we explored the key success factors for a modern media operations setup (or service provider), including leveraging the latest observability and AI/ML capabilities, along with shift-left by design. An ideal, futuristic media operations pipeline should therefore embrace the following design tenets:
- Not just data but also contextual insights - Filter out noise and analyze data in context to understand connections and drive better insights,
- Integrated with AI - Predict, prevent, and assist through a combination of collaborating AI tools and models,
- Automation - Intelligently automate discovery of data, APIs, workflows, and key business processes,
- Collaborative pipeline - Break down silos between data, tools, and teams with a common source of data and intelligent insights,
- Secure - The criticality of the data demands the utmost level of security; such critical data and AI models are recommended for on-prem runs due to legal and regulatory restrictions, and
- Scalability - Modular components that integrate easily and scale individually.
Let us envision this through a detailed use case and see how an enhanced media operations framework that combines AI with observability performs in such a scenario:
Background:
A new web series has been launched on a subscription-based OTT platform. Immediately after launch, a steady increase in video playback failure alerts begins to show up on the media operations SRO dashboards. The SRO and support teams are overwhelmed with alerts from across probes and cannot quickly decide on a triage path, worsening an already poor customer experience and damaging the OTT service brand.
How an AI-enhanced media operations monitoring setup helps:
The system is driven by observability insights integrated with multiple AI agents that collaborate to identify what is going wrong and provide insights on how to address it. It also has a command center dashboard where the SREs/SROs can view investigations and insights from the system.

In Action: Command Center Agent and Agents A, B, and C
Command Center Agent - This is an AI agent that monitors events related to critical business flows/scenarios and initiates investigations through a group of individual AI agents. When it identifies the pattern described in the background above, it initiates an investigation via Agent A.
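As an illustration only, the trigger logic of such an agent could be sketched as follows. The names `is_sustained_increase`, `CommandCenter`, and `start_investigation`, the per-interval alert counts, and the "three consecutive rises" rule are all assumptions made for this sketch, not part of any specific product.

```python
from dataclasses import dataclass, field
from typing import Callable


def is_sustained_increase(counts: list[int], min_intervals: int = 3) -> bool:
    """Return True when each of the last `min_intervals` steps rose over the previous one."""
    if len(counts) < min_intervals + 1:
        return False
    recent = counts[-(min_intervals + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


@dataclass
class CommandCenter:
    # Hypothetical hook that hands the business flow over to Agent A.
    start_investigation: Callable[[str], None]
    opened: list[str] = field(default_factory=list)

    def on_metric(self, flow: str, counts: list[int]) -> bool:
        """Open at most one investigation per flow when alerts keep rising."""
        if flow in self.opened or not is_sustained_increase(counts):
            return False
        self.opened.append(flow)
        self.start_investigation(flow)
        return True
```

In a real setup the simple rising-count rule would likely be replaced by the anomaly detection of the observability platform; the point here is only that the Command Center Agent decides when to start an investigation and avoids opening duplicates.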
Agent A - App/Device observability insights with AI analysis and next action:
The app/device observability platform provides insights on the patterns and demographics of anomalies in the failure analytics trends. A custom LLM agent reviews all the analytics data and insights provided by one or more app/device observability tools/platforms and determines correlations and probable next steps.
Agent analysis: More failures are reported when:
- The user plays back a specific content/title,
- The user watches the content for more than XX duration before the error occurs,
- The user tried to subscribe to the pack before watching the failed content, or
- The user is in a particular region and uses specific device type(s).
Next Step: Playback failures appear to be occurring for specific content. Agent A shares data with the Command Center Agent, Agent B, and Agent C to initiate an investigation into failures in the user/subscription, CDN, and payment gateway integrations.
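The kind of segmentation behind Agent A's analysis can be sketched in a few lines. This is a minimal illustration, assuming each playback session is a dictionary with a boolean `failed` field plus dimension fields such as `title` or `device`; those field names, the function names, and the `lift` threshold are hypothetical.

```python
from collections import defaultdict


def failure_rate_by(sessions, dimension):
    """Failure rate for each value of one dimension (e.g. title or device)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for session in sessions:
        key = session[dimension]
        totals[key] += 1
        if session["failed"]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


def elevated_segments(sessions, dimensions, lift=2.0):
    """Flag segments whose failure rate is at least `lift` times the overall rate."""
    if not sessions:
        return []
    overall = sum(1 for s in sessions if s["failed"]) / len(sessions)
    if overall == 0:
        return []
    flagged = []
    for dimension in dimensions:
        for value, rate in failure_rate_by(sessions, dimension).items():
            if rate >= lift * overall:
                flagged.append((dimension, value, round(rate, 2)))
    return flagged
```

A call such as `elevated_segments(sessions, ["title", "device", "region"])` would return the segments that stand out, which an LLM agent could then summarize and pass on as context for the next investigation step.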
Agent B - Alert monitoring tool data with AI analysis and next action:
In a typical media operations pipeline, an alert monitoring toolset receives alerts from various probes that monitor the overall hardware infrastructure and software ecosystem. A custom LLM agent reviews all the alert data and insights provided by the alert monitoring tool/platform and determines correlations and probable next steps.
Agent analysis: Based on the insights handed over by Agent A (i.e., demographics, timeline, and data), Agent B analyzes the alerts and their data to determine probable relationships and patterns and to gather further insights. This includes:
- Alerts received from user entitlement and subscription components during the period from the same region,
- Alerts from video encoding/transcoding and CDN delivery components during that period, and
- Payment failure alerts observed for the affected users and region during that time.
Next Step: Playback failures of specific content appear to be occurring for users who are trying to purchase a subscription to get video access. Agent B shares data and insights back to the Command Center Agent and Agent C.
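To make the scoping step concrete, here is a small sketch of how alerts could be narrowed to the region and time window handed over by Agent A before any LLM analysis. The alert fields (`region`, `time`, `component`) and both function names are assumptions for this example, not the schema of a particular monitoring tool.

```python
from collections import Counter
from datetime import datetime


def alerts_in_scope(alerts, region, start, end, components):
    """Keep alerts from the given region, time window, and components of interest."""
    return [
        alert
        for alert in alerts
        if alert["region"] == region
        and start <= alert["time"] <= end
        and alert["component"] in components
    ]


def count_by_component(alerts):
    """Count alerts per component so the busiest components surface first."""
    return Counter(alert["component"] for alert in alerts)
```

Filtering first keeps the volume of alert data passed to the LLM agent small and focused on the suspected window, which is one way to cut the noise that overwhelms the SRO team.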
Agent C - Subscription component and payment gateway data with AI analysis and next action:
A custom LLM agent reviews all the logs and tracing data from the subscription, entitlement, and payment integration components and determines correlations and probable next steps.
Agent analysis: Based on the insights handed over by Agent B, Agent C analyzes the logs and traces to find any errors or exceptions in the flow and to determine probable relationships, root causes, and patterns. It observes that:
- The payment partner gateway reported errors for a few users trying to use a specific type of payment method to purchase a subscription, and that
- Errors were observed in the entitlement logs for the newly launched content due to an incorrect content_Id being used on the TV platform.
Next Step: Playback failures appear to be caused by coincidental payment failures and entitlement issues for the new content on the TV platform. Agent C shares data and insights back to the Command Center Agent.
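A rough sketch of the log aggregation that could feed Agent C is shown below. It assumes structured log records with `level` and `component` fields plus optional fields such as `payment_method` and `platform`; these names and the `summarize_errors` helper are illustrative assumptions rather than a real log schema.

```python
from collections import Counter


def summarize_errors(records, group_fields):
    """Count ERROR-level records per component and per chosen field values.

    Returns (key, count) pairs sorted by count, highest first, where key is
    (component, value of each field in group_fields); missing fields are None.
    """
    counts = Counter()
    for record in records:
        if record.get("level") != "ERROR":
            continue
        key = (record["component"],) + tuple(record.get(f) for f in group_fields)
        counts[key] += 1
    return counts.most_common()
```

Grouping errors this way would make two separate clusters visible, for example one keyed by payment method and one keyed by platform, which matches the two independent findings described above.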
The Command Center Agent then provides insights and guidance to the SRO and other teams on further action to resolve the issue and minimize the impact on customer experience.
Looking Ahead:
AI agents and advanced observability platforms are changing the playing field across industries by enhancing efficiency, optimizing workflows, and driving better customer experiences. By embracing them for media operations use cases we can not only increase efficiency but also unlock new growth opportunities, as illustrated in the use case here.
Interested? Let’s start a conversation.