Lab 3 - Baseline Dashboard

Task Goal

In this lab, you'll explore the AI-Driven Baseline Dashboard, which gives you visibility into network performance across buildings. It helps you identify buildings that are either problematic (based on AI-Driven anomaly detection issues) or outliers with respect to one or more KPIs, and it enables several layers of comparison so that you can understand in depth how your network is performing.

This lab task will guide you through the AI-Driven Baseline Dashboard workflow:

get an overview of the network performance across all buildings in the network, using the beeswarm visualization
identify interesting buildings either based on the AI-Driven anomaly detection issues, or by quickly identifying outliers
observe the network performance by comparing expected vs. actual behavior across multiple KPIs and across the different WLANs in use in each building
drill into each entity and KPI to understand the impact on clients and Access Points

Benefits

The Baseline Dashboard enables the exploration and analysis of the network performance by comparing the expected and the actual Wi-Fi KPI values, across different buildings and SSIDs, and over time.

This dashboard is built using the AI-driven baselines (the same ones that are computed and used for AI-driven anomaly detection), extending visibility even when no anomalies are detected.

As with AI-Driven anomaly detection, no manual configuration is needed to use this feature. The only requirements are that your network devices are managed by the Cisco Catalyst Center appliance and provisioned to export network telemetry data to it, and that the Cisco AI Analytics service is enabled, as explained in Lab 8 - Service Operations.

In this lab we provide you with a pre-configured system so that you can immediately explore the results, as it usually takes up to one week for baselines to become available following the service activation, due to the minimum data required for the algorithms to learn the network behavior.

Use case workflow

The Baseline Dashboard is a workflow designed to guide the user through network data exploration (currently covering onboarding-related KPIs and their respective baselines):

Identifying good/bad/busy buildings at a glance
Seeing the evolution of expected vs. actual KPI values over time for an SSID within a building
Comparing baselines for different KPIs, related to the same SSID within the selected building
For instance, this view lets you compare onboarding time against onboarding or authentication failures over the same time period, for the same SSID.
Comparing baselines for the same KPI, related to different SSIDs within the selected building
See what the normal ranges for different KPIs look like within a building, across different SSIDs; when an issue is raised, see what SSIDs were affected at the same time.
Performing a deeper analysis by using the detailed view.

Landing page

It's time to try this out. Follow the steps below and explore the Baseline Dashboard yourself on the lab setup:

Open the Baseline Dashboard page:
Menu > Assurance > AI Network Analytics > Baselines
Follow the steps below.

Baseline Dashboard - Menu item

Time selection

By default, the view shows the last 24 hours; you can expand the time range using the Custom Range selector:

Building overview and selection

Get an overview of all the buildings, in order to select an interesting building to investigate.

The default visualization is a beeswarm plot, with the following features:

each circle represents a building
the circle size represents the average client count in that building over the selected time period (you can easily spot the busiest buildings here)
the circle is red if there has been at least one AI-Driven issue in that building during the selected time period; otherwise it's blue
the position of the circle on the plot x-axis represents the average of the selected KPI (Onboarding time by default). This lets you quickly spot buildings that usually perform well or poorly with respect to the selected KPI.
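The encoding described above can be sketched in a few lines of Python. This is only an illustration of the mapping from building data to circle attributes, not the dashboard's implementation: the `BuildingSamples` structure, its field names, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BuildingSamples:
    """Hypothetical per-building input for the selected time range."""
    name: str
    client_counts: list[float]        # client count samples
    onboarding_times_s: list[float]   # selected KPI samples, in seconds
    ai_issue_count: int               # AI-Driven issues raised in the range


def beeswarm_point(b: BuildingSamples) -> dict:
    """Map one building to the visual attributes of its circle."""
    return {
        "name": b.name,
        "x": mean(b.onboarding_times_s),   # position on the x-axis
        "size": mean(b.client_counts),     # circle size
        "color": "red" if b.ai_issue_count >= 1 else "blue",
    }


# Invented sample values, for illustration only.
buildings = [
    BuildingSamples("LONDON 1", [400, 420, 380], [1.5, 1.8, 1.6], 1),
    BuildingSamples("SAN JOSE", [150, 160, 140], [18.0, 22.0, 20.0], 0),
]
points = [beeswarm_point(b) for b in buildings]
for p in points:
    print(p)
```

A single issue in the time range is enough to turn a circle red, which is why a red circle says nothing about how severe or how long-lasting the problem was.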

Baseline Dashboard Beeswarm view

Now, what makes a building "interesting"?

There's no unique answer to this, but these are some examples:

A busy building on the left-hand side of the plot (typically good performance) is marked in red.
An example of this is the LONDON 1 building, which you can find on the lab setup beeswarm:
- The building's position on the beeswarm indicates that the average onboarding time during the selected time range is below 2 seconds
- The red color indicates that at least one AI-Driven issue was reported there

Baseline Dashboard - Building with issues

A building on the right-hand side of the plot (typically having bad performance) is marked in blue:
A building matching these conditions is worth analyzing: the bad performance is probably so persistent that the AI/ML baselining has learned it as normal behavior for this building. It's important to find out more, as a building of this type may require some optimization.
The beeswarm makes it easy to identify such buildings as they usually stand out as outliers on the right-hand side of the beeswarm chart, like the SAN JOSE building you can find on the lab setup:

Baseline Dashboard - Building no issues outlier

The beeswarm chart is very effective at providing an overview of all buildings in the network, even for deployments with a large number of buildings.

If, instead, you know in advance which building you want to analyze (e.g., you received complaints from users), you can also use the map or table view to quickly select your building of interest.
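The two "interesting building" patterns described above are spotted visually in the dashboard, but the same reasoning can be written down as a small filter. The sketch below is only an illustration: the point dictionaries, the `kpi_threshold_s` and `busy_size` cut-offs, and the EXAMPLE 3 building are assumptions made for this example.

```python
def interesting_buildings(points, kpi_threshold_s, busy_size):
    """Return (name, reason) pairs for the two patterns described above.

    kpi_threshold_s and busy_size are hypothetical cut-offs chosen by the
    reader; the dashboard itself relies on visual inspection.
    """
    flagged = []
    for p in points:
        good_kpi = p["x"] < kpi_threshold_s
        if p["color"] == "red" and good_kpi and p["size"] >= busy_size:
            flagged.append((p["name"], "busy, usually good KPI, but has AI-Driven issues"))
        elif p["color"] == "blue" and not good_kpi:
            flagged.append((p["name"], "poor KPI with no AI-Driven issues"))
    return flagged


# Invented values: x is the average onboarding time in seconds,
# size is the average client count.
points = [
    {"name": "LONDON 1", "x": 1.6, "size": 400, "color": "red"},
    {"name": "SAN JOSE", "x": 20.0, "size": 150, "color": "blue"},
    {"name": "EXAMPLE 3", "x": 2.1, "size": 80, "color": "blue"},
]
flagged = interesting_buildings(points, kpi_threshold_s=10.0, busy_size=300)
for name, reason in flagged:
    print(f"{name}: {reason}")
```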

Building baseline view

What building did you end up choosing first?

Building with issues

Let's start with the LONDON 1 building, which has an issue.

After you click on its red circle on the beeswarm, you'll be presented with the baseline view for this building, showing all the onboarding KPIs for one SSID.

The pre-selection of KPIs, SSID and WLC can be adjusted using the drop down menus at the top:

Baseline Building KPI SSID WLC selection

Depending on the selections, you'll be able to explore the baselines in different ways, for instance:

Single SSID:
You can observe how the predicted and the actual KPIs evolve over time for a given SSID.
See when AI-driven issues were reported and quickly access the issue details.
When issues are reported, quickly identify if anomalies were found on a single or multiple KPIs.

Baseline Dashboard - Default baseline view, 1 SSID with issue

Multiple SSIDs:
Even within a single building, the "normal" behavior for different SSIDs can be very different, for instance due to different security policies or different numbers of clients and client types.
Adding SSIDs from the drop-down menu allows you to compare the predicted range and actual KPI values across SSIDs.
This type of view also shows why using static thresholds would be difficult or impossible.
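To make the point about static thresholds concrete, here is a minimal sketch comparing a single fixed threshold with per-SSID expected ranges. The ranges and sample values are invented for illustration and are not read from the lab setup; the function names are hypothetical.

```python
# Hypothetical "normal" onboarding-time ranges in seconds, per SSID.
baseline_ranges = {
    "PseudoCo-Guest": (1.5, 2.0),
    "PseudoCo-Corp": (18.0, 25.0),
}


def static_alert(value_s: float, threshold_s: float) -> bool:
    """One fixed threshold applied to every SSID."""
    return value_s > threshold_s


def baseline_alert(ssid: str, value_s: float) -> bool:
    """Alert only when the value leaves the range expected for that SSID."""
    low, high = baseline_ranges[ssid]
    return not (low <= value_s <= high)


# Guest is three times slower than usual; Corp is at its usual level.
samples = [("PseudoCo-Guest", 6.0), ("PseudoCo-Corp", 21.0)]

static_5s = [static_alert(v, 5.0) for _, v in samples]
static_30s = [static_alert(v, 30.0) for _, v in samples]
per_ssid = [baseline_alert(s, v) for s, v in samples]
print(static_5s, static_30s, per_ssid)
```

With these numbers, a 5-second threshold also alerts on the Corp SSID's routine behavior, while a 30-second threshold misses the Guest degradation entirely. Only the per-SSID ranges separate the two cases.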

You can add a second SSID from the selector at the top of the page.
Click on the SSID menu and then select the PseudoCo-Corp SSID:

The UI will now update to show all the selected KPIs for both SSIDs, where you can see how the issue affected only the PseudoCo-Guest SSID, while the PseudoCo-Corp one behaved normally during the same time period.

Baseline Dashboard - Baseline view, 2 SSIDs

Building with no issues

Let's now check the SAN JOSE building: it didn't have any issues, but its position on the beeswarm indicates that the onboarding time there is usually very high.

Click on the Network Overview link on the top-left of the page, to return to the beeswarm, then select the SAN JOSE building:

Let's add the PseudoCo-Corp SSID to the baseline view, like we did before:

The resulting view lets you compare the behavior of both SSIDs.

Baseline Dashboard - Baseline view, 2 SSIDs no issues

While no AI-Driven issue is reported, note the difference in chart scale between the two SSIDs:

the PseudoCo-Guest SSID onboarding time usually ranges between 1.5 and 2 seconds, which is pretty good;
the PseudoCo-Corp SSID, instead, is almost constantly above 20 seconds; the baseline shows this as normal because these values are common for this combination of building and SSID.

Baseline Dashboard - Compare KPI baseline range

KPI detailed view

The baselines presented on the detailed view are computed for each entity, as seen on the AI-driven aggregations table for Onboarding and Roaming issues, which means that in most cases the computation makes use of data aggregated by SSID and building.

Now, we want to know more about what's happening at the SAN JOSE building, on the PseudoCo-Corp SSID.

Click on the View Details link next to the SSID name, on the Onboarding Time KPI section:

The Detailed view lets you go from the aggregated view to a more detailed one, including information such as:

Client count timeseries
Sankey diagram showing the breakdown of onboardings allowing to compare the distribution of onboarding attempts by:
time range (for duration-related KPIs)
building floors
client types
AP table, with onboarding-related KPIs and direct links to the AP360 page of each AP for further analysis

Note

The client type info is provided by the device classification feature on the Wireless LAN Controller (WLC): devices with no classification will be reported with a generic Device label.
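The Sankey breakdown can be thought of as counting onboarding attempts per (time range, floor, client type) path. The sketch below illustrates that idea under stated assumptions: the attempt records and the bucket boundaries are invented (only the `>15s` label appears in the lab UI), and unclassified clients fall back to the generic Device label mentioned in the note above.

```python
from collections import Counter


def time_bucket(duration_s: float) -> str:
    """Hypothetical duration buckets; only '>15s' is taken from the lab UI."""
    if duration_s > 15:
        return ">15s"
    if duration_s > 5:
        return "5-15s"
    return "<=5s"


# (onboarding time in seconds, floor, client type or None if unclassified)
attempts = [
    (22.0, "Floor 2", "Windows Workstation"),
    (19.5, "Floor 2", "Linux Workstation"),
    (21.0, "Floor 2", "Windows Workstation"),
    (1.4, "Floor 1", "Apple iPhone"),
    (1.9, "Floor 2", "Apple iPhone"),
    (1.7, "Floor 1", None),
]

# Count attempts per full path through the diagram.
paths = Counter(
    (time_bucket(duration_s), floor, client_type or "Device")
    for duration_s, floor, client_type in attempts
)

# Equivalent of clicking the '>15s' element: keep only the slow paths.
slow = {path: count for path, count in paths.items() if path[0] == ">15s"}
for path, count in sorted(slow.items()):
    print(path, count)
```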

In the SAN JOSE example, we can see that the clients contributing the most to the high onboarding time are the Windows Workstations and Linux Workstations located on the second floor, while the other (mobile) device types exhibit normal behavior.

Click on the >15s element on the sankey diagram to highlight the distribution of locations and device types having slow onboardings:


Click on the Apple iPhones (or other device types) element on the sankey diagram to highlight the distribution of locations and onboarding times for a specific device category:


Key takeaways

AI/ML-based baselining is an exceptional tool for understanding the typical behavior of a network and efficiently managing the large amount of data that networks generate. By learning from historical data, it's able to adapt over time and identify what constitutes normal behavior for a given network. This is extremely useful for reducing the noise of unnecessary alerts, allowing you to focus on more significant issues.

However, while AI/ML-based baselining is effective at identifying anomalies and sudden changes in network behavior (such as those typically reported as Cisco AI Analytics AI-Driven issues), it may not be as effective at identifying consistently poor performance. In a building with a consistently high onboarding time, such as the SAN JOSE building we saw earlier, the algorithm has adapted to consider this as normal, so no alerts are raised even though the network performance is subpar.

This highlights why, in addition to AI-driven issues, a tool like the Baseline Dashboard is crucial for maintaining full visibility into network performance metrics, whether the behavior is a sudden change or a constant condition.
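The adaptation effect can be illustrated with a deliberately simple stand-in for a baseline: a percentile band computed over historical samples. This is not the algorithm used by Cisco AI Analytics, which this lab does not describe; the histories, the 5th/95th percentile choice, and the `TARGET_S` value are all assumptions made for this sketch.

```python
from statistics import quantiles


def percentile_band(history, low_pct=5, high_pct=95):
    """Return (low, high) percentile cut points of the historical samples."""
    cuts = quantiles(history, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return cuts[low_pct - 1], cuts[high_pct - 1]


def is_anomaly(value, band):
    """A value is anomalous only if it falls outside the learned band."""
    low, high = band
    return value < low or value > high


# Invented onboarding-time histories, in seconds.
guest_history = [1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 1.6, 1.7, 1.8, 1.9]
corp_history = [20, 21, 22, 23, 24, 21, 22, 23, 20, 24]

guest_band = percentile_band(guest_history)
corp_band = percentile_band(corp_history)

TARGET_S = 5.0  # hypothetical absolute target for an acceptable onboarding time

corp_flagged = is_anomaly(22.0, corp_band)    # usual level for this SSID
corp_above_target = 22.0 > TARGET_S           # yet far above the target
guest_flagged = is_anomaly(6.0, guest_band)   # sudden change on a fast SSID
print(corp_flagged, corp_above_target, guest_flagged)
```

The Corp value is well above the absolute target but sits inside its own learned band, so nothing is flagged; the Guest value is much lower in absolute terms but is flagged because it departs from what is normal for that SSID.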

This concludes the exploration of the Baseline Dashboard feature.
You can use the link below to proceed with the exploration of other use cases.

Click here to go back to the use-cases list