Lab 3 - Baseline Dashboard
Task Goal
In this laboratory session, you'll explore the AI-Driven Baseline Dashboard feature, which provides visibility into network performance: it helps you identify buildings that are either problematic (based on AI-Driven anomaly detection issues) or outliers with respect to one or more KPIs, and it enables several layers of comparison so that you can understand in depth how your network is performing.
This lab task will guide you through the AI-Driven Baseline Dashboard workflow:
- get an overview of the network performance across all buildings in the network, using the beeswarm visualization
- identify interesting buildings either based on the AI-Driven anomaly detection issues, or by quickly identifying outliers
- compare the expected vs. actual network performance across multiple KPIs, and across the different WLANs in use in each building
- drill into each entity and KPI to understand the impact on clients and Access Points
Benefits
The Baseline Dashboard lets you explore and analyze network performance by comparing expected and actual Wi-Fi KPI values across different buildings and SSIDs, and over time.
This dashboard is built on the AI-driven baselines (the same ones computed and used for AI-driven anomaly detection), extending visibility to periods when no anomalies are detected.
As with AI-Driven anomaly detection, no manual configuration is needed to use this feature. The only requirements are to have your network devices managed by the Cisco Catalyst Center appliance and provisioned to export network telemetry data to it, and to enable the Cisco AI Analytics service, as explained in Lab 8 - Service Operations.
In this lab we provide you with a pre-configured system so that you can explore the results immediately: after the service is activated, it usually takes up to one week for baselines to become available, because the algorithms need a minimum amount of data to learn the network behavior.
Use case workflow
The Baseline Dashboard is a workflow designed to guide the user through network data exploration (currently covering onboarding-related KPIs and their baselines):
- Identifying good/bad/busy buildings at a glance
- Seeing the evolution of expected vs. actual KPI values over time for an SSID within a building
- Comparing baselines for different KPIs, related to the same SSID within the selected building
This view allows you, for instance, to compare onboarding time with onboarding or authentication failures over the same time period, for the same SSID.
- Comparing baselines for the same KPI, related to different SSIDs within the selected building
See what the normal range of each KPI looks like across different SSIDs within a building and, when an issue is raised, see which SSIDs were affected at the same time.
- Performing a deeper analysis by using the detailed view.
Landing page
It's time to try this out: follow the steps and explore the baseline dashboard information yourself on the lab setup:
- Open the Baseline Dashboard page:
Menu > Assurance > AI Network Analytics > Baselines
- Follow the steps below.
Time selection
By default, the view shows the last 24 hours; you can expand the time range using the Custom Range selector:
Building overview and selection
Get an overview of all the buildings, in order to select an interesting building to investigate.
The default visualization is a beeswarm plot, with the following features:
- each circle represents a building
- the circle size represents the average client count in that building over the selected time period (you can easily spot the busiest buildings here)
- the circle is red if there has been at least one AI-Driven issue in that building over the selected time period, otherwise it's blue
- the position of the circle along the x-axis represents the average value of the selected KPI (Onboarding time by default). This lets you quickly spot buildings that usually perform well or badly with respect to the selected KPI.
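To make the encoding above concrete, here is a minimal Python sketch that turns one building's samples for the selected time period into the three visual attributes of a beeswarm circle. The `Circle` class, the `to_circle` helper and the input format (lists of KPI samples and client counts, plus an issue count) are hypothetical and only illustrate the mapping; they are not part of any Catalyst Center API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Circle:
    """One beeswarm circle (hypothetical model, not a Catalyst Center API)."""
    building: str
    x: float      # average value of the selected KPI (e.g. onboarding time, seconds)
    size: float   # average client count over the selected time period
    color: str    # "red" if at least one AI-Driven issue was raised, else "blue"


def to_circle(building: str, kpi_samples: list[float],
              client_counts: list[int], issue_count: int) -> Circle:
    """Summarize one building's samples for the selected time period."""
    return Circle(
        building=building,
        x=mean(kpi_samples),
        size=mean(client_counts),
        color="red" if issue_count >= 1 else "blue",
    )


# Made-up sample values, for illustration only.
print(to_circle("Building A", [1.0, 2.0, 3.0], [10, 30], issue_count=0))
```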
Now, what makes a building "interesting"?
There's no single answer to this, but here are some examples:
- A busy building on the left-hand side of the plot (typically good performance) that is marked in red.
An example of this is the LONDON 1 building, which you can find on the lab setup beeswarm:
  - the building's position on the beeswarm indicates that its average onboarding time during the selected period is below 2 seconds
  - the red color indicates that at least one AI-Driven issue was reported there
- A building on the right-hand side of the plot (typically bad performance) that is marked in blue.
An entity matching these conditions is worth analyzing, as the bad performance is probably so persistent that the AI/ML baselining has learned it as normal behavior for this building. It's key to know more, as a building of this type may require some optimization.
The beeswarm makes such buildings easy to identify, as they usually stand out as outliers on the right-hand side of the chart, like the SAN JOSE building you can find on the lab setup:
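The two "interesting building" patterns above can be expressed as simple rules over per-building summaries. The sketch below is one possible formalization under stated assumptions: the `interesting_buildings` function and its `(name, avg_kpi, avg_clients, has_issue)` tuple format are hypothetical, "busy" and "left-hand side" are judged against the median across buildings, and "right-hand outlier" uses a Tukey fence (Q3 + 1.5 × IQR). None of these thresholds come from the dashboard itself.

```python
from statistics import median, quantiles


def interesting_buildings(buildings):
    """Return (name, reason) pairs for buildings worth a closer look.

    buildings: list of (name, avg_kpi, avg_clients, has_issue) tuples, where a
    higher avg_kpi is worse (as for onboarding time). All thresholds below are
    illustrative assumptions, not the dashboard's own logic.
    """
    kpis = [b[1] for b in buildings]
    kpi_median = median(kpis)
    clients_median = median(b[2] for b in buildings)
    # Tukey fence: values above Q3 + 1.5 * IQR count as right-hand outliers.
    q1, _, q3 = quantiles(kpis, n=4, method="inclusive")
    right_fence = q3 + 1.5 * (q3 - q1)

    picks = []
    for name, avg_kpi, avg_clients, has_issue in buildings:
        busy = avg_clients > clients_median
        if has_issue and busy and avg_kpi <= kpi_median:
            # Usually good, yet an AI-Driven issue was raised (like LONDON 1).
            picks.append((name, "busy, left-hand side, red"))
        elif not has_issue and avg_kpi > right_fence:
            # Persistently bad, so the baseline treats it as normal (like SAN JOSE).
            picks.append((name, "right-hand outlier, blue"))
    return picks


# Made-up building summaries: (name, avg onboarding time s, avg clients, issue?)
buildings = [
    ("Building A", 1.5, 400, True),
    ("Building B", 2.0, 100, False),
    ("Building C", 2.2, 120, False),
    ("Building D", 1.8, 90, False),
    ("Building E", 2.5, 150, False),
    ("Building F", 25.0, 80, False),
]
print(interesting_buildings(buildings))
```

A robust fence is used for the right-hand side because a single extreme building would otherwise drag a mean-based threshold toward itself.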
The beeswarm chart is very effective at providing an overview of all buildings in the network, even for deployments with a large number of buildings.
If, instead, you already know which building you want to analyze (e.g., you received complaints from users), you can also use the map or table view to quickly select your building of interest.
Building baseline view
What building did you end up choosing first?
Building with issues
Let's start with the LONDON 1 building, which had an issue.
Click on its red circle on the beeswarm: you'll be presented with the baseline view for this building, showing all the onboarding KPIs for one SSID.
The pre-selected KPIs, SSID and WLC can be adjusted using the drop-down menus at the top:
Depending on the selections, you'll be able to explore the baselines in different ways, for instance:
- Single SSID:
You can observe how the predicted and the actual KPIs evolve over time for a given SSID.
See when AI-driven issues were reported and quickly access the issue details.
When issues are reported, quickly identify whether anomalies were found on a single KPI or on multiple KPIs.
- Multiple SSIDs:
Even within a single building, the "normal" behavior of different SSIDs can be very different, for instance due to different security policies, or to different numbers and types of clients.
Adding SSIDs from the drop-down menu allows you to compare the predicted range and actual KPI values across SSIDs.
This type of view also shows why using static thresholds would be difficult or impossible.
You can add a second SSID from the selector at the top of the page.
Click on the SSID menu and then select the PseudoCo-Corp SSID:
The UI now updates to show all the selected KPIs for both SSIDs: you can see that the issue affected only the PseudoCo-Guest SSID, while the PseudoCo-Corp one behaved normally during the same time period.
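Why a static threshold struggles across SSIDs can be shown with a toy comparison. The sketch below derives an expected range per SSID from past samples and compares it with one fixed threshold. The percentile band (5th to 95th) is a simple stand-in chosen for illustration, not how Cisco AI Analytics computes its baselines; `expected_band`, `outside_band`, `STATIC_THRESHOLD_S`, the SSID names and all values are hypothetical.

```python
from statistics import quantiles

STATIC_THRESHOLD_S = 10.0  # hypothetical fixed onboarding-time threshold, in seconds


def expected_band(history, low_pct=5, high_pct=95):
    """Approximate an expected (low, high) range from past KPI samples.

    A percentile band is an illustrative stand-in, not the Cisco AI
    Analytics baselining algorithm.
    """
    cuts = quantiles(history, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[low_pct - 1], cuts[high_pct - 1]


def outside_band(value, band):
    low, high = band
    return value < low or value > high


# Made-up onboarding-time histories (seconds) for two SSIDs in one building.
history = {
    "SSID A": [1.5 + 0.05 * (i % 11) for i in range(100)],   # usually 1.5-2 s
    "SSID B": [20.0 + 0.5 * (i % 11) for i in range(100)],   # usually 20-25 s
}
new_values = {"SSID A": 8.0, "SSID B": 22.0}

for ssid, value in new_values.items():
    band = expected_band(history[ssid])
    print(f"{ssid}: value={value}s band=({band[0]:.2f}, {band[1]:.2f}) "
          f"static_alert={value > STATIC_THRESHOLD_S} "
          f"baseline_alert={outside_band(value, band)}")
```

With these made-up numbers the fixed threshold gets both cases wrong: it stays silent on the 8 s jump for SSID A and fires on a routine 22 s value for SSID B, while the per-SSID bands do the opposite.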
Building with no issues
Let's now check the SAN JOSE building: it didn't have any issues, but its position on the beeswarm indicates that the onboarding time there is usually very high.
Click on the Network Overview link at the top-left of the page to return to the beeswarm, then select the SAN JOSE building:
Let's add the PseudoCo-Corp SSID to the baseline view, as we did before:
The resulting view lets you compare the behavior of both SSIDs.
Although no AI-Driven issue was raised, note the difference in the scale of the charts for each SSID:
- the PseudoCo-Guest SSID onboarding time usually ranges between 1.5 and 2 seconds, which is pretty good;
- the PseudoCo-Corp SSID, instead, has values almost constantly above 20 seconds; the baseline treats this as normal because these values are common for this combination of building and SSID.
KPI detailed view
The baselines presented in the detailed view are computed for each entity, as listed in the AI-driven aggregations table for Onboarding and Roaming issues; in most cases, this means the computation uses data aggregated by SSID and building.
Now, we want to know more about what's happening in the SAN JOSE building, on the PseudoCo-Corp SSID.
Click on the View Details link next to the SSID name, in the Onboarding Time KPI section:
The Detailed view lets you go from the aggregated view to a more detailed one, including information such as:
- Client count time series
- Sankey diagram showing the breakdown of onboardings, which lets you compare the distribution of onboarding attempts by:
  - time range (for duration-related KPIs)
  - building floor
  - client type
- AP table, with onboarding-related KPIs and direct links to the AP360 page of each AP for further analysis
Note
The client type info is provided by the device classification feature on the Wireless LAN Controller (WLC): devices with no classification are reported with a generic Device label.
In the SAN JOSE example, we can see that the clients contributing the most to the high onboarding time are the Windows Workstations and Linux Workstations located on the second floor, while mobile device types exhibit normal behavior.
Click on the >15s element on the Sankey diagram to highlight the distribution of locations and device types with slow onboardings:
Click on the Apple iPhones element (or another device type) on the Sankey diagram to highlight the distribution of locations and onboarding times for a specific device category:
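The Sankey breakdown and the click-to-highlight interaction can be approximated by counting onboarding attempts along each path. In this minimal sketch, the `(duration_s, floor, client_type)` attempt format, the bucket edges other than ">15s", and the `time_bucket`, `sankey_links` and `highlight_bucket` helpers are all hypothetical; unclassified clients fall back to the generic Device label, mirroring the note above.

```python
from collections import Counter


def time_bucket(seconds):
    """Map an onboarding duration to a coarse range (edges below 15 s are assumptions)."""
    if seconds < 5:
        return "<5s"
    if seconds <= 15:
        return "5-15s"
    return ">15s"


def sankey_links(attempts):
    """Count link weights for a time range -> floor -> client type Sankey.

    attempts: iterable of (duration_s, floor, client_type) tuples (hypothetical format).
    """
    links = Counter()
    for duration, floor, client_type in attempts:
        client_type = client_type or "Device"  # unclassified clients
        links[(time_bucket(duration), floor)] += 1
        links[(floor, client_type)] += 1
    return links


def highlight_bucket(attempts, bucket):
    """Mimic clicking a time-range node: (floor, client type) counts for that range."""
    return Counter((floor, client_type or "Device")
                   for duration, floor, client_type in attempts
                   if time_bucket(duration) == bucket)


# Made-up onboarding attempts, for illustration only.
attempts = [
    (22.0, "Floor 2", "Windows Workstation"),
    (18.5, "Floor 2", "Linux Workstation"),
    (2.1, "Floor 1", "Apple iPhone"),
    (3.0, "Floor 2", None),
]
print(sankey_links(attempts))
print(highlight_bucket(attempts, ">15s"))
```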
Key takeaways
AI/ML-based baselining is an exceptional tool for understanding the typical behavior of a network and efficiently managing the large amount of data that networks generate. By learning from historical data, it's able to adapt over time and identify what constitutes normal behavior for a given network. This is extremely useful for reducing the noise of unnecessary alerts, allowing you to focus on more significant issues.
However, while AI/ML-based baselining is effective at identifying anomalies and sudden changes in network behavior (such as those reported as Cisco AI Analytics AI-Driven issues), it may not be as effective at identifying consistently poor performance. For a building with a consistently high onboarding time, such as the SAN JOSE building we saw earlier, the algorithm has adapted to consider this as normal and therefore doesn't trigger any alerts. As a result, no alerts are raised even though network performance is subpar.
This highlights why, in addition to AI-driven issues, a tool like the Baseline dashboard is crucial for maintaining full visibility on network performance metrics, whether they show a sudden change or a constant behavior.
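The blind spot described in these takeaways can be reproduced with a toy detector. The sketch below uses an exponentially weighted moving average (EWMA) baseline, a deliberately simple stand-in that is not the algorithm used by Cisco AI Analytics, on made-up onboarding-time series. A series that always sits around 22 s is never flagged, because the baseline learns that level as normal, while a jump from about 2 s to 8 s is flagged when it happens and stops being flagged as the baseline absorbs the new level. The `ewma_flags` function and its parameters (`alpha`, `k`, `warmup`, `min_std`) are illustrative assumptions.

```python
import math


def ewma_flags(values, alpha=0.1, k=3.0, warmup=10, min_std=0.1):
    """Flag samples that deviate sharply from an exponentially weighted baseline.

    An EWMA mean/variance detector is a simple stand-in chosen for this
    illustration; it is not the Cisco AI Analytics baselining algorithm.
    """
    mean, var = values[0], 0.0
    flags = []
    for i, x in enumerate(values):
        std = max(math.sqrt(var), min_std)
        # Compare against the baseline learned before seeing x.
        flags.append(i >= warmup and abs(x - mean) > k * std)
        # Then let the baseline learn from x (incremental EW mean and variance).
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return flags


# Made-up onboarding times in seconds.
persistently_slow = [22.0 + (0.5 if i % 2 else -0.5) for i in range(50)]
sudden_jump = [2.0 + (0.05 if i % 2 else -0.05) for i in range(40)] + [8.0] * 60

print(any(ewma_flags(persistently_slow)))  # always around 22 s: never flagged
print([i for i, f in enumerate(ewma_flags(sudden_jump)) if f])  # flags only at the jump
```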
This concludes the exploration of the Baselines dashboard feature.
You can use the link below to proceed with the exploration of other use cases.