Using Healthscores in ACI to troubleshoot issues

Unofficial ACI Guide
31 Jul 2020 · 07:03

Summary

TLDR: In this tutorial, Jody explores the use of health scores in Cisco's Application Centric Infrastructure (ACI) for troubleshooting network issues. Health scores are assigned to every object in ACI, including endpoints, endpoint groups, and application profiles. When a network component fails, the health score of every object that depends on it decreases, signaling a problem. To demonstrate, Javad shuts down an interface on a traffic generator without saying which one, causing the health scores to drop. The video then uses those scores to pinpoint the affected switch and port, and explains how long health scores are expected to take to recover after the issue is resolved.

Takeaways

  • 💡 Health scores in ACI are attached to every object, providing a unique way to monitor the health of the network fabric.
  • 🔍 Endpoints, endpoint groups, and application profiles in ACI all have health scores that reflect their operational status.
  • 📉 A decrease in health score indicates a potential issue with the corresponding network entity, such as a server or interface going down.
  • 🛠️ ACI's health scores can guide troubleshooting by pinpointing which part of the application or network is affected when an issue occurs.
  • 📊 The health score provides a quick overview of the network's health at the tenant level, with a score of less than 100 signaling a need for investigation.
  • 🔎 Drilling down into the health score can lead to specific nodes and ports that are impacted, aiding in the identification of the root cause of a problem.
  • 🔌 The video demonstrates a practical scenario where an interface is shut down, causing a health score decrease, and then restored to observe the recovery.
  • ⏱️ There is a 'soaking period' during which the health score may not immediately return to 100% after an interface is brought back up.
  • 🔗 The video emphasizes the importance of monitoring health scores and acting upon changes as part of maintaining a healthy ACI fabric.
  • 📚 For further learning, the video suggests visiting unofficialaciguide.com or YouTube for more ACI how-to guides, design guides, and best practices.

Q & A

  • What is the primary focus of the video?

    - The video focuses on using health scores within the Application Centric Infrastructure (ACI) to troubleshoot issues in a network fabric.

  • What is unique about ACI's approach to troubleshooting?

    - ACI has a unique approach where it attaches a health entity to every object within the system, including endpoints, endpoint groups, and application profiles, allowing for a health score to reflect the status of each component.

  • How does the health score system work when a server or interface fails?

    - If a server or interface fails, the health score, which is typically 100, will decrement to reflect the reduced health of the application or entity, indicating a problem.

  • What action does the video demonstrate to simulate a network issue?

    - The video demonstrates shutting down an interface on a traffic generator to simulate a network issue and observe how ACI's health scores react to this change.

  • How does ACI respond when an interface is down?

    - When an interface is down, ACI reads the configuration to understand its role in an application and then updates the health score to notify the administrator of the issue.

  • What steps are taken to identify the problematic interface in the video?

    - The video shows how to drill down from the application profile to the endpoint group and then to the specific node and port to identify the problematic interface.

  • What is the significance of the health score dropping to 80 in the video?

    - A health score dropping to 80 indicates that the application profile has been impacted, suggesting that one or more of the interfaces associated with the profile are down.

  • How does the video demonstrate the resolution of the network issue?

    - The presenters describe a typical fix as the server team finding an unplugged cable and plugging it back in. In the demo itself, the interface is simply brought back up on the traffic generator, after which the health score recovers.

  • What is the 'soaking period' mentioned in the video?

    - The 'soaking period' is the delay between an interface coming back up and the health score returning to 100. The presenter estimates two to three minutes; in the demo, the score recovered in less than 30 seconds.

  • What additional guidance does the video provide for monitoring health scores?

    - The video advises that if an interface is brought back up and the health score does not immediately improve, it is normal due to the soaking period, and one should wait for the health score to update.

  • Where can viewers find more resources on ACI?

    - Viewers can find more resources on ACI, including how-to's, design guides, and best practices, on the website unofficialaciguide.com or on YouTube.

Outlines

00:00

πŸ” Exploring Health Scores in ACI for Troubleshooting

This paragraph introduces the concept of health scores within the Cisco Application Centric Infrastructure (ACI) and how they can be utilized for troubleshooting network issues. Jody explains that every object in ACI, such as endpoints, endpoint groups, and application profiles, has an associated health score. The health score is designed to reflect the status of network components and can help identify when a device or interface fails. Jody demonstrates this by having Javad disable an interface on a traffic generator, which is part of an application within ACI. The health score decreases from 100 to 80, indicating the impact. The video then guides viewers through the process of identifying the problematic interface by drilling down into the health scores of different components until the specific node and port are identified.
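The drill-down described above can be sketched as a walk down a tree of health scores, following the unhealthiest child at each level. This is only an illustration of the idea, not how the APIC GUI is implemented. The tree layout, the object names, and every score other than the 80 (application profile) and 30 (EPG) quoted in the video are made up for the example.

```python
# Hypothetical health tree. Only the 80 and 30 come from the video;
# all other names and scores are invented for illustration.
SAMPLE_TREE = {
    "name": "ExampleAP", "health": 80, "children": [
        {"name": "net100-epg", "health": 30, "children": [
            {"name": "node-1003", "health": 100, "children": []},
            {"name": "node-1004", "health": 30, "children": [
                {"name": "eth1/5", "health": 0},
                {"name": "eth1/6", "health": 100},
            ]},
        ]},
        {"name": "net102-epg", "health": 60},
        {"name": "net104-epg", "health": 60},
    ],
}


def drill_down(node):
    """Follow the lowest-scoring child at each level.

    Returns the visited path as a list of (name, health) tuples,
    starting at the given node and ending at a leaf.
    """
    path = [(node["name"], node["health"])]
    while node.get("children"):
        # Pick the unhealthiest child; ties resolve to the first one listed.
        node = min(node["children"], key=lambda child: child["health"])
        path.append((node["name"], node["health"]))
    return path


if __name__ == "__main__":
    for name, health in drill_down(SAMPLE_TREE):
        print(f"{name}: {health}")
```

In the video the same narrowing is done by clicking through the GUI: application profile, then EPG, then node 1004, then port eth1/5.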

05:01

🔧 Restoring Health Scores and Interface Recovery in ACI

This section covers recovering a network interface in ACI and the recovery of health scores that follows. It begins with the health score dropping below 100%, which indicates a problem that needs investigation even when no fault is listed. The video then shows the interface being brought back up, emphasizing that there is a 'soaking period' during which the health score takes a few minutes to reflect the restored status. The video reassures viewers that it is normal for the health score not to immediately return to 100% after an interface is reactivated. The demonstration concludes with the health score returning to 100%, confirming the successful resolution of the issue. The video ends with a prompt for viewers to visit unofficialaciguide.com or YouTube for more resources on ACI.

Keywords

💡 Health Scores

Health Scores in the context of the video refer to a system within the Application Centric Infrastructure (ACI) that assigns a numerical value to represent the operational health of various components within the network. These scores are crucial for monitoring and troubleshooting as they provide a quick overview of the network's health. In the video, it's mentioned that every object in ACI, such as endpoints, endpoint groups, and application profiles, has a health score attached to it. When an interface is taken down, the health score decreases, indicating a problem, which is a central theme of the video.
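To make "a health score attached to an object" more concrete, the sketch below pulls the current score out of a JSON payload shaped like an APIC REST reply for a tenant queried with its health included. The payload is hand-written for illustration, not captured output. The tenant name `ExampleTenant` is hypothetical, and the `healthInst` child with a `cur` attribute reflects a general understanding of the APIC object model that should be verified against your APIC version.

```python
import json

# Hand-written sample shaped like an APIC REST reply; not real captured output.
# "ExampleTenant" is a hypothetical tenant name.
SAMPLE_RESPONSE = json.dumps({
    "totalCount": "1",
    "imdata": [{
        "fvTenant": {
            "attributes": {"name": "ExampleTenant", "dn": "uni/tn-ExampleTenant"},
            "children": [
                {"healthInst": {"attributes": {"cur": "80"}}},
            ],
        }
    }],
})


def current_health(response_text):
    """Return {dn: score} for each object in the reply that has a healthInst child."""
    scores = {}
    for item in json.loads(response_text).get("imdata", []):
        # Each imdata entry is a one-key dict: {class_name: {attributes, children}}.
        for body in item.values():
            dn = body.get("attributes", {}).get("dn", "")
            for child in body.get("children", []):
                health = child.get("healthInst")
                if health is not None:
                    # Scores arrive as strings in the JSON payload.
                    scores[dn] = int(health["attributes"]["cur"])
    return scores


if __name__ == "__main__":
    print(current_health(SAMPLE_RESPONSE))
```

Anything below 100 in the returned mapping is a candidate for the kind of investigation the video walks through.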

💡 ACI (Application Centric Infrastructure)

ACI is a software-defined networking solution by Cisco that allows for centralized management of network policies. It is designed to simplify network operations and enable greater scalability and flexibility. The video focuses on using ACI's features to troubleshoot network issues, highlighting its ability to manage health scores and provide insights into network health.

💡 APIC (Application Policy Infrastructure Controller)

APIC is the central management component of the ACI system. It is the single point from which policy is defined and pushed to the fabric and, like other networking devices, it lets you configure syslog and SNMP traps. The video shows how the APIC can be used to troubleshoot issues by monitoring health scores, showcasing its role in maintaining network health.
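The video works entirely in the APIC GUI, but the same health data can also be requested over the APIC REST API. As a rough sketch, the helpers below only build the query URL for a tenant, application profile, or EPG; they do not log in or send anything. The host `apic.example.com` and the object names are placeholders, and the `/api/mo/<dn>.json` path with the `rsp-subtree-include=health` option is stated from general knowledge of the API and should be checked against Cisco's REST API documentation.

```python
from urllib.parse import urlencode


def object_dn(tenant, app_profile=None, epg=None):
    """Build the distinguished name of a tenant, application profile, or EPG."""
    if epg and not app_profile:
        raise ValueError("an EPG lives under an application profile")
    dn = f"uni/tn-{tenant}"
    if app_profile:
        dn += f"/ap-{app_profile}"
    if epg:
        dn += f"/epg-{epg}"
    return dn


def health_query_url(apic_host, dn):
    """Return a URL asking the APIC for an object together with its health."""
    query = urlencode({"rsp-subtree-include": "health"})
    return f"https://{apic_host}/api/mo/{dn}.json?{query}"


if __name__ == "__main__":
    # All names below are placeholders, not values from the video.
    dn = object_dn("ExampleTenant", "ExampleAP", "net100-epg")
    print(health_query_url("apic.example.com", dn))
```

A real client would first authenticate against the APIC and reuse the session cookie; that part is left out to keep the sketch self-contained.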

💡 Endpoints

In ACI, endpoints refer to the physical or virtual devices that are part of the network, such as bare-metal servers or virtual machines. The video script mentions endpoints as objects that have health scores, which can be monitored to assess the health of the devices and the overall application they support.

💡 Endpoint Groups

Endpoint Groups in ACI are collections of endpoints that are grouped together based on common policies or functions. The video script discusses how health scores are associated with endpoint groups, and a decrease in the health score can indicate issues within the group, helping in pinpointing the source of network problems.

💡 Application Profiles

Application Profiles in ACI define the network configuration for applications, including how traffic flows between different components. The video emphasizes the importance of health scores within application profiles to quickly identify and address issues that affect application performance.

💡 Troubleshoot

Troubleshoot, as used in the video, refers to the process of identifying and resolving issues within a network. The video demonstrates how health scores in ACI can be a valuable tool for troubleshooting by providing immediate feedback on the network's health status.

💡 Traffic Generator

A traffic generator is a device or software used to simulate network traffic for testing purposes. In the video, a traffic generator is used to create traffic between endpoints, and when an interface is shut down, it demonstrates how the health score changes, helping to identify the affected interface.

💡 Interface

An interface in networking refers to a point of connection between a device and a network. In the video, an interface is intentionally taken down to simulate a network issue, and the subsequent decrease in health score is used to illustrate how ACI can detect and respond to such problems.

💡 Static Path Binding

A Static Path Binding in ACI statically maps an endpoint group to a specific leaf switch port (with a VLAN encapsulation), so that traffic arriving on that port is classified into the EPG. In the video, once the decreased health score has pointed to node 1004 and port eth1/5, the EPG's static path bindings are checked to confirm that this switch and port belong to the affected EPG.
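In the object model, a static path binding typically refers to its port through a path string that embeds the node ID and the interface name. The sketch below splits such a string into its parts. The node `1004` and port `eth1/5` come from the video, while the pod number `1` is an assumption. The `topology/pod-N/paths-N/pathep-[port]` layout is stated from general knowledge of ACI and covers single-switch ports only; vPC paths use a different form that this sketch does not handle.

```python
import re

# Matches single-switch port paths such as
# "topology/pod-1/paths-1004/pathep-[eth1/5]".
_PATH_RE = re.compile(r"^topology/pod-(\d+)/paths-(\d+)/pathep-\[(.+)\]$")


def parse_static_path(tdn):
    """Split a static path string into pod, node, and port."""
    match = _PATH_RE.match(tdn)
    if match is None:
        raise ValueError(f"unrecognized static path: {tdn!r}")
    pod, node, port = match.groups()
    return {"pod": int(pod), "node": int(node), "port": port}


if __name__ == "__main__":
    # Pod 1 is assumed; the video only mentions node 1004 and eth1/5.
    print(parse_static_path("topology/pod-1/paths-1004/pathep-[eth1/5]"))
```

Matching the parsed node and port against what the health drill-down reported is the scripted equivalent of the visual check done in the video.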

💡 Soaking Period

The soaking period mentioned in the video refers to the time it takes for health scores to return to normal after an issue has been resolved. This concept is important for understanding that there might be a delay between fixing a network problem and seeing the health score reflect that improvement.
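Because of the soaking period, a script that checks health right after a fix should poll rather than read the score once. The sketch below is one way to do that, with the score source passed in as a function so it can be tried without an APIC. The 300-second timeout and 10-second interval are arbitrary choices, loosely sized around the "two to three minutes" mentioned in the video; `get_score` is a hypothetical callable that you would supply.

```python
import time


def wait_for_recovery(get_score, target=100, timeout_s=300, interval_s=10,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_score() until it reaches target or the timeout expires.

    get_score is any zero-argument callable returning the current score.
    sleep and clock are injectable so the loop can be exercised in tests.
    """
    deadline = clock() + timeout_s
    while True:
        score = get_score()
        if score >= target:
            return score
        if clock() >= deadline:
            raise TimeoutError(f"health still {score} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated readings standing in for a real health query.
    readings = iter([80, 90, 100])
    print(wait_for_recovery(lambda: next(readings), interval_s=0))
```

A timeout here does not necessarily mean the fix failed; it is a cue to look at the object's faults and health details again.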

Highlights

Introduction to using health scores in ACI for troubleshooting fabric issues.

Explaining the unique feature of health entities attached to every object in ACI.

Description of how health scores reflect the status of endpoints and application profiles.

Demonstration of how health scores change when a server or interface is lost.

Practical example of taking down an interface and observing health score changes.

Explanation of how ACI reads intent and notifies when a configured interface is down.

Guidance on using health scores to identify the problematic interface.

Drilling down into the health score to pinpoint the affected node and port.

Discussion on the importance of static path binding in identifying switch and port issues.

Procedure for checking the health score after resolving an issue.

Explanation of the expected time for health scores to improve after an interface is brought back up.

Observation of the health score returning to 100 after an interface is restored.

Emphasis on the importance of investigating when health scores are less than 100%.

Instruction on how to find faults when the health score indicates a problem.

Highlight of the warning system in ACI that alerts users to health score drops.

Summary of the process for using health scores to troubleshoot and resolve network issues in ACI.

Encouragement for viewers to explore more ACI how-to's, design guides, and best practices.

Transcripts

00:01

Welcome to Unofficial ACI Guide. This is Jody. Today we're going to look at using health scores in ACI to troubleshoot issues in our fabric.

00:08

Let's take a look. All right, now we're going to look at the additional ways that we can troubleshoot using the APIC.

00:17

Now, like all other networking devices, ACI and the APIC will allow you to configure syslogs, SNMP traps, things such as that. But one of the unique things about ACI is the way that this is built: we have a health entity attached to every object in ACI. What I mean by every object: the endpoint in ACI, which is your bare-metal server or a VM that makes up your applications, the endpoint groups, the application profiles, all of those have health scores tied to them.

00:58

The idea being that if we have 10 or 15 devices that make up an application in one of our networks, and we lose a server or we lose an interface from a server, that 100 that you see there should be decremented. It should go down to indicate that the health of that application or that entity has gone down.

01:20

So we're going to have Javad come through. He's going to go in and take down an interface to the Spirent traffic generator that we've been using to do some of the testing. And because we've configured that interface to be a part of one of our applications, ACI will read our intent and say, "Hey, you've configured this to be a part of your application. It's now down. I need to let you know about that."

01:45

So we're going to shut down an interface. We're not going to tell you which one it is, but we're going to use the health score inside of the tenant to help guide us to which interface it is. So Javad, take it away.

01:56

Thank you. Here, as you can see, I'm looking at tenant part, and the health score right now shows 100, so everything looks good. I'm going to go to my traffic generator. I've got palmer sending to dingo, traffic back and forth, and I'm going to go in palmer and shut this interface offline.

02:23

We lost the link. If I go back here, and let me refresh my screen...

02:40

As you can see, now we have some losses, and now we're going to go ahead and drill into each. So from the application profile point of view, net 100 EPG, net 102, and net 104 have been impacted, and you can see the health score has gone down to 80. So let's just focus on one of these. I'm going to look at net 100 EPG and drill down, and you see now it's 30. Click again.

03:16

So now it tells us node 1004, the location. And now if I go over here, it tells us the port: eth1/5, the one that's connected to the traffic gen, has been impacted.

03:33

So if we go in, sorry John, if we go into that application EPG, now that it's given us 1004, eth1/5, we can look at the static path binding and see if we have a static path binding for that, right?

03:50

That's right here: static, node 1004, eth1/5. OK. So that's a quick way to find out where the switch and where the port impacted are.

04:11

So if we go back and, you know, we talk to the server team, they figure out the cable was unplugged or whatever, and they go through and they plug it back in, I would expect these scores to improve after that interface comes back up, correct?

04:31

That's correct. So also from the system point of view, if you look at the tenant... oh, you get a warning. Yeah, you see the warning. So if I click on the "less than 99", which brings me back to... and also you see from the summary there is something going on. See that, it says 80.

05:02

So then I'll go to my health, and as you see, health is less than 100 percent. And let me see if there's any faults here right now. No, there's no fault, but since the interface has gone down, the health score is definitely something we need to investigate. That's how we, you know...

05:28

Right, OK. Oh, very good.

05:36

So once we bring the interface back up, it will obviously take a little while. Once the interface comes up, the health isn't going to improve right away. There is a soaking period that these faults and these health scores go through before it will come back to 100. So it'll take two to three minutes for this to be reflected inside of ACI. So if you're bringing an interface back up and it doesn't come back right away, don't worry about that. That's expected.

06:11

I just brought up the interface, so we're just going to have to wait. As I mentioned, it takes a couple of minutes for it to show the health go back to 100.

06:31

It's back up. So that took... maybe we won't have to pause the video. It's turned back to 100, so that took less than 30 seconds for it to come back up. OK.

06:41

Very good. Well, that's it for health scores inside of ACI. Javad, thanks for your time. Luis, thanks, sir. Thank you. Thank you.

06:51

Thanks for watching the video today. If you'd like more ACI how-tos, design guides, and best practices, check us out on the web at unofficialaciguide.com or on YouTube.

Related Tags
Health Scores, ACI Troubleshooting, Network Devices, Fabric Issues, Application Profiles, Endpoint Groups, Interface Problems, Health Monitoring, Network Health, APIC Configuration