We ran into a rather nasty error today: a deployment got stuck in the pre-deployment phase because it hit HTTP 500 errors from our utility server (at least I assume they came from the utility server, since the action ran on it and the failed/paused message said it was a 500 error).
That alone is a bad situation, but it gets worse, because I wasn't able to do anything within ROC. First, I get the error message mentioned in the title (sorry, it's from memory, so the exact text might differ a bit). Second, I can't see the deployment either in the deployment view or under the corresponding deployment plan. The only place I could see this running deployment was Automation Studio, where I also saw the error 500 messages for several paused actions. After stopping the deployment with Automation Studio I also had to delete the deployment via JMX; otherwise I still got the error in ROC when checking the failed deployments in the overall system status widget.
So my questions are:
1) Has any of you experienced something like this as well? (We're using version 18.104.22.168.)
2) Why was Automation Studio able to handle this hanging deployment when ROC wasn't?
That's difficult to address without finer details, especially since HTTP 500 is a generic catch-all error. Is it at all possible to capture what the exact error was? If we can start with the exact wording, I can look up the error verbatim and see if there's any specific history on this. Of note, there have also been a couple of cumulative patches to fix various performance issues since build 191.
Sadly I don't have the exact wording, as I was too nervous and rushed to solve the problem instead of calmly documenting it. It's a mistake I make most of the time, need to work on that :-)
I can't say I haven't done that before.
If that was a one-time problem that can't be readily reproduced in order to capture the error, can we close this thread out as answered for now and revisit when it happens again? I'd be happy to explore this further -- I'm very interested in finding out more about this error, sounds like an odd issue.
Hi James, that's why I marked it as "assumed answered" :-)
If it does happen again, I'll make sure to document it properly ;-)
So we had this happen again just now, and this time I took care to document it properly :-D
The error we're getting in ROC is "Failed to get deployment detail for overall system status widget." It appears when you try to open the "detail view" (the table).
In Automation Studio I can then see that several actions show an Internal Server Error because of an HTTP 500.
As the redeployment succeeded without any problems, I really think that this is something network related and not an issue with CA RA.
BUT, I would still be interested to know why Automation Studio can handle the display of this deployment but ROC can't :-)
Interesting: after stopping the deployment I still get the error for the widget when I click on "failed" deployments, BUT I can see and delete the deployment when I check the deployments in the deployment plan. Before I stopped it, it wasn't shown in the deployment list of the plan.
Interesting -- We've seen HTTP 500 errors in the past for a variety of reasons, but I've never seen this specific widget error before. I've been searching through various sources internally, and it looks like we haven't encountered this before at all. You may very well have discovered something new. Darn, I hate to admit to being stumped!
Do you think you might be able to open a support case and upload the full NAC logs folder? I think if we could look at what other details are being logged leading up to that error, that'll help pin down what's causing this.
I will gladly open a ticket for that. I'm just hoping the logs still help, as it has been 24h since the error occurred, but I will grab the log folders of the management, execution and utility servers (the utility server was the agent where the actions were running when the error occurred).
case ID: 00321244
I also included some screenshots next to the logs
I've just run into the same thing for the 2nd time. First time around I was able to delete errant "runs" in JMX to solve the problem. This time I have not found any way to resolve it. I've created a new support case: 00612138 and attached my NAC logs to it.
We can usually resolve this by stopping the deployment via ASAP. Alternatively, if the person who started it is still on the deployment, they can stop it as well.
But please keep me posted if anything new is found with your case.
Unfortunately, the deployments that look like they might be connected to the problem are listed in ASAP as "Recent Activity" with a status of "Stopped". I can look at their details, but cannot do anything else with them.
I'm at a dead end for troubleshooting the problem.
Oh wow, so you don't even get it listed in the online report?
That is how I've always done it: check the online report in ASAP to find the current process run, then navigate in ASAP to the process tag and stop the run there.
But if yours is already stopped... phew, yeah, then it is even worse than our problem.
Good luck resolving this.
Solved it! One of my colleagues told me they were not having the problem I was seeing. So I logged out, emptied my browser cache, logged back in and presto! All is well once more.