
Pingdom duplicate checks creation #293

Open
artemlive opened this issue Nov 2, 2020 · 12 comments
Labels: kind/bug (Something isn't working), workflow/in progress

@artemlive
Contributor

artemlive commented Nov 2, 2020

Hello, community!
I have found a major issue with the Pingdom API integration. At some point, I discovered that my available Pingdom monitoring slots had been significantly reduced. I checked my Pingdom account and found many checks with the same name.
Here is the screenshot with a duplicated check example from Pingdom UI: https://prnt.sc/vbyhjm
During the investigation, I found lines like these in the IMC log:

time="2020-11-01T23:40:07Z" level=info msg="Error received while listing checks: invalid character '<' looking for beginning of value"
time="2020-11-01T23:40:07Z" level=info msg="Creating Monitor: mycompany-software-master-xxx.com"

This means the API call failed and IMC re-created an existing monitor because of an API error (it may be an API service problem, a transport problem, etc.).
One more important thing: Pingdom allows creating multiple checks with the same name.
Here is the method that returns all existing checks from Pingdom. It swallows the error, because the MonitorService interface doesn't include an error in its return values:

func (service *PingdomMonitorService) GetAll() []models.Monitor {
    var monitors []models.Monitor
    checks, err := service.client.Checks.List()
    if err != nil {
        log.Println("Error received while listing checks: ", err.Error())
        return nil
    }
    for _, mon := range checks {
        newMon := models.Monitor{
            URL:  mon.Hostname,
            ID:   fmt.Sprintf("%v", mon.ID),
            Name: mon.Name,
        }
        monitors = append(monitors, newMon)
    }
    return monitors
}

Then, while processing the response from the previous method, IMC raises an error if the monitor wasn't found:
return match, fmt.Errorf("Unable to locate monitor with name %v", name)

But in ReconcileEndpointMonitor we don't check that error, so we can end up creating a duplicate monitor:

monitor := findMonitorByName(r.monitorServices[index], monitorName)

I decided to implement findMonitorByName without ignoring the error, which should help us avoid creating duplicate checks. I've refactored the error handling so that only real API-call errors are raised; a monitor that simply doesn't exist is no longer treated as an error. A sketch of the idea follows.
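For illustration, here is a minimal, self-contained sketch of that approach. The GetAllWithError method, the reduced MonitorService interface, and the ErrMonitorNotFound sentinel are hypothetical names for this sketch, not IMC's actual API:

package monitors

import (
    "errors"
    "fmt"
)

// Monitor is a simplified stand-in for IMC's models.Monitor.
type Monitor struct {
    ID, Name, URL string
}

// ErrMonitorNotFound distinguishes "no such monitor" from a real API failure.
var ErrMonitorNotFound = errors.New("monitor not found")

// MonitorService is a hypothetical error-returning variant of the interface.
type MonitorService interface {
    GetAllWithError() ([]Monitor, error)
}

// findMonitorByName surfaces listing errors instead of swallowing them:
// (nil, ErrMonitorNotFound) means the monitor genuinely doesn't exist,
// while any other error means the API call itself failed and the caller
// should retry rather than create a duplicate check.
func findMonitorByName(service MonitorService, name string) (*Monitor, error) {
    all, err := service.GetAllWithError()
    if err != nil {
        return nil, fmt.Errorf("listing checks: %w", err)
    }
    for i := range all {
        if all[i].Name == name {
            return &all[i], nil
        }
    }
    return nil, ErrMonitorNotFound
}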

As I mentioned before, there is one more big issue that blocks me from correctly implementing GetAll with an error return.
GetAll is part of the MonitorService interface, whose method prototype has no error in its return parameters. Of course, I could change the interface, but then I couldn't guarantee that my changes to the other monitors would be correct, because I don't have enough expertise with solutions other than Pingdom; I also found the uptimerobot monitor avoiding the use of GetAll in the same way.
Here is my PR #294 (still a work in progress at the moment).
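For reference, a minimal sketch of what such an interface change could look like (a reduced, hypothetical interface, not the project's full MonitorService):

// Hypothetical, reduced interface: adding an error to GetAll would force
// every monitor service implementation to report API failures to callers.
type MonitorService interface {
    GetAll() ([]models.Monitor, error)
    GetByName(name string) (*models.Monitor, error)
}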

@ahmedwaleedmalik
Contributor

As far as I understand the issue:

  • You need to create a comparator for Pingdom to avoid the creation of multiple monitors, because I think in each reconcile loop it's probably creating a new monitor in your case; see the sketch at the end of this comment:

    // TODO: Retrieve oldMonitor config and compare it here

  • findMonitorByName should not ignore the error, and thank you for fixing that. But in your case the method that retrieves a monitor should still ensure that the correct monitor is being retrieved.

Let me know if we can discuss this further or if you have any more queries regarding it.
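For illustration, a minimal sketch of the comparison idea from the TODO above (the names and compared fields are hypothetical, not IMC's actual implementation):

// monitorsEqual reports whether the existing check already matches the
// desired config, so the reconcile loop can skip a needless update.
func monitorsEqual(old, desired models.Monitor) bool {
    return old.Name == desired.Name && old.URL == desired.URL
}

// Hypothetical wiring in the update path:
//
//     if oldMonitor != nil && monitorsEqual(*oldMonitor, desired) {
//         return nil // nothing to do in this reconcile
//     }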

@artemlive
Contributor Author

artemlive commented Nov 12, 2020

@ahmedwaleedmalik hello, thanks for your reply.

  • You need to create a comparator for Pingdom to avoid the creation of multiple monitors, because I think in each reconcile loop it's probably creating a new monitor in your case.

It won't create new monitors on every reconcile iteration if there were no errors during the API call. With the current algorithm, reconcile doesn't check for errors during API calls; it just treats the monitor as non-existent and doesn't handle the API error itself.
Our main issue is when there is a problem with the API request that checks whether a monitor exists or has to be created (e.g., a service issue where it responds with an HTTP 500 code):

monitor := findMonitorByName(r.monitorServices[index], monitorName)
if monitor != nil {
    // Monitor already exists, update if required
    err = r.handleUpdate(request, instance, *monitor, r.monitorServices[index])
} else {

So it would create a new monitor even if one already exists, because Pingdom allows creating monitors with the same name. A sketch of an error-aware version of this branch follows.
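For illustration, a hypothetical error-aware version of that branch (ErrMonitorNotFound, reconcile.Result, and handleCreate are illustrative names here, not necessarily the project's exact code):

monitor, err := findMonitorByName(r.monitorServices[index], monitorName)
if err != nil && !errors.Is(err, ErrMonitorNotFound) {
    // Listing failed (e.g. HTTP 500): do not assume the monitor is absent.
    // Return the error so the request is requeued instead of creating a
    // duplicate check.
    return reconcile.Result{}, err
}
if monitor != nil {
    // Monitor already exists, update if required
    err = r.handleUpdate(request, instance, *monitor, r.monitorServices[index])
} else {
    // Listing succeeded and the monitor is genuinely absent: safe to create.
    err = r.handleCreate(request, instance, r.monitorServices[index])
}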

findMonitorByName should not ignore the error, and thank you for fixing that. But in your case the method that retrieves a monitor should still ensure that the correct monitor is being retrieved.

I think the GetByName method is responsible for that, if I understood correctly. We should only check whether the returned monitor is nil; if it isn't, we have to update the existing monitor.
Correct me if I'm wrong.

@tomjohnburton

I'm getting this error too. Any updates to be had?

@github-actions

This issue is stale because it has been open for 60 days with no activity.

github-actions bot added the stale label Apr 13, 2023
@github-actions

This issue was closed because it has been inactive for 30 days since being marked as stale.

@Dadavan

Dadavan commented May 29, 2023

Hey guys, this is still happening. Are you willing to accept a PR for this?

@devopstagon

This is still broken :'(

@karl-johan-grahn
Contributor

Reopening issues that were inadvertently closed as stale.

@davidgibbons

This is basically unusable as-is for Pingdom or StatusCake. It just makes a giant mess.

@ThommyH

ThommyH commented Jan 11, 2024

Is this ever getting fixed?

@ShawnUCD

I can confirm this is still happening with chart version 2.1.49. I can scale up the ingressmonitorcontroller deployment and watch the checks in StatusCake tick upward until we hit our limit (300).

@abhishek-annigeri-brainbits

@karl-johan-grahn @rasheedamir - Can someone please look into this? It has been pending for a long time.
