Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external-dns crashlooping when using godaddy provider - panic: invalid argument to Int63n #4864

Closed
alexstojda opened this issue Nov 11, 2024 · 3 comments · Fixed by #4866
Closed
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@alexstojda
Copy link
Contributor

alexstojda commented Nov 11, 2024

What happened:

Start around 11:55 EST today (Monday, Nov 11th, 2024) my external DNS deployment started crashlooping out of the blue. No configuration change has been made in the last 2 weeks.

As far as I can tell, it seems to be failing because the Retry-After header is either missing or its value can't be parsed to Int64 here:

retryAfter, _ := strconv.ParseInt(resp.Header.Get("Retry-After"), 10, 0)

In addition, what is strange is that the 429 response, which is what would trigger the above code, only seems to happen in the external-dns controller. Calling the GoDaddy API with the same API Key/Secret from the same IP does not trigger a 429 response, yet the external-dns pod is consistently unable to start.

What you expected to happen:

External-dns should not crash. At best, there should be a more graceful handling of a missing or invalid Retry-After header. Maybe defaulting to 60 seconds?

How to reproduce it (as minimally and precisely as possible):

Still unclear what the root cause is. Will setup a local cluster and report back if I find anything.

Anything else we need to know?:

Logs:

panic: invalid argument to Int63n

goroutine 1 [running]:
math/rand.(*Rand).Int63n(0x3bbc4c9?, 0xb?)
    math/rand/rand.go:122 +0xcb
math/rand.Int63n(0x0)
    math/rand/rand.go:443 +0x25
sigs.k8s.io/external-dns/provider/godaddy.(*Client).Do(0xc000ee03c0, 0xc0016fadc0)
    sigs.k8s.io/external-dns/provider/godaddy/client.go:235 +0x127
sigs.k8s.io/external-dns/provider/godaddy.(*Client).CallAPIWithContext(0xc000ee03c0, {0x4261490, 0x6574a40}, {0x3ba7c15?, 0xc000ecd550?}, {0x3c45c1f?, 0x30?}, {0x0?, 0x0?}, {0x0, ...}, ...)
    sigs.k8s.io/external-dns/provider/godaddy/client.go:301 +0x127
sigs.k8s.io/external-dns/provider/godaddy.(*Client).CallAPI(...)
    sigs.k8s.io/external-dns/provider/godaddy/client.go:272
sigs.k8s.io/external-dns/provider/godaddy.(*Client).Get(...)
    sigs.k8s.io/external-dns/provider/godaddy/client.go:143
sigs.k8s.io/external-dns/provider/godaddy.(*Client).validate(0x47305f?)
    sigs.k8s.io/external-dns/provider/godaddy/client.go:342 +0x4b
sigs.k8s.io/external-dns/provider/godaddy.NewClient(0x1?, {0x7ffef2140a54, 0x23}, {0x7ffef2140a8d, 0x16})
    sigs.k8s.io/external-dns/provider/godaddy/client.go:131 +0x198
sigs.k8s.io/external-dns/provider/godaddy.NewGoDaddyProvider({0x0?, 0xc03ca9?}, {{0xc000e8cd70, 0x1, 0x1}, {0x0, 0x0, 0x0}, 0x0, 0x0}, ...)
    sigs.k8s.io/external-dns/provider/godaddy/godaddy.go:142 +0x46
main.main()
    sigs.k8s.io/external-dns/main.go:342 +0x206f
Stream closed EOF for external-dns/external-dns-godaddy-cmls-75d996f88f-wp8gx (external-dns)

Environment:

  • External-DNS version (use external-dns --version): 1.15.0 (also happening in 1.14.5)
  • DNS provider: godaddy
  • Others:
@alexstojda alexstojda added the kind/bug Categorizes issue or PR as related to a bug. label Nov 11, 2024
@rayshoo
Copy link

rayshoo commented Nov 14, 2024

I am currently experiencing the same problem. As a result of testing with the same API key using postman, I received the following response.

  • body
{
    "code": "QUOTA_EXCEEDED",
    "message": "The monthly quota for requests has been exceeded."
}

I was also able to confirm that when the monthly quota limit is hit, the retryAfter header is not included.

FYI: Since I have a one-month limit, it is currently impossible to check whether the retryAfter header comes in case of a one-minute limit.

This appears to result in the retryAfter header being pulled as blank and parsed as 0, causing a panic in the rand.Int63n() function.

I didn't know whether the one-month limit existed originally, but it seems that the problem of not having a retryAfter header can be solved as follows.

// provider/godaddy/client.go:224
func (c *Client) Do(req *http.Request) (*http.Response, error) {
	...
	resp, err := c.Client.Do(req)
	// In case of several clients behind NAT we still can hit rate limit
	for i := 1; i < 3 && err == nil && resp.StatusCode == 429; i++ {
		retryAfter, _ := strconv.ParseInt(resp.Header.Get("Retry-After"), 10, 0)

		var jitter int64
		if retryAfter > 0 {
			jitter = rand.Int63n(retryAfter)
		}
		retryAfterSec := retryAfter + jitter/2
	...
}

In the case of a one-month limit, it seems that the body message must be read to distinguish correctly, but since there appears to be a function that already reads the body message after the client.Do function, it appears that only the header value must be processed.

// provider/godaddy/client.go:224
func (c *Client) Do(req *http.Request) (*http.Response, error) {
	...
	resp, err := c.Client.Do(req)
	// In case of several clients behind NAT we still can hit rate limit
	for i := 1; i < 3 && err == nil && resp.StatusCode == 429; i++ {
		var retryAfter int64
		var jitter int64
		if headerValue := resp.Header.Get("Retry-After"); headerValue == "" {
			// Even with the 1 minute limit, there may be no retryAfter header, so set it to 30 seconds instead of 60 seconds.
			retryAfter = 30
		} else {
			retryAfter, _ = strconv.ParseInt(headerValue, 10, 0)
			if retryAfter > 0 {
				jitter = rand.Int63n(retryAfter)
			}
		}
		retryAfterSec := retryAfter + jitter/2
	...
}

I will pull request this code. Please let me know if anything needs to be fixed

@alexstojda
Copy link
Contributor Author

@rayshoo FYI I already have a PR open to fix the issue with a bit of a different approach: #4866

If you'd like to suggest some changes we can discuss there? So as to not have 2 different PRs for the same issue.

@rayshoo
Copy link

rayshoo commented Nov 15, 2024

@alexstojda Of course. I will discuss this in the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants