Discussion:
[Unbound-users] reddit.com issue
Dave Duchscher
2014-08-25 00:18:17 UTC
Permalink
We have just started using unbound and I am having an issue with resolving reddit.com due to some bad queries hitting our servers.

The bad queries are for 'http://www.reddit.com.' The colon in that name causes reddit.com's servers to not respond to the query. At some point unbound marks the whole domain reddit.com as failing and returns SERVFAIL for all queries. This clears after a bit and then repeats.

I have filtered out the bad queries to stop the immediate problem.

I am looking for a more robust way to fix this issue.

--
Dave
Thomas Guthmann
2014-08-25 04:58:46 UTC
Permalink
Hi,

Like this ?

15 A IN http???www.reddit.com. 221.211696 iterator wait for 173.245.58.24
22 AAAA IN http???www.reddit.com. 0.097014 iterator wait for 198.41.222.24

Thomas
W.C.A. Wijngaards
2014-08-25 07:16:38 UTC
Permalink
Hi,
Post by Thomas Guthmann
Hi,
Like this ?
15 A IN http???www.reddit.com. 221.211696 iterator wait for
173.245.58.24 22 AAAA IN http???www.reddit.com. 0.097014 iterator
wait for 198.41.222.24
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.

Unbound notices the domain does not respond to A queries, and marks
the domain as timed out (down) for A queries. Unbound stops sending A
queries there to attempt to throttle traffic towards that stricken
server. If A queries get replies (there is an exponential backoff to
the queries sent out) then unbound marks the server as responsive
again (the server is considered back up) and queries are resumed.
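
As a rough illustration of the behaviour described above, here is a toy model of per-server backoff. It is not Unbound's actual code: the class name, the timeout threshold, and the delay values are all assumptions made up for this sketch.

```python
class ServerState:
    """Toy model of per-server backoff; all constants are illustrative assumptions."""

    def __init__(self, max_timeouts=3, base_delay=1.0, max_delay=900.0):
        self.max_timeouts = max_timeouts  # consecutive timeouts before marking down
        self.base_delay = base_delay      # first probe delay in seconds
        self.max_delay = max_delay        # cap on the probe delay
        self.timeouts = 0
        self.next_probe = 0.0             # earliest time a probe may be sent

    def is_down(self):
        return self.timeouts >= self.max_timeouts

    def may_send(self, now):
        # While the server is up, always send; while down, only send probes.
        return not self.is_down() or now >= self.next_probe

    def on_timeout(self, now):
        self.timeouts += 1
        if self.is_down():
            # Double the wait for every timeout past the threshold, up to the cap.
            exponent = self.timeouts - self.max_timeouts
            delay = min(self.base_delay * (2 ** exponent), self.max_delay)
            self.next_probe = now + delay

    def on_reply(self):
        # Any reply marks the server responsive again.
        self.timeouts = 0
        self.next_probe = 0.0
```

In this model a silently dropped query and a dead server look identical, which is why a handful of unanswered 'http://www.reddit.com.' queries can take the healthy names down with them.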

Best regards,
Wouter
Maciej Soltysiak
2014-08-25 11:05:05 UTC
Permalink
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same (drop)
When I asked the parents, they answered.

Cloudflare seems to do the same thing for their customers.

If not FORMERR, they could've at least sent ICMP administratively
prohibited to mark that this particular comms is not ok with them.
That would've made unbound record a failure.

It's silly because in order to immunize your cache against this you
would have to start your own filtering... That shouldn't be the point.
Post by W.C.A. Wijngaards
Unbound notices the domain does not respond to A queries, and marks
the domain as timed out (down) for A queries. Unbound stops sending A
queries there to attempt to throttle traffic towards that stricken
server. If A queries get replies (there is an exponential backoff to
the queries sent out) then unbound marks the server as responsive
again (the server is considered back up) and queries are resumed.
Is there any unbound-control command to help in this situation? i.e.
manually override the backoff or reset it? Would flush_type or
flush_name help?

Best regards,
Maciej
W.C.A. Wijngaards
2014-08-25 11:28:41 UTC
Permalink
Hi Maciej,
On Mon, Aug 25, 2014 at 9:16 AM, W.C.A. Wijngaards
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers) are
not following the DNS specifications. They are dropping the
query and they should be replying. There was a draft at the IETF
even to mark this as harmful, but it did not progress through the
standards track, I believe. If they want to refuse the query for
unclear reasons (what is wrong with responding NXDOMAIN?) they
could choose from nice error codes like SERVFAIL and FORMERR and
REFUSED.
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same
(drop) When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
If not FORMERR, they could've at least sent ICMP administratively
prohibited to mark that this particular comms is not ok with them.
That would've made unbound record a failure.
It's silly because in order to immunize your cache against this
you would have to start your own filtering... That shouldn't be the
point.
Post by W.C.A. Wijngaards
Unbound notices the domain does not respond to A queries, and
marks the domain as timed out (down) for A queries. Unbound
stops sending A queries there to attempt to throttle traffic
towards that stricken server. If A queries get replies (there is
an exponential backoff to the queries sent out) then unbound
marks the server as responsive again (the server is considered
back up) and queries are resumed.
Is there any unbound-control command to help in this situation?
i.e. manually override the backoff or reset it? Would flush_type
or flush_name help?
unbound-control flush_infra [all | ip-address of the nameserver]

This deletes the timing information so queries are sent again.

You could also reduce the infra-ttl in the config, so that unbound
forgets this sort of thing faster.
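
For anyone who wants to script the flush, a minimal sketch follows. The helper names are hypothetical; only the `unbound-control flush_infra` command itself comes from the message above, and the addresses in the usage note are the ones from Thomas's output. In unbound.conf, the option that controls how long this timing information is kept is, as far as I know, spelled `infra-host-ttl` (in seconds).

```python
import subprocess

def flush_infra_command(target="all"):
    """Build the argv for 'unbound-control flush_infra <target>'.

    target is either "all" or the IP address of one nameserver.
    """
    return ["unbound-control", "flush_infra", target]

def flush_infra(target="all"):
    # Runs the command; this assumes remote-control is enabled in unbound.conf
    # and that unbound-control is on the PATH.
    return subprocess.run(flush_infra_command(target), check=True)
```

For example, `flush_infra("173.245.58.24")` would clear the timing data for just that server, while `flush_infra()` clears it for all of them.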

Best regards,
Wouter
Dave Duchscher
2014-08-25 12:56:00 UTC
Permalink
Post by Maciej Soltysiak
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same (drop)
When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
If not FORMERR, they could've at least sent ICMP administratively
prohibited to mark that this particular comms is not ok with them.
That would've made unbound record a failure.
It's silly because in order to immunize your cache against this you
would have to start your own filtering... That shouldn't be the point.
Not a customer of Cloudflare but their help system allows outsiders to
submit so I have submitted a help request for this problem (172999).
Maybe this is a bug.

--
Dave
Dave Duchscher
2014-08-25 13:24:17 UTC
Permalink
Post by Dave Duchscher
Post by Maciej Soltysiak
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same (drop)
When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
If not FORMERR, they could've at least sent ICMP administratively
prohibited to mark that this particular comms is not ok with them.
That would've made unbound record a failure.
It's silly because in order to immunize your cache against this you
would have to start your own filtering... That shouldn't be the point.
Not a customer of Cloudflare but their help system allows outsiders to
submit so I have submitted a help request for this problem (172999).
Maybe this is a bug.
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
Since these kinds of invalid queries don't get this far in the normal DNS system (since they get dropped at the root servers)
Let us know if you need any other help
Thanks
*sigh*

--
Dave
W.C.A. Wijngaards
2014-08-25 13:36:03 UTC
Permalink
Hi Dave,
Post by Dave Duchscher
On Aug 25, 2014, at 6:05 AM, Maciej Soltysiak
On Mon, Aug 25, 2014 at 9:16 AM, W.C.A. Wijngaards
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers)
are not following the DNS specifications. They are dropping
the query and they should be replying. There was a draft at
the IETF even to mark this as harmful, but it did not
progress through the standards track, I believe. If they
want to refuse the query for unclear reasons (what is wrong
with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Yup. I have a domain that goes through cloudflare. I just
asked cloudflare NSes for a name with a colon and it behaves
the same (drop) When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
If not FORMERR, they could've at least sent ICMP
administratively prohibited to mark that this particular comms
is not ok with them. That would've made unbound record a
failure.
It's silly because in order to immunize your cache against this
you would have to start your own filtering... That shouldn't be
the point.
Not a customer of Cloudflare but their help system allows
outsiders to submit so I have submitted a help request for this
problem (172999). Maybe this is a bug.
Hey there,
Because the DNS query "http://reddit.com" is technically not
valid (since DNS queries should not contain the protocol URI),
CloudFlare's DNS servers will not respond to them.
Since these kinds of invalid queries don't get this far in the
normal DNS system (since they get dropped at the root servers)
Let us know if you need any other help Thanks
*sigh*
The root servers certainly respond. I got a very neat referral to .com.

Well, they list "http://reddit.com" which is a dotCOM domain with a
colon in it, that stops somewhere at the .com servers. And does not
reach CloudFlare, so they are right about that one.

But the trouble is with "http://www.reddit.com" because the DNS
servers for 'reddit.com' do not respond for it.

Best regards,
Wouter
John Peacock
2014-08-25 13:45:07 UTC
Permalink
Post by Dave Duchscher
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
That is what I would have predicted their response would have been. A
broken client is making illegal DNS queries; that is the root cause of
the difficulty. The fact that unbound itself doesn't return an error
for these illegal queries is only making matters worse. Neither ':' nor
'/' are legal DNS hostname characters (see RFC-1035 and onwards), so it
should be the resolver library (i.e. unbound) that should be validating
the query before sending it on, IMNSHO. The fact that reddit.com has an
unfriendly behavior WRT illegal queries doesn't mean it is their fault;
there is no requirement to return NXDOMAIN or SERVFAIL or anything at
all, so they chose to drop the query.
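
A minimal sketch of the kind of check being argued for here, assuming the letter-digit-hyphen hostname rule from RFC 952 and RFC 1123. The function name is hypothetical and this is not something Unbound does.

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen; 1 to 63 octets.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_hostname(name):
    """Return True if name follows the letter-digit-hyphen hostname rules."""
    if name.endswith("."):
        name = name[:-1]          # drop the root dot of a fully qualified name
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

With this, `is_valid_hostname("http://www.reddit.com.")` is rejected because of the label `http://www`. Note that the check is stricter than the DNS wire format, which (per RFC 2181) accepts any octets in a label, so names such as `_sip._tcp.example.com` would be rejected too.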

John

--
JOHN PEACOCK
senior software build and release engineer
www.messagesystems.com
twitter @MessageSystems

tel 410-872-4910 x239
email joh
Paul Wouters
2014-08-25 14:07:33 UTC
Permalink
Post by John Peacock
the query before sending it on, IMNSHO. The fact that reddit.com has an
unfriendly behavior WRT illegal queries doesn't mean it is their fault;
there is no requirement to return NXDOMAIN or SERVFAIL or anything at
all, so they chose to drop the query.
There is! Not answering a query is indistinguishable from packet loss,
forcing the client to re-send the query. So it is the wrong thing to
do, and will increase the number of these bad queries hitting their
servers.
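
To make the point concrete, here is a small sketch of a client retry loop. The function name and the limit of three tries are assumptions for illustration, not the behaviour of any particular resolver.

```python
def queries_sent(server_replies, max_tries=3):
    """Count how many times a client sends one question.

    server_replies: callable taking the attempt number and returning an
    rcode string, or None when the server silently drops the query.
    """
    for attempt in range(1, max_tries + 1):
        rcode = server_replies(attempt)
        if rcode is not None:
            return attempt, rcode     # any answer, even an error, ends the retries
    return max_tries, "timeout"       # indistinguishable from packet loss

def drop(attempt):
    return None                       # server that never answers

def nxdomain(attempt):
    return "NXDOMAIN"                 # server that answers immediately
```

Here `queries_sent(drop)` uses up every try, while `queries_sent(nxdomain)` stops after the first packet.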

Paul
Eric Meddaugh
2014-08-25 14:13:13 UTC
Permalink
I alerted CloudFlare last week and they have indicated they have engineers looking into it. I opened the ticket as a DoS against any domains they provide hosting for. As long as there are clients querying 'http://www.reddit.com' (or any other CloudFlare-hosted domain) it can keep that domain offline. Our work-around has allowed reddit.com to appear to remain online.

---Eric

-----Original Message-----
From: Unbound-users [mailto:unbound-users-***@unbound.net] On Behalf Of John Peacock
Sent: Monday, August 25, 2014 9:45 AM
To: unbound-***@unbound.net
Subject: Re: [Unbound-users] reddit.com issue
Post by Dave Duchscher
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
That is what I would have predicted their response would have been. A
broken client is making illegal DNS queries; that is the root cause of
the difficulty. The fact that unbound itself doesn't return an error
for these illegal queries is only making matters worse. Neither ':' nor
'/' are legal DNS hostname characters (see RFC-1035 and onwards), so it
should be the resolver library (i.e. unbound) that should be validating
the query before sending it on, IMNSHO. The fact that reddit.com has an
unfriendly behavior WRT illegal queries doesn't mean it is their fault;
there is no requirement to return NXDOMAIN or SERVFAIL or anything at
all, so they chose to drop the query.

John
--
JOHN PEACOCK
senior software build and release engineer
www.messagesystems.com
twitter @MessageSystems

tel 410-872-4910 x239
email ***@messagesystems.com
Dave Duchscher
2014-08-25 15:29:54 UTC
Permalink
That is good to hear. I was thinking I was getting a first line response to the issue since it was so quick. I probably didn't explain it well enough. I will try again. More tickets may help push it up on their priority list.

--
Dave
Post by Eric Meddaugh
I alerted CloudFlare last week and they have indicated they have engineers looking into it. I opened the ticket as a DoS against any domains they provide hosting for. As long as there are clients querying 'http://www.reddit.com' (or any other CloudFlare-hosted domain) it can keep that domain offline. Our work-around has allowed reddit.com to appear to remain online.
---Eric
-----Original Message-----
Sent: Monday, August 25, 2014 9:45 AM
Subject: Re: [Unbound-users] reddit.com issue
Post by Dave Duchscher
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
That is what I would have predicted their response would have been. A
broken client is making illegal DNS queries; that is the root cause of
the difficulty. The fact that unbound itself doesn't return an error
for these illegal queries is only making matters worse. Neither ':' nor
'/' are legal DNS hostname characters (see RFC-1035 and onwards), so it
should be the resolver library (i.e. unbound) that should be validating
the query before sending it on, IMNSHO. The fact that reddit.com has an
unfriendly behavior WRT illegal queries doesn't mean it is their fault;
there is no requirement to return NXDOMAIN or SERVFAIL or anything at
all, so they chose to drop the query.
John
--
JOHN PEACOCK
senior software build and release engineer
www.messagesystems.com
tel 410-872-4910 x239
_______________________________________________
Unbound-users mailing list
http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
John Graham-Cumming
2014-08-25 15:33:20 UTC
Permalink
Post by Dave Duchscher
That is good to hear. I was thinking I was getting a first line response
to the issue since it was so quick. I probably didn't explain it well
enough. I will try again. More tickets may help push it up on their
priority list.
No need. I'm the CloudFlare engineer on that and it has already been fixed
(we will now reply NXDOMAIN) and the code will be deployed shortly.

John.
Maciej Soltysiak
2014-08-25 19:45:35 UTC
Permalink
Post by John Graham-Cumming
No need. I'm the CloudFlare engineer on that and it has already been fixed
(we will now reply NXDOMAIN) and the code will be deployed shortly.
Thanks for your response John, it's very appreciated. Perhaps FORMERR
is more suited, but NXDOMAIN is true in this case as well and better
suited than the drop. Can you let the list know when it's done,
please?

Friends, the fact that I found the issue Dave reported in CloudFlare
doesn't mean it doesn't exist elsewhere.
I mean this should close Dave's case because cns[123].reddit.com are CloudFlare.
But this sort of thing can happen a lot with many IDS-type products
that do deep packet inspection and filtering.

I guess it may mean one service at a time.

Best regards,
Maciej
John Graham-Cumming
2014-08-26 07:11:20 UTC
Permalink
Post by Maciej Soltysiak
Thanks for your response John, it's very appreciated. Perhaps FORMERR
is more suited, but NXDOMAIN is true in this case as well and better
suited than the drop. Can you let the list know when it's done,
please?
Yes. I'll let the list know.

John.
John Graham-Cumming
2014-08-28 08:26:14 UTC
Permalink
Post by Maciej Soltysiak
Can you let the list know when it's done,
please?
This fix has been deployed globally.

John.

Jelte Jansen
2014-08-25 14:02:21 UTC
Permalink
Post by Dave Duchscher
Post by Dave Duchscher
Hey there,
Because the DNS query "http://reddit.com" is technically not valid (since DNS queries should not contain the protocol URI), CloudFlare's DNS servers will not respond to them.
Since these kinds of invalid queries don't get this far in the normal DNS system (since they get dropped at the root servers)
Let us know if you need any other help
Thanks
*sigh*
Wow. Not only is that answer wrong, that approach makes these zones easy
to DoS on a number of resolvers.

Worse, as someone on IRC just commented, it also makes it much, much
easier to do kaminsky-style attacks on those zones.

Jelte
Eric Meddaugh
2014-08-25 12:58:19 UTC
Permalink
I've done some packet captures and it appears it may be an app on some i* devices. We're seeing iPhones and iPads send these types of queries.

---Eric

-----Original Message-----
From: Unbound-users [mailto:unbound-users-***@unbound.net] On Behalf Of Thomas Guthmann
Sent: Monday, August 25, 2014 12:59 AM
To: Dave Duchscher; unbound-***@unbound.net
Subject: Re: [Unbound-users] reddit.com issue

Hi,

Like this ?

15 A IN http???www.reddit.com. 221.211696 iterator wait for 173.245.58.24
22 AAAA IN http???www.reddit.com. 0.097014 iterator wait for 198.41.222.24

Thomas
Dave Duchscher
2014-08-27 02:11:46 UTC
Permalink
Post by Eric Meddaugh
I've done some packet captures and it appears it may be an app on some i* devices. We're seeing iPhones and iPads send these types of queries.
---Eric
Just to complete this story, the iOS app in question is Alien Blue. When the app is not being used, these queries pop up about every 5 minutes. When you use the app, queries are normal. I have notified the developer.

--
Dave
Leen Besselink
2014-08-25 11:39:31 UTC
Permalink
Post by Maciej Soltysiak
Post by W.C.A. Wijngaards
Yes. The reddit servers (or likely, their load-balancers) are not
following the DNS specifications. They are dropping the query and
they should be replying. There was a draft at the IETF even to mark
this as harmful, but it did not progress through the standards track,
I believe. If they want to refuse the query for unclear reasons (what
is wrong with responding NXDOMAIN?) they could choose from nice error
codes like SERVFAIL and FORMERR and REFUSED.
Yup. I have a domain that goes through cloudflare. I just asked
cloudflare NSes for a name with a colon and it behaves the same (drop)
When I asked the parents, they answered.
Cloudflare seems to do the same thing for their customers.
So I tried Dyn, they respond with NXDOMAIN.

I also tried DNSMadeEasy they respond with NXDOMAIN.

I noticed when the domain has a wildcard they respond with the A-record.

I then checked a PowerDNS server, they respond with SERVFAIL even when the domain has a wildcard.
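
For anyone wanting to repeat this kind of test without extra tooling, a minimal sketch using only the Python standard library follows. The function names are hypothetical; the sketch sends a single UDP query and does not verify the response ID or source address, so treat it as a probe rather than a resolver.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}

def build_query(name, qtype=1, query_id=0x1234):
    """Encode a minimal DNS query (RD=0, class IN) for name; qtype 1 is A."""
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        qname += bytes([len(raw)]) + raw   # length octet, then the label bytes
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)

def rcode_of(response):
    """Extract the RCODE from the low four bits of the header flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    return RCODES.get(flags & 0x000F, str(flags & 0x000F))

def probe(server_ip, name, timeout=3.0):
    """Send one UDP query and report the rcode, or 'timeout' if nothing comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        try:
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
    return rcode_of(response)
```

A call such as `probe("173.245.58.24", "http://www.reddit.com.")` (the address is taken from Thomas's output earlier in the thread) returns "timeout" for a server that drops the query and an rcode name for one that answers.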
Post by Maciej Soltysiak
If not FORMERR, they could've at least sent ICMP administratively
prohibited to mark that this particular comms is not ok with them.
That would've made unbound record a failure.
It's silly because in order to immunize your cache against this you
would have to start your own filtering... That shouldn't be the point.
Post by W.C.A. Wijngaards
Unbound notices the domain does not respond to A queries, and marks
the domain as timed out (down) for A queries. Unbound stops sending A
queries there to attempt to throttle traffic towards that stricken
server. If A queries get replies (there is an exponential backoff to
the queries sent out) then unbound marks the server as responsive
again (the server is considered back up) and queries are resumed.
Is there any unbound-control command to help in this situation? i.e.
manually override the backoff or reset it? Would flush_type or
flush_name help?
Eric Meddaugh
2014-08-25 12:42:57 UTC
Permalink
I first saw this about 2 weeks ago and tracked it down to the same cause you found. I was able to mitigate the issue by returning a "bad" answer locally so we do not forward the "bad" query to CloudFlare (I've alerted them).

I put a local-data entry:

local-data: "http://www.reddit.com. 300 IN A 0.0.0.0"


Not a great answer, but it keeps reddit.com online for students.
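
If the same workaround is needed for more than one domain, the entries can be generated. This is only a sketch: the function name is hypothetical, and it assumes the bad name always has the 'http://www.' prefix seen in this thread.

```python
def local_data_lines(domains, ttl=300, address="0.0.0.0"):
    """Emit one local-data line per domain, in the same shape as the entry above."""
    lines = []
    for domain in domains:
        owner = "http://www." + domain.rstrip(".") + "."
        lines.append('local-data: "%s %d IN A %s"' % (owner, ttl, address))
    return lines
```

Calling `local_data_lines(["reddit.com"])` reproduces the single entry shown above; the output can be written to a file and pulled into unbound.conf.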

----Eric

-----Original Message-----
From: Unbound-users [mailto:unbound-users-***@unbound.net] On Behalf Of Dave Duchscher
Sent: Sunday, August 24, 2014 8:18 PM
To: unbound-***@unbound.net
Subject: [Unbound-users] reddit.com issue

We have just started using unbound and I am having an issue with resolving reddit.com due to some bad queries hitting our servers.

The bad queries are for 'http://www.reddit.com.' The colon in that name causes reddit.com's servers to not respond to the query. At some point unbound marks the whole domain reddit.com as failing and returns SERVFAIL for all queries. This clears after a bit and then repeats.

I have filtered out the bad queries to stop the immediate problem.

I am looking for a more robust way to fix this issue.

--
Dave