This is with 2.3.1rc2, but I believe the issue started earlier.
2018-03-12T13:21:45.094Z 86844 TID-a3oqxw ActivityPub::ProcessingWorker JID-d54802549aadb871df136517 INFO: start
2018-03-12T13:21:45.161Z 86844 TID-a3oqxw ActivityPub::ProcessingWorker JID-d54802549aadb871df136517 INFO: fail: 0.067 sec
2018-03-12T13:21:45.161Z 86844 TID-a3oqxw WARN: {"context":"Job raised exception","job":{"class":"ActivityPub::ProcessingWorker","args":[36,"{\"@context\":[\"https://www.w3.org/ns/activitystreams\",\"https://w3id.org/security/v1\",{\"manuallyApprovesFollowers\":\"as:manuallyApprovesFollowers\",\"sensitive\":\"as:sensitive\",\"movedTo\":\"as:movedTo\",\"Hashtag\":\"as:Hashtag\",\"ostatus\":\"http://ostatus.org#\",\"atomUri\":\"ostatus:atomUri\",\"inReplyToAtomUri\":\"ostatus:inReplyToAtomUri\",\"conversation\":\"ostatus:conversation\",\"toot\":\"http://joinmastodon.org/ns#\",\"Emoji\":\"toot:Emoji\",\"focalPoint\":{\"@container\":\"@list\",\"@id\":\"toot:focalPoint\"},\"featured\":\"toot:featured\"}],\"id\":\"https://mastodon.social/users/Gargron/statuses/99670053839562591/activity\",\"type\":\"Announce\",\"actor\":\"https://mastodon.social/users/Gargron\",\"published\":\"2018-03-12T08:45:27Z\",\"to\":[\"https://www.w3.org/ns/activitystreams#Public\"],\"cc\":[\"https://mastodon.social/users/Noa89\",\"https://mastodon.social/users/Gargron/followers\"],\"object\":\"https://mastodon.social/users/Noa89/statuses/99668292355953925\",\"atomUri\":\"https://mastodon.social/users/Gargron/statuses/99670053839562591/activity\",\"signature\":{\"type\":\"RsaSignature2017\",\"creator\":\"https://mastodon.social/users/Gargron#main-key\",\"created\":\"2018-03-12T08:45:30Z\",\"signatureValue\":\"PxWbVG2+8ALQMLnPJqLiCfSOd+FGFb2O3EhwKv13z5Z/TPHOwYRb4ghCprUzeWLXigukCIDjseenBXzb4xA//UzErs594y54culgoha0mrL+gvnkg4QfzJ2p32kReHzK40OEvFiZFoUV5UBhsHvqUhx5s2b34l68VKHPqMxk01jJoA5tr2LphKxQnQHPjtpIo/MKKynlJCtLxxmCunK1z30CEEwzYyqkY9Z4txRmxIpoHsygAt6GuukrX/R6JntzUTcOhBw2648Ny+0Mj5beSIk2GPG7JvlBL2Z3flzB7ts6oHDSJVxtcfqU7yxTARyZYF/GbH3VjxKk9rujw5MUhQ==\"}}"],"retry":true,"queue":"default","backtrace":true,"jid":"d54802549aadb871df136517","created_at":1520844342.6147585,"enqueued_at":1520860905.0932858,"error_message":"failed to connect: No route to host - connect(2) for \"2001:bc8:3890:a100::1\" port 
443 on https://mastodon.social/users/Noa89/statuses/99668292355953925","error_class":"HTTP::ConnectionError","failed_at":1520844342.7385914,"retry_count":9,"error_backtrace":["/usr/local/www/mastodon/app/lib/request.rb:99:in `initialize'"],"retried_at":1520854242.3667173},"jobstr":"{\"class\":\"ActivityPub::ProcessingWorker\",\"args\":[36,\"{\\\"@context\\\":[\\\"https://www.w3.org/ns/activitystreams\\\",\\\"https://w3id.org/security/v1\\\",{\\\"manuallyApprovesFollowers\\\":\\\"as:manuallyApprovesFollowers\\\",\\\"sensitive\\\":\\\"as:sensitive\\\",\\\"movedTo\\\":\\\"as:movedTo\\\",\\\"Hashtag\\\":\\\"as:Hashtag\\\",\\\"ostatus\\\":\\\"http://ostatus.org#\\\",\\\"atomUri\\\":\\\"ostatus:atomUri\\\",\\\"inReplyToAtomUri\\\":\\\"ostatus:inReplyToAtomUri\\\",\\\"conversation\\\":\\\"ostatus:conversation\\\",\\\"toot\\\":\\\"http://joinmastodon.org/ns#\\\",\\\"Emoji\\\":\\\"toot:Emoji\\\",\\\"focalPoint\\\":{\\\"@container\\\":\\\"@list\\\",\\\"@id\\\":\\\"toot:focalPoint\\\"},\\\"featured\\\":\\\"toot:featured\\\"}],\\\"id\\\":\\\"https://mastodon.social/users/Gargron/statuses/99670053839562591/activity\\\",\\\"type\\\":\\\"Announce\\\",\\\"actor\\\":\\\"https://mastodon.social/users/Gargron\\\",\\\"published\\\":\\\"2018-03-12T08:45:27Z\\\",\\\"to\\\":[\\\"https://www.w3.org/ns/activitystreams#Public\\\"],\\\"cc\\\":[\\\"https://mastodon.social/users/Noa89\\\",\\\"https://mastodon.social/users/Gargron/followers\\\"],\\\"object\\\":\\\"https://mastodon.social/users/Noa89/statuses/99668292355953925\\\",\\\"atomUri\\\":\\\"https://mastodon.social/users/Gargron/statuses/99670053839562591/activity\\\",\\\"signature\\\":{\\\"type\\\":\\\"RsaSignature2017\\\",\\\"creator\\\":\\\"https://mastodon.social/users/Gargron#main-key\\\",\\\"created\\\":\\\"2018-03-12T08:45:30Z\\\",\\\"signatureValue\\\":\\\"PxWbVG2+8ALQMLnPJqLiCfSOd+FGFb2O3EhwKv13z5Z/TPHOwYRb4ghCprUzeWLXigukCIDjseenBXzb4xA//UzErs594y54culgoha0mrL+gvnkg4QfzJ2p32kReHzK40OEvFiZFoUV5UBhsHvqUhx5s2b34l68VKHPqMxk01jJoA5t
r2LphKxQnQHPjtpIo/MKKynlJCtLxxmCunK1z30CEEwzYyqkY9Z4txRmxIpoHsygAt6GuukrX/R6JntzUTcOhBw2648Ny+0Mj5beSIk2GPG7JvlBL2Z3flzB7ts6oHDSJVxtcfqU7yxTARyZYF/GbH3VjxKk9rujw5MUhQ==\\\"}}\"],\"retry\":true,\"queue\":\"default\",\"backtrace\":true,\"jid\":\"d54802549aadb871df136517\",\"created_at\":1520844342.6147585,\"enqueued_at\":1520860905.0932858,\"error_message\":\"failed to connect: No route to host - connect(2) for \\\"2001:bc8:3890:a100::1\\\" port 443 on https://mastodon.social/users/Noa89/statuses/99668292355953925\",\"error_class\":\"HTTP::ConnectionError\",\"failed_at\":1520844342.7385914,\"retry_count\":9,\"error_backtrace\":[\"/usr/local/www/mastodon/app/lib/request.rb:99:in `initialize'\"],\"retried_at\":1520854242.3667173}"}
2018-03-12T13:21:45.161Z 86844 TID-a3oqxw WARN: HTTP::ConnectionError: failed to connect: No route to host - connect(2) for "2001:bc8:3890:a100::1" port 443 on https://mastodon.social/users/Noa89/statuses/99668292355953925
2018-03-12T13:21:45.161Z 86844 TID-a3oqxw WARN: /usr/local/www/mastodon/app/lib/request.rb:99:in `initialize'
Also seeing it with rc3. The sidekiq log is exploding with these messages for many instances, and I can confirm that direct toots sent to, e.g., mastodon.social and octodon.social do not arrive. Direct toots sent from those instances to our instance do arrive.
ipv6 is handled by the operating system, not mastodon. you might have some misconfiguration somewhere, maybe @staticsafe has an idea?
I will dig deeper, but we have no other connectivity problems outside of Mastodon (it is just Mastodon that is trying to make connections to IPv6 addresses). Also, no OS configuration has changed with respect to IP. This just started with 2.3.0.
what happens if you run curl -6 mastodon.social?
% curl -6 mastodon.social
curl: (7) Couldn't connect to server
Added later:
```
curl mastodon.social
```
okay, that seems right. Not really sure how to debug further, i'm pretty out of my depth network stack wise :D
The line referenced from request.rb in the error message comes from a recent commit, within this code:
    class Socket < TCPSocket
      class << self
        def open(host, *args)
          address = IPSocket.getaddress(host)
          raise Mastodon::HostValidationError if PrivateAddressCheck.private_address? IPAddr.new(address)
          super address, *args # <- the line marked as the one named in the backtrace
        end
        alias new open
      end
    end
in this commit: https://github.com/tootsuite/mastodon/commit/2e8a492e8843aa958c53636b24cf4d344e7ca47d
I don't know what any of that means, myself, but seeing as how it has to do with handling IP addresses, I suspect it's related.
Saw this issue on Mastodon.
I recommend using Google Public DNS64 for access to IPv4-only sites.
It has some limitations, but it might work.
I might be wrong, but it sounds like an OS issue.
Please give me the output of the following:
Linux should not connect to IPv6 hosts if it does not have an IPv6 address to use. My guess is that the node does have an IPv6 address, but one that is non-functional or only partly functional.
I am also seeing a lot of this in cases where my instance has known good IPv6 support, but the remote instance's IPv6 is broken (despite AAAA records being in place).
Ideally it should be falling back to IPv4 in this case, but if request.rb is using IPSocket.getaddress(host) and then connecting to the resulting IPv6 address directly, it's gonna have a bad time.
It has to connect to that IP address directly for security reasons, so we're not vulnerable to attacks like the LastPass DNS rebinding attack.
But even before that we couldn't fall back, because our timeouts measure the entire fetching process, inclusive of retries. While it would be nice to fall back to IPv4, it's pretty difficult to do that transparently while still maintaining throughput, and without allowing broken hosts to clog up the distribution process.
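To make the security argument concrete, here is a minimal sketch of the "resolve once, validate, then connect to exactly that address" pattern. It is not Mastodon's code: `BLOCKED_RANGES`, `blocked_address?` and `pinned_address` are hypothetical names, the range list is illustrative rather than complete, and the `resolver:` parameter is only there so the sketch can be exercised without DNS.

```ruby
require 'ipaddr'
require 'socket'

# Illustrative, non-exhaustive list of ranges a server should not fetch from.
# (Assumption: stands in for the PrivateAddressCheck helper quoted earlier.)
BLOCKED_RANGES = %w[
  127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16
  ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

# True if the literal address falls inside any blocked range.
def blocked_address?(address)
  ip = IPAddr.new(address)
  BLOCKED_RANGES.any? { |range| range.include?(ip) }
end

# Resolve the host exactly once, validate the result, and return the literal
# address. The caller must connect to this address, not to the hostname,
# otherwise a second DNS lookup could return something different.
def pinned_address(host, resolver: ->(h) { IPSocket.getaddress(h) })
  address = resolver.call(host)
  raise ArgumentError, "#{host} resolves to blocked address #{address}" if blocked_address?(address)

  address
end
```

The catch discussed in this thread follows directly from the pattern: once a single address has been pinned, nothing below the application will try a second one if the first is unreachable.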
How about something like this: c032e2458ab186ecbb2b01620ef3f169cf150bec
Note: it's ugly code (I am not a Ruby dev) but it seems to be doing the trick here.
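For readers following along, the general shape of a fallback can be sketched as below. This is not the contents of the commit mentioned above; `connect_with_fallback`, the `budget:` parameter and the injectable `connector:` are assumptions made for illustration. The one design choice worth noting is that all attempts share a single time budget, so a host with several dead addresses cannot take longer than one configured limit.

```ruby
require 'socket'

# Try each already-resolved address in order until one connects.
# All attempts share one overall time budget (in seconds).
# `connector` is a hypothetical seam so the loop can be tested offline;
# by default it opens a TCP connection to the literal address.
def connect_with_fallback(addresses, port, budget: 10.0,
                          connector: ->(ip, p, timeout) { Socket.tcp(ip, p, connect_timeout: timeout) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + budget
  last_error = nil

  addresses.each do |ip|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break if remaining <= 0 # budget used up, stop trying

    begin
      return connector.call(ip, port, remaining)
    rescue SystemCallError, SocketError, IOError => e
      last_error = e # remember the failure and move on to the next address
    end
  end

  raise(last_error || Errno::ETIMEDOUT.new("no address could be reached"))
end
```

In a real implementation each candidate address would still have to pass the private-address check before being tried.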
@rtucker the problem is that doubles or quadruples the amount of time broken sites take up in sidekiq, which really hurts the performance of distribution, even for sites that work. ideally we'd maybe set a flag so that when we retry the job in sidekiq it tries a different address or something, but i'm not sure how that would work technically. maybe a redis cache of addresses that have failed recently?
@nightpool Well, is it any worse than it was pre-2.3? It seems to fix the regression introduced by 2e8a492e8843aa958c53636b24cf4d344e7ca47d, and I'm now able to actually deliver to instances with broken IPv6, so my actual queue performance is better than with 2.3.
A cache would probably do the trick. I thought about finding some way to get the current retry count into there, to use that as the index into the Addrinfo array, but the ordering of addresses is not necessarily going to be the same on each call to getaddrinfo.
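A rough sketch of the cache idea, keyed by address rather than by position so that it does not depend on the order getaddrinfo happens to return. It uses an in-memory Hash as a stand-in for Redis; the class name, method names and TTL are all hypothetical.

```ruby
# In-memory stand-in for a shared cache of recently failed addresses.
# (Assumption: a real deployment would keep this in Redis so that all
# sidekiq processes see the same failures.)
class FailedAddressCache
  def initialize(ttl: 300, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl       # seconds a failure is remembered
    @clock = clock   # injectable so expiry can be tested without sleeping
    @failed_at = {}
  end

  # Record that connecting to this address just failed.
  def mark_failed(address)
    @failed_at[address] = @clock.call
  end

  # True if the address failed less than `ttl` seconds ago.
  def failed_recently?(address)
    stamp = @failed_at[address]
    !stamp.nil? && (@clock.call - stamp) < @ttl
  end

  # Keep the original order, but move recently failed addresses to the end.
  def prioritize(addresses)
    addresses.partition { |a| !failed_recently?(a) }.flatten
  end
end
```

With something like this, a retried job would try the address that has not failed first, without having to pass the retry count down into the socket code.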
@staticsafe This host is running FreeBSD. Here is what I assume/hope is the equivalent information.
jrm@gly ~ % ifconfig
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 00:24:e8:2a:e5:a2
        hwaddr 00:24:e8:2a:e5:a2
        inet 129.173.118.183 netmask 0xfffffe00 broadcast 129.173.119.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        inet 127.0.0.2 netmask 0xffffffff
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160
        groups: pflog
jrm@gly ~ % netstat -r
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            gw81ad7600.backbon UGS       re0
localhost          link#2             UH        lo0
localhost          link#2             UH        lo0
129.173.118.0/23   link#1             U         re0
gly                link#1             UHS       lo0

Internet6:
Destination        Gateway            Flags     Netif Expire
::/96              ::1                UGRS      lo0
::1                link#2             UH        lo0
::ffff:0.0.0.0/96  ::1                UGRS      lo0
fe80::/10          ::1                UGRS      lo0
fe80::%lo0/64      link#2             U         lo0
fe80::1%lo0        link#2             UHS       lo0
ff02::/16          ::1                UGRS      lo0
jrm@gly ~ % ndp -a
Neighbor Linklayer Address Netif Expire S Flags
That output indicates that the host has no IPv6 connectivity, so I am now even more confused about why an IPv6 connection is even being made. :/
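The same conclusion can be checked from inside Ruby. The sketch below is a rough filter only: `usable_ipv6` is a hypothetical helper name, and it treats any IPv6 address that is neither loopback nor link-local as "usable", which says nothing about whether routing actually works.

```ruby
require 'socket'

# Return the local IPv6 addresses that are neither loopback (::1) nor
# link-local (fe80::/10). An empty result means the host has no address
# from which it could plausibly reach a remote IPv6 server.
def usable_ipv6(addrinfos = Socket.ip_address_list)
  addrinfos.select do |ai|
    ai.ipv6? && !ai.ipv6_loopback? && !ai.ipv6_linklocal?
  end
end
```

Fed the addresses from the ifconfig output above (::1 and fe80::1 on lo0, plus the IPv4 addresses), this filter returns an empty list.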
Commit c032e24, as you suggested @rtucker, allows direct toots to be delivered from my instance to mastodon.social again. Thanks.
Even then, there is no guarantee that we firewall the address picked up in the end when connecting to the address. See my comment in https://github.com/tootsuite/mastodon/pull/6410#issuecomment-373951866.
The idea that we can do this at the application level is simply wrong, and the check should be removed from the code.
@rtucker
> How about something like this: c032e24
The idea looks good to me. Please make a pull request and let me review.
Another workaround for IPv6 hosts is simply to set up IPv6 correctly. Please see https://github.com/tootsuite/documentation/pull/535 if you are on Docker. But it is difficult, and not an option anyway if you are on a server without IPv6 connectivity.
@akihikodaki PR #6813 opened!
For reference: Write "fix #6761" in the commit message or PR text to automatically close an issue when it is merged. Thanks.