👉 nodes.json 👈
To help keep track of Fediverse’s growth (or stagnation T_T).
The-federation.info only checks the instances that were added manually. (They recently started working on a crawler, so that might change.) FediDB.org has a non-recursive crawler that fetches new peers from a few source instances. Fediverse.Observer seems to have a recursive crawler. This fragmentation leads to hubs seeing different pictures of the Fediverse, which leads to differences in their stats. I couldn’t see a better way to help than to create my own crawler with the sole purpose of discovering the instances.
Another raison d’être is to enable novel applications that need such a list. We don’t know what they are just yet. A service that recommends instances to newcomers? Some sort of a cataloguing effort? Global search? We want you to go straight to building that, without spending your energy on re-inventing the “Fediverse crawler” wheel.
You can only opt out if you’re the node’s administrator.
If you’re running GNU Social or Friendica: set the site.private property to 1 or true in the StatusNet config.
If you’re running Hubzilla or Red: set the hide_in_statistics property to 1 or true in siteinfo.json.
If you’re running anything else: add the following to your robots.txt:
User-agent: MinoruFediverseCrawler
Disallow: /
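If you want to double-check the rule before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `crawler_is_blocked` is made up for this example, and Python's parser is only a stand-in: the crawler's own robots.txt handling may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the instructions above.
ROBOTS_TXT = """\
User-agent: MinoruFediverseCrawler
Disallow: /
"""


def crawler_is_blocked(robots_txt: str, path: str = "/") -> bool:
    """Return True if the rules forbid MinoruFediverseCrawler from fetching `path`.

    Hypothetical helper: it uses Python's robots.txt parser, not the crawler's.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("MinoruFediverseCrawler", path)


if __name__ == "__main__":
    print(crawler_is_blocked(ROBOTS_TXT))
```

Paste your real robots.txt into `ROBOTS_TXT` to see whether other rules in the file accidentally override these two lines.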
Simply follow someone on Fediverse ツ Soon enough, the crawler will discover the node and add it to the list.
Make sure the instance didn’t opt out of statistics as described above.
If you did that, and after a week the instance still isn’t on the list, please file an issue.
The list is a JSON array of hostnames of all alive nodes that are known to the crawler.
“Alive” means any instance that responded with NodeInfo at least once within the last week. It doesn’t imply that the instance federates with anyone, or has a web UI, or is working right now.
Conversely, if an instance is missing from this list, it doesn’t mean the instance doesn’t exist. It could be blocking access to NodeInfo, or its address could be unreachable from the crawler’s host (as is the case with Tor and I2P addresses).
“Known” means that the crawler has seen the hostname in someone’s peer list. As of 2021-11-09, the crawler requests the /api/v1/instance/peers endpoint from Mastodon, Pleroma, Misskey, BookWyrm, and Smithereen servers. If an instance doesn’t federate with anyone, it will be missing from the peer lists, and the crawler won’t know about its existence.
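To illustrate why a non-federating instance stays invisible, here is a rough sketch of discovery through peer lists. It is not the crawler's actual code: `discover` and `fetch_peers_http` are names invented for this example, and the sketch assumes the endpoint returns a JSON array of hostname strings, as Mastodon's does.

```python
from __future__ import annotations

import json
import urllib.request
from collections import deque
from typing import Callable, Iterable


def discover(seeds: Iterable[str],
             fetch_peers: Callable[[str], list[str]]) -> set[str]:
    """Breadth-first walk over peer lists, starting from `seeds`.

    A host becomes "known" only if it is a seed or shows up in the peer
    list of an already known host.
    """
    known = set(seeds)
    queue = deque(known)
    while queue:
        host = queue.popleft()
        try:
            peers = fetch_peers(host)
        except OSError:
            # Unreachable host or HTTP error: skip it, keep walking.
            continue
        for peer in peers:
            if peer not in known:
                known.add(peer)
                queue.append(peer)
    return known


def fetch_peers_http(host: str) -> list[str]:
    """Hypothetical fetcher: GET https://<host>/api/v1/instance/peers."""
    url = f"https://{host}/api/v1/instance/peers"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

The fetcher is passed in as a parameter so the walk can be tried on a fake in-memory graph without touching the network. A real crawler would also need rate limiting, robots.txt handling, and a bound on how far it walks.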
Only hostnames are included in the list; no ports, no URL schemes (HTTPS and 443 are assumed). Furthermore, only hostnames whose suffixes are on the Public Suffix List are allowed.
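A minimal sketch of consuming the list, assuming you already have the body of nodes.json as a string. `parse_nodes` and `base_url` are hypothetical helper names, and the sample hostnames are made up.

```python
from __future__ import annotations

import json


def parse_nodes(body: str) -> list[str]:
    """Parse nodes.json: a JSON array of hostname strings."""
    hostnames = json.loads(body)
    if not isinstance(hostnames, list) or not all(
        isinstance(h, str) for h in hostnames
    ):
        raise ValueError("expected a JSON array of hostname strings")
    return hostnames


def base_url(hostname: str) -> str:
    """Build a URL for a node; the list assumes HTTPS on port 443."""
    return f"https://{hostname}/"


# Made-up sample in the format described above.
SAMPLE = '["mastodon.example", "pleroma.example"]'

if __name__ == "__main__":
    for host in parse_nodes(SAMPLE):
        print(base_url(host))
```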
The list is updated about every six hours.
Please do not fetch the list very often. It doesn’t make sense; only a couple instances appear and die every day, and you probably don’t need to know about it right away. This is not a monitoring service.
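One way to stay polite is to cache the list locally and re-download it only once the cached copy is older than the update period. A small sketch of that check; `should_refetch` and `MIN_INTERVAL` are names made up for this example, and six hours is simply the approximate update period mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The list changes about every six hours, so fetching more often gains nothing.
MIN_INTERVAL = timedelta(hours=6)


def should_refetch(last_fetch: Optional[datetime],
                   now: datetime,
                   min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Return True if there is no cached copy or it is older than `min_interval`."""
    if last_fetch is None:
        return True
    return now - last_fetch >= min_interval


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_refetch(now - timedelta(hours=1), now))
```

For most uses a much longer interval, such as once a day, is probably enough.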
The nodes themselves are checked about once a day. The crawler also maintains internal lists of “moved” and “dead” instances, which are checked once a week (just in case they come back to life). The checks are spread throughout the day with some jitter, which is why the list is updated more often than the check period.
The crawler’s user agent string is Minoru's Fediverse Crawler (+https://nodes.fediverse.party).
It makes requests from the following IP addresses:
Historical addresses, in case you need to grep your logs or something:
Alexander Batischev AKA Minoru, whom you can reach on Fediverse at @minoru@functional.cafe or by email at eual.jp@gmail.com. My PGP key is 0x356961a20c8bfd03.
Kudos to @lightone@mastodon.xyz for all the discussions and all the ideas she brought to this project!
See github.com/Minoru/minoru-fediverse-crawler. I’ll gladly move to a self-hosted Gitea instance once ForgeFed becomes a reality ^_^
The code is licensed under AGPL 3.0+.