nodes.json
To help keep track of the Fediverse's growth (or stagnation T_T).
The-federation.info only checks the instances that were added manually. (They recently started working on a crawler, so that might change.) FediDB.org has a non-recursive crawler that fetches new peers from a few source instances. Fediverse.Observer seems to have a recursive crawler. This fragmentation means each hub sees a different picture of the Fediverse, which leads to differences in their stats. I couldn't see a better way to help than to create my own crawler with the sole purpose of discovering instances.
Another raison d'être is to enable novel applications that need such a list. We don't know what they are just yet. A service that recommends instances to newcomers? Some sort of cataloguing effort? Global search? We want you to go straight to building that, without spending your energy on re-inventing the "Fediverse crawler" wheel.
You can only opt a node out of the list if you're that node's administrator.
If you're running GNU Social or Friendica: set the site.private property to 1 or true in the StatusNet config.
If you're running Hubzilla or Red: set the hide_in_statistics property to 1 or true in siteinfo.json.
If you're running anything else: add the following to your robots.txt:
User-agent: MinoruFediverseCrawler
Disallow: /
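To double-check that a robots.txt really blocks the crawler, you can feed its text to Python's standard urllib.robotparser. This is a minimal sketch: the function name crawler_is_blocked is hypothetical, and it only asks whether the MinoruFediverseCrawler token may fetch the site root, which is what the rule above forbids.

```python
from urllib.robotparser import RobotFileParser

# Token used in the robots.txt rule above.
CRAWLER_AGENT = "MinoruFediverseCrawler"


def crawler_is_blocked(robots_txt: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if this robots.txt text forbids `agent` from fetching "/"."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# Usage: check the file your server actually serves.
# print(crawler_is_blocked(open("/var/www/robots.txt").read()))
```

A blanket "User-agent: *" / "Disallow: /" rule also counts as blocked under this check, because the parser falls back to the wildcard entry.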
Simply follow someone on the Fediverse from your instance. That puts your node into other servers' peer lists, and soon enough the crawler will discover it and add it to the list.
Make sure the instance didn't opt out of statistics as described above.
If you did that, and after a week the instance still isn't on the list, please file an issue.
It's a JSON array of hostnames of all alive nodes that are known to the crawler.
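Because the list is a plain JSON array of hostnames, tracking growth comes down to comparing two saved snapshots. The sketch below assumes you have already downloaded the file (for example from https://nodes.fediverse.party/nodes.json, a URL assumed here rather than stated above) and saved each copy as text; load_nodes and diff_snapshots are hypothetical helper names.

```python
import json
from typing import Set, Tuple


def load_nodes(text: str) -> Set[str]:
    """Parse nodes.json contents (a JSON array of hostnames) into a set."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(h, str) for h in data):
        raise ValueError("expected a JSON array of hostname strings")
    return set(data)


def diff_snapshots(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (appeared, disappeared) hostnames between two snapshots."""
    return new - old, old - new


# Usage, with two copies saved on different days:
# appeared, gone = diff_snapshots(load_nodes(monday_text), load_nodes(tuesday_text))
```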
"Alive" means any instance that responded with NodeInfo at least once within the last week. It doesn't imply that the instance federates with anyone, has a web UI, or is working right now.
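The "alive" rule can be expressed as a one-week window over the time of each node's last successful NodeInfo response. This is an illustrative sketch, not the crawler's code: it assumes you keep a hypothetical mapping from hostname to that timestamp (None if the node never answered), and it treats exactly seven days as still alive.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# "At least once within the last week."
ALIVE_WINDOW = timedelta(weeks=1)


def is_alive(last_nodeinfo_ok: Optional[datetime], now: datetime) -> bool:
    """A node is alive if its last successful NodeInfo response is at most a week old."""
    if last_nodeinfo_ok is None:
        return False  # the node never answered with NodeInfo
    return now - last_nodeinfo_ok <= ALIVE_WINDOW


def alive_hostnames(last_ok: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Hostnames that would make it into the published list, sorted for stable output."""
    return sorted(host for host, ts in last_ok.items() if is_alive(ts, now))
```

Note that a node down for the past three days still passes this check, which is why "alive" doesn't mean "working right now".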
Conversely, if an instance is missing from this list, it doesn't mean the instance doesn't exist. It could be blocking access to NodeInfo, or its address could be unreachable from the crawler's host (as is the case with Tor and I2P addresses).
"Known" means that the crawler has seen the hostname in someone's peer list. As of 2021-11-09, the crawler requests the /api/v1/instance/peers endpoint from Mastodon, Pleroma, Misskey, BookWyrm, and Smithereen servers. If an instance doesn't federate with anyone, it will be missing from the peer lists, and the crawler won't know it exists.
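Discovery through peer lists is essentially a breadth-first walk over a graph whose edges are "X lists Y as a peer". The sketch below shows that idea with an injectable fetch function so it can be tried offline. fetch_peers_http and discover are hypothetical names, the User-Agent is a placeholder (not the real crawler's), and the real crawler also tracks server software, liveness and scheduling, none of which is modelled here.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, Iterable, List, Set


def fetch_peers_http(host: str) -> List[str]:
    """Hypothetical helper: GET https://<host>/api/v1/instance/peers (a JSON array of hostnames)."""
    req = urllib.request.Request(
        f"https://{host}/api/v1/instance/peers",
        headers={"User-Agent": "example-peer-walker/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        peers = json.load(resp)
    return [p for p in peers if isinstance(p, str)]


def discover(seeds: Iterable[str], fetch_peers: Callable[[str], List[str]]) -> Set[str]:
    """Breadth-first walk: every hostname seen in a peer list becomes known,
    and its own peer list is requested in turn."""
    known: Set[str] = set()
    queue = deque()
    for host in seeds:
        if host not in known:
            known.add(host)
            queue.append(host)
    while queue:
        host = queue.popleft()
        try:
            peers = fetch_peers(host)
        except Exception:
            continue  # unreachable, or no peers endpoint; the host stays "known"
        for peer in peers:
            if peer not in known:
                known.add(peer)
                queue.append(peer)
    return known
```

An instance that federates with nobody never shows up in any peer list, so a walk like this can't reach it. On the live network the walk is large, so trying discover with fetch_peers_http against real servers is best done with care.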
Only hostnames are included in the list: no ports and no URL schemes (HTTPS and port 443 are assumed). Furthermore, only hostnames whose suffixes are on the Public Suffix List are allowed.
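To illustrate what "only hostnames" and the suffix rule mean in practice, here is a simplified sketch. KNOWN_SUFFIXES is a hypothetical stand-in for the Public Suffix List: the real list has thousands of entries plus wildcard and exception rules, which this exact-match check ignores. normalize_host and has_public_suffix are likewise illustrative names.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List.
KNOWN_SUFFIXES = {"com", "org", "social", "cafe", "xyz", "co.uk"}


def normalize_host(raw: str) -> Optional[str]:
    """Reduce 'https://Example.social:443/path' or 'example.social' to 'example.social'."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # urlsplit only recognises the host after a scheme
    # .hostname drops scheme, port and path, and lowercases the result.
    return urlsplit(raw).hostname or None


def has_public_suffix(host: str, suffixes=KNOWN_SUFFIXES) -> bool:
    """True if some proper suffix of `host` ('b.co.uk', 'co.uk', 'uk', ...) is listed."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in suffixes for i in range(1, len(labels)))
```

With these rules, "localhost" or a bare suffix like "co.uk" is rejected, while "functional.cafe" passes.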
About every six hours.
Please do not fetch the list too often. There's no point: only a couple of instances appear and die each day, and you probably don't need to know about it right away. This is not a monitoring service.
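One way to be a polite consumer is to keep your last copy and re-download only after a minimum interval. The sketch below is hypothetical throughout: CachedList is an illustrative class, the once-a-day default is an assumed choice (comfortably longer than the roughly six-hour update cycle), and the fetch function and clock are injected so the logic can be tested without the network.

```python
import time
from typing import Callable, Optional


class CachedList:
    """Re-download the list at most once per `min_interval` seconds; otherwise reuse the last copy."""

    def __init__(self, fetch: Callable[[], str], min_interval: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # e.g. a function that downloads nodes.json
        self._min_interval = min_interval
        self._clock = clock
        self._body: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        if self._body is None or now - self._fetched_at >= self._min_interval:
            self._body = self._fetch()
            self._fetched_at = now
        return self._body
```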
The nodes themselves are checked about once a day. The crawler also maintains internal lists of "moved" and "dead" instances, which are checked once a week (in case they come back to life). The checks are spread throughout the day with some jitter; that's why the list is updated more often than the check period.
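Spreading checks with jitter can be sketched as "schedule the next check one period after the last one, shifted by a small random offset". The daily and weekly periods come from the paragraph above. The ±10% jitter, the state names and next_check are assumptions for illustration, not the crawler's actual parameters.

```python
import random
from datetime import datetime, timedelta

# Periods from the text; the state names are hypothetical labels.
CHECK_PERIOD = {
    "node": timedelta(days=1),
    "moved": timedelta(weeks=1),
    "dead": timedelta(weeks=1),
}


def next_check(state: str, last_check: datetime, rng: random.Random,
               jitter_fraction: float = 0.1) -> datetime:
    """Next check time: one period after the last check, shifted by up to
    ±jitter_fraction of that period so checks don't all fire at once."""
    period = CHECK_PERIOD[state]
    offset = period * rng.uniform(-jitter_fraction, jitter_fraction)
    return last_check + period + offset
```

Since every node drifts to its own slot, a steady trickle of fresh results is always arriving, so regenerating the list every few hours picks up changes even though each node is only checked daily.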
The crawler's user agent string is "Minoru's Fediverse Crawler (+https://nodes.fediverse.party)".
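If you want to see the crawler's visits in your server logs, matching on the user agent string is usually enough. This sketch assumes your access log records the User-Agent verbatim (as, for example, nginx's "combined" format does); crawler_requests is a hypothetical helper name.

```python
from typing import Iterable, Iterator

CRAWLER_UA = "Minoru's Fediverse Crawler (+https://nodes.fediverse.party)"


def crawler_requests(log_lines: Iterable[str], ua: str = CRAWLER_UA) -> Iterator[str]:
    """Yield the access-log lines that carry the crawler's user agent string."""
    for line in log_lines:
        if ua in line:
            yield line


# Usage:
# with open("/var/log/nginx/access.log") as f:
#     for line in crawler_requests(f):
#         print(line, end="")
```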
It makes requests from the following IP addresses:
Historical addresses, in case you need to grep your logs or something:
Alexander Batischev AKA Minoru, whom you can reach on Fediverse at @minoru@functional.cafe or by email at eual.jp@gmail.com. My PGP key is 0x356961a20c8bfd03.
Kudos to @lightone@mastodon.xyz for all the discussions and all the ideas she brought to this project!
See github.com/Minoru/minoru-fediverse-crawler. I'll gladly move to a self-hosted Gitea instance once ForgeFed becomes a reality ^_^
The code is licensed under AGPL 3.0+.