Friendica: Global directory replacement

Created on 29 Dec 2016  ·  61 Comments  ·  Source: friendica/friendica

TL;DR

I'm shutting down http://dir-fc.oscp.info/ and http://fc.oscp.info/ in the near future, and am looking for admins to replace the global directory I'm hosting.

Which directory will be the successor of mine?
What can I help with to get their directory on par with mine?

(Maybe @tugelbend is hosting one?)


Both http://dir.friendica.com/ and http://dir.friendi.ca/ seem to have some issues, such as #1626, #3020 and #2311.

The directory that's on my server (http://dir-fc.oscp.info/) seems to be the most commonly used alternative. Used for example on http://friendi.ca/'s "list of public nodes". Since I haven't been involved with Friendica much at all in the past 2 years, I would like to pass the baton and shut down this service. Silently killing a directory that people apparently are relying on without prior notice seems like a waste though. Hence this issue.

From what I can see, the main reasons people use my directory instead of the others are:

  • Mine has a working health check system.
  • Mine has an automatic public server listing using this health + registration policy info.

What I could do is share an export of my database, which holds > 760,000 historical performance metrics since 2014-07 (about 48MB), along with the current directory listings.

However, if no running directory gets good health checks, that is rather pointless, because the list of all public users is already very easy to get using sync: http://dir-fc.oscp.info/sync/pull/all

The http://dir.friendica.com/health/1661 example seems to fail at running the cronjobs reliably. This has been a long-standing issue with the hosting limitations of its server. I'm not expecting this to be fixed.

The http://dir.friendi.ca/health/2643 one seems to run the health checks, but is showing poor health for all servers. That is probably because http://dir.friendi.ca itself is not running very fast and is thus distorting the health of other servers. (The same goes for the speeds measured by the health checks of http://dir.friendica.com/.)

Compared to http://dir-fc.oscp.info/health/2981 which gets much faster speeds across the board.

All 61 comments

Scary to see that Open Source projects sometimes depend on only a few people's will to continue. :-/ Maybe replace it with something we all can host (some DHT or so)?

@Quix0r actually my directory is not in any way official or authoritative really. I just thought it needed to improve and put it on my server. So that's proof anyone can already host a directory. There is a basic low-trust syncing mechanism in place that allows new directories to keep up to date with other directories.

All that's really needed is Someone :tm: who will take a day or two to configure a directory on a server that performs well and get it synced.

I can host a global directory on my server. I think the server can handle some additional traffic and the database. But which repo should I use? Friendica/dir or Beanow/dir?
I'll install one and test it...

https://github.com/Beanow/dir/tree/feature/redesign-prototype
My dir has been running this branch, which I would recommend at least for a few performance improvements over the master or develop branch.
Also worth looking at is @tugelbend's fork. I haven't run it myself, but it looks like it has a few improvements.

For server requirements, keep in mind that the most demanding part is probably the number of slow outbound connections (thus threading cronjobs) you have to run, plus occasional full text searches. Also, should your server have lag spikes or slow performance, it will lower the health score of the sites you're checking. A server with good bandwidth and enough breathing room is optimal.
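To illustrate why the slow outbound connections are the demanding part, here is a minimal Python sketch (not the directory's actual PHP cronjob) that runs health probes concurrently so a single slow site does not hold up the rest. `probe_all`, `timed_probe` and `fake_probe` are hypothetical names, and `fake_probe` only simulates a request with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_probe(probe, url):
    """Run probe(url) and return (url, ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        probe(url)
        ok = True
    except Exception:
        ok = False
    return url, ok, time.monotonic() - start


def probe_all(urls, probe, max_workers=8):
    """Probe every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: timed_probe(probe, u), urls))


def fake_probe(url):
    # Hypothetical stand-in for a real HTTP request to a Friendica server.
    time.sleep(0.05)
    if "dead" in url:
        raise ConnectionError(url)
```

Note that the elapsed time is measured on the checking server, so any lag on that side ends up in every site's measurement, which is the distortion described above.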

@AlfredSK If you want to, you can have a subdomain under *.pirati.ca for this. And of course we should check why dir.friendi.ca is slow. Is @tugelbend's fork a fork of your design prototype?

Yes, see https://github.com/friendica/dir/network

It could be nice to have dir.friendi.ca redirect only to subdomains. So you can switch to new directories more easily and not have to change documentation or settings.

Thanks @annando. I have a subdomain already. :-) https://dir.libranet.de/
But I have some trouble with the setup. I'll post something to the helpers forum and link the thread here.

I think an issue on the dir repo would be better so others can find it quickly.

I'm confused about the difference between http://dir-fc.oscp.info and http://dir-fc.oscp.info/home - isn't the one under /home the replacement for the one without /home?

Yes, but it was a work in progress, so I didn't replace the old one yet.

This is the kind of "work in progress" that I would want to improve. Where do we find this?

Each path corresponds to a file in the /mod folder, so /home is /mod/home.php. Everything else uses the composer autoloader and comes from the /src folder.
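As a rough illustration of that path-to-file convention, here is a small Python sketch (the real routing is done in PHP). `module_file_for` is a hypothetical helper. It returns `None` for the bare root because the thread does not say which module serves it.

```python
def module_file_for(path):
    """Map a request path such as '/health/2981' to 'mod/health.php'."""
    # Drop any query string, then keep the non-empty path segments.
    segments = [s for s in path.split("?")[0].split("/") if s]
    if not segments:
        return None  # bare root: not covered by the description above
    name = segments[0]
    # Refuse anything that is not a plain module name (e.g. '..').
    if not name.replace("_", "").isalnum():
        raise ValueError(f"not a valid module name: {name!r}")
    return f"mod/{name}.php"
```

With this mapping, `/home` resolves to `mod/home.php` and `/servers` to `mod/servers.php`, matching the files mentioned in this thread.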

Oh God. Meet stupid here ➡ @AlfredSK :-)
It was a typo in the virtual host config.

I too am interested in running a Friendica directory that checks the health of servers and accounts alike. There are too many dead accounts in the main Friendica directory. Is it something that is already done by @Beanow's branch?

Is it something that is already done by @Beanow's branch?

Yes, @Beanow did some work in this area. This work was committed to the standard directory repository. But there seem to be some problems at dir.friendica.com with long-running processes (I think the cleaning process is getting killed).
Additional work which @Beanow has done in his own repos (as far as I remember):
1.) the possibility of syncing with other directory servers (I guess this was also submitted to the standard dir repo but I'm not sure)
2.) some modern GUI pages (e.g. for home, health and some others)
3.) restructuring of some code parts (OOP)

But you will have to look at which branch in Beanow's repo is the most recent one.

@Hypolite we desperately need a page where people can choose a server from. It seems as if @tugelbend doesn't have much time at the moment. So maybe eventually we could then point dir.friendi.ca to your server.

@Hypolite that was also one of my thoughts when I started working on the directory code. And yeah I did build a maintenance script plus the health system mostly for that purpose.

In short, the maintenance keeps an updated timestamp. This gets refreshed when a Friendica server re-submits a profile (because they changed some details, like description or tags) or through the maintenance cronjob. In the config you can tweak the interval.

The second part is that profiles hosted on a site that has consistent downtime are removed. A profile on a server that's dead is, in my opinion, the same as a dead profile, and it is fixable by re-submitting once you have fixed the server.

The example settings I have in /.htconfig.php are also what I've been running and they seem to do a good job.
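The two maintenance rules described above can be sketched in Python as follows. This is only an illustration under assumptions, not the directory's PHP code: the profile keys (`url`, `site`, `updated`) and both function names are hypothetical, and the default threshold of -60 is borrowed from the removal-threshold queries posted later in this thread.

```python
from datetime import datetime, timedelta


def due_for_maintenance(profiles, now, interval):
    """Profiles whose 'updated' timestamp is at least `interval` old."""
    return [p for p in profiles if now - p["updated"] >= interval]


def drop_profiles_on_dead_sites(profiles, site_scores, threshold=-60):
    """Keep only profiles whose hosting site scores above the threshold.

    Sites without a recorded score are treated as healthy (score 0).
    """
    return [p for p in profiles if site_scores.get(p["site"], 0) > threshold]
```

A profile dropped this way is not lost for good: as described above, the server re-submitting it once it is back up restores it.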

@annando as far as server choice is concerned. Note that http://dir-fc.oscp.info/servers exists still. It should be trivial to change this /mod/servers.php#L119 and output JSON so it can be integrated on other pages.
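What such a JSON output could look like is sketched below in Python (the actual change would be in the PHP of /mod/servers.php). The field names `base_url`, `policy` and `health_score`, the `"open"` policy value and the function name are assumptions for illustration, not the directory's real schema.

```python
import json


def public_servers_json(servers, min_health=0):
    """Serialize open-registration servers, healthiest first, as JSON."""
    listed = [
        s for s in servers
        if s["policy"] == "open" and s["health_score"] >= min_health
    ]
    listed.sort(key=lambda s: s["health_score"], reverse=True)
    return json.dumps(listed, indent=2, sort_keys=True)
```

Other pages could then fetch this output and render the list in their own layout instead of linking to the directory's HTML page.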

@AlfredSK great! I see you've got it installed. 0 users in there though.

Be sure to check if the cronjobs are set up.
Then take a look at the sync-targets table in the database.
@tugelbend added an example SQL for the testing setup.
https://github.com/tugelbend/dir/blob/feature/redesign-prototype/util/vagrant_default_sync_servers.sql

In production I would recommend also to select a push target rather than only pulling.

Once that is set up it will start pulling in profiles from the other directories, run health checks and all that jazz.

@Beanow Yes, server is running with the code from your repo. Cronjobs are running. I figured out how to create the tables with that SQL script from the repo. I created three entries in sync-targets for dir.friendica.com, dir.friendi.ca and dir-fc.oscp.info with push and pull set to 1. But I only get empty results.
Are the baseurls correct? Do I have to change other stuff in .htconfig.php than the database credentials? Do I have to run some install script?
I think the database connection fails. If I run the sync cronjob manually I get a PHP error concerning dba.php and line 66 (?). Something like "should be integer, but getting a string"....

@AlfredSK good point. In my configuration I included the protocol. http:// for the entries, no trailing slash. Seems like that would be a bug with the SQL file there.
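A small Python sketch of that base URL convention (protocol included, no trailing slash) follows. `normalize_base_url` is a hypothetical helper for checking entries before they go into the sync-targets table. Whether the missing protocol is really what caused the empty results above is not confirmed in this thread.

```python
from urllib.parse import urlsplit


def normalize_base_url(raw, default_scheme="http"):
    """Return the base URL with a protocol and without a trailing slash."""
    value = raw.strip()
    if "://" not in value:
        value = f"{default_scheme}://{value}"
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable base URL: {raw!r}")
    return f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
```

For example, `normalize_base_url("dir.friendica.com")` gives `http://dir.friendica.com`, and `normalize_base_url("http://dir.friendi.ca/")` gives `http://dir.friendi.ca`.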

Here is the output:
````
sudo -u www-data php include/cron_sync.php

Warning: mysqli_result::fetch_array() expects parameter 1 to be integer, string given in /var/www/html/frnddir/include/dba.php on line 66

Warning: mysqli_result::fetch_array() expects parameter 1 to be integer, string given in /var/www/html/frnddir/include/dba.php on line 66
Logfile is showing this:
2017-01-01 14:40:02: Pulling 0 items from queue.
2017-01-01 14:40:02: Pulling from 0 remote targets.
2017-01-01 14:40:02: Pushing enabled, but no push targets.
2017-01-01 14:40:02: Syncing completed. Took 0 seconds.
````

@annando I'm up for it, it'll just have to wait until January 21st when I get back from Australia. 😅

For everyone here, I added a wiki page that explains some concepts.
This should be useful for both development and admins.
https://github.com/friendica/dir/wiki

It explains the main concepts and caveats when configuring.

@Beanow Thank you so much, it is very helpful indeed!

I think there is a discrepancy in how links to the list of servers are generated between the current Friendica code and Beanow's code.

This means that some URLs may not work in Friendica, as documented in #3020.

I just set up https://dir.friendica.mrpetovan.com using @Beanow's feature/redesign-prototype branch. I set up the same sync targets as @AlfredSK and everything seems to run fine, except I have a ridiculously small number of accounts in the directory at the moment.

I'm getting a few Warnings like this Warning: imagecopyresampled() expects parameter 2 to be resource, boolean given in /home/friendica/dir/include/Photo.php on line 138 when running the sync cron.

It takes a few hours at least to do the first sync, since other dirs only report the URLs of the profiles they have and your dir needs to request all of them from the Friendica servers that host them.

The image warnings I get as well. Failed downloads perhaps. It doesn't do this for all profiles so it will work.


@hypolite Which PHP version do you use? I'm on 7.0.x and running it as Apache2 mod_php.

@AlfredSK I'm still on PHP 5.6.29 with CLI and Apache2 mod_php.

@Beanow Are there any other sync targets I should know about? How about a directory of directories?

These are good enough. The point is only to get a full picture of all the profiles out there that opt in for the directory.

If however you receive any submissions directly, it's good to set up a push target.
This way you'll have some directories that are the go-to links that connect the others.

Rather than a directory of directories, I think an improvement would be to have Friendica servers list all profiles on their server that wish to be in the directory.
That way only finding out about a single profile on the server allows the directory to expand that to the rest of the profiles.

If you do want to go down the directory of directories rabbit hole, I would look at true p2p setups where each directory implements a request to list all directories it knows, and a way for a directory to connect and promote itself. Similar to what you see in protocols like bitcoin and torrent peer discovery.
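The peer discovery idea can be sketched as a breadth-first crawl. This is a minimal Python illustration under assumptions: no such "list known directories" request exists in the directory code as far as this thread shows, so `list_known` is a hypothetical callable standing in for it.

```python
from collections import deque


def discover_directories(seeds, list_known):
    """Crawl outward from seed directories, following each one's peer list.

    `list_known(directory)` returns the directories that `directory` knows
    about; directories that fail to answer are skipped.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        try:
            peers = list_known(current)
        except Exception:
            continue  # unreachable directory: keep it listed, do not expand
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen
```

Starting from one known directory is enough to reach every directory connected to it, which is the same property that peer discovery in bitcoin and torrent clients relies on.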


If however you receive any submissions directly, it's good to set up a push target.
This way you'll have some directories that are the go-to links that connect the others.

I set up all the sync targets for push, am I doing it correctly?

If you do want to go down the directory of directories rabbit hole, I would look at true p2p setups where each directory implements a request to list all directories it knows, and a way for a directory to connect and promote itself. Similar to what you see in protocols like bitcoin and torrent peer discovery.

Yeah, that's what I had in mind, especially since we have that list in the sync-targets table.

As long as you don't have an infinite push loop and have the main pool of accounts synced, you're doing it properly.

Friendica server admins that want to select an alternative dir will naturally go for the ones that are connected to the majority of submissions. So I don't think a perfect protocol is that important.
Keep in mind that before this issue brought it up there were 3-4 directories total.
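One way to check a set of push targets for the loop mentioned above is plain cycle detection on the "who pushes to whom" graph. The Python sketch below assumes you can collect every directory's push targets into one mapping. `find_push_cycle` is a hypothetical helper, and whether a cycle actually causes endless re-pushing depends on how the directory code forwards pushed items, which this thread does not spell out.

```python
def find_push_cycle(push_targets):
    """Return one cycle as a list of directories, or None if there is none.

    `push_targets` maps a directory to the directories it pushes to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}

    def visit(node, path):
        state[node] = GREY
        path.append(node)
        for nxt in push_targets.get(node, ()):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                # nxt is already on the current path: that closes a loop.
                return path[path.index(nxt):] + [nxt]
            if colour == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in push_targets:
        if state.get(start, WHITE) == WHITE:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None
```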


I suggest extending /poco/ to show all users that want to be displayed in the global directory. That way people would have no problem even if they changed their mind.

Or, if possible, let the user (not admin) decide in which directory he wants to be shown?

@Quix0r I think that's not how it should work.

The main utility of a directory is to have all profiles that have opted in to be listed and provide fast search features. What the admin chooses right now, assuming syncing between directories is configured correctly, is which directory server your friendica server will communicate with (a performance and reliability concern only). From there it should propagate to all other directories.

The idea of users being selective about which specific directory instances they want to be on will only create a confusing situation where people need to know where their potential friends could be at. Which is exactly the problem directories are supposed to solve.

In short the choices to be made when choosing a directory:

  • Do they apply any censorship and do I agree?
  • Does their server respond fast enough to give my users a good experience?
  • Do they sync well with other directories for the visibility of my users?

I'll extend the wiki with these thoughts.

Edit: hope this sheds some light on the matter.
https://github.com/friendica/dir/wiki/Syncing-considerations

To everyone here: can you tell me whether other directories are working well enough for me to shut mine down?

http://dir.friendica.com is as bad as always.
http://dir.friendi.ca seems to be running the old version and not doing too well measuring performance.
https://dir.libranet.de/ is redirecting elsewhere.
https://dir.friendica.mrpetovan.com/ seems to run quite well.

Any others?

@Beanow I thought @Hypolite wanted to set up a new server.

I successfully set up https://dir.friendica.mrpetovan.com on @Beanow's latest dev branch.

@Hypolite Would you like to have it running under dir.friendica.social? Then I could create a CNAME entry for that hostname that points to your system.

Update: I just saw that my DNS provider has some issue with the domain administration tool. I hope I will be able to create that CNAME in the next days.

Definitely, I'll change the Apache vhost tonight to support dir.friendica.social.

I just noticed that my own directory is 70 pages short of http://dir.friendica.com/. Is it because I'm performing a purge of inactive accounts or could it be a connectivity issue?

The vhost is ready, however it's going to be HTTP-only until you confirm the DNS change to me.

Seems to work now.

Then maybe let the admin pre-choose some favourable directories and later let the user opt in to these chosen ones (each dir separately; this can be done with separate tables "user_directory" and "directory", where the latter lists the chosen directories and the former links user<->directory).

@Hypolite I would really hope for dir.friendica.com to stop being the leading directory. It's doing a really poor job at the maintenance cronjob. The server it runs on is slow, so it may incorrectly remove good profiles and the cronjob gets killed so bad profiles don't get removed. Those 70 pages are mostly dead profiles that should have been removed a long time ago.

http://dir.friendi.ca and http://dir-fc.oscp.info both have 14 pages where you have 13.

I think the differences in the exact numbers between these three are because:

  • Syncing happens on a single profile level.
  • Removing profiles that are now dead is slower than refusing to add profiles that are dead.

So basically, your directory should have been aware of all these profiles, since you're pulling from these directories. But because the Friendica servers hosting them failed to respond, the profiles were not added.

To get an impression of how big a difference the maintenance cronjob makes, try these queries.

# See which scores have been handed out how often.
SELECT `health_score`, COUNT(*)
FROM `site-health`
GROUP BY `health_score`;

# See how many sites are at or below the removal threshold.
SELECT COUNT(*) FROM `site-health` WHERE `health_score` <= -60;

# See how many sites are above the removal threshold.
SELECT COUNT(*) FROM `site-health` WHERE `health_score` > -60;

For my DB that turns up 2670 Friendica servers removed from my directory,
vs 698 ones above the threshold!

health_score | COUNT( * )
-|-
-100 | 2448
-95 | 6
-90 | 53
-85 | 13
-80 | 72
-75 | 19
-70 | 19
-65 | 23
-60 | 17
-55 | 2
-50 | 1
-45 | 2
-40 | 5
-35 | 4
-30 | 209
-20 | 2
-15 | 1
-10 | 7
-5 | 2
0 | 21
10 | 23
15 | 20
20 | 29
25 | 26
30 | 28
35 | 13
40 | 53
45 | 8
50 | 13
55 | 26
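The split at the removal threshold can be re-derived from the rows posted above with a few lines of Python. The counts are copied from the table, and `split_at_threshold` is a hypothetical helper. The rows at or below -60 add up to the 2670 quoted above. The posted rows above -60 add up to 495 rather than 698, so the pasted table presumably stops at 55 and omits the higher scores (an assumption, since the thread does not say).

```python
REMOVAL_THRESHOLD = -60

# health_score -> number of sites, copied from the table above.
POSTED_COUNTS = {
    -100: 2448, -95: 6, -90: 53, -85: 13, -80: 72, -75: 19, -70: 19,
    -65: 23, -60: 17, -55: 2, -50: 1, -45: 2, -40: 5, -35: 4, -30: 209,
    -20: 2, -15: 1, -10: 7, -5: 2, 0: 21, 10: 23, 15: 20, 20: 29,
    25: 26, 30: 28, 35: 13, 40: 53, 45: 8, 50: 13, 55: 26,
}


def split_at_threshold(counts, threshold=REMOVAL_THRESHOLD):
    """Return (sites at or below the threshold, sites above it)."""
    removed = sum(n for score, n in counts.items() if score <= threshold)
    kept = sum(n for score, n in counts.items() if score > threshold)
    return removed, kept
```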

@Quix0r There really is no point in doing so, because directories sync with other directories. So the directories users don't opt in for will show the profile as well on their next sync cronjob. Which in my opinion is exactly how it should be.

If you can explain to me what benefit users have from being selective about which directories they appear in, I could give suggestions how to achieve that.

Okay and thank you for explaining it. Sometimes it takes me a bit longer ... ;-)

Can I run the directory in a sub-path, or would it be better to create a sub-domain?

@Quix0r I would go for a sub-domain to be safe. I haven't tested hosting under a path.

I have now installed it and have noticed some issues (e.g. chmod, E_NOTICE). I will report them later on. As you can see on the index page (no parameter), there is an E_NOTICE, which hints at lazy programming.

I was thinking, could a well-crafted query string for different search engines replace (or integrate) a "manually" maintained directory? Having a public local directory with a well-recognized name in the URL/page-footer might help build that string. Did anyone try that?

@strk search engines have the habit of changing their algorithm without notice. So even if it were to work, the next day it could break. Having an open source directory that runs for years without touching it I think is worth keeping.

The "public local directory" part would help this directory run more reliably too though. See https://github.com/friendica/dir/issues/9

@Quix0r going to https://dir.haeder.net/health path gives me a 500. Since there's no output of the error can you have a look at it?

Sorry, had been busy with work. Will check this evening.

Main Friendica directory is now dir.friendica.social hosted on my server. Closing as resolved.

(How) should node admins manually update their configs?

There's a setting in Admin -> Site, and I just saw that the default htconfig.php mentions dir.friendi.ca which isn't pointing at anything at the moment.

Will you send a PR with (or directly commit) a fix for the default htconfig.php?

It's already done: #3361 😊
