Server: Special characters in filenames

Created on 9 Jan 2018 · 29 Comments · Source: nextcloud/server

Steps to reproduce

  1. Put some files (only some of them with special characters in their names) in any folder
  2. Add this folder as external storage (local)

Expected behaviour

Every file in this folder should be scanned and shown in the Files app.

Actual behaviour

The files in question were downloaded to the hard disk of my home server. The folder containing them is configured as “local” external storage in my Nextcloud.
Files and folders with German umlauts that were created by Nextcloud in the Files app appear in the file listings. The other files and folders (from the download) are ignored by the occ file scan.

During a file scan in debug mode, the following messages appear in nextcloud.log.
For example, the paths should read Lügen instead of L\u00fcgen and Hölle instead of H\u00f6lle.

{“reqId”:“X7LIb2Ci8jdOOkqp3leZ”,“level”:0,“time”:“2017-12-26T17:25:29+01:00”,“remoteAddr”:"",“user”:"–",“app”:“OC\Files\Cache\Scanner”,“method”:"–",“url”:"–",“message”:"!!! Path ‘Serien/Zoo/S02E06.Sex, L\u00fcgen und Quallen.mp4’ is not accessible or present !!!",“userAgent”:"–",“version”:“12.0.4.3”}

{“reqId”:“X7LIb2Ci8jdOOkqp3leZ”,“level”:0,“time”:“2017-12-26T17:25:29+01:00”,“remoteAddr”:"",“user”:"–",“app”:“OC\Files\Cache\Scanner”,“method”:"–",“url”:"–",“message”:"!!! Path ‘Serien/Zoo/S02E10.H\u00f6lle in Helsinki.mp4’ is not accessible or present !!!",“userAgent”:"–",“version”:“12.0.4.3”}
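A note on the two log entries: the `\u00fc` and `\u00f6` sequences are ordinary JSON string escapes, so they are probably not the fault in themselves. The following is a minimal Python sketch, using a hand-shortened copy of the first message rather than the real log line, that decodes the escape and checks which Unicode form the logged path is in:

```python
import json
import unicodedata

# Hand-shortened stand-in for the log entry; only the file name is kept.
raw = r'{"message": "S02E06.Sex, L\u00fcgen und Quallen.mp4"}'

message = json.loads(raw)["message"]

# The character right after the capital "L" is the decoded escape.
u_umlaut = message[message.index("L") + 1]
print(unicodedata.name(u_umlaut))

# True means the logged path uses the precomposed (NFC) form.
logged_is_nfc = unicodedata.is_normalized("NFC", message)
print(logged_is_nfc)
```

If the name on disk were stored in a different form than the one logged here, a direct lookup of the logged path would fail even though both look identical on screen.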

Server configuration

Operating system: Ubuntu Server 17.10

Web server: Apache 2.4.27

Database: MySQL

PHP version: PHP 7.1.11-0ubuntu0.17.10.1

Nextcloud version: 12.0.4

Updated from an older Nextcloud/ownCloud or fresh install: fresh install

Where did you install Nextcloud from: nextcloud.com

List of activated apps:

App list
Enabled:

  • dav: 1.3.0
  • federatedfilesharing: 1.2.0
  • files: 1.7.2
  • files_external: 1.3.0
  • files_sharing: 1.4.0
  • files_videoplayer: 1.1.0
  • lookup_server_connector: 1.0.0
  • notifications: 2.0.0
  • oauth2: 1.0.5
  • provisioning_api: 1.2.0
  • theming: 1.3.0
  • twofactor_backupcodes: 1.1.1
  • updatenotification: 1.2.0
  • workflowengine: 1.2.0
Disabled:
  • activity
  • admin_audit
  • comments
  • encryption
  • federation
  • files_pdfviewer
  • files_texteditor
  • files_trashbin
  • files_versions
  • firstrunwizard
  • gallery
  • logreader
  • nextcloud_announcements
  • password_policy
  • serverinfo
  • sharebymail
  • survey_client
  • systemtags
  • user_external
  • user_ldap

Nextcloud configuration:


Config report
{
"system": {
"instanceid": "oc65jgv8zf6o",
"passwordsalt": "REMOVED SENSITIVE VALUE",
"secret": "REMOVED SENSITIVE VALUE",
"trusted_domains": [
"toothless.goip.de",
"toothless.fritz.box"
],
"datadirectory": "\/var\/www\/nextcloud\/data",
"overwrite.cli.url": "https:\/\/toothless.goip.de",
"dbtype": "mysql",
"version": "12.0.4.3",
"dbname": "nextcloud",
"dbhost": "localhost",
"dbport": "",
"dbtableprefix": "oc_",
"mysql.utf8mb4": true,
"dbuser": "REMOVED SENSITIVE VALUE",
"dbpassword": "REMOVED SENSITIVE VALUE",
"installed": true,
"skeletondirectory": "",
"logtimezone": "Europe\/Berlin",
"memcache.local": "\OC\Memcache\APCu",
"memcache.locking": "\OC\Memcache\Redis",
"redis": {
"host": "localhost",
"port": "6379"
},
"htaccess.RewriteBase": "\/",
"mail_smtpmode": "smtp",
"mail_smtpauthtype": "LOGIN",
"mail_smtpauth": 1,
"mail_from_address": "jan.noormann",
"mail_domain": "gmail.com",
"mail_smtphost": "smtp.gmail.com",
"mail_smtpport": "587",
"mail_smtpname": "REMOVED SENSITIVE VALUE",
"mail_smtppassword": "REMOVED SENSITIVE VALUE",
"mail_smtpsecure": "tls"
}
}

Are you using external storage, if yes which one: local

Are you using encryption: no

Are you using an external user-backend, if yes which one: no

Client configuration

Browser: Opera, Chrome, Firefox

Operating system: Windows 10

1. to develop bug filesystem

All 29 comments

@icewind1991 fs fun :)

Are you perhaps using a non-standard filesystem such as FAT or NTFS?

Can you try creating a php file ls.php:

<?php
echo "Listing {$argv[1]}\n";
var_dump(scandir($argv[1]));

And run it using php ls.php /path/to/folder and see if you get the correct result

The filesystem on that HDD is ext4.
I just figured out that there are other files with special characters on the same filesystem which are listed by Nextcloud's Files app. So it seems to have something to do with exactly the files mentioned above.

The result of your PHP script looks fine:
Listing /mnt/Test
array(6) {
  [0]=> string(1) "."
  [1]=> string(2) ".."
  [2]=> string(35) "S02E06.Sex, Lügen und Quallen.mp4"
  [3]=> string(30) "S02E09.Das Knochenrätsel.mp4"
  [4]=> string(30) "S02E10.Hölle in Helsinki.mp4"
  [5]=> string(31) "S02E12.Die Säbelzahnkatze.mp4"
}
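The `string(N)` values in that output are byte lengths, which gives a hint about how the names are stored. A small Python sketch (not part of the original report) compares the UTF-8 byte length of the first file name in precomposed (NFC) and decomposed (NFD) form:

```python
import unicodedata

# The first file name from the listing, typed here with a precomposed "ü".
name = "S02E06.Sex, L\u00fcgen und Quallen.mp4"

nfc = unicodedata.normalize("NFC", name)  # "ü" as one code point
nfd = unicodedata.normalize("NFD", name)  # "u" followed by U+0308

nfc_bytes = len(nfc.encode("utf-8"))
nfd_bytes = len(nfd.encode("utf-8"))
print(nfc_bytes, nfd_bytes)  # 34 35
```

The reported `string(35)` matches the decomposed length, which would be consistent with the normalization explanation given later in this thread.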

I'm having a similar issue on an ext4 filesystem.
For most of the files everything is okay, but there are a number of files with umlauts in their names that cannot be accessed by the file scanner.

All affected files have the error: "OC\Files\Cache\Scanner","method":"--","url":"--","message": !!! Path 'ROOT\/K\u00c4SKI\/DIR\/T\u00f6\u00f6teeb.pdf' is not accessible or present !!!","userAgent":"--","version":"13.0.2.1"}
It seems these files have non-UTF-8 filenames, for example ISO-8859-*.

It seems that the scanner expects all filenames to be in ASCII or UTF-8.

If I take one of the non-working files from the filesystem and upload it through the web UI, it is accessible (it seems something converts the filename encoding in that case).

If someone hits this problem and needs a solution before the code gets fixed, one option is to use rclone or rsync to convert the filename charset.
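To test the non-UTF-8 hypothesis before converting anything, the raw file names can be checked for valid UTF-8. This is only a sketch: the helper names are made up for illustration, and the two sample names are hand-built byte strings, not names taken from the affected system.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the raw file name decodes cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(raw_names):
    """Return the raw names that are not valid UTF-8."""
    return [name for name in raw_names if not is_valid_utf8(name)]


# The same name encoded as UTF-8 and as ISO-8859-1 (hypothetical samples).
samples = [b"T\xc3\xb6\xc3\xb6teeb.pdf", b"T\xf6\xf6teeb.pdf"]
suspects = find_non_utf8(samples)
print(suspects)
```

On a real system the input could come from `os.listdir()` called with a bytes path, which returns file names as bytes without decoding them.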

Facing exactly the same problem. Any updates on this?

OS: Ubuntu Server 18.04
Webserver: Apache 2.4.37
Database: PostgreSQL
PHP version: 7.2.13-1+ubuntu18.04.1+deb.sury.org+1
Nextcloud version: 15.0.0
Filesystem of local storage added to NC: ext4

Just stumbled across a very similar issue: filenames containing a plus sign (+) cannot be uploaded, neither via the web frontend nor via the (Windows) client application.

Still present in v15.0.2

I don't know how to reproduce :disappointed:

[screen recording: peek 2019-01-16 15-06]

Is it possible that the problem depends on the underlying OSes? I had the problem with the Plus-Sign when uploading a file from a Windows 10 client to a Nextcloud server hosted on Linux Mint

For me it has something to do with filename encodings, I guess.
My scenario is as follows:

I have a separate hard drive installed in the server that Nextcloud runs on. This drive is mounted as _external storage_ with type _local_ (ext4). Some people have access to this drive via ssh/sftp. Folders copied to this drive over sftp that contain characters like ä, ö, ü are not shown in the Nextcloud web client. Renaming these folders manually in an ssh terminal makes them visible, though.
As there are terabytes of data, renaming manually is not an option. I will investigate further and let you know if I find anything.

cc @herrwiese

I faced this again and again.
I will try renaming to solve this. For now uploading the files via web and deleting the invisible ones is my workaround.

I put a cron job in place to rename files containing umlauts:
*/30 * * * * find /etc/data/ -name "*[äöüÄÖÜß]*" -exec rename 's/ä/ae/g;s/ü/ue/g;s/ß/ss/g;s/Ä/Ae/g;s/Ü/Ue/g;s/Ö/Oe/g;s/ö/oe/g' {} \;

Solution:

I take no responsibility! Create a database backup first!

Open phpMyAdmin, set the charset to ASCII and convert all tables.
Set the charset back to UTF-8 and convert all tables again.
Empty all file tables: oc_activity, oc_filecache, oc_files_trash.
DELETE FROM oc_filecache
Rescan all files with
php -d memory_limit=1024M /var/www/cloud.nextloud.de/occ files:scan --all
I only worked on the database, not on the filesystem. This worked for me.
Umlauts in oc_accounts and other tables such as groups must be changed manually.

/edit
Just deleting the file tables and running the occ command doesn't work.
The umlauts are still raw UTF-8 (Ã¤ Ã¶ Ã¼) or \u00c4 \u00d6 \u00dc

I am experiencing a similar issue where some file paths containing special characters (specifically German umlauts) are not showing up. The folders in question are mounted as external storage via SFTP. I am running Nextcloud 16.0.3 as a docker container on Ubuntu Server 18.04.

What confused me was that some file paths containing umlauts were showing up while others were not. After poking around a bit I discovered that the paths that were not showing up contained "A", "O", or "U" followed by the Unicode character "COMBINING DIAERESIS" (U+0308), whereas file paths that showed up normally seemed to contain "Ä", "Ö", or "Ü" directly. After replacing the letter plus combining diaeresis with the respective precomposed umlaut, the file path shows up as expected.
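One way to find such names is to look for combining characters directly. The sketch below uses Python's `unicodedata` module; the function name and the file name "Ärger.txt" are invented for illustration and do not come from the affected share.

```python
import unicodedata


def has_combining_marks(name: str) -> bool:
    """True if any character has a nonzero canonical combining class."""
    return any(unicodedata.combining(ch) for ch in name)


decomposed = "A\u0308rger.txt"  # "A" followed by COMBINING DIAERESIS
composed = "\u00c4rger.txt"     # precomposed "Ä"

print(has_combining_marks(decomposed))  # True
print(has_combining_marks(composed))    # False

# Normalizing to NFC turns the decomposed spelling into the composed one.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
```

Both spellings render identically in most terminals, which is why the difference is easy to miss when comparing listings by eye.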

@schwma (and potentially others): I had the same issue (files with "COMBINING DIAERESIS" not showing up) and could resolve it by enabling the "NFD compatibility" option on the share. The problem is that Nextcloud normalizes unicode by default (see https://github.com/nextcloud/server/blob/21119633041d5ccae19975a58b0ae50ef5a8e33a/lib/private/Files/Filesystem.php#L821-L823) and turns names like "Lo\xcc\x88sungen.pdf" into "L\xc3\xb6sungen.pdf" which then are not found on the external share (because they don't exist). Enabling the option checks both encodings for such files. See https://github.com/owncloud/core/issues/21365 and https://github.com/owncloud/core/pull/24349 for an extensive discussion of the issue.
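The byte sequences quoted above can be reproduced outside of Nextcloud. The following Python sketch shows the NFC conversion of that file name and a hypothetical helper, `candidate_names`, that produces both spellings for a lookup. It only illustrates the idea of checking both encodings and is not Nextcloud's actual implementation.

```python
import unicodedata

# The name as stored on the share: "o" followed by U+0308 (bytes CC 88).
on_share = b"Lo\xcc\x88sungen.pdf"

# NFC composes the pair into "ö" (U+00F6, bytes C3 B6).
expected = unicodedata.normalize("NFC", on_share.decode("utf-8")).encode("utf-8")
print(expected)  # b'L\xc3\xb6sungen.pdf'


def candidate_names(name: str) -> list:
    """Return the NFC and NFD spellings of a name, without duplicates."""
    forms = [
        unicodedata.normalize("NFC", name),
        unicodedata.normalize("NFD", name),
    ]
    return list(dict.fromkeys(forms))  # keeps order, drops repeats


candidates = candidate_names(on_share.decode("utf-8"))
print([c.encode("utf-8") for c in candidates])
```

For a pure ASCII name both forms are identical, so only one candidate is returned and the lookup cost stays the same.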

I have this problem and arrived at the conclusion that the issue involved Unicode normalization too; however, I'm running on ZFS and none of the Unicode normalization options on my filesystem seemed to resolve the issue, so I've resorted to...not storing files with non-ASCII filenames in Nextcloud :(

All of my Mac OS X users, from different unrelated organizations, fail to see files and folders whose names contain "combining tilde" characters.

Looks like PHP is able to handle this since PHP 7: https://wiki.php.net/rfc/unicode_escape

As per https://www.php.net/normalizer, normalizing to NFC should fix this (Mac OS X file and directory names are NFD normalized).

What worked for us was frequently running cron tasks with the following commands:

  • sudo -u www-data /usr/bin/convmv --notest --nfc -f utf8 -t utf8 -r data/ (better use absolute paths)
  • sudo -u www-data /usr/bin/php occ files:scan --all
  • sudo -u www-data /usr/bin/php occ groupfolders:scan 1 (optional; you may have more than one group folder, which is a hassle)
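Before letting a rename job loose on real data, it can help to preview what an NFC conversion would change. The following Python sketch builds such a plan from a list of names without touching the filesystem. The function name and the sample names are hypothetical, and this is not a description of how convmv works internally.

```python
import unicodedata


def plan_nfc_renames(names):
    """Return (renames, collisions) for converting names to NFC.

    renames:    list of (old_name, new_name) pairs that are safe to apply
    collisions: list of old names whose NFC form already exists
    """
    taken = set(names)
    renames, collisions = [], []
    for name in names:
        target = unicodedata.normalize("NFC", name)
        if target == name:
            continue  # already NFC, nothing to do
        if target in taken:
            collisions.append(name)  # would overwrite another entry
            continue
        renames.append((name, target))
        taken.add(target)
    return renames, collisions


# "n" followed by COMBINING TILDE, as a Mac client might store it.
names = ["nin\u0303o.txt", "plain.txt"]
renames, collisions = plan_nfc_renames(names)
print(renames, collisions)
```

Reporting collisions separately matters because a directory can hold both spellings of the same visible name at once.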

The star here is the convmv command; the following Stack Overflow question gave us the final touch:

https://stackoverflow.com/questions/26516700/file-name-look-the-same-but-is-different-after-copying

We are now looking into something like triggers to do the conversion, but we think this issue should be addressed by Nextcloud.

We are now testing the Nextcloud Workflow module, passing all Created and Copied files whose MIME type is not application/fuu (so that all files and folders pass through) to this script:
/usr/bin/convmv --notest --nfc -f utf8 -t utf8 -r %f

Here we are using Spanish characters from Mac OS X keyboards.
If somebody else could test this, that would be awesome.

Hi, it is amazing that this issue is still open, considering its importance.
I just added these two special characters on a Mac, thinking it would "look nice" :sunglasses: :
[images of the two characters: "small", "smalll diam"]

And then all my files were deleted on all my machines (by which app or OS? I don't know).

And now it is impossible for me to restore them, maybe because I configured the server not to keep the files I delete. I need to check this tomorrow... I am so sad I lost everything because of a simple ASCII bug... :-1:

They are not deleted; Nextcloud just cannot see them. Access your file server directly (SSH).

I am not a Nextcloud expert, but I think emoji support is a different issue. Check that you enabled 4-byte UTF-8 support in your database.
https://docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html

@masterleo can you confirm this solves your issue?

@skjnldsv why has the "needs info" label been added?

I hope masterleo did not hijack this issue with the emoji issue.

You can ask me any information about unicode NFC / NFD and will do my best to provide you with such information in order to fix this issue.

Because I read too fast ;)

What is currently missing here? The issue still states "needs triage", meaning it's not confirmed.
Is it an issue with Nextcloud? With the file system?

It is an issue with the Unicode forms Nextcloud supports in the method OC_Util::normalizeUnicode(), as @OpenCoreCH points out in the comment from 22 Nov 2019.

Yeah, the issue is that the normalized version of the Unicode that nextcloud expects for a given path does not match the version on the filesystem, so when it attempts to find it, it doesn't exist. In theory it should be looking for close matches and then normalizing them the same way, since you can't guarantee how any given filesystem does normalization (if it even does any in the first place). That's somewhat problematic though because you have to do directory searches instead of direct name lookups.
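The directory-search idea can be sketched in a few lines of Python. The function name `find_on_disk` is hypothetical, and the list stands in for a real directory listing. Both sides of the comparison are normalized the same way, so the spelling actually stored on disk is returned:

```python
import unicodedata


def find_on_disk(wanted, directory_entries):
    """Return the stored entry matching `wanted` after NFC normalization.

    Returns None when no entry matches. Every lookup walks the whole
    listing, which is the cost mentioned in the comment above.
    """
    key = unicodedata.normalize("NFC", wanted)
    for entry in directory_entries:
        if unicodedata.normalize("NFC", entry) == key:
            return entry
    return None


# Stored in decomposed form ("o" + U+0308), requested in precomposed form.
entries = ["S02E09.Das Knochenra\u0308tsel.mp4", "S02E10.Ho\u0308lle in Helsinki.mp4"]
found = find_on_disk("S02E10.H\u00f6lle in Helsinki.mp4", entries)
print(found == entries[1])  # True
```

A direct name lookup is a single filesystem call, while this approach needs a full listing per directory, so caching the normalized listing would likely be needed for large folders.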

Solution that worked for me:

Open “/lib/private/legacy/OC_Util.php” and change line 1367 from:

public static function normalizeUnicode($value) {

if (Normalizer::isNormalized($value)) {
....
}

to:

public static function normalizeUnicode($value) {

// Early return: the original normalization code below is no longer reached.
return mb_convert_encoding($value,"UTF-8");

if (Normalizer::isNormalized($value)) {
....
}

@benjelloun69 Would you mind opening a Pull Request with that solution approach so it can be properly tested and if applicable get merged right away?
