Flow: Recheck stats file ENAMETOOLONG

Created on 8 Jan 2019 · 21 Comments · Source: facebook/flow

On version 0.89, I'm running into an error in CI because the recheck stats file name is too long:

Unhandled exception: Unix.Unix_error(Unix.ENAMETOOLONG, "open", "/tmp/flow/[really long path].recheck_stats")

This occurs when doing a full check. My understanding is limited, but maybe it isn't necessary to look up the recheck stats in this case?

I suppose another approach would be to hash the path.

bug

All 21 comments

cc @gabelevi

How long is that path?

[really long path] is 383 characters 😬

Any chance making a symlink to your project root from your home directory fixes the issue?

That's a good idea, I hadn't thought of that. I'll look into it, although in our case the typecheck job runs in a Mesos sandbox, and I think [really long path] is enforced as part of the sandboxing.

@gabelevi Are you accepting PRs for disabling recheck for flow check or do you have concerns about that?

Thank you!

@gabelevi @jbrown215 friendly ping here :)
Are you accepting PRs for this?

Gabe is the best person to answer that question, but he is on PTO. I'll ping him for you when he gets back.

@gabelevi friendly ping here if you have a spare cycle :)

I also forgot to ask, so thank you for pinging him!

Huh...do you know what PATH_MAX is on your system?

I hadn't seen ENAMETOOLONG before, but for open, https://eklitzke.org/path-max-is-tricky seems to suggest that the limit is 4096 by default, or something like that.

Socket names have a much smaller max (like ~100) so we force those to be small, but haven't hit this before with our other temp directory files.

Thank you for the reply!
Indeed, the max file path is very likely 4096; however, the max file name is 255. This is true on the system we are using in our Mesos cluster (CentOS) [1] and seems to be pretty common across other Linux systems (and file systems) [2]. Tests using an artificial 306-character file name [3] are consistent with this.

The problem is that the recheck stats file will have a file name proportional in length with the full path where flow starts.

[1]

/* /usr/include/linux/limits.h */
#define NAME_MAX         255    /* # chars in a file name */
#define PATH_MAX        4096    /* # chars in a path name including nul */

[2] https://serverfault.com/questions/9546/filename-length-limits-on-linux
[3]

cprodescu@... > mkdir -p 12345678901234567890123456789012345678901234567890/12345678901234567890123456789012345678901234567890/12345678901234567890123456789012345678901234567890/12345678901234567890123456789012345678901234567890/12345678901234567890123456789012345678901234567890/12345678901234567890123456789012345678901234567890
# works as expected since each component is 50 characters
cprodescu@...> touch 12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890
# fails since 1 component is 306 characters:
touch: cannot touch ‘12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890_12345678901234567890123456789012345678901234567890’: File name too long
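
The same check can be scripted instead of run by hand. Below is a minimal Python sketch, assuming a POSIX system where os.pathconf reports PC_NAME_MAX; the helper names (longest_allowed_name, try_create) are made up for illustration and have nothing to do with Flow's own code.

```python
import errno
import os
import tempfile


def longest_allowed_name(directory: str) -> int:
    """Return the per-component file name limit (NAME_MAX) for `directory`."""
    return os.pathconf(directory, "PC_NAME_MAX")


def try_create(directory: str, name: str) -> str:
    """Try to create an empty file; return "ok" or the errno symbol on failure."""
    try:
        with open(os.path.join(directory, name), "w"):
            pass
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


with tempfile.TemporaryDirectory() as tmp:
    name_max = longest_allowed_name(tmp)
    # A single component one character past the limit should be rejected,
    # even though the full path is still far below PATH_MAX.
    over_limit = try_create(tmp, "a" * (name_max + 1))
    print(name_max, over_limit)
```

The limit is reported per directory because it depends on the file system backing it, so the value on a CI sandbox may differ from the one on a developer machine.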

@gabelevi friendly ping when you have a spare cycle :)

@gabelevi friendly ping here :)
Are you accepting PRs for this?

@gabelevi is on PTO again :) When he gets back I'll ping him again for you. Thank you so much for your willingness to help out here!

@gabelevi is back and is open to reviewing a fix here.

Gabe also pointed out that this same error should be occurring for the rest of the log files we use. I assume that if you were to run a regular Flow server, you would get the same error.

Thank you for following up again and again :)

I've looked into this a bit, and there are 2 main directions this can be tackled:

  1. Change the schema for the file names so that creating these files is not an issue.
    Likely this would mean changes to server_files_js.ml to avoid ENAMETOOLONG.
    The easiest solution I see is to change the file schema to a nested path, so the current /tmp/flow/zSUserszScprodescuzSworkspacezSflow.recheck_stats would become /tmp/flow/Users/cprodescu/workspace/flow.recheck_stats (I hope the codebase already has an example with mkdir -p behavior). An alternative is to use a hash (e.g. sha256) of the path, but that becomes very ugly and hard to debug.
  2. Disable creating the recheck file for check / focus-check modes.
    Likely this would mean adding a disabled mode to the Recheck_stats module and ensuring it does not do any writes in record_merge_time and record_last_estimates when it is disabled. This would be similar to other modules that are disabled in check_once.

I'm leaning towards 1 with the schema above, but would like to check if you or @gabelevi have any concerns off the bat.
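
To make the trade-off between the naming schemes concrete, here is a rough Python sketch (not Flow's actual OCaml code). It assumes the flat scheme only replaces "/" with "zS", which is all that can be read off the example file name above; the function names and the synthetic long_root are hypothetical.

```python
import hashlib
import os

NAME_MAX = 255  # assumed per-component limit, matching the limits.h excerpt
SUFFIX = ".recheck_stats"


def flat_name(root: str) -> str:
    """Current-style name: the whole root flattened into one component."""
    return root.replace("/", "zS") + SUFFIX


def nested_path(tmp_dir: str, root: str) -> str:
    """Proposed name: mirror the root as nested directories under tmp_dir."""
    return os.path.join(tmp_dir, root.lstrip("/") + SUFFIX)


def hashed_name(root: str) -> str:
    """Alternative: fixed-length name from a sha256 of the root."""
    return hashlib.sha256(root.encode("utf-8")).hexdigest() + SUFFIX


def longest_component(path: str) -> int:
    """Length of the longest single component in a slash-separated path."""
    return max(len(part) for part in path.split("/"))


# Synthetic deep root: nine 40-character directories (an assumption for the demo).
long_root = "/" + "/".join(["x" * 40] * 9)

print(len(flat_name(long_root)) > NAME_MAX)                       # flat name overflows
print(longest_component(nested_path("/tmp/flow", long_root)))     # stays small
print(len(hashed_name(long_root)))                                # constant length
```

The nested scheme keeps names readable but needs the intermediate directories to be created first and is still bounded by PATH_MAX overall; the hashed scheme has a constant length but no longer shows which project a file belongs to.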

@gabelevi friendly ping here :) Do you have concerns regarding approach 1?

Friendly ping here :) This is a blocker for upgrading in our organization, hence I'm very interested in fixing this.

Sorry about that! I have a fix, I'll try and get it in for v0.97

Thanks @jbrown215 for pinging me again :)

Thank you @gabelevi!!!
