Visidata: -f applies to all files subsequently loaded, not just those given on the command line

Created on 27 May 2020  路  19Comments  路  Source: saulpw/visidata

Small description
I often invoke vd with -f to set the format for the given input files or input stream when I know it won't detect them properly. This works as expected. However, the -f is then sticky for the rest of that session and overrides normal format detection when I open other files from within the session. This includes using o / open-file as well as navigating within a zip file or remote s3 directory.

Expected result
-f applies only to stdin and the files named in the command-line invocation.

Actual result with screenshot
-f applies to all new files for that session.

Steps to reproduce with sample data and a .vd
From within the visidata repo, load some JSONL from stdin:

vd -f jsonl < sample_data/test.jsonl

then within vd, open sample_data/sample.tsv.

Additional context
saul.pw/VisiData v2.-4dev

bug filetype fixed

Most helpful comment

Options that are specific to a source:

  • skip
  • header
  • delimiter
  • encoding
  • encoding_errors
  • row_delimiter
  • tsv_safe_newline
  • tsv_safe_tab
  • filetype
  • csv_dialect
  • csv_delimiter
  • csv_quotechar
  • csv_skipinitialspace
  • csv_escapechar
  • csv_lineterminator

All 19 comments

The workaround I sometimes use is to unset the filetype option on the options sheet.

Good idea, @tsibley. The trick with filetype is that the actual class of sheet, which includes the loader, must be known at sheet construction time. This has been a thorn in VisiData since the beginning; if the filetype is wrong, you can't just change it and Ctrl+Reload. (You have to set it globally and open the source again). I would love to figure out how to have filetype be a sheet-specific option, though this would require a rearchitecting of the loader structure to not rely on subclasses.

I could see making the -f option special, in that it could be specified multiple times, and would apply only to subsequent paths; in this way you could have multiple filetype args for different sources in the same invocation. Then you could -f '' at the end to reset it; or we make -f really
special and make it unset itself after CLI parsing is done, as you suggest.

Note that this would break one common usecase, which is vd file.txt -f csv (I often will up-arrow and append the -f option).

The main difficulty here would be writing our own args parser, as argparse AFAICT doesn't allow positionally effective arguments. This would be some work but would have other advantages as well.

What other options should be like this? I can see --delimiter or various other loader-specific options being specifiable in a similar way, though those would be even better if they could only be
applied to the specific sheets directly (instead of being "temporarily global").

What other options should be like this?

Aside from --delimiter that you already mentioned, I've occasionally wanted to specify --header at the sheet level.

Options that are specific to a source:

  • skip
  • header
  • delimiter
  • encoding
  • encoding_errors
  • row_delimiter
  • tsv_safe_newline
  • tsv_safe_tab
  • filetype
  • csv_dialect
  • csv_delimiter
  • csv_quotechar
  • csv_skipinitialspace
  • csv_escapechar
  • csv_lineterminator

Yeah, I've run into this same global options issue with --header and --skip. It'd be great to fix it for those as well. Although I hit the problem with them less frequently than I do -f, which is pretty often.

I can definitely see that multiple, position-dependent -f options might be useful when loading a mix of poorly-detected files, but it seems like that would have to wait until after solving the core issue of global vs. source-specific options anyway.

In trying to understand this issue more, I took a dive into the code and made some proof-of-concept changes to see how source-specific options might work: https://gist.github.com/tsibley/646bb195562c62d859c0c27d6b610563

Dreaming out loud: what if the options parser allowed --sheetid:optname='value'? sheetid would be the same sheetname or number as in the cmdlog, or blank (to apply to the "current top of stack" which is the previous source). So then it would be foo.txt --:filetype csv, which would be rigid--any other option position wouldn't work, though you could still use --optname='value' as you can now to have it apply globally.

Hmm, to me, that sounds too complex/bespoke. I think I'd often forget how it works. Positionally effective arguments seem simpler, while retaining a lot of the same capabilities, and have a more extensive history of use.

Okay, let's see how 9f678b0 works for your cases. Here are the basics of the new CLI parser:

  • --options apply as sheet-specific option overrides to the sources following them

  • the last setting for a given option is also the cli-given global setting,
    which will apply if a sheet does not have its own sheet-specific
    setting already.

  • --help opens the manpage

This allows both:

vd -f csv foo.txt
vd foo.txt -f csv

Limitations:

  • there is no way to make an option apply only to the immediately following source
  • nor a way to unset an option after setting it

Let me know how this works for you!

Thanks for the work on this. Unfortunately, though, the new behaviour doesn't address the problem I described in the issue description.

Hah, I see that now. So the stated limitations directly apply to this issue after all. And even if we had a way to either one-off or unset an option, it would be cumbersome.

So here is what I propose:

  1. The last value for each option is applied as a sheet-specific option to all sheets opened from the command-line (instead of being set as a global for all sheets).
  2. We add a new pseudo-option --global (alias -g), and all options following -g are set globally also, as they are now.

I think (1) would fix this bug, does that work for you? and then (2) would cover the other cases I can think of.

Yep, I think (1) would fix the problem I have! :-)

Okay @tsibley, hopefully this fixes this issue, and won't require -g for non-sheet options like replay_wait. Otherwise should be as explained above! Let me know what you think.

Thanks! This appears to fix the issue I'd been encountering. :tada:

I did notice that the options sheets seem to be reporting the wrong value. Or I'm maybe misunderstanding something? Here's what I did:

  1. vd -f jsonl < sample_data/test.jsonl
  2. Open sheet options with zO and observe that filetype is jsonl. This is as expected.
  3. Open global options with O and observe that filetype is also jsonl. I expected no override value, e.g. nothing in the global_value column.
  4. Open new sheet from CSV file with o sample_data/StatusPR.csv. It opens correctly! (That's this issue being resolved.)
  5. Open sheet options with zO and observe that filetype is jsonl. I expected no override value here either, e.g. nothing in the sheet_value column.

@tsibley Yeah, the Global Options sheet is really an Override Options sheet! This is known behaviour.

@saulpw and I chatted about it, and thought it would be the prefered behaviour but we should probably think this through more deeply to avoid confusion like this down the line.

@anjakefala Ah, thanks for the explanation!

Does that explain why the sheet-specific options sheet as I observed in (5) also shows the overridden value? I'm not sure I understand that.

That one is confusing, I agree. Especially, because it did not actually override -> the file's extension was used.

I am going to look into that!

Awesome, thanks! :pray:

@tsibley Your intuition about what should happen is correct. The basic problem, it seems, was trying to use the classmethod BaseSheet.options as an instance method. For some reason I thought calling vs.options would pass vs instead of type(vs) to the property, but it doesn't. So now the BaseSheet.options class property is renamed to BaseSheet.class_options, and this API change should only affect loaders and plugins.

Let me know if things make more sense on the options sheets now. Thanks for your persistence and attention to detail on this! The options/settings framework is the biggest mess inside VisiData and I haven't yet been willing to risk cleaning it up, but maybe that time is coming. Hopefully after 2.0 :)

For some reason I thought calling vs.options would pass vs instead of type(vs) to the property, but it doesn't.

I'm pretty sure I would have thought the same thing! Good to know.

@saulpw and @anjakefala, thank you both for resolving all this!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

khughitt picture khughitt  路  14Comments

cbrueffer picture cbrueffer  路  12Comments

frosencrantz picture frosencrantz  路  15Comments

anjakefala picture anjakefala  路  35Comments

paulklemm picture paulklemm  路  11Comments