Visidata: Select starting sheet in html/xls/sqlite from command-line

Created on 24 Nov 2018 · 11 Comments · Source: saulpw/visidata

Hi,
if I run vd -b t.html -o html.csv, the CSV contains the table index below, not the HTML table that is inside my file.

tag,id,nrows,ncols,classes
table,,72,4,wikitable sortable

table is the HTML table inside my t.html file. Is there a way to pass the sheet name on the command line? Something like vd -b t.html -sheet table -o html.csv

Thank you

wish granted wishlist

All 11 comments

Hi @aborruso, there's not an easy way yet. I've wanted something like this myself at times. Let me see if I can come up with something. Thanks for the suggestion!

Hi @saulpw, is there a way to open the table directly when there is only one?

My final goal is to use VisiData as an HTML to CSV converter, with something like the command below, in which I use an XPath query to extract only one table:

curl "http://example.com/page.html" | myScrapeUtilty -xpathRule '//table[count(tr/td)>7]' | vd -b  -f html -o out.csv

But even with only one table, VisiData asks me to choose, and it saves the sheets sheet as CSV instead of the table.

Thank you

Hi @aborruso, try adding -p dive.vd with the attached small .vd script.

sheet  col  row  longname   input  keystrokes  comment
                 open-file  -      o
-           0    dive-row          ^J

The first command opens the input from stdin (-), and the second command dives into the first row (0).

You can get this .vd yourself with:

  1. run the same command you have, but without -b
  2. press Enter and do any other manual steps
  3. press Shift+D to go to the commandlog
  4. finally, press Ctrl+S to save to dive.vd, which you can use with your pipeline.

dive.vd.txt
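
If the attachment is not at hand, the same two-command log can probably be written directly from the shell. This is only a sketch: it assumes the .vd file is plain tab-separated text with exactly the seven columns shown in the listing above, and that the keystroke is stored as the two literal characters ^ and J.

```shell
# Recreate dive.vd as a tab-separated command log.
# Assumption: the column layout matches the listing in this thread.
{
  # header row
  printf 'sheet\tcol\trow\tlongname\tinput\tkeystrokes\tcomment\n'
  # 1st command: open the input from stdin (-)
  printf '\t\t\t%s\t%s\t%s\t\n' 'open-file' '-' 'o'
  # 2nd command: on sheet "-", dive into row 0
  printf '%s\t\t%s\t%s\t\t%s\t\n' '-' '0' 'dive-row' '^J'
} > dive.vd
```

The file can then be added to the pipeline from the earlier comment with -p dive.vd, as suggested above.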

@saulpw you are really brilliant, I'm impressed. VisiData is a kind of magic.

@saulpw I have added a recipe in my VisiData Italian guide https://github.com/ondata/guidaVisiData/blob/master/testo/README.md#Salvare-una-tabella-HTML-in-CSV-a-partire-da-una-pagina-web

Thank you again

Fixed for html loader in f55de386d48aa5064f0b58cef3428e136cfc78ce; requires changes in other loaders with a sheet index.

To-do to resolve this issue:

  1. Fix loaders with sheet index to have rowdef sheets.
  2. Write above requirement into book/loaders.md.
  3. Improve startup with large files to remove sync(); file should load sync, cursor should jump after load completes (including ^C), or after sheet/row/col is available, if possible.

The IndexSheet has been developed (see visidata/sheets.py). It has the attribute rowtype = 'sheets' by default.

Loaders to be ported:

  • [X] html
  • [X] xls
  • [X] xlsx
  • [X] xlsb
  • [X] hdf5
  • [X] sqlite
  • [ ] postgres

Misc:

  • [ ] requirements need to be added to loaders.md

CLI syntax is +:<sheet>:<row>:<col>.

  • +:subsheet:: to ignore row/col
  • can name toplevel source index if more than one: +toplevel:subsheet::
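
To make the three forms above concrete, here is a small sketch that only assembles the argument string; it does not call vd. The helper name vd_start_arg is hypothetical and not part of VisiData.

```shell
# Hypothetical helper (not part of VisiData): build a start-position
# argument of the form +<toplevel>:<sheet>:<row>:<col>.
# Pass an empty string for any part that should be left out.
vd_start_arg() {
  printf '+%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

vd_start_arg '' table_2 1 1            # sheet, row and col
vd_start_arg '' subsheet '' ''         # sheet only, row/col ignored
vd_start_arg toplevel subsheet '' ''   # named toplevel source index
```

The printed string would then be passed to vd as one argument, for example alongside -f html.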

Hi @saulpw if I run

curl -L "https://en.wikipedia.org/wiki/Olympic_medal" | vd -f html +:table_2:1:1

vd does not open table_2. What's wrong with my command?

vd 2 is really great!

Hey @aborruso!

Can you please open a bug report, and link to this issue?

There is not a good way for me to remember to check up on this potential bug, otherwise. :sweat_smile:
