Bat: Automatic syntax detection/guessing without using filename

Created on 19 Aug 2018 · 9 Comments · Source: sharkdp/bat

Pygments has this feature where you can cat <file> | pygmentize or pygmentize <file-without-extension> and the syntax will be guessed according to some patterns in the file.

It would be great if bat did not rely on file extensions alone (which are not always present), and could integrate with other commands that output content and pipe it directly to bat.

feature-request

All 9 comments

bat actually has this feature, but:

  • It is limited to "first-line" detection (recognizing certain shebangs, for example). This is due to the way Sublime Text syntax definitions (and syntect) work.

  • This does not work for STDIN or FIFOs because we can only read once from pipes and the "peek" at the file for first-line detection in syntect would take part of the input away. We could potentially circumvent this by reading the file-contents into a buffer, but I'm not sure if that would have any other implications.
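The buffering idea from the second bullet can be sketched as follows. This is a minimal Python illustration under stated assumptions, not bat's or syntect's actual code; the helper name `peek_first_line` is hypothetical. It reads the first line from a forward-only stream exactly once, keeps it in memory, and hands back an iterator that replays that line before the rest of the input, so nothing is lost after the peek.

```python
import io
import itertools
from typing import IO, Iterator, Tuple


def peek_first_line(stream: IO[str]) -> Tuple[str, Iterator[str]]:
    """Return the first line plus an iterator over *all* lines.

    The stream is only ever read forward, so this also works for
    pipes and FIFOs, which cannot be rewound after a peek.
    """
    first = stream.readline()  # consumed from the stream exactly once
    if first == "":
        return first, iter(())  # empty input: nothing to replay
    # Replay the buffered first line, then continue with the rest.
    return first, itertools.chain([first], stream)


if __name__ == "__main__":
    # io.StringIO stands in for sys.stdin in this sketch.
    fake_stdin = io.StringIO("#!/bin/sh\necho hello\n")
    first, lines = peek_first_line(fake_stdin)
    print("looks like a script:", first.startswith("#!"))
    print("".join(lines), end="")
```

Only one line is buffered here, so memory use stays small even for large inputs; whether that is enough depends on how much of the input the detection step needs to see.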

FWIW, I think this is a very important feature. While I know that bat is supposed to be a cat replacement, I see a much bigger potential for it being a fancy less replacement.

Personally, I was hoping to make gst-inspect-1.0 and other similar GStreamer tools always pipe to less if stdout is not redirected. I was also hoping to build on top of that and pipe to bat instead if it's available. That would be a much more convincing patch/PR if bat could do syntax highlighting for those outputs (which I guess I'll have to add to https://github.com/trishume/syntect) and do so with FIFOs. Should the latter also be fixed in syntect so it reads at least a few lines for detection?

@zeenix What is the structure/syntax of gst-inspect-1.0's output?

I'm leaving this ticket open because I think we can fix the "does not work for STDIN or FIFOs" part (I have already planned an implementation).

I don't think there is any chance to make the syntax detection more powerful than it currently is. bat relies on syntect and syntect relies on Sublime Text syntaxes, which only provide "first-line detection".

@zeenix What is the structure/syntax of gst-inspect-1.0's output?

Some examples:

$ gst-inspect-1.0
vaapi:  vaapijpegdec: VA-API JPEG decoder
vaapi:  vaapimpeg2dec: VA-API MPEG2 decoder
vaapi:  vaapih264dec: VA-API H264 decoder
vaapi:  vaapivc1dec: VA-API VC1 decoder
vaapi:  vaapivp8dec: VA-API VP8 decoder
vaapi:  vaapivp9dec: VA-API VP9 decoder
vaapi:  vaapih265dec: VA-API H265 decoder
vaapi:  vaapipostproc: VA-API video postprocessing
vaapi:  vaapidecodebin: VA-API Decode Bin
vaapi:  vaapisink: VA-API sink
vaapi:  vaapimpeg2enc: VA-API MPEG-2 encoder
vaapi:  vaapih265enc: VA-API H265 encoder
vaapi:  vaapivp8enc: VA-API VP8 encoder
vaapi:  vaapivp9enc: VA-API VP9 encoder
vaapi:  vaapijpegenc: VA-API JPEG encoder
vaapi:  vaapih264enc: VA-API H264 encoder
video4linux2:  v4l2deviceprovider (GstDeviceProviderFactory)
video4linux2:  v4l2radio: Radio (video4linux2) Tuner
video4linux2:  v4l2sink: Video (video4linux2) Sink
video4linux2:  v4l2src: Video (video4linux2) Source
gio:  giosink: GIO sink
...
$ gst-inspect-1.0 ximagesink
Factory Details:
  Rank                     secondary (128)
  Long-name                Video sink
  Klass                    Sink/Video
  Description              A standard X based videosink
  Author                   Julien Moutte <[email protected]>

Plugin Details:
  Name                     ximagesink
  Description              X11 video output element based on standard Xlib calls
  Filename                 /usr/lib64/gstreamer-1.0/libgstximagesink.so
  Version                  1.14.1
  License                  LGPL
  Source module            gst-plugins-base
  Source release date      2018-05-17
  Binary package           Fedora GStreamer-plugins-base package
  Origin URL               http://download.fedoraproject.org

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstBaseSink
                         +----GstVideoSink
                               +----GstXImageSink

Implemented Interfaces:
  GstNavigation
  GstVideoOverlay

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      video/x-raw
              framerate: [ 0/1, 2147483647/1 ]
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]

Element has no clocking capabilities.
Element has no URI handling capabilities.

Pads:
  SINK: 'sink'
    Pad Template: 'sink'

Element Properties:
  name                : The name of the object
                        flags: readable, writable
                        String. Default: "ximagesink0"
  parent              : The parent of the object
                        flags: readable, writable
                        Object of type "GstObject"
  sync                : Sync on the clock
                        flags: readable, writable
                        Boolean. Default: true

I'm leaving this ticket open because I think we can fix the "does not work for STDIN or FIFOs" part (I have already planned an implementation).

Cool.

I don't think there is any chance to make the syntax detection more powerful than it currently is. bat relies on syntect and syntect relies on Sublime Text syntaxes, which only provide "first-line detection".

Oh, that's a pity, but I think that could work at least for the second case of gst-inspect-1.0 (which is the most useful), as I don't think many commands will output Factory Details: as their first line.
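To make the first-line idea concrete, here is a rough Python sketch of how such a rule table could behave. It is an illustration only: the rule list, the syntax names, and the function `guess_syntax` are all hypothetical, and no "gst-inspect" syntax ships with bat. Each rule is a regular expression tried against the first line of the input.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern for the first line, syntax name).
# The "gst-inspect" entry is an assumption made for this example.
FIRST_LINE_RULES = [
    (re.compile(r"^#!.*\bpython[0-9.]*\b"), "python"),
    (re.compile(r"^#!.*\b(ba|z)?sh\b"), "sh"),
    (re.compile(r"^Factory Details:\s*$"), "gst-inspect"),
]


def guess_syntax(first_line: str) -> Optional[str]:
    """Return the first syntax whose pattern matches, or None."""
    for pattern, name in FIRST_LINE_RULES:
        if pattern.search(first_line):
            return name
    return None


if __name__ == "__main__":
    print(guess_syntax("Factory Details:\n"))
    print(guess_syntax("vaapi:  vaapijpegdec: VA-API JPEG decoder\n"))
```

The element view starts with a distinctive `Factory Details:` line, so a rule like this can catch it. The plain plugin listing begins with an ordinary `plugin:  element: description` line, which has no such marker and falls through to `None`.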

Also if there was an option to explicitly specify the language/content-type, the one-line-on-detection won't be a big problem.

Also if there was an option to explicitly specify the language/content-type, the one-line-on-detection won't be a big problem.

There is! You can always use -l/--language to explicitly set the syntax:

For example (obviously, TOML is not the right syntax here, but it kinda works)

gst-inspect-1.0 | bat -l toml
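For a tool that wants to pipe its own output to bat when it is installed and to less otherwise, the choice of pager command could be sketched like this in Python. The function name `pager_command` and its `which` parameter are hypothetical and exist only for this example; the only bat option used is the `-l`/`--language` flag mentioned above.

```python
import shutil
from typing import Callable, List, Optional


def pager_command(
    language: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Pick bat when it is on PATH, otherwise fall back to less."""
    if which("bat") is None:
        return ["less"]
    command = ["bat"]
    if language is not None:
        # -l/--language sets the syntax explicitly, so first-line
        # detection is not needed for piped input.
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # Prints the argv that would be used on this machine.
    print(pager_command("toml"))
```

The resulting list can be passed to `subprocess.run(pager_command("toml"), input=text, text=True)` to feed captured output into the pager. The `which` parameter is injectable only so the lookup can be replaced in tests.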

@sharkdp Ah yes, cool. TOML already makes it prettier to look at. So I guess we just need to add language support for gst-inspect then? That would go into syntect and I should create a ticket on that?

syntect does not accept any PRs adding new syntaxes. We can add new syntaxes in bat, but I don't think that the gst-inspect-1.0 syntax is "mainstream" enough to be added by default. However, note that you can always add new syntaxes on your own (see https://github.com/sharkdp/bat#adding-new-syntaxes--language-definitions).

@sharkdp Right. I don't quite agree with GStreamer not being mainstream enough but as long as there is a way to easily add my own language/syntaxes, it's all good. :) Thanks for your quick and informative replies and have a lovely weekend!

My go-to resource to check popularity was https://packagecontrol.io and there doesn't even seem to be a package for gstreamer. Anyway, you have a nice weekend as well - Thank you for your feedback!
