Rustfmt: Audit public API

Created on 20 Apr 2018  路  42Comments  路  Source: rust-lang/rustfmt

Most helpful comment

With 0.6 I've significantly refactored Rustfmt's API, I'm proposing this for the 1.0 API.

To summarise:

pub use syntax::codemap::FileName;

pub type FmtError;
pub type FmtResult<T>;
pub enum ErrorKind;

pub const WRITE_MODE_LIST;

pub enum Input
pub struct FormatReport

pub fn format_snippet(snippet: &str, config: &Config) -> Option<String>
pub fn format_code_block(code_snippet: &str, config: &Config) -> Option<String>
pub fn format_input<T: Write>(
    input: Input,
    config: &Config,
    out: Option<&mut T>,
) -> Result<(Summary, FileMap, FormatReport), (io::Error, Summary)>
pub fn format_and_emit_report(input: Input, config: &Config) -> FmtResult<Summary>
pub fn emit_pre_matter(config: &Config) -> FmtResult<()>
pub fn emit_post_matter(config: &Config) -> FmtResult<()>

// from config module:
pub struct Config
pub enum WriteMode
pub struct CliOptions
pub struct Summary

pub fn load_config(
    file_path: Option<&Path>,
    options: Option<&CliOptions>,
) -> FmtResult<(Config, Option<PathBuf>)>

pub mod file_lines {
    pub struct Files
    pub struct LineRange
    pub struct FileLines
    pub struct Range
}

All need better documentation. I wonder if some of these we should indicate that they are 'de facto unstable'? (Thinking of format_snippet and format_code_block in particular).

All 42 comments

With 0.6 I've significantly refactored Rustfmt's API, I'm proposing this for the 1.0 API.

To summarise:

pub use syntax::codemap::FileName;

pub type FmtError;
pub type FmtResult<T>;
pub enum ErrorKind;

pub const WRITE_MODE_LIST;

pub enum Input
pub struct FormatReport

pub fn format_snippet(snippet: &str, config: &Config) -> Option<String>
pub fn format_code_block(code_snippet: &str, config: &Config) -> Option<String>
pub fn format_input<T: Write>(
    input: Input,
    config: &Config,
    out: Option<&mut T>,
) -> Result<(Summary, FileMap, FormatReport), (io::Error, Summary)>
pub fn format_and_emit_report(input: Input, config: &Config) -> FmtResult<Summary>
pub fn emit_pre_matter(config: &Config) -> FmtResult<()>
pub fn emit_post_matter(config: &Config) -> FmtResult<()>

// from config module:
pub struct Config
pub enum WriteMode
pub struct CliOptions
pub struct Summary

pub fn load_config(
    file_path: Option<&Path>,
    options: Option<&CliOptions>,
) -> FmtResult<(Config, Option<PathBuf>)>

pub mod file_lines {
    pub struct Files
    pub struct LineRange
    pub struct FileLines
    pub struct Range
}

All need better documentation. I wonder if some of these we should indicate that they are 'de facto unstable'? (Thinking of format_snippet and format_code_block in particular).

Throwing my ideas:

To me FmtError and FmtResult are misleading. It took me a long time to figure out that it represented random errors rather than formatting related error. I am not sure what could be changed so there's no confusion between FmtResultand FormattingError.

Why do you want to expose emit_pre_matter and emit_post_matter?

My concern with the API is that right now there's no real separation of concern between what needs to be formatted and when/how it is done. I believe the api should have one method which produce a report. Then we could have other modules which handle this report to, for example, overwrite the file, produce a diff, write a checkstyle report...

If you think that's worth investigating, I could convert that into a more formal API and then into code

To me FmtError and FmtResult are misleading

We could use just Error or Result and use module-scoping, but then there is a bit of a clash with the std versions. Alternatively, we could use RustfmtError/RustfmtResult

Why do you want to expose emit_pre_matter and emit_post_matter?

The API is used by the binaries and these are needed for the check style output. I couldn't figure out any way of hiding them.

My concern with the API is that right now there's no real separation of concern between what needs to be formatted and when/how it is done. I believe the api should have one method which produce a report. Then we could have other modules which handle this report to, for example, overwrite the file, produce a diff, write a checkstyle report...

Yeah, I think that is a good point. There is meant to be some distinction between format_input and format_and_emit_report, but it is not strong. It would be nice if we could do better.

If you think that's worth investigating, I could convert that into a more formal API and then into code

That would be great, thanks!

I'm proposing this for the 1.0 API.

Would this be something that bindgen could link to without having to mess with LD_LIBRARY_PATH?

Would this be something that bindgen could link to without having to mess with LD_LIBRARY_PATH?

That would be the hope, yes

That's great! I think that this will cover bindgen's use cases, which are essentially format this {path, string} with the default config or the config at this path.

@nrc Thank you so much for your work! And my apologies for the late reply, I have been busy at work.

Some comments:

  1. Do we have to re-export syntax::codemap::FileName? This type does not seems to be used in any public APIs.
  2. Instead of exposing WRITE_MODE_LIST, I would rather have a function that returns a list of write mode. I am thinking of something like fn list_write_mode() -> &[&str].
  3. format_snippet and format_code_block are confusing, IMHO we should rename them to better names, or combine them into a single API (e.g. format_str). And yes, I think they should be marked as unstable, as they are particularly buggish.

Ok, so after giving it a fair amount of thoughts, here is my observation and suggestion (which of course I am keen to implement)

Observations:

  • Separating the formatting process from the other actions (producing a summary, writing diff, output checkstyle, overwirting file, ...) isn't trivial but definitely do-able.
  • FileMap contains the content of all files formatted. That's a big memory constraint that we should try to avoid in the api. (Through lazy evaluation, callback, ...? Not sure what's the best.)
  • Parse errors can be represented as io::Error (with InvalidData) which would be our only kind of error. Things like LineOverflow shouldn't be errors, it should be warnings that the API user is free to ignore.
  • I believe we should really be separating concerns between the different tools (I'll refer to them as rustfmt, rustfmt-lib, cargo-rustfmt and git-rustfmt). Should rustfmt-lib know about write modes? I think it's important to extract the essence of the lib which is about formatting, not about file handling. We can still have utility methods for those things but IMO that should be clearly separated. That means :

    • Make a clear separation in the api between producing the formatted result and the control operation

    • Config objects should have this separation.

    • Move some of the logic in main.rs?

  • I am not a big fan of the write modes. It's mixing too many things like how it's written, where, what, and some side effect like creating a backup file. I think it make sense for a command line tool, it provide some default behaviours, but I don't think it does for a library. I would be interested to hear some use cases for using the API and whether it makes sense to have write modes at all in the API.
  • I believe one of of the main use case for the API would be to format snippets of code. If that's indeed the case, we should think of what it involves to make it stable.

Suggestion in code :

pub struct FormatResult {
    filename:   String,
    result:     Result<SuccessfulFormatting, io::Error>  
}
pub struct SuccessfulFormatting{
    orig_content:      String,
    formatted_content: String,  // Formatted with correct platform line breaks
    warnings:          Vec<FormattingWarning>

}

pub struct FormattingWarning {
    line: usize,
    kind: FormattingWarningKind,
    is_comment: bool,
    is_string: bool,
    line_buffer: String,
}
pub enum FormattingWarningKind {
    LineOverflow(usize, usize),
    TrailingWhitespace,
    BadIssue,
    LicenseCheck,
}

pub struct FormatConfig {
    //Config related only to formatting
    combine_control_expr: bool,
}
pub enum Input {
    RustFile(PathBuf), // Link to a file
    RustFileAsText{ file: PathBuf, content: String}, // A valid file content
    Snippet{code: String} // An incomplete snippet of code
}

pub fn format_input(input: Input, config: &FormatConfig) -> impl Iterator<FormatResult> //Iterator or Stream if we are keen to depend on future


mod util {
    pub fn generateCheckstyleOutput<T>(formatResult: T) -> String where T: Iterator<FormatResult>;
    pub fn generateDiff<T>(formatResult: T) -> String where T: Iterator<FormatResult>;
    pub fn hasModification<T>(formatResult: T) -> bool where T: Iterator<FormatResult>;

    // !!! From here, not sure it should be in the API at all, depends on what is the usage of the API

    pub struct ControlOptions {
        //all options not related to the handling of the formatting result
        report_todo: bool,
        write_mode: WriteMode,
        // .... TBD
    }

    pub fn load_config_file(file_path: Option<&Path>) -> Result<(Option<Path>, FormatConfig, ControlOptions), io::Error>;

    pub fn overwriteFiles<T>(formatResult: T, options: &ControlOptions) -> io::Result<()> where T: Iterator<FormatResult>;
    pub fn replaceFiles<T>(formatResult: T, options: &ControlOptions) -> io::Result<()> where T: Iterator<FormatResult>;


    // Run the formatter and return true if successful, false otherwise. Successful meanings depends on the control options, write mode , etc..
    pub fn run_formatter(input: Input, config: &FormatConfig, options: &ControlOptions) -> bool ;

}

@nrc Looking at rls usage of rustfmt I think my suggestion makes even more sense. rls is not using any of the format_and_emit_report stuff.

I have kind of ignored range formatting for now. I have some concerns but no definite way of solving it. I'll come back to it.

I feel like I am kind of questioning everything and going down the route of a not-so-trivial refactor. I am unsure whether or not I am helping here. I am willing to give it a decent amount of time but I completely understand if you guys, @nrc and @topecongiro, are not keen to start something like that. Just let me know :)

Do we have to re-export syntax::codemap::FileName? This type does not seems to be used in any public APIs.

It's needed in Ranges and used by the RLS there

Instead of exposing WRITE_MODE_LIST, I would rather have a function that returns a list of write mode. I am thinking of something like fn list_write_mode() -> &[&str].

Why do you prefer the function to the const? I don't see much difference either way. We might not need this at all, depending on what we do with the CLI.

combine them into a single API

This sounds like a good idea

Separating the formatting process from the other actions

This sounds like a good thing to do though

FileMap contains the content of all files formatted. That's a big memory constraint that we should try to avoid in the api

Is it? I'd have thought that even for a large project the source text would not be more than a few MB (e.g., the entire source distributable is 24MB)

Parse errors can be represented as io::Error (with InvalidData) which would be our only kind of error

I would prefer to be future-proof and allow for lots of detail in our errors, so wrapping io errors seems like a win with not much downside.

I believe we should really be separating concerns between the different tools

I somewhat agree with this I've wanted in the past to separate out a rustfmt-core crate which just does the core formatting stuff, and then have another library crate which does the file handling and ranges, etc. OTOH, I don't want to move much to the binaries because that significantly affects reusability. This might be a refactoring too far right now, I imagine it would be a big project. We might be better doing this in the future and I think we could be backwards compatible by using a facade crate.

I am not a big fan of the write modes

Agree. I'm thinking of changing the CLI to not write modes, although I'm not sure how to represent them internally.

whether it makes sense to have write modes at all in the API.

This is an interesting question. I guess at the limit, the binary is using the API too, so there has to be something there for the binary to hook into.

I believe one of of the main use case for the API would be to format snippets of code.

Agree, but there is no-one doing this right now, whereas I am very keen to release a 1.0 soon, so I'd rather keep unstable and deal with this post-1.0

Separating the config is an interesting idea and one that appeals to me. The downside is that we only want one config file so we'd either need to translate the single parsed config to the multiple API configs, or make the config macro even more complex.

I generally disapprove of having a util module in a public API, I think we could probably do a bit better naming-wise, but the general idea of separating out the non-formatting stuff seems good.

Is it? I'd have thought that even for a large project the source text would not be more than a few MB (e.g., the entire source distributable is 24MB)

Fair point. I guess my point was that Iterator is more future proof. We could then implement something memory efficient and async in a backward compatible way.

I am actually unclear about why do we want a stable API? Knowing that it's compiling only on nightly and reading at the main issue, nothing indicate that an API is needed. If we need a stable API just for RLS, then should we be stabilising the only function (format_input) that it needs (along with it's input/output)?

If the answer is yes, and taking into account your comments, what do you think about that minimal API :

pub struct Error; //Implement failure::Fail
pub type Result<T> = std::result::Result<T, Error>;

pub struct SuccessfulFormatting{
    formatted_content: String,  // Formatted with correct platform line breaks, contains only the formatted bit if the range is not the whole file
    warnings:          Vec<FormattingWarning>

}

pub struct FormattingWarning {
    line: usize,
    kind: FormattingWarningKind,
    is_comment: bool,
    is_string: bool,
    line_buffer: String,
}
pub enum FormattingWarningKind {
    LineOverflow(usize, usize),
    TrailingWhitespace,
    BadIssue,
    LicenseCheck,
}

pub struct FormatConfig {
    //Config related only to formatting
    combine_control_expr: bool,
}

pub fn format_rust_file<R: RangeArgument<i32>>(content: String, line_range: R, config: &FormatConfig) -> Result<SuccessfulFormatting>

One aspect of the current API I'm really not happy about is how code for handling CLI options is split between the config module and the binary. We probably want to solve this by replacing CliOptions with a trait.

Instead of exposing WRITE_MODE_LIST, I would rather have a function that returns a list of write mode. I am thinking of something like fn list_write_mode() -> &[&str].

This is gone altogether now.

TODO list

  • [x] format_snippet and format_code_block should be marked unstable, and perhaps combined
  • [x] refactor CLI options (https://github.com/rust-lang-nursery/rustfmt/issues/2639#issuecomment-388592371)
  • [x] separate config files?
  • [x] separate formatting from non-formatting API?
  • [x] can we use an iterator for FileMap, or (better) avoid exposing it at all?
  • [x] review Result and error types
  • [x] can the API be more minimal?
  • [x] how to represent write modes

I am actually unclear about why do we want a stable API?

The RLS and the Rustfmt binaries have to use the API. Once we release a 1.0, then we are bound by semver to stick to the API, however, I guess we could have a major point release pretty easily if we don't change the formatting (the other core component of stability). I also think having a stable API will encourage more users of rustfmt (various tools have considered using it).

Looks like format_snippet and format_code_block don't need to be public

can the API be more minimal?

That's what my last suggestion is. RLS doesn't need write mode and file handling. Rustfmt bin needs it and I believe it should handle it itself. Exposing just format_rust_file as in my suggestion allows any tool to extend rustfmt however they want.

Stabilising also means that it has to compile on stable right?

Current public API:

pub use config::options::CliOptions;
pub use config::summary::Summary;
pub use config::{load_config, Color, Config, FileLines, FileName, Verbosity, EmitMode};

pub enum ErrorKind

pub fn format_input<T: Write>(
    input: Input,
    config: &Config,
    out: Option<&mut T>,
) -> Result<Summary, (ErrorKind, Summary)>

pub enum Input

pub fn format_and_emit_report(input: Input, config: &Config) -> Result<Summary, failure::Error>
pub fn emit_pre_matter(config: &Config) -> Result<(), ErrorKind>
pub fn emit_post_matter(config: &Config) -> Result<(), ErrorKind>

This is much better, but I still think there is room for improvement

Stabilising also means that it has to compile on stable right?

No, I don't think we can due to our dependence on libsyntax

separate config files?

I'm thinking to leave any movement here for later - all the control options are unstable, so we don't need to finalise the format before 1.0 and can change later (that will probably mean changing the API, but hopefully they will be back compat or at least minor changes).

I feel I might be misunderstanding something, but this statement:

No, I don't think we can due to our dependence on libsyntax

seems to contradict this statement:

Would this be something that bindgen could link to without having to mess with LD_LIBRARY_PATH?

That would be the hope, yes

since linking internal libraries like libsyntax requires LD_LIBRARY_PATH fiddling.

I think it also means that bindgen will not be able to use this public API, since (a) it must continue to work on stable, and (b) it cannot require users to set LD_LIBRARY_PATH.

Am I misunderstanding?

More tweaking:

pub use checkstyle::{footer as checkstyle_footer, header as checkstyle_header};
pub use config::summary::Summary;
pub use config::{
    load_config, CliOptions, Color, Config, EmitMode, FileLines, FileName, Verbosity,
};
pub use utils::use_colored_tty;

pub enum ErrorKind
pub enum Input

pub fn format_input<T: Write>(
    input: Input,
    config: &Config,
    out: Option<&mut T>,
) -> Result<Summary, (ErrorKind, Summary)>

Some thoughts:

  • exposing use_colored_tty is kind of weird but useful - perhaps I should pull that into it's own crate (maybe that functionality exists already)?
  • summary probably shouldn't be nested inside config, but that is not exposed publicly so probably OK
  • I feel kind of bad about how much config stuff we expose, but it is kind of logical. Maybe worth looking further at load_config.

exposing use_colored_tty is kind of weird but useful - perhaps I should pull that into it's own crate (maybe that functionality exists already)?

replaced with isatty crate

I feel kind of bad about how much config stuff we expose, but it is kind of logical. Maybe worth looking further at load_config.

I think we need to keep load_config

OK, I think I'm done:

pub use checkstyle::{footer as checkstyle_footer, header as checkstyle_header};
pub use config::summary::Summary;
pub use config::{
    load_config, CliOptions, Color, Config, EmitMode, FileLines, FileName, Verbosity,
};

pub enum ErrorKind
pub enum Input

pub fn format_input<T: Write>(
    input: Input,
    config: &Config,
    out: Option<&mut T>,
) -> Result<(Summary, FormatReport), (ErrorKind, Summary)>

@topecongiro @thibaultdelor what do you think?

One thought - I wonder if we should combine the Summary and FormatReport?

One thought - I wonder if we should combine the Summary and FormatReport?

I think we should, and probably reorganise the summary too.

@nrc I like where you are going!

Here's my feedback

  • I still think that formatting should be separated form the post processing (generate checkstyle, writing to the console, how to write thing depending on emit mode, timing, ...).

    • Do we really need a summary? All of it but the parse/format time can be computed values.

    • I am wondering if it make sense to have all errors together when there's a clear distinction between formatting errors (e.g LineOverflow) which might be acceptable to ignore and doesn't interfere with the rest of the formatting and other Errors (e.g io::Error) which might have left your workspace an in inconsistent state.

The reason I think we should really separate concerns is because right now I think the API is internally overly complex and might trick a lot of person. After just a quick look, here's few things I find unexpected in format_input and that I believe is directly linked to the lack of separation of concerns:

  • No matter the mode, if config.disable_all_formatting set to true it won't write to out. If the source is text it will actually write to stdout (not out). If config.disable_all_formatting is set to false but there's no change in formatting, then it will write to out...
  • A parsing error returns a Success object with an empty FormatReport.
  • EmitModes Files and diff ignores the parameter out. EmitMode Stdout doesn't write to stdout but to out...
  • For CheckStyle, you have to write the header and setter yourself

Whether or not those behaviour are expected or not is not my point. I am just thinking that as a user of the rustfmt API i just want to provide an unformatted input and get a formatted output. Doing the rest (generate a checkstyle report, overriding files) is trivial and can be seen as extension.

Hmm, those points make a lot of sense. My thoughts to the summary are that we can merge it into the error report and the values are just caches of the computed values, but that the output is an 'out' param that we pass into format_input. However, we are then getting to the stage where all the inputs/out params/returned values want to be organised as some kind of object.

I think one fundamental problem is the complexity of the problem - we can't abstract a writer, because in EmitMode::Files we want to write code for each file. If we want a really minimal, orthogonal API, I think we need to break the operations into quite small chunks, but that leads to a large API.

So we end up with this tension, between a simple API which feels like a bit of an inelegant mash of functionality, but is in practice very easy for clients to use, or a more elegant, orthogonal API, but which is larger and more complex.

Another tension is between formatting a single program (which feels like a natural unit to expose in the API) and formatting a batch of programs (which is what the CLI allows, and leads to requiring combining of FormatReports between runs and the checkstyle stuff, because the header and footer go at the start and end of a batch).

I guess I need to think more about what the really fundamental operations are.

Okay I guess now it's my time to do some coding. What I am thinking about requires a fair amount of refactoring that can't be done incrementally.
Give until next Monday, I'll start working on it and submit my PR... I am sure after that you will understand what I meant and whether or not that's where you want to go

Well, I couldn't work much on it this week. I'll try to give it more love but I can understand if you would like to move forward and go with this version. :)

I think it is worth pressing on a bit here and trying to get a nice API. On the other hand, given the small number of clients, I think it would be fine to release a 2.0 of Rustfmt which only changed the API (and not formatting) a few months after 1.0.

@nrc I have started worked on it again and ended up with lots of of merge conflict...
I wanted to do a big bang because all the thing I want to change makes much more sense together but
not knowing how much time I can give it I'll try to submit smaller PRs instead. Bear with me! :)
Here's a first one : https://github.com/rust-lang-nursery/rustfmt/pull/2779

In 71d3d04270474ae0afdeeb410fdcc168a54714b7 I factored out a Session object. I think this makes the API much cleaner. It also addresses the ugliness around checkstyle. There is still some work to do (separating formatting from other actions, in particular), but I think this gives a good foundation for that.

I'm also finding the current setup with Summary to be weird. I think it probably wants to be encapsulated by Session - none of the uses of it feel like good uses of abstraction.

In a series of commits up to d3288841ea86b7ebcff57c840b9a4855f9033ece, I've done a bunch of refactoring of the API. It is not complete, but I think we only need to add to it (i.e., backwards compatibly to get where we want).

The key difference for clients is the addition of a Session object which must be instantiated. I envisage in the future we could add some convenience functions to avoid this when formatting a single project.

Internally, I've separated out the file handling etc. (in Session and FormatHandler) from the actual formatting (in format_project, format_file, etc.). Currently the latter are not public. In the future I think we could make these functions more convenient to use and expose them as API.

The last thing I want to consider is merging FormatReport and Summary since they seem to be doing basically the same thing.

@thibaultdelor what do you think?

And now Summary is gone too.

@nrc Awesome work!
I browsed it and looks very much like what I was thinking of. There's some enhancement I can think of, but I think that the current state is more than acceptable for a public API.
Sorry for letting you down on that, I am swamped at work and have dropped all my side projects. Hopefully i'll come back to it!

@thibaultdelor cool, thanks for looking over it and no worries about not getting to it, work happens.

I'm going to close this issue, we can make improvements elsewhere.

Was this page helpful?
0 / 5 - 0 ratings