osu!mania star difficulty improvements

Created on 15 May 2018 · 43 Comments · Source: ppy/osu

While I currently have no time, I was considering augmenting the star rating system for mania.

Currently the star rating system is based on the density of notes and the number of objects in the song.

The star rating system I am proposing will be based on:
- Density of notes per column (both immediate and over time)
- The timing of notes both on pressing and releasing (timing windows)
- Reading, or rather the lack of repetition in patterning
- A multiplicative stamina system based on both length of song and reading [See above] {suggested removal}

difficulty proposal

All 43 comments

SVs should have absolutely no impact on SR because they can be learned and therefore require no reading ability

This issue alone is relatively pointless in its current state as you are just pointing out very obvious heuristics without adding anything of value. Will leave it open to encourage further conversation, for now.

I don't think it makes sense to remove the pause feature from osu!. Not only would it be odd to have one gamemode without pausing (or one gamemode influencing all of them), but what would you do with the scores that include pausing? There's no way we can tell a replay has been paused, as osu! doesn't keep track of that. This means that we cannot differentiate the "legit" plays from the "illegitimate". Even if we could, it would be very unsatisfying and punishing for people who are competitive, and I am sure that would turn many people away from the game.

How would we measure repetition? A mapper would just make cheesable 1/16th rolls that vary in patterning. If we made a system that could measure that, then surely other patterns would become wrongly judged or seriously underrated.

Also, the density part of your proposal touches the biggest problem with star rating, and mash and vibro would still end up overrated. However, the "per column" part would help with jackhammers being seriously underrated.

I can say that pause will not be removed ever.

the issue with pausing in o!m is that it's an absolutely huge advantage and it's an unwritten rule at high levels to simply not pause on things

I can't speak for other gamemodes as much, but in o!m you can pause during streams/jumpstreams/handstreams, recover stamina and then go back into them, effectively making the chart much easier.

removing pause doesn't have to be done (SR will just have to NEVER account for stamina) but an indication on the scorescreen of times paused/time spent paused would be very nice from a player's perspective

Definitely a counter of pause time or pause count is something I could get behind.

legend - I'll open a separate issue for that now

While I am not sure if this is a good idea, would a no-pause mod be acceptable? It would allow leaderboards of players that were actually unable to pause, it would allow players on a more even playing field when comparing each other's scores and it would also solidify dan runs for ones that don't allow pausing. Obviously this mod should act like Hidden in that it's submittable but doesn't give extra pp/score/etc.

This will only ever happen in multiplayer.

Do you mean that you could enable/disable pausing in multiplayer (when lazer comes out)? Or just that multiplayer strictly disallows pausing?

You cannot pause in multiplayer, correct.

Since we are talking about star rating, I want to propose a few goals I would like the new system to achieve. This is based on personal experience, so if you don't agree that's understandable. The current star rating fails in 2 aspects, and I would like the successor to address those.

  1. Star rating looks like a linear system, even though it's not. New players would expect a 3-star map to be three times as hard as a 1-star map. I did too. This is not the case. For every new star, the maps get twice as hard. When I started out it took me a few months to reach 4* maps. Then it took me 1.5 years to reach 5*. This not only makes new players feel like they are not improving but also misleads them about how long it will take to get there. The exponential increase in difficulty itself is not necessarily flawed, though I would personally prefer a linear system. It's just the way it's presented that is deceiving.

  2. The star rating has a bad spread (at least in 4k). All easy maps are placed between 1-3* and the hard ones between 4-6*. The extreme maps might reach 7* but this is VERY RARE. Since only an extremely small part of players ever get to 6-7 stars, most players will only ever play 1-4 star maps. The new system should spread this out more over the 10 stars available to make it easier to track your progress and compare your skill with others. Let the easier maps be between 1-4, harder ones between 5-8, and extreme maps 9-10. A good reference would be scaling the system so that the hardest ranked map at the moment would be placed at 9 (to give wiggle room for eventual harder maps in the future). I realize that applying this to all keys with a single system is difficult but still think it's worth mentioning. This would make highly skilled players feel better about their skill and make it easier for newcomers to find maps. In 4k atm, I would argue that a human could probably never play a harder map than 8*. Why not push this up to 10?
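The two points above can be put into numbers with a small Python sketch. It assumes, as this comment claims from personal experience, that each extra star roughly doubles difficulty, and it shows one possible linear rescale that pins the hardest ranked map at 9 stars. The function names, the doubling base and the 7.0 example value are illustrative assumptions, not anything taken from osu! itself.

```python
def relative_difficulty(stars_a: float, stars_b: float, base: float = 2.0) -> float:
    """How many times harder stars_b is than stars_a, if each star multiplies difficulty by `base`."""
    return base ** (stars_b - stars_a)


def rescale_stars(stars: float, hardest_ranked: float, target_top: float = 9.0) -> float:
    """Linearly stretch the scale so that `hardest_ranked` lands on `target_top`."""
    return stars * target_top / hardest_ranked


# Under the doubling assumption a 3* map is 4x as hard as a 1* map, not 3x.
print(relative_difficulty(1.0, 3.0))

# If the hardest ranked map were 7*, a current 3.5* map would be shown as 4.5*.
print(rescale_stars(3.5, hardest_ranked=7.0))
```

A purely linear stretch keeps the ordering of maps intact; it only changes how the numbers are presented, which is the complaint being made here.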

In response to the op, I don't think reading (seeing the notes in time) should be part of the calculation at all since we really can't track it, and all users have different scrolling speeds and visual mods. I would rather focus on the actual physical movement required to press different patterns, and look at each hand independently. By checking how many fingers have to move to press the following note or notes we could get an accurate difficulty system.

Any new system should also seriously reconsider long notes, since seeing hit and release as just 2 notes is EXTREMELY inaccurate to the actual difficulty, and holding down notes should give a difficulty multiplier to notes pressed normally while an LN is held.

Sorry for long post!

I don't think extending the star rating would be easy or necessary; one of the main things the recent pp nerf did was align it more with the other game modes. I don't think that going out of line with much higher star maps would be acceptable, especially because 7k already does a good job of having high-star-rated maps. However, I do admit that 4k can be underrated compared to 7k equivalents, and there is no better example than stamina maps: a map in 4k that's very dense (notes per second per column) could have the same number of stars as one that's way less dense because of the extra keys.
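To illustrate the "notes per second per column" point, here is a minimal Python sketch. The note counts and durations are made-up numbers, and `per_column_density` is a hypothetical helper, not part of the existing calculator.

```python
def per_column_density(note_count: int, duration_seconds: float, key_count: int) -> float:
    """Average notes per second that each column (finger) has to handle."""
    return note_count / duration_seconds / key_count


# Same total density (20 notes per second), different key counts.
four_k = per_column_density(1200, 60.0, key_count=4)
seven_k = per_column_density(1200, 60.0, key_count=7)

print(four_k)             # 5.0 notes per second per column
print(seven_k < four_k)   # True: each 7K column carries less load
```

With identical overall density, each column in 4K carries noticeably more load than in 7K, which is the imbalance described above.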

I am not sure how the game would be less linear in SR, as this will lead to the exponential problem where harder maps would give ridiculous amounts of pp.

I do agree with Tripp1n when he says that SR should be calculated from objective difficulty (speed, stamina, etc) rather than subjective difficulty (reading, finger control, etc) especially because it's significantly easier to implement objectivity.

You're saying changing star rating affects pp. I thought they were independent?

Beatmap difficulty affects star rating.
Beatmap difficulty also affects pp calculation.

SR is a very simple calculation that's just a combination of the individual components of the beatmap's difficulty, so it's generally described as the beatmap difficulty itself.

But OP wants to affect beatmap difficulty, which changes both pp and sr.

Are the difficulty calculations currently in lazer? Or can we look at how osu currently handles it in stable?

osu! and osu!mania difficulty calculations are currently in lazer. osu!taiko and osu!catch will come soon.

Check out https://github.com/ppy/osu-tools/tree/master/PerformanceCalculator which directly uses lazer to compute difficulty/performance, along with:

osu!: https://github.com/ppy/osu/blob/master/osu.Game.Rulesets.Osu/Difficulty/OsuDifficultyCalculator.cs
osu!mania: https://github.com/ppy/osu/blob/master/osu.Game.Rulesets.Mania/Difficulty/ManiaDifficultyCalculator.cs

Having a lot of experience with 4K, I can confirm that the SR for 4K is underrating maps. I found it a huge struggle to move from 2.5* to 3* maps, whereas moving up from any lower difficulty to a slightly harder one was routine. Also, many 3* maps differ in actual difficulty among themselves.

A great example would be Flashes which seems to me quite challenging compared to other maps of the same SR. And that's for two reasons:

  • Few jacks in streams decreasing per column density.
  • Most of the map consists of fast streams, unlike the other maps that have a burst for a portion of the map which mistakenly increases the SR.

What I'm saying is no different than what has been said already, but regarding the implementation of the "subjective" factors, I think it's not that much of an issue, since players are not just robots whose performance has to solely depend on the objective factors.

Specifically, reading has to be calculated since to pass a map the player has to read the notes. Failure to do so will result in poorer performance, yet not accounting for it will cause confusion/frustration. There are different ways to read a map, making other maps easier or harder than what others think. I'm not asking for something crazy like understanding the current reading method of the player using AI, but at least implementing something that interprets more (objectively) complex patterns as harder. Their repetition would not add up, whereas the variety of complex patterns would increase the SR by a significant amount.

Finger control is not to be measured in SR since that's a prerequisite for playing in the set of keys the map is made in. A 7K map would seem like hell for a 4K player like me, whereas the opposite would happen for a 7K player since they're used to the controls.

Finger control is not a prerequisite. Higher skill levels very much require very fast movement and jack skill. That should definitely be represented in SR.

You otherwise have some valid points about reading.

How would you measure something like reading or repetition?

With repetition, mappers could easily work around it by using patterns that resemble the original and switching between them, and if the system were more dynamic, I feel it could misjudge tons of maps.

With reading, how would you judge that? Unlike jacks or speed where we can easily see which map is harder, I don't think we can create a system that accurately judges reading.

Even then how would you reward players anyway? I don't think we can set an arbitrary amount of bonus SR because a map is hard to read.

'Technical' rating (as MSD defines it) is the difficulty in a chart from complex patterning such as runningmen, minijacks and generally complex bursts.
That somewhat falls under reading ability - so it will assess that, but no subjective modifier needs to be applied to account for it.

also @alfasGD, you really shouldn't state you're a very experienced 4k player when you're ~30k rank - since that is not very experienced.

Flashes is a poor example of SR issues, the issue now is that of density being the only factor for SR, and density is not congruent with difficulty.

A 23 trill is rated roughly half of that of a [12][34] trill, despite being similar in difficulty. A 12 trill is rated the same as a 23 trill, despite the former being significantly harder.

as a rule of thumb, star rating underrates speedier stream charts and overrates denser jack charts

I'd really recommend anyone interested in assessing the difficulty of a chart to look at the etterna project and maybe ask mina about how it works more intimately, since the calculation is closed source. But it is able to determine the patterns in a chart and assess its difficulty to a strong degree of accuracy.

@DDMythical Unfortunately while I would love to just copy paste MSD into o!m, the fact is there are more key modes than just 4k, so if we are to design a system it must work for all key modes. This may mean a less accurate system overall but since we have opened this Pandora's box once we can patch in new things in a later revision.

@Emik03 My idea to test for repetition in patterning is to partition each measure in the song (variably if possible) and compare each partition with all others in the song. So rather than defining a bank of patterns (doubles, triples, jumpstream, etc.), it would compare using the raw patterns that are in the song. Yes, this would allow minor variations as a loophole, but I also think that it allows for more creative freedom for mappers.
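A rough Python sketch of the partition-and-compare idea above, under several assumptions: notes are given as (beat, column) pairs, every measure is 4 beats long (no variable partitioning), and beats are quantized to a fixed grid. The helper names and the "fraction of repeated measures" metric are hypothetical.

```python
from collections import Counter


def partition_by_measure(notes, beats_per_measure=4.0, resolution=4):
    """Group (beat, column) notes into measures; each measure becomes a hashable pattern."""
    measures = {}
    for beat, column in notes:
        index = int(beat // beats_per_measure)
        # Quantize the position inside the measure so equal patterns compare equal.
        slot = round((beat % beats_per_measure) * resolution)
        measures.setdefault(index, set()).add((slot, column))
    return [frozenset(measures[i]) for i in sorted(measures)]


def repetition_ratio(notes, beats_per_measure=4.0):
    """Fraction of measures whose exact pattern occurs more than once in the chart."""
    patterns = partition_by_measure(notes, beats_per_measure)
    if not patterns:
        return 0.0
    counts = Counter(patterns)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(patterns)


# Two identical staircase measures followed by one reversed staircase.
chart = [(0, 0), (1, 1), (2, 2), (3, 3),
         (4, 0), (5, 1), (6, 2), (7, 3),
         (8, 3), (9, 2), (10, 1), (11, 0)]
print(round(repetition_ratio(chart), 2))  # 2 of the 3 measures repeat
```

Because the comparison is exact, a single moved note makes a measure count as new, which is the "minor variations" loophole acknowledged above.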

@DDMythical please take your attitude of having-to-be-top-10000-to-be-experienced-enough-to-comment away.

Star rating is supposed to be something all players can use, not just the top 1%. Everyone's feedback is valued.

@ppy it's more that he stated he was very experienced when he wasn't, and his statements do show a lack of understanding in how to convey the issues in SR (Flashes is a poor example) so i instead explained the flaws in sr from a patternical standpoint

similarly, at no point did i say you have to be top10k or whatever to discuss anything, just that it's not helpful to go around saying you're very experienced when you're not because its not helpful.

if we are doing examples of poor SR in charts then we have various things such as JinJins everlasting 4k message in comparison to, say, Disconnected Trance, which are the same difficulty but are around 5 stars apart. This is an edge case but the flaws with SR show clearly: SR is not able to judge difficulty past 4* (and some 3, 2, 1* are broken, too) and any time it is correct is seemingly a luckshot from a jumptrill being in there (AiAe SHD is 5.85 stars without the jumptrills, Eternal Drain [Eternal] is low 5* - neither of these charts are hard because of the jumptrills, they're some of the easier parts of the chart due to it basically being free acc).

The issues with the current SR are as follows:

  • Density does not always equal difficulty.

While this is only slightly true at low levels of play, it is completely false when, say, comparing dense chordjacks to speedy singlestream charts.

  • Similarly, peak density does not equal overall difficulty

A chart can have a 7* jumptrill (which might not even be that hard) and it will spike up despite the rest of it being ~4* or so.

  • It does not account for patterns

SR does not understand how patterns work in difficulty - It does not assess that a one hand trill gets harder (as bpm is increased) much faster than a two hand trill (1414, 2323 or [12][34]) or for any other pattern.

  • It has no active component to negate vibro

Vibro is not a particularly hard thing to do, not even at a level of competency where you can get 10-20 thousand pp. SR does nothing to negate it and instead presumes this is the hardest thing ever.

  • LNs are overrated and abusable.

Short LNs count as two notes in very quick succession (LN ends count as a note). Yet if the LN is short enough you can just hit it normally; there is no additional difficulty, but SR almost doubles.
(You can see this clearly in everlasting message)

  • Between keymodes, nothing is balanced

The easier solution would be to not have them on the same leaderboard (or at least to have separate leaderboards) because they are not comparable; they are like two different games. 4K will get significantly less SR from a [1234] than 7K will get from a [123567], but both are similar in difficulty due to both motions being wristjacking.

  • Jacks are not accounted for properly

A longjack going 11112222 is rated the same as a longjack going 11114444. That is silly, as the latter has your hands alternating (which gives a break in stamina), whereas the former is similar in motion to an 8-note longjack.

  • Finally, Stamina is not accounted for

1Hr 57JS Challenge is ~3*. That is almost 2 hours of 180bpm jumpstream (with bad patterns lol) with no breaks.
That is rated the same as having that JS for, say, 1 minute.

The former is much, much, much harder - yet they are rated the same.

This leads into the issue of pausing: a player could segment 1hr54 into various minute-long bursts, which means they wouldn't drain their stamina that much. This is why I think pausing should not be allowed (it gives a stupid advantage and means that this major problem is unsolvable).
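The short-LN point in the list above (an LN end being counted as a second note) can be illustrated with a small Python sketch. It assumes a hypothetical 100 ms cutoff below which an LN plays like a regular tap; the `HitObject` type, the cutoff and the function name are illustrative only and do not reflect the actual calculator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HitObject:
    column: int
    start_ms: float
    end_ms: Optional[float] = None  # None for a regular note


def effective_note_count(objects, short_ln_ms=100.0):
    """Count an LN release as a second note only when the hold exceeds `short_ln_ms`."""
    count = 0
    for obj in objects:
        count += 1  # the press always counts
        if obj.end_ms is not None and obj.end_ms - obj.start_ms > short_ln_ms:
            count += 1  # a real hold: the release needs its own timing
    return count


objects = [
    HitObject(0, 0.0),            # regular note
    HitObject(1, 500.0, 540.0),   # 40 ms LN, plays like a tap
    HitObject(2, 1000.0, 1600.0), # 600 ms LN, release matters
]
print(effective_note_count(objects))  # 4, where counting every LN end would give 5
```

A hard cutoff is the simplest choice; a smooth ramp between "tap" and "full release" would avoid a sudden jump in rating at the threshold.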

@DDMythical while I do agree with most of your points, peppy has made it clear that he isn't going to remove pausing from the game.

then sr will never work

I don't understand how SR and pausing the game are related. One shows a theoretical difficulty and the other pauses the game.

Stamina drain is difficulty - in mania you can pause during anything with minimal penalty, therefore SR cannot be calculated accurately because pausing can make charts very easy

But how do you intend to calculate a value that is presented when not playing for something a user may do when playing?
So there is something that predicts users to pause and take a break. If I don't pause can I complain because the map is way harder than SR told me?
I still don't understand how and why these two things are related.

Pausing should not be considered for SR calculations. Assume that the player does not pause during gameplay.

That's the more reasonable thing to do: calculate SR without accounting for pausing - @aergwyn your way of approaching the problem was the exact opposite of what I stated and you've missed the point entirely.

The issue now stems from charts being abusable for PP - let's take a 500-note stream for example. Is it equally as difficult as five 100-note streams, split up with 1 minute breaks? (the answer is no). So people could play a 5* stamina chart as if it was 4*, meaning they would be getting lots of PP they would not deserve.
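The 500-note example can be sketched with a toy fatigue model in Python. It assumes fatigue rises by 1 per note and halves every 10 seconds of elapsed time; both numbers and the helper names are arbitrary assumptions chosen for illustration, not a proposal for real constants.

```python
def peak_fatigue(note_times_s, half_life_s=10.0):
    """Peak of a fatigue value that gains 1 per note and halves every `half_life_s` seconds."""
    fatigue = 0.0
    peak = 0.0
    previous = None
    for t in note_times_s:
        if previous is not None:
            # Recover during the gap since the previous note.
            fatigue *= 0.5 ** ((t - previous) / half_life_s)
        fatigue += 1.0
        peak = max(peak, fatigue)
        previous = t
    return peak


def stream(start_s, count, interval_s=0.1):
    """Evenly spaced note times for a stream of `count` notes."""
    return [start_s + i * interval_s for i in range(count)]


continuous = stream(0.0, 500)
# Five 100-note bursts, each starting 70 s after the previous one (about a minute of rest).
split = [t for burst in range(5) for t in stream(burst * 70.0, 100)]

print(peak_fatigue(continuous) > peak_fatigue(split))  # True
```

Under this model the continuous stream reaches a much higher peak than the paused version, which is why a paused play of a stamina chart is not comparable to an unpaused one.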

A somewhat makeshift solution would be to nerf pp by some amount for every pause the player makes. I would personally give the player no pp if they paused, but you could instead scale pp by 1/(x+1), where x is the number of times paused.

Pausing requires a somewhat arbitrary workaround since it's not going to be removed, but given the huge advantage it gives, it is logical to nerf players for doing it. It's better to overpunish players than underpunish in this scenario, because underpunishing may make pausing actually viable, which should not happen (it is abuse of the pp system). Pauses should also be counted on the scorescreen, to prevent people from passing off paused scores as legitimate.

TL;DR: Calculate pp as if the play was not paused; the max pp is then 1/(x+1) of that, where x is the number of times paused. This is a very reasonable change in my opinion.
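A minimal Python sketch of this penalty, reading the formula as 1/(x + 1) so that zero pauses leaves pp unchanged. The function name and the 300 pp example are assumptions for illustration.

```python
def pp_after_pauses(base_pp: float, pause_count: int) -> float:
    """Scale pp by 1 / (pause_count + 1); an unpaused play keeps its full pp."""
    if pause_count < 0:
        raise ValueError("pause_count must be non-negative")
    return base_pp / (pause_count + 1)


for pauses in range(4):
    print(pauses, pp_after_pauses(300.0, pauses))
# 0 300.0
# 1 150.0
# 2 100.0
# 3 75.0
```

The first pause already halves the reward, which matches the stated preference for overpunishing rather than underpunishing.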

@DDMythical You say density is not a judgement for difficulty. Is per-column density a better solution, or would you rather there be a bank of patterns to identify everything with? If so, this may take a lot of work and would require input from players from all key modes. Also, you mentioned wristjacking: how should I account for that when you've made it clear that single-hand jacks and trills are a lot harder than double-hand jacks and trills?

per column density suffers the exact same issues - if anything it would reward jumptrills even more for being spread out over more columns, increasing the average

it really does nothing useful and is a stupid way of approaching things.

@DDMythical I see your point. Unfortunately, key binds and player choice for hitting the middle key make a per-hand approach unfeasible as a universal approach. Do you want to try splitting it up into the multiple key modes as separate difficulty calculations then? If so, I will need a lot of input from players because I mainly play 4K and am not familiar with the other key modes.

@DDMythical maybe this system could be released after the pause timer is installed and a part of the game, because while I do agree that a pause system should exist, I also recognize that it lowers the difficulty of a song drastically.

_apologies for the incredibly long post_

(preface: this is from mostly a 4k pov)

I agree with @BrokenGale that the mania difficulty calculation could use some improvements.

The two points I'd like to address are:

  • Column-dependent Density
  • Reading

Column-dependent Density

Rationale

I think the most feasible improvements concern difficulty changes that take both hands into account.

eg. as others have mentioned, [1] [2] trills are more physically demanding than [1] [4] trills because the former puts more stress on one hand.

Likewise, consecutive 1/4 notes on [1] are harder to play than 1/4 notes between [1] [2]

Suggested Improvements

First, a proposed hierarchy of difficulty (from easiest to hardest):

  1. Notes that are split between hands (eg. [1] [4] or [2] [3])
  2. Notes that share a hand (eg. [1] [2] and [3] [4])
  3. Notes that are on the same finger (eg. [1] [1])

_(Note: I'm assuming that the 'density/stress of a note' is determined per-note based on the distance from the note that precedes it)_

Using this hierarchy, the density/stress of a note can be weighed by some constant based on the relative column of the preceding note.

eg. [1] followed by [2] would be more dense/stressful than [1] followed by [4] since the former falls under category 2 (notes that share a hand), and the latter falls under category 1 (notes that are split between hands). The density/stress of these notes would be multiplied by the appropriate constant.
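The hierarchy above can be sketched in Python for 4K. The three weight constants are placeholder values chosen only to respect the proposed ordering (split hands < same hand < same finger), columns are 0-indexed here (so [1] in the text is column 0), and the even left/right hand split is an assumption that does not hold for odd key counts.

```python
# Hypothetical weights; only their ordering comes from the proposal above.
WEIGHT_SPLIT_HANDS = 1.0
WEIGHT_SAME_HAND = 1.5
WEIGHT_SAME_FINGER = 2.0


def hand_of(column: int, key_count: int = 4) -> int:
    """0 for the left hand, 1 for the right; assumes an even key count split down the middle."""
    return 0 if column < key_count // 2 else 1


def transition_weight(previous_column: int, column: int, key_count: int = 4) -> float:
    """Weight of a note based on the column of the note that precedes it."""
    if previous_column == column:
        return WEIGHT_SAME_FINGER
    if hand_of(previous_column, key_count) == hand_of(column, key_count):
        return WEIGHT_SAME_HAND
    return WEIGHT_SPLIT_HANDS


def weighted_stress(columns, key_count: int = 4) -> float:
    """Sum of transition weights over consecutive notes (timing is ignored in this sketch)."""
    return sum(transition_weight(a, b, key_count) for a, b in zip(columns, columns[1:]))


print(weighted_stress([0, 3] * 4))  # [1][4] trill: 7.0
print(weighted_stress([0, 1] * 4))  # [1][2] trill: 10.5
print(weighted_stress([0] * 8))     # [1][1] jack:  14.0
```

In a real calculation the weight would multiply a time-based density or strain value per note rather than being summed on its own.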

Reading

Rationale

A random assortment of notes is harder to play than a consistent pattern. Neither regular density nor the proposed column-dependent density address this issue.

To play devil's advocate, osu! standard doesn't account for reading difficulty either (as far as I know).

Suggested Improvements

None really.

Dynamic pattern recognition relies on either a specialized algorithm or devolves into a problem similar to pattern recognition in character strings and is likely infeasible.

Static pattern recognition relies on community feedback and would likely change over time.

_closing note:_
_I'd love to tinker with the difficulty calculation myself, but I would need a primer on the current system. A brief look tells me that Skills.Individual and Skills.Overall are the most pertinent classes to tweak and propose changes to, but there's obviously much more to difficulty calculation than just that._

Not sure if this was discussed above or not (the conversation is really long), but instead of taking away the pause menu (which I am pretty sure will never happen), I would suggest that you cannot set a ranked score if you pause the map. This is the same dynamic that applies in Quaver (another rhythm game): if you are playing a ranked map and you pause in the middle of it, you lose the ability to set your score on the leaderboards. I think that's fair for everyone.

@Gonzalo-Bruna you're posting this in the wrong place. you're looking for https://github.com/ppy/osu/issues/10202 most likely.

we may add a mod to limit pause count which can be applied to specific leaderboards (aka timeshift right now), but removing pause from default gameplay is anti-user in a single player game.

> @Gonzalo-Bruna you're posting this in the wrong place. you're looking for #10202 most likely.

Sorry, I was commenting because I saw this comment above:

> I don't think it makes sense to remove the pause feature from osu! not only because it would be odd to have one gamemode without pausing/one gamemode influence all of them, but what are you doing with the scores that include pausing? There's no way we can tell a replay has been paused as osu! doesn't keep track of that. This means that we cannot differentiate the "legit" plays from the "illegitimate". Even if we could it would be very unsatisfying and punishing for people who are competitive, and I am sure that would turn many people away from the game.

I thought it was somehow related, sorry.
