As of now (16.02.2021, 17:11 CET), CWA shows a 7-day average of 7,274 confirmed infections and a 7-day incidence of 58.7/100,000. For the 7-day average, an arrow pointing towards the lower right indicates a downward trend, while for the 7-day incidence, an arrow pointing to the right indicates a stable trend. Yesterday, the difference was even higher: a downward trend vs an upward trend.
My understanding is that both numbers are related by a factor like
(7-day incidence) = (7-day average) * 7 * 100,000 / (about 83,000,000)
and therefore, the trend should always be the same. Or is there more to it?
Open the app and swipe through the widgets.


Same trend for both indicators.
Internal Tracking ID: EXPOSUREAPP-5225
@nilsalex
I think the cause for this is the following:
The number of cases, their difference from the previous day, and the number of deaths refer to cases that are transmitted to the RKI each day. This includes cases that were reported to the health office (Gesundheitsamt) on the same day or on earlier days. The cases in the last 7 days and the 7-day incidence are based on the reporting date at the health office, that is, the date on which the local health office became aware of the case and recorded it electronically.
(Source: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html).
This would explain the difference, wouldn't it?
(pinging @MikeMcC399 since he has a great understanding of such things)
@Ein-Tim
I'm definitely not an expert on these statistics, but I can Google!
Start first by tapping the ℹ️ icon in the app for the definitions.
Then access the raw data through
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html and a link in that page to
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Daten.html
According to that Excel file in tab "BL_7-Tage-Inzidenz" the 7-Day Incidence on Feb 16, 2021 of confirmed new infections was 58.7 and 7 days before that on Feb 9, 2021 it was 72.8. So that was a downwards trend of 14.1 or -19% based on the Feb 9 data.
Using the tab "BL_7-Tage-Fallzahlen" I couldn't find values which matched the ones in the app, so I used the tab "Fälle-Todesfälle-gesamt" instead.
The sum of Differenz Vortag Fälle for Feb 10 to Feb 16, 2021 is 50919, divided by 7 is 7274.
The sum for Feb 3 to Feb 9, 2021 is 63839, divided by 7 is 9120.
This is a difference of 1846 or -20% compared to the Feb 9 data.
Based on that I don't understand why the app is showing Trend: Steady for the 7-Day Incidence when, according to the figure I quoted, the trend is 19% down and this is more than the 5% threshold to declare it as Trend: Downwards and mark it with a green arrow.
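The week-over-week comparison above can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the 5% threshold is the one quoted from the app's help text, applied here on the assumption that the arrow compares against the value from 7 days earlier.

```python
def relative_change(current: float, previous: float) -> float:
    """Relative change from `previous` to `current`, as a fraction."""
    return (current - previous) / previous


def classify_trend(current: float, previous: float, threshold: float = 0.05) -> str:
    """Classify a change as 'up', 'down' or 'steady' (threshold is an assumption)."""
    change = relative_change(current, previous)
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "steady"


# 7-day incidence on Feb 16 vs Feb 9, 2021 (tab "BL_7-Tage-Inzidenz")
incidence_change = relative_change(58.7, 72.8)

# 7-day average on Feb 16 vs Feb 9, 2021 (sums of "Differenz Vortag Fälle")
average_change = relative_change(50919 / 7, 63839 / 7)

print(f"incidence: {incidence_change:.1%}, average: {average_change:.1%}")
print(classify_trend(58.7, 72.8), classify_trend(50919 / 7, 63839 / 7))
```

Under that assumption both metrics would be classified as trending down, which is why the "Steady" arrow looks surprising.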
This needs to be looked at.
Thanks to @nilsalex for bringing this up!
Thank you @MikeMcC399 for checking (I can Google too, but I have to admit that you are often better at explaining such things than I am).
I assume this also affects Android, right?
If yes, please move it to the documentation repo.
@Ein-Tim
Yes, this also affects Android, so it should be in the documentation repo.
I think it should be looked at urgently because the 7-Day Incidence value and its trend are the figures that everybody, including politicians, is looking at when deciding about the easing of lockdown.
For ease of reference here are the RKI daily reports for Feb 16, 2021 and for 7 days previously on Feb 9, 2021.
2021-02-09-en.pdf
2021-02-16-en.pdf
These show the figures
Date | 7-Day Incidence per 100,000 population
--- | ---
Feb 9, 2021 | 73
Feb 16, 2021 | 59
which is a clear downwards trend (that I am sure we are all happy to be seeing!)
The value today, Feb 17, 2021, for 7-Day Incidence is 57.0 and the trend is down, which looks good.
Date | 7-Day Incidence per 100,000 population
--- | ---
Feb 10, 2021 | 68
Feb 17, 2021 | 57
The data for yesterday should still be investigated though.
@dsarkar Could you take a look at this and transfer it to the correct repo?
Thanks!
The value today, Feb 18, 2021, for 7-Day Incidence is 57.1 with "Trend: Steady".
Date | 7-Day Incidence per 100,000 population *
--- | ---
Feb 11, 2021 | 64.2
Feb 18, 2021 | 57.1
The incidence has decreased by 7.1 or 11% of 64.2, so why does it show "Trend: Steady" not "Trend: Downwards"?
* Values from Fallzahlen_Kum_Tab.xlsx
It looks like the trend indicator is just comparing to the value from the previous day, whereas the help text says "The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that." So the displayed comparison does not correspond to the method described in the help text. (Or I have misunderstood!)
Date | 7-Day Incidence per 100,000 population
-- | --
04.02.2021 | 80,7
05.02.2021 | 79,9
06.02.2021 | 77,3
07.02.2021 | 75,6
08.02.2021 | 76,0
09.02.2021 | 72,8
10.02.2021 | 68,0
11.02.2021 | 64,2
12.02.2021 | 62,2
13.02.2021 | 60,1
14.02.2021 | 57,4
15.02.2021 | 58,9
16.02.2021 | 58,7
17.02.2021 | 57,0
18.02.2021 | 57,1
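To test the hypothesis that the arrow only compares with the previous day, the table above can be evaluated both ways. The sketch below assumes the thresholds quoted from the help text (1% for the day-over-day comparison, 5% for the week-over-week comparison); the function names are hypothetical and the app's real logic is not known from this thread.

```python
from datetime import date, timedelta

# 7-day incidence per 100,000 population, copied from the table above
incidence = {
    date(2021, 2, 4): 80.7, date(2021, 2, 5): 79.9, date(2021, 2, 6): 77.3,
    date(2021, 2, 7): 75.6, date(2021, 2, 8): 76.0, date(2021, 2, 9): 72.8,
    date(2021, 2, 10): 68.0, date(2021, 2, 11): 64.2, date(2021, 2, 12): 62.2,
    date(2021, 2, 13): 60.1, date(2021, 2, 14): 57.4, date(2021, 2, 15): 58.9,
    date(2021, 2, 16): 58.7, date(2021, 2, 17): 57.0, date(2021, 2, 18): 57.1,
}


def classify(current: float, previous: float, threshold: float) -> str:
    """Return 'up', 'down' or 'steady' for the relative change."""
    change = (current - previous) / previous
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "steady"


def trend(day: date, lag_days: int, threshold: float) -> str:
    """Compare the value on `day` with the value `lag_days` earlier."""
    return classify(incidence[day], incidence[day - timedelta(days=lag_days)], threshold)


for day in (date(2021, 2, 16), date(2021, 2, 17), date(2021, 2, 18)):
    print(day, "vs previous day:", trend(day, 1, 0.01), "| vs 7 days earlier:", trend(day, 7, 0.05))
```

With these assumptions, the day-over-day column gives steady, down, steady for Feb 16 to 18, which is what the app displayed on those days, while the week-over-week column gives down on all three days.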
The full help text from statistics_explanation_trend_text is:
EN
"The arrow direction indicates whether the trend is increasing, decreasing, or remaining steady - that is, demonstrates a deviation of less than 1% compared to the previous day or 5% compared to the previous week. The color indicates this trend as positive (green), negative (red), or neutral (gray). The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that."
DE
"Die Pfeilrichtung zeigt an, ob der Trend nach oben oder nach unten geht oder relativ stabil ist, d.h. eine Abweichung von weniger als 1% im Vortagesvergleich bzw. 5% im Vorwochenvergleich aufweist. Die Farbe bewertet diesen Trend als positiv (grün), negativ (rot) oder neutral (grau). Der Trend vergleicht den Wert vom Vortag mit dem Wert von vor zwei Tagen bzw. für die 7-Tage-Trends den Mittelwert der letzten 7 Tage mit dem der vorausgegangenen 7 Tage."
@MikeMcC399 regarding your last comment:
@dsarkar
> I understand these values are already 7-day averages
> I think I can follow you, you are saying one should compare 17.2./57.0 with 10.2./68.0 which is clearly trending down.

Correct, yes, that is what I am saying. That is how I understand the explanation in the help text. Is that the way you understand the help text as well?
@MikeMcC399 Yes, I think I can follow you. For today and today-7 days I also get -11%, for yesterday and yesterday-7 I get -16%
Even (I think that would be wrong) taking averages of the averaged values, I get averaging 11-17 Feb (59.8) and comparing average 4-10 Feb (75.8) a change of -21.1%.
@dsarkar
> For today and today-7 days I also get -11%, for yesterday and yesterday-7 I get -16%

Agreed!

> Even (I think that would be wrong) taking averages of the averaged values, I get averaging 11-17 Feb (59.8) and comparing average 4-10 Feb (75.8) a change of -21.1%.

From my hazy memory of statistics, averages of averages is not a good thing. I think you should discard those numbers and stick with the first line.
Could you pass the issue on to the originators of the statistics?
I assume that the statistics are calculated by RKI and transferred to the CWA infrastructure. I couldn't find any new documentation in https://github.com/corona-warn-app/cwa-documentation covering the statistics calculations and distribution. It looks to me like there is a binary file pulled from /version/v1/stats on the DOWNLOAD_CDN_URL which suggests that the app just has the job of displaying the data, not calculating it. So if there is an issue with what is displayed then something further upstream needs to be looked at.
@MikeMcC399 indeed, I was told that the app only displays statistical data, it does not calculate it. I created an internal ticket 5225, and additionally, I will bring this up today in a meeting.
All,
due to a number of questions regarding our statistics I re-calculated all values for "Neuinfektionen" (new infections), the respective average values, the Incidence values and double-checked the trends - back until January 25.
Based on the results let me emphasize the following points:
Therefore, we decided to start a new task of communication - it's not yet clear if it becomes a blog, an FAQ entry or any other kind of media. We'll try to "translate" the intention of the statistical metrics shown in the CWA and what are the key drivers for the "trend arrow" indicator.
Believe me, this will not be an easy and fast task, as it challenges us to gain trust by "translating" the statistics into consumable portions of knowledge - _how to read_ the tiles. So, I kindly ask you to stay patient.
Furthermore, I want to encourage you to give feedback, once we provide first results in this matter.
One more word to @MikeMcC399 and @nilsalex: I cannot comment on the full issue here. But I want to let you know (and hope you can adjust your viewpoint and accept): The 7-day-Incidence is not a ~7-day-trend~. Instead, the 7-day-Incidence is a normalized value accurate to the current day only, but based on the sum of new infections _during the last 7 days_. Therefore, this value must not be compared to the Incidence value of "day-7" but simply to the Incidence value of yesterday (that is, in fact, based on the new infections of those last 7 days).
@GisoSchroederSAP
Thank you for the response and information!
It seems that the help text is difficult to interpret correctly concerning what falls under the category of a "7-day trend". Could you help us out so that we understand this better?
For each of the four values which have a trend arrow:
... could you let us know if the arrow (Upwards, Downwards or Steady) is calculated based on comparing to the corresponding number displayed the previous day or the number displayed 7 days previously?
For "7-Day Incidence" you told us in the previous post that the trend depends on the number displayed from the previous day.
We are going to write that down, I promise.
The naming of "7-Day Incidence" may mislead the reader; it is to be read as
"_Today's Incidence_ (based on the sum of nationwide infections of the last 7 days normalized to 100.000 of all German citizens)" - but certainly, this is much longer than the initial name, and maybe even not really easier to understand, sorry.
@GisoSchroederSAP Thanks for looking into this!
> The naming of "7-Day Incidence" may mislead the reader; it is to be read as "Today's Incidence (based on the sum of nationwide infections of the last 7 days normalized to 100.000 of all German citizens)"

I don't think there is any confusion about the definition of the 7-day incidence. And because this metric is defined as above, I really don't get how it can follow a different trend than the 7-day average, which is also -- please correct me if I'm wrong -- based on the sum of nationwide infections of the last 7 days. So I guess my question really is:
Is it not the case that both numbers are the same up to a constant relative factor (of about 7*100,000/83,000,000)? If so, a user cannot expect to see different trends for both numbers, right?
@nilsalex
The number used for the population of Germany by RKI is close to the 83 Million which you assumed. In https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Daten.html => https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx
Tab "Tageswerte berechnet"
Cell A36
it uses the number 83166711 (which is the number displayed on https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Bevoelkerungsstand/Tabellen/zensus-geschlecht-staatsangehoerigkeit-2019.html for the date 31.12.2019).
I would also like to understand the difference in the two trends. I agree that it is not intuitively obvious that they should be different, so I'll be waiting with interest for the details of the calculations. I take on the statement from @GisoSchroederSAP that the calculations are correct, so I expect the reasons for differences will be caused by the calculation methods used.
Quick note to @nilsalex: You are referring to a _constant relative factor_, which I don't understand. Key for the calculation is the weighted occurrence of new infections per region (=federal state) based on its fraction 100.000/citizens.
You may refer to the factor 100000/83166711 = 0,001202404, which you can easily multiply by the total number 47.436 (Feb 17) of new infections across all 16 federal states during the last 7 days, which results in the "7-day-Incidence" (for Feb 17) of ~57,04 - which perfectly matches the number shown in the App, right?
However: This is already "normalized" to ~5,2 Million citizens (per federal state), which would result in a "normalized" factor of? Correct: 0,001202. You now may want to build the average number of infections per federal state (47436/16)= 2964,75, multiply with the constant factor and multiply this value by 16 to bring it back to the nationwide incidence, which results in: again 57,01
What I am just saying - yes, you can flatten everything by average calculation, but in that case you also have to flatten the distribution of citizens across the country and the number of infections across the country in each region. Those numbers are related to 100.000 citizens for a good reason, and this approach also leads to a value of 57 (nationwide) that is not the average of the incidence values for the 16 federal states (which was 62 on Feb 17) - that's why I call it "weighted".
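The arithmetic in this comment can be checked directly. The sketch below uses only the figures quoted above (47,436 cases on Feb 17 and a population of 83,166,711); the variable names are hypothetical.

```python
POPULATION = 83_166_711          # population figure used by RKI, as quoted in this thread
CASES_LAST_7_DAYS = 47_436       # nationwide total for Feb 17, as quoted above
STATES = 16

factor = 100_000 / POPULATION    # roughly 0.001202404

# Direct calculation from the nationwide total
direct = CASES_LAST_7_DAYS * factor

# Detour via the average number of cases per federal state, scaled back by 16
via_state_average = (CASES_LAST_7_DAYS / STATES) * factor * STATES

print(round(direct, 2), round(via_state_average, 2))
```

Both routes evaluate to the same number (about 57.04), because dividing by 16 and multiplying by 16 again cancels.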
@GisoSchroederSAP I may refer back to my initial report, where I was stunned by the different trend for the two metrics, which are:
7-day average: nationwide total of confirmed infections over the last 7 days divided by 7
7-day incidence: nationwide total of confirmed infections over the last 7 days times 100,000/83,166,711
obviously, this yields a constant ratio
(7-day incidence) / (7-day average) = 7*100,000/83,166,711
or, in other words, both metrics are the same up to a constant relative factor of 7*100,000/83,166,711. (The exact total of the German population is not relevant for this argument, so I settled for 83,000,000.)
From the numbers you gave me, you seem to agree with this premise. Is this correct? So I would expect to see the same trend for both metrics. Even more so, I fail to understand how the trend could ever be different.
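The argument can be illustrated with a short sketch. It assumes, as the argument does, that both metrics are computed from the same nationwide 7-day total; the two totals are the sums quoted earlier in this thread, and the function names are hypothetical.

```python
import math

POPULATION = 83_166_711  # population figure quoted in this thread


def seven_day_average(total_cases_7d: float) -> float:
    """Nationwide 7-day total divided by 7."""
    return total_cases_7d / 7


def seven_day_incidence(total_cases_7d: float) -> float:
    """Nationwide 7-day total per 100,000 population."""
    return total_cases_7d * 100_000 / POPULATION


def relative_change(current: float, previous: float) -> float:
    return (current - previous) / previous


previous_total, current_total = 63_839, 50_919  # sums for Feb 3-9 and Feb 10-16

ratio = seven_day_incidence(current_total) / seven_day_average(current_total)
change_avg = relative_change(seven_day_average(current_total), seven_day_average(previous_total))
change_inc = relative_change(seven_day_incidence(current_total), seven_day_incidence(previous_total))

print(ratio, 7 * 100_000 / POPULATION)
print(change_avg, change_inc)
```

If the two metrics really share one input, a constant factor cannot alter the relative change, so any threshold-based arrow would have to agree for both.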
> I take on the statement from @GisoSchroederSAP that the calculations are correct, so I expect the reasons for differences will be caused by the calculation methods used.

Oh, that may very well be true and I am also curious about that, but then I would argue that the method should be different. (In the sense that it should be the same for both metrics ;-) )
Sorry, now, please understand:
The 7-day Incidence IS NOT a linear average, it is a single value WEIGHTED by population.
Your calculation
(7-day incidence) / (7-day average) = 7*100,000/83,166,711
is NOT valid, as the average is NOT WEIGHTED at all.
You are absolutely correct in saying that a bottom-up calculation from the incidences in federal states is in fact a weighted average. I don't dispute this fact.
But then again, we can calculate the same numbers using nationwide totals, as you confirmed earlier.
We can give a proof, just so we are on the same page.
Definitions:
s_k: confirmed infections for federal states over the last seven days
S: nationwide confirmed infections over the last seven days
n_k: population of federal states
N: nationwide population
i_k: 7-day incidence for federal states (n.b. dimensionless, we don't need to say "per 100,000")
I: 7-day incidence nationwide
S = Σ s_k
N = Σ n_k
i_k = s_k / n_k
I = (Σ i_k * n_k) / (Σ n_k) = (Σ i_k * n_k) / N
#######################
Theorem: I is the quotient of S and N
Proof: I = (Σ i_k * n_k) / N = (Σ s_k * n_k / n_k) / N = (Σ s_k) / N = S / N
∎
You see, it is perfectly permissible to state the problem using the nationwide totals. We could state it otherwise, it doesn't really matter. If it's preferred by you, we can talk about weighted averages. The problem at hand still stands: The numbers are essentially the same (assuming a constant population, but I think that's given?) and therefore it does not make sense to show a different trend.
Guys, I will write this calculation down sometimes, I promise.
The above calculation by @nilsalex fails because of the formula for the 7-day incidence

> i_k = s_k / n_k

and because of the incorrect statement:

> (n.b. dimensionless, we don't need to say "per 100,000")

In fact, in order to get the values "normalized" to the same "portion" of the population, you would try:
p_k ... population of the state k
N = Σ p_k
n_k = p_k / 100.000 ... normalized portion as stated "Infections PER 100.000 PERSONS"
Following your approach i_k = s_k / n_k, this leads to the official formula
i_k = 100.000 * s_k / p_k or
i_k = s_k * 100.000/p_k for each state, and you can easily expand this to
I = 100.000 * (Σ s_k) / (Σ p_k) = 100.000 * S/N = S * 100.000/N
Again, please note the 100.000/p_k "normalization". I think this is the missing link.
As far as I can see, your whole calculation is a simple linear arithmetic average calculation. I suggest you check this yourself by calculating the factor 100.000/p_k for each state to see the different weight in the product with s_k.
In the end:
Yes, you can easily create the average population p_average for each state by adding all populations p_k into N and dividing by 16.
Yes, you can easily create the average number of infections for any state s_average by adding all infections into S and dividing by 16.
If you now do the same with the factor 100.000/p_k, you may create the "average factor 100.000/p_k" just by adding the fractions and dividing by 16 - say: f_average
Now compare f_average (0.04113) with the expected value of 100.000/N (0.0012024)
= = =
Please excuse me if I do not immediately comment on each alternative calculation, as it becomes time-consuming to validate other approaches.
We will try to create "consumable" communication about the CWA's statistics. And I am in close contact with the RKI experts and with the SAP Analytics Department for further validation. This GitHub issue now already has the full explanation of the math and why you cannot link the trend of average new infections with the absolute incidence based on absolute infections per region weighted/normalized by the respective population.
Thx.
> If you now do the same with the factor 100.000/p_k, you may create the "average factor 100.000/p_k" just by adding the fractions and dividing by 16 - say: f_average

This is the misunderstanding. Why would I do that? I never referred to the unweighted average of state-wide numbers. In fact, I never touched numbers for individual states until you brought them up :-)
The average I am concerned with is (nationwide confirmed infections for the last 7 days) / 7, because that is the metric shown in CWA.
To cite myself:
> 7-day average: nationwide total of confirmed infections over the last 7 days divided by 7
> 7-day incidence: nationwide total of confirmed infections over the last 7 days times 100,000/83,166,711
>
> obviously, this yields a constant ratio
> (7-day incidence) / (7-day average) = 7*100,000/83,166,711
> or, in other words, both metrics are the same up to a constant relative factor of 7*100,000/83,166,711. (The exact total of the German population is not relevant for this argument, so I settled for 83,000,000.)
>
> From the numbers you gave me, you seem to agree with this premise. Is this correct? So I would expect to see the same trend for both metrics. Even more so, I fail to understand how the trend could ever be different.

I would kindly ask you not to dismiss this report prematurely. The problem has still not been addressed.
(Also, the incidence really is dimensionless. We don't need to introduce an artificial reference population number. I could of course do that, but all arguments are unaffected.)
Sorry, if you don't accept the incidence is a "normalized" number with the local factor depending on population, we probably will never find together. We agree to disagree.
May I ask why the RKI would provide different local incidence numbers, and how to consolidate those local incidences into a single nationwide number? Do you expect that, with your calculation, the values of Bremen (680.000 citizens) have the same weight in the nationwide incidence calculation as those of Bavaria with 13.12 million citizens (factor ~20)?
If so, well, then we are talking about different models, and your incidence is just a simple average calculation. Yes, in that case it should always follow the average trend of the new infections - but sorry, you will not get the same (incidence) numbers that are
The RKI model is different from yours, and therefore, the average model does not count for the incidence, and therefore, the trend of new infections is not related to the development of the incidence on a daily level.
To make that crystal clear:
According to the model/approach, the data are correct, and the description clearly refers to the "normalization factor" per 100.000 citizens.
@nilsalex , you may not agree to the model - but you cannot call the numbers or the trend indicator wrong - those numbers are valid.
This is a gross misrepresentation of my statements. I never said anything of the above. Any careful reader following along will understand this.
I still don't understand how you got the idea that I want to average incidences of federal states without any weight? This would be wrong and I do not propose this. It does not follow at all from my presentation of the math. You calculate the weighted average in a way I 100% agree with---and because of this, it is just another representation of the nationwide 7-day average. As I have shown using basic math in the hope that some nomenclature would clear things up.
Now, there may be a reason why trends are being calculated differently, but this is not at all obvious and probably a bad choice if it results in this discrepancy. I am very curious about this, but until this is resolved, the issue stands and will not go away by sparring over unrelated issues that aren't even disputed.
Can we please agree to lower the temperature? Again, I did not say any of the things you accused me of.
Maybe the problem is indeed the "bottom-up" approach, which can yield slightly different numbers than the straightforward---but mathematically equivalent---approach of just taking the nationwide totals. Two factors may play a role:
Fallzahlen_Kum_Tab.xslx)
Curiously, the first tab defines the nationwide incidence as =B22/A36*100000. As I have said and shown repeatedly, this approach is just as valid as the bottom-up approach, but less error-prone. Adopting this, trends shouldn't show this weird glitch.
Then please, show us where in your calculation you bring in the "infections per 100.000" to your calculation.
Maybe, I missed this part.
All I am saying: The trend of the rolling (linear) average of the absolute number of infections (without any relation to regions) across the nation is not directly related to the non-linear but weighted/normalized value of the incidence, which is a number related to 100.000 citizens as documented (and there are good reasons for the authorities to normalize to 100.000). As far as I understand, there is no weird glitch, sorry.
Okay, let's go with numbers and compare (data from Feb 17):
212 new infections during the last 7 days for Tirschenreuth - Incidence: 294 (given by RKI, evaluated with the above formula)
212 new infection during the last 7 days for Osterzgebirge - Incidence : 86
What would be the "combined incidence" (as we cannot talk about _nationwide_ )?
Average=190? Or the weighted incidence per 100.000 = 133?
Background: Tirschenreuth has 72.406 citizens, Osterzgebirge has 245.586.
BTW: All my calculations and samples above are based on exactly the same file Fallzahlen_Kum_Tab.xslx you are referring to. With those, I can exactly reproduce the numbers shown in the App.
And finally, I disagree: Your model is not mathematically equivalent, as it does not include the localization/weighting factor "100.000/p_k". I tried to explain this already by adjusting your model with this factor, which leads exactly to the calculation used by the RKI. And this factor does not go linear for the nationwide incidence trend (as it is related to local population), while the linear average trend of the absolute number of new infections has no relation to the population.
Can you please at least agree:
Average number of new infections during last 7 days - not related to population
Incidence number based on total number of new infections during last 7 days - is related to population
Have a good night.
Everything you say about the bottom-up calculation is correct. There is no disagreement on the matter, only on a mathematical presentation by me which is 100% correct and also not in disagreement to your "correction" (can't be a correction if there wasn't anything wrong to begin with ;-) ) In fact, you confirmed my theorem (I feel almost silly for saying it like this, but it is best to clear up mathematical problems with mathematical language) with exact numbers in https://github.com/corona-warn-app/cwa-documentation/issues/528#issuecomment-782024045 .
To re-iterate:
cases / population
sum(population_k * cases_k / population_k) / sum(population_k)
This is where the local population cancels, as is best seen using the concise mathematical notation from above. So, of course it is related to population, but only to the global population. It cancels out. I think that this might be the cause of the glitch, because numerical values tend to not cancel out exactly. But there could be other causes as well.
If you wish to use other units, we can multiply everything by 100,000, which is not material to the argument. I am sorry for having brought that up; I should have just stuck to the notion which uses this arbitrary unit. But again, not material.
I mean, you name-dropped the RKI above, which in the first tab of the Excel file uses exactly what I'm saying all along is an equivalent formula: =B22/A36*100000.
I think (and will do the evaluation later), the issue comes from the two points:
i_k into a sum that later gets divided by the N - which is, correct me if I'm wrong, doing a _simple average by the population_
The approach
I = (Σ i_k * n_k) / N
simply does not work, because the value i_k cannot be put into the sum and be _averaged_ later on. Therefore, the following "cancel out operation" does not work:
(Σ s_k * n_k / n_k) / N = (Σ s_k) / N
I played around with a few simple numbers to visualize:
i_k (cell D9)
i_k / N (cell F9)
is equal to the correct one (shown in cells G8 and G9 - both calculated without intermediate usage of any i_k value).
Translation of my understanding:
The metric _incidence_ is an absolute number (not a _trend_!), based on the absolute number of new infections of the last 7 days and is normalized/weighted to the population of the considered region. The development/trend of this metric cannot be directly derived from former values of the _incidence_. As the metric gets normalized to the _regional_ population, the trend of the incidence does not necessarily follow the trend of the linear average of the _nationwide_ number of new infections during the last 7 days (as this has no relation to the population at all).
Additionally: Even if the current number of new infections rises, the rolling average of the last 7 days can decrease. (This does not relate to our topic here directly, but I wanted to make this clear. It just means: It sometimes sounds silly, but it is still true.)
I am not sure, if I can convince you with the sample calculation above, but I want to repeat again:
The numbers are correct, the trends are correct, and there is no correlation between
I am very curious as to what your definition of national incidence from regional incidences is. That is to say, the function
I(i_k, n_k)
where
i_k: regional incidences
n_k: regional population
Because you seem to disagree with the basic mathematical notion of the weighted average
I(i_k, n_k) = (Σ i_k * n_k) / (Σ n_k)
which is of course the only sensible definition. How else would you aggregate intensive properties?
So, please, what is the "correct" formula? What did you type into Excel?
Because: I agree. The numbers are correct. Up to rounding errors or other glitches like inconsistencies with state-wide population totals. This is easily cured by just using the totals to begin with. It is that easy.
The issue is rather minuscule because it should only occur for corner cases. However, this case did come up last week repeatedly as R_t approached 1. CWA as a source of information is a great idea, so we should do everything we can to present the information consistently.
> So, please, what is the "correct" formula? What did you type into Excel?

Already done multiple times: Take either
I = S/N or I = s_avg/n_avg
but do not insert i_k as it does not go linear with s_k (you did check the Tirschenreuth/Osterzgebirge example here, didn't you?)
The information provided by the CWA is correct and consistent (and we don't at all talk about rounding errors here, please).
I just kindly ask you not to compare the trend of one number (without any relation to population) with another number (that is related to the population by definition).
Thank you.
@nilsalex / @GisoSchroederSAP
Have you considered the different dates used for the two different values? I think this means that the data sets used may be slightly different, depending in one case on when the data was received by RKI and in the other case on when the data was received by the Gesundheitsamt.
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html says
"The number of cases, their difference from the previous day, and the number of deaths refer to cases that are transmitted to the RKI each day. This includes cases that were reported to the health office (Gesundheitsamt) on the same day or on earlier days. The cases in the last 7 days and the 7-day incidence are based on the reporting date at the health office, that is, the date on which the local health office became aware of the case and recorded it electronically."
> I = S/N

Is exactly the point.

> (you did check the Tirschenreuth/Osterzgebirge example here, didn't you?)

Let me walk you through.
n_1 = 72,406
n_2 = 245,586
s_1 = 212
s_2 = 212
i_1 = 294
i_2 = 86
I = (Σ i_k * n_k) / (Σ n_k) = (294 * 72,406 + 86 * 245,586) / (72,406 + 245,586)
= 133.36
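The same walk-through as a small sketch, comparing the population-weighted average of the regional incidences with the direct calculation from the combined totals. Case counts and populations are the ones quoted in this thread; the variable names are hypothetical, and the values 294 and 86 are used exactly as quoted rather than recomputed.

```python
import math

# (cases in the last 7 days, population), as quoted in this thread
regions = {
    "Tirschenreuth": (212, 72_406),
    "Osterzgebirge": (212, 245_586),
}


def incidence(cases: float, population: float) -> float:
    """Cases per 100,000 population."""
    return cases * 100_000 / population


total_cases = sum(cases for cases, _ in regions.values())
total_population = sum(pop for _, pop in regions.values())

# Top-down: combined totals
top_down = incidence(total_cases, total_population)

# Bottom-up: population-weighted average of the unrounded regional incidences
bottom_up = sum(incidence(c, p) * p for c, p in regions.values()) / total_population

# Bottom-up with the rounded incidence values quoted in the thread (294 and 86)
quoted = (294 * 72_406 + 86 * 245_586) / total_population

print(round(top_down, 2), round(bottom_up, 2), round(quoted, 2))
```

The unrounded bottom-up value coincides with the top-down value (about 133.34). Feeding in the rounded regional incidences shifts the result slightly (about 133.36), which shows how small deviations can enter when going bottom-up.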
What are you even trying to argue?
> but do not insert i_k as it does not go linear with s_k

Is a nonsensical statement.
The numbers may be correct in some sense, but they are inconsistent. There are explanations, and @MikeMcC399 proposed another one. But please stop trying to gaslight me with phony mathematical arguments.
We should strive for consistency and use the consistent solution. Which you agree with:
> I = S/N

Please forgive me, but I am not willing to accept your accusations anymore. Stop it, please.
Final comment here:
I did a sample of three days for a few selected federal states and the nationwide summary. This sample should support my statements:
`i_k` and `s_k`; neither for a given state nor for a given period.
`i_k` of different states (which is, I hope, obvious)
`i_k` values with the nationwide value for `I` - weighting normalization does not allow this directly; it's possible just by summing up all the calculations, which leads to `I = S / N` by eliminating `i_k` completely.
The trends of `i_k` and `s_k` are different (which is not by rounding errors) and they are not "bound" to each other, they develop independently. So, you don't want to make a statement about the trend for `i_k` out of any 7-day-average number that is not related to the population at all.
In order to keep my promise, I will stop here explaining again and again, what can be reviewed and validated by everyone.
All formulas are given already, but my goal is to make these numbers "consumable" for the users. It does not seem fruitful to discuss the bottom-up/top-down or any other approach on a statistics level anymore.
If I cannot convince you, @nilsalex, then we are at a dead end here, sorry, as I seem to be unable to dispel your concerns.
You may address your statement of _mathematical inconsistency of the data_ directly to the RKI and to the T-Systems data analysts. I'm happy to help you with finding the right contacts, if you wish.
Hopefully, I gave insight and was able to earn trust by the other users following this issue.
Thank you.
The trends of i_k and s_k are different (which is not by rounding errors) and they are not "bound" to each other, they develop independently.
No, this is impossible. Your tabulation must contain some errors. You first confirm the constant ratio, but then calculate a different trend. Going bottom-up, this accumulates error. The solution: top-down.
I mean, bottom-up works. But it is a detour where you can make mistakes. We could fix them or go the easy way.
I'/I = (sum(n_k i'_k) / sum(n_k)) * (sum(n_k) / sum(n_k i_k)) = sum(s'_k) / sum(s_k) = S'/S
If you cannot accept this, I am not to blame.
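The step the one-line identity skips can be written out as follows. This is a sketch under two assumptions that are not stated explicitly in the thread: the regional incidence is defined as i_k = s_k / n_k (any constant factor such as 100,000 cancels in the ratio), and the populations n_k are the same on both dates.

```latex
\begin{aligned}
I &= \frac{\sum_k n_k\, i_k}{\sum_k n_k}
   = \frac{\sum_k n_k \cdot \frac{s_k}{n_k}}{\sum_k n_k}
   = \frac{\sum_k s_k}{\sum_k n_k}
   = \frac{S}{N}, \\[4pt]
\frac{I'}{I} &= \frac{\sum_k n_k\, i'_k}{\sum_k n_k} \cdot \frac{\sum_k n_k}{\sum_k n_k\, i_k}
   = \frac{\sum_k s'_k}{\sum_k s_k}
   = \frac{S'}{S}.
\end{aligned}
```

The identity only holds when I and S are built from the same case counts s_k; it says nothing about two indicators that are fed from different counts.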
I haven't dug in to the details quite so deeply, but I'm convinced that the dates are the cause of the issue.
Today, Saturday, Feb 20, 2021 as shown by the app:
The 7-Day Average is the sum of confirmed new infections today and the previous six days, which is 50 436, divided by 7 days
= 7 205.
The 7-Day Incidence is the sum of infections based on the date the infection was reported to the Gesundheitsamt. The number labelled "Fälle in den letzten 7 Tagen" is reported to be 48 042. (Note this is a different number to 50 436 above.) This number normalized against the nominal population of the country (100 000 / 83 166 711) gives a 7-Day Incidence of 57.8
If the values of "7-Day Average" and "7-Day Incidence" are based on different data sets due to the underlying calendar dates, then the trends of the two values may also differ.
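The two figures above can be reproduced with a few lines of Python. This is only an illustration of the arithmetic in this comment; the function names are hypothetical, and the population value is the nominal one quoted above.

```python
GERMANY_POPULATION = 83_166_711  # nominal population quoted in the comment above


def seven_day_average(cases_last_7_days):
    """Mean number of new cases per day over a 7-day window."""
    return cases_last_7_days / 7


def seven_day_incidence(cases_last_7_days, population=GERMANY_POPULATION):
    """Cases in a 7-day window per 100,000 inhabitants."""
    return cases_last_7_days * 100_000 / population


# 50,436 cases counted by the date they were transmitted to the RKI.
avg = seven_day_average(50_436)
# 48,042 cases counted by the reporting date at the Gesundheitsamt.
inc = seven_day_incidence(48_042)
# Incidence that the 50,436 figure would imply if both indicators shared one data set.
implied_inc = seven_day_incidence(50_436)

print(round(avg), round(inc, 1), round(implied_inc, 1))
```

The 7-Day Average comes out at 7,205 and the 7-Day Incidence at 57.8, matching the app. Feeding the 50,436 figure into the incidence formula would give about 60.6 instead, which makes the effect of the two different case counts visible.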
This is true, Mike, different data sets are another reason (and should be communicated clearly) why numbers differ. (However, I only worked with the one file _Fallzahlen_Kum_Tab.xlsx_ to validate the incidence numbers, plus the population numbers from https://de.statista.com/.)
Still, I want to emphasize: The 7-Day-Average of new infections is an absolute number (counted nationwide) without any relationship to the distribution of the regional population. It's just a rolling average number across the nation.
The Incidence is "bound" to the weighted number of regional new infections (based on population), it is not a rolling average number across the nation.
Those two numbers are not at all an "equivalent", imho.
Hi Giso @GisoSchroederSAP
I ran a correlation test using the Excel CORREL() function comparing sets of 14 days' data for 7-Day Average and 7-Day Incidence going back to Jan 1, 2021 and the correlation varies between 95% and 99% (it is never 100%), so it can be very tempting to assume that the data sets are equivalent, because they are so close. As we've seen though, they are not the same!
Regarding your point about the 7-Day Incidence being weighted:
I'm not seeing this in the Excel Fallzahlen_Kum_Tab. If you take the value "Gesamt" in Line 20 of "BL_7-Tage-Fallzahlen" and divide it by the population factor 831.66711 you get exactly the "Gesamt" value in Line 20 of "BL_7-Tage-Inzidenz". For example, from yesterday, the value in cell KE20 of "BL_7-Tage-Fallzahlen" is 47266, divided by 831.66711 makes 56.8, which is exactly the value in cell KF20 of "BL_7-Tage-Inzidenz". That is true for every day except Jan 27, 2021 going all the way back to May 6, 2020. I assume that one day is just a glitch.
Whether it's a weighted result or not doesn't really matter though if the underlying datasets are different.
My conclusion anyway is that although the 7-Day Average and the 7-Day Incidence are closely related, their trends may not be the same on any given day, so I agree with you that the display is correct based on the data source Fallzahlen_Kum_Tab as comparison.
It may be worth reviewing the help text though because it can easily be misunderstood that the trend of the 7-Day Incidence is based on a comparison with 7 days previously and not with the value two days ago.
@MikeMcC399 Yes, I agree. Such an effect can be an explanation for the discrepancy. However, it should not be the reason. Because, the expectation is clear:
I = S/N
I'/I = S'/S
This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.
We cannot dispute proven mathematical facts.
Now, if there are different data sources for both values, we should settle on one of them, unless there is a good reason not to. But which reason would that be?
Edit: Sorry, I did not see your latest comment. So it is the explanation. Thanks for digging into this! So I would suggest consolidating the metrics. The current state breaks the expectation of any reasonable user.
Thanks for making this double-check, @MikeMcC399 .
And yes, the "wording" of the help text was the very first I stated internally to the product owner. This is already under review.
I cannot state this enough:
The Incidence is "bound" to the weighted number of regional new infections (based on population), it is not a rolling average number across the nation.
Is just false. Assuming both metrics refer to the same set, of course---that is, both or none are correct w.r.t. symptom onset.
(To be perfectly clear: Yes, it is the weighted average of local incidences. But incidentally (pun intended), this translates into the nationwide incidence which is the ratio of nationwide totals. By multiplication with national population, you have the nationwide infections over the last 7 days.)
The hostility towards me because you disagree with this basic fact has no place here. I am truly disappointed that people are treated this way in this community.
Now, you say "won't fix" because you have a good reason for using different numbers (one corrected, one not corrected, whatever). That is kind of acceptable, although not optimal. But your entire argument and personal attacks did not revolve around this.
@nilsalex ,
This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.
Then just explain the difference between all these numbers I'/I and S'/S for any given day - those _are_ calculated directly from only _one_ source (in fact, the source numbers are all in the one table above, not from different sources) - and yes, these numbers are quite close together.

If you excuse me, I'm going to stop the discussion here.
We have a different view on this, I can live with that and will return to my task.
@GisoSchroederSAP
When the facts have been checked with the product owner, we should also consider updating the FAQ https://www.coronawarn.app/en/faq/#further_details including the point about how the data movements of 7-Day Average and 7-Day Incidence are only loosely coupled with an explanation of why this is so.
Probably this has not been obvious before because the RKI daily situation reports do not show a trend for these two indicators. The press tends to use the 7-Day Incidence alone. This may be the first time that the two values have been displayed together closely and with trends. The display is likely to cause confusion to other people even though it is technically correct.
Thanks, @MikeMcC399 , I can already state that also the FAQ is under review. We definitely will enhance this communication - over time.
I'm curiously reading this, and I really don't understand anything about these numbers, etc, so I won't make any statement here.
But I want to ask:
What should we do now? IIUC, @nilsalex does not consider this solved, but @GisoSchroederSAP does.
Maybe the best way is what has been proposed above by @GisoSchroederSAP:
You may address your statement of mathematical inconsistency of the data directly to the RKI and to the T-Systems data analysts. I'm happy to help you with finding the right contacts, if you wish
Would that be a good solution for all parties involved here?
I never accused you of hostility or insults. I never used the idioms mentioned above.
I only explained what I think is right and what I think is wrong with your argumentation. Please excuse me if this felt threatening to you - this was definitely not my intention.
Again: I offer support, getting you contacts at the source of the data and calculations. You may discuss and resolve this there.
Good evening.
@nilsalex ,
This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.
Then just explain the difference between all these numbers I'/I and S'/S for any given day - those _are_ calculated directly from only _one_ source (in fact, the source numbers are all in the one table above, not from different sources) - and yes, these numbers are quite close together.
If you excuse me, I'm going to stop the discussion here.
We have a different view on this, I can live with that and will return to my task.
Well, in fact:

@Ein-Tim
If all calculations are correct and the discrepancy is just due to different underlying numbers, there are two options:
1) Decide that this is for a good reason (I'd be curious as to what this reason would be) and communicate this clearly within the app.
2) Fix this. Use the source that is better by some metric.
If there are errors in calculations (I mean, well, the excel screenshot above clearly contains rounding errors, as pointed out in my previous comment, but I trust that this is unrelated to the actual production calculation), fix them.
So, discussing 1) or 2) may warrant getting RKI or similar involved for your discussions. For me, I don't see the need to discuss anything, as 1) or 2) really is your decision.
That the expectation any reasonable user has, which is
I = S/N
I'/I = S'/S
for comparable datasets is right is a fact, for which I don't see the need for further clarification.
Again, you may break with this expectation for a good reason (that is, consider this as solved). But I would be very curious about this reason.
Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way.
I'm just a user/community member and want that everybody here is happy at the end.
The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.
So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.
Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way.
I'm just a user/community member and want that everybody here is happy at the end.
Oops, sorry :-)
The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.
So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.
Well, I don't need to do that. It may be necessary for the decision the developers have to make.
Oh, one more thing: Does anyone have the population data for federal states used by the RKI and by the App? I would very much like to know them. Is it verified that those numbers match? This may in fact be a proposal I would bring towards the RKI: Include the population data in the daily numbers or at least document the data at a prominent place.
Also, is the code where the calculations are performed publicly available? I am not able to find it.
@nilsalex
Oops, sorry :-)
No need to apologize, I should have made this clearer.
Well, I don't need to do that. It may be necessary for the decision the developers have to make.
Okay, since @GisoSchroederSAP is one of the Developers (at least he is inside of the Development Team of CWA) the decision seems to have been made already...
Sorry, I have not been a developer for decades. I am just working _for_ the Community and _with_ the Community, trying to answer questions, to follow up on issues, provide additional input, and to translate proposals into development requests.
As mentioned earlier, I already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the way of calculation here. As we disagree here, @nilsalex, I invite you once more to convince the experts at the source of the data.
So far, I don't see a calculation issue/bug here. However, multiple times I agreed:
So, if you want to question the trend indicators, feel free to ping me and I will try to connect you to the experts.
Cheers, Giso
@GisoSchroederSAP
Sorry, I have not been a developer for decades. I am just working for the Community and with the Community, trying to answer questions, to follow up on issues, provide additional input, and to translate proposals into development requests.
Thank you so much for this information, I did not know this.
Everybody, have a good night.
As mentioned earlier, I already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the way of calculation here.
Oh, that is an important clarification. Of course, the mobile app does not perform any calculations.
What I understand is:
The distribution service seems to parse a JSON. The properties relevant for this discussion are
@JsonProperty("infections_effective_7days_avg")
private Double infectionsReported7daysAvg;
@JsonProperty("infections_effective_7days_avg_growthrate")
private Double infectionsReported7daysGrowthrate;
@JsonProperty("infections_effective_7days_avg_trend_5percent")
private Integer infectionsReported7daysTrend5percent;
@JsonProperty("seven_day_incidence_1st_reported_daily")
private Double sevenDayIncidence;
@JsonProperty("seven_day_incidence_1st_reported_growthrate")
private Double sevenDayIncidenceGrowthrate;
@JsonProperty("seven_day_incidence_1st_reported_trend_1percent")
private Integer sevenDayIncidenceTrend1percent;
Now, I was under the assumption that the backend performs some calculations to provide these values---because @GisoSchroederSAP talked at great length about the bottom-up calculation, etc.
My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?
I would be grateful to anyone who can answer this.
I already mentioned in an earlier statement here, with a summary similar to the last one above, that I could reproduce all the numbers and trends from the publicly available data sources that we discussed here earlier.
But to detach the discussion from my personal view, I just transferred your request to the product owner and to one of the T-Systems data analysts, @nilsalex. Let's see, what we get out of there. Maybe, they forward this to the RKI directly. As soon as I get a response, I'll share it here.
All, enjoy the weekend.
Checking the values and the trends today, they are consistent with what we already found out.

Using the historical data from https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx the 7-Day Average of 7,420 can be confirmed. The value of the 7-Day Average 7 days before that on reporting day Feb 15, 2021 was 7,206 (50,442 / 7) - that is adding the values from Feb 9 to Feb 15, 2021 "Differenz Vortag Fälle" in "Fälle-Todesfälle-gesamt". So the 7-Day Average has gone up by 214 cases, or 3.0% of 7,206. The trend of 3% is less than the 5% hurdle, so it is categorized as a Steady trend.
From the same Excel file the value of the 7-Day Incidence 60.2 from yesterday Feb 21, 2021 can be extracted. Today's value of 61.0 is an increase of 0.7 or 1.2% of yesterday's value of 60.2. The trend hurdle for comparisons with the previous day is 1%, so this trend of 1.2% is classed as Upwards.
So the data and the display in the app agree with the base data from the Excel sheet published by RKI.
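The two trend checks described above can be sketched in Python. The 5% and 1% hurdles are the ones named in this comment; the function name `classify_trend`, the labels, and the choice of a strict comparison at the hurdle are assumptions for illustration, not the actual CWA-Analytics logic.

```python
def classify_trend(current, reference, threshold):
    """Return a trend label and the relative change against the reference value.

    Assumption: a change of exactly the threshold still counts as steady.
    """
    change = (current - reference) / reference
    if change > threshold:
        return "up", change
    if change < -threshold:
        return "down", change
    return "steady", change


# 7-Day Average: today's value vs. the value 7 days earlier, 5% hurdle.
avg_trend, avg_change = classify_trend(7_420, 7_206, 0.05)

# 7-Day Incidence: today's value vs. yesterday's value, 1% hurdle.
inc_trend, inc_change = classify_trend(61.0, 60.2, 0.01)

print(avg_trend, round(avg_change * 100, 1))
print(inc_trend, round(inc_change * 100, 1))
```

With the rounded inputs the incidence change comes out at roughly 1.3% rather than the 1.2% quoted above, presumably because the quoted figure was derived from unrounded values. Either way it is above the 1% hurdle, while the 3.0% change in the average stays below 5%, so the two indicators land in different categories on the same day.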
Edit: Sorry about the decimal point and thousands separator in the screenshot. I had the locale on the device set to English (Germany) which produces strange results. I updated the text above to use comma as thousands separator and dot as decimal point, which is the usual way for English texts.
@nilsalex
My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?
I asked and received an answer in https://github.com/corona-warn-app/cwa-server/issues/1223#issuecomment-785111671
"the 'cwa-server' doesn't collect any nor calculates any statistics, but it reads in a json file coming from CWA-Analytics framework and transforms it into protobuf structure, which is then consumed by the mobile clients, when you open your app.
Unfortunately I don't have all the details where the CWA-Analytics framework gets its information from. But for sure its using the RKI as one of the data-sources."
To summarize the findings:
There is a more detailed write-up in https://github.com/corona-warn-app/cwa-website/issues/904 which is open for review.
I hope that the information text regarding Trend will be acknowledged as a documentation bug and addressed through Internal Tracking ID: EXPOSUREAPP-5225. This is the "Key Figures, Explanation of Statistics" text which is shown by tapping on the ℹ️ icon in any of the statistics tiles in the app. More specifically the string statistics_explanation_trend_text:
"Trend"
"The arrow direction indicates whether the trend is increasing, decreasing, or remaining steady – that is, demonstrates a deviation of less than 1% compared to the previous day or 5% compared to the previous week. The color indicates this trend as positive (green), negative (red), or neutral (gray). The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that."
@nilsalex
Could we close this issue now?
The trend for Confirmed New Infections is calculated based on a comparison to the value of the 7-Day Average one week previously whereas the trend for the 7-Day Incidence is calculated using the value one day previously. So that difference on its own is enough reason that the trends will not necessarily be the same on any one day.
In your original post, you wrote under Expected Behaviour "Same trend for both indicators.". Through the research we did, we now know that it is not expected that trend will be the same, for all the reasons I gave in https://github.com/corona-warn-app/cwa-documentation/issues/528#issuecomment-786543498.
I made a suggestion in the open issue #550 about changing the help text to explain better. Also there is a note in https://github.com/corona-warn-app/cwa-documentation/issues/535#issuecomment-799158881 that the FAQs will be updated.
@nilsalex
Could we close this issue now?
Sure. It is certainly not a bug because the behaviour is intended, as you explained.
Let me, however, just note: As laid out in great detail, I do not expect this behaviour as a user, and it's weird to tell the user what to expect :-) The question should really be: How does the user benefit from seeing different numbers and trends?
But this is more an issue for the RKI as data source and the stakeholders as the ones who decide what information to present in the widgets. People have pointed out this inconsistency elsewhere (CWA is of course not the only medium where the data is published) but apparently it has been decided not to act on this.
Hi @nilsalex , you are free to call it "inconsistency" - this is your opinion, I still don't agree here.
Instead, I call it "different" metrics (but indirectly related), where on a given date the trend indicators _can_ differ.
Just saying.
I wanted to make this clear to avoid the impression that we agree with your point of view. I hope you understand and accept our standpoint as well.
@nilsalex
Could we close this issue now?
Sure. It is certainly not a bug because the behaviour is intended, as you explained.
Thank you very much for raising this issue. I learned a lot trying to understand it myself!
You should see a button at the bottom so you can close it yourself. I'm not a moderator, just a Contributor so I can't close it for you.