Deykun
g/programowanie

Leniwy tutaj. Jak chcę mierzyć średnią w czasie to wystarczy mi dwie liczby, suma wszystkich ocen i liczba ocen. Jak pojawia się nowa ocena to dodaje jej wartość do sumy i 1 do liczby i mam nową średnią.

Medianę da się w ogóle mierzyć w taki uproszczony sposób (albo w przybliżeniu jakimś), że mam tylko 3-6 liczb do niej?

#
Deykun

What you are looking for is an "online" algorithm to compute the median in constant space, and I don't think an exact one exists. There are approximate algorithms, and if you know the kind of values you are expecting (for instance if the inputs are a finite set of integers) you could get a good answer by counting occurrences. As for your histogram idea, you could always use a cheap solution (like keeping a short list of values, and using an O(n) median-finding algorithm when required) and then switch to a histogram once there is enough data.
https://math.stackexchange.com/questions/3837060/how-to-compute-median-without-storing-all-the-values#comment7914107_3837060

#
Deykun

If you can't hold all the items in memory at once, this problem becomes much harder. The heap solution requires you to hold all the elements in memory at once. This is not possible in most real world applications of this problem.

Instead, as you see numbers, keep track of the count of the number of times you see each integer. Assuming 4 byte integers, that's 2^32 buckets, or at most 2^33 integers (key and count for each int), which is 2^35 bytes or 32GB. It will likely be much less than this because you don't need to store the key or count for those entries that are 0 (ie. like a defaultdict in python). This takes constant time to insert each new integer.

Then at any point, to find the median, just use the counts to determine which integer is the middle element. This takes constant time (albeit a large constant, but constant nonetheless).
https://stackoverflow.com/a/10692777/6743808

To w sumie jest manageable jak są trzymane w ryzach inty tylko.

W diffle do zwracania mediany długości słowa zamiast średniej która może być zjebana fest jak ktoś dwa razy odgadnie w 35 słowie to jest ok, bo tych intów dla diffle by było z dla przeciętnego gracza 10 żeby obsługiwać wszystkie inputy.

#