Mean Price
^^ Plotting switched to Line.
This method of downsampling financial time series (aka bars) is literally, naturally, and thankfully the best you can do in terms of maximizing info gain. You can finally chill, feed it to your studies & eyes, and probably use nothing else anymore.
(HL2 and occ3 also have use cases, but other aggregation methods? Not really; even when they do, the use cases are very specific.) Tho in order to understand why, you gotta read the following wall, or just believe me telling you, 'I put it on my momma'.
The true story about trading volumes and why this is all a big misdirection
Actually, you don’t need to be a quant to get there. All you gotta do is stop blindly following other people’s contextual (at best) solutions, eg OC2 aggregation xD, and start using your own brain to figure things out.
Every individual trade (basically an imprint on 1D price space that emerges when market orders hit the order book) has several features like: price, time, volume, AND direction (Up if a market buy order hits the asks, Down if a market sell order hits the bids). Now, the last two features—volume and direction—can be effectively combined into one (by multiplying volume by 1 or -1), and this is probably how every order matching engine should output data. If we’re not considering size/direction, we’re leaving data behind. Moreover, trades aren’t just one-price dots all the time. One trade can consume liquidity on several levels of the order book, so a single trade can be several ticks big on the price axis.
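To make that concrete, here's a minimal sketch in Python (the record and its field names are mine for illustration, not any exchange's actual API) of a trade with volume and direction folded into one signed feature:

```python
from dataclasses import dataclass

@dataclass
class Trade:
    price: float   # where the imprint landed on the price axis
    time: float    # when the market order hit the book
    volume: float  # unsigned size, as feeds usually report it
    is_buy: bool   # True if a market buy lifted the asks

    @property
    def signed_volume(self) -> float:
        # volume and direction combined into one feature:
        # positive for aggressive buys, negative for aggressive sells
        return self.volume if self.is_buy else -self.volume
```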
You may think now that there are no zero-volume ticks. Well, yes and no. It depends on how you design an exchange and whether you allow intra-spread/mid-spread trades (now try to Google it). Intra-spread trades can happen, if implemented, when a matching engine receives both buy and sell orders within the same tiny time window (say, the same microsecond). This way, you can match the orders with each other at a better price for both parties without even hitting the book and consuming liquidity. And if the orders have different sizes, the remaining part of the bigger order can be sent to the order book. Basically, this type of trade can be treated like an OTC trade, having zero volume because we never actually hit the book: there's no imprint.

Another reason this framing makes sense is when we think about volume as an act of impact or imbalance, and about how the medium (the order book in our case) responds to it, providing information. OTC and mid-spread trades are not aggressive sells or buys; they're neutral ticks, so to say. However huge they are (sometimes many blocks on NYSE), they don't move the price because there's no impact on the medium (again, the order book): they're not providing information.
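A toy sketch of that idea, assuming a venue that implements mid-spread matching (this is my own illustration of the mechanism described above, not any real engine's logic):

```python
def mid_spread_match(buy_size: float, sell_size: float,
                     best_bid: float, best_ask: float):
    """Toy illustration only: cross a simultaneous market buy and market
    sell at the mid price, so neither side hits the book."""
    mid = (best_bid + best_ask) / 2.0     # better than the ask for the buyer,
                                          # better than the bid for the seller
    fill = min(buy_size, sell_size)       # matched off-book: a zero-impact,
                                          # "neutral" tick on the tape
    leftover = abs(buy_size - sell_size)  # remainder of the bigger order
    side = "buy" if buy_size > sell_size else "sell"
    return mid, fill, side if leftover else None, leftover
```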
... Now, we need to aggregate these trades into, let's say, 1-hour bars (remember that a trade can have either positive or negative volume). In practice, we either don't want to do that ourselves or we simply don't have the tick-level data. What we can do is take already-aggregated OHLC bars and extract all the info from them. Given that the market is fractal, bars & trades gotta have the same set of features:
- Highest & lowest ticks (high & low) <- by price;
- First & last ticks (open & close) <- by time;
- Biggest and smallest ticks <- by volume.*
*e.g., among a bar's signed trade volumes (see the sketch below),
2323: biggest trade,
-1212: smallest trade.
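Here's how those six features would fall out of a bar's trades, building on the Trade sketch above (the helper itself is hypothetical, just to pin down the idea):

```python
def bar_features(trades: list) -> dict:
    """Extract all six features from a bar's trades, given in time
    order, each carrying a signed_volume (see the Trade sketch above)."""
    prices = [t.price for t in trades]
    vols = [t.signed_volume for t in trades]
    return {
        "open":  trades[0].price,    # first tick   <- by time
        "close": trades[-1].price,   # last tick    <- by time
        "high":  max(prices),        # highest tick <- by price
        "low":   min(prices),        # lowest tick  <- by price
        "biggest":  max(vols),       # e.g.  2323   <- by volume
        "smallest": min(vols),       # e.g. -1212   <- by volume
    }
```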
Now, somehow nobody in our world ever started to care about the biggest and smallest trades and their inclusion in OHLC data, while it's actually natural: it's the same way as with high & low and open & close, we choose the minimum and maximum value of a given feature/axis within the aggregation period.
So, we don't have these 2 values: the biggest and smallest ticks. The best we can do is infer them, and given that the biggest and smallest ticks can be located anywhere within the bar with equal probability, all we can do is place them in the middle of the bar, on both the time and price axes. That's why you can see two HL2's in each of the 3 formulas in the code.
So, the summed-up absolute volumes that you see in almost every trading platform are actually just a derivative metric, something that I call a Type 2 time series in my own (proprietary 'for now') methods. It doesn't have much to do with market orders hitting the non-uniform medium (aka the order book); it's more like a statistic. Still wanna use VWAP? Ok, but you gotta understand you're weighting a Type 1 (natural) time series by a Type 2 (synthetic) one.
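To see what's meant, here's classic VWAP written out; note how the direction (the sign) gets discarded before the weighting:

```python
def vwap(prices: list, signed_volumes: list) -> float:
    # classic VWAP: a Type 1 (natural) series, price, weighted by a
    # Type 2 (synthetic) one, absolute volume; the direction of impact
    # is thrown away before the weighting
    weights = [abs(v) for v in signed_volumes]
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)
```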
How to combine all the data in the right way (khmm khhm ‘order’)
Now, since we have 6 values for each bar, let’s see what information we have about them, what we don’t have, and what we can do about it:
- Open and close: we got both when and where (time (order) and price);
- High and low: we got where, but we don’t know when;
- Biggest & smallest trades: we know shit; we infer them the way described before.
By using the location of the close & open prices relative to the high & low prices, we can make educated guesses about whether high or low was made first in a given bar. It’s not perfect, but it’s ultimately all we can do—this is the very last bit of info we can extract from the data we have.
There are 2 methods for inferring volume delta (which I simply call volume) that are presented everywhere, even here on TradingView. Funny thing is, these are actually 2 parts of 1 method. I wonder how many folks see through it xD. The same method can be used both for inferring volume delta AND for making an educated guess whether high or low was made first.
Imagine and/or find the cases on your charts to understand faster:
* Close > open means we have an up bar and probably the volume is positive, and probably high was made later than low.
* Close < open means we have a down bar and probably the volume is negative, and probably low was made later than high.
Now that’s the point when you see that these 2 mentioned methods are actually parts of the 1 method:
If close = open, we still have another clue: distance from open/close pair to high (HC), and distance from open/close pair to low (LC):
* HC < LC, probably high was made later.
* HC > LC, probably low was made later.
And only if close = open and HC = LC do we have no clue whether high or low was made earlier within the bar; we simply don't have any more information to even guess. This bar is called a neutral bar. (See the sketch right after this list.)
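Putting the whole inference into one hypothetical helper (the text above only assigns a delta sign via close vs. open, so the close = open branches keep a sign of 0 here):

```python
def infer_bar(o: float, h: float, l: float, c: float):
    """Returns (order_guess, delta_sign) for one OHLC bar."""
    if c > o:                              # up bar
        return "low first, high later", +1
    if c < o:                              # down bar
        return "high first, low later", -1
    hc, lc = h - c, c - l                  # close = open: use the distances
    if hc < lc:                            # close parked nearer the high
        return "low first, high later", 0
    if hc > lc:                            # close parked nearer the low
        return "high first, low later", 0
    return "neutral bar", 0                # nothing left to extract
```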
At this point, we have both time (order) and price info for each of our 6 values. Now, we have to solve another weighted average problem, and that’s it. We’ll weight prices according to the order we’ve guessed. In the neutral bar case, open has a weight of 1, close has a weight of 3, and both high and low have weights of 2 since we can’t infer which one was made first. In all cases, biggest and smallest ticks are modeled with HL2 and weighted like they’re located in the middle of the bar in a time sense.
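Here's the weighted-average step as a sketch. Only the neutral-bar weights are spelled out above; the weights for the other two formulas (and which formula applies when close = open but HC ≠ LC) are my own assumption, following the same idea of weight = guessed temporal slot, with both HL2 ticks parked mid-bar:

```python
def mean_price(o: float, h: float, l: float, c: float) -> float:
    hl2 = (h + l) / 2.0        # stand-in for the biggest & smallest ticks
    hc, lc = h - c, c - l
    if c > o or (c == o and hc < lc):
        # low first, high later -- assumed slots: open 1, low 2, high 3,
        # close 4, each HL2 tick mid-bar at 2.5 (denominator = sum = 15)
        return (1*o + 2*l + 3*h + 4*c + 2.5*hl2 + 2.5*hl2) / 15.0
    if c < o or (c == o and hc > lc):
        # high first, low later -- mirror of the above
        return (1*o + 2*h + 3*l + 4*c + 2.5*hl2 + 2.5*hl2) / 15.0
    # neutral bar -- the weights given in the text: open 1, high & low 2
    # each, close 3, each HL2 tick 2 (denominator = sum = 12)
    return (1*o + 2*h + 2*l + 3*c + 2*hl2 + 2*hl2) / 12.0
```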
P.S.: I’ve also included a "robust" method where all the bars are treated like neutral ones. I’ve used it before; obviously, it has lesser info gain -> works a bit worse.