> If I’d written out the query, I might have seen that I didn’t need the JOIN (or never written it in the first place). Whereas the view hides that complexity. So they can make things easier, but that can lead to performance pitfalls if we’re not careful.
You can avoid this particular pitfall by using LEFT JOINs for views like this (views that join things in for convenience that you might not select). Postgres will eliminate a left join but not an inner join, since the inner join could filter rows (e.g. symbols in stocks_real_time that don't exist in company). Commercial engines will use the presence of FK constraints to know that's impossible and go ahead and eliminate inner joins as well.
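To make that concrete, here's a rough sketch of the kind of view being described; the column names are assumptions rather than the post's actual schema:

    -- Convenience view that LEFT JOINs in company metadata (made-up columns).
    CREATE VIEW stock_view AS
    SELECT t.time, t.symbol, t.price, c.name AS company_name
    FROM stocks_real_time t
    LEFT JOIN company c ON c.symbol = t.symbol;

    -- Provided company.symbol is unique (e.g. it's the PK), a query that never
    -- references the company columns lets the planner drop the join entirely:
    SELECT symbol, avg(price) FROM stock_view GROUP BY symbol;

    -- With an INNER JOIN in the view, the join has to stay, because it could
    -- filter out rows whose symbol has no matching company row.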
What, Postgres doesn't do FK join removals? I tried it just now and it didn't remove the hash join (http://sqlfiddle.com/#!17/073747/2). It should have been just a count off the PK on the Employees table.
Apache Calcite in Hive had to bake in this specific optimization because there were a ton of deep join views (coming off Teradata) where someone would run something like a "select count(*) + where", or project just 2-3 columns (drawn from a couple of the underlying tables) out of a 60+ join view.
And those just ran forever without the PK-FK removals.
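To illustrate the pattern with made-up tables (not the actual Hive views in question):

    -- A wide reporting view joining a fact table to dimension tables on FK -> PK.
    CREATE VIEW sales_wide AS
    SELECT s.*, r.name AS region_name, p.name AS product_name   -- ...many more
    FROM sales s
    JOIN regions  r ON r.region_id  = s.region_id
    JOIN products p ON p.product_id = s.product_id;

    -- With enforced, NOT NULL FK columns, every inner join matches exactly one
    -- dimension row, so a PK-FK-aware optimizer can answer this from the fact
    -- table alone instead of executing dozens of joins:
    SELECT count(*) FROM sales_wide WHERE sale_date >= date '2022-01-01';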
Thanks, yes! Totally true. I was thinking about including some of that, but it felt like it opened a can of worms about join types and why certain things would be included and others not (i.e. the inner join needs to see that the row is there on both sides, whereas the left join doesn't), and the post was already getting pretty long.
Postgres is an amazing database. Its only significant weakness now is in materialized views, with their lack of incremental refresh. It was disappointing to see no progress towards this in v15.
That work towards incrementally updated views is happening and progressing. For now, it's a separate extension, though: https://github.com/sraoss/pg_ivm.
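If I recall the pg_ivm README correctly, usage looks roughly like this; the table and view names are made up, so double-check the extension's docs for the current API:

    CREATE EXTENSION pg_ivm;

    -- Creates an incrementally maintained materialized view; later
    -- INSERT/UPDATE/DELETE on sales keeps it up to date automatically.
    SELECT create_immv(
      'sales_by_region',
      'SELECT region_id, count(*) AS n, sum(amount) AS total
         FROM sales GROUP BY region_id'
    );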
I wanted incremental refresh in Postgres as well and found that you can manage your own table to get something close.
Basically, you create a regular table in place of a materialised one, aggregate only the data newer than what's currently in the table, then store the new aggregates in the table. Repeat this at an interval.
https://github.com/cadbox1/prawn-stack/blob/master/src/backe...
I use this to show page view data aggregated by hour without calculating it on each request, using Lambda: https://prawn.cadell.dev/
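For anyone who wants the flavor of that pattern in plain SQL, a minimal sketch might look like this (the table and column names are made up, not taken from the repo above):

    -- Hypothetical schema: raw events in page_views(viewed_at),
    -- summary kept in page_views_hourly(bucket, views).
    INSERT INTO page_views_hourly (bucket, views)
    SELECT date_trunc('hour', viewed_at) AS bucket, count(*) AS views
    FROM page_views
    WHERE viewed_at >= COALESCE(
            (SELECT max(bucket) FROM page_views_hourly) + interval '1 hour',
            '-infinity')
      AND viewed_at < date_trunc('hour', now())   -- only aggregate complete hours
    GROUP BY 1;

Run it on a schedule (cron, pg_cron, a Lambda, etc.). Note that truly late-arriving data older than the newest bucket still needs separate handling, which is part of what the continuous aggregate machinery in the post is solving.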
Enjoyed this post! Djk447, I would love to ask a question as well.
We've started working with Timescale to process historical time series data. However, there is so much of it that we chose which parts we process.
It's possible that in the future we may need to go back and reprocess the dataset to add something we decided we want after all.
In your post it seems like this is handled automatically on a smaller time scale, meaning you could insert into the past without doing anything special.
What happens if you need to insert data at numerous points across two years' worth of data instead? Do you have to use backfilling as described in your documentation? Or is it better to maybe rebuild the entire hypertable?
Yes, this is generally handled automatically. There may be times, though, where you want to essentially pause refreshing the view while you do some backfilling and then eventually let it catch up, especially if you're overwriting the same time period multiple times in a row. If you can insert in time order, then it just breaks the re-calculation up into smaller segments, which can be quite useful compared to having to process the whole data set again.
This can be a little bit different if you're doing compression, but with continuous aggregates I think it should work fine. I'm not 100% sure that was what you were looking for, let me know if it's not.
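For reference, with current TimescaleDB that workflow looks roughly like the following; the aggregate name and offsets are made up, so check the docs for the exact signatures:

    -- 1. Pause the automatic refresh while backfilling ('daily_agg' is made up).
    SELECT remove_continuous_aggregate_policy('daily_agg');

    -- 2. Backfill the underlying hypertable (INSERT/COPY into old time ranges).

    -- 3. Refresh only the window that was touched, then re-add the policy.
    CALL refresh_continuous_aggregate('daily_agg', '2020-01-01', '2022-01-01');
    SELECT add_continuous_aggregate_policy('daily_agg',
        start_offset      => INTERVAL '1 month',
        end_offset        => INTERVAL '1 hour',
        schedule_interval => INTERVAL '1 hour');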
Did you consider a type of continuous aggregate that works over pre-aggregated partitions for unmodified time buckets, and swaps those buckets in the continuous aggregate view for a live view when data is modified? I guess it would mean that performance would sort of continually degrade as inserts/updates/deletes happened in more and more time buckets, but as soon as the aggregation event fired, performance would recover. It seems like one could provide a more strongly consistent view of aggregates with this sort of approach.
I'm not 100% sure I understand what you're asking, but essentially something that would look for data modifications and at query time run the query over the older regions as well?
If that's what you're asking, the answer is yes, we did consider it, but we basically decided that it was something relatively few people needed, and that the complexity and performance tradeoffs were unlikely to be worth it for most folks.
Essentially, we could do something like this now by looking at our invalidation log and running a join against it to get to a more strongly consistent state. (I haven't thought through the full implications and whether it's truly strong consistency; I think it might be, but it'd require a proof and some thinking through of all of our locking logic to really get there.) It's interesting to consider, though.
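To make the idea concrete, a query-time merge along those lines could look something like the sketch below; these are purely illustrative table names, not Timescale's real catalog:

    -- materialized(bucket, value): pre-computed hourly buckets
    -- invalidation_log(lowest_modified, greatest_modified): ranges touched since refresh
    SELECT bucket, value
    FROM materialized m
    WHERE NOT EXISTS (                      -- keep buckets untouched by any invalidation
        SELECT 1 FROM invalidation_log i
        WHERE m.bucket <= i.greatest_modified
          AND m.bucket + interval '1 hour' > i.lowest_modified)
    UNION ALL
    SELECT time_bucket('1 hour', r.time) AS bucket, avg(r.price) AS value
    FROM stocks_real_time r                 -- recompute buckets that overlap one
    WHERE EXISTS (
        SELECT 1 FROM invalidation_log i
        WHERE time_bucket('1 hour', r.time) <= i.greatest_modified
          AND time_bucket('1 hour', r.time) + interval '1 hour' > i.lowest_modified)
    GROUP BY 1;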
Great approach and good write-up! I’ve implemented a similar technique before on PostgreSQL, but with the materialisation in the application backend. Still works like a charm.
>So instead, we created a special kind of trigger that tracks the minimum and maximum times modified across all the rows in a statement and writes out the range of times that were modified to a log table. We call that an invalidation log.
Does this invalidation log also take into account cases where the view has an aggregate that is based on data from a bucket other than its own? For example, a lag() or lead() might be used to calculate a delta compared to the previous bucket. Then, if a data point inside bucket 1 is added to the realtime table and bucket 1 is invalidated and re-materialised, bucket 2 also needs to be re-materialised for integrity reasons?
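For context, the mechanism quoted above can be approximated in plain SQL with a statement-level trigger and transition tables. This is only a sketch (assuming stocks_real_time has a time column, as in the post), not Timescale's actual implementation, and UPDATE/DELETE would need companion triggers referencing the OLD transition table:

    -- Illustrative only; not Timescale's actual (C-level) trigger.
    CREATE TABLE invalidation_log (
        lowest_modified   timestamptz NOT NULL,
        greatest_modified timestamptz NOT NULL
    );

    CREATE FUNCTION log_invalidation() RETURNS trigger AS $$
    BEGIN
        -- One log row per statement: the min/max times touched by that statement.
        INSERT INTO invalidation_log
        SELECT min(time), max(time) FROM new_rows HAVING count(*) > 0;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER track_inserts
        AFTER INSERT ON stocks_real_time
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT EXECUTE FUNCTION log_invalidation();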
This is an excellent article. I like the way the author builds up in steps to eventual consistency between source tables and their materialized views. It was fun to guess the next step.
I do have one question: how does the algorithm described in the article work when the source table is spread across multiple servers, say in multiple shards? Can TimescaleDB maintain materialized views on each shard and then run a query that reconciles them?
So we thought about doing something like that with multinode, where each of the nodes would maintain its own materialization, but abandoned it for that very reason: it's very, very difficult to maintain any sort of consistency guarantee in that case, or even to reason about it.
Instead, we use the access nodes as coordinators to do the materialization. Right now the materialization only exists on the access node, but there's no reason we couldn't send it back out to the data nodes; you just need a coordination point to start a distributed transaction to have some semblance of a guarantee.