WIP: Select view #130

Open
wants to merge 2 commits into
base: master

Felixoid
Member

This is a proof of concept (POC) for selecting from a VIEW.

Selecting from a view with hidden aggregation lets the shards do all of the aggregation work locally, avoiding the huge final aggregation.
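
The idea can be illustrated with a minimal Python sketch. It is not code from this PR: the helper names, the sample rows, and the use of `crc32` as a stand-in for `sipHash64` are all assumptions made for illustration. Because rows are sharded by a hash of `metric`, every (metric, timestamp) group lives on a single shard, so picking the latest value per group on each shard should give the same result as doing it once over the merged data.

```python
from collections import defaultdict
from zlib import crc32


def dedup(rows):
    """Emulate argMax(value, updated) GROUP BY metric, timestamp."""
    latest = {}
    for metric, value, ts, updated in rows:
        key = (metric, ts)
        if key not in latest or updated > latest[key][0]:
            latest[key] = (updated, value)
    return [(metric, ts, value) for (metric, ts), (_, value) in latest.items()]


def avg_per_minute(points):
    """Emulate avg(value) GROUP BY metric, intDiv(ts, 60) * 60."""
    acc = defaultdict(lambda: [0.0, 0])
    for metric, ts, value in points:
        bucket = acc[(metric, ts // 60 * 60)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


def shard_of(metric, n_shards):
    # crc32 is only a stand-in for sipHash64(metric); any hash of the metric works.
    return crc32(metric.encode()) % n_shards


# Hypothetical sample rows: (metric, value, timestamp, updated).
rows = [
    ("a.cpu", 1.0, 120, 10),
    ("a.cpu", 3.0, 120, 20),  # newer version of the same point
    ("a.cpu", 5.0, 150, 10),
    ("b.mem", 2.0, 60, 10),
    ("b.mem", 4.0, 61, 11),
    ("b.mem", 8.0, 61, 12),   # newer version of the same point
    ("c.net", 7.0, 0, 1),
]

shards = [[] for _ in range(3)]
for row in rows:
    shards[shard_of(row[0], len(shards))].append(row)

# "Table" path: merge all raw rows first, then pick the latest value per point.
global_result = avg_per_minute(dedup(rows))

# "View" path: each shard picks the latest values itself and returns only those.
per_shard_result = avg_per_minute([p for shard in shards for p in dedup(shard)])
```

In this sketch both paths produce the same per-minute averages; the difference is only where the `argMax`-style step runs and how much intermediate state has to be merged afterwards.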

Here is a comparison of selecting from the table and from the view.

The query logs show that when selecting from the table, the remote shards do much less of the work, while selecting from the view cuts peak memory consumption roughly fourfold (34.02 MiB vs. 8.28 MiB).

Request to the table

CREATE TABLE graphite.data_lr
(
    `metric` String, 
    `value` Float64, 
    `timestamp` UInt32, 
    `date` Date, 
    `updated` UInt32
)
ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/graphite.data_lr/{shard}', '{replica}', 'graphite_ig_rollup')
PARTITION BY toYYYYMM(date)
ORDER BY (metric, timestamp)
SETTINGS index_granularity = 8192

CREATE TABLE graphite.data
(
    `metric` String, 
    `value` Float64, 
    `timestamp` UInt32, 
    `date` Date, 
    `updated` UInt32
)
ENGINE = Distributed('graphite_data', 'graphite', 'data_lr', sipHash64(metric))


SELECT 
    metric, 
    ts, 
    avg(value) AS value
FROM 
(
    SELECT 
        metric, 
        ts, 
        argMax(value, updated) AS value
    FROM data
    WHERE (metric IN (.....)) AND (ts >= 1563984700) AND (ts < 1564001100) AND (date >= toDate(1563984700)) AND (date <= toDate(1564001100))
    GROUP BY 
        metric, 
        timestamp AS ts
)
GROUP BY 
    metric, 
    intDiv(toUInt32(ts), 60) * 60 AS ts
ORDER BY 
    metric ASC, 
    ts ASC


2019.08.09 08:25:01.262172 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(date >= toDate(1563984700)) AND (date <= toDate(1564001100))" moved to PREWHERE
2019.08.09 08:25:01.274627 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> graphite.data_lr (SelectExecutor): Key condition: (column 0 in 212-element set), (column 1 in [1563984700, +inf)), and, (column 1 in (-inf, 1564001099]), and, unknown, unknown, and, and
2019.08.09 08:25:01.274680 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> graphite.data_lr (SelectExecutor): MinMax index condition: unknown, unknown, and, unknown, and, (column 0 in [18101, +inf)), (column 0 in (-inf, 18101]), and, and
2019.08.09 08:25:01.304592 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> graphite.data_lr (SelectExecutor): Selected 4 parts by date, 4 parts by key, 332 marks to read from 219 ranges
2019.08.09 08:25:01.304808 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> graphite.data_lr (SelectExecutor): Reading approx. 2719744 rows with 10 streams
2019.08.09 08:25:01.305057 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> InterpreterSelectQuery: FetchColumns -> WithMergeableState
2019.08.09 08:25:01.309110 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2019.08.09 08:25:01.309229 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2019.08.09 08:25:01.310084 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> executeQuery: Query pipeline:
Expression
 MergeSorting
  PartialSorting
   Expression
    Aggregating
     Concat
      Expression
       Expression
        Expression
         MergingAggregated
          Union
           Materializing
            ParallelAggregating
             Expression × 10
              Filter
               MergeTreeThread
           Remote × 3

2019.08.09 08:25:01.310604 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregating
2019.08.09 08:25:01.310641 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Reading blocks of partially aggregated data.
2019.08.09 08:25:01.310947 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregating
2019.08.09 08:25:01.321755 [ 73 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.326513 [ 88 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.327967 [ 55 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.334752 [ 72 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.361993 [ 81 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.377850 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 819 to 819 rows (from 0.076 MiB) in 0.067 sec. (12253.568 rows/sec., 1.134 MiB/sec.)
2019.08.09 08:25:01.377900 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 4914 to 4914 rows (from 0.489 MiB) in 0.067 sec. (73521.405 rows/sec., 7.315 MiB/sec.)
2019.08.09 08:25:01.377920 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.067 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:25:01.377939 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 6279 to 6279 rows (from 0.579 MiB) in 0.067 sec. (93944.018 rows/sec., 8.655 MiB/sec.)
2019.08.09 08:25:01.377958 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 1911 to 1911 rows (from 0.181 MiB) in 0.067 sec. (28591.658 rows/sec., 2.703 MiB/sec.)
2019.08.09 08:25:01.377989 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.067 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:25:01.378008 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.067 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:25:01.378027 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 4095 to 4095 rows (from 0.391 MiB) in 0.067 sec. (61267.838 rows/sec., 5.843 MiB/sec.)
2019.08.09 08:25:01.378044 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.067 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:25:01.378062 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.067 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:25:01.378081 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> ParallelAggregatingBlockInputStream: Total aggregated. 18018 rows (from 1.714 MiB) in 0.067 sec. (269578.485 rows/sec., 25.651 MiB/sec.)
2019.08.09 08:25:01.378097 [ 25 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Merging aggregated data
2019.08.09 08:25:01.454208 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Read 4 blocks of partially aggregated data, total 57876 rows.
2019.08.09 08:25:01.454281 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Merging partially aggregated single-level data.
2019.08.09 08:25:01.470857 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Merged partially aggregated single-level data.
2019.08.09 08:25:01.470883 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Converting aggregated data to blocks
2019.08.09 08:25:01.485431 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Converted aggregated data to blocks. 57876 rows, 5.308 MiB in 0.014 sec. (3994679.558 rows/sec., 366.335 MiB/sec.)
2019.08.09 08:25:01.485772 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:25:01.495595 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> UnionBlockInputStream: Waiting for threads to finish
2019.08.09 08:25:01.495652 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> UnionBlockInputStream: Waited for threads to finish
2019.08.09 08:25:01.495696 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Aggregated. 57876 to 57876 rows (from 5.528 MiB) in 0.185 sec. (312767.574 rows/sec., 29.876 MiB/sec.)
2019.08.09 08:25:01.495715 [ 79 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> Aggregator: Merging aggregated data
2019.08.09 08:25:01.548454 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Information> executeQuery: Read 11665925 rows, 117.03 MiB in 0.313 sec., 37262293 rows/sec., 373.81 MiB/sec.
2019.08.09 08:25:01.548523 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> MemoryTracker: Peak memory usage (for query): 34.02 MiB.
2019.08.09 08:25:01.548595 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2019.08.09 08:25:01.548669 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> UnionBlockInputStream: Waiting for threads to finish
2019.08.09 08:25:01.548699 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> UnionBlockInputStream: Waited for threads to finish
2019.08.09 08:25:01.548722 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2019.08.09 08:25:01.548922 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Debug> MemoryTracker: Peak memory usage (total): 34.02 MiB.
2019.08.09 08:25:01.548963 [ 65 ] {df448110-33ef-4282-b97b-66c6461ea6f6} <Information> TCPHandler: Processed in 0.322 sec.

Request to the view

CREATE VIEW graphite.data_view
(
    `metric` String, 
    `value` Float64, 
    `timestamp` UInt32, 
    `date` Date
) AS
SELECT 
    metric, 
    timestamp, 
    argMax(value, updated) AS value, 
    date
FROM graphite.data_lr
GROUP BY 
    metric, 
    timestamp, 
    date

CREATE TABLE graphite.data_view_distributed
(
    `metric` String, 
    `value` Float64, 
    `timestamp` UInt32, 
    `date` Date
)
ENGINE = Distributed('graphite_data', 'graphite', 'data_view', sipHash64(metric))

SELECT 
    metric, 
    ts, 
    avg(value) AS value
FROM 
(
    SELECT 
        metric, 
        timestamp AS ts, 
        value
    FROM data_view_distributed
    WHERE (metric IN (...)) AND (ts >= 1563984700) AND (ts < 1564001100) AND (date >= toDate(1563984700)) AND (date <= toDate(1564001100))
)
GROUP BY 
    metric, 
    intDiv(toUInt32(ts), 60) * 60 AS ts
ORDER BY 
    metric ASC, 
    ts ASC

2019.08.09 08:28:31.818976 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(date >= toDate(1563984700)) AND (date <= toDate(1564001100))" moved to PREWHERE
2019.08.09 08:28:31.820700 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(date >= toDate(1563984700)) AND (date <= toDate(1564001100))" moved to PREWHERE
2019.08.09 08:28:31.822543 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> graphite.data_lr (SelectExecutor): Key condition: (column 0 in 34-element set), unknown, unknown, and, and
2019.08.09 08:28:31.822573 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> graphite.data_lr (SelectExecutor): MinMax index condition: unknown, (column 0 in [18101, +inf)), (column 0 in (-inf, 18101]), and, and
2019.08.09 08:28:31.823920 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> graphite.data_lr (SelectExecutor): Selected 4 parts by date, 4 parts by key, 55 marks to read from 38 ranges
2019.08.09 08:28:31.824080 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> graphite.data_lr (SelectExecutor): Reading approx. 450560 rows with 4 streams
2019.08.09 08:28:31.824207 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2019.08.09 08:28:31.824572 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> InterpreterSelectQuery: FetchColumns -> WithMergeableState
2019.08.09 08:28:31.824762 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> InterpreterSelectQuery: WithMergeableState -> Complete
2019.08.09 08:28:31.824805 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2019.08.09 08:28:31.825388 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> executeQuery: Query pipeline:
Expression
 MergeSorting
  PartialSorting
   Expression
    ParallelAggregating
     Expression
      Expression
       Materializing
        Expression
         Filter
          Materializing
           Expression
            Expression
             ParallelAggregating
              Expression × 4
               Filter
                MergeTreeThread
     Expression × 3
      Expression
       Remote

2019.08.09 08:28:31.825688 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregating
2019.08.09 08:28:31.825969 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregating
2019.08.09 08:28:31.827253 [ 25 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.827829 [ 80 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.841570 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 901 to 901 rows (from 0.097 MiB) in 0.016 sec. (57968.333 rows/sec., 6.210 MiB/sec.)
2019.08.09 08:28:31.841619 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.016 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:28:31.841640 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 7764 to 7764 rows (from 0.832 MiB) in 0.016 sec. (499518.464 rows/sec., 53.519 MiB/sec.)
2019.08.09 08:28:31.841657 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.016 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.09 08:28:31.841676 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Total aggregated. 8665 rows (from 0.928 MiB) in 0.016 sec. (557486.797 rows/sec., 59.729 MiB/sec.)
2019.08.09 08:28:31.841691 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Merging aggregated data
2019.08.09 08:28:31.847687 [ 70 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.862286 [ 77 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.863540 [ 89 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.884678 [ 90 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Aggregation method: serialized
2019.08.09 08:28:31.885761 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 2457 to 2457 rows (from 0.256 MiB) in 0.060 sec. (40929.338 rows/sec., 4.268 MiB/sec.)
2019.08.09 08:28:31.885793 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 2457 to 2457 rows (from 0.259 MiB) in 0.060 sec. (40929.338 rows/sec., 4.307 MiB/sec.)
2019.08.09 08:28:31.885816 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 2457 to 2457 rows (from 0.246 MiB) in 0.060 sec. (40929.338 rows/sec., 4.098 MiB/sec.)
2019.08.09 08:28:31.885837 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 1911 to 1911 rows (from 0.192 MiB) in 0.060 sec. (31833.930 rows/sec., 3.192 MiB/sec.)
2019.08.09 08:28:31.885869 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> ParallelAggregatingBlockInputStream: Total aggregated. 9282 rows (from 0.952 MiB) in 0.060 sec. (154621.944 rows/sec., 15.865 MiB/sec.)
2019.08.09 08:28:31.885887 [ 56 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> Aggregator: Merging aggregated data
2019.08.09 08:28:31.895983 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Information> executeQuery: Read 1884677 rows, 20.12 MiB in 0.084 sec., 22378836 rows/sec., 238.86 MiB/sec.
2019.08.09 08:28:31.896045 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> MemoryTracker: Peak memory usage (for query): 8.28 MiB.
2019.08.09 08:28:31.896109 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2019.08.09 08:28:31.896204 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2019.08.09 08:28:31.896338 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Debug> MemoryTracker: Peak memory usage (total): 8.28 MiB.
2019.08.09 08:28:31.896373 [ 65 ] {2feb0071-108b-4ee9-8c69-0e1b773a64ce} <Information> TCPHandler: Processed in 0.086 sec.
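
For a quick side-by-side, the headline figures from the two logs above can be compared with a few lines of Python. This is only an arithmetic sketch: the dictionary keys are made up for illustration, and the numbers are copied as posted, without checking that both runs used the same metric list.

```python
# Figures copied from the final log lines of each run (table query first, view query second).
table = {"peak_mib": 34.02, "rows_read": 11_665_925, "seconds": 0.322}
view = {"peak_mib": 8.28, "rows_read": 1_884_677, "seconds": 0.086}

# Ratio of the table run to the view run for each figure.
ratios = {key: table[key] / view[key] for key in table}

for key, ratio in ratios.items():
    print(f"{key}: table/view = {ratio:.2f}")
```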

@AndreevDm
Contributor

This is debatable; I'll look at it later.

@CLAassistant

CLAassistant commented Sep 28, 2021

CLA assistant check
All committers have signed the CLA.
