Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSSQL Stored Procedure Metadata Ingestion causes Arithmetic Overflow #19071

Open
stevelangley-rangeabove opened this issue Dec 14, 2024 · 2 comments · May be fixed by #19505
Open

MSSQL Stored Procedure Metadata Ingestion causes Arithmetic Overflow #19071

stevelangley-rangeabove opened this issue Dec 14, 2024 · 2 comments · May be fixed by #19505
Assignees
Labels
bug Something isn't working Ingestion

Comments

@stevelangley-rangeabove

Affected module
ingestion/src/metadata/ingestion/source/database/mssql/queries.py

Describe the bug

The MSSQL_GET_STORED_PROCEDURE_QUERIES sql code is using the wrong system table to compute stored procedure duration and the wrong unit conversion to seconds (millisec s/b microsec).

The query is using sys.dm_exec_procedure_stats "total_elapsed_time" column to reflect stored proc duration and using it in a DATEADD function to compute end-time.
In addition, it is dividing by 1000 to convert to seconds, but should divide by 1000000.

This column ("total_elapsed_time") is the cumulative time the stored procedure has a accrued over ALL of it's executions, not just the last one. For a MSSQL server with a long running and frequently executed Stored Procedure, the "total_elapsed_time" value is large enough to cause an Arithmetic Overflow error in the DATEADD function.

Suggest the column that should be used for this is the "last_worker_time" column.

To Reproduce

Using an extracted subset of the MSSQL_GET_STORED_PROCEDURE_QUERIES sql:

        select
          s.last_execution_time  start_time,
          s.total_elapsed_time  elapsed_time,
          DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
          OBJECT_NAME(s.object_id, s.database_id) as procedure_name
          from 
            sys.dm_exec_procedure_stats s
          where OBJECT_NAME(s.object_id, s.database_id) IS NOT NULL
          and s.database_id = db_id()
          order by elapsed_time desc
        ;

        -- RESULT
        -- SQL Error [8115] [S0002]: An error occurred during the current command (Done status 0). 
        -- Arithmetic overflow error converting expression to data type int.

Finding the offending Stored Procedure:

        select
          s.last_execution_time  start_time,
          s.total_elapsed_time  elapsed_time,
          -- DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
          OBJECT_NAME(s.object_id, s.database_id) as procedure_name
          from 
            sys.dm_exec_procedure_stats s
          where OBJECT_NAME(s.object_id, s.database_id) not like 'sp_%'
          order by elapsed_time desc
        ;

        2024-12-08 20:00:01.047	**_3431915680349_**	replication_cis
        2024-12-08 20:30:00.497	1063036715221	CIS_TASKS_LOAD
        2024-12-09 02:43:41.597	230498365752	replication_abs
        2024-12-08 21:59:01.030	186063585227	CIS_TASKS_PreProcesses
        2024-12-07 06:02:20.223	149528623548	DatabaseMaintenancePlan
        2024-12-07 05:05:01.013	59591778456	GetSnrStatsForSPU

Expected behavior
MSSQL Stored Procedure Metadata would be ingested into OpenMetadata instead of failing on an Arithmetic Overflow error.

Version:

  • OS: Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-198-generic x86_64)
  • Python version: 3.10.14
  • OpenMetadata version: 1.5.12
  • OpenMetadata Ingestion package version: docker.getcollate.io/openmetadata/ingestion:1.5.12

select @@ServerName, @@Version;
-- SQLSERVER
-- Microsoft SQL Server 2008 R2 (SP3-GDR) (KB4057113) - 10.50.6560.0 (X64)
-- Dec 28 2017 15:03:48
-- Copyright (c) Microsoft Corporation
-- Standard Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1) (Hypervisor)

Additional context

Suggested SQL correction:

MSSQL_GET_STORED_PROCEDURE_QUERIES = textwrap.dedent(
"""
WITH SP_HISTORY (start_time, end_time, procedure_name, query_text) AS (
select
s.last_execution_time start_time,
-- DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
DATEADD(s, s.last_worker_time/1000000, s.last_execution_time) end_time,
OBJECT_NAME(object_id, database_id) as procedure_name,
text as query_text
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.plan_handle)
WHERE OBJECT_NAME(object_id, database_id) IS NOT NULL
AND s.last_execution_time > '{start_date}'
),
Q_HISTORY (database_name, query_text, start_time, end_time, duration,query_type, schema_name, user_name) AS (
select
db.NAME database_name,
t.text query_text,
s.last_execution_time start_time,
-- DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
-- s.total_elapsed_time/1000 duration,
DATEADD(s, s.last_worker_time/1000000, s.last_execution_time) end_time,
s.last_worker_time/1000000 duration,
case
when t.text LIKE '%%MERGE%%' then 'MERGE'
when t.text LIKE '%%UPDATE%%' then 'UPDATE'
when t.text LIKE '%%SELECT%%INTO%%' then 'CREATE_TABLE_AS_SELECT'
when t.text LIKE '%%INSERT%%' then 'INSERT'
else 'UNKNOWN' end query_type,
NULL schema_name,
NULL user_name
FROM sys.dm_exec_cached_plans AS p
INNER JOIN sys.dm_exec_query_stats AS s
ON p.plan_handle = s.plan_handle
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
INNER JOIN sys.databases db
ON db.database_id = t.dbid
WHERE t.text NOT LIKE '/* {{"app": "OpenMetadata", %%}} /%%'
AND t.text NOT LIKE '/
{{"app": "dbt", %%}} */%%'
AND p.objtype NOT IN ('Prepared', 'Proc')
AND s.last_execution_time > '{start_date}'
)
select
Q.query_type AS QUERY_TYPE,
Q.database_name AS QUERY_DATABASE_NAME,
Q.schema_name AS QUERY_SCHEMA_NAME,
Q.query_text AS QUERY_TEXT,
Q.user_name AS QUERY_USER_NAME,
Q.start_time AS QUERY_START_TIME,
Q.duration AS QUERY_DURATION,
SP.procedure_name AS PROCEDURE_NAME,
SP.query_text AS PROCEDURE_TEXT,
SP.start_time AS PROCEDURE_START_TIME,
SP.end_time AS PROCEDURE_END_TIME
from SP_HISTORY SP
JOIN Q_HISTORY Q
ON (
Q.start_time BETWEEN SP.start_time and SP.end_time
OR Q.end_time BETWEEN SP.start_time and SP.end_time
)
order by PROCEDURE_START_TIME desc
;
"""
)

@stevelangley-rangeabove
Copy link
Author

p.s. It appears that the MSSQL_SQL_STATEMENT query (in the same file) should also be corrected, as it is doing the same thing (using total_elapsed_time column and /1000).

@pmbrull pmbrull added bug Something isn't working Ingestion labels Jan 2, 2025
@pmbrull pmbrull removed this from Release 1.6.2 Jan 7, 2025
@ulixius9 ulixius9 moved this to Integration in Release 1.6.3 Jan 8, 2025
@akashverma0786
Copy link
Contributor

Hi, I am working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Ingestion
Projects
Status: Integration
Development

Successfully merging a pull request may close this issue.

4 participants