Code Optimization

Interesting things about software development and code optimization

MS SQL Batch or Bulk UPDATE and working around EF

Dear friends,


Thank you for coming. Here I will show you how to perform a huge bulk UPDATE in the database from your C# code when using Entity Framework.

My issue was calculating prices for 50K+ products and then updating all of these product prices at once.
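
Just for comparison, the per-entity EF approach that is far too slow for 50K+ rows looks roughly like this (a rough sketch, assuming EF 6.x, a plain DbContext called MyDbContext with a Products DbSet, and a hypothetical CalculateNewPrice helper - all of those names are my assumption):

                            //naive approach: change tracking for every entity,
                            //and EF6 sends one UPDATE statement per changed row on SaveChanges
                            using (var db = new MyDbContext())
                            {
                                foreach (var p in db.Products.Where(x => x.Published))
                                {
                                    p.Price = CalculateNewPrice(p);   //hypothetical price calculation
                                }
                                db.SaveChanges();
                            }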

Before this optimization the SQL UPDATE took more than 60 seconds (using different approaches; and of course you should not even think of doing it with the EF update/save methods at all). After implementing this approach, updating 50K+ products takes less than 10 seconds (of course, it depends on the hosting and environment you use). So what is it?


Table-Valued Parameters

Yes, maybe you have heard about them, but have not understood them well enough to use them, or just have not believed that they would really help you.


So everything is really easy: you create a DataTable, put in all the data you need for your update (in my case 50K+ rows) and pass it to a stored procedure that knows how to use it. Here is an example:

- create table and columns

                            DataTable tblUpdate = new DataTable();
                            tblUpdate.Columns.Add("Id", typeof(int));               
                            tblUpdate.Columns.Add("DeliveryDateId", typeof(int));   
                            tblUpdate.Columns.Add("OldPrice", typeof(decimal));     
                            tblUpdate.Columns.Add("Price", typeof(decimal));        
                            tblUpdate.Columns.Add("ProductCost", typeof(decimal));  
                            tblUpdate.Columns.Add("AdditionalShippingCharge", typeof(decimal)); 
                            tblUpdate.Columns.Add("DisableBuyButton", typeof(bool));
                            tblUpdate.Columns.Add("Published", typeof(bool));       
                            tblUpdate.Columns.Add("AdminComment", typeof(string));  
                            tblUpdate.Columns.Add("UpdatedOnUtc", typeof(DateTime));

- add rows

...
                            object[] row = new object[10];                            
                            ...
                            ...
                            tblUpdate.Rows.Add(row);
                            ...
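
For illustration only, filling one row from a calculated product might look like this (the productsToUpdate collection and the property names on product are my assumption - the important part is that the values go in exactly the same order as the columns defined above):

                            foreach (var product in productsToUpdate)   //hypothetical list of recalculated products
                            {
                                object[] row = new object[10];
                                row[0] = product.Id;
                                row[1] = product.DeliveryDateId;
                                row[2] = product.OldPrice;
                                row[3] = product.Price;
                                row[4] = product.ProductCost;
                                row[5] = product.AdditionalShippingCharge;
                                row[6] = product.DisableBuyButton;
                                row[7] = product.Published;
                                row[8] = (object)product.AdminComment ?? DBNull.Value;   //the only nullable column
                                row[9] = DateTime.UtcNow;
                                tblUpdate.Rows.Add(row);
                            }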

- now it is time to push it to SQL

                            var dataSettings = Core.Data.DataSettingsManager.LoadSettings();

                            using (System.Data.SqlClient.SqlConnection conn = new System.Data.SqlClient.SqlConnection(dataSettings.DataConnectionString))
                            {
                                conn.Open();
                                
                                //create the table-valued type if it does not already exist
                                using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(
@"IF TYPE_ID(N'TargetUDT') IS NULL CREATE TYPE [TargetUDT] AS TABLE
(
Id int not null,
DeliveryDateId int not null,
OldPrice decimal(18,4) not null,
Price decimal(18,4) not null,
ProductCost decimal(18,4) not null,
AdditionalShippingCharge decimal(18,4) not null,
DisableBuyButton bit not null,
Published bit not null,
AdminComment nvarchar(max) null,
UpdatedOnUtc datetime not null
);", conn))
                                {
                                    cmd.CommandType = CommandType.Text;
                                    cmd.ExecuteNonQuery();
                                }

                                //create or alter the stored procedure
                                using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(
@"CREATE OR ALTER PROCEDURE usp_UpdateAllProdPrices (@tvpUpdateProducts [TargetUDT] READONLY)
AS
BEGIN
UPDATE dbo.Product
SET Product.DeliveryDateId = ec.DeliveryDateId,
Product.OldPrice = ec.OldPrice,
Product.Price = ec.Price,
Product.ProductCost = ec.ProductCost,
Product.AdditionalShippingCharge = ec.AdditionalShippingCharge,
Product.DisableBuyButton = ec.DisableBuyButton,
Product.Published = ec.Published,
Product.AdminComment = ec.AdminComment,
Product.UpdatedOnUtc = ec.UpdatedOnUtc
FROM @tvpUpdateProducts AS ec
INNER JOIN dbo.Product ON dbo.Product.Id = ec.Id
END", conn))
                                {
                                    cmd.CommandType = CommandType.Text;
                                    cmd.ExecuteNonQuery();
                                }

                                //run the stored procedure and pass all the data as a table-valued parameter
                                using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(
@"usp_UpdateAllProdPrices", conn))
                                {
                                    System.Data.SqlClient.SqlParameter tvpParam = cmd.Parameters.AddWithValue("@tvpUpdateProducts", tblUpdate);
                                    tvpParam.SqlDbType = SqlDbType.Structured;
                                    tvpParam.TypeName = "[TargetUDT]";
                                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                                    cmd.ExecuteNonQuery();
                                }

                                conn.Close();
                            }
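
By the way, if you prefer to stay inside your EF DbContext instead of opening a raw SqlConnection, the same stored procedure can also be called through Database.ExecuteSqlCommand. Here is a minimal sketch, assuming EF 6.x and a DbContext instance named dbContext (that name is my assumption):

                            var tvpParam = new System.Data.SqlClient.SqlParameter("@tvpUpdateProducts", SqlDbType.Structured)
                            {
                                TypeName = "[TargetUDT]",
                                Value = tblUpdate   //the same DataTable as above
                            };
                            //EF sends this through a regular SqlCommand, so the table-valued parameter works the same way
                            dbContext.Database.ExecuteSqlCommand("EXEC usp_UpdateAllProdPrices @tvpUpdateProducts", tvpParam);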


That's it, guys, enjoy your super-fast bulk update :)




Thank you and see you :)





MS SQL Maintenance and Performance

Hello friends,


I want to share my experience with MS SQL maintenance and performance with regard to nonclustered indexes and fragmentation.


If you delete unnecessary rows from your tables from time to time, then you definitely need to maintain the fragmented indexes.

When you delete a row from a table, SQL Server modifies the indexes, which leads to fragmentation, and fragmentation slows down query performance on that table.


To avoid bottlenecks and to be sure fragmentation stays minimal and performance stays at its maximum, you need to reorganize or rebuild the indexes.


I use the following piece of stored procedure code whenever I do any clean-up on a table:


    StoredProcedure

....

    -- Reorganize or rebuild the nonclustered index
    -- NCIX_MyTable on the dbo.MyTable table.
    DECLARE @frag float;

    SELECT @frag = avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats (DB_ID(N'my_database'), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, NULL) AS a
    JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id AND b.[name] = N'NCIX_MyTable';

    IF @frag >= 50
    BEGIN
        -- heavily fragmented: rebuild the index
        ALTER INDEX NCIX_MyTable ON dbo.MyTable REBUILD WITH (FILLFACTOR = 50);
    END
    ELSE IF @frag >= 25
    BEGIN
        -- moderately fragmented: reorganizing is enough
        ALTER INDEX NCIX_MyTable ON dbo.MyTable REORGANIZE;
    END;
....


Conclusion: deleting unnecessary data and maintaining the indexes keeps your DB size to a minimum and keeps SQL performance at a maximum.


Thank you and see you ;)





MS SQL - Speed up Order By with OFFSET FETCH paging

UPDATE:

So adding the nonclustered index, as the MS SQL Execution Plan suggests, seems to have reduced the query time even more. Here is the suggested nonclustered index:

USE [myDB]

GO


CREATE NONCLUSTERED INDEX NONCLUSTERED_INDEX2_tblPictures

ON [dbo].[tblPictures] ([RatingTotal],[shape])

INCLUDE ([PictureID],[OwnerID],[PictureTypeID])

GO

But using only the nonclustered index, without my #temp table technique, does not seem to help too much.


Hello,

This time I'm going to share my solution for how I reduced the time of my SQL query with an Order By clause by almost 2x.

First, let me show my original query:

select @rowstotal = count(*)

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])



Select

@rowstotal as TotalCount

, PictureID

, OwnerID

, PictureName

, [Description]

, Description2

, Description3

, aspectRatio

, [min]

, [max]

, [percent]

, thumbWidth

, thumbHeight

, processState

, keywords

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])

Order By RatingTotal Desc

OFFSET @pageNumber ROWS FETCH NEXT @pageSize ROWS ONLY;


So, to be able to calculate pages we have to know the total number of rows in the DB. Each returned row will contain TotalCount - the total number of rows.


(Pay attention that the technique COUNT(*) OVER () AS TotalCount is slower than the separate select @rowstotal = count(*) that I use in my queries.)
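
Just to make the paging arithmetic explicit, here is a tiny sketch of how the caller side might compute the values that get passed in (my assumption here is that @pageNumber receives the row offset rather than the page index, and @pageSize the page size):

int pageIndex = 2;                                       //zero-based page requested by the user (assumption)
int pageSize = 40;                                       //rows per page
int offset = pageIndex * pageSize;                       //value passed to the query as @pageNumber
int totalCount = 300000;                                 //example value, read back from the TotalCount column of the result
int totalPages = (totalCount + pageSize - 1) / pageSize; //ceiling division gives the number of pages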


Then we select all the needed columns with the ordering and paging technique.

Everything looks great, simple and fast, unless you have more than 300,000 rows in the table.

The main problem here is that the Order By clause takes 98% of the whole stored procedure's cost, and in my case the procedure takes 5 seconds in total.

How to speed it up? 

I googled a lot of posts, and almost everyone suggests using indexes, nonclustered indexes, or other things.

As I'm not a DBA and do not know a lot about all these things, I decided to check: what if I selected only one column instead of all of them,

and only once I had my 40 rows for one page selected the rest of the columns?

I wrote a test query and was surprised that it took almost half the time it took before.


So here is the new query:

IF OBJECT_ID(N'tempdb..#tempPage', N'U') IS NOT NULL

DROP TABLE #tempPage;


create table #tempPage

(

TotalCount int null

, PictureID int

, OwnerID int null

, PictureName nvarchar(500) null

, [Description] nvarchar(500) null

, Description2 nvarchar(500) null

, Description3 nvarchar(max) null

, aspectRatio decimal(18,2) null

, [min] money null

, [max] money null

, [percent] decimal(18,2) null

, thumbWidth decimal(18,2) null

, thumbHeight decimal(18,2) null

, processState varchar(50) null

, keywords varchar(500) null

)


insert into #tempPage

Select

null --@rowstotal

,PictureID

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])

Order By RatingTotal Desc

OFFSET @pageNumber ROWS FETCH NEXT @pageSize ROWS ONLY


Update t

Set t.OwnerID = p.OwnerID

,t.TotalCount = @rowstotal

,t.PictureName=p.PictureName

,t.[Description]=p.[Description]

,t.Description2=p.Description2

,t.Description3=p.Description3

,t.aspectRatio=p.aspectRatio

,t.[min]=p.[min]

,t.[max]=p.[max]

,t.[percent]=p.[percent]

,t.thumbWidth=p.thumbWidth

,t.thumbHeight=p.thumbHeight

,t.processState=p.processState

,t.keywords=p.keywords

From #tempPage as t

INNER JOIN [dbo].[tblPictures] as p on p.PictureID = t.PictureID;

select * from #tempPage;

DROP TABLE #tempPage;


Now we use a temporary SQL table to select only the IDs first, and then we update all 40 rows, setting all the other columns' values.

Also, do not forget to drop the temp table both before and after, to avoid "table already exists" errors.


So this is the way that reduced my query time from 5 seconds to almost 2 seconds in total.


(SQL Server 2012 version: x64 11.0.3156.0)


Of course, if I added the nonclustered index that the MS SQL Graphical Execution Plan suggests, it might be even faster, but to be able to add one we have to understand what it is and how to use it. So maybe the next step will be to learn about and add a nonclustered index ;)



Thank you and see you soon :)

 






