Trials and Tribulations of SCCM replication…

2017-12-11T12:12:37+00:00 May 1st, 2012|Azure, Cloud, Windows|

Hi there,

A customer I am working with recently implemented a new child primary site and all seemed to be ticking along smoothly until a rather angry DBA turned up to ask why disk space was being consumed at the rate of knots on the SQL server hosting the SCCM databases…First things first, I was impressed with the speed that the DBA noticed this as the database hadn’t actually grown that ‘large’, and as SCCM admins we generally keep an eye on database/backup sizes. 1-0 to the SQL DBA 🙂

Now in my experience a fast growing SCCM database generally indicates one of two things; inbox issues or more commonly – site replication issues. Given this was the first child site to be introduced in the hierarchy therefore introducing the first ‘replication’ of content then this seemed like a good avenue to start investigating.

So first things first. Let’s check the database sizes out ourselves (not that we don’t trust the DBA!). We logged on to the SQL server and reviewed the SQL ‘data’ location where we found the SCCM parent site database had ballooned by approx 20% and the new child site database was nearly the same size as the parent site which is strange given it has approx. 90% less client data!

Now that we know that an issue exists we need to identify exactly what tables contain all this ‘extra data’. To do that we need to fire up SQL Management Studio, and log on to the instance as a user with permissions to run a few queries…

First we ran the following query on  both databases to identify large tables; (please note this query is from a third party and has not been tested and as such this is run at your own risk!)

This demonstrated that on the parent site the following tables were consuming more space than expected;

  • StatusMessageinsstrs
  • StatusMessages
  • StatusMessageAttributes

And on the new child primary site;

  • TS_TaskSequences

This was the interesting table – it was approx 90% the total size of the database and only had 45 rows! The same table on the parent site was approx 2mb and had 52 rows. You can check the size directly in SQL Management Studio if you wish.

So first things first – the status messages. Let’s review the component status view in the SCCM console. From here we got a bit more information – the SMS_DISTRIBUTION_MANAGER component was flagging thousands of notifications. Viewing this component demonstrated that there were several task sequences repeatedly processing every 5 seconds. Ah! Well first things first we have a patch that should be applied to address issues with distribution points repeatadly processing content/packages/task sequences… and this is KB978021. So let’s install that (on any and all DPs) to suppress these status messages.

Now we need a solution to get rid of some of the status messages in the database of the parent site. It is bad practice to interact with the database directly and for obvious reasons is not supported by Microsoft so therefore we need to look at another way. Another blog post came up with the following solution;

  • Fire up WBEMTEST
  • Namespace: rootsmssite_<code>
  • Execute Method
  • Path: sms_statusmessage
  • Select ‘Detect By Query’
  • Edit In Parameters
  • Select WQLSelect
  • Select ‘NOT NULL’
  • Query:
    • Select * from sms_statusmessage where messageid=’number’ and component = ‘SMS_DISTRIBUTION_MANAGER’

This will remove the status messages and can take some time. If there are lots of these to do then you should consider doing this in chunks. (Again, please try this in a test environment first to ensure no adverse affects occur in your production environment) It is worth noting that status messages are cleared up automatically by one of the SCCM site maintenance tasks… by default audit/component status messages are kept for 180 days, and this is configured in the status message filter section of the site settings.

We should now see the size of the parent site database has decreased. Half way there! Whilst we have stopped the status messages being generated we haven’t fixed the underlying issue. Why were these task sequences constantly processing? For that my next avenue was to look in the DISTMGR log file on the child site. Lo and behold I was greeted with the following messages;

“Failed to insert SMS Package <Packge_ID> because SDM Type Content <Unique_ContentID> is not present in the CI_Contents table. Will try later.”

These messages were repeated constantly, and there was 3 different Package ID’s referenced.

Now whilst these messages lead me to the solution it was a bit confusing as the CI_Contents table pertains to Software Updates. Now this site has a replica WSUS and did at one stage host a SUP however this was removed previously as it was added incorrectly. Whether this caused the issue we were seeing is unknown and was not investigated further.

I took the package ID and compared it to the ‘references’ tab on the task sequences that were not replicating and all the task sequences contained these packages which turned out to be Driver Packages. To resolve the issue  I simply re-created the driver package and modified the task sequences to point to the new version. Very simple, and it worked (after restarting the SMS_DISTRIBUTION_MANAGER component)… the messages disappeared from the DISTMGR log.

So we are nearly there, however one final thing we need to do is reclaim that database space to make the DBA happy. There are a number of SQL commands that can be used, however it is important to note that editing the database is not supported unless instructed to do so by Microsoft Premier Support. Doing so can leave you in an unsupported state. The commands are listed here for reference for you to use in your test environments;

  • dbcc cleantable (SMS_CEN, TS_TaskSequence)
  • dbcc shrinkdatabase (SMS_CEN)

Where SMS_CEN is the name of your database…
It is worth noting that these take a long time to run, approximately 1 hour per 10gb.