Steps taken during a PowerPivot data refresh

In this posting we will take a more detailed technical look at how the data refresh facility works and the steps it takes to complete a data refresh cycle. Rather than starting with the “Manage data refresh” page, we will assume that you know how to set up a schedule; here we take a deep dive into the cycle itself.

What steps are taken when the data is refreshed?

Now that you have configured your schedule(s) for the workbook, let’s take a step back and examine more closely what data refresh actually means. I think it is valuable to understand, at some basic level, exactly what the system is going to do on your behalf at 2am. When a job actually runs, the data refresh facility goes through the following steps:

  1. First, the system looks for schedules that are ‘runnable’, meaning that their scheduled time has come due. As all of the jobs might be scheduled at close to the same time (midnight, for example, is a popular time :-) ), the system tries to run each job as soon as it can. All of the PowerPivot SharePoint servers are doing this at the same time. Ultimately one of them detects that your job has “come due” and is runnable.
  2. After impersonating the Windows user specified in the schedule, the system extracts the workbook from the content database using the SharePoint binary OM. The user must supply a valid Windows account in the schedule and he or she must ensure that that account has contributor (read/write) access rights to the workbook. The workbook is stored in a temporary folder (in the SSAS Backup folder) so it can be used later (see step 9 below).
  3. The system extracts the embedded database from the workbook and loads the database into the local SSAS Engine instance. The database is loaded read/write (so it can be updated). This database is only used for this data refresh job – the system ensures that it is not used for querying while updating is going on (the SSAS processing commands).
  4. If a data source specified for this schedule has custom data source credentials specified for the job, then that data source has its connection string properties changed (in V1 we only support changing the “Username” and “Password” properties). This is done using an XMLA command to the data source.
  5. The system impersonates the Windows user for a second time and sends processing commands to the database. This causes the Engine to reach out to the sources and pull updated data into the database. The processing command is not sent to all tables/dimensions. The process commands are sent just to those objects that are dependent on the data source(s) included in the schedule.
  6. The data source credentials (if any) are reset.
  7. The database is saved back to the workbook.
  8. If it is not set already, the embedded connection’s property “Refresh data when opening the file” is set to True. This ensures that users immediately see the new data when the workbook is opened. It also means that snapshot generation will include the new data in the thumbnail.
  9. Impersonating the Windows account for a third time, the system saves the workbook back to the content database using the SharePoint binary OM. If the document library is a PowerPivot Gallery, then the OM fires its ‘new file’ event handler, which starts the snapshot generation process. The ‘new file’ event handler was added by the Gallery content type.
  10. The schedule’s status is updated with information about the job, i.e. its success, failure, error messages, etc.
  11. And finally, the database is converted to a read-only database so it is available to users immediately for querying. This makes the user’s first query as fast as possible and lessens the load on the SharePoint content database since the PowerPivot database is already loaded into memory. 

The end result is that a new, updated workbook has been stored back in the original workbook’s document library – the overall system is primed and ready to go when the workbook is viewed.
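
As a rough illustration of steps 4 through 6, the sketch below models the credential override, the selective processing, and the credential reset in plain Python. Everything in it (the `DataSource` and `Table` classes, `override_credentials`, `refresh`) is a hypothetical stand-in invented for this example; the real system does this work through XMLA commands against the SSAS engine, not through these names.

```python
from contextlib import contextmanager, nullcontext
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical stand-in for a data source embedded in the workbook."""
    name: str
    username: str = ""
    password: str = ""


@dataclass
class Table:
    """Hypothetical stand-in for a table/dimension in the embedded database."""
    name: str
    source: str              # name of the data source this table depends on
    processed: bool = False


@contextmanager
def override_credentials(source, username, password):
    """Swap Username/Password for the job (step 4), then restore them (step 6)."""
    saved = (source.username, source.password)
    source.username, source.password = username, password
    try:
        yield source
    finally:
        # In this sketch the reset runs even if processing raises.
        source.username, source.password = saved


def refresh(tables, sources, scheduled, custom_creds):
    """Process only the tables that depend on a scheduled data source (step 5)."""
    processed = []
    for src_name in scheduled:
        source = sources[src_name]
        creds = custom_creds.get(src_name)
        ctx = override_credentials(source, *creds) if creds else nullcontext(source)
        with ctx:
            for table in tables:
                if table.source == src_name:
                    table.processed = True   # stand-in for a processing command
                    processed.append(table.name)
    return processed


sources = {"Sales": DataSource("Sales", "svc", "old"), "HR": DataSource("HR")}
tables = [Table("Orders", "Sales"), Table("Customers", "Sales"), Table("Employees", "HR")]
done = refresh(tables, sources, scheduled=["Sales"],
               custom_creds={"Sales": ("refresh_user", "secret")})
print(done)
```

Only the two tables fed by the scheduled “Sales” source are touched; “Employees” is left alone because its data source is not part of the schedule. Wrapping the reset in `try`/`finally` is a choice made for this sketch, not a statement about how the product handles failures.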

 

A few observations:

  • Remember that to edit the schedule, you must enable it. I am always forgetting to check the “Enable” box at the top of the schedule. The radio buttons can still be selected if disabled, but the options will not expand. I cannot tell you how often I’ve sat staring at a page wondering what was wrong, only to realize that I forgot to enable the schedule.
  • The schedule is kept independent from the workbook itself. It is stored in the service application database indexed by the SPFile.FileID. This uniquely defines a file on the SharePoint farm. A file can be deleted and its schedule remains. Publish a new file and the schedule automatically picks up.
  • The schedule history (success or failure results w/ error messages) is also kept in the service application database so it can remain a long time after the file has been deleted. While not available from the end-user’s UI (unless they recreate the file), the history information is available via the Mgmt Dashboard – so the information can be in a report (again, long after the file has been deleted).
  • An important point to remember: You specify the Windows user here (one per schedule; pick your favorite of the three methods):

    image

    You specify the data source user here: (one per data source; again, pick your favorite one of three methods)

    image

    Get these two types of users mixed up and it will be very confusing.

  • Troubleshooting: The ULS logs are your friend. Search Codeplex (http://www.codeplex.com) or your favorite SharePoint web site and pick up a good viewer. You will use it *a lot*.
    Another good ULS viewer is at: http://code.msdn.microsoft.com/ULSViewer 
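
To make the observations about the schedule and its history concrete, here is a minimal in-memory model of a schedule store keyed by file ID rather than held inside the workbook. The `ScheduleStore` class and its methods are hypothetical names for illustration only; the real store is the service application database, whose schema is not described here. The sketch also assumes that a republished file resolves to the same ID as the one that was deleted.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduleStore:
    """Hypothetical model: schedules and run history live outside the workbook."""
    schedules: dict = field(default_factory=dict)   # file_id -> schedule settings
    history: dict = field(default_factory=dict)     # file_id -> list of run results

    def set_schedule(self, file_id, settings):
        self.schedules[file_id] = settings

    def record_run(self, file_id, succeeded, message=""):
        self.history.setdefault(file_id, []).append((succeeded, message))

    def on_file_deleted(self, file_id):
        # Deliberately a no-op: the schedule and its history outlive the file.
        pass

    def on_file_published(self, file_id):
        # A file published under a known ID picks up the existing schedule.
        return self.schedules.get(file_id)


store = ScheduleStore()
store.set_schedule("file-1", {"frequency": "daily"})
store.record_run("file-1", False, "logon failed")
store.on_file_deleted("file-1")
print(store.history["file-1"])            # history is still there after deletion
print(store.on_file_published("file-1"))  # schedule is picked up again
```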

Enjoy.

5 comments to Steps taken during a PowerPivot data refresh

  • Hi Dave,

    Thx for the great post, great to have read this stuff beforehand. But when I see all these impersonation steps I get a little nervous, having seen too many Kerberos issues with MOSS 2007 and SSAS/SSRS. Is this going to work the same in 2010?

    Thanks,
    Kasper de Jonge

  • I agree, but we aren’t using Kerberos. This is regular NTLM. The data refresh facility does a true Win32 logon using the uid/pwd provided in the schedule. You are right, I should have been more technically accurate by saying ‘logon’ instead of ‘impersonation’. Sorry.

    After the logon, then you get one hop from your AS machine to your data source. The only time we need Kerberos is if an additional hop is needed, e.g. you are querying against a data source that is really a linked server to a 3rd machine.

    BTW: Keep up the dialog . . . it is cool that folks are actually reading this stuff :-)
    Thanks.

  • >> “Is this going to work the same in 2010?”

    If by “2010”, you mean “SharePoint 2010”, then the answer is “Yes” and “No”. SP has implemented a new claims system based on the Geneva Framework. This means that you don’t need Kerberos *within* the farm; nor do you need Kerberos to talk to PowerPivot. However, since the backend Windows token that the claims system gives Excel Services (or others) is an Identity token, it means that Kerberos is still needed to talk to normal data sources, e.g. refreshing data in a workbook coming from SQL RDBMS or SQL SSAS servers. Kerberos isn’t needed w/ PowerPivot because we support the claims-based system natively like other SharePoint services.
