Enabling the Bulk API with the Salesforce Data Loader
Part of the Salesforce Winter '10 release was something that the release notes and the folks at salesforce.com refer to as the Bulk API. You may also have read that the "Bulk Loader" is now available and that it will now take longer to count your records than to load them, blah, blah, blah...
Since I work with customers that have large numbers of records, I thought I would get the "Bulk Loader" and use it to tweak some 3 million Account records in a Sandbox. I figured this would be a good test and a way to get a feel for how much faster this tool is than the existing Data Loader.
So I went out to Salesforce and tried to find the "Bulk Loader," but was unsuccessful. I then searched Google to find out where to get the loader. I still couldn't find an actual installation link, but I did find a short video on the developer boards walking through how to use the "Bulk Loader." What was I missing? How could this tool be so useful yet so difficult to find? Was it some sort of Developer Edition-only release?
It turns out that the "Bulk Loader" is actually the same old "Data Loader" we've been using for years. There is simply a setting that must be enabled in the tool to use the Bulk API when inserting, upserting, and updating records. Doh!
I don't recall how I figured out that the "Bulk Loader" was the same as the Data Loader, but when I did, I was thinking "man, I am a tool." So I opened the Data Loader, went into the settings to enable the Bulk API and, guess what: no option. What the...
Oh, you have to install the latest and greatest "Data Loader" to get the "Bulk Loader." Now I'm just upset with myself that it took me so long to figure this out. And why wouldn't someone at salesforce.com simply write that the Bulk Loader and the Data Loader are synonymous? Better yet, why not refer to the damn thing as the Data Loader so there is absolutely no confusion?
Semantics aside, you can get the latest version of the Data Loader by logging into Salesforce and following this click path: Setup > Data Management (under Administration Setup) > Data Loader > Download the Data Loader. Then follow the installation steps.
To let the Data Loader insert, upsert, and update records using the Bulk API, launch the Data Loader, click the Settings menu heading, and choose the Settings option from the list. A new window is displayed. In that window, check the box next to the option reading "Use Bulk API for Insert, Update, and Insert," then click the "OK" button. The "Bulk Loader" (if that is what you want to call it) is now enabled. The image below shows the exact steps from this paragraph.
Alright, let's get to the results of my 3 million Account record updates. The data loaded very fast. However, I found one issue with the Bulk Loader: it doesn't load NULL values. More specifically, if you use the Bulk API with the Data Loader to clear a field value on some of your existing Salesforce records, that field update is not actually made.
For example, say you've got an Account record in Salesforce with a Site value of "Buffalo, New York 14202," and you want to clear that field (set it to null) using the Bulk API and the Data Loader. Your CSV file contains the Id for this account and a Site column, and the Site cell for that row is empty. When you load the data and check the record in Salesforce, the Site value will still be there.
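To make the example concrete, here is a small Python sketch that writes the kind of CSV described above: an Id column and a Site column whose cell is left empty. The Account Id is a made-up placeholder, and the helper name `build_update_csv` is hypothetical; the file is built in memory with the standard `csv` module rather than exported from a real org.

```python
import csv
import io

# Hypothetical 18-character Account Id; substitute a real Id from your org.
ACCOUNT_ID = "001000000000001AAA"


def build_update_csv(rows):
    """Return CSV text with an Id column and a Site column.

    Passing "" for the Site value produces an empty cell, which is the case
    where the Bulk API left the existing Site value in place.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Site"])
    for account_id, site in rows:
        writer.writerow([account_id, site])
    return buffer.getvalue()


csv_text = build_update_csv([(ACCOUNT_ID, "")])
print(csv_text, end="")
# Id,Site
# 001000000000001AAA,
```

The trailing comma on the data row is the "cell with no data in it" that the Bulk API option skips instead of clearing.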
What's the point? Well, if you know that some of the data being updated needs to be nulled out (cleared), then you cannot use the Bulk API feature of the Data Loader. You'll need to use the same old Data Loader functionality, where records are updated 200 at a time and you wait for the data to load as you always have. This also means you'll need to make sure the "Insert null values" option is checked in the Data Loader settings after you uncheck the "Use Bulk API for Insert, Update, and Insert" option.
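For a sense of scale, the sketch below splits records into batches of 200, the batch size mentioned above for the regular (non-Bulk API) path, and counts how many batches a 3 million Account update would need. It only illustrates the arithmetic and does not call Salesforce; `chunk` and `batches_needed` are hypothetical helpers.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Batch size cited for the regular (non-Bulk API) Data Loader path.
BATCH_SIZE = 200


def chunk(records: Sequence[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` records (hypothetical helper)."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def batches_needed(record_count: int, size: int = BATCH_SIZE) -> int:
    """Ceiling division: how many batches of `size` cover `record_count` records."""
    return -(-record_count // size)


print(batches_needed(3_000_000))            # 15000
print([len(b) for b in chunk(range(450))])  # [200, 200, 50]
```

At 200 records per batch, the 3 million record update works out to 15,000 separate batches, which is why the non-Bulk path feels so slow at this volume.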
Overall, I think the Bulk API for the Data Loader is a cool concept and a step in the right direction for salesforce.com. When I am absolutely sure that every record in a CSV file contains values that will overwrite the existing data in Salesforce, I will definitely use the Bulk API for updates. However, when I am updating millions of records from an external source system into Salesforce, I simply cannot take the time to determine whether any source values are being nulled out before loading my data. In that case I have to rely on the regular data loading capabilities of the Data Loader, and that is not very cool.
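If you do want to check a source extract before choosing the Bulk API option, a quick scan for empty cells is one way to do it. The Python sketch below is a hypothetical helper, not a Data Loader feature: it reads CSV text with the standard `csv` module and reports every (row, column) pair whose value is blank, that is, every place where a Bulk API update would, per the behavior described above, leave the existing Salesforce value untouched. The Ids in the sample are placeholders.

```python
import csv
import io
from typing import List, Tuple


def find_blank_cells(csv_text: str, skip: Tuple[str, ...] = ("Id",)) -> List[Tuple[int, str]]:
    """Return (data_row_number, column_name) for every empty or whitespace-only cell.

    Row numbers are 1-based and count data rows only (the header is excluded).
    Columns listed in `skip` (by default the Id column) are not checked.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    blanks: List[Tuple[int, str]] = []
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # Extra fields beyond the header land under the key None; ignore them.
            if column is None or column in skip:
                continue
            # DictReader yields None for missing trailing fields; treat that as blank too.
            if value is None or value.strip() == "":
                blanks.append((row_number, column))
    return blanks


sample = "Id,Site,Phone\n001A,Buffalo,555-0100\n001B,,555-0101\n001C,Albany,\n"
print(find_blank_cells(sample))  # [(2, 'Site'), (3, 'Phone')]
```

An empty result suggests the file is safe to load with the Bulk API option; any hit marks rows that need the regular Data Loader path with "Insert null values" checked.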
Of course, I'm not really any worse off than I was before the rollout of the Bulk API, so I shouldn't complain.