What is the best way to lookup specific data in a set of lines or columns in VBA  Episode II
We all usually do for (and sometimes for each) loops, and by doing so we largely underutilize some very useful features of Excel, especially the lookup worksheet functions which are very good at fetching things.
This article is a followup to the previous similar article on the best way to find a value in a range or an array in Excel with VBA. We all usually do for (and sometimes for each) loops, and by doing so we largely underutilize some very useful features of Excel, especially the lookup worksheet functions which are very good at fetching things. The amazing comments and replies I got on Linkedin for this article pushed me to continue the previous article and try some of the solutions that were offered by my peer VBA developers. I honestly had no idea what the result would be, and I didn’t even know beforehand that some of these approaches were possible.
Let me redescribe the dataset and search scenario we have been looking at in the previous article. The A:B columns in Sheet2 link a capital city to its country. In Sheet3, column A contains a list of capital cities and we want to allocate a value (a country) to each of these capital cities. To make the timing differences easier to see, the search will be repeated 21 times.
Using a more precise timing function
We are getting serious here. We can’t use anymore the simple Time() VBA function that we previously used, as the precision is to the second. And trust me our journey to optimization today will bring us way below one second! What does a biologist do when he needs to looks at very small things? He takes a microscope! We need to get our own microscope.
Let me introduce for those who don’t know it yet the GetTickCount function from the Kernel32 library. Note that this only works well on 32bit systems. But a quick search on Google should show you a way to make it work on 64bit systems!
Just add this line at the very beginning of the module:
Private Declare Function GetTickCount Lib "kernel32" () As Long
GetTickCount() works similarly to the Time() function, except it counts roughly the number of milliseconds elapsed since your system has started (with a max of 49 days).
The common VBA search methods we previously tried
I retimed the methods I used in the previous article, so that we can precisely compare with the new approaches I will be describing today. If you haven’t done it yet, I encourage you to go through this other article. Note that the run times are different (shorter) than in the previous article. This is linked to my current system. For some reason the for loops are much more competitive all of a sudden.
If working on the range object:
For loop: 10 seconds
For each loop: 5 seconds
Find method: 4 seconds
Vlookup function: 2 seconds
If working on a VBA array:
For loop: 4 seconds
For each loop: 5 seconds
Vlookup function: 7 seconds (so the vlookup function on a VBA array is not always faster than a for loop…)
Using the features of the collection VBA object to lookup values
First we need to build our collection with a string key:
Public DataCollection As Collection
Sub MAKE_COLLECTION()
Set DataCollection = New Collection
For Each c In Sheets("Sheet2").Range("A1:A500")
If c.Value <> "" Then
DataCollection.Add c.Offset(0, 1).Value, Key:=c.Value
End If
Next
End Sub
Now let us use the collection approach which is another rather mainstream one.
Sub test_collection()
On Error Resume Next
t = GetTickCount()
For i = 0 To 20
For Each c In Sheets("Sheet3").Range("A1:A500")
c.Offset(0, 1).Value = DataCollection(c.Value)
Next
Next
Debug.Print (GetTickCount()  t)
End Sub
Run time: 780 milliseconds
Wow, looks like we already crushed the competition! Holdon to your seats because we are only becoming serious now.
Using the range array version of lookup functions
Now I will try the solution proposed by Peter B. using named formulas, named ranges, and the lookup function. This approach has some prerequisites here:
Sub test_Peter()
t = GetTickCount()
For i = 0 To 20
[Etat] = [trouver.Etat]
'trouver.Etat is a named formula
'= IF(LEN(Capitale), LOOKUP(Capitale,data), "" )
'Capitale is a named range with the search values
'data is a named range of two columns Capital  Country
'Etat is named range where found values are placed
Next
Debug.Print (GetTickCount()  t)
End Sub
Peter solution’s run time: 202 milliseconds
Ok, wow that was kickass fast!
Let us break down this solution in a more common expression. Evaluate simply computes the named formula. Using the brackets is similar to using the range method. For example here: [Etat] = Range("Etat"). The LOOKUP function is similar to vLookup and hLookup, do have a look at the specifications of the Lookup function.
So rewriting this solution in a way that is comparable to other solutions, we have:
Sub test_Peter2()
t = GetTickCount()
Set SearchList = Sheets("Sheet3").Range("A1:A500")
Set DataSet = Sheets("Sheet2").Range("A1:B500"
For i = 0 To 20
Sheets("Sheet3").Range("B1:B500") = WorksheetFunction.VLookup(SearchList, DataSet, 2, 1)
Next
Debug.Print (GetTickCount()  t)
End Sub
Run time: 260 milliseconds
Note that Peter’s solution is still slightly faster. I suppose that using named ranges instead of actual range coordinates is a little faster to execute.
There are two very interesting takeaway here:
 Using array lookup functions is very efficient on large datasets,
 Using “binary” lookup instead of “linear” lookup is more efficient on large datasets.
I expect that most people are puzzled, just like I was when I did some search for this article. Let us have a look at what these sentences actually mean.
The array version of the lookup function in Excel VBA
Few people know it (and I included the prearticle version of me here), but the lookup, vlookup and hlookup can also be used on arrays instead of single values. Don’t misunderstand I am not talking about VBA arrays here!
In Excel you can use vlookup in two ways:
 Single cell function: VLOOKUP(OneCell, SearchRange, ColumnNumber, 0)
 Matrix function: {VLOOKUP(SeveralCells, SearchRange, ColumnNumber, 0)}
The binary lookup algorithm
The Binary search was recommended as a comment to my previous article by Michael B., and I have to admit I didn't know what it really was before.
The binary lookup algorithm is a fancy name for what an Excel user would call a nonexact match with the lookup functions. Remember that last parameter in VLOOKUP that you nearly always set to FALSE (or 0)? The 0 stands for the exact match algorithm while the 1 stands for, well, nonexact match.
 Exact Match (Linear search): VLOOKUP(OneCell, SearchRange, ColumnNumber, 0)
 NonExact Match (binary search): VLOOKUP(OneCell, SearchRange, ColumnNumber, 1)
What is the point of a function that doesn’t match the value you are looking for you may wonder? It actually matches with the value that is closest to the value you are looking for.
 The main drawback of this nonexact lookup is that the searched dataset (specifically the column containing the searched values) has to be sorted in ascending order.
 The main advantage is that the nonexact match is much faster to run on very large datasets.
How can it be faster? Instead of looking at all the values linearly, the algorithm will look at the value in the middle of the dataset and compare it to the value you are looking for. For example if middle value > searched value, then it means that the value we are looking for is in the first half of the dataset, and the other half can be discarded, and so on until the “middle value” is equal to the searched value or the dataset can’t be split further. Now you see why the dataset has to be sorted!
Using the binary lookup worksheet functions in VBA
Now we will dig further into this Binary search analysis. Because the search times are really short and the resolution (even that of the millisecond timer) isn’t that good, we need to increase the number of search and reduce the other elements impacting on the search time like worksheet interactions.
Now let us think for a second about what the Binary search algorithm we just described really means. Obviously on short search datasets, the performance difference between the Linear and Binary search is tight. But when the size of the search dataset increases, the binary search algorithm clearly dominates the Linear search algorithm. After only 5 iterations, the Binary search only has to search the remaining $0.5^5 = 3.1%$ of the whole dataset, while the linear search is sadly only 5 rows down the full list!
We can even try and generalize this a little: we want to find the number I of iterations required to reduce the number of items left to search to 1 (finding the result), let there be X records to search through.
$ X . 0.5^I = 1 $ => $ 0.5^I = 1/X $ => $ e^{I . ln(0.5)} = 1/X $ => $ I . ln(0.5) = ln(1/X) $=> $ I = {ln(1/X)}/ {\ln(0.5)} $
Running this calculus for 1,000,000 rows in the dataset, it should only take no more than 20 iterations to find the result, so up to 50,000x faster than the linear search! How insane is that?
To convince us, let us make a little experiment. I have created a silly dataset with 150k rows with integers in the first column sorted in ascending order. A list of 10k items (that exist in the dataset) will be searched in the dataset. The dataset will be progressively enlarged from 10k to 150k lines.
Search dataset 
Search list 

The search dataset will be gradually enlarged from 10k rows to 150k. The dataset look like this:

Find the values associated to a list of 10k integers ranging from 1 to the maximum key number in the currently searched dataset.

The search will be performed with both the linear and binary algorithms.
At the end we compare both search results to be absolutely certain the binary lookup algorithm didn’t “cheat”.
Here comes the code:
Sub gradualIncreaseBinSearch()
Dim seed As Integer
Dim SearchArea As Range, LookedUpValues As Range
seed = 1
Dim ArrayOfLookedUpValues(1 To 10000, 1 To 1) As Long
Dim ArrayOfFoundValuesLin(1 To 10000, 1 To 1) As String
Dim ArrayOfFoundValuesBin(1 To 10000, 1 To 1) As String
For i = 1 To 15
Set SearchArea = Sheets("Sheet5").Range("A2:B" & i & "0000")
' Set the list of values to search
For j = 1 To UBound(ArrayOfLookedUpValues, 1)
ArrayOfLookedUpValues(j, 1) = Rnd(seed) * (WorksheetFunction.Max(Application.WorksheetFunction.Index(SearchArea, 0, 1)))
Next
Debug.Print SearchArea.Address
Sheets("Sheet5").Range("i2:i2001") = ArrayOfLookedUpValues
Set LookedUpValues = Sheets("Sheet5").Range("i2:i2001")
On Error Resume Next
'Binary search
t = GetTickCount()
For k = 1 To UBound(ArrayOfLookedUpValues, 1)
ArrayOfFoundValuesBin(k, 1) = WorksheetFunction.VLookup(ArrayOfLookedUpValues(k, 1), SearchArea, 2, 1)
'LookedUpValues.Offset(0, 1) = WorksheetFunction.VLookup(ArrayOfLookedUpValues, SearchArea, 2, 0)
Next
binTime = (GetTickCount()  t)
t = GetTickCount()
For k = 1 To UBound(ArrayOfLookedUpValues, 1)
ArrayOfFoundValuesLin(k, 1) = WorksheetFunction.VLookup(ArrayOfLookedUpValues(k, 1), SearchArea, 2, 0)
'LookedUpValues.Offset(0, 1) = WorksheetFunction.VLookup(ArrayOfLookedUpValues, SearchArea, 2, 1)
Next
LinTime = (GetTickCount()  t)
Sheets("Sheet6").Cells(i + 1, 1) = i
Sheets("Sheet6").Cells(i + 1, 2) = binTime
Sheets("Sheet6").Cells(i + 1, 3) = LinTime
'check identical found values
Sheets("Sheet6").Cells(i + 1, 4) = "Identical values"
For j = 1 To UBound(ArrayOfLookedUpValues, 1)
If ArrayOfFoundValuesLin(j, 1) <> ArrayOfFoundValuesBin(j, 1) Then
Sheets("Sheet6").Cells(i + 1, 4) = "NOT Identical values"
Exit For
End If
Next
DoEvents
Next
End Sub
Now behold the resulting graph:
What do we see? The Binary search is nearly not impacted by the length of the dataset, while the Linear search is (linearly) impacted by the length of the dataset.
References
 On get tick count : https://msdn.microsoft.com/enus/library/windows/desktop/ms724408%28v=vs.85%29.aspx
The Linkedin groups:
 Excel Developers: https://www.linkedin.com/groups/58704
 Excel and VBA users: https://www.linkedin.com/groups/82527
 Special thanks to Michael B., Peter B., Richard D., Hans H. and the many others who contributed. I haven’t been able to test all the solutions (dictionary, autofilter…) but I will try to remedy this as soon as I have some time!