By Ryszard Jędraszyk


2016-10-29 09:06:56 8 Comments

I navigate with IE, do various things, then select all results option from a list and fire on click event. Once all results have been listed, I loop through their URLs, using the following code to gather data inside these URLs.

What can I do to improve speed of data scraping (apart from VB Script multithreading)?

With

  • IE navigation, visible = false and no images I had 1000 links extracted in 3min 45sec
  • msxml2.XMLHTTP in 3min 30sec
  • msxml2.serverXMLHTTP 1min 40sec - .setRequestHeader "Content-Type", "text/xml" and .setRequestHeader "Accept-Encoding", "compress, gzip" didn't affect performance

Is it better to declare and create objects within this Sub (meaning inside the loop for 42,000 links) or outside and make them public variables?

Public Sub PrintCompanyData()
Dim oDom As Object: Set oDom = CreateObject("htmlFile")
Dim htmlelePopUp As IHTMLElement
Dim unformattedData As String

With CreateObject("msxml2.serverXMLHTTP") 'CreateObject("msxml2.xmlhttp")
    .Open "GET", Link, False
    '.setRequestHeader "Content-Type", "text/xml"   
    '.setRequestHeader "Accept-Encoding", "compress, gzip"
    .send

    If .ReadyState = 4 And .Status = 200 Then

        oDom.body.innerHTML = .responseText

        With WS
            For Each htmlelePopUp In oDom.getElementsByTagName("tbody")
                unformattedData = htmlelePopUp.Children(htmlelePopUp.Children.Length - 1).innerText
                .Range("a" & ItemCount + 1).Value2 = Replace(unformattedData, Chr(10), vbNullString)
            Next htmlelePopUp
        End With

    End If

End With

Set oDom = Nothing

End Sub

2 comments

@Warren Hall 2017-09-26 03:24:14

The line of code ".Open "GET", Link, False" causes the process to be asynchronous. Change that to "True" in the third parameter and the process will run asynchronously. Apparently, VBA is underappreciated.

@Mathieu Guindon 2017-09-26 03:35:50

Runs asynchronously... Your first two sentences are contradictory, and it's not clear how the third relates to anything. Asynchronous... and then what? Sleep-wait until it finishes? VBA is single-threaded, surely you need an event sink to respond to asynchronous processing - and that changes OP's code quite drastically. Mind illustrating? Or was this post meant to be a comment on the OP?

@RubberDuck 2016-10-29 19:31:21

It's likely not a huge improvement, because the vast majority of this program's time will be spent going over the network, but there is a micro-optimization you can make here.

Instead of latebinding like this

With CreateObject("msxml2.serverXMLHTTP")

You can early bind your objects, eliminating the time COM takes to do a dynamic lookup. However, the time this saves will be minuscule compared to the total runtime.


I'm not bashing on VBA, I <3 VBA, but it's just not the right tool for the job if you need it to run faster than this. What you really need is a language that's capable of multi-threading and asynchronous programming to make the most of the CPU time while you're waiting for those Http requests to return. Sounds like it's time to learn some C# or F# to me. Even Powershell may be a good option.

@Ryszard Jędraszyk 2016-10-29 22:23:38

Using an early binding reduced the runtime by 2% on average.

Related Questions

Sponsored Content

1 Answered Questions

[SOLVED] Scraping a whole site using vba

  • 2017-05-28 17:16:56
  • SIM
  • 523 View
  • 4 Score
  • 1 Answer
  • Tags:   vba web-scraping

3 Answered Questions

[SOLVED] Web Scraping with VBA

  • 2015-12-02 16:30:52
  • ForrestA
  • 1674 View
  • 9 Score
  • 3 Answer
  • Tags:   vba web-scraping

1 Answered Questions

[SOLVED] Web scraping using proxy in vba

1 Answered Questions

[SOLVED] Web scraping VBA - Internet Explorer

1 Answered Questions

[SOLVED] Web scraping VBA and VB Script

1 Answered Questions

[SOLVED] Web scraping with VBA

2 Answered Questions

[SOLVED] Fast(er) web scraping with VBA

2 Answered Questions

[SOLVED] Java web scraping robots

Sponsored Content