By Aloehart

2013-08-05 18:51:24 8 Comments

I've been using this site for a long time to find answers to my questions, but I wasn't able to find the answer on this one.

I am working with a small group on a class project. We're to build a small "game trading" website that lets people register, list a game they own and want to trade, and accept trades from others or request a trade.

We have the site functioning well ahead of schedule, so we're trying to add more to it. One thing I want to do myself is link the games that are entered to Metacritic.

Here's what I need to do. Using ASP.NET and C# in Visual Studio 2012, I need to get the correct game page on Metacritic, pull its data, parse it for specific parts, and then display the data on our page.

Essentially, when you choose a game you want to trade for, we want a small div to display with the game's information and rating. I want to do it this way to learn more and get something out of this project that I didn't have to start with.

I was wondering if anyone could tell me where to start. I don't know how to pull data from a page. I'm still trying to figure out if I need to try and write something to automatically search for the game's title and find the page that way or if I can find some way to go straight to the game's page. And once I've gotten the data, I don't know how to pull the specific information I need from it.

One of the things that doesn't make this easy is that I'm learning C++ along with C# and ASP.NET, so I keep getting my wires crossed. If someone could point me in the right direction it would be a big help. Thanks.


@jasniec 2019-11-01 16:04:12

I'd recommend WebsiteParser. It's based on HtmlAgilityPack (mentioned by Hanlet Escaño), but it makes web scraping easier with attributes and CSS selectors:

class PersonModel
{
    // the library's selector attributes go on each property (not shown here)
    public DateTime BirthDate { get; set; }
}

// ...

PersonModel person = WebContentParser.Parse<PersonModel>(html);

Nuget link

@Hanlet Escaño 2013-08-05 20:00:15

This small example uses HtmlAgilityPack and XPath selectors to get to the desired elements.

protected void Page_Load(object sender, EventArgs e)
{
    string url = "";  // URL of the game's page (blank in the original)
    var web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);

    // SelectNodes returns null when nothing matches, so each [0] below throws if the page layout changes.
    string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
    string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
    string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
}

An easy way to obtain the XPath for a given element is by using your web browser (I use Chrome) Developer Tools:

  • Open the Developer Tools (F12 or Ctrl + Shift + C on Windows or Command + Shift + C for Mac).
  • Select the element in the page that you want the XPath for.
  • Right click the element in the "Elements" tab.
  • Click on "Copy as XPath".

You can paste it exactly like that in C# (as shown in my code), but make sure to escape the quotes.

Make sure you add error handling, because web scraping breaks whenever the site changes the HTML structure of the page.


Per @knocte's suggestion, here is the link to the NuGet package for HtmlAgilityPack:

@Aloehart 2013-09-13 05:38:33

It took a while to realize that my browser pulled xpaths differently than what the html agility pack used, but once I figured out that issue it only took a few hours to get a functioning setup of what I wanted to do. Thank you for your help.

@Djeroen 2015-11-04 19:23:38

@Aloehart how does the html agility pack want the xpaths? i think i'm having the same problem

@cheesey_toastie 2016-02-25 10:55:49

Be warned: Chrome "fixes" HTML. Say you query an element in a table. If the source HTML doesn't have a <tbody> section, Chrome adds one to render the page and includes it in the XPath it gives you. In your code you'll NOT want the tbody part of the path. To check, view the source of the page and sense-check your XPath.
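A minimal sketch of that clean-up, assuming the only browser-added step in the copied path is tbody. The class and method names (XPathCleanup, StripBrowserTbody) are hypothetical, and whether the step should go depends on whether the raw page source really lacks a <tbody> tag, so check the source first.

```csharp
using System;
using System.Text.RegularExpressions;

static class XPathCleanup
{
    // Hypothetical helper: removes "tbody" steps (with or without an index such as tbody[1])
    // from an XPath copied out of the browser's developer tools.
    public static string StripBrowserTbody(string xpath)
    {
        // Match "/tbody" or "/tbody[n]" only when it is a whole path step,
        // that is, when it is followed by another "/" or by the end of the string.
        return Regex.Replace(xpath, @"/tbody(\[\d+\])?(?=/|$)", "");
    }
}
```

For example, StripBrowserTbody("//*[@id=\"main\"]/table/tbody/tr[2]/td[1]") gives //*[@id="main"]/table/tr[2]/td[1], which can then be passed to SelectNodes.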

@Tomi 2017-12-15 09:45:27

I prefer to use CSS selectors, so I go with Dcsoup.

@knocte 2019-07-05 06:10:12

how about updating this answer to mention the nuget package?

@Jason Goemaat 2015-11-17 12:26:16

I recommend Dcsoup. There's a NuGet package for it, and it uses CSS selectors, so it is familiar if you use jQuery. I've tried others, but it is the best and easiest to use that I've found. There's not much documentation, but it's open source and a port of the Java jsoup library, which has good documentation. (Documentation for the .NET API here.) I absolutely love it.

var timeoutInMilliseconds = 5000;
var uri = new Uri("");
var doc = Supremes.Dcsoup.Parse(uri, timeoutInMilliseconds);

// <span itemprop="ratingValue">86</span>
var ratingSpan = doc.Select("span[itemprop=ratingValue]");
int ratingValue = int.Parse(ratingSpan.Text);

// selectors match both critic and user scores
var scoreDiv = doc.Select("div.score_summary");
var scoreAnchor = scoreDiv.Select("a.metascore_anchor");
int criticRating = int.Parse(scoreAnchor[0].Text);
float userRating = float.Parse(scoreAnchor[1].Text);

@Jose A 2016-11-03 18:14:04

Awesome! Thanks a lot. I wonder why it doesn't have the Docs online... It would have been pretty slick that way!

@rTECH 2017-07-13 15:05:11

It is just what I needed. Strangely, however, this created culture-related problems for me: when I tried to float.Parse() an HTML text element that had a fraction (e.g., 7.5), I got a parsing error, because my culture settings are different (e.g., 7.5 is written 7,5). So, whenever I parsed fractions, I had to pass CultureInfo.InvariantCulture (from the System.Globalization namespace) as the optional format provider argument, and afterwards it worked fine.
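A small sketch of that fix, assuming the scraped text always uses a dot as the decimal separator. The class and method names (ScoreParsing, ParseScore) are made up for illustration.

```csharp
using System;
using System.Globalization;

static class ScoreParsing
{
    // Parses a score such as "7.5" the same way on every machine,
    // regardless of the current culture's decimal separator.
    public static float ParseScore(string text)
    {
        return float.Parse(text.Trim(), NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

With this helper, ParseScore("7.5") reads the dot as the decimal separator even on a machine whose culture writes the number as 7,5.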

@knocte 2019-07-05 10:20:13

have been trying complex selectors like table:nth-child(1) tr:nth-child(1) td:nth-child(1) or table:nth-child(1)>tbody>tr:nth-child(1)>td:nth-child(1) but they don't seem to work :(

@JeremiahDotNet 2013-08-05 20:13:23

I looked and Metacritic doesn't have an API.

You can use an HttpWebRequest to get the contents of a website as a string.

using System;
using System.IO;
using System.Net;
using System.Text;

string result = null;
string url = "";  // page URL (blank in the original)
WebResponse response = null;
StreamReader reader = null;

try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
    // handle error
}
finally
{
    // release the stream and the connection whether or not the request succeeded
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}

Then you can parse the string for the data that you want by taking advantage of Metacritic's use of meta tags. Here's the information they have available in meta tags:

  • og:title
  • og:type
  • og:url
  • og:image
  • og:site_name
  • og:description

The format of each tag is: <meta name="og:title" content="In a World...">
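One way to pull those values out of the downloaded string is a regular expression. This is only a sketch: it assumes double-quoted attributes with the name (or property) attribute placed before content, and the names OpenGraphReader and Read are hypothetical. A real HTML parser copes with markup variations far better than a regex does.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class OpenGraphReader
{
    // Matches <meta name="og:..." content="..."> and the property="og:..." variant.
    // Assumption: double quotes, and name/property comes directly before content.
    private static readonly Regex MetaTag = new Regex(
        "<meta\\s+(?:name|property)=\"(og:[^\"]+)\"\\s+content=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns a map from tag name (for example "og:title") to its content value.
    public static Dictionary<string, string> Read(string html)
    {
        var tags = new Dictionary<string, string>();
        foreach (Match m in MetaTag.Matches(html))
        {
            // HtmlDecode turns entities such as &amp; back into plain characters.
            tags[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return tags;
    }
}
```

Calling OpenGraphReader.Read(result) on the string fetched above would then let you look up entries such as tags["og:title"] or tags["og:description"], provided the page contains them in that form.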

@FistOfFury 2018-11-13 14:40:58

This is the best answer because it separates the actual "scraping" (fetching the HTML from the site) from the parsing. Parsing HTML can be done in a separate process.
