By dcrosta


2011-01-26 23:17:46 8 Comments

Do any of the AWS APIs/Services provide access to the product reviews for items sold by Amazon? I'm interested in looking up reviews by (ASIN, user_id) tuple. I can see that the Product Advertising API returns a URL to a page (for embedding in an IFRAME) containing the URLs, but I am interested in a machine-readable format of the review data, if possible.

8 comments

@Chandan Purohit 2017-01-12 12:43:05

As said by others above, amazon has discontinued providing reviews in its api. Howevever, i found this nice tutorial to do the same with python. Here is the code he gives, works for me! He uses python 2.7

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written as part of https://www.scrapehero.com/how-to-scrape-amazon-product-reviews-using-python/      
from lxml import html  
import json
import requests
import json,re
from dateutil import parser as dateparser
from time import sleep

def ParseReviews(asin):
    #This script has only been tested with Amazon.com
    amazon_url  = 'http://www.amazon.com/dp/'+asin
    # Add some recent user agent to prevent amazon from blocking the request 
    # Find some chrome user agent strings  here https://udger.com/resources/ua-list/browser-detail?browser=Chrome
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    page = requests.get(amazon_url,headers = headers).text

    parser = html.fromstring(page)
    XPATH_AGGREGATE = '//span[@id="acrCustomerReviewText"]'
    XPATH_REVIEW_SECTION = '//div[@id="revMHRL"]/div'
    XPATH_AGGREGATE_RATING = '//table[@id="histogramTable"]//tr'
    XPATH_PRODUCT_NAME = '//h1//span[@id="productTitle"]//text()'
    XPATH_PRODUCT_PRICE  = '//span[@id="priceblock_ourprice"]/text()'

    raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
    product_price = ''.join(raw_product_price).replace(',','')

    raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
    product_name = ''.join(raw_product_name).strip()
    total_ratings  = parser.xpath(XPATH_AGGREGATE_RATING)
    reviews = parser.xpath(XPATH_REVIEW_SECTION)

    ratings_dict = {}
    reviews_list = []

    #grabing the rating  section in product page
    for ratings in total_ratings:
        extracted_rating = ratings.xpath('./td//a//text()')
        if extracted_rating:
            rating_key = extracted_rating[0] 
            raw_raing_value = extracted_rating[1]
            rating_value = raw_raing_value
            if rating_key:
                ratings_dict.update({rating_key:rating_value})

    #Parsing individual reviews
    for review in reviews:
        XPATH_RATING  ='./div//div//i//text()'
        XPATH_REVIEW_HEADER = './div//div//span[contains(@class,"text-bold")]//text()'
        XPATH_REVIEW_POSTED_DATE = './/a[contains(@href,"/profile/")]/parent::span/following-sibling::span/text()'
        XPATH_REVIEW_TEXT_1 = './/div//span[@class="MHRHead"]//text()'
        XPATH_REVIEW_TEXT_2 = './/div//span[@data-action="columnbalancing-showfullreview"]/@data-columnbalancing-showfullreview'
        XPATH_REVIEW_COMMENTS = './/a[contains(@class,"commentStripe")]/text()'
        XPATH_AUTHOR  = './/a[contains(@href,"/profile/")]/parent::span//text()'
        XPATH_REVIEW_TEXT_3  = './/div[contains(@id,"dpReviews")]/div/text()'
        raw_review_author = review.xpath(XPATH_AUTHOR)
        raw_review_rating = review.xpath(XPATH_RATING)
        raw_review_header = review.xpath(XPATH_REVIEW_HEADER)
        raw_review_posted_date = review.xpath(XPATH_REVIEW_POSTED_DATE)
        raw_review_text1 = review.xpath(XPATH_REVIEW_TEXT_1)
        raw_review_text2 = review.xpath(XPATH_REVIEW_TEXT_2)
        raw_review_text3 = review.xpath(XPATH_REVIEW_TEXT_3)

        author = ' '.join(' '.join(raw_review_author).split()).strip('By')

        #cleaning data
        review_rating = ''.join(raw_review_rating).replace('out of 5 stars','')
        review_header = ' '.join(' '.join(raw_review_header).split())
        review_posted_date = dateparser.parse(''.join(raw_review_posted_date)).strftime('%d %b %Y')
        review_text = ' '.join(' '.join(raw_review_text1).split())

        #grabbing hidden comments if present
        if raw_review_text2:
            json_loaded_review_data = json.loads(raw_review_text2[0])
            json_loaded_review_data_text = json_loaded_review_data['rest']
            cleaned_json_loaded_review_data_text = re.sub('<.*?>','',json_loaded_review_data_text)
            full_review_text = review_text+cleaned_json_loaded_review_data_text
        else:
            full_review_text = review_text
        if not raw_review_text1:
            full_review_text = ' '.join(' '.join(raw_review_text3).split())

        raw_review_comments = review.xpath(XPATH_REVIEW_COMMENTS)
        review_comments = ''.join(raw_review_comments)
        review_comments = re.sub('[A-Za-z]','',review_comments).strip()
        review_dict = {
                            'review_comment_count':review_comments,
                            'review_text':full_review_text,
                            'review_posted_date':review_posted_date,
                            'review_header':review_header,
                            'review_rating':review_rating,
                            'review_author':author

                        }
        reviews_list.append(review_dict)

    data = {
                'ratings':ratings_dict,
                'reviews':reviews_list,
                'url':amazon_url,
                'price':product_price,
                'name':product_name
            }
    return data


def ReadAsin():
    #Add your own ASINs here 
    AsinList = ['B01ETPUQ6E','B017HW9DEW']
    extracted_data = []
    for asin in AsinList:
        print "Downloading and processing page http://www.amazon.com/dp/"+asin
        extracted_data.append(ParseReviews(asin))
        sleep(5)
    f=open('data.json','w')
    json.dump(extracted_data,f,indent=4)

if __name__ == '__main__':
    ReadAsin()

Here, is the link to his website reviews scraping with python 2.7

@Tyler Collier 2011-01-28 20:37:04

Update 2:

Please see @jpillora's comment. It's probably the most relevant regarding Update 1.

I just tried out the Product Advertising API (as of 2014-09-17), it seems that this API only returns a URL pointing to an iframe containing just the reviews. I guess you'd have to screen scrape - though I imagine that would break Amazon's TOS.

Update 1:

Maybe. I wrote the original answer below earlier. I don't have time to look into this right now because I'm no longer on a project concerned with Amazon reviews, but their webpage at Product Advertising API states "The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews..." as of 2011-12-08. So I hope someone looks into it and posts back here; feel free to edit this answer.

Original:

Nope.

Here is an intersting forum discussion about the fact including theories as to why: http://forums.digitalpoint.com/showthread.php?t=1932326

If I'm wrong, please post what you find. I'm interested in getting the reviews content, as well as allowing submitting reviews to Amazon, if possible.

You might want to check this link: http://reviewazon.com/. I just stumbled across it and haven't looked into it, but I'm surprised I don't see any mention on their site about the update concerning the drop of Reviews from the Amazon Products Advertising API posted at: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

@dcrosta 2011-01-29 22:26:25

Sigh, I was afraid that would be the answer.

@jpillora 2014-09-17 02:35:48

I just tried out the Product Advertising API (as of 2014-09-17), it seems that this API only returns a URL pointing to an iframe containing just the reviews. I guess you'd have to screen scrape - though I imagine that would break Amazon's TOS.

@zero 2014-09-18 15:21:05

well Tyler Collier - we need to know; just let us know : is it allowed or not!? can we put amazon-reviews on the page or not

@Cory Dee 2015-07-24 18:08:13

As of 24 Jul 15, comment by @jpillora is still correct.

@DataMiningEnthusiast 2015-06-02 08:05:07

You can use Amazon Product Advertising API. It has a Response Group 'Reviews' which you can use with operation 'ItemLookup'. You need to know ASIN i.e. unique item id of the product.

Once you set all the parameters and execute the signed URL, you will receive an XML which contains a link to customer reviews under "IFrameURL" tag.

Use this URL and use pattern searching in html returned from this url to extract the reviews. For each review in the html, there will be a unique review id and under that you can get all the data for that particular review.

@user1029829 2018-03-24 22:55:08

Well, that is exactly what he is saying. The only way to get the reviews is through iframe which is extremely inconvenient.

@thorwhalen 2014-02-24 11:42:45

I would use something like the answer of @mfs above. Unfortunately, his/her answer would only work for up to 10 reviews, since this is the maximum that can be displayed on one page.

You may consider the following code:

import requests

nreviews_re = {'com': re.compile('\d[\d,]+(?= customer review)'), 
               'co.uk':re.compile('\d[\d,]+(?= customer review)'),
               'de': re.compile('\d[\d\.]+(?= Kundenrezens\w\w)')}
no_reviews_re = {'com': re.compile('no customer reviews'), 
                 'co.uk':re.compile('no customer reviews'),
                 'de': re.compile('Noch keine Kundenrezensionen')}

def get_number_of_reviews(asin, country='com'):                                 
    url = 'http://www.amazon.{country}/product-reviews/{asin}'.format(country=country, asin=asin)
    html = requests.get(url).text
    try:
        return int(re.compile('\D').sub('',nreviews_re[country].search(html).group(0)))
    except:
        if no_reviews_re[country].search(html):
            return 0
        else:
            return None  # to distinguish from 0, and handle more cases if necessary

Running this with 1433524767 (which has significantly different number of reviews for the three countries of interest) I get:

>> print get_number_of_reviews('1433524767', 'com')
3185
>> print get_number_of_reviews('1433524767', 'co.uk')
378
>> print get_number_of_reviews('1433524767', 'de')
16

Hope it helps

@user2660784 2013-08-07 12:53:13

According to Amazon Product Advertizing API License Agreement (https://affiliate-program.amazon.com/gp/advertising/api/detail/agreement.html) and specifically it's point 4.b.iii:

You will use Product Advertising Content only ... to send end users to and drive sales on the Amazon Site.

which means it's prohibited for you to show Amazon products reviews taken trough their API to sale products at your site. It's only allowed to redirect your site visitors to Amazon and get the affiliate commissions.

@zero 2014-02-09 20:56:14

so you can use it to show it in a i-frame - arent you please set me straight if i am wrong

@mfs 2013-01-07 20:57:32

Here's my quick take on it - you easily can retrieve the reviews themselves with a bit more work:

countries=['com','co.uk','ca','de']
books=[
        '''http://www.amazon.%s/Glass-House-Climate-Millennium-ebook/dp/B005U3U69C''',
        '''http://www.amazon.%s/The-Japanese-Observer-ebook/dp/B0078FMYD6''',
        '''http://www.amazon.%s/Falling-Through-Water-ebook/dp/B009VJ1622''',
      ]
import urllib2;
for book in books:
    print '-'*40
    print book.split('%s/')[1]
    for country in countries:
        asin=book.split('/')[-1]; title=book.split('/')[3]
        url='''http://www.amazon.%s/product-reviews/%s'''%(country,asin)
        try: f = urllib2.urlopen(url)
        except: page=""
        page=f.read().lower(); print '%s=%s'%(country, page.count('member-review'))
print '-'*40

@Nate Reed 2013-07-06 16:02:05

Nice Python script. It works still.

@Sammaye 2013-08-21 15:31:08

I would note that recent effects in laws (thinking of craiglist here) means that if you do this without their permission and they intend to block individuals like you then you could be breaking the law by using this script

@zero 2013-11-14 18:44:36

works very well - for me

@gphilip 2012-12-08 19:42:54

Unfortunately you can only get an iframe URL with the reviews, the content itself is not accessible.

Source: http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/DG/CHAP_MotivatingCustomerstoBuy.html#GettingCustomerReviews

@Bryce 2012-02-26 09:10:01

Here's the official word on the matter:

Dear Product Advertising API Developer,

On November 8, 2010 the Reviews response group of the Product Advertising API will no longer return customer reviews content and instead will return a link to customer reviews content hosted on Amazon.com. You will be able to display customer reviews on your site using that link. Please refer to the Product Advertising API Developer guide found here for more details. The Reviews response group will continue to function as before until November 8 and the new link to customer reviews is available to you now through the Product Advertising API as well.

@Daniel Magnusson 2012-04-25 18:20:23

This is so useless.. ;(

@zero 2014-09-18 15:20:25

just let us know : is it allowed or not!? can we put amazon-reviews on the page or not

Related Questions

Sponsored Content

6 Answered Questions

[SOLVED] Amazon products API - Looking for basic overview and information

0 Answered Questions

0 Answered Questions

Amazon Product Advertising API returns price zero

2 Answered Questions

How to estimates monthly/daily sales of an item using Amazon Advertising API

3 Answered Questions

How to find out if Amazon itself is selling an item through API?

2 Answered Questions

0 Answered Questions

How better request Amazon Product Advertising API Web service?

3 Answered Questions

1 Answered Questions

Sponsored Content