<pedrocorreia.net ⁄>

<A So Called IMDB API ⁄ >




2008-11-08 | mySnippets | asp.net | clicks: 22941



As far as I know, IMDB doesn't provide an official API. There is a Pro version, but I don't think it gives you access to their database. I've heard that IMDB plans to do something in this area, but for the moment there's no official solution.



Almost everyone knows IMDB: it has a huge movie database where you can look up plenty of information about a movie, such as release date, director, producer, cast, user rating, and user comments.



If, for instance, you're building a personal movie database and want to keep track of all this information, you could simply copy and paste it, but that costs time, and the longer the movie list, the longer it takes.


So let's build a small snippet that fetches all that info for us.
We'll use a technique called screen scraping: our application downloads the HTML page that corresponds to our movie and pulls the data out of it.

In this snippet I only implemented the direct link, i.e., we have to know the movie ID ("ttXXXXXXX") so that we can read the page directly.
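As a rough sketch of what "knowing the movie ID" means in practice: the ID is "tt" followed by seven digits, and the page lives under /title/<id>/ on www.imdb.com (both match what the class further down does). The module and function names here are hypothetical, and "tt0123456" is a made-up ID used only for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MovieIdSketch

    ' Hypothetical helper: True when the ID is "tt" followed by exactly seven digits.
    Function IsValidMovieId(ByVal movieId As String) As Boolean
        If movieId Is Nothing Then Return False
        Return Regex.IsMatch(movieId, "^tt[0-9]{7}$")
    End Function

    ' Hypothetical helper: builds the direct link for a movie ID.
    Function BuildMovieUrl(ByVal movieId As String) As String
        Return String.Format("http://www.imdb.com/title/{0}/", movieId)
    End Function

    Sub Main()
        Console.WriteLine(IsValidMovieId("tt0123456")) ' True
        Console.WriteLine(IsValidMovieId("123456"))    ' False
        Console.WriteLine(BuildMovieUrl("tt0123456"))  ' http://www.imdb.com/title/tt0123456/
    End Sub

End Module
```

Checking the ID before building the URL keeps arbitrary user input out of the request.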






We'll examine the HTML by looking for specific tags, for instance:

<title>Wo hu cang long (2000)</title>

We know that the title tag in the page header holds our movie's name.





To retrieve the runtime:

<div class="info">
<h5>Runtime:</h5>
120 min
</div>


And so on...




As you can see, there's a pattern: an opening tag and a closing tag, and you want the content between them. This is the generic case; to get the actors list, for instance, we need a different approach, though the core is very similar.
You can do this with regular expressions, but I'll use String operations.
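The pattern can be sketched with plain String operations like this. It is a minimal illustration, not the class itself: ExtractBetween and ExtractAll are hypothetical names, and the sample page is a hand-made string (the actor cells are invented) rather than real IMDB output.

```vb
Imports System
Imports System.Collections.Generic

Module ExtractSketch

    ' Generic case: text between the first initTag and the next endTag,
    ' or "" when either marker is missing.
    Function ExtractBetween(ByVal html As String, ByVal initTag As String, ByVal endTag As String) As String
        Dim startAt As Integer = html.IndexOf(initTag, StringComparison.Ordinal)
        If startAt = -1 Then Return ""
        startAt += initTag.Length
        Dim endAt As Integer = html.IndexOf(endTag, startAt, StringComparison.Ordinal)
        If endAt = -1 Then Return ""
        Return html.Substring(startAt, endAt - startAt).Trim()
    End Function

    ' Repeated case (e.g. a list of actors): same idea, but keep searching
    ' from the end of the previous match.
    Function ExtractAll(ByVal html As String, ByVal initTag As String, ByVal endTag As String) As List(Of String)
        Dim results As New List(Of String)()
        Dim pos As Integer = 0
        Do
            Dim startAt As Integer = html.IndexOf(initTag, pos, StringComparison.Ordinal)
            If startAt = -1 Then Exit Do
            startAt += initTag.Length
            Dim endAt As Integer = html.IndexOf(endTag, startAt, StringComparison.Ordinal)
            If endAt = -1 Then Exit Do
            results.Add(html.Substring(startAt, endAt - startAt).Trim())
            pos = endAt + endTag.Length
        Loop
        Return results
    End Function

    Sub Main()
        ' Hand-made sample page, for illustration only
        Dim page As String = "<title>Wo hu cang long (2000)</title>" & _
            "<div class=""info""><h5>Runtime:</h5> 120 min </div>" & _
            "<td class=""nm"">Actor A</td><td class=""nm"">Actor B</td>"

        Console.WriteLine(ExtractBetween(page, "<title>", "</title>"))          ' Wo hu cang long (2000)
        Console.WriteLine(ExtractBetween(page, "<h5>Runtime:</h5>", "</div>"))  ' 120 min
        Console.WriteLine(String.Join("; ", ExtractAll(page, "<td class=""nm"">", "</td>").ToArray())) ' Actor A; Actor B
    End Sub

End Module
```

Returning an empty string when a marker is missing means a field that isn't on the page simply comes back blank instead of raising an exception.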


Please note that, since we're using screen scraping, this method can become outdated at any time: if IMDB's web developers change the HTML, you'll have to tweak your code too. Let's just cross our fingers ^_^''
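One way to notice such a change early is to check that the markers the parser relies on are still present in the page. This is only a sketch under assumptions: FindMissingMarkers is a hypothetical helper, and the page and marker list are small hand-made samples.

```vb
Imports System
Imports System.Collections.Generic

Module MarkerCheckSketch

    ' Hypothetical helper: returns the markers that do not occur in the page.
    Function FindMissingMarkers(ByVal html As String, ByVal markers As String()) As List(Of String)
        Dim missing As New List(Of String)()
        For Each marker As String In markers
            If html.IndexOf(marker, StringComparison.Ordinal) = -1 Then missing.Add(marker)
        Next
        Return missing
    End Function

    Sub Main()
        ' Hand-made sample page that lacks the Plot block
        Dim page As String = "<title>Wo hu cang long (2000)</title><h5>Runtime:</h5> 120 min"
        Dim markers As String() = {"<title>", "<h5>Runtime:</h5>", "<h5>Plot:</h5>"}

        For Each marker As String In FindMissingMarkers(page, markers)
            Console.WriteLine("Marker not found: " & marker) ' Marker not found: <h5>Plot:</h5>
        Next
    End Sub

End Module
```

A missing marker does not always mean the layout changed (some movies have no tagline, for example), so treat the result as a hint rather than an error.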




Here's the code:


StyleSheet.css (the file name Default.aspx links to):
    *{padding: 0; margin: 0;}

    body{
        font-family: Verdana, Tahoma, Arial, sans-serif;
        color: #000;background: #fff;font-size: 62.5%;
        text-align: left;margin-top: .5em;
    }
    div#container,div#search{
        margin: 0 auto;overflow:hidden;
        width: 80em;padding: .8em;
        border: solid 0.6em #3F4C6B;
    }
    div#search{text-align: center;}

    label, span, strong{font-size: 1.2em;color: #D01F3C;}
    strong{color: #356AA0 !important; font-weight: 900;}

    input{
        border: solid .1em #FF7400;width: 10em;
        color: #B02B2C;background-color: #FFFF88;
    }

    input#txtFilmID{
        font-size: 2.8em;border-color: #FF7400;
        text-align: center;
    }

    input#btnFilmViewStats{
        background-color: #356AA0;color: #F9F7ED;
        font-size: 1.8em;
    }

    div#FilmInfo{
        float: left;text-align: justify;
        padding: .6em;line-height: 1.6em;width: 65em;
        border: solid 0.4em #6BBA70;
    }

    div#FilmPoster{
        float: right;
        padding: .6em;right: .8em;top: .8em;
        border: solid 0.4em #6BBA70;
    }

    a{text-decoration: none;color: #FF7400;}
    a:hover{text-decoration: underline;}



SoCalledImdbApi.vb is our main class: it grabs the web page's HTML and parses it. In this snippet I'll use VB.NET, but porting it to C# won't be difficult.
    Imports Microsoft.VisualBasic
    Imports System.Net
    Imports System.IO
    Imports System.Text
    Imports System.Text.RegularExpressions

    ''' <summary>
    ''' Class responsible for screen scraping www.imdb.com
    ''' </summary>
    ''' <author>pedrocorreia.net</author>
    Public Class SoCalledImdbApi

        Private Const _imdb_url As String = "http://www.imdb.com"
        Private _movie_url As String = ""
        Private _movie_id As String = ""
        Private _html As String = ""

        ''' <summary>
        ''' Constructor
        ''' </summary>
        ''' <param name="movie_id">Movie ID, must have format "tt0000000"</param>
        Public Sub New(ByVal movie_id As String)
            If Me._IsValid(movie_id) Then
                MovieID = movie_id
                Me._SetMovieURL()
                Me.FetchInfo()
            Else
                Throw New ArgumentException("INVALID_MOVIE_ID_FORMAT")
            End If
        End Sub

        ''' <summary>
        ''' Get Webpage HTML
        ''' </summary>
        Private Sub FetchInfo()
            'pedido = request, resposta = response
            Dim pedido As WebRequest = WebRequest.Create(Me._movie_url)
            'Using closes the response and the reader even if reading fails
            Using resposta As HttpWebResponse = CType(pedido.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(resposta.GetResponseStream())
                    Me._html = reader.ReadToEnd()
                End Using
            End Using
        End Sub

        ''' <summary>
        ''' Getter/ Setter movie ID
        ''' Format: tt0000000
        ''' </summary>
        ''' <value>Movie ID, format "tt0000000"</value>
        ''' <returns>Movie ID, format "tt0000000"</returns>
        Public Property MovieID() As String
            Get
                Return Me._movie_id
            End Get

            Set(ByVal value As String)
                Me._movie_id = value
            End Set
        End Property

        ''' <summary>
        ''' Get Movie Title
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieTitle() As String
            GetMovieTitle = Me._GetGenericInfo("<title>", "</title>")
        End Function

        ''' <summary>
        ''' Get Poster
        ''' </summary>
        ''' <returns>String</returns>
        ''' <remarks>Returns complete img tag</remarks>
        Public Function GetMoviePoster() As String
            GetMoviePoster = Me._GetGenericInfo("<div class=""photo"">", "</div>")
        End Function

        ''' <summary>
        ''' Get Language
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieLanguage() As String
            GetMovieLanguage = Me._GetGenericInfo("<h5>Language:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get Genre
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieGenre() As String
            GetMovieGenre = Me._GetGenericInfo("<h5>Genre:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get Tagline
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieTagline() As String
            GetMovieTagline = Me._GetGenericInfo("<h5>Tagline:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get Director
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieDirector() As String
            GetMovieDirector = Me._GetGenericInfo("<h5>Director:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get Writers
        ''' </summary>
        ''' <returns>String</returns>
        ''' <remarks>Tries each known heading variant until one matches</remarks>
        Public Function GetMovieWriters() As String
            Dim writers As String = Me._GetGenericInfo("<h5>Writers:</h5>", "</div>")

            If String.IsNullOrEmpty(writers) Then
                writers = Me._GetGenericInfo("<h5>Writers <a href=""/wga"">(WGA)</a>:</h5>", "</div>")
            End If

            If String.IsNullOrEmpty(writers) Then
                writers = Me._GetGenericInfo("<h5>Writer:</h5>", "</div>")
            End If

            If String.IsNullOrEmpty(writers) Then
                writers = Me._GetGenericInfo("<h5>Writer <a href=""/wga"">(WGA)</a>:</h5>", "</div>")
            End If

            GetMovieWriters = writers
        End Function

        ''' <summary>
        ''' Get plot
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMoviePlot() As String
            GetMoviePlot = Me._GetGenericInfo("<h5>Plot:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get Runtime
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieRuntime() As String
            GetMovieRuntime = Me._GetGenericInfo("<h5>Runtime:</h5>", "</div>")
        End Function

        ''' <summary>
        ''' Get user comments (short description)
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieShortUserComment() As String
            GetMovieShortUserComment = Me._GetGenericInfo("<h5>User Comments:</h5>", "<a")
        End Function

        ''' <summary>
        ''' Get user comments (long description)
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieFullUserComment() As String
            GetMovieFullUserComment = Me._GetGenericInfo("Author:", "<div class=""yn""", True)
        End Function

        ''' <summary>
        ''' Get imdb movie link
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieLink() As String
            GetMovieLink = String.Format("<a href='{0}' target='_blank'>{0}</a>", Me._movie_url)
        End Function

        ''' <summary>
        ''' Get Release Date
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetMovieYear() As String
            GetMovieYear = Me._GetGenericInfo("<h5>Release Date:</h5>", "<a")
        End Function

        ''' <summary>
        ''' Get User Rating
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetUserRating() As String
            GetUserRating = Me._GetGenericInfo("<div class=""meta"">", "</div>")
        End Function

        ''' <summary>
        ''' Get cast list with the actors' real names and character names
        ''' </summary>
        ''' <returns>String</returns>
        Public Function GetCast() As String
            Dim init_tag_actor As String = "<td class=""nm"">", init_tag_char As String = "<td class=""char"">"
            Dim end_tag As String = "</td>"
            Dim aux_str_actor As String = "", aux_str_char As String = ""
            Dim str As New StringBuilder()
            Dim cur_pos As Integer = 0, start_at As Integer = 0, end_at As Integer = 0
            Dim size_init_tag_actor As Integer = init_tag_actor.Length
            Dim size_init_tag_char As Integer = init_tag_char.Length

            While cur_pos <> -1
                'get actor's real name, searching from the end of the previous match
                cur_pos = Me._html.IndexOf(init_tag_actor, end_at)
                aux_str_actor = "" : aux_str_char = ""

                If cur_pos > -1 Then
                    start_at = cur_pos + size_init_tag_actor
                    end_at = Me._html.IndexOf(end_tag, start_at)
                    If end_at = -1 Then Exit While 'unclosed cell: stop instead of throwing
                    aux_str_actor = _StripTags(Me._html.Substring(start_at, (end_at - start_at)))
                End If

                'get actor's character name
                cur_pos = Me._html.IndexOf(init_tag_char, end_at)
                If cur_pos > -1 Then
                    start_at = cur_pos + size_init_tag_char
                    end_at = Me._html.IndexOf(end_tag, start_at)
                    If end_at = -1 Then Exit While 'unclosed cell: stop instead of throwing
                    aux_str_char = _StripTags(Me._html.Substring(start_at, (end_at - start_at)))
                End If

                If Not String.IsNullOrEmpty(aux_str_actor) OrElse Not String.IsNullOrEmpty(aux_str_char) Then
                    str.Append(String.Format("{0} ({1}); ", aux_str_actor, aux_str_char))
                End If
            End While

            GetCast = str.ToString()
        End Function

        ''' <summary>
        ''' Get generic info
        ''' </summary>
        ''' <param name="init_tag">Initial Tag</param>
        ''' <param name="end_tag">Ending Tag</param>
        ''' <param name="include_br">Include tag br ?</param>
        ''' <returns>String, empty when either tag is not found</returns>
        Private Function _GetGenericInfo(ByVal init_tag As String, ByVal end_tag As String, Optional ByVal include_br As Boolean = False) As String
            Dim size_init_tag As Integer = init_tag.Length

            Dim start_at As Integer = Me._html.IndexOf(init_tag)
            If start_at = -1 Then Return ""

            start_at = start_at + size_init_tag
            Dim end_at As Integer = Me._html.IndexOf(end_tag, start_at)
            If end_at = -1 Then Return "" 'no closing tag: Substring would throw

            _GetGenericInfo = _StripTags(Me._html.Substring(start_at, (end_at - start_at)), include_br)
        End Function

        ''' <summary>
        ''' Strip all tags, except img and anchor
        ''' </summary>
        ''' <param name="txt">String</param>
        ''' <param name="include_br">Keep br tags ?</param>
        ''' <returns>String</returns>
        Private Function _StripTags(ByVal txt As String, Optional ByVal include_br As Boolean = False) As String
            'normalize <br> to <br/> and convert <p> to <br/>
            Dim str As String = txt.Trim().Replace("<p>", "<br/>").Replace("<br>", "<br/>")

            Dim strip_tags As String = "b,i,u,title,link,small,p,div"
            Dim splitted_str As String() = strip_tags.Split(New [Char]() {","c})
            For Each s As String In splitted_str
                str = str.Replace(String.Format("<{0}>", s), String.Empty)
                str = str.Replace(String.Format("</{0}>", s), String.Empty)
            Next

            If str.EndsWith("<br/>") Then str = str.Remove(str.LastIndexOf("<br/>"))

            If include_br = False Then str = str.Replace("<br/>", ". ")

            'imdb links are relative, let's make them absolute
            If str.Contains("<a") Then
                str = str.Replace("href=""", "href=""" & SoCalledImdbApi._imdb_url)
                str = str.Replace("<a", "<a target=""_blank"" ")
            End If

            'a few special cases, like synopsis and ratings, are relative to the movie page
            Dim aux_str As String = String.Format("{0}synopsis", SoCalledImdbApi._imdb_url)
            If str.Contains(aux_str) Then str = str.Replace(aux_str, String.Format("{0}synopsis", Me._movie_url))
            aux_str = String.Format("{0}ratings", SoCalledImdbApi._imdb_url)
            If str.Contains(aux_str) Then str = str.Replace(aux_str, String.Format("{0}ratings", Me._movie_url))

            _StripTags = str
        End Function

        ''' <summary>
        ''' Is movie ID valid?
        ''' </summary>
        ''' <param name="movie_id"></param>
        ''' <returns>Boolean</returns>
        Private Function _IsValid(ByVal movie_id As String) As Boolean
            If movie_id Is Nothing Then Return False 'Regex.IsMatch throws on Nothing
            _IsValid = Regex.IsMatch(movie_id, "^tt[0-9]{7}$")
        End Function

        ''' <summary>
        ''' Set movie URL
        ''' </summary>
        ''' <remarks></remarks>
        Private Sub _SetMovieURL()
            Me._movie_url = String.Format("{0}/title/{1}/", SoCalledImdbApi._imdb_url, MovieID)
        End Sub

    End Class








Default.aspx.vb, our code-behind file:
    Partial Class _Default
        Inherits System.Web.UI.Page

        Protected Sub btnFilmViewStats_Click(ByVal sender As Object, ByVal e As System.EventArgs) _
        Handles btnFilmViewStats.Click

            Try
                Dim imdb As New SoCalledImdbApi(Me.txtFilmID.Text)

                FilmPoster.InnerHtml = imdb.GetMoviePoster()
                lblTitle.Text = imdb.GetMovieTitle()
                lblRating.Text = imdb.GetUserRating()
                lblDirector.Text = imdb.GetMovieDirector()
                lblWriters.Text = imdb.GetMovieWriters()
                lblCast.Text = imdb.GetCast()
                lblLanguage.Text = imdb.GetMovieLanguage()
                lblGenre.Text = imdb.GetMovieGenre()
                lblYear.Text = imdb.GetMovieYear()
                lblTagLine.Text = imdb.GetMovieTagline()
                lblPlot.Text = imdb.GetMoviePlot()
                lblRuntime.Text = imdb.GetMovieRuntime()
                lblShortComment.Text = imdb.GetMovieShortUserComment()
                lblFullComment.Text = imdb.GetMovieFullUserComment()
                lblLink.Text = imdb.GetMovieLink()

                imdb = Nothing
            Catch ex As Exception
                lblTitle.Text = ex.Message
            End Try

        End Sub

    End Class




Default.aspx; the only thing worth noting here is the RequiredFieldValidator attached to the movie ID text box, which blocks the request when the field is empty:
    <%@ Page Language="VB" AutoEventWireup="false" CodeFile="Default.aspx.vb" Inherits="_Default" EnableViewState="false" %>

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title>A So Called IMDB tiny API?!</title>
        <link href="StyleSheet.css" rel="stylesheet" type="text/css" />
    </head>
    <body>
        <form id="form1" runat="server">
            <div id="container">
                <div id="FilmInfo">
                    <strong>Title: </strong>
                    <asp:Label ID="lblTitle" runat="server"></asp:Label><br />
                    <strong>User Rating: </strong>
                    <asp:Label ID="lblRating" runat="server"></asp:Label><br />
                    <strong>Director: </strong>
                    <asp:Label ID="lblDirector" runat="server"></asp:Label><br />
                    <strong>Writer(s): </strong>
                    <asp:Label ID="lblWriters" runat="server"></asp:Label><br />
                    <strong>Cast: </strong>
                    <asp:Label ID="lblCast" runat="server"></asp:Label><br />
                    <strong>Language: </strong>
                    <asp:Label ID="lblLanguage" runat="server"></asp:Label><br />
                    <strong>Genre: </strong>
                    <asp:Label ID="lblGenre" runat="server"></asp:Label><br />
                    <strong>Date: </strong>
                    <asp:Label ID="lblYear" runat="server"></asp:Label><br />
                    <strong>Tagline: </strong>
                    <asp:Label ID="lblTagLine" runat="server"></asp:Label><br />
                    <strong>Plot: </strong>
                    <asp:Label ID="lblPlot" runat="server"></asp:Label><br />
                    <strong>Runtime: </strong>
                    <asp:Label ID="lblRuntime" runat="server"></asp:Label><br />
                    <strong>Short Comment: </strong>
                    <asp:Label ID="lblShortComment" runat="server"></asp:Label><br />
                    <strong>Full Comment: </strong>
                    <asp:Label ID="lblFullComment" runat="server"></asp:Label><br />
                    <strong>Link: </strong>
                    <asp:Label ID="lblLink" runat="server"></asp:Label><br />
                </div>
                <div runat="server" id="FilmPoster"></div>
            </div>
            <br />
            <div id="search">
                <asp:TextBox ID="txtFilmID" runat="server"></asp:TextBox>
                <asp:RequiredFieldValidator ID="RequiredFilmID" runat="server"
                    ControlToValidate="txtFilmID" Display="Static"
                    ErrorMessage="Movie ID" SetFocusOnError="True">*</asp:RequiredFieldValidator>
                <asp:Button ID="btnFilmViewStats" runat="server" Text="Get Data" />
            </div>
        </form>
    </body>
    </html>





[Screenshots #1 to #4]





If you have any questions or find an error, please drop me an email at pc@pedrocorreia.net.








