Wednesday, May 30, 2007

How to download a web page using IronPython

Downloading a web page is a common programming task. Here are two snippets that show how to do it in IronPython with the .NET class library. Both were tested on Windows and on a Mac (with Mono).

Using the WebClient class

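from System.IO import StreamReader
from System.Net import WebClient

# Open a readable stream over the response body
dataStream = WebClient().OpenRead('http://google.com')
# Read the whole stream into a string
reader = StreamReader(dataStream)
result = reader.ReadToEnd()
reader.Close()  # also closes the underlying stream
print(result)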
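
If the same script also has to run on an interpreter that has no access to .NET, the download can be wrapped in a small helper. The sketch below is not part of the original snippets: the function name download is hypothetical, and the fallback branch assumes the page is encoded as UTF-8. It uses WebClient when System.Net can be imported and the standard library otherwise.

```python
def download(url):
    """Return the body of `url` as text (hypothetical helper, a sketch only)."""
    try:
        # Only importable where the .NET class library is exposed, e.g. IronPython.
        from System.Net import WebClient
    except ImportError:
        WebClient = None

    if WebClient is not None:
        from System.IO import StreamReader
        reader = StreamReader(WebClient().OpenRead(url))
        try:
            return reader.ReadToEnd()
        finally:
            reader.Close()  # also closes the underlying stream

    # Fallback for interpreters without .NET: use the standard library.
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib import urlopen  # Python 2
    response = urlopen(url)
    try:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", "replace")
    finally:
        response.close()
```

Calling download('http://google.com') then returns the page text on either kind of interpreter.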

Using the WebRequest and WebResponse classes

from System.IO import StreamReader
from System.Net import WebRequest

# Build the request and send it
request = WebRequest.Create("http://google.com")
response = request.GetResponse()
# Read the response body into a string
responseStream = response.GetResponseStream()
result = StreamReader(responseStream).ReadToEnd()
response.Close()  # release the connection
print(result)
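
One thing the WebRequest version allows is looking at the response before reading it. StreamReader decodes as UTF-8 unless told otherwise, so a page served in another encoding may come out garbled. The sketch below is an assumption about how one might handle that, not something from the snippets above: charset_from_content_type and read_response_text are hypothetical names, and the header parsing is deliberately simple.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    # Skip the media type itself and look at the parameters after each ';'.
    for part in (content_type or "").split(";")[1:]:
        name, sep, value = part.strip().partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"\'')
            if value:
                return value
    return default


def read_response_text(response):
    """Read a System.Net.WebResponse body using the charset it declares.

    Only usable under IronPython: the .NET imports are deferred so that the
    pure-Python helper above can be used on any interpreter.
    """
    from System.IO import StreamReader
    from System.Text import Encoding
    # Encoding.GetEncoding raises if .NET does not recognise the name.
    encoding = Encoding.GetEncoding(charset_from_content_type(response.ContentType))
    reader = StreamReader(response.GetResponseStream(), encoding)
    try:
        return reader.ReadToEnd()
    finally:
        reader.Close()
```

Under IronPython it could be used as result = read_response_text(WebRequest.Create("http://google.com").GetResponse()).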

14 comments:

  1. Seems rather more complex than:

    import urllib
    content = urllib.urlopen("http://google.com").read()
    print content

    I think the above code should work on any Python implementation. Not sure why you are using the .NET libraries (unless IronPython doesn't have urllib?).

  2. Evan,

    Unfortunately, the code you pasted doesn't work with IronPython. urllib2 is also not supported in the official release of IronPython, because the md5 module is missing.

  3. I think MD5 is now a built-in in IronPython.

    The problem is more that urllib2 uses some 'undocumented' attributes of sockets that weren't implemented in IronPython.

    This will be fixed, but in the meantime using the .NET classes is easy enough.

  4. Ah, I haven't been following IronPython that closely. Good info to remember if I try it in the future.

    I noticed both of you mentioned urllib2, but I wonder if urllib (no 2) might work, since I think it's simpler than urllib2.

  5. No - I'm pretty sure that, because of the problems with sockets, both urllib2 and urllib are broken in IronPython.

    There is a fix in FePy, so it might be worth trying that. The problems are recorded as IronPython bugs on CodePlex, so they will get fixed eventually.

    In the meantime, using the .NET classes is not quite so nice - but works fine.

  6. I think the moral of the post is to just stick with CPython.

  7. Anonymous,

    Yes, if you don't need .NET then it's better to stick with CPython.

  8. I'm a little late to the discussion but the .NET code can be simplified as:

    from System.Net import WebClient
    content = WebClient().DownloadString("http://google.com")

  9. Thanks for the interesting article.

  10. Glad to read articles like this. Thanks to the author!

  11. Excellent website. Good work. Very useful. I will bookmark!
