textual data
Q: So, I can put any web address into this code and grab the associated web page from the Internet?

A: Yes, feel free to try it out for yourself.

Q: Don't I need a web browser to view web pages?

A: Yes, to view a web page in all its formatted glory (with embedded pictures, music, videos and the like) a web browser is a must-have. However, if all you want to see is the "raw" HTML, a browser is overkill.

Q: What does the import line of code do?

A: It gives the program the ability to talk to the Internet. The urllib.request code comes as standard with Python 3.

Q: And I guess that call to urlopen() goes and gets the web page?

A: That's right! The provided web address (or "URL" to use the proper web-speak) is fetched from the Internet and returned by the call to urlopen(). In this code, the fetched web page is assigned to the page variable.

Q: And the urllib.request bit?

A: That just tells the program to use the urlopen() function that comes as standard with Python 3's Internet page-reading technology. We'll have more to say about urllib.request in a little bit. For now, just think how lucky we all are not to have to write code to fetch web pages from the Internet.

Q: I get that the call to read() actually reads the web page from the page variable, but what's that decode("utf8") thing?

A: When the web page is fetched from the Internet, it is in a "raw" textual format. This format can be a little hard for humans to read. The call to decode() converts the raw web page into something that looks a little easier on the eye.

To see what we mean, try removing the call to decode() from the program and running the code again. Looks a little weird, doesn't it? (Don't forget to put the call to decode() back in before continuing.)
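
The questions above walk through urlopen(), read() and decode() one call at a time. Here is a minimal sketch that puts the three together. The fetch_text() helper and the demo_url value are our own inventions for illustration, not part of the program being discussed. To keep the sketch runnable without a network connection, demo_url is a "data:" address that carries a tiny page inside the address itself; swap in any real "http://..." address to fetch a live page.

```python
import urllib.request
from urllib.parse import quote


def fetch_text(url):
    """Fetch a web address; return the raw bytes and the decoded string.

    Hypothetical helper, written only to show the three calls side by side.
    """
    # urlopen() goes and gets the page; the result is assigned to page.
    with urllib.request.urlopen(url) as page:
        raw = page.read()        # the "raw" form: a bytes object
    text = raw.decode("utf8")    # decode() turns the bytes into a string
    return raw, text


# Assumption: a made-up page embedded in a "data:" address so that the
# example works offline. Replace demo_url with a real web address to try it.
html = "<html><body>Hello from a tiny page</body></html>"
demo_url = "data:text/html;charset=utf-8," + quote(html)

raw, text = fetch_text(demo_url)
print(raw)     # the undecoded version, shown with a b'...' wrapper
print(text)    # the decoded version, easier on the eye
```

Printing raw next to text shows the difference that removing the call to decode() makes: the undecoded page is a bytes object, while the decoded page is an ordinary string.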
You can download the HTML of a web page as a textual string.
A string is a sequence of characters.
You can access individual characters in a string using an offset.
The offset is known as the index value of the character (or just index for short).
Strings within strings are called substrings.
Substrings are specified using two index values, for example: text[10:20].
The first index value is the location of the first character of the substring.
The second index value is the location after the last character of the substring (up to, but not including).
Subtract the first index from the second to work out how long the substring is. For example, text[10:20] is 20 - 10 = 10 characters long.
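
The points above can be tried out in a few lines. This is a small sketch using a made-up sentence; the text, start and end names are ours, chosen only for illustration.

```python
# A made-up string to experiment with (any string works).
text = "The quick brown fox jumps over the lazy dog"

# Individual characters are accessed with an index; counting starts at 0.
first = text[0]          # 'T'
fifth = text[4]          # 'q'

# A substring takes two index values: where it starts, and the location
# just after its last character (up to, but not including).
start, end = 4, 9
word = text[start:end]   # 'quick'

# The second index minus the first gives the length of the substring.
length = end - start     # 5, the same as len(word)

print(first, fifth, word, length)
print(text[10:20])       # ten characters, because 20 - 10 is 10
```

Note that text[10:20] here ends with a space: the substring covers index 10 up to index 19, and index 19 happens to be the gap after "fox".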