
textual data






Q: So, I can put any web address into this code and grab the associated web page from the Internet?

A: Yes, feel free to try it out for yourself.

Q: Don't I need a web browser to view web pages?

A: Yes, to view a web page in all its formatted glory (with embedded pictures, music, videos and the like), a web browser is a must-have. However, if all you want to see is the "raw" HTML, a browser is overkill.

Q: What does the import line of code do?

A: It gives the program the ability to talk to the Internet. The urllib.request code comes as standard with Python 3.

Q: And I guess that call to urlopen() goes and gets the web page?

A: That's right! The provided web address (or "URL" to use the proper web-speak) is fetched from the Internet and returned by the call to urlopen(). In this code, the fetched web page is assigned to the page variable.

Q: And the urllib.request bit?

A: That just tells the program to use the urlopen() function that comes as standard with Python 3's Internet page-reading technology. We'll have more to say about urllib.request in a little bit. For now, just think how lucky we all are not to have to write code to fetch web pages from the Internet.

Q: I get that the call to read() actually reads the web page from the page variable, but what's that decode("utf8") thing?

A: When the web page is fetched from the Internet, it is in a "raw" textual format. This format can be a little hard for humans to read. The call to decode() converts the raw web page into something that looks a little easier on the eye.

To see what we mean, try removing the call to decode() from the program and running the code again. Looks a little weird, doesn't it? (Don't forget to put the call to decode() back in before continuing.)
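To make the decode() answer concrete, here is a minimal sketch of the steps discussed above. The function name fetch_page and the sample bytes are made up for illustration; only urllib.request, urlopen(), read(), decode("utf8") and the page variable come from the discussion. The function is defined but not called here, so the sketch runs without an Internet connection.

```python
import urllib.request

def fetch_page(url):
    """Fetch a web page and return its HTML as a string (hypothetical helper)."""
    page = urllib.request.urlopen(url)   # go and get the web page
    text = page.read().decode("utf8")    # read the raw bytes, then decode them
    return text

# Made-up sample standing in for what read() hands back before decoding.
raw = b"<p>Caf\xc3\xa9 prices</p>"
text = raw.decode("utf8")

print(raw)    # the "raw" form, with the b prefix and escape codes
print(text)   # the decoded form, easier on the eye
```

With a working connection, calling fetch_page() with any web address you like should return that page's HTML as an ordinary string, assuming the page is encoded as UTF-8.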










• You can download the HTML of a web page as a textual string.

• A string is a sequence of characters.

• You can access individual characters in a string using an offset.

• The offset is known as the index value of the character (or just index for short).

• Strings within strings are called substrings.

• Substrings are specified using two index values, for example: text[10:20].

• The first index value is the location of the first character of the substring.

• The second index value is the location after the last character of the substring (up to, but not including).

• Subtract the first index from the second to work out how long the substring is.
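The points above can be tried out in a few lines. This is only a sketch: the sample string and the variable names are made up for illustration.

```python
text = "Head First Programming"   # made-up sample string

first_char = text[0]      # index 0 is the first character
word = text[5:10]         # starts at index 5, stops before index 10
length = 10 - 5           # second index minus first index

print(first_char)   # H
print(word)         # First
print(length)       # 5
print(len(word))    # 5
```

Because the second index is "up to, but not including", the subtraction always matches the length of the substring you get back, as long as both index values fall inside the string.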

