A robot's job is to gather information about the documents, data, etc. that are available on the Web and then store that information in some kind of master index of the Web. Usually, the robot's author restricts it to a particular topic or segment of the Internet.
At the very least, most robots are programmed to look at the TITLE and H1 tags in the HTML documents they discover, and then scan the entire contents of the file for A HREF tags that point to other documents. A general robot might store the URLs of the documents in a data structure called a tree, which it uses to continue the search whenever it reaches a dead end (more technically called a leaf node). The larger robots probably use much more sophisticated algorithms, but the basic principles are the same.
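The basic principle can be illustrated with a minimal sketch in Python. It is not how any particular robot is implemented: the names PageScanner, crawl, and fetch are hypothetical, the fetch function is supplied by the caller, and a simple queue of pending URLs stands in for the tree mentioned above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects the TITLE text, H1 text, and A HREF targets of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.headings = []
        self.links = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.headings.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.headings[-1] += data


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` is an assumed caller-supplied
    function that returns the page's HTML as a string, or None on failure."""
    index = {}                     # url -> (title, list of H1 texts)
    frontier = deque([start_url])  # URLs discovered but not yet scanned
    seen = {start_url}             # every URL ever queued, to avoid repeats
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead end: nothing to scan, move on to the next URL
        scanner = PageScanner(url)
        scanner.feed(html)
        scanner.close()
        index[url] = (scanner.title.strip(),
                      [h.strip() for h in scanner.headings])
        for link in scanner.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index
```

Passing fetch in as a parameter keeps the sketch independent of any network library; for a quick trial, crawl("http://example.test/", pages.get) works with a plain dictionary pages that maps URLs to HTML strings.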
The good thing is that most robots are successful at this and do help make subsequent search and retrieval of those files more efficient. This is important in terms of Internet traffic. If a robot spends hours or days looking for documents, but thousands (or even millions) of users take advantage of the index it generates, those users no longer have to discover the links by their own means, potentially saving a great amount of network bandwidth.
The bad thing is that some robots inefficiently revisit the same site more than once, or submit rapid-fire requests to the same site in such a frenzy that the server can't keep up. This is obviously a cause for concern for Webmasters. Robot authors are as upset as the rest of the Internet community when they find out that a poorly behaved robot has been unleashed. Usually, though, such problems are found in only a few poorly written robots.
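One way a robot author might guard against both problems is sketched below in Python. This is an illustration under stated assumptions, not a description of any existing robot: the class name PoliteGate, its methods, and the one-second default delay are all hypothetical choices.

```python
import time
from urllib.parse import urlsplit


class PoliteGate:
    """Hypothetical helper: skips repeat URLs and spaces out requests per host."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # assumed minimum seconds between hits on one host
        self._clock = clock
        self._sleep = sleep
        self._visited = set()
        self._last_hit = {}  # host -> time of the most recent request

    def should_fetch(self, url):
        """Return False for a URL that has already been requested."""
        if url in self._visited:
            return False
        self._visited.add(url)
        return True

    def wait_turn(self, url):
        """Sleep until min_delay has passed since the last request to this
        URL's host, then record the new request. Returns the seconds slept."""
        host = urlsplit(url).netloc
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_hit[host] = self._clock()
        return waited
```

A robot would call should_fetch(url) before queuing a request and wait_turn(url) immediately before sending it. The clock and sleep parameters exist only so the delay logic can be exercised without real waiting.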