{"id":510,"date":"2016-08-19T01:35:21","date_gmt":"2016-08-19T01:35:21","guid":{"rendered":"http:\/\/intelligentonlinetools.com\/blog\/?p=510"},"modified":"2016-08-20T23:55:17","modified_gmt":"2016-08-20T23:55:17","slug":"getting-data-from-wikipedia-using-python","status":"publish","type":"post","link":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/","title":{"rendered":"Getting Data From Wikipedia Using Python"},"content":{"rendered":"<p>Recently I came across the Python package wikipedia, a library that makes it easy to access and parse data from Wikipedia. Using this library you can search Wikipedia, get article summaries, get data like links and images from a page, and more. The wikipedia package wraps the MediaWiki API so you can focus on using Wikipedia data, not getting it. [1]<\/p>\n<p>This is a great way to complement a web site with Wikipedia information about the product, service or topic the site discusses. Other possible uses include showing visitors a random page from Wikipedia, extracting topics or web links from Wikipedia content, tracking new pages or updates, and using the downloaded text in text mining projects. <\/p>\n<p>I created a Python script that does the following:<\/p>\n<p>Defines the list of topics. This is the user input.<br \/>\nFor each topic, searches Wikipedia and finds matching pages.<br \/>\nFor each page, shows the link and page title, and collects the page content.<br \/>\nIn case of an error, continues to the next page.<br \/>\nFor each page, removes from the content the sections listed in sections_to_skip at the beginning of the script.<br \/>\nSaves the page content, after removing the unneeded sections, as a separate text file for each page. <\/p>\n<p>The full Python script is shown below. 
Feel free to share any suggestions, comments, questions or requests for modifications.<\/p>\n<pre><code>\r\nimport wikipedia\r\n\r\nterms = [\"Optimization\", \"Data Science\"]\r\n# Section headings to strip from each page\r\nsections_to_skip = [\"== See also ==\", \"== References ==\", \"== Further reading ==\"]\r\nn = 0\r\ndocs = []\r\nfor term in terms:\r\n    print(term)\r\n    results = wikipedia.search(term, results=3)\r\n    for result in results:\r\n        print(result)\r\n        try:\r\n            ny = wikipedia.page(result)\r\n            print(ny.url, ny.title)\r\n            ny_content = ny.content\r\n            for section in sections_to_skip:\r\n                pos = ny_content.find(section)\r\n                if pos >= 0:\r\n                    # Cut up to the next top-level heading, or to the end of the text\r\n                    pos1 = ny_content.find(\"\\n== \", pos + len(section))\r\n                    if pos1 >= 0:\r\n                        ny_content = ny_content[:pos] + ny_content[pos1 + 1:]\r\n                    else:\r\n                        ny_content = ny_content[:pos]\r\n            # Save each page as a separate UTF-8 text file (Python 3 open() assumed)\r\n            with open(\"C:\\\\Python_projects\\\\file\" + str(n) + \".txt\", \"w\", encoding=\"utf-8\") as file_:\r\n                file_.write(ny_content)\r\n            n = n + 1\r\n            docs.append(ny_content)\r\n        except Exception as err:\r\n            # Report the problem and continue with the next page\r\n            print(\"Error:\", err)\r\nfor d in docs:\r\n    print(d)\r\n\r\n<\/code><\/pre>\n<p><strong>References<\/strong><br \/>\n1.  <a href=https:\/\/pypi.python.org\/pypi\/wikipedia\/ target=\"blank\">Wikipedia API for Python<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I came across the Python package wikipedia, a library that makes it easy to access and parse data from Wikipedia. Using this library you can search Wikipedia, get article summaries, get data like links and images from a page, and more. 
Wikipedia wraps the MediaWiki API so you can focus on using &#8230; <a title=\"Getting Data From Wikipedia Using Python\" class=\"read-more\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":[]},"categories":[11,10],"tags":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Getting Data From Wikipedia Using Python - Machine Learning Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Getting Data From Wikipedia Using Python - Machine Learning Applications\" \/>\n<meta property=\"og:description\" content=\"Recently I come across python package Wikipedia which is a Python library that makes it easy to access and parse data from Wikipedia. Using this library you can search Wikipedia, get article summaries, get data like links and images from a page, and more. Wikipedia wraps the MediaWiki API so you can focus on using ... 
Read more\" \/>\n<meta property=\"og:url\" content=\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Machine Learning Applications\" \/>\n<meta property=\"article:published_time\" content=\"2016-08-19T01:35:21+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-08-20T23:55:17+00:00\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\",\"name\":\"Getting Data From Wikipedia Using Python - Machine Learning 
Applications\",\"isPartOf\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\"},\"datePublished\":\"2016-08-19T01:35:21+00:00\",\"dateModified\":\"2016-08-20T23:55:17+00:00\",\"author\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\"},\"breadcrumb\":{\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/intelligentonlinetools.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Getting Data From Wikipedia Using Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#website\",\"url\":\"http:\/\/intelligentonlinetools.com\/blog\/\",\"name\":\"Machine Learning Applications\",\"description\":\"Artificial intelligence, data mining and machine learning for building web based tools and services.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Getting Data From Wikipedia Using Python - Machine Learning Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/","og_locale":"en_US","og_type":"article","og_title":"Getting Data From Wikipedia Using Python - Machine Learning Applications","og_description":"Recently I come across python package Wikipedia which is a Python library that makes it easy to access and parse data from Wikipedia. Using this library you can search Wikipedia, get article summaries, get data like links and images from a page, and more. Wikipedia wraps the MediaWiki API so you can focus on using ... Read more","og_url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/","og_site_name":"Machine Learning Applications","article_published_time":"2016-08-19T01:35:21+00:00","article_modified_time":"2016-08-20T23:55:17+00:00","author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/","url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/","name":"Getting Data From Wikipedia Using Python - Machine Learning Applications","isPartOf":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#website"},"datePublished":"2016-08-19T01:35:21+00:00","dateModified":"2016-08-20T23:55:17+00:00","author":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478"},"breadcrumb":{"@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/19\/getting-data-from-wikipedia-using-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/intelligentonlinetools.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Getting Data From Wikipedia Using Python"}]},{"@type":"WebSite","@id":"http:\/\/intelligentonlinetools.com\/blog\/#website","url":"http:\/\/intelligentonlinetools.com\/blog\/","name":"Machine Learning Applications","description":"Artificial intelligence, data mining and machine learning for building web based tools and services.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/intelligentonlinetools.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/7a886dc5eb9758369af2f6d2cb342478","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/intelligentonlinetools.com\/blog\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7h1IJ-8e","jetpack-related-posts":[{"id":827,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/01\/11\/apis\/","url_meta":{"origin":510,"position":0},"title":"Useful APIs for Your Web Site","date":"January 11, 2017","format":false,"excerpt":"Here\u2019s a useful list of resources on how to create an API, compiled from posts that were published recently on this blog. The included APIs can provide a fantastic ways to enhance websites. 1. The WordPress(WP) API exposes a simple yet powerful interface to WP Query, the posts API, post\u2026","rel":"","context":"In &quot;API Programming&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":533,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/08\/28\/web-scraping-with-beautifulsoup\/","url_meta":{"origin":510,"position":1},"title":"Web Scraping with BeautifulSoup with Python 3","date":"August 28, 2016","format":false,"excerpt":"Keeping up-to-date on your industry is very important as it will help make better decisions, spot threats and opportunities early on and identify the changes that you need to think about.[1] There are many ways to stay informed and getting automatically data from the web is one of them. 
In\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1410,"url":"http:\/\/intelligentonlinetools.com\/blog\/2017\/10\/26\/image-processing-using-pixabay-api-and-python\/","url_meta":{"origin":510,"position":2},"title":"Image Processing Using Pixabay API and Python","date":"October 26, 2017","format":false,"excerpt":"Recently I visited great website Pixabay [1] that offers a wide range of images from people all around the world. These images are free to use even for commercial use. And there is an API [2] for accessing images on Pixabay. This brings a lot of ideas for interesting web\u2026","rel":"","context":"In &quot;API Programming&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":383,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/07\/03\/getting-the-data-from-the-web-using-php-for-api-using-the-api-with-php\/","url_meta":{"origin":510,"position":3},"title":"Getting the Data from the Web using PHP or Python for API","date":"July 3, 2016","format":false,"excerpt":"In the previous posts [1],[2] perl was used to get content from the web through Faroo API and Guardian APIs. In this post PHP and Pyhton will be used to get web data using same APIs. 
PHP has a powerful JSON parsing mechanism, which, because PHP is a dynamic language,\u2026","rel":"","context":"In &quot;API Programming&quot;","img":{"alt_text":"Trend for Python, Perl, PHP","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/07\/trend_for_python_perl_php-300x144.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":721,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/12\/15\/latent-dirichlet-allocation-lda-with-python\/","url_meta":{"origin":510,"position":4},"title":"Latent Dirichlet Allocation (LDA) with Python Script","date":"December 15, 2016","format":false,"excerpt":"In the previous posts [1],[2] few scripts for extracting web data were created. Combining these scripts, we will create now web crawling script with text mining functionality such as Latent Dirichlet Allocation (LDA). In LDA, each document may be viewed as a mixture of various topics. Where each document is\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","img":{"alt_text":"Program Flow Chart for Extracting Data from Web and Doing LDA","src":"https:\/\/i0.wp.com\/intelligentonlinetools.com\/blog\/wp-content\/uploads\/2016\/12\/program-flow-300x247.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":133,"url":"http:\/\/intelligentonlinetools.com\/blog\/2016\/03\/11\/133\/","url_meta":{"origin":510,"position":5},"title":"7 Ideas for Building Text Mining Application","date":"March 11, 2016","format":false,"excerpt":"It is no doubt that the web is growing at an incredible pace. And as the most documents of the web consist of the text, the applications of text analytics or text mining are getting more use. 
In such applications the textual data are used for extracting intelligence from a\u2026","rel":"","context":"In &quot;Data Mining&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/510"}],"collection":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/comments?post=510"}],"version-history":[{"count":8,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/510\/revisions"}],"predecessor-version":[{"id":519,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/posts\/510\/revisions\/519"}],"wp:attachment":[{"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/media?parent=510"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/categories?post=510"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/intelligentonlinetools.com\/blog\/wp-json\/wp\/v2\/tags?post=510"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}