wikipedia-api – Row Coding

How to obtain a list of titles of all Wikipedia articles

November 27, 2023 by Tarik

The allpages API module lets you do exactly that. Its per-request limit (with aplimit=max) is 500, so listing all 4.5M articles would take about 9,000 requests. A dump is the better choice, though: there are many different dumps, including all-titles-in-ns0, which, as its name suggests, contains exactly what you want …
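A minimal sketch of the allpages approach, assuming Python 3 and the standard library's urllib, so nothing needs installing. It follows the API's continuation token until the listing is exhausted. It also adds apfilterredir=nonredirects to skip redirects. The User-Agent string is a hypothetical placeholder, and the `fetch` argument exists only so the loop can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "TitleLister/0.1 (contact@example.org)"}


def fetch_json(params: Dict) -> Dict:
    """Send one GET request to the API and decode the JSON response."""
    url = API + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_article_titles(fetch: Callable[[Dict], Dict] = fetch_json) -> Iterator[str]:
    """Yield every main-namespace (ns 0) title, one batch at a time."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": 0,
        "apfilterredir": "nonredirects",  # skip redirects, keep articles
        "aplimit": "max",  # 500 per request for ordinary (non-bot) clients
        "format": "json",
        "formatversion": 2,
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break  # no continuation token: the listing is complete
        # Merge the returned continuation values (e.g. apcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Usage would be `for title in iter_article_titles(): print(title)`. At roughly 9,000 sequential requests this is slow, which is why the dump is the more practical route for a complete list.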

How to get plain text out of Wikipedia

September 26, 2023 by Tarik

Here are a few different approaches; use whichever works for you. All my code examples below use requests for HTTP requests to the API; you can install it with pip install requests if you have pip. They also all use the MediaWiki API, and two use the query endpoint; follow those links if you …
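As one possible illustration of the query-endpoint route, here is a sketch that asks the TextExtracts extension (prop=extracts with explaintext) for a page's plain text. It uses the standard library's urllib rather than requests so it runs without installing anything. The User-Agent value is a hypothetical placeholder. The response handling assumes formatversion=2, where pages come back as a list.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "PlainTextDemo/0.1 (contact@example.org)"}  # hypothetical


def plain_text_params(title: str) -> Dict:
    """Parameters asking TextExtracts for the whole page as plain text."""
    return {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,  # plain text instead of HTML; boolean flags just need to be present
        "redirects": 1,  # follow redirects to the target page
        "titles": title,
        "format": "json",
        "formatversion": 2,  # pages are returned as a list, not a dict
    }


def plain_text_from_response(data: Dict) -> Optional[str]:
    """Return the first page's extract, or None if the page does not exist."""
    page = data["query"]["pages"][0]
    return page.get("extract")


def get_plain_text(title: str) -> Optional[str]:
    """Fetch the plain text of one article over the network."""
    url = API + "?" + urllib.parse.urlencode(plain_text_params(title))
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return plain_text_from_response(json.load(response))
```

Keeping the parameter building and the response parsing in separate functions makes both easy to check without hitting the live API.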

How to use wikipedia api if it exists? [closed]

May 17, 2023 by Tarik

You really need to spend some time reading the documentation; it took me only a moment to find the link that solves this. Out of sympathy, though, here is a link you can learn from: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=timestamp|user|comment|content Those are the variables you will be looking to get. Your …
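The same request can be made from code. This sketch, assuming Python 3 and urllib, builds the query from that URL and pulls the four requested fields out of the newest revision. It adds rvslots=main and formatversion=2. Current MediaWiki versions expect rvslots when revision content is requested; without it the API falls back to a deprecated legacy format and returns a warning. The helper names and User-Agent are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "RevisionDemo/0.1 (contact@example.org)"}  # hypothetical


def revision_params(title: str) -> Dict:
    """Query for the latest revision's timestamp, user, comment and content."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment|content",
        "rvslots": "main",  # page content lives in the "main" slot
        "format": "json",
        "formatversion": 2,
    }


def latest_revision(data: Dict) -> Dict:
    """Flatten the newest revision from a formatversion=2 response."""
    rev = data["query"]["pages"][0]["revisions"][0]
    return {
        "timestamp": rev["timestamp"],
        "user": rev["user"],
        "comment": rev["comment"],
        "content": rev["slots"]["main"]["content"],
    }


def get_latest_revision(title: str) -> Dict:
    """Fetch and flatten the latest revision of one page."""
    url = API + "?" + urllib.parse.urlencode(revision_params(title))
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return latest_revision(json.load(response))
```

For example, `get_latest_revision("New_York_Yankees")["user"]` would give the last editor of that page.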

Get Text Content from mediawiki page via API

March 24, 2023 by Tarik

Use action=parse to get the HTML: /api.php?action=parse&page=test. One way to get the text from the HTML is to load it into a browser and use JavaScript to walk the DOM, keeping only the text nodes.
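If you would rather not use a browser, the same idea (keep only the text nodes) can be sketched with Python's built-in html.parser instead of JavaScript. This is a swapped-in technique, not the approach described above. The sketch assumes the action=parse response was requested with formatversion=2, where parse.text is a plain HTML string. It also skips the contents of script and style elements.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TextNodeCollector(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (the default) turns &amp; into &
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_from_parse_response(data: Dict) -> str:
    """With formatversion=2, action=parse returns the rendered HTML as parse.text."""
    return data["parse"]["text"]


def html_to_text(html: str) -> str:
    """Concatenate all visible text nodes of an HTML fragment."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)
```

Real article HTML also contains reference markers, edit links, and navigation boxes, so some extra filtering by tag or class is usually needed on top of this.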

Accessing main picture of wikipedia page by API

February 28, 2023 by Tarik

You can get the thumbnail of a Wikipedia page (if it has a page image) using prop=pageimages. For example: http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100 The response includes the full URL of the thumbnail.
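A small sketch of that request in Python, assuming the standard library's urllib and formatversion=2 (pages as a list). The function names and User-Agent are hypothetical. The thumbnail URL is read from the `thumbnail.source` field, and pages without a page image simply have no `thumbnail` key.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "ThumbDemo/0.1 (contact@example.org)"}  # hypothetical


def thumbnail_params(title: str, size: int = 100) -> Dict:
    """Query parameters for a page's main image as a thumbnail."""
    return {
        "action": "query",
        "titles": title,
        "prop": "pageimages",
        "pithumbsize": size,  # maximum thumbnail dimension, in pixels
        "format": "json",
        "formatversion": 2,
    }


def thumbnail_url(data: Dict) -> Optional[str]:
    """Return the thumbnail URL, or None if the page has no page image."""
    page = data["query"]["pages"][0]
    return page.get("thumbnail", {}).get("source")


def get_thumbnail_url(title: str, size: int = 100) -> Optional[str]:
    """Fetch the thumbnail URL for one page."""
    url = API + "?" + urllib.parse.urlencode(thumbnail_params(title, size))
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return thumbnail_url(json.load(response))
```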

How can I get Wikipedia content using Wikipedia’s API?

February 19, 2023 by Tarik

See this section of the MediaWiki API documentation, specifically the part about getting the contents of a page, and use the sandbox to test the API call. These are the key parameters: prop=revisions&rvprop=content&rvsection=0 Here rvsection=0 specifies that only the lead section is returned. See this example: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=pizza To get the HTML, you can similarly use action=parse: http://en.wikipedia.org/w/api.php?action=parse&section=0&prop=text&page=pizza …
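The two example URLs can be generated from code rather than assembled by hand. The sketch below builds both with `urllib.parse.urlencode`. It uses https and adds rvslots=main, which current MediaWiki versions expect when revision content is requested; both of these are assumptions beyond the original URLs. The function name is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def lead_section_urls(title: str) -> Tuple[str, str]:
    """Return (wikitext_url, html_url) for the lead section (section 0) of a page."""
    wikitext = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",  # avoids the legacy-format warning on current MediaWiki
        "rvsection": 0,  # only the lead section
        "titles": title,
        "format": "json",
    }
    html = {
        "action": "parse",
        "page": title,
        "prop": "text",  # rendered HTML only
        "section": 0,
        "format": "json",
    }
    return API + "?" + urlencode(wikitext), API + "?" + urlencode(html)
```

Letting urlencode handle escaping means titles with spaces or special characters (for example "New York") are encoded correctly.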

What is wikipedia pageid? how to change it into real page url?

December 6, 2022 by Tarik

You can just use a URL like this: http://en.wikipedia.org/?curid=18630637 This is the shortest form; others are also possible: http://en.wikipedia.org/wiki?curid=18630637 http://en.wikipedia.org/wiki/Translation?curid=18630637 http://en.wikipedia.org/w/index.php?curid=18630637 Note that MediaWiki ignores the page title if you specify a curid, so even http://en.wikipedia.org/wiki/FooBar?curid=18630637 leads to the same page.
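A short sketch of both routes, assuming Python 3.6+. The first builds the shortest curid URL directly. The second is an alternative not mentioned above: it asks the API for the page's real URL with prop=info&inprop=url, which returns it in the `fullurl` field. The function names are hypothetical, and the response parser assumes formatversion=2.

```python
from typing import Dict, Optional


def curid_url(pageid: int, lang: str = "en") -> str:
    """Shortest page URL for a page ID; MediaWiki resolves curid itself."""
    return f"https://{lang}.wikipedia.org/?curid={pageid}"


def info_params(pageid: int) -> Dict:
    """Alternative: ask the API for the page's URL (prop=info, inprop=url)."""
    return {
        "action": "query",
        "prop": "info",
        "inprop": "url",
        "pageids": pageid,
        "format": "json",
        "formatversion": 2,
    }


def full_url_from_info(data: Dict) -> Optional[str]:
    """Read the fullurl field from a formatversion=2 prop=info response."""
    return data["query"]["pages"][0].get("fullurl")
```

The curid form needs no request at all, while the API route is useful when you want the readable /wiki/Title address.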

Is there a Wikipedia API just for retrieve the content summary?

October 12, 2022 by Tarik

There’s a way to get the entire “introduction section” without any HTML parsing! Similar to AnthonyS’s answer, but with an additional explaintext parameter, you can get the introduction section as plain text. Query: getting Stack Overflow’s introduction in plain text. Using the page title: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow Or using pageids: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=21721040 JSON response (warnings stripped): { "query": …
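The same two queries can be issued from Python. This is a sketch assuming the standard library's urllib, with hypothetical helper names and User-Agent. Unlike the other sketches it keeps the default JSON format used in those URLs. In that format `query.pages` is an object keyed by page ID, so the parser takes the first value rather than indexing a list.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "IntroDemo/0.1 (contact@example.org)"}  # hypothetical


def intro_params(title: Optional[str] = None, pageid: Optional[int] = None) -> Dict:
    """Plain-text introduction via TextExtracts, by title or by page ID."""
    if (title is None) == (pageid is None):
        raise ValueError("pass exactly one of title or pageid")
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,  # only the content before the first section heading
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,
    }
    if title is not None:
        params["titles"] = title
    else:
        params["pageids"] = pageid
    return params


def intro_from_response(data: Dict) -> Optional[str]:
    """In the default JSON format, pages is a dict keyed by page ID."""
    pages = data["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract")


def get_intro(title: Optional[str] = None, pageid: Optional[int] = None) -> Optional[str]:
    """Fetch the plain-text introduction of one page."""
    url = API + "?" + urllib.parse.urlencode(intro_params(title, pageid))
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return intro_from_response(json.load(response))
```

For example, `get_intro(title="Stack Overflow")` and `get_intro(pageid=21721040)` should return the same text.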