Grabbing data from wikis

Hi,

I've found an interesting thing, but I need help!

It's a .jar that can grab and sort data from wikis (Wikipedia, Wikidata, etc.), so coupled with AIML it could be very powerful!

But for now, I'm only able to collect data from the ID of the page. For example, I ask for the label of the ID "Q42" and I get "Douglas Adams".
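A minimal sketch of what that fetch-by-ID call looks like with Wikidata Toolkit (assuming the wdtk classes from the with-dependencies jar are on the classpath; method names taken from the library's Javadoc, so double-check against your version):

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class FetchLabelById {
    public static void main(String[] args) throws Exception {
        // Fetcher preconfigured for www.wikidata.org
        WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher();
        // Fetch the full entity document for Q42 over the web API
        EntityDocument ed = wbdf.getEntityDocument("Q42");
        if (ed instanceof ItemDocument) {
            // findLabel returns the label for the given language code, or null
            System.out.println(((ItemDocument) ed).findLabel("en"));
        }
    }
}
```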

A very good thing is that the wikis are written in several languages.

Just by asking for "Adam Sandler", I can get his birth date, full name, height, and so on.
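Looking an item up by page title instead of Q-ID is also possible in the same library; here is a sketch (assuming "enwiki" as the site key for English Wikipedia, per the toolkit's Javadoc):

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class FetchByTitle {
    public static void main(String[] args) throws Exception {
        WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher();
        // Look the item up via its English Wikipedia article title
        EntityDocument ed = wbdf.getEntityDocumentByTitle("enwiki", "Adam Sandler");
        if (ed instanceof ItemDocument) {
            ItemDocument item = (ItemDocument) ed;
            // The Q-ID of the item that was found
            System.out.println(item.getEntityId().getId());
            // Short English description, or null if none
            System.out.println(item.findDescription("en"));
        }
    }
}
```

The statements themselves (birth date, height, ...) still have to be read out of the returned document; see below.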

The library is named "Wikidata Toolkit" and it's open source (Apache 2)!

Example: https://www.wikidata.org/wiki/Q132952


 

GitHub page: https://github.com/Wikidata/Wikidata-Toolkit

Home page: https://www.mediawiki.org/wiki/Wikidata_Toolkit

 

To test it, I've imported "wdtk-toolkit-with-dependency.jar" and "wdtk-client" into a new Java project.

I've tried the example "FetchOnlineDataExample" but I get an error on the line ExampleHelpers.configureLogging();

The error doesn't appear if I run the example from source, or of course if I comment out the line. But while running, the first data comes in and then I get a NullPointerException...
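One guess: ExampleHelpers lives in the separate wdtk-examples module, not in the toolkit jar itself, so the call can fail when the example runs outside the source tree. All its configureLogging() does is set up a log4j console appender, which can be done directly; a sketch, assuming log4j 1.2 is on the classpath (the toolkit's examples use it):

```java
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class LoggingSetup {
    /** Replacement for ExampleHelpers.configureLogging(): log INFO+ to stdout. */
    public static void configureLogging() {
        ConsoleAppender appender = new ConsoleAppender();
        // Timestamped one-line pattern, similar to what the examples print
        appender.setLayout(new PatternLayout("%d{yyyy-MM-dd HH:mm:ss} %-5p - %m%n"));
        appender.setThreshold(Level.INFO);
        appender.activateOptions();
        Logger.getRootLogger().addAppender(appender);
    }
}
```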

And even if I find the problem, I don't know how to add the jar to the MRL repo, and I'm too scared to make a mistake, lol!

 

That's all!

 

kwatters:

borg borg borg...

Hi Beetle!  

Very cool idea!  I've been looking for good datasets and I've used wikipedia a lot in the past to create search engine demos.  It sounds like you'll want a wikipedia service.  I know you can download the wikipedia dataset locally as xml & wikimarkup.  It's just a matter of searching that and pulling out the appropriate metadata to answer your question.  

I'll have a look at that library and see about adding it into the repo...

I've been building a suite of services that can crawl data from the internet, parse it, and stuff it into a Solr search engine so that we can search it using natural language and ProgramAB. Currently it's called the DocumentPipeline service. This would be a good test dataset to use with it, I suspect.

 

beetlejuice:


Ha ha! I've found it!

I don't know why, but I can't reuse the same WikibaseDataFetcher object to grab from different sources.

( WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher(); )

So now I can get the description by name or by ID. Next I have to look in the Javadoc to find how to collect statements (name, birth date, ...). I've found a function, but it collects everything at once, and I just want to collect one!
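If the function you found is getStatementGroups() (everything at once), the Javadoc also has per-property lookups. A sketch for pulling just the birth date, assuming findStatementGroup(String) is available on ItemDocument as in recent toolkit versions (P569 is Wikidata's "date of birth" property; Q132952 is the Adam Sandler item from the link above):

```java
import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.Snak;
import org.wikidata.wdtk.datamodel.interfaces.Statement;
import org.wikidata.wdtk.datamodel.interfaces.StatementGroup;
import org.wikidata.wdtk.datamodel.interfaces.ValueSnak;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class FetchOneStatement {
    public static void main(String[] args) throws Exception {
        WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher();
        EntityDocument ed = wbdf.getEntityDocument("Q132952");
        if (ed instanceof ItemDocument) {
            // P569 = "date of birth"; fetch only that property's statements
            StatementGroup sg = ((ItemDocument) ed).findStatementGroup("P569");
            if (sg != null) {
                for (Statement s : sg.getStatements()) {
                    Snak mainSnak = s.getClaim().getMainSnak();
                    if (mainSnak instanceof ValueSnak) {
                        // Prints a TimeValue for date-of-birth statements
                        System.out.println(((ValueSnak) mainSnak).getValue());
                    }
                }
            }
        }
    }
}
```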

But it's on the right track!

Next episode, next time !