Today's task was to bulk import Wiktionary data into SQL Server 2016.
There is surprisingly little evidence of other people's attempts at this on the Internet.
First I tried a node-by-node approach to parsing the XML.
But this was going to take far too long: the current Wiktionary English file from https://dumps.wikimedia.org/enwiktionary/latest/ is over 6 GB decompressed, and I estimate it would take over a month to iterate slowly through the XML, importing entries one by one. If the routine failed for any reason it would have to be restarted from scratch - hardly an optimal solution!
So the recommended method is via OPENROWSET BULK.
First create the destination SQL DB table. I didn't import all the information, but you should be able to add new columns if needed.
There is a 2 GB limit on XML file importation, so I had to split the XML file into 4 parts, each under 2 GB. As this was a one-off import I did it manually with the amazing EmEditor. If it needs to become a regular occurrence, then reading the file with XmlReader first and saving out temporary files of under 2 GB each might be the way forward.
Let me know if you write a file splitter to do this!
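In the meantime, here is a minimal sketch of what such a splitter could look like. It is written in Python with the standard library's streaming iterparse rather than XmlReader, and it is untested against the full 6 GB dump. The function name split_dump, the output file naming and the 1.9 GB default are my own choices for illustration. It also assumes that every entry is a <page> element directly under the root, and it drops the XML namespace from the copied elements so each part has a plain <mediawiki> root.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def split_dump(src_path, out_prefix, max_bytes=1_900_000_000):
    """Split a dump into well-formed XML parts of at most max_bytes each.

    Hypothetical helper: a single page larger than max_bytes still gets
    a part of its own, so that part can exceed the limit.
    """
    header = b'<?xml version="1.0" encoding="utf-8"?>\n<mediawiki>\n'
    footer = b"</mediawiki>\n"
    parts = []          # paths of the files written so far
    out = None          # currently open part file
    written = 0         # bytes written to the current part
    pages_in_part = 0

    context = ET.iterparse(src_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue

        # Strip namespaces so the page serialises without ns0: prefixes.
        for node in elem.iter():
            node.tag = local_name(node.tag)
        elem.tail = None
        chunk = ET.tostring(elem, encoding="unicode").encode("utf-8") + b"\n"

        too_big = written + len(chunk) + len(footer) > max_bytes
        if out is None or (too_big and pages_in_part > 0):
            if out is not None:
                out.write(footer)
                out.close()
            path = f"{out_prefix}{len(parts) + 1}.xml"
            out = open(path, "wb")
            out.write(header)
            written = len(header)
            pages_in_part = 0
            parts.append(path)

        out.write(chunk)
        written += len(chunk)
        pages_in_part += 1
        root.clear()  # release pages already copied so memory stays flat

    if out is not None:
        out.write(footer)
        out.close()
    return parts
```

Calling split_dump("enwiktionary.xml", "part") would write part1.xml, part2.xml and so on, each a complete XML document that can be bulk imported separately. The input file name here is only an example.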