Node Import Part 0: the prequel

Ethan's picture

I thought I'd share some lessons learned in the node creation wild as a follow-up to Tom's recent post surveying the options available to prospective content importers who need to get a slew of nodes into Drupal from some far off place such as the land of flat HTML, custom CMSes or even the fabled Drupal 4.6.

For a very exciting upcoming site we had to import a few hundred old blog posts from a Drupal 4.6 install, along with 30 authors and around 2000 taxonomy associations. While these numbers are nowhere near as big as content migration figures can crawl, the complexity of the data structure was multiplied by the specific requirements of the various node importing schemes available in Drupal. This multiplicative factor yielded a somewhat dizzying array of alternatives that had to be evaluated, with the vast majority of them leading down paths that involved exporting nodes, importing with old legacy data attached then iteratively importing the author and taxonomy data one piece at a time using custom scripts and translation tables.

Scary.

While the Macro module definitely made the job a great deal easier -- mainly because it was able to handle authorship, taxonomy and creation timestamps all as part of the node creation process -- one of the real gems I uncovered was a MySQL function I wish I'd found years ago: GROUP_CONCAT.

GROUP_CONCAT concatenates all records returned by a single field query, breaking up the results according to the GROUP BY clause, if one is present. As an example, this means that this statement:

SELECT GROUP_CONCAT(t_d.name) AS tags, t_n.nid FROM term_data as t_d JOIN term_node AS t_n ON t_n.tid = t_d.tid WHERE t_d.vid =2 GROUP BY t_n.nid

will return a two column result set, in the first column will be a comma-delimited listing of all taxonomy terms from the vocabulary with vid=2 and in the second column we'll have the nid of the post which has all these tags. In the past, this step would require my writing some intermediary script to reduce the fully-normalized result in which each term was on a separate row.

The real advantage of doing this de-normalization in pure SQL comes when we use this expression as a sub-select to construct a single table with all the information needed to construct full drupal_execute form value arrays for each node. This can be done with SQL that looks something like this:

SELECT n.nid, a.dst, n.title, n.created, n.body, tags.tags, u.name, u.uid, u.mail FROM node as n JOIN
(SELECT GROUP_CONCAT(t_d.name) AS tags, t_n.nid FROM term_data as t_d JOIN term_node AS t_n ON t_n.tid = t_d.tid WHERE t_d.vid =2 GROUP BY t_n.nid) as tags ON tags.nid = n.nid
JOIN users as u on u.uid = n.uid
JOIN url_alias as a ON a.src = CONCAT('node/',n.nid);

This statement makes use of the second column in our GROUP_CONCAT query to join those results to the node data via a subselect. What comes out of this query can then be processed by a PHP script and used to generate the drupal_execute calls or Macro PHP code needed to migrate the nodes into the new database, taxonomy, users and all.

Comments

CriS's picture

I'm pleased this subject. Really, i'm interested. Thanks to who prepare that. Greetings from Turkey.

sohbet - chat - sohbetev

XCem's picture

Thank you very much for this information. Good post thanks for sharing. I like this site. evden eve nakliyat

Anonymous's picture

Bosch markalı tüm beyaz eşyalarınızın uzun ömürlü ve performanslı
çalışmasını sağlamak ve enerji tasarrufu vb. etkenler için yılda bir
defa olmak üzere tamir, bakım ve onarımı Bosch teknik servisi mutlaka yapılmalıdır.Herhangi
bir arıza durumunda müşterilermiz istediği servisin hızlı olmasını
bekler ve biz 12 gezici aracımız sayesinde bu hizmeti gayret ve
profesyonelliğimiz ile yerine getirmekteyiz.Tüm Bosch kullanıcılarına
profesyonel teknik ekibimiz ile hizmet vermekteyiz.

Voleta's picture

Give please. Happiness is not a station you arrive at, but a manner of traveling. Help me! It has to find sites on the: Stackers two. I found only this - pizza box stackers. Digit data can be erased by a available action love or meal beneath the stacker to console character. Three other ranges of channel were known on often characters: speed, guest, and propeller. With respect :-(, Voleta from Nauru.

Anonymous's picture
Anonymous's picture
Wynn's picture

I am very interested to learn Drupal and this post is very helpful to me..though its a bit technical abut it has improved my knowledge about it.

above ground pool solar heaters

Wynn's picture

I am very interested to learn Drupal and this post is very helpful to me..though its a bit technical abut it has improved my knowledge about it.

above ground pool solar heaters

Wynn's picture

I am very interested to learn Drupal and this post is very helpful to me..though its a bit technical abut it has improved my knowledge about it.

above ground pool solar heaters

Wynn's picture

I am very interested to learn Drupal and this post is very helpful to me..though its a bit technical abut it has improved my knowledge about it.

above ground pool solar heaters

Wynn's picture

I am very interested to learn Drupal and this post is very helpful to me..though its a bit technical abut it has improved my knowledge about it.

above ground pool solar heaters

Katherine's picture

Give please. There is a tragic flaw in our precious Constitution, and I don't know what can be done to fix it. This is it: Only nut cases want to be president. Help me! Need information about: Forex market, while unique risk about talking their information banks, financial investments around the problem are looking to lose their shares.. I found only this - [URL=http://www.catapultbrands.com/Members/ForexMarket]forex market cycle[/URL]. The country of calls eased, forex market. Thus, big products have given costs that financial macroeconomic providers could be hit or perceived this words together before the sailing on july malaysian, forex market. Thanks :mad:. Katherine from Chad.

Yardley's picture

Hello! My name is Yardley!.
I am from Moldova and also am speaking English, please tell me right I wrote the following sentence: "Massage therapy schools, via these hours, little autism supervised and licensed veterinary testing in the marriages."

THX 8), Yardley.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <blockcode>
  • Lines and paragraphs break automatically.
  • You can enable syntax highlighting of source code with the following tags: <blockcode>. Beside the tag style "<foo>" it is also possible to use "[foo]". PHP source code can also be enclosed in <?php ... ?> or <% ... %>.

More information about formatting options

CAPTCHA
Are you a robot? We usually like robots, but not in our comments.
Image CAPTCHA
Enter the characters (without spaces) shown in the image.