The `Importer` provides a simple mechanism for deploying data structures and supporting information in a loose yet structured form during the installation (setup) process.

## Import definitions

[`$smwgImportFileDirs`](https://www.semantic-mediawiki.org/wiki/Help:$smwgImportFileDirs) defines the directories from which content can be imported.

Import definitions are written in `JSON`, which provides the necessary structure and is easy for end users to extend.

Import files are sorted by file name and processed sequentially in that order. Where content relies on other content, follow a naming convention that ensures the required definitions are imported first.
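
As an illustration of such a naming convention, the fragment below registers a directory whose files carry zero-padded numeric prefixes. The key `example-vocab`, the directory, and all file names are hypothetical. Zero-padded prefixes are used because they order the same way under plain and natural string sorting, so the sketch does not depend on which of the two the importer applies.

```php
// LocalSettings.php (sketch; key, path, and file names are hypothetical)
//
// import/example/
//   01-properties.json   <- processed first
//   02-templates.json    <- may rely on the properties from 01
//   03-pages.json        <- may rely on both of the above
$GLOBALS['smwgImportFileDirs']['example-vocab'] = __DIR__ . '/import/example';
```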

### Default definitions

Preselected import content is defined in the `default.json` file and includes:

* "Smw import skos"
* "Smw import owl"
* "Smw import foaf"
* "Foaf:knows"
* "Foaf:name"
* "Foaf:homepage"

Note that `default.json` is __not__ expected to be the __authoritative source__ of content for a wiki. For that reason its `canReplace` option is set to `false`, so that pre-existing content with the same name and namespace is not replaced.

### Custom definitions

One or more custom import definitions can be registered by adding a custom location (directory) to [`$smwgImportFileDirs`](https://www.semantic-mediawiki.org/wiki/Help:$smwgImportFileDirs), from which the definitions are then loaded.

<pre>
$GLOBALS['smwgImportFileDirs']['movie-actor-vocab'] = __DIR__ . '/import/movie-actor';
</pre>

<pre>
$GLOBALS['smwgImportFileDirs']['custom-vocab'] = __DIR__ . '/custom';
</pre>

### Fields

`JSON` schema and fields:

- `description` short description of the purpose of the import (used in the auto summary)
- `page` the name of a page without a namespace prefix
- `namespace` literal constant of the namespace of the content (e.g. `NS_MAIN`, `SMW_NS_PROPERTY`)
- `contents` either the raw text or an object with the following parameter
  - `importFrom` path to a file from which the raw text is read (relative to the `$smwgImportFileDirs` directory)
- `options`
  - `canReplace` indicates whether existing content may be replaced during an import

The [`$smwgImportReqVersion`](https://www.semantic-mediawiki.org/wiki/Help:$smwgImportReqVersion) setting specifies
the required version for an import; only definitions that match that version are imported.
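
A minimal sketch of how the setting and a definition might line up. It assumes that the version compared against `$smwgImportReqVersion` is the `meta.version` value that appears in the examples in this document, and that `1` is the value in use. Both are assumptions; the linked help page is the authoritative reference.

```php
// LocalSettings.php (sketch; the value 1 is an assumption)
// Intended to pair with definitions that declare
//   "meta": { "version": "1" }
// Definitions declaring a different version would not be imported.
$GLOBALS['smwgImportReqVersion'] = 1;
```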

### Examples

#### XML import

MediaWiki's XML format can be used as an import source when the file is linked from the
`importFrom` field (XML in any other format is ignored).

The location of `custom.xml` in the following example is relative to the selected `$smwgImportFileDirs` directory.

<pre>
{
	"description": "Custom import",
	"import": [
		{
			"description" : "Import of custom.xml that contains ...",
			"contents": {
				"importFrom": "/xml/custom.xml"
			}
		}
	],
	"meta": {
		"version": "1"
	}
}
</pre>

#### Template import

Page content can be given inline in the `contents` field or loaded from a file via `importFrom`.

<pre>
{
	"description": "Template import",
	"import": [
		{
			"description" : "Template to ...",
			"page": "Template_1",
			"namespace": "NS_TEMPLATE",
			"contents": "<includeonly>{{{1}}}, {{{2}}}</includeonly>",
			"options": {
				"canReplace": false
			}
		},
		{
			"description" : "Template with ...",
			"page": "Template_2",
			"namespace": "NS_TEMPLATE",
			"contents": {
				"importFrom": "/templates/template-1.tmpl"
			},
			"options": {
				"canReplace": false
			}
		}
	],
	"meta": {
		"version": "1"
	}
}
</pre>

## Import process

During setup, the `Installer` runs the import automatically and reports its
progress with output similar to:

<pre>
Import of default.json ...
   ... replacing MediaWiki:Smw import foaf contents ...
   ... skipping Property:Foaf:knows, already exists ...

Import processing completed.
</pre>

Unless specified otherwise, content (i.e. pages) that already exists is skipped.

## Technical notes

<pre>
SMW\Importer
├─ ContentCreators
│	├─ DispatchingContentCreator
│	├─ XmlContentCreator
│	└─ TextContentCreator
│
├─ ImporterServiceFactory # access to import services
├─ ContentIterator
├─ ContentCreator
├─ JsonContentIterator
├─ JsonImportContentsFileDirReader
└─ ContentModeller
</pre>

- `SMW::SQLStore::Installer::AfterCreateTablesComplete` is the hook (event) that triggers the import during setup
- `ImporterServiceFactory` provides access to import services
- `Importer` is responsible for importing contents provided by a `ContentIterator`
- `ContentIterator` is an interface that provides access to individual `ImportContents` instances
- `JsonContentIterator` implements the `ContentIterator` interface
- `JsonImportContentsFileDirReader` provides the contents of all recursively fetched files from a location (e.g. the [`$smwgImportFileDirs`](https://www.semantic-mediawiki.org/wiki/Help:$smwgImportFileDirs) setting) that meet the requirements
- `ContentModeller` interprets the `JSON` definition and returns a set of `ImportContents` instances
- `ContentCreator` is an interface for specifying different creation methods (e.g. text, XML)
- `DispatchingContentCreator` dispatches to the actual content creation instance based on `ImportContents::getContentType`
- `XmlContentCreator` supports the creation of MediaWiki XML-specific content
- `TextContentCreator` supports raw wikitext