# tldts - Blazing Fast URL Parsing

`tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.

**Features**:
1. Tuned for **performance** (on the order of 0.1 to 1 μs per input)
2. Handles both URLs and hostnames
3. Full Unicode/IDNA support
4. Supports parsing email addresses
5. Detects IPv4 and IPv6 addresses
6. Continuously updated version of the public suffix list
7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and *type definitions*
8. Small bundles and small memory footprint
9. Battle tested: full test coverage and production use

# Install

```bash
npm install --save tldts
```

# Usage

Using the command-line interface:
```bash
$ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
{
  "domain": "writethedocs.org",
  "hostname": "www.writethedocs.org",
  "isIcann": true,
  "isIp": false,
  "isPrivate": false,
  "publicSuffix": "org",
  "subdomain": "www"
}
```

Programmatically:
```js
const { parse } = require('tldts');

// Retrieve hostname-related information for a given URL
parse('http://www.writethedocs.org/conf/eu/2017/');
// { domain: 'writethedocs.org',
//   hostname: 'www.writethedocs.org',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'org',
//   subdomain: 'www' }
```

*ES6 module imports* are also supported:

```js
import { parse } from 'tldts';
```

Alternatively, you can try it *directly in your browser* here: https://npm.runkit.com/tldts

# API

* `tldts.parse(url | hostname, options)`
* `tldts.getHostname(url | hostname, options)`
* `tldts.getDomain(url | hostname, options)`
* `tldts.getPublicSuffix(url | hostname, options)`
* `tldts.getSubdomain(url | hostname, options)`

The behavior of `tldts` can be customized by passing an `options` argument to
any function of the public API. This lets you both change how the library
behaves and fine-tune its performance for your inputs.

```js
{
  // Use suffixes from ICANN section (default: true)
  allowIcannDomains: boolean;
  // Use suffixes from Private section (default: false)
  allowPrivateDomains: boolean;
  // Extract and validate hostname (default: true)
  // When set to `false`, inputs will be considered valid hostnames.
  extractHostname: boolean;
  // Validate hostnames after parsing (default: true)
  // If a hostname is not valid, no further processing is performed. When set
  // to `false`, inputs to the library will be considered valid and parsing will
  // proceed regardless.
  validateHostname: boolean;
  // Perform IP address detection (default: true).
  detectIp: boolean;
  // Assume that both URLs and hostnames can be given as input (default: true)
  // If set to `false`, we assume only URLs will be given as input, which
  // speeds up processing.
  mixedInputs: boolean;
  // Specifies extra valid suffixes (default: null)
  validHosts: string[] | null;
}
```
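
As a rough sketch of how these options might be combined, the snippet below defines two presets and a small helper that binds a preset to any function taking `(input, options)`. The names `HOSTNAME_ONLY`, `WITH_PRIVATE_SUFFIXES` and `withOptions` are hypothetical and not part of `tldts`; only the option keys come from the list above.

```js
// Hypothetical presets; only the option keys are taken from the list above.
const HOSTNAME_ONLY = {
  extractHostname: false, // inputs are assumed to already be bare hostnames
};

const WITH_PRIVATE_SUFFIXES = {
  allowPrivateDomains: true, // also use suffixes from the Private section
};

// Bind a preset to a function with the `(input, options)` signature.
// Options passed at call time take precedence over the preset.
function withOptions(fn, preset) {
  return (input, overrides = {}) => fn(input, { ...preset, ...overrides });
}
```

For instance, `withOptions(require('tldts').getDomain, WITH_PRIVATE_SUFFIXES)` would give a function that applies `allowPrivateDomains: true` on every call unless overridden.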

The `parse` method returns handy **properties about a URL or a hostname**.

```js
const tldts = require('tldts');

tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// { domain: 'amazonaws.com',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'com',
//   subdomain: 'spark-public.s3' }

tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv', { allowPrivateDomains: true })
// { domain: 'spark-public.s3.amazonaws.com',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: false,
//   isIp: false,
//   isPrivate: true,
//   publicSuffix: 's3.amazonaws.com',
//   subdomain: '' }

tldts.parse('gopher://domain.unknown/');
// { domain: 'domain.unknown',
//   hostname: 'domain.unknown',
//   isIcann: false,
//   isIp: false,
//   isPrivate: true,
//   publicSuffix: 'unknown',
//   subdomain: '' }

tldts.parse('https://192.168.0.0') // IPv4
// { domain: null,
//   hostname: '192.168.0.0',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('https://[::1]') // IPv6
// { domain: null,
//   hostname: '::1',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('tldts@emailprovider.co.uk') // email
// { domain: 'emailprovider.co.uk',
//   hostname: 'emailprovider.co.uk',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'co.uk',
//   subdomain: '' }
```
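
The outputs above come in a few distinct shapes: IP addresses have every domain-related field set to `null`, while regular hostnames carry a `domain` and a possibly empty `subdomain`. A minimal sketch of how calling code might branch on these shapes is shown below. `describeResult` is a hypothetical helper, not part of `tldts`, and it assumes that `domain` is `null` whenever no domain could be determined.

```js
// Classify a result object shaped like the `parse` outputs above.
// `describeResult` is a hypothetical helper, not part of `tldts`.
function describeResult(result) {
  if (result.isIp) {
    // IP addresses have no domain, public suffix or subdomain.
    return { kind: 'ip', value: result.hostname };
  }
  if (result.domain === null) {
    // Assumption: `domain` is null when no domain could be determined.
    return { kind: 'none', value: result.hostname };
  }
  // An empty `subdomain` string means there are no subdomain labels.
  const subdomains = result.subdomain ? result.subdomain.split('.') : [];
  return { kind: 'domain', value: result.domain, subdomains };
}
```

It takes a plain object, so it can be fed the return value of `tldts.parse(...)` directly.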

| Property Name  | Type      | Description                                                       |
|:-------------- |:--------- |:----------------------------------------------------------------- |
| `hostname`     | `string`  | Hostname extracted from the input                                 |
| `domain`       | `string`  | Domain (tld + sld)                                                |
| `subdomain`    | `string`  | Subdomain (what comes before `domain`)                            |
| `publicSuffix` | `string`  | Public suffix (tld) of `hostname`                                 |
| `isIcann`      | `boolean` | Whether the public suffix comes from the ICANN part of the list   |
| `isPrivate`    | `boolean` | Whether the public suffix comes from the Private part of the list |
| `isIp`         | `boolean` | Whether `hostname` is an IP address                               |


## Single purpose methods

These methods are shorthands for retrieving a single value. They perform better
than `parse` because they do less work.

### getHostname(url | hostname, options?)

Returns the hostname from a given string.

```javascript
const { getHostname } = require('tldts');

getHostname('google.com');        // returns `google.com`
getHostname('fr.google.com');     // returns `fr.google.com`
getHostname('fr.google.google');  // returns `fr.google.google`
getHostname('foo.google.co.uk');  // returns `foo.google.co.uk`
getHostname('t.co');              // returns `t.co`
getHostname('fr.t.co');           // returns `fr.t.co`
getHostname('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
```

### getDomain(url | hostname, options?)

Returns the fully qualified domain from a given string.

```javascript
const { getDomain } = require('tldts');

getDomain('google.com');        // returns `google.com`
getDomain('fr.google.com');     // returns `google.com`
getDomain('fr.google.google');  // returns `google.google`
getDomain('foo.google.co.uk');  // returns `google.co.uk`
getDomain('t.co');              // returns `t.co`
getDomain('fr.t.co');           // returns `t.co`
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
```
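
Because `getDomain` maps every hostname under the same registrable domain to one string, it can serve as a grouping key. The sketch below illustrates the idea; `groupByDomain` is a hypothetical helper, and the domain function is passed in as a parameter so that the snippet does not depend on how `tldts` is loaded.

```js
// Group inputs by domain. `getDomain` is injected: pass
// `require('tldts').getDomain` or any function with the same contract
// (returns a string, or null when no domain is found).
function groupByDomain(inputs, getDomain) {
  const groups = new Map();
  for (const input of inputs) {
    const domain = getDomain(input);
    if (domain === null) continue; // skip inputs without a known domain
    if (!groups.has(domain)) groups.set(domain, []);
    groups.get(domain).push(input);
  }
  return groups;
}
```

With the real library, `groupByDomain(urls, require('tldts').getDomain)` should place `google.com` and `fr.google.com` in the same group, in line with the examples above.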

### getSubdomain(url | hostname, options?)

Returns the complete subdomain for a given string.

```javascript
const { getSubdomain } = require('tldts');

getSubdomain('google.com');             // returns ``
getSubdomain('fr.google.com');          // returns `fr`
getSubdomain('google.co.uk');           // returns ``
getSubdomain('foo.google.co.uk');       // returns `foo`
getSubdomain('moar.foo.google.co.uk');  // returns `moar.foo`
getSubdomain('t.co');                   // returns ``
getSubdomain('fr.t.co');                // returns `fr`
getSubdomain('https://user:password@secure.example.co.uk:443/some/path?and&query#hash'); // returns `secure`
```

### getPublicSuffix(url | hostname, options?)

Returns the [public suffix][] for a given string.

```javascript
const { getPublicSuffix } = require('tldts');

getPublicSuffix('google.com');       // returns `com`
getPublicSuffix('fr.google.com');    // returns `com`
getPublicSuffix('google.co.uk');     // returns `co.uk`
getPublicSuffix('s3.amazonaws.com'); // returns `com`
getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns `s3.amazonaws.com`
getPublicSuffix('tld.is.unknown');   // returns `unknown`
```

# Troubleshooting

## Retrieving subdomain of `localhost` and custom hostnames

`tldts` methods `getDomain` and `getSubdomain` are designed to **work only with *known and valid* TLDs**.
This way, you can trust what a domain is.

`localhost` is a valid hostname but not a TLD, so it is not recognized by default. To handle such hostnames, pass the `validHosts` option to any method exposed by `tldts`:

```js
const tldts = require('tldts');

tldts.getDomain('localhost');           // returns null
tldts.getSubdomain('vhost.localhost');  // returns null

tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] });  // returns 'vhost'
```
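
When the custom hostnames are not known ahead of time, listing them in `validHosts` may not be practical. One possible alternative, sketched here under that assumption, is to fall back to the bare hostname whenever no domain is found. `domainOrHostname` is a hypothetical helper, not part of `tldts`; its second argument is any object exposing `getDomain` and `getHostname`, such as the `tldts` module itself.

```js
// Hypothetical helper: return the domain when one is known, otherwise
// fall back to the hostname (for example `localhost`).
// `api` is any object exposing `getDomain` and `getHostname`.
function domainOrHostname(input, api, options) {
  const domain = api.getDomain(input, options);
  return domain !== null ? domain : api.getHostname(input, options);
}
```

A call would look like `domainOrHostname('vhost.localhost', require('tldts'))`. Note that the fallback returns the whole hostname, not a validated domain, so it should only be used where that distinction does not matter.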

## Updating the TLDs List

`tldts` made the opinionated choice of shipping with a list of suffixes directly
in its bundle. There is currently no mechanism to update the lists yourself, but
we make sure that the version shipped is always up-to-date.

If you keep `tldts` updated, the lists should be up-to-date as well!

# Performance

`tldts` is the *fastest JavaScript library* available for parsing hostnames. It can parse *millions of inputs per second* (typically 2-3M, depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library for the kind of inputs you are dealing with. For example, if you know you only manipulate valid hostnames, you can disable the hostname extraction step with `{ extractHostname: false }`.

Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.
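
To get a feel for the effect of an option on your own inputs, a rough measurement loop along the following lines can help. This is only a sketch: `measureThroughput` is a hypothetical helper, not part of `tldts`, and a naive loop like this is sensitive to JIT warm-up and machine noise, so the numbers should be read as indicative at best.

```js
// Rough throughput measurement for any function taking `(input, options)`.
// `measureThroughput` is a hypothetical helper, not part of `tldts`.
function measureThroughput(fn, inputs, options, rounds = 10) {
  let nonNull = 0; // use each result so the calls are not trivially dead code
  const start = process.hrtime.bigint();
  for (let round = 0; round < rounds; round += 1) {
    for (const input of inputs) {
      if (fn(input, options) !== null) nonNull += 1;
    }
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  const calls = rounds * inputs.length;
  return { calls, nonNull, callsPerSecond: calls / (elapsedNs / 1e9) };
}
```

Running it twice on the same list of hostnames, once with `{}` and once with `{ extractHostname: false }`, gives a first impression of what the option buys for that workload.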

# Contributors

`tldts` is based upon the excellent `tld.js` library and would not exist without
the many contributors who worked on the project:
<a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>

This project would not be possible without Mozilla's amazing
[public suffix list][]. Thank you for your hard work!

# License

[MIT License](LICENSE).

[badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
[badge-downloads]: https://img.shields.io/npm/dm/tldts.svg

[public suffix list]: https://publicsuffix.org/list/
[list the recent changes]: https://github.com/publicsuffix/list/commits/master
[changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom

[public suffix]: https://publicsuffix.org/learn/