## this is a list of all solr keys for the default index 'collection1', the fulltext search index
## this complete list of keys can be changed; the actual schema is stored in:
## DATA/SETTINGS/solr.collection.schema
## the syntax of this file:
## - all lines beginning with '##' are comments
## - all non-empty lines not beginning with '#' are keyword lines
## - all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines

### mandatory values, do not disable them, YaCy won't work without them

## primary key of document, the URL hash, string (mandatory field)
id

## url of document, string (mandatory field)
sku

## last-modified from http header, date (mandatory field)
last_modified

## time when resource was loaded
load_date_dt

## mime-type of document, string (mandatory field)
content_type

## content of title tag, text (mandatory field)
title

## id of the host, a 6-byte hash that is part of the document id (mandatory field)
host_id_s

## host of the url, string
host_s

## the size of the raw source (mandatory field)
size_i

## fail reason if a page was not loaded. if the page was loaded then this field is empty, string (mandatory field)
failreason_s

## fail type if a page was not loaded. This field is either empty, 'excl' or 'fail'
failtype_s

## http status return code (e.g. "200" for ok), -1 if not loaded (see content of failreason_s for this case), int (mandatory field)
httpstatus_i

## the file name extension
url_file_ext_s

## either the second level domain or, if a ccSLD is used, the third level domain. Needed to search in the url
host_organization_s

## internal links, only the protocol. Needed for IndexBrowser
inboundlinks_protocol_sxt

## internal links, the url only without the protocol. For correct assembly of inbound links, inboundlinks_protocol_sxt + inboundlinks_urlstub_sxt is needed
inboundlinks_urlstub_sxt

## external links, only the protocol. For correct assembly of outbound links, outboundlinks_protocol_sxt + outboundlinks_urlstub_sxt is needed
outboundlinks_protocol_sxt

## external links, the url only without the protocol. Needed to enhance the crawler
outboundlinks_urlstub_sxt

## all image links without the protocol and '://'. For correct assembly of an image url, images_protocol_sxt + images_urlstub_sxt is needed
images_urlstub_sxt

## all image link protocols
images_protocol_sxt

### optional but highly recommended values, part of the index distribution process

## date until resource shall be considered as fresh
fresh_date_dt

## id of the referrer to this document, discovered during crawling
referrer_id_s

## the name of the publisher of the document
publisher_t

## the language used in the document
language_s

## number of links to audio resources
audiolinkscount_i

## number of links to video resources
videolinkscount_i

## number of links to application resources
applinkscount_i

### optional but highly recommended values, not part of the index distribution process

## the 64 bit hash of the org.apache.solr.update.processor.Lookup3Signature of title, used to compute title_unique_b
#title_exact_signature_l

## flag shows if title is unique within all indexable documents of the same host with status code 200; if yes and another document appears with same title, the unique-flag is set to false, boolean
#title_unique_b

## counter for the number of documents which are not unique (== count of not-unique-flagged documents + 1)
#exact_signature_copycount_i

## intermediate data produced in EnhancedTextProfileSignature: a list of word frequencies
#fuzzy_signature_text_t

## counter for the number of documents which are not unique (== count of not-unique-flagged documents + 1)
#fuzzy_signature_copycount_i

## needed (post-)processing steps on this metadata set
#process_sxt

## if date expressions can be found in the content, these dates are listed here as date objects in order of appearance
dates_in_content_dts

## the number of entries in dates_in_content_dts
dates_in_content_count_i

## content of itemprop attributes with content='startDate'
startDates_dts

## content of itemprop attributes with content='endDate'
endDates_dts

## number of unique http references, should be equal to references_internal_i + references_external_i
references_i

## number of unique http references from same host to referenced url
references_internal_i

## number of unique http references from external hosts
references_external_i

## number of external hosts which provide http references
references_exthosts_i

## crawl depth of web page according to the number of steps that the crawler did to get to this document; if the crawl was started at a root document, then this is equal to the clickdepth
crawldepth_i

## key from a harvest process (i.e. the crawl profile hash key) which is needed for near-realtime postprocessing. This shall be deleted as soon as postprocessing has been terminated.
harvestkey_s

## unique-field which is true when an url appears the first time. If the same url which was http then appears as https (or vice versa) then the field is false
http_unique_b

## unique-field which is true when an url appears the first time. If the same url within the subdomain www then appears without that subdomain (or vice versa) then the field is false
www_unique_b

## the 64 bit hash of the org.apache.solr.update.processor.Lookup3Signature of text_t
exact_signature_l

## flag shows if exact_signature_l is unique at the time of document creation, used for double-check during search
exact_signature_unique_b

## 64 bit of the Lookup3Signature from EnhancedTextProfileSignature of text_t
fuzzy_signature_l

## flag shows if fuzzy_signature_l is unique at the time of document creation, used for double-check during search
fuzzy_signature_unique_b

## tags that are attached to crawls/index generation to separate the search result into user-defined subsets
collection_sxt

## geospatial point in degrees of latitude,longitude as declared in WGS84, location; this creates two additional subfields, coordinate_p_0_coordinate (latitude) and coordinate_p_1_coordinate (longitude)
coordinate_p

## content of author-tag, texgen
author

## content of description-tag(s), text
description_txt

## the 64 bit hash of the org.apache.solr.update.processor.Lookup3Signature of description, used to compute description_unique_b
#description_exact_signature_l

## flag shows if description is unique within all indexable documents of the same host with status code 200; if yes and another document appears with same description, the unique-flag is set to false, boolean
#description_unique_b

## content of keywords tag; words are separated by space
keywords

## character encoding, string
charset_s

## number of words in visible area, int
wordcount_i

## number of all outgoing links; including linksnofollowcount_i, int
linkscount_i

## number of all outgoing links with nofollow tag, int
linksnofollowcount_i

## number of outgoing inbound (to same domain) links; including inboundlinksnofollowcount_i, int
inboundlinkscount_i

## number of outgoing inbound (to same domain) links with nofollow tag, int
#inboundlinksnofollowcount_i

## number of outgoing outbound (to other domain) links, including outboundlinksnofollowcount_i, int
outboundlinkscount_i

## number of outgoing outbound (to other domain) links with nofollow tag, int
#outboundlinksnofollowcount_i

## number of images, int
imagescount_i

## response time of target server in milliseconds, int
responsetime_i

## all visible text, text
text_t

## additional synonyms to the words in the text
synonyms_sxt

## h1 header
h1_txt

## h2 header
h2_txt

## h3 header
h3_txt

## h4 header
h4_txt

## h5 header
h5_txt

## h6 header
h6_txt

### unused, delete candidates

## the md5 of the raw source
#md5_s

## redirect url if the status code is 299 < httpstatus_i < 310
#httpstatus_redirect_s

### optional values, not part of standard YaCy handling (but useful for external applications)

## ip of host of url (after DNS lookup), string
#ip_s

## tags of css entries, normalized with absolute URL
#css_tag_sxt

## urls of css entries, normalized with absolute URL
#css_url_sxt

## number of css entries, int
#csscount_i

## urls of script entries, normalized with absolute URL
#scripts_sxt

## number of entries in scripts_sxt, int
#scriptscount_i

## noindex and nofollow attributes
## from HTML (meta-tag in HTML header: robots)
## and HTTP header (X-Robots-Tag property)
## coded as binary value:
## bit 0: "all" contained in html header meta
## bit 1: "index" contained in html header meta
## bit 2: "follow" contained in html header meta
## bit 3: "noindex" contained in html header meta
## bit 4: "nofollow" contained in html header meta
## bit 8: "all" contained in http header X-Robots-Tag
## bit 9: "noindex" contained in http header X-Robots-Tag
## bit 10: "nofollow" contained in http header X-Robots-Tag
## bit 11: "noarchive" contained in http header X-Robots-Tag
## bit 12: "nosnippet" contained in http header X-Robots-Tag
## bit 13: "noodp" contained in http header X-Robots-Tag
## bit 14: "notranslate" contained in http header X-Robots-Tag
## bit 15: "noimageindex" contained in http header X-Robots-Tag
## bit 16: "unavailable_after" contained in http header X-Robots-Tag
#robots_i

## content of the <meta name="generator"> tag, text
#metagenerator_t

## internal links, the visible anchor text
inboundlinks_anchortext_txt

## external links, the visible anchor text
outboundlinks_anchortext_txt

## all icon links without the protocol and '://'
icons_urlstub_sxt

## all icon link protocols: split from icons_urlstub to provide some compression, as http protocol is implied as default and not stored
icons_protocol_sxt

## all icon link relationships, space separated (e.g. 'icon apple-touch-icon')
icons_rel_sxt

## all icon sizes, space separated (e.g. '16x16 32x32')
icons_sizes_sxt

## all text/words appearing in image alt texts or the tokenized url
images_text_t

## all image link alt tags
images_alt_sxt

## size of images: height
images_height_val

## size of images: width
images_width_val

## size of images as number of pixels (easier for ranking than using width and height)
#images_pixel_val

## number of image links with alt tag
#images_withalt_i

## binary pattern for the existence of h1..h6 headlines, int
#htags_i

## url inside the canonical link element, string
#canonical_s

## flag shows if the url in canonical_s is equal to sku, boolean
#canonical_equal_sku_b

## link from the url property inside the refresh link element, string
#refresh_s

## all texts in