Introduction
Welcome to the Scholarcy APIs. We offer two core API services:
- Metadata extraction API at https://api.scholarcy.com. This is a developer API comprising a number of endpoints for extracting machine-readable knowledge as JSON data from documents in a wide range of formats. The service is optimised for research papers and articles, but should provide useful results for any document.
- Synopsis API at https://summarizer.scholarcy.com/. This includes a web front end for testing and a developer endpoint at /summarize.
We provide examples in Shell, Ruby, and Python. You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.
Authentication
Authentication headers must be sent with every request:
# 1. Metadata API:
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb')
})
response = request.execute
puts(response.body)
# 2. Synopsis API:
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb')
})
response = request.execute
puts(response.body)
# 1. Metadata API:
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      timeout=timeout)
print(r.json())
# 2. Synopsis API:
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      timeout=timeout)
print(r.json())
# 1. Metadata API:
# With shell, you can just pass the correct header with each request
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf"
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3"
# 2. Synopsis API:
# With shell, you can just pass the correct header with each request
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf"
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3"
Make sure to replace abcdefg with your API key.
Each API service (metadata and synopsis) requires a separate key.
The Scholarcy APIs expect the API key to be included in all API requests to the server in a header that looks like the following:
Authorization: Bearer abcdefg
Generate a Poster
The API endpoints at https://api.scholarcy.com/api/posters/generate
will extract the information needed to populate your own poster-creation services, and will also generate a basic PowerPoint template for you to use as a starting point for editing.
POST a local file to generate a poster
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:type => 'headline',
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'type': 'headline', 'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "type=headline" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [
"1. Smith, J. (2001) A study on self-citation. J. Chem. Biol., 123, 456-789.",
"2. Jones, R. (2015) He didn't write this one. Science, 101, 101010."
],
"emails": ["smith.j@uni.ac.uk"],
"figure_captions": [
{
"id": "1",
"caption": "Figure 1 caption"
},
{
"id": "2",
"caption": "Figure 2 caption"
}
],
"figure_urls": [
"https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-000.png",
"https://api.scholarcy.com/images/file.pdf_agtuhsnt_images_1x1uj_5t/img-002.png"
],
"poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"highlights": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
}
This endpoint generates a poster from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. start_page=1.
HTTP Request
POST https://api.scholarcy.com/api/posters/generate
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
type | full | The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
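For example, here is a minimal Python sketch (the API key, file path and page range are placeholders) that combines the type, start_page and end_page parameters above and then downloads the generated PowerPoint template from the poster_url field of the response; whether the poster_url download needs the Authorization header is an assumption here:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {'type': 'headline', 'start_page': 24, 'end_page': 37}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
data = r.json()
# Download the generated .pptx template; sending the Authorization header here is an assumption
poster = requests.get(data['poster_url'], headers=headers, timeout=30)
with open('poster.pptx', 'wb') as out:
    out.write(poster.content)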
GET a poster from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:type => 'full',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/posters/generate'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'type': 'full',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=params,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/posters/generate" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "type=full" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["smith.j@uni.ac.uk"],
"figure_captions": [
{
"id": "1",
"caption": "Figure 1 caption"
},
{
"id": "2",
"caption": "Figure 2 caption"
}
],
"figure_urls": [],
"poster_url": "https://api.scholarcy.com/posters/file.pdf_agtuhsnt.pptx",
"keywords": [],
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"highlights": [],
"summary": {
"Introduction": [],
"Methods": [],
"Results": [],
"Conclusion": []
}
}
}
This endpoint generates a poster from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/posters/generate
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
type | full | The type of poster to generate. full will create a large, landscape poster with blocks for each section. headline will create a portrait poster containing the main takeaway finding and a single image. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
Extract Highlights
The API endpoints at https://api.scholarcy.com/api/highlights/extract
will pull out the key findings/highlights of an article and also provide a longer, extractive summary.
POST a local file to extract highlights
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'wiki_links': True, 'reference_links': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/highlights/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
}
]
},
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"keyword_relevance": {
"atomic force microscopy": 0.345678,
"dna nanostructure": 0.23456,
"drug release": 0.12345,
"single-stranded DNA": 0.034567,
"double-stranded DNA": 0.02345,
"dox molecule": 0.01234
},
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"highlights": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"findings": [
"A statistically significant difference was noted between the four groups on the combined dependent variables",
"We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
],
"summary": [],
"structured_summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
This endpoint extracts highlights from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. wiki_links=true.
HTTP Request
POST https://api.scholarcy.com/api/highlights/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
reference_links | false | If true, parse and link each reference to its full text location |
replace_pronouns | false | If true, replace first-person pronouns with third-person mentions (the author(s)?, they). |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
add_background_info | false | If true, generate an introductory sentence. Useful for generating an abstract from an article. |
add_concluding_info | false | If true, generate a concluding sentence. Useful for generating an abstract from an article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
highlights_algorithm | weighted | weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally. |
headline_from | highlights | highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline. |
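As a sketch of how several of these parameters combine (the API key, file path and focus terms are placeholders), the following Python call asks for seven key points focused on particular terms, with first-person pronouns replaced:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
    'key_points': 7,
    'focus_terms': 'drug release;DNA origami',  # illustrative focus terms
    'replace_pronouns': True,
    'structured_summary': True
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
print(r.json().get('highlights'))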
GET highlights from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/highlights/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/highlights/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "..."
}
]
},
"keywords": [],
"keyword_relevance": {},
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"highlights": [],
"findings": [],
"summary": [],
"structured_summary": {
"Introduction": [],
"Methods": [],
"Results": [],
"Conclusion": []
}
}
This endpoint extracts highlights from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/highlights/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
reference_links | false | If true, parse and link each reference to its full text location |
replace_pronouns | false | If true, replace first-person pronouns with third-person mentions (the author(s)?, they). |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
add_background_info | false | If true, generate an introductory sentence. Useful for generating an abstract from an article. |
add_concluding_info | false | If true, generate a concluding sentence. Useful for generating an abstract from an article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
highlights_algorithm | weighted | weighted: attend more closely to the results and conclusion. unweighted: attend to all content equally. |
headline_from | highlights | highlights: use the highest scoring highlight as the headline. summary: use the first summary sentence as the headline. conclusions: use the first conclusion statement as a headline. claims: use the main claim as the headline. |
Extract Structured Content
The API endpoints at https://api.scholarcy.com/api/metadata/extract
and /api/metadata/basic
will convert a document into structured, machine-readable data in JSON format.
POST a local file to extract content
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/metadata/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["author@email.com"],
"funding": [
{
"award-group": [
{
"funding-source": "FEDER/COMPETE",
"award-id": ["AAA/BBB/04007/2019"]
}
],
"funding-statement": "We acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
}
],
"table_captions": [
{
"id": "1",
"caption": "Sample demographics and characteristics"
},
{
"id": "2",
"caption": "Construct measurements"
}
],
"figure_captions": []
},
"sections": {
"introduction": ["Introduction section contents"],
"methodology": ["Methods section contents"],
"findings": ["Main results contents"],
"conclusion": ["Concluding remarks"],
"limitations": [
"There are also several limitations to this research. Small sample size was an issue."
],
"acknowledgements": [
"We'd like to thank our supervisors for support, tea and biscuits."
],
"funding": [
"The authors acknowledge financial support from Fundação para a Ciência e a Tecnologia and FEDER/COMPETE (grant AAA/BBB/04007/2019)"
],
"future_work": [
"More research is needed to better understand what is going on."
],
"objectives": [
"The aim of this research is to provide insights into the inner workings of cellular processes."
]
},
"structured_content": [
{
"heading": "ABSTRACT",
"content": ["This is a very exciting paper. Please read it."]
},
{
"heading": "INTRODUCTION",
"content": ["Introduction paragraph 1", "Introduction paragraph 2"]
},
{
"heading": "RESEARCH METHODOLOGY",
"content": ["Methods paragraph 1", "Methods paragraph 2"]
},
{
"heading": "FINDINGS AND DISCUSSION",
"content": ["Results paragraph 1", "Results paragraph 2"]
},
{
"heading": "CONCLUSION",
"content": ["Conclusion paragraph 1", "Conclusion paragraph 2"]
}
],
"participants": [
{
"participant": "Patients",
"number": 15,
"context": "Fifteen patients participated in the study."
}
],
"statistics": [
{
"tests": {
"context": "We performed exploratory factor analysis using SPSS 20",
"tests": [
{
"test": "exploratory factor analysis"
}
]
}
},
{
"tests": {
"context": "We performed confirmatory factor analyses with AMOS 20 using the maximum likelihood estimation method",
"tests": [
{
"test": "confirmatory factor analyses"
},
{
"test": "maximum likelihood estimation method"
}
]
}
},
{
"p_value": "P < 0.001",
"context": "We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)</mark>",
"tests": {
"tests": [
{
"test": "analysis of variance",
"value": "P < 0.001"
}
]
}
}
],
"keywords": [
"atomic force microscopy",
"dna nanostructure",
"drug release",
"single-stranded DNA",
"double-stranded DNA",
"dox molecule"
],
"keyword_relevance": {
"atomic force microscopy": 0.345678,
"dna nanostructure": 0.23456,
"drug release": 0.12345,
"single-stranded DNA": 0.034567,
"double-stranded DNA": 0.02345,
"dox molecule": 0.01234
},
"abbreviations": {
"DONs": "DNA origami nanostructures",
"ROS": "reactive oxygen species",
"DNase I": "deoxyribonuclease I",
"PEG": "polyethylene glycol"
},
"headline": "We prove some important facts in this paper",
"top_statements": [
"Facts are very important.",
"The force is strong in this one.",
"We ran some tests and this is what we found"
],
"findings": [
"A statistically significant difference was noted between the four groups on the combined dependent variables",
"We also noted significant differences when we performed a one-way between-groups analysis of variance on each of the 14 items (P < 0.001)"
],
"facts": [],
"claims": [],
"summary": [],
"structured_summary": {
"Introduction": ["Introduction paragraph 1", "Introduction paragraph 2"],
"Methods": [
"We mixed some chemicals.",
"We heated them up.",
"We distilled the mixture."
],
"Results": [
"There was a big explosion",
"But the crystals were pure",
"We identified a new compound"
],
"Conclusion": [
"We proved some important things and we summarise them here.",
"Further work is necessary"
]
}
}
This endpoint extracts structured content from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. engine=v1.
HTTP Request
POST https://api.scholarcy.com/api/metadata/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
parse_references | false | If true, parse into BibTeX and link each reference to its full text location. |
reference_style | ensemble | Referencing style used by the document, if known, or use the default. Available values: acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver. |
reference_format | text | Output references in plain text or bibtex format. |
generate_summary | true | Create an extractive summary of the article. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
replace_pronouns | false | If true, replace first-person pronouns in summary with third-person mentions (the author(s)?, they). |
strip_dialogue | false | If true, remove dialog and quoted text from input prior to summarising. |
summary_size | 400 | Length of summary in words. |
summary_percent | 0 | Length of summary as a % of the original article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
keyword_method | sgrank+acr | Available values: sgrank, sgrank+np, sgrank+acr, textrank, np, regex. |
keyword_sample | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
keyword_limit | 25 | Target number of key terms to extract. |
abbreviation_method | schwartz | Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble. |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages. |
extract_facts | true | Extract SVO-style factual statements from the article. |
extract_claims | true | Extract specific claims made by the article. |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
citation_contexts | false | If true, extract the inline citation contexts (preceding and current sentences). |
inline_citation_links | false | If true, link inline citations to their identifiers in the references. |
extract_pico | true | Extract population, intervention, control, outcome data. |
extract_tables | false | If true, extract tabular data as CSV/Excel files. |
extract_figures | false | If true, extract figures and images as PNG files. |
require_captions | true | Requires an accompanying caption to trigger figure/table extraction. |
extract_sections | true | Extracts section headers and paragraphs. |
include_bodytext | true | If extracting sections, includes the main body text content for each section. |
unstructured_content | false | If true, include a raw, unstructured text dump of the file. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
engine | v1 | PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters. |
image_engine | v1 | Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values: v1, v2, v1+v2. |
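For example, a minimal Python sketch (placeholder key and file path) that enables reference parsing in BibTeX format, CrossRef metadata lookup and table extraction using the parameters above:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {
    'parse_references': True,
    'reference_format': 'bibtex',
    'external_metadata': True,
    'extract_tables': True
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
data = r.json()
print(data['metadata'].get('references'))  # parsed references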
Output fields
Field | Description |
---|---|
filename | The filename of the uploaded document, or input URL slug |
content_type | The file or URL MIME type |
metadata | Structured article metadata |
message | Any error or status messages |
title | Article title |
author | List of authors |
pages | Number of pages in the document |
date | Article year |
full_date | Article date string in ISO 8601 format (where available) |
affiliations | Author affiliations |
journal | Journal title (from CrossRef) |
abbreviated_journal | Abbreviated journal title (where available) |
volume | Journal volume (from CrossRef) |
page | Journal page range (from CrossRef) |
cited_by | Citation count (from CrossRef) |
identifiers | Any identifier extracted from the document, such as DOI, ISBN, arXiv ID, or other identifier. If an open-access version of the paper is available, the URL to that version will be displayed here. |
abstract | The author-written abstract, if available, or a proxy for the abstract, such as background, introduction, preface etc. |
keywords | Author-supplied keywords |
references | The plain reference strings extracted from the end of the article, or from the footnotes |
emails | Email addresses of the authors |
type | Article type: journal-article, book-chapter, preprint, web-page, review-article, case-study, report |
references_ris | RIS parse of the references |
links | Any URLs identified in the document |
author_conclusions | Author-stated conclusions/takeaways |
funding | Funding statement structured as follows: "award-group": [{"funding-source": "National Institutes of Health", "award-id": ["R43HL137469"] }] |
table_captions | Table captions |
figure_captions | Figure captions |
tables_url | Link to download the tables as Excel |
figure_urls | List of links to download extracted images as PNG files |
word_count | A range representing maximum and minimum estimated word count. The maximum includes appendices and supplementary information. The minimum includes the core article body text. Both exclude references and footnotes. |
is_oa | Boolean flag if the document is open access or not. This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497 |
oa_status | Open access status: closed, bronze, green, or gold. This flag is only present if the input is a DOI URL, e.g. https://doi.org/10.1177/0846537120913497 |
sections | Snippets from each main section in the article |
introduction, methods, results, conclusion | If section headings can be mapped to standard names such as Introduction, Methods, Results, Conclusions, these snippets are shown here |
funding | Any funding statements |
disclosures | Any disclosures of conflicts of interest |
ethical_compliance | Any information about consent and ethical regulations |
data_availability | Any information about data and code availability related to this study |
limitations | Any discussion of study limitations |
future_work | Any information about further research needed and future work |
registrations | Any study registration identifiers |
structured_content | The section headings as they appear in the source document, along with their full section content. |
participants | Quantifiable information about the study subjects |
statistics | Information about statistical tests and analysis performed in the study |
populations | Quantifiable information about the population background |
keywords | A combination of the author-supplied keywords, plus new keywords or key terms extracted from the document |
keyword_relevance | keywords ranked by their relevance scores |
species | Any Latin species names detected |
summary | An extractive summary of the main points of the entire article. |
structured_summary | An extractive summary structured according to the main sections of the article. |
reference_links | Shown if reference parsing has been enabled. This contains links to the full text for each of the references in the paper |
facts | Subject-predicate-object statements expressed in the article |
claims | Claims made by the authors of the study |
findings | Any important, quantitative findings extracted from the document, such as statistically significant results |
key_statements | A longer set of important sentences, from which the top_statements are selected. |
top_statements | The top 3-7 key points in the document. Typically, these highlights will include introductory and concluding information, as well as the main claims and findings of the article |
headline | A short, one line summary of the entire article. This headline attempts to express the main finding or main result of the paper. |
abbreviations | Abbreviations and their fully spelt out names, extracted from the document |
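To show how a few of the fields above appear in practice, here is a short Python sketch (placeholder key and file path) that reads them from the JSON response:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data={'start_page': 1},
                      timeout=30)
data = r.json()
meta = data.get('metadata', {})
print(meta.get('title'))           # Article title
print(meta.get('identifiers'))     # DOI, ISBN, arXiv ID, etc.
print(data.get('headline'))        # One-line summary of the main finding
print(data.get('top_statements'))  # The top 3-7 key points
print(data.get('findings'))        # Quantitative findings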
GET structured content from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/metadata/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/metadata/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "filename.pdf",
"content_type": "application/pdf",
"file_size": 123456,
"metadata": {
"title": "Article title",
"author": "Smith, J.",
"pages": "15",
"date": 2021,
"affiliations": [
"Department of Silly Walks, University of Life, London, UK"
],
"identifiers": {
"arxiv": null,
"doi": "10.1010/101010.10.10.1101010",
"isbn": null,
"doc_id": null
},
"abstract": "This is a very exciting paper. Please read it.",
"references": [],
"emails": ["author@email.com"],
"funding": [],
"table_captions": [],
"figure_captions": []
},
"sections": {
"introduction": [],
"methodology": [],
"findings": [],
"conclusion": [],
"limitations": [],
"acknowledgements": [],
"funding": [],
"future_work": [],
"objectives": []
},
"structured_content": [],
"participants": [],
"statistics": [],
"keywords": [],
"keyword_relevance": {},
"abbreviations": {},
"headline": "We prove some important facts in this paper",
"top_statements": [],
"findings": [],
"facts": [],
"claims": [],
"summary": [],
"structured_summary": {}
}
This endpoint extracts structured content from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/metadata/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
parse_references | false | If true, parse into BibTeX and link each reference to its full text location. |
reference_style | ensemble | Referencing style used by the document, if known, or use the default. Available values: acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver. |
reference_format | text | Output references in plain text or bibtex format. |
generate_summary | true | Create an extractive summary of the article. |
summary_engine | v1 | v1: Best for articles. v2: best for book chapters. |
replace_pronouns | false | If true, replace first-person pronouns in summary with third-person mentions (the author(s)?, they). |
strip_dialogue | false | If true, remove dialog and quoted text from input prior to summarising. |
summary_size | 400 | Length of summary in words. |
summary_percent | 0 | Length of summary as a % of the original article. |
structured_summary | false | If true, summarise each of the main sections separately, and then provide a summary structured according to those sections. |
keyword_method | sgrank+acr | Available values: sgrank, sgrank+np, sgrank+acr, textrank, np, regex. |
keyword_sample | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
keyword_limit | 25 | Target number of key terms to extract. |
abbreviation_method | schwartz | Select an abbreviation extraction method. Available values: schwartz, statistical, ensemble. |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages. |
extract_facts | true | Extract SVO-style factual statements from the article. |
extract_claims | true | Extract specific claims made by the article. |
key_points | 5 | The number of key points/key takeaway items to extract. |
focus_terms | null | Semicolon separated list of terms around which the extracted highlights will focus. |
citation_contexts | false | If true, extract the inline citation contexts (preceding and current sentences). |
inline_citation_links | false | If true, link inline citations to their identifiers in the references. |
extract_pico | true | Extract population, intervention, control, outcome data. |
extract_tables | false | If true, extract tabular data as CSV/Excel files. |
extract_figures | false | If true, extract figures and images as PNG files. |
require_captions | true | Requires an accompanying caption to trigger figure/table extraction. |
extract_sections | true | Extracts section headers and paragraphs. |
include_bodytext | true | If extracting sections, includes the main body text content for each section. |
unstructured_content | false | If true, include a raw, unstructured text dump of the file. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
engine | v1 | PDF text extraction engine. v1: best general purpose. v2: best for articles containing marginal line numbering or narrow column gutters. |
image_engine | v1 | Image extraction engine. v1: best for bitmap images. v2: best for line images. Available values: v1, v2, v1+v2. |
Extract Key Terms
The API endpoints at https://api.scholarcy.com/api/keywords/extract
will pull out the key terms from an article.
POST a local file to extract key terms
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:start_page => 24,
:end_page => 37
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'start_page': 24, 'end_page': 37}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/keywords/extract" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "start_page=24" \
-F "end_page=37"
The above command returns JSON structured like this:
{
"filename": "article2016.pdf",
"abbreviations": {
"U&G": "uses and gratification",
"SNS": "social networking sites",
"COBRA": "Consumer Online Brand-Related Activity",
"AVE": "average variance extracted",
"CLF": "common latent factor"
},
"keywords": [
{
"term": "Facebook",
"url": "https://en.wikipedia.org/wiki/Facebook"
},
{
"term": "typology",
"url": "https://en.wikipedia.org/wiki/typology"
},
{
"term": "social media",
"url": "https://en.wikipedia.org/wiki/social_media"
},
{
"term": "consumer behavior",
"url": "https://en.wikipedia.org/wiki/consumer_behavior"
},
{
"term": "average variance extracted",
"url": "https://en.wikipedia.org/wiki/average_variance_extracted"
},
{
"term": "online branding",
"url": "https://en.wikipedia.org/wiki/online_branding"
},
{
"term": "cluster analysis",
"url": "https://en.wikipedia.org/wiki/cluster_analysis"
},
{
"term": "social networking sites",
"url": "https://en.wikipedia.org/wiki/social_networking_sites"
},
{
"term": "brand manager",
"url": "https://en.wikipedia.org/wiki/brand_manager"
}
],
"keyword_relevance": {
"Facebook": 0.3834808259587021,
"social media": 0.16224188790560473,
"social networking sites": 0.08259587020648967,
"brand interaction": 0.07964601769911504,
"typology": 0.038348082595870206,
"brand manager": 0.038348082595870206,
"uses and gratification": 0.035398230088495575,
"consumer interaction": 0.032448377581120944,
"consumer behavior": 0.02359882005899705,
"brand communication": 0.02064896755162242,
"cluster analysis": 0.017699115044247787,
"main motivation": 0.017699115044247787,
"Consumer Online Brand-Related Activity": 0.014749262536873156,
"average variance extracted": 0.011799410029498525,
"common latent factor": 0.008849557522123894,
"online branding": 0.008849557522123894
}
}
The above command can also return CSV structured like this:
"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"
This endpoint extracts key terms from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- Powerpoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML
- HTML
- Plain Text (.txt)
- LaTeX (.tex)
Please note that when sending a file, at least one additional parameter needs to be sent with the payload, e.g. output_format=csv.
HTTP Request
POST https://api.scholarcy.com/api/keywords/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
output_format | json | Output format: json or csv. |
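For example, a short Python sketch (placeholder key and file path) that requests CSV output with Wikipedia links:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://api.scholarcy.com/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
params = {'wiki_links': True, 'output_format': 'csv'}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data=params,
                      timeout=30)
print(r.text)  # CSV rows of the form "filename","key term","wikipedia_link"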
GET key terms from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
request = RestClient::Request.new(
:method => :get,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:url => 'https://www.nature.com/articles/s41746-019-0180-3',
:start_page => 1
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://api.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/api/keywords/extract'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'start_page': 1
}
r = requests.get(POST_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://api.scholarcy.com/api/keywords/extract" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "start_page=1"
The above command returns JSON structured as for the POST endpoint:
{
"filename": "article2016.pdf",
"abbreviations": {
"U&G": "uses and gratification",
"SNS": "social networking sites",
"COBRA": "Consumer Online Brand-Related Activity",
"AVE": "average variance extracted",
"CLF": "common latent factor"
},
"keywords": [
{
"term": "Facebook",
"url": "https://en.wikipedia.org/wiki/Facebook"
},
{
"term": "typology",
"url": "https://en.wikipedia.org/wiki/typology"
},
{
"term": "social media",
"url": "https://en.wikipedia.org/wiki/social_media"
},
{
"term": "consumer behavior",
"url": "https://en.wikipedia.org/wiki/consumer_behavior"
},
{
"term": "average variance extracted",
"url": "https://en.wikipedia.org/wiki/average_variance_extracted"
},
{
"term": "online branding",
"url": "https://en.wikipedia.org/wiki/online_branding"
},
{
"term": "cluster analysis",
"url": "https://en.wikipedia.org/wiki/cluster_analysis"
},
{
"term": "social networking sites",
"url": "https://en.wikipedia.org/wiki/social_networking_sites"
},
{
"term": "brand manager",
"url": "https://en.wikipedia.org/wiki/brand_manager"
}
],
"keyword_relevance": {
"Facebook": 0.3834808259587021,
"social media": 0.16224188790560473,
"social networking sites": 0.08259587020648967,
"brand interaction": 0.07964601769911504,
"typology": 0.038348082595870206,
"brand manager": 0.038348082595870206,
"uses and gratification": 0.035398230088495575,
"consumer interaction": 0.032448377581120944,
"consumer behavior": 0.02359882005899705,
"brand communication": 0.02064896755162242,
"cluster analysis": 0.017699115044247787,
"main motivation": 0.017699115044247787,
"Consumer Online Brand-Related Activity": 0.014749262536873156,
"average variance extracted": 0.011799410029498525,
"common latent factor": 0.008849557522123894,
"online branding": 0.008849557522123894
}
}
The above command can also return CSV structured like this:
"filename","key term","wikipedia_link"
"article2016.pdf","Facebook","https://en.wikipedia.org/wiki/Facebook"
"article2016.pdf","typology","https://en.wikipedia.org/wiki/typology"
"article2016.pdf","social media","https://en.wikipedia.org/wiki/social_media"
"article2016.pdf","consumer behavior","https://en.wikipedia.org/wiki/consumer_behavior"
"article2016.pdf","average variance extracted","https://en.wikipedia.org/wiki/average_variance_extracted"
"article2016.pdf","online branding","https://en.wikipedia.org/wiki/online_branding"
"article2016.pdf","cluster analysis","https://en.wikipedia.org/wiki/cluster_analysis"
"article2016.pdf","social networking sites","https://en.wikipedia.org/wiki/social_networking_sites"
"article2016.pdf","brand manager","https://en.wikipedia.org/wiki/brand_manager"
This endpoint extracts key terms from a remote URL. The remote URL can resolve to a document type of any of the formats listed for the POST endpoint.
HTTP Request
GET https://api.scholarcy.com/api/keywords/extract
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. |
text | null | Plain text content to be processed. |
start_page | 1 | Start reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
end_page | null | Stop reading the document from this page (PDF urls only). Useful for processing a single article/chapter within a larger file. |
external_metadata | false | If true, fetch article metadata from the relevant remote repository (e.g. CrossRef). |
wiki_links | false | If true, map extracted key terms to their Wikipedia pages |
sampling | representative | For large documents, when extracting key terms, use either a representative sample of the full content, or the fulltext content. |
extract_snippets | true | If true, sample snippets from each section, otherwise, sample the full text. |
output_format | json | Output format: json or csv. |
Generate a Synopsis
The API endpoints at https://summarizer.scholarcy.com/summarize
will generate a short, abstractive synopsis (70-100 words) or a mini-review (around 150-300 words), depending on the parameters chosen.
By default, output is in JSON format.
Alternatively, you can receive output in HTML format if you pass an Accept: text/html
header with your request.
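For example, a minimal Python sketch (placeholder key and file path) that requests HTML output by sending the Accept header alongside the usual Authorization header:
import requests

AUTH_TOKEN = 'abcdefg'  # Your API key
POST_ENDPOINT = 'https://summarizer.scholarcy.com/summarize'
headers = {
    'Authorization': 'Bearer ' + AUTH_TOKEN,
    'Accept': 'text/html'  # ask for HTML instead of the default JSON
}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files={'file': file_data},
                      data={'format_summary': True},
                      timeout=30)
print(r.text)  # HTML-formatted synopsis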
POST a local file to generate a synopsis
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
request = RestClient::Request.new(
:method => :post,
:url => POST_ENDPOINT,
:headers => headers,
:payload => {
:multipart => true,
:file => File.new(file_path, 'rb'),
:wiki_links => true,
:format_summary => true
})
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
file_path = '/path/to/local/file.pdf'
params = {'wiki_links': True, 'format_summary': True}
with open(file_path, 'rb') as file_data:
    file_payload = {'file': file_data}
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      files=file_payload,
                      data=params,
                      timeout=timeout)
print(r.json())
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-F "file=@/path/to/local/file.pdf" \
-F "wiki_links=true" \
-F "format_summary=true"
The above command returns JSON structured like this:
{
"response": {
"abbreviations": {
"EPOSA": "European Project on OSteoArthritis",
"GPS": "Global Positioning System",
"ISD": "Integrated Surface Database",
"OR": "odds ratio"
},
"headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
"keywords": [
{
"term": "physical activity",
"url": "https://en.wikipedia.org/wiki/physical_activity"
},
{
"term": "osteoarthritis",
"url": "https://en.wikipedia.org/wiki/osteoarthritis"
},
{
"term": "atmospheric pressure",
"url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
},
{
"term": "rheumatoid arthritis",
"url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
},
{
"term": "Global Positioning System",
"url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
},
{
"term": "fibromyalgia",
"url": "https://en.wikipedia.org/wiki/fibromyalgia"
},
{
"term": "arthritis",
"url": "https://en.wikipedia.org/wiki/arthritis"
},
{
"term": "smartphone app",
"url": "https://en.wikipedia.org/wiki/smartphone_app"
},
{
"term": "relative humidity",
"url": "https://en.wikipedia.org/wiki/relative_humidity"
},
{
"term": "chronic pain",
"url": "https://en.wikipedia.org/wiki/chronic_pain"
},
{
"term": "odds ratio",
"url": "https://en.wikipedia.org/wiki/odds_ratio"
},
{
"term": "Parkinson disease",
"url": "https://en.wikipedia.org/wiki/Parkinson_disease"
},
{
"term": "joint pain",
"url": "https://en.wikipedia.org/wiki/joint_pain"
},
{
"term": "wind speed",
"url": "https://en.wikipedia.org/wiki/wind_speed"
},
{
"term": "cohort study",
"url": "https://en.wikipedia.org/wiki/cohort_study"
}
],
"message": "",
"metadata": {
"citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
"citation_affiliation": "",
"citation_author": "William G. Dixon et al.",
"citation_date": 2019,
"citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
"citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
},
"readership_level": "technical-readership-accurate",
"summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
"title": "How the weather affects the pain of citizen scientists using a smartphone app"
}
}
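The summary field in this response carries light HTML markup (an inline citation link and <br/> breaks), and the other fields are plain JSON. Below is a minimal Python sketch that uploads a file as in the earlier examples and then reads the main fields out of the decoded response; the tag-stripping regular expression is illustrative only, not part of the API:
# Example: reading fields from the synopsis response
import re
import requests

AUTH_TOKEN = 'abcdefg' # Your API key
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
with open('/path/to/local/file.pdf', 'rb') as file_data:
    r = requests.post('https://summarizer.scholarcy.com/summarize',
                      headers=headers,
                      files={'file': file_data},
                      timeout=30)

data = r.json()['response']              # unwrap the top-level "response" object
headline = data['headline']
terms = [kw['term'] for kw in data['keywords']]
citation = data['metadata']['citation']

# The summary contains an inline citation link and <br/> breaks;
# strip the tags if only plain text is needed.
plain_summary = re.sub(r'<[^>]+>', ' ', data['summary'])

print(headline)
print(', '.join(terms))
print(citation)
print(plain_summary)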
This endpoint generates a synopsis from a local file. File formats supported are:
- PDF (.pdf)
- Word (.docx)
- Rich Text (.rtf)
- PowerPoint (.pptx)
- BibTeX (.bib)
- RIS (.ris)
- XML (.xml)
- HTML (.html or .htm)
- Plain Text (.txt)
- LaTeX (.tex)
HTTP Request
POST https://summarizer.scholarcy.com/summarize
Query Parameters
Parameter | Default | Description |
---|---|---|
file | null | A file object. |
url | null | URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497 |
input_text | null | You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL. |
structured_summary | false | Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion. |
summary_type | combined | Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary. |
focus_level | 4 | This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus. |
readership_level | technical-readership-accurate | This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy. However, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text. However, it is 2x faster than lay-readership-accurate. |
wiki_links | false | Map extracted key terms to Wikipedia entries. |
format_summary | false | Format the summary so it can be more easily used as part of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated. |
headline_type | verbatim | Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified. |
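These parameters are sent alongside the file upload. For example, a structured, lay-readership synopsis with Wikipedia links can be requested as follows; this is a minimal Python sketch and the parameter values are illustrative choices, not defaults:
# Example: POST a file with additional query parameters
import requests

timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
# Illustrative choices; see the table above for all options.
params = {
    'structured_summary': True,
    'summary_type': 'combined',
    'readership_level': 'lay-readership-accurate',
    'wiki_links': True
}
file_path = '/path/to/local/file.pdf'
with open(file_path, 'rb') as file_data:
    r = requests.post(POST_ENDPOINT,
                      headers=headers,
                      params=params,
                      files={'file': file_data},
                      timeout=timeout)
print(r.json())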
GET a synopsis from a URL
require 'rest-client'
AUTH_TOKEN = 'abcdef' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/summarize'
headers = {"Authorization": "Bearer " + AUTH_TOKEN}
# For a GET request, rest-client sends the parameters as a query string
# when they are supplied via the special :params key.
params = {
  :url => 'https://www.nature.com/articles/s41746-019-0180-3',
  :wiki_links => true,
  :format_summary => true
}
request = RestClient::Request.new(
  :method => :get,
  :url => GET_ENDPOINT,
  :headers => headers.merge(:params => params))
response = request.execute
puts(response.body)
import requests
timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
GET_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
'url': 'https://www.nature.com/articles/s41746-019-0180-3',
'wiki_links': True,
'format_summary': True
}
r = requests.get(GET_ENDPOINT,
headers=headers,
params=payload,
timeout=timeout)
print(r.json())
curl "https://summarizer.scholarcy.com/summarize" \
-H "Authorization: Bearer abcdefg" \
-d "url=https://www.nature.com/articles/s41746-019-0180-3" \
-d "wiki_links=true" \
-d "format_summary=true"
The above command returns JSON structured as for the POST endpoint:
{
"response": {
"abbreviations": {
"EPOSA": "European Project on OSteoArthritis",
"GPS": "Global Positioning System",
"ISD": "Integrated Surface Database",
"OR": "odds ratio"
},
"headline": "Researchers have used smartphone data to investigate the relationship between pain and weather conditions, and found that there is a small but significant relationship.",
"keywords": [
{
"term": "physical activity",
"url": "https://en.wikipedia.org/wiki/physical_activity"
},
{
"term": "osteoarthritis",
"url": "https://en.wikipedia.org/wiki/osteoarthritis"
},
{
"term": "atmospheric pressure",
"url": "https://en.wikipedia.org/wiki/atmospheric_pressure"
},
{
"term": "rheumatoid arthritis",
"url": "https://en.wikipedia.org/wiki/rheumatoid_arthritis"
},
{
"term": "Global Positioning System",
"url": "https://en.wikipedia.org/wiki/Global_Positioning_System"
},
{
"term": "fibromyalgia",
"url": "https://en.wikipedia.org/wiki/fibromyalgia"
},
{
"term": "arthritis",
"url": "https://en.wikipedia.org/wiki/arthritis"
},
{
"term": "smartphone app",
"url": "https://en.wikipedia.org/wiki/smartphone_app"
},
{
"term": "relative humidity",
"url": "https://en.wikipedia.org/wiki/relative_humidity"
},
{
"term": "chronic pain",
"url": "https://en.wikipedia.org/wiki/chronic_pain"
},
{
"term": "odds ratio",
"url": "https://en.wikipedia.org/wiki/odds_ratio"
},
{
"term": "Parkinson disease",
"url": "https://en.wikipedia.org/wiki/Parkinson_disease"
},
{
"term": "joint pain",
"url": "https://en.wikipedia.org/wiki/joint_pain"
},
{
"term": "wind speed",
"url": "https://en.wikipedia.org/wiki/wind_speed"
},
{
"term": "cohort study",
"url": "https://en.wikipedia.org/wiki/cohort_study"
}
],
"message": "",
"metadata": {
"citation": "William G. Dixon, Anna L. Beukenhorst, Belay B. Yimer, Louise Cook, Antonio Gasparrini, Tal El-Hay, Bruce Hellman, Ben James, Ana M. Vicedo-Cabrera, Malcolm Maclure, Ricardo Silva, John Ainsworth, Huai Leng Pisaniello, Thomas House, Mark Lunt, Carolyn Gamble, Caroline Sanders, David M. Schultz, Jamie C. Sergeant, John McBeth (2019). How the weather affects the pain of citizen scientists using a smartphone app. npj Digital Medicine 2. https://www.nature.com/articles/s41746-019-0180-3",
"citation_affiliation": "",
"citation_author": "William G. Dixon et al.",
"citation_date": 2019,
"citation_title": "How the weather affects the pain of citizen scientists using a smartphone app",
"citation_url": "https://www.nature.com/articles/s41746-019-0180-3"
},
"readership_level": "technical-readership-accurate",
"summary": "<a class=\"has-tooltip\" title=\"Read the article\" target=\"_blank\" href=\"https://www.nature.com/articles/s41746-019-0180-3\">William Dixon et al. (2019)</a> studied how the weather affects the pain of citizen scientists using a smartphone app. Weather has been thought to affect symptoms in patients with chronic disease since the time of Hippocrates over 2000 years ago.\nMultivariable case-crossover analysis including the four state weather variables demonstrated that an increase in relative humidity was associated with a higher odds of a pain event with an OR of 1.139 (95% confidence interval 1.099\u20131.181) per 10 percentage point increase.\nThis study has demonstrated that higher relative humidity and wind speed, and lower atmospheric pressure, were associated with increased pain severity in people with long-term pain conditions.\nThe \u2018worst\u2019 combination of weather variables would increase the odds of a pain event by just over 20% compared to an average day.<br/><br/>There were 2658 patients involved in the research. Discussing potential improvements, \u201cThere are potential limitations to this study.\nIt is possible only people with a strong belief in a weather\u2013pain relationship participated.\nRain and cold weather were the most common pre-existing beliefs, authors say,\u201d they admit. ",
"title": "How the weather affects the pain of citizen scientists using a smartphone app"
}
}
This endpoint generates a synopsis from a remote URL. The URL can resolve to a document in any of the formats listed for the POST endpoint.
HTTP Request
GET https://summarizer.scholarcy.com/summarize
Query Parameters
Parameter | Default | Description |
---|---|---|
url | null | URL of public, open-access document. Can be a DOI but must be qualified with a resolver domain, e.g. https://doi.org/10.1177/0846537120913497 |
input_text | null | You can pass a text string directly to the endpoint, instead of uploading a file or passing a URL. |
structured_summary | false | Take the document structure into account, considering specific sections such as Introduction, Background, Methods, Results, Discussion, Conclusion. |
summary_type | combined | Level of detail of summary: overview: abstractive synopsis of Scholarcy highlights. detail: abstractive synopsis of Scholarcy summary. combined: union of overview and detail. merged: an abstractive synopsis of the union of Scholarcy highlights and Scholarcy summary. |
focus_level | 4 | This internal hyperparameter controls whether the summary takes a narrow focus on a specific fact or a wider focus on multiple facts within the source. 4: wide focus. 3: medium focus. 2: narrow focus. 1: narrowest focus. |
readership_level | technical-readership-accurate | This controls the level of language complexity and amount of paraphrasing in the output. technical-readership-accurate: output is for a technical/academic reader with a high level of factual accuracy in relation to the source text. technical-readership-fast: output is for a technical/academic reader and provides a little more paraphrasing, which may result in a slight loss in accuracy. However, it is 2x faster than technical-readership-accurate. lay-readership-accurate: output is for a lay/non-expert reader, with moderate paraphrasing and a good level of accuracy in relation to the source text. lay-readership-fast: output is for a lay/non-expert reader, with much paraphrasing and a reasonable level of accuracy in relation to the source text. However, it is 2x faster than lay-readership-accurate. |
wiki_links | false | Map extracted key terms to Wikipedia entries. |
format_summary | false | Format the summary into a 'mini review' so it can be more easily used as the basis of a referenced report: 1) Personal pronouns referring to the authors are replaced with the author names. 2) The summary is correctly cited with author and date. 3) A formatted reference to the source is generated. |
headline_type | verbatim | Determines how the headline is generated. verbatim (default): uses the main finding extracted directly from the paper. The other options are as for readership_level, i.e. technical-readership-accurate, technical-readership-fast, lay-readership-accurate and lay-readership-fast. If format_summary is true, then headline_type defaults to lay-readership-accurate unless otherwise specified. |
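The input_text parameter accepts the document body as a plain string instead of a file or url. Because the text can be long, the sketch below sends it as form data to the POST endpoint rather than in the query string; this choice is an assumption based on the form-encoded curl examples above, and the text itself is only a placeholder:
# Example: summarizing a text string with input_text
import requests

timeout = 30
AUTH_TOKEN = 'abcdefg' # Your API key
API_DOMAIN = 'https://summarizer.scholarcy.com'
POST_ENDPOINT = API_DOMAIN + '/summarize'
headers = {'Authorization': 'Bearer ' + AUTH_TOKEN}
payload = {
    'input_text': 'Weather has long been thought to affect symptoms in chronic disease. '
                  'This study used a smartphone app to collect daily pain reports...',
    'summary_type': 'overview'
}
r = requests.post(POST_ENDPOINT,
                  headers=headers,
                  data=payload,
                  timeout=timeout)
print(r.json())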
Errors
The Scholarcy API uses the following error codes:
Error Code | Meaning |
---|---|
400 | Bad Request -- Your request is invalid. |
401 | Unauthorized -- Your API key is wrong. |
403 | Forbidden -- The API endpoint requested is restricted to administrators only. |
404 | Not Found -- The API endpoint could not be found. |
405 | Method Not Allowed -- You tried to call the API with an invalid method. |
406 | Not Acceptable -- You requested a format that isn't JSON. |
429 | Too Many Requests -- You're making too many API requests. |
500 | Internal Server Error -- We had a problem with our server. Try again later. |
503 | Service Unavailable -- We're temporarily offline for maintenance. Please try again later. |
504 | Gateway Timeout -- Serving your request took longer than expected. Please try again. |
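When calling either API from client code, it is worth checking these status codes before parsing the response body. The sketch below is one possible Python pattern, not part of the API itself: it retries on 429 and 503 with a short backoff and raises for any other error.
# Example: simple status-code handling with retries
import time
import requests

def get_synopsis(endpoint, headers, params, retries=3, timeout=30):
    # Retry only on rate limiting (429) and maintenance (503); raise otherwise.
    for attempt in range(retries):
        r = requests.get(endpoint, headers=headers, params=params, timeout=timeout)
        if r.status_code in (429, 503):
            time.sleep(2 ** attempt)   # simple exponential backoff
            continue
        r.raise_for_status()           # raises on any other 4xx/5xx
        return r.json()
    r.raise_for_status()               # still rate-limited/offline after the last retry

# Usage:
# headers = {'Authorization': 'Bearer abcdefg'}
# params = {'url': 'https://www.nature.com/articles/s41746-019-0180-3'}
# print(get_synopsis('https://summarizer.scholarcy.com/summarize', headers, params))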