Usage¶
Basic Usage¶
The most simple use case is to perform a simple search. To do this simply create an instance
of a KitOpen
wrapper object with the desired configuration and then call the search
method on it with the proper parameters.
A simple search can be constructed by passing a string author argument and the start/end years for the search also as strings.
The resulting SearchResult
object can be iterated to get all the publication objects.
from pykitopen import KitOpen, Publication
from pykitopen.config import DEFAULT
kitopen = KitOpen(DEFAULT)
results = kitopen.search({
'author': 'MUSTERMANN, M*',
'start': '2012',
'end' '2016',
'view' Publication.VIEWS.FULL
})
for publication in results:
print(publication.data)
Publication Views¶
As you might have noticed, there is an additional parameter ‘view’, which can be passed to the search parameters.
This parameter is supposed to be an object of the type PublicationView
. This parameter influences, what kind of
data fields are requested for each publication in the search.
Some standard options are available as constant members of the Publication.VIEWS
class. This included for example
the FULL
view, which will request all of the fields and the BASIC
view which will only contain the most basic
information such as ID, author, title etc. Choosing the appropriate view might help to reduce response times.
Custom Views¶
The user is not limited to the predefined views though, it is also possible to define custom views with only the required fields. First of all, a list of all the available fields can be displayed like this:
from pykitopen.publication import PublicationView
print(PublicationView.FIELDS)
A custom view can be created, by simply creating a new instance of the PublicationView
class. A string name and a
subset of the fields list have to be passed to the constructor. This object can then be used to be passed as a search
parameter or even set as a default in the configuration dict.
from pykitopen import KitOpen
from pykitopen.config import DEFAULT
from pykitopen.publication import PublicationView
# Set it as a default
custom_view = PublicationView('MyCustomView', ['author', 'title'])
config = DEFAULT.copy()
config['default_view'] = custom_view
kitopen = KitOpen(config)
# Or use it for a search request directly
kitopen.search({
'author': 'MUSTERMANN, M*,
'view': custom_view
})
Request Batching¶
The problem¶
So the problem is, that the used KITOpen interface at KITOpen Auswertungen does not expose a REST API. The only way to export the more detailed information data is through the download of a ZIP file, which then in turn contains a CSV file.
So the way pykitopen works in the background is: It downloads the zip file, unpacks it into a temporary folder and parses the csv for the actual data.
This creates a practical complication: If the amount of requested data is high, the server takes a long time to create corresponding csv and zip files, which then leads to a timeout for the request…
Batching Strategies¶
To work around this problem, it is possible to get the desired data in batches, instead of everything at once. A single
request will be split into multiple different requests based on some criteria. This behaviour can be controlled with
the "batching_strategy"
key the configuration dict, which is being passed to the KitOpen
wrapper object. The
default behaviour being the NoBatching
strategy, which will request all the data at once.
A good alternative would be the YearBatching
strategy, which will request the data for every year
individually.
from pykitopen import KitOpen
from pykitopen.search import YearBatching
from pykitopen.config import DEFAULT
# It is good practice to base a custom configuration on a copy of the default
config = DEFAULT.copy()
config['batching_strategy'] = YearBatching
pykitopen = KitOpen(config)
Changing the batching strategy does not change anything on the behaviour of SearchResult
,
since the batching is implemented in the background. Each batch is executed, once the iterator reaches the
corresponding point.