bug fix for Relab library: Sometimes data files don't have description #182
kvngvikram wants to merge 3 commits into spectralpython:develop from
|
Apologies, I've been away on other projects for a while. If this commit is still relevant, let me know and I will merge. |
|
Hi @tboggs, yes, the issue still persists. I also found another bug with a different file, so I wanted to wait until I was confident. Here is a summary of the discussion regarding the format of the .asc files.
Now, the current issues
The current code also confuses the word Volatite with Volt. I was only able to find these two issues because I was working with the entire database instead of downloading just a few files. Here is some more info I got from the maintainer:
Regarding the last two lines:
Regarding some comment lines starting with the '*' character (like '#' in Python code)
Now. The two points about no description/comment lines, and handling the I want two more spectra added to the tests (picking these from the complete database I downloaded). But I also feel that the structure of the function doesn't exactly assume the given Is it OK if I do a different PR? Should I stick to solving the two issues, or more? |
|
Thanks for the detailed explanation. I'm fine with two patches but if one is easier, that works as well. |
|
Hi, I have changed the read_relab_file function and added test files. I hope they are not too big.
Key points of this PR:
Most of the complication in the function comes from trying to parse the comment blocks, which are not consistent, not even in the last two lines. The maintainer himself suggested using the catalogue files for this kind of information.
Previously, the Relab ID was taken from the first line of the file. This was a mistake: we now know that the first line is just the number of data points, not an ID.
I tried to keep the other metadata consistent with the previous behaviour, even though I feel they are not right.
My comments and opinions (not confined to this PR):
To give you some context on the database, there are two kinds of Relab ID: the sample ID and the spectrum ID. It is possible to have multiple spectra (different spectrometers, angles) for the same sample. The sample catalogue has info on properties of the sample, such as collection location, composition, etc., while the spectrum catalogue has info on spectrometer range, angles, etc.
Ideally, a database should have two table schemas, one each to store the spectrum catalogue and the sample catalogue. Each entry in the spectrum table would have the data, the spectrum ID (the .asc filename itself), and the corresponding sample ID (so that the sample info in the first table can be used).
This means refactoring the entire relab.py with different table schemas, and then attempting to read both catalogue files if the user wants, or else just reading a few or all of the .asc files into the spectrum table. For now, I am limiting myself to this PR.
Hope this helps, |
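To make the two-table idea in the comment above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the schema used by relab.py or by the Relab catalogues: the table names, column names, and helper functions (create_schema, add_sample, add_spectrum, spectra_for_sample) are all hypothetical and chosen only for illustration.

```python
import json
import sqlite3


def create_schema(conn):
    """Create one table per catalogue (hypothetical table and column names)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS samples (
            sample_id  TEXT PRIMARY KEY,
            properties TEXT              -- JSON: collection location, composition, ...
        );
        CREATE TABLE IF NOT EXISTS spectra (
            spectrum_id TEXT PRIMARY KEY,   -- assumed to be the .asc filename
            sample_id   TEXT NOT NULL REFERENCES samples(sample_id),
            data        TEXT                -- JSON list of data rows from the file
        );
        """
    )


def add_sample(conn, sample_id, properties):
    """Store one sample-catalogue entry; properties is a plain dict."""
    conn.execute(
        "INSERT INTO samples (sample_id, properties) VALUES (?, ?)",
        (sample_id, json.dumps(properties)),
    )


def add_spectrum(conn, spectrum_id, sample_id, data):
    """Store one spectrum and link it to its sample."""
    conn.execute(
        "INSERT INTO spectra (spectrum_id, sample_id, data) VALUES (?, ?, ?)",
        (spectrum_id, sample_id, json.dumps(data)),
    )


def spectra_for_sample(conn, sample_id):
    """Return the IDs of all spectra measured on the given sample."""
    rows = conn.execute(
        "SELECT spectrum_id FROM spectra WHERE sample_id = ? ORDER BY spectrum_id",
        (sample_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the sample ID as a foreign key in the spectra table is what allows several spectra (different spectrometers or angles) to point at the same sample entry.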
Hi,
The Relab spectral database sometimes doesn't have a description in its data files (.asc files). When present, the description can be any number of lines before the final Source info and Date lines.
I guess it is arbitrary: some files have one and some don't.
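As a rough illustration of this layout, the sketch below splits the text lines that follow the numeric data into an optional description plus the Source and Date lines. It is not the code in this pull request: the function name split_trailing_lines, the fallback_name parameter, and the assumption that the metadata lines begin with the words 'Source' and 'Date' are all hypothetical.

```python
def split_trailing_lines(lines, fallback_name):
    """Separate free-text description lines from the Source and Date lines.

    lines: the text lines that follow the numeric data in an .asc file.
    fallback_name: used as the description when the file has none
    (assumption: the measurement name is a sensible stand-in).
    """
    description_parts = []
    source = None
    date = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("Source"):
            source = line
        elif line.startswith("Date"):
            date = line
        else:
            # Anything that is not a Source or Date line counts as description.
            description_parts.append(line)
    description = " ".join(description_parts) or fallback_name
    return {"description": description, "source": source, "date": date}
```

With this shape, a file that has no description lines still yields a non-empty description, so code that later reads the description does not hit a missing key.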
In the current state, for files with no description, creating the database results in an error because there is no s.sample['description'], which is necessary.
So this pull request adds a test spectrum that doesn't have a description, alongside the existing one.
The code was also edited so that any lines that do not have 'Source' or 'Date' are appended to the description. Even if there is no description, s.measurement['name'] will be used (as intended previously).
Note:
Here is the error message for the newer test files but older code: