Python question

Meow Mix · Aug 11, 2021

Well I've posted elsewhere but I might as well try all my options, someone here was helpful last time when StackExchange was not.

I do not know Python well. I'm piecing together code based on examples I've seen elsewhere, and I've only trained in Python for making plots and things like this. So bear with me as much as possible.

I have data from the CANDELS surveys with some 180,000 galaxies in five different fields (UDS, GOODS-S, GOODS-N, COSMOS and EGS).

Unfortunately for me, the Hubble teams clearly compiled data separately because what they've sent me comes in the form of catalogues with differently named columns. I'm interested in making a master file for all five fields where I can get the median mass calculated by the Hodges-Lehmann estimator in the linear space considering only estimates with the same assumptions for IMF and stellar templates.

But in one catalogue (say UDS) this might be called M_med, while in another catalogue this might be called M_med_HL.

So I've written code that will make a combined csv if the columns are all named the same. But they're not: two of the fields have this one column named differently.

Additionally, two of the other fields don't have the values of this column on a logarithmic scale (I basically just need to log_10 the column).

So all in all, I need to change the column name of two of the source docs, and put the column in a logarithmic scale in two of the other source docs, then I will have everything in place to combine them into one master doc.

I've considered just copying the files and going in to the source file and fixing the column names, but they're "security catalogue" files and I have no idea how to open these except using Python.

This is what I'm going to combine things with:

[Image: "Pythmass" by Meow Mix, posted Aug 11, 2021 at 2:22 AM]

Meow Mix · Aug 11, 2021

Actually as I typed this I had an idea, I can also open this filetype as ASCII in a program called Topcat, but column names can't be edited. I can create new columns and just copy the data over, and actually, I should also be able to put them on a logarithmic scale and then resave them as a new catalogue (it'd just have some extra columns, but I don't think Python cares if I specify only the ones I want).

This might be a moot point. I'm going to try that, but I'm leaving this up in case it doesn't work.

Secret Chief · Aug 11, 2021

Hmmmm not Monty then.....

Meow Mix · Aug 11, 2021

Secret Chief said:
Hmmmm not Monty then.....

It's only a model

lewisnotmiller · Aug 11, 2021

Meow Mix said:
Well I've posted elsewhere but I might as well try all my options, someone here was helpful last time when StackExchange was not.

I do not know Python well. I'm piecing together code based on examples I've seen elsewhere, and I've only trained in Python for making plots and things like this. So bear with me as much as possible.

I have data from the CANDELS surveys with some 180,000 galaxies in five different fields (UDS, GOODS-S, GOODS-N, COSMOS and EGS).

Unfortunately for me, the Hubble teams clearly compiled data separately because what they've sent me comes in the form of catalogues with differently named columns. I'm interested in making a master file for all five fields where I can get the median mass calculated by the Hodges-Lehmann estimator in the linear space considering only estimates with the same assumptions for IMF and stellar templates.

But in one catalogue (say UDS) this might be called M_med, while in another catalogue this might be called M_med_HL.

So I've written code that will make a combined csv if the columns are all named the same. But they're not: two of the fields have this one column named differently.

Additionally, two of the other fields don't have the values of this column on a logarithmic scale (I basically just need to log_10 the column).

So all in all, I need to change the column name of two of the source docs, and put the column in a logarithmic scale in two of the other source docs, then I will have everything in place to combine them into one master doc.

I've considered just copying the files and going in to the source file and fixing the column names, but they're "security catalogue" files and I have no idea how to open these except using Python.

This is what I'm going to combine things with:

[Image: "Pythmass" by Meow Mix, posted Aug 11, 2021 at 2:22 AM]

You can't just dump the files into some sort of SQL database and do a simple UNION ALL?

Sorry, don't know Python at all. But I've encountered this type of problem plenty of times in PL/SQL (or straight SQL), and if the data types of the fields are the same it's dead easy.

Might be misunderstanding what you're doing though.
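For anyone who wants to try the UNION ALL route from Python, here is a rough sketch using the standard-library sqlite3 module. The table names, the ID column, the sample values, and the choice of which catalogue holds linear masses are all made up for illustration; only the M_med / M_med_HL column names come from the original post. The log10 function is registered by hand because not every SQLite build ships one.

```python
import math
import sqlite3

# In-memory database standing in for wherever the catalogues get loaded.
conn = sqlite3.connect(":memory:")

# Register log10 ourselves; not every SQLite build includes math functions.
conn.create_function("log10", 1, math.log10)

# Hypothetical tables: UDS already in log10, GOODS-S assumed linear.
conn.execute("CREATE TABLE uds (id INTEGER, M_med REAL)")
conn.execute("CREATE TABLE goods_s (id INTEGER, M_med_HL REAL)")
conn.executemany("INSERT INTO uds VALUES (?, ?)", [(1, 9.5), (2, 10.2)])
conn.executemany("INSERT INTO goods_s VALUES (?, ?)", [(1, 1.0e9), (2, 1.0e10)])

# Aliasing handles the rename; log10() handles the scale change.
rows = conn.execute(
    """
    SELECT 'UDS' AS field, id, M_med FROM uds
    UNION ALL
    SELECT 'GOODS-S' AS field, id, log10(M_med_HL) AS M_med FROM goods_s
    """
).fetchall()

for row in rows:
    print(row)
```

The column alias in the second SELECT is what makes the differing column names a non-issue: UNION ALL only cares that the column counts and types line up.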

Daemon Sophic · Aug 11, 2021

Secret Chief said:
Hmmmm not Monty then.....

Yerda · Aug 11, 2021

Have you used the pandas library before?

I ask because I don't really know how to solve the problem, but I have played around with dataframes in pandas, and merging things and renaming columns was straightforward. This answer from Stack Overflow shows what I mean:

Merge two data-sets in Python Pandas

Hope this is helpful. I'm a total beginner with Python and programming in general.
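To make the pandas idea concrete, here is a minimal sketch of the rename, log10, and combine steps. The two small dataframes are hypothetical stand-ins for catalogues that would really be read from file, and the ID column, the sample values, and which field is linear are assumptions; only M_med and M_med_HL are names from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two catalogues.
uds = pd.DataFrame({"ID": [1, 2], "M_med": [9.5, 10.2]})             # assumed already log10
goods_s = pd.DataFrame({"ID": [1, 2], "M_med_HL": [1.0e9, 1.0e10]})  # assumed linear

# Give the differently named column the common name.
goods_s = goods_s.rename(columns={"M_med_HL": "M_med"})

# Put the linear masses on a log10 scale.
goods_s["M_med"] = np.log10(goods_s["M_med"])

# Record which field each row came from before stacking.
uds["field"] = "UDS"
goods_s["field"] = "GOODS-S"

# Stack the catalogues row-wise into one master table.
master = pd.concat([uds, goods_s], ignore_index=True)
print(master)
```

From there, something like master.to_csv("master.csv", index=False) would write the combined file (the filename is just an example).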

HonestJoe · Aug 11, 2021

Meow Mix said:
So all in all, I need to change the column name of two of the source docs, and put the column in a logarithmic scale in two of the other source docs, then I will have everything in place to combine them into one master doc.

I don't know Python specifically, but in principle my instinct would be to manipulate the data within your code. After you've loaded your datasets with their varied column headings, you copy the data to new, different data objects with either generic column headings or none at all. Then you can merge the data and output it with whatever column headings you want for your combined output file.

Whether it is actually easier to change the source files or to do it in the code as I described will depend on your specific requirements and tools, but the latter should be possible somehow.
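A rough sketch of that load, normalise, merge, output flow, using only the Python standard library. The SOURCES table, the ID column, the generic output headings, and the inline sample text are all hypothetical; in practice each catalogue would be opened from disk and the table would list all five fields.

```python
import csv
import io
import math

# Hypothetical per-catalogue settings: the name of the mass column and
# whether its values are linear (and so need log10 applied).
SOURCES = {
    "UDS": {"mass_col": "M_med", "linear": False},
    "GOODS-S": {"mass_col": "M_med_HL", "linear": True},
}

def normalise(field, rows):
    """Copy rows into dicts with generic headings, on a common log10 scale."""
    cfg = SOURCES[field]
    out = []
    for row in rows:
        mass = float(row[cfg["mass_col"]])
        if cfg["linear"]:
            mass = math.log10(mass)
        out.append({"field": field, "id": row["ID"], "log_mass": mass})
    return out

# Inline sample text standing in for the real catalogue files.
raw = {
    "UDS": "ID,M_med\n1,9.5\n2,10.2\n",
    "GOODS-S": "ID,M_med_HL\n1,1e9\n2,1e10\n",
}

# Merge: normalise each catalogue, then append to one list.
combined = []
for field, text in raw.items():
    combined.extend(normalise(field, csv.DictReader(io.StringIO(text))))

# Output with whatever headings are wanted for the combined file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["field", "id", "log_mass"])
writer.writeheader()
writer.writerows(combined)
master_csv = buffer.getvalue()
print(master_csv)
```

Keeping the per-catalogue quirks in one small table means the source files never need to be edited.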

Valjean · Aug 11, 2021

Secret Chief said:
Hmmmm not Monty then.....

I thought it might be about the ecological devastation caused by introduced pythons in Florida.
https://phys.org/news/2020-07-everglades-pythons-state-sponsored-capture.html

Revoltingest · Aug 11, 2021

ohiogrown · Aug 11, 2021

Is this the same kind of data that can be found here (referenced from archive home page)? I was just trying to get a feel for what the various data files look like.

Meow Mix · Aug 11, 2021

ohiogrown said:
Is this the same kind of data that can be found here (referenced from archive home page)? I was just trying to get a feel for what the various data files look like.

Those look like .fits files, which are galaxy images/cubes. I have those too, but the security catalogues are data on redshift, mass, spectrometry, morphology, etc.

ohiogrown · Aug 12, 2021

Meow Mix said:
Those look like .fits files, which are galaxy images/cubes. I have those too, but the security catalogues are data on redshift, mass, spectrometry, morphology, etc.

Are you using Windows? The reason I ask is that Windows thinks a file ending with ".cat" is a "security catalog", which is actually something completely different. If you renamed the files to add .txt, you could probably look at them with Notepad (assuming it didn't balk at such large files).

If I was doing this on a Linux system, I'd probably try to use the "cut" command to extract the columns I was interested in from each file. I think that's similar to what you're attempting to do with the "topcat" program.
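The same column-picking idea can be sketched in plain Python. This assumes the .cat files are whitespace-delimited text with "#" header lines, which is a guess about the layout; the cut_columns helper, the sample lines, and the column positions are hypothetical.

```python
def cut_columns(lines, wanted):
    """Return the requested 0-based columns from whitespace-delimited rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        parts = line.split()
        rows.append([parts[i] for i in wanted])
    return rows

# Made-up sample in the assumed layout: header comments, then data rows.
sample = [
    "# 1 ID",
    "# 2 RA",
    "# 3 DEC",
    "# 4 M_med",
    "1 34.40 -5.20 9.5",
    "2 34.41 -5.21 10.2",
]

# Keep only the ID and mass columns.
selected = cut_columns(sample, wanted=[0, 3])
print(selected)
```

For a real file, the lines argument could simply be an open file object, since iterating over it yields one line at a time.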

Meow Mix · Aug 12, 2021

ohiogrown said:
Are you using Windows? The reason I ask is that Windows thinks a file ending with ".cat" is a "security catalog", which is actually something completely different. If you renamed the files to add .txt, you could probably look at them with Notepad (assuming it didn't balk at such large files).

If I was doing this on a Linux system, I'd probably try to use the "cut" command to extract the columns I was interested in from each file. I think that's similar to what you're attempting to do with the "topcat" program.

I should have posted again to let people know: the Topcat method did work. I just made new columns, put the ones that needed it through log10, unchecked the old columns, and then used Topcat to merge them (didn't even need to run them back through Python) and saved as csv. Now I can load it in Python just fine. Probably a long way to do things, but it got the job done.
