@@ -12,32 +12,33 @@ This package is designed to generate synthetic data from a dataset from an origi

## Usage

After installing, the easiest way to get started is with pandas, in two steps:

**Train the GAN on the original/raw dataset**

    import pandas as pd
    import data.maker

    # Load the original dataset
    df      = pd.read_csv('sample.csv')
    column  = 'gender'   # column to learn and synthesize
    id      = 'id'       # name of the identifier column (shadows Python's built-in id)
    context = 'demo'     # name given to this training context
    # Train the GAN on the chosen column
    data.maker.train(context=context,data=df,column=column,id=id,logs='logs')

The trainer writes to disk (for now), into a structured folder that holds the trained models used later to generate the synthetic data.

**Generate a candidate dataset from the learned features**

    import pandas as pd
    import data.maker

    # Load the same dataset that was used for training
    df  = pd.read_csv('sample.csv')
    id  = 'id'
    column = 'gender'
    context = 'demo'
    # Generate synthetic values for the trained column
    data.maker.generate(data=df,id=id,column=column,logs='logs')

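
Both snippets above assume that `sample.csv` contains an `id` column and a `gender` column. The following is a minimal sketch of a pre-flight check that could be run before training or generating. It uses only pandas; `check_columns` is a hypothetical helper for illustration and is not part of `data.maker`, and the rows are made up.

```python
import pandas as pd

def check_columns(df, id, column):
    """Hypothetical helper: fail early if the id or target column is missing."""
    missing = [name for name in (id, column) if name not in df.columns]
    if missing:
        raise ValueError("missing column(s): " + ", ".join(missing))
    # Return only the two columns the usage snippets refer to
    return df[[id, column]]

# Small in-memory stand-in for sample.csv (made-up rows)
df = pd.DataFrame({"id": [1, 2, 3], "gender": ["M", "F", "F"]})
subset = check_columns(df, id="id", column="gender")
print(subset)
```

Failing early on a missing column keeps the error close to the data-loading step instead of surfacing later during training.
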
## Limitations

@@ -46,11 +47,14 @@ GANS will generate data assuming the original data has all the value space neede

- No new data will be created

        Assume a dataset with a gender attribute taking the values [M,F].

        The synthetic data will not contain genders outside [M,F].

- Not advised on continuous values

        GANs work well on discrete values, so they are not advised for continuous ones.
        e.g. measurements (height, blood pressure, ...)

- For now, it will only work on a single feature.

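
The first two limitations can be checked with plain pandas. The sketch below is not part of `data.maker`: `unseen_values` and `looks_continuous` are hypothetical helpers, the `max_distinct` threshold is an arbitrary assumption, and the `synthetic` series is a hand-written stand-in for generated output.

```python
import pandas as pd

def unseen_values(original, synthetic):
    """Values found in the synthetic column but absent from the original one."""
    return set(synthetic.dropna().unique()) - set(original.dropna().unique())

def looks_continuous(series, max_distinct=20):
    """Heuristic: a numeric column with many distinct values is treated as continuous."""
    return pd.api.types.is_numeric_dtype(series) and series.nunique() > max_distinct

original = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "height": [1.62, 1.75, 1.80, 1.68],
})
# Stand-in for a generated column (made-up values)
synthetic = pd.Series(["F", "M", "M", "F"], name="gender")

# Expect no values outside the original value space
print(unseen_values(original["gender"], synthetic))
# A measurement column is flagged as continuous (threshold lowered for this tiny sample)
print(looks_continuous(original["height"], max_distinct=3))
# A categorical column is not flagged
print(looks_continuous(original["gender"]))
```

A check like `looks_continuous` could be used to skip or warn about columns such as height or blood pressure before training.
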
## Credits :